├── README.md ├── Weed Mapping.ipynb ├── builders ├── __init__.py ├── frontend_builder.py └── model_builder.py ├── frontends ├── __init__.py ├── conv_blocks.py ├── inception_utils.py ├── inception_v4.py ├── mobilenet_base.py ├── mobilenet_v2.py ├── resnet_utils.py ├── resnet_v1.py ├── resnet_v2.py └── se_resnext.py ├── iou_vs_epochs.png ├── models ├── AdapNet.py ├── BiSeNet.py ├── DDSC.py ├── DeepLabV3.py ├── DeepLabV3_plus.py ├── DenseASPP.py ├── Encoder_Decoder.py ├── FC_DenseNet_Tiramisu.py ├── FRRN.py ├── GCN.py ├── ICNet.py ├── MobileUNet.py ├── PSPNet.py ├── RefineNet.py ├── __init__.py └── custom_model.py ├── predict.py ├── test.py ├── train.py └── utils ├── __init__.py ├── get_pretrained_checkpoints.py ├── helpers.py └── utils.py /README.md: -------------------------------------------------------------------------------- 1 | ![banner cnns ppgcc ufsc](http://www.lapix.ufsc.br/wp-content/uploads/2019/06/VC-lapix.png) 2 | 3 | # Weed-Mapping 4 | Weed Mapping in Aerial Images through Identification and Segmentation of Crop Rows and Weeds using Convolutional Neural Networks 5 | 6 | 7 | ![alt-text-10](http://www.lapix.ufsc.br/wp-content/uploads/2019/06/results2.png) 8 | 9 | ## Description 10 | This repository serves as a Weed Mapping Semantic Segmentation Suite. The goal is to make it easy to implement, train, and test new Semantic Segmentation models! 11 | 12 | It is based on the now-deprecated code repo at: https://github.com/GeorgeSeif/Semantic-Segmentation-Suite. We did not duplicate the whole repo and its data here; only the code necessary for our Weed Mapping application is included, modified and extended where needed. 13 | 14 | - The institutional code mirror repository for this work is at: https://codigos.ufsc.br/lapix/Weed-Mapping 15 | 16 | We also added a Jupyter Notebook with the whole high-level code necessary for training and predicting crop rows and weed areas. The datasets we employed in our experiments are here: 17 | 18 | - http://www.lapix.ufsc.br/weed-mapping-sugar-cane (Large Sugar Cane Field – Northern Brazil - contains weeds) 19 | - http://www.lapix.ufsc.br/crop-rows-sugar-cane (Large Sugar Cane Field – Northern Brazil - contains only well-behaved crops) 20 | 21 | This code repo is complete with the following: 22 | 23 | - Jupyter Notebook with the whole high-level code necessary for training and predicting crop rows and weed areas 24 | - Training and testing modes 25 | - Data augmentation 26 | - Several state-of-the-art models. Easily **plug and play** with different models 27 | - Able to use **any other** dataset besides our own 28 | - Evaluation including precision, recall, f1 score, average accuracy, per-class accuracy, and mean IoU 29 | - Plotting of loss function and accuracy over epochs 30 | 31 | **Any suggestions to improve this repository, including any new segmentation models you would like to see, are welcome!** 32 | 33 | ## Frontends 34 | 35 | The following feature extraction models are currently made available: 36 | 37 | - [MobileNetV2](https://arxiv.org/abs/1801.04381), [ResNet50/101/152](https://arxiv.org/abs/1512.03385), and [InceptionV4](https://arxiv.org/abs/1602.07261) 38 | 39 | ## Models 40 | 41 | The following segmentation models are currently made available: 42 | 43 | - [Encoder-Decoder based on SegNet](https://arxiv.org/abs/1511.00561). This network uses a VGG-style encoder-decoder, where the upsampling in the decoder is done using transposed convolutions (a minimal sketch of one such upsampling step follows this item).
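The following is a minimal, illustrative sketch (not the code in `models/Encoder_Decoder.py`) of what one transposed-convolution upsampling step looks like in the `tf.contrib.slim` style used throughout this repository; the function name `decoder_upsample_block` and the filter sizes are invented for this example.

```python
import tensorflow as tf

slim = tf.contrib.slim

def decoder_upsample_block(net, num_filters):
    # Double the spatial resolution with a stride-2 transposed convolution,
    # then refine the upsampled feature map with a regular 3x3 convolution.
    net = slim.conv2d_transpose(net, num_filters, kernel_size=[3, 3], stride=[2, 2])
    net = slim.conv2d(net, num_filters, kernel_size=[3, 3])
    return net
```

In the skip-connection variant described in the next item, encoder feature maps are additionally added to the decoder features around such upsampling steps.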
44 | 45 | - [Encoder-Decoder with skip connections based on SegNet](https://arxiv.org/abs/1511.00561). This network uses a VGG-style encoder-decoder, where the upsampling in the decoder is done using transposed convolutions. In addition, it employs additive skip connections from the encoder to the decoder. 46 | 47 | - [Mobile UNet for Semantic Segmentation](https://arxiv.org/abs/1704.04861). Combines the ideas of MobileNets' depthwise separable convolutions with UNet to build a high-speed, low-parameter Semantic Segmentation model. 48 | 49 | - [Pyramid Scene Parsing Network](https://arxiv.org/abs/1612.01105). Exploits global context information through different-region-based context aggregation, applied through a pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). **Note that the original PSPNet uses a ResNet with dilated convolutions, but the one in this repository uses only a regular ResNet.** 50 | 51 | - [The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation](https://arxiv.org/abs/1611.09326). Uses a downsampling-upsampling style encoder-decoder network. Each stage, i.e. between the pooling layers, uses dense blocks. In addition, it uses concatenated skip connections from the encoder to the decoder. In the code, this is the FC-DenseNet model. 52 | 53 | - [Rethinking Atrous Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1706.05587). This is the DeepLabV3 network. Uses Atrous Spatial Pyramid Pooling to capture multi-scale context by using multiple atrous rates. This creates a large receptive field. 54 | 55 | - [RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation](https://arxiv.org/abs/1611.06612). A multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. 56 | 57 | - [Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes](https://arxiv.org/abs/1611.08323). Combines multi-scale context with pixel-level accuracy by using two processing streams within the network. The residual stream carries information at the full image resolution, enabling precise adherence to segment boundaries. The pooling stream undergoes a sequence of pooling operations 58 | to obtain robust features for recognition. The two streams are coupled at the full image resolution using residuals. In the code, this is the FRRN model. 59 | 60 | - [Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network](https://arxiv.org/abs/1703.02719). Proposes a Global Convolutional Network to address both the classification and localization issues for semantic segmentation. Uses large separable kernels to expand the receptive field, plus a boundary refinement block to further improve localization performance near boundaries. 61 | 62 | - [AdapNet: Adaptive Semantic Segmentation in Adverse Environmental Conditions](http://ais.informatik.uni-freiburg.de/publications/papers/valada17icra.pdf). Modifies the ResNet50 architecture by performing the lower-resolution processing using a multi-scale strategy with atrous convolutions. This is a slightly modified version using bilinear upscaling instead of transposed convolutions, as I found it gave better results. A short illustrative sketch of parallel atrous-convolution branches, as used by this model and by DeepLabV3, follows below.
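The sketch below illustrates the idea of parallel atrous (dilated) convolution branches mentioned in the DeepLabV3 and AdapNet entries above. It is a simplified illustration only, not the implementation in `models/DeepLabV3.py`; the function name `simple_atrous_pyramid` and the chosen rates are assumptions made for this example.

```python
import tensorflow as tf

slim = tf.contrib.slim

def simple_atrous_pyramid(net, num_filters=256, rates=(6, 12, 18)):
    # A 1x1 branch plus several 3x3 branches with increasing dilation rates;
    # concatenating them mixes context from several receptive-field sizes
    # without reducing the spatial resolution of the feature map.
    branches = [slim.conv2d(net, num_filters, [1, 1])]
    for rate in rates:
        branches.append(slim.conv2d(net, num_filters, [3, 3], rate=rate))
    return slim.conv2d(tf.concat(branches, axis=-1), num_filters, [1, 1])
```

Stacking such branches densely (DenseASPP, below) or applying them inside the backbone (AdapNet) are variations on the same idea.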
63 | 64 | - [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545). Proposes a compressed-PSPNet-based image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address the challenge of real-time inference. Most of the processing is done at low resolution for high speed, and the multi-scale auxiliary loss helps to get an accurate model. **Note that for this model, I have implemented the network but have not integrated its training yet** 65 | 66 | - [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611). This is the DeepLabV3+ network, which adds a Decoder module on top of the regular DeepLabV3 model. 67 | 68 | - [DenseASPP for Semantic Segmentation in Street Scenes](http://openaccess.thecvf.com/content_cvpr_2018/html/Yang_DenseASPP_for_Semantic_CVPR_2018_paper.html). Combines many different scales using dilated convolutions with dense connections. 69 | 70 | - [Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation](http://openaccess.thecvf.com/content_cvpr_2018/html/Bilinski_Dense_Decoder_Shortcut_CVPR_2018_paper.html). Dense Decoder Shortcut Connections using dense connectivity in the decoder stage of the segmentation model. **Note: this network takes a bit of extra time to load due to the construction of the ResNeXt blocks** 71 | 72 | - [BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1808.00897). BiSeNet uses a Spatial Path with a small stride to preserve the spatial information and generate high-resolution features, while a parallel Context Path with a fast downsampling strategy obtains a sufficient receptive field. 73 | 74 | - Or make your own and plug and play! 75 | 76 | 77 | ## Files and Directories 78 | 79 | - **Weed Mapping.ipynb**: Jupyter Notebook with the whole high-level code necessary for training and predicting crop rows and weed areas 80 | 81 | - **train.py:** Training on the dataset of your choice. Default is CamVid 82 | 83 | - **test.py:** Testing on the dataset of your choice. Default is CamVid 84 | 85 | - **predict.py:** Use your newly trained model to run a prediction on a single image 86 | 87 | - **helpers.py:** Quick helper functions for data preparation and visualization 88 | 89 | - **utils.py:** Utilities for printing, debugging, testing, and evaluation 90 | 91 | - **models:** Folder containing all model files. Use this to build your models, or use a pre-built one 92 | 93 | - **CamVid:** The CamVid dataset for Semantic Segmentation as a test bed.
This is the 32-class version 94 | 95 | - **checkpoints:** Checkpoint files for each epoch during training 96 | 97 | - **Test:** Test results including images, per-class accuracies, precision, recall, and f1 score 98 | 99 | 100 | ## Installation 101 | This project has the following dependencies: 102 | 103 | - Numpy `sudo pip install numpy` 104 | 105 | - OpenCV Python `sudo apt-get install python-opencv` 106 | 107 | - TensorFlow `sudo pip install --upgrade tensorflow-gpu` 108 | 109 | ## Usage 110 | The only thing you have to do to get started is set up the folders in the following structure: 111 | 112 | ├── "dataset_name" 113 | | ├── train 114 | | ├── train_labels 115 | | ├── val 116 | | ├── val_labels 117 | | ├── test 118 | | ├── test_labels 119 | 120 | Put a text file under the dataset directory called "class_dict.csv" which contains the list of classes along with the R, G, B colour labels to visualize the segmentation results. This kind of dictionary is usually supplied with the dataset. Here is an example for the **Weed Mapping dataset**: 121 | 122 | ``` 123 | name,r,g,b 124 | SugarCane,0,255,0 125 | Soil,255,0,0 126 | Invasive,255,255,0 127 | ``` 128 | 129 | **Note:** If you are using any of the networks that rely on a pre-trained ResNet, then you will need to download the pre-trained weights using the provided script. These are currently: PSPNet, RefineNet, DeepLabV3, DeepLabV3+, GCN. 130 | 131 | Then you can simply run `train.py`! Check out the optional command line arguments: 132 | 133 | ``` 134 | usage: train.py [-h] [--num_epochs NUM_EPOCHS] 135 | [--checkpoint_step CHECKPOINT_STEP] 136 | [--validation_step VALIDATION_STEP] [--image IMAGE] 137 | [--continue_training CONTINUE_TRAINING] [--dataset DATASET] 138 | [--crop_height CROP_HEIGHT] [--crop_width CROP_WIDTH] 139 | [--batch_size BATCH_SIZE] [--num_val_images NUM_VAL_IMAGES] 140 | [--h_flip H_FLIP] [--v_flip V_FLIP] [--brightness BRIGHTNESS] 141 | [--rotation ROTATION] [--model MODEL] [--frontend FRONTEND] 142 | 143 | optional arguments: 144 | -h, --help show this help message and exit 145 | --num_epochs NUM_EPOCHS 146 | Number of epochs to train for 147 | --checkpoint_step CHECKPOINT_STEP 148 | How often to save checkpoints (epochs) 149 | --validation_step VALIDATION_STEP 150 | How often to perform validation (epochs) 151 | --image IMAGE The image you want to predict on. Only valid in 152 | "predict" mode. 153 | --continue_training CONTINUE_TRAINING 154 | Whether to continue training from a checkpoint 155 | --dataset DATASET Dataset you are using. 156 | --crop_height CROP_HEIGHT 157 | Height of cropped input image to network 158 | --crop_width CROP_WIDTH 159 | Width of cropped input image to network 160 | --batch_size BATCH_SIZE 161 | Number of images in each batch 162 | --num_val_images NUM_VAL_IMAGES 163 | The number of images to use for validation 164 | --h_flip H_FLIP Whether to randomly flip the image horizontally for 165 | data augmentation 166 | --v_flip V_FLIP Whether to randomly flip the image vertically for data 167 | augmentation 168 | --brightness BRIGHTNESS 169 | Whether to randomly change the image brightness for 170 | data augmentation. Specifies the max brightness change 171 | as a factor between 0.0 and 1.0. For example, 0.1 172 | represents a max brightness change of 10% (+-). 173 | --rotation ROTATION Whether to randomly rotate the image for data 174 | augmentation. Specifies the max rotation angle in 175 | degrees. 176 | --model MODEL The model you are using.
See model_builder.py for 177 | supported models 178 | --frontend FRONTEND The frontend you are using. See frontend_builder.py 179 | for supported models 180 | 181 | ``` 182 | 183 | 184 | 185 | ## Acknowledgements 186 | This work was the result of a collaborative effort of a team of engaged researchers: 187 | - Alexandre Monteiro 188 | - Paulo Cesar Pereira Junior 189 | - Antonio Carlos Sobieranski 190 | - Rafael da Luz Ribeiro 191 | 192 | 193 | ## Citing this Git 194 | 195 | 196 | ```tex 197 | @misc{WeedMappingCode2019, 198 | author = {Monteiro, A.A.O. and von Wangenheim, A.}, 199 | title = {Weed Mapping in Aerial Images through Identification and Segmentation of Crop Rows and Weeds}, 200 | year = {2019}, 201 | publisher = {GitHub}, 202 | journal = {GitHub repository}, 203 | howpublished = {\url{https://github.com/awangenh/Weed-Mapping}} 204 | } 205 | ``` 206 | 207 | ![banner Creative Commons INCoD UFSC](http://www.lapix.ufsc.br/wp-content/uploads/2019/05/cc.png) 208 | 209 | -------------------------------------------------------------------------------- /Weed Mapping.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "![banner cnns ppgcc ufsc](http://www.lapix.ufsc.br/wp-content/uploads/2019/06/VC-lapix.png)\n", 8 | "\n", 9 | "# Weed Mapping in Aerial Images through Identification and Segmentation of Crop Rows and Weeds\n", 10 | "\n", 11 | "Notebook for Weed Mapping in Aerial Images through Identification and Segmentation of Crop Rows and Weeds using Convolutional Neural Networks " 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "(Badges: Open in Colab, Creative Commons, Jupyter, Python)" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Initializations and general instructions\n", 26 | "\n", 27 | "Networks for semantic segmentation classify objects in images and are able to associate individual pixels of the images with the object class they represent, performing in practice a segmentation of the image according to the semantics of the object to which each individual pixel is associated.\n", 28 | "\n", 29 | "In this work we use our own dataset, containing RGB images of a sugarcane plantation, applied to four CNN models deployed in this repository, which was adapted from the now-deprecated code at: https://github.com/GeorgeSeif/Semantic-Segmentation-Suite by George Seif. This notebook assumes that you are using Google Colab. If not, please see the installation and usage instructions described in our repository at: \n", 30 | " - https://github.com/awangenh/Weed-Mapping or\n", 31 | " - https://codigos.ufsc.br/lapix/Weed-Mapping\n", 32 | "\n", 33 | "We used the models: **SegNet, UNet, FRRN and PSPNet**. Some ground truths and respective results are shown in the first figure of the repo.\n", 34 | "\n", 35 | "Everything you need to know about training, testing and making predictions on your dataset is explained in this notebook and in this repository, depending on which platform you are using. 
If you use Google Colab you don't need to install the TensorFlow framework.\n", 36 | "\n", 37 | "## Setting up your Dataset\n", 38 | "The first thing you need to accomplish is to organize the structure of the folders of your data as explained in the \"**Usage**\" part of the repository.\n", 39 | "Do not forget to edit the text file \"*class_dict.csv*\" with the classes specific to your data.\n", 40 | "\n", 41 | "Observe that our dataset was stored in a folder called *Dataset_ArticleBackground*. The code below reflects this. You will have to adapt the code to your environment.\n", 42 | "\n", 43 | "After that, you just need to upload the content to Google Drive.\n", 44 | "\n", 45 | "## Mounting your data:\n", 46 | "Next you need to define the place where all the scripts available in the repository and also your dataset are stored:\n" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "# Code to mount Google Drive\n", 56 | "import os\n", 57 | "from google.colab import drive\n", 58 | "drive.mount('/content/drive')" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "# Check the processor\n", 66 | "\n", 67 | "To use the available GPU go to:\n", 68 | "\n", 69 | "Edit >> Notebook settings >> choose the Runtime type and GPU as Hardware accelerator.\n", 70 | "\n", 71 | "The code below is for you to check the version of the GPU being used." 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "!/opt/bin/nvidia-smi\n", 81 | "!nvcc --version" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "\n", 89 | "# Train the model\n", 90 | "\n", 91 | "Access the directory where you mounted your project and call the script to run the training of the model:\n", 92 | "\n", 93 | "In our work this is **train_balancing_metrics.py**.\n", 94 | "\n", 95 | "You also need to pass some parameters. 
We used the following:\n", 96 | "\n", 97 | "\n", 98 | "\n", 99 | "* num_epochs = 200\n", 100 | "\n", 101 | "* dataset = \"The folder where our dataset is located\"\n", 102 | "\n", 103 | "* num_val_images = 44, the number of images in our validation set\n", 104 | "\n", 105 | "* h_flip and v_flip = True, to use these data augmentation operations\n", 106 | "\n", 107 | "* model = \"FRRN-B\", or any other model chosen\n", 108 | "\n", 109 | "* batch_size = 3 (worked for us!)\n", 110 | "\n", 111 | "* continue_training = False, to start training from the beginning\n", 112 | "\n", 113 | "In the repository mentioned above, there is an explanation of all the parameters that can be used.\n", 114 | "\n" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "%cd /content/drive/My\\ Drive/DeepLearning/Semantic-Segmentation-Suite-master/\n", 124 | "\n", 125 | "!python train_balancing_metrics.py --num_epochs=200 --dataset=\"Dataset_ArticleBackground\" --num_val_images=44 --h_flip=True --v_flip=True --model=\"DeepLabV3\" --batch_size=3 --continue_training=False" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "# Test the model\n", 133 | "\n", 134 | "Here is the code to test your model on your test set.\n", 135 | "\n", 136 | "Call the test script (**test.py**) and pass the parameters.\n", 137 | "\n", 138 | "The **checkpoint_path** is the path where the weights for that trained model are located." 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "%cd /content/drive/My\\ Drive/DeepLearning/Semantic-Segmentation-Suite-master/\n", 148 | "\n", 149 | "!python test.py --dataset=\"Dataset_ArticleBackground\" --model=\"FRRN-B\" --checkpoint_path='checkpoints/latest_model_FRRN-B_Dataset_ArticleBackground.ckpt' " 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "# Make a Prediction\n", 157 | "\n", 158 | "This code is used when you want to make a prediction for new single images.\n", 159 | "\n", 160 | "Call **predict.py** with the correct parameters." 
161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": null, 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [ 169 | "%cd /content/drive/My\\ Drive/DeepLearning/Semantic-Segmentation-Suite-master/\n", 170 | "\n", 171 | "!python predict.py --dataset=\"Dataset_ArticleBackground\" --model=\"FRRN-B\" --checkpoint_path='checkpoints/latest_model_FRRN-B_Dataset_ArticleBackground.ckpt' --crop_height=512 --crop_width=512 --image=\"Dataset_ArticleBackground/test/115.png\"" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "![banner Creative Commons INCoD UFSC](http://www.lapix.ufsc.br/wp-content/uploads/2019/05/cc.png)" 179 | ] 180 | } 181 | ], 182 | "metadata": { 183 | "kernelspec": { 184 | "display_name": "Python 3", 185 | "language": "python", 186 | "name": "python3" 187 | }, 188 | "language_info": { 189 | "codemirror_mode": { 190 | "name": "ipython", 191 | "version": 3 192 | }, 193 | "file_extension": ".py", 194 | "mimetype": "text/x-python", 195 | "name": "python", 196 | "nbconvert_exporter": "python", 197 | "pygments_lexer": "ipython3", 198 | "version": "3.7.1" 199 | }, 200 | "varInspector": { 201 | "cols": { 202 | "lenName": "20", 203 | "lenType": "20", 204 | "lenVar": "60" 205 | }, 206 | "kernels_config": { 207 | "python": { 208 | "delete_cmd_postfix": "", 209 | "delete_cmd_prefix": "del ", 210 | "library": "var_list.py", 211 | "varRefreshCmd": "print(var_dic_list())" 212 | }, 213 | "r": { 214 | "delete_cmd_postfix": ") ", 215 | "delete_cmd_prefix": "rm(", 216 | "library": "var_list.r", 217 | "varRefreshCmd": "cat(var_dic_list()) " 218 | } 219 | }, 220 | "types_to_exclude": [ 221 | "module", 222 | "function", 223 | "builtin_function_or_method", 224 | "instance", 225 | "_Feature" 226 | ], 227 | "window_display": false 228 | } 229 | }, 230 | "nbformat": 4, 231 | "nbformat_minor": 2 232 | } 233 | -------------------------------------------------------------------------------- /builders/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awangenh/Weed-Mapping/72526ebbc2abe3b9d35672689de25a321e36b039/builders/__init__.py -------------------------------------------------------------------------------- /builders/frontend_builder.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | from frontends import resnet_v2 4 | from frontends import mobilenet_v2 5 | from frontends import inception_v4 6 | import os 7 | 8 | 9 | def build_frontend(inputs, frontend, is_training=True, pretrained_dir="models"): 10 | if frontend == 'ResNet50': 11 | with slim.arg_scope(resnet_v2.resnet_arg_scope()): 12 | logits, end_points = resnet_v2.resnet_v2_50(inputs, is_training=is_training, scope='resnet_v2_50') 13 | frontend_scope='resnet_v2_50' 14 | init_fn = slim.assign_from_checkpoint_fn(model_path=os.path.join(pretrained_dir, 'resnet_v2_50.ckpt'), var_list=slim.get_model_variables('resnet_v2_50'), ignore_missing_vars=True) 15 | elif frontend == 'ResNet101': 16 | with slim.arg_scope(resnet_v2.resnet_arg_scope()): 17 | logits, end_points = resnet_v2.resnet_v2_101(inputs, is_training=is_training, scope='resnet_v2_101') 18 | frontend_scope='resnet_v2_101' 19 | init_fn = slim.assign_from_checkpoint_fn(model_path=os.path.join(pretrained_dir, 'resnet_v2_101.ckpt'), var_list=slim.get_model_variables('resnet_v2_101'), ignore_missing_vars=True) 20 | elif frontend == 
'ResNet152': 21 | with slim.arg_scope(resnet_v2.resnet_arg_scope()): 22 | logits, end_points = resnet_v2.resnet_v2_152(inputs, is_training=is_training, scope='resnet_v2_152') 23 | frontend_scope='resnet_v2_152' 24 | init_fn = slim.assign_from_checkpoint_fn(model_path=os.path.join(pretrained_dir, 'resnet_v2_152.ckpt'), var_list=slim.get_model_variables('resnet_v2_152'), ignore_missing_vars=True) 25 | elif frontend == 'MobileNetV2': 26 | with slim.arg_scope(mobilenet_v2.training_scope()): 27 | logits, end_points = mobilenet_v2.mobilenet(inputs, is_training=is_training, scope='mobilenet_v2', base_only=True) 28 | frontend_scope='mobilenet_v2' 29 | init_fn = slim.assign_from_checkpoint_fn(model_path=os.path.join(pretrained_dir, 'mobilenet_v2.ckpt'), var_list=slim.get_model_variables('mobilenet_v2'), ignore_missing_vars=True) 30 | elif frontend == 'InceptionV4': 31 | with slim.arg_scope(inception_v4.inception_v4_arg_scope()): 32 | logits, end_points = inception_v4.inception_v4(inputs, is_training=is_training, scope='inception_v4') 33 | frontend_scope='inception_v4' 34 | init_fn = slim.assign_from_checkpoint_fn(model_path=os.path.join(pretrained_dir, 'inception_v4.ckpt'), var_list=slim.get_model_variables('inception_v4'), ignore_missing_vars=True) 35 | else: 36 | raise ValueError("Unsupported frontend model '%s'. This function only supports ResNet50, ResNet101, ResNet152, MobileNetV2, and InceptionV4" % (frontend)) 37 | 38 | return logits, end_points, frontend_scope, init_fn -------------------------------------------------------------------------------- /builders/model_builder.py: -------------------------------------------------------------------------------- 1 | import sys, os 2 | import tensorflow as tf 3 | import subprocess 4 | 5 | sys.path.append("models") 6 | from models.FC_DenseNet_Tiramisu import build_fc_densenet 7 | from models.Encoder_Decoder import build_encoder_decoder 8 | from models.RefineNet import build_refinenet 9 | from models.FRRN import build_frrn 10 | from models.MobileUNet import build_mobile_unet 11 | from models.PSPNet import build_pspnet 12 | from models.GCN import build_gcn 13 | from models.DeepLabV3 import build_deeplabv3 14 | from models.DeepLabV3_plus import build_deeplabv3_plus 15 | from models.AdapNet import build_adaptnet 16 | from models.custom_model import build_custom 17 | from models.DenseASPP import build_dense_aspp 18 | from models.DDSC import build_ddsc 19 | from models.BiSeNet import build_bisenet 20 | 21 | SUPPORTED_MODELS = ["FC-DenseNet56", "FC-DenseNet67", "FC-DenseNet103", "Encoder-Decoder", "Encoder-Decoder-Skip", "RefineNet", 22 | "FRRN-A", "FRRN-B", "MobileUNet", "MobileUNet-Skip", "PSPNet", "GCN", "DeepLabV3", "DeepLabV3_plus", "AdapNet", 23 | "DenseASPP", "DDSC", "BiSeNet", "custom"] 24 | 25 | SUPPORTED_FRONTENDS = ["ResNet50", "ResNet101", "ResNet152", "MobileNetV2", "InceptionV4"] 26 | 27 | def download_checkpoints(model_name): 28 | subprocess.check_output(["python", "utils/get_pretrained_checkpoints.py", "--model=" + model_name]) 29 | 30 | 31 | 32 | def build_model(model_name, net_input, num_classes, crop_width, crop_height, frontend="ResNet101", is_training=True): 33 | # Get the selected model. 34 | # Some of them require pre-trained ResNet 35 | 36 | print("Preparing the model ...") 37 | 38 | if model_name not in SUPPORTED_MODELS: 39 | raise ValueError("The model you selected is not supported. 
The following models are currently supported: {0}".format(SUPPORTED_MODELS)) 40 | 41 | if frontend not in SUPPORTED_FRONTENDS: 42 | raise ValueError("The frontend you selected is not supported. The following models are currently supported: {0}".format(SUPPORTED_FRONTENDS)) 43 | 44 | if "ResNet50" == frontend and not os.path.isfile("models/resnet_v2_50.ckpt"): 45 | download_checkpoints("ResNet50") 46 | if "ResNet101" == frontend and not os.path.isfile("models/resnet_v2_101.ckpt"): 47 | download_checkpoints("ResNet101") 48 | if "ResNet152" == frontend and not os.path.isfile("models/resnet_v2_152.ckpt"): 49 | download_checkpoints("ResNet152") 50 | if "MobileNetV2" == frontend and not os.path.isfile("models/mobilenet_v2.ckpt.data-00000-of-00001"): 51 | download_checkpoints("MobileNetV2") 52 | if "InceptionV4" == frontend and not os.path.isfile("models/inception_v4.ckpt"): 53 | download_checkpoints("InceptionV4") 54 | 55 | network = None 56 | init_fn = None 57 | if model_name == "FC-DenseNet56" or model_name == "FC-DenseNet67" or model_name == "FC-DenseNet103": 58 | network = build_fc_densenet(net_input, preset_model = model_name, num_classes=num_classes) 59 | elif model_name == "RefineNet": 60 | # RefineNet requires pre-trained ResNet weights 61 | network, init_fn = build_refinenet(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 62 | elif model_name == "FRRN-A" or model_name == "FRRN-B": 63 | network = build_frrn(net_input, preset_model = model_name, num_classes=num_classes) 64 | elif model_name == "Encoder-Decoder" or model_name == "Encoder-Decoder-Skip": 65 | network = build_encoder_decoder(net_input, preset_model = model_name, num_classes=num_classes) 66 | elif model_name == "MobileUNet" or model_name == "MobileUNet-Skip": 67 | network = build_mobile_unet(net_input, preset_model = model_name, num_classes=num_classes) 68 | elif model_name == "PSPNet": 69 | # Image size is required for PSPNet 70 | # PSPNet requires pre-trained ResNet weights 71 | network, init_fn = build_pspnet(net_input, label_size=[crop_height, crop_width], preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 72 | elif model_name == "GCN": 73 | # GCN requires pre-trained ResNet weights 74 | network, init_fn = build_gcn(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 75 | elif model_name == "DeepLabV3": 76 | # DeepLabV requires pre-trained ResNet weights 77 | network, init_fn = build_deeplabv3(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 78 | elif model_name == "DeepLabV3_plus": 79 | # DeepLabV3+ requires pre-trained ResNet weights 80 | network, init_fn = build_deeplabv3_plus(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 81 | elif model_name == "DenseASPP": 82 | # DenseASPP requires pre-trained ResNet weights 83 | network, init_fn = build_dense_aspp(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 84 | elif model_name == "DDSC": 85 | # DDSC requires pre-trained ResNet weights 86 | network, init_fn = build_ddsc(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 87 | elif model_name == "BiSeNet": 88 | # BiSeNet requires pre-trained ResNet weights 89 | network, init_fn = build_bisenet(net_input, preset_model = model_name, 
frontend=frontend, num_classes=num_classes, is_training=is_training) 90 | elif model_name == "AdapNet": 91 | network = build_adaptnet(net_input, num_classes=num_classes) 92 | elif model_name == "custom": 93 | network = build_custom(net_input, num_classes) 94 | else: 95 | raise ValueError("Error: the model %d is not available. Try checking which models are available using the command python main.py --help") 96 | 97 | return network, init_fn -------------------------------------------------------------------------------- /frontends/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awangenh/Weed-Mapping/72526ebbc2abe3b9d35672689de25a321e36b039/frontends/__init__.py -------------------------------------------------------------------------------- /frontends/conv_blocks.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Convolution blocks for mobilenet.""" 16 | import contextlib 17 | import functools 18 | 19 | import tensorflow as tf 20 | 21 | slim = tf.contrib.slim 22 | 23 | 24 | def _fixed_padding(inputs, kernel_size, rate=1): 25 | """Pads the input along the spatial dimensions independently of input size. 26 | 27 | Pads the input such that if it was used in a convolution with 'VALID' padding, 28 | the output would have the same dimensions as if the unpadded input was used 29 | in a convolution with 'SAME' padding. 30 | 31 | Args: 32 | inputs: A tensor of size [batch, height_in, width_in, channels]. 33 | kernel_size: The kernel to be used in the conv2d or max_pool2d operation. 34 | rate: An integer, rate for atrous convolution. 35 | 36 | Returns: 37 | output: A tensor of size [batch, height_out, width_out, channels] with the 38 | input, either intact (if kernel_size == 1) or padded (if kernel_size > 1). 39 | """ 40 | kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1), 41 | kernel_size[0] + (kernel_size[0] - 1) * (rate - 1)] 42 | pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1] 43 | pad_beg = [pad_total[0] // 2, pad_total[1] // 2] 44 | pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]] 45 | padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg[0], pad_end[0]], 46 | [pad_beg[1], pad_end[1]], [0, 0]]) 47 | return padded_inputs 48 | 49 | 50 | def _make_divisible(v, divisor, min_value=None): 51 | if min_value is None: 52 | min_value = divisor 53 | new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) 54 | # Make sure that round down does not go down by more than 10%. 
55 | if new_v < 0.9 * v: 56 | new_v += divisor 57 | return new_v 58 | 59 | 60 | def _split_divisible(num, num_ways, divisible_by=8): 61 | """Evenly splits num, num_ways so each piece is a multiple of divisible_by.""" 62 | assert num % divisible_by == 0 63 | assert num / num_ways >= divisible_by 64 | # Note: want to round down, we adjust each split to match the total. 65 | base = num // num_ways // divisible_by * divisible_by 66 | result = [] 67 | accumulated = 0 68 | for i in range(num_ways): 69 | r = base 70 | while accumulated + r < num * (i + 1) / num_ways: 71 | r += divisible_by 72 | result.append(r) 73 | accumulated += r 74 | assert accumulated == num 75 | return result 76 | 77 | 78 | @contextlib.contextmanager 79 | def _v1_compatible_scope_naming(scope): 80 | if scope is None: # Create uniqified separable blocks. 81 | with tf.variable_scope(None, default_name='separable') as s, \ 82 | tf.name_scope(s.original_name_scope): 83 | yield '' 84 | else: 85 | # We use scope_depthwise, scope_pointwise for compatibility with V1 ckpts. 86 | # which provide numbered scopes. 87 | scope += '_' 88 | yield scope 89 | 90 | 91 | @slim.add_arg_scope 92 | def split_separable_conv2d(input_tensor, 93 | num_outputs, 94 | scope=None, 95 | normalizer_fn=None, 96 | stride=1, 97 | rate=1, 98 | endpoints=None, 99 | use_explicit_padding=False): 100 | """Separable mobilenet V1 style convolution. 101 | 102 | Depthwise convolution, with default non-linearity, 103 | followed by 1x1 depthwise convolution. This is similar to 104 | slim.separable_conv2d, but differs in tha it applies batch 105 | normalization and non-linearity to depthwise. This matches 106 | the basic building of Mobilenet Paper 107 | (https://arxiv.org/abs/1704.04861) 108 | 109 | Args: 110 | input_tensor: input 111 | num_outputs: number of outputs 112 | scope: optional name of the scope. Note if provided it will use 113 | scope_depthwise for deptwhise, and scope_pointwise for pointwise. 114 | normalizer_fn: which normalizer function to use for depthwise/pointwise 115 | stride: stride 116 | rate: output rate (also known as dilation rate) 117 | endpoints: optional, if provided, will export additional tensors to it. 118 | use_explicit_padding: Use 'VALID' padding for convolutions, but prepad 119 | inputs so that the output dimensions are the same as if 'SAME' padding 120 | were used. 
121 | 122 | Returns: 123 | output tesnor 124 | """ 125 | 126 | with _v1_compatible_scope_naming(scope) as scope: 127 | dw_scope = scope + 'depthwise' 128 | endpoints = endpoints if endpoints is not None else {} 129 | kernel_size = [3, 3] 130 | padding = 'SAME' 131 | if use_explicit_padding: 132 | padding = 'VALID' 133 | input_tensor = _fixed_padding(input_tensor, kernel_size, rate) 134 | net = slim.separable_conv2d( 135 | input_tensor, 136 | None, 137 | kernel_size, 138 | depth_multiplier=1, 139 | stride=stride, 140 | rate=rate, 141 | normalizer_fn=normalizer_fn, 142 | padding=padding, 143 | scope=dw_scope) 144 | 145 | endpoints[dw_scope] = net 146 | 147 | pw_scope = scope + 'pointwise' 148 | net = slim.conv2d( 149 | net, 150 | num_outputs, [1, 1], 151 | stride=1, 152 | normalizer_fn=normalizer_fn, 153 | scope=pw_scope) 154 | endpoints[pw_scope] = net 155 | return net 156 | 157 | 158 | def expand_input_by_factor(n, divisible_by=8): 159 | return lambda num_inputs, **_: _make_divisible(num_inputs * n, divisible_by) 160 | 161 | 162 | @slim.add_arg_scope 163 | def expanded_conv(input_tensor, 164 | num_outputs, 165 | expansion_size=expand_input_by_factor(6), 166 | stride=1, 167 | rate=1, 168 | kernel_size=(3, 3), 169 | residual=True, 170 | normalizer_fn=None, 171 | project_activation_fn=tf.identity, 172 | split_projection=1, 173 | split_expansion=1, 174 | expansion_transform=None, 175 | depthwise_location='expansion', 176 | depthwise_channel_multiplier=1, 177 | endpoints=None, 178 | use_explicit_padding=False, 179 | padding='SAME', 180 | scope=None): 181 | """Depthwise Convolution Block with expansion. 182 | 183 | Builds a composite convolution that has the following structure 184 | expansion (1x1) -> depthwise (kernel_size) -> projection (1x1) 185 | 186 | Args: 187 | input_tensor: input 188 | num_outputs: number of outputs in the final layer. 189 | expansion_size: the size of expansion, could be a constant or a callable. 190 | If latter it will be provided 'num_inputs' as an input. For forward 191 | compatibility it should accept arbitrary keyword arguments. 192 | Default will expand the input by factor of 6. 193 | stride: depthwise stride 194 | rate: depthwise rate 195 | kernel_size: depthwise kernel 196 | residual: whether to include residual connection between input 197 | and output. 198 | normalizer_fn: batchnorm or otherwise 199 | project_activation_fn: activation function for the project layer 200 | split_projection: how many ways to split projection operator 201 | (that is conv expansion->bottleneck) 202 | split_expansion: how many ways to split expansion op 203 | (that is conv bottleneck->expansion) ops will keep depth divisible 204 | by this value. 205 | expansion_transform: Optional function that takes expansion 206 | as a single input and returns output. 207 | depthwise_location: where to put depthwise covnvolutions supported 208 | values None, 'input', 'output', 'expansion' 209 | depthwise_channel_multiplier: depthwise channel multiplier: 210 | each input will replicated (with different filters) 211 | that many times. So if input had c channels, 212 | output will have c x depthwise_channel_multpilier. 213 | endpoints: An optional dictionary into which intermediate endpoints are 214 | placed. The keys "expansion_output", "depthwise_output", 215 | "projection_output" and "expansion_transform" are always populated, even 216 | if the corresponding functions are not invoked. 
217 | use_explicit_padding: Use 'VALID' padding for convolutions, but prepad 218 | inputs so that the output dimensions are the same as if 'SAME' padding 219 | were used. 220 | padding: Padding type to use if `use_explicit_padding` is not set. 221 | scope: optional scope. 222 | 223 | Returns: 224 | Tensor of depth num_outputs 225 | 226 | Raises: 227 | TypeError: on inval 228 | """ 229 | with tf.variable_scope(scope, default_name='expanded_conv') as s, \ 230 | tf.name_scope(s.original_name_scope): 231 | prev_depth = input_tensor.get_shape().as_list()[3] 232 | if depthwise_location not in [None, 'input', 'output', 'expansion']: 233 | raise TypeError('%r is unknown value for depthwise_location' % 234 | depthwise_location) 235 | if use_explicit_padding: 236 | if padding != 'SAME': 237 | raise TypeError('`use_explicit_padding` should only be used with ' 238 | '"SAME" padding.') 239 | padding = 'VALID' 240 | depthwise_func = functools.partial( 241 | slim.separable_conv2d, 242 | num_outputs=None, 243 | kernel_size=kernel_size, 244 | depth_multiplier=depthwise_channel_multiplier, 245 | stride=stride, 246 | rate=rate, 247 | normalizer_fn=normalizer_fn, 248 | padding=padding, 249 | scope='depthwise') 250 | # b1 -> b2 * r -> b2 251 | # i -> (o * r) (bottleneck) -> o 252 | input_tensor = tf.identity(input_tensor, 'input') 253 | net = input_tensor 254 | 255 | if depthwise_location == 'input': 256 | if use_explicit_padding: 257 | net = _fixed_padding(net, kernel_size, rate) 258 | net = depthwise_func(net, activation_fn=None) 259 | 260 | if callable(expansion_size): 261 | inner_size = expansion_size(num_inputs=prev_depth) 262 | else: 263 | inner_size = expansion_size 264 | 265 | if inner_size > net.shape[3]: 266 | net = split_conv( 267 | net, 268 | inner_size, 269 | num_ways=split_expansion, 270 | scope='expand', 271 | stride=1, 272 | normalizer_fn=normalizer_fn) 273 | net = tf.identity(net, 'expansion_output') 274 | if endpoints is not None: 275 | endpoints['expansion_output'] = net 276 | 277 | if depthwise_location == 'expansion': 278 | if use_explicit_padding: 279 | net = _fixed_padding(net, kernel_size, rate) 280 | net = depthwise_func(net) 281 | 282 | net = tf.identity(net, name='depthwise_output') 283 | if endpoints is not None: 284 | endpoints['depthwise_output'] = net 285 | if expansion_transform: 286 | net = expansion_transform(expansion_tensor=net, input_tensor=input_tensor) 287 | # Note in contrast with expansion, we always have 288 | # projection to produce the desired output size. 
289 | net = split_conv( 290 | net, 291 | num_outputs, 292 | num_ways=split_projection, 293 | stride=1, 294 | scope='project', 295 | normalizer_fn=normalizer_fn, 296 | activation_fn=project_activation_fn) 297 | if endpoints is not None: 298 | endpoints['projection_output'] = net 299 | if depthwise_location == 'output': 300 | if use_explicit_padding: 301 | net = _fixed_padding(net, kernel_size, rate) 302 | net = depthwise_func(net, activation_fn=None) 303 | 304 | if callable(residual): # custom residual 305 | net = residual(input_tensor=input_tensor, output_tensor=net) 306 | elif (residual and 307 | # stride check enforces that we don't add residuals when spatial 308 | # dimensions are None 309 | stride == 1 and 310 | # Depth matches 311 | net.get_shape().as_list()[3] == 312 | input_tensor.get_shape().as_list()[3]): 313 | net += input_tensor 314 | return tf.identity(net, name='output') 315 | 316 | 317 | def split_conv(input_tensor, 318 | num_outputs, 319 | num_ways, 320 | scope, 321 | divisible_by=8, 322 | **kwargs): 323 | """Creates a split convolution. 324 | 325 | Split convolution splits the input and output into 326 | 'num_blocks' blocks of approximately the same size each, 327 | and only connects $i$-th input to $i$ output. 328 | 329 | Args: 330 | input_tensor: input tensor 331 | num_outputs: number of output filters 332 | num_ways: num blocks to split by. 333 | scope: scope for all the operators. 334 | divisible_by: make sure that every part is divisiable by this. 335 | **kwargs: will be passed directly into conv2d operator 336 | Returns: 337 | tensor 338 | """ 339 | b = input_tensor.get_shape().as_list()[3] 340 | 341 | if num_ways == 1 or min(b // num_ways, 342 | num_outputs // num_ways) < divisible_by: 343 | # Don't do any splitting if we end up with less than 8 filters 344 | # on either side. 345 | return slim.conv2d(input_tensor, num_outputs, [1, 1], scope=scope, **kwargs) 346 | 347 | outs = [] 348 | input_splits = _split_divisible(b, num_ways, divisible_by=divisible_by) 349 | output_splits = _split_divisible( 350 | num_outputs, num_ways, divisible_by=divisible_by) 351 | inputs = tf.split(input_tensor, input_splits, axis=3, name='split_' + scope) 352 | base = scope 353 | for i, (input_tensor, out_size) in enumerate(zip(inputs, output_splits)): 354 | scope = base + '_part_%d' % (i,) 355 | n = slim.conv2d(input_tensor, out_size, [1, 1], scope=scope, **kwargs) 356 | n = tf.identity(n, scope + '_output') 357 | outs.append(n) 358 | return tf.concat(outs, 3, name=scope + '_concat') -------------------------------------------------------------------------------- /frontends/inception_utils.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Contains common code shared by all inception models. 
16 | 17 | Usage of arg scope: 18 | with slim.arg_scope(inception_arg_scope()): 19 | logits, end_points = inception.inception_v3(images, num_classes, 20 | is_training=is_training) 21 | 22 | """ 23 | from __future__ import absolute_import 24 | from __future__ import division 25 | from __future__ import print_function 26 | 27 | import tensorflow as tf 28 | 29 | slim = tf.contrib.slim 30 | 31 | 32 | def inception_arg_scope(weight_decay=0.00004, 33 | use_batch_norm=True, 34 | batch_norm_decay=0.9997, 35 | batch_norm_epsilon=0.001, 36 | activation_fn=tf.nn.relu, 37 | batch_norm_updates_collections=tf.GraphKeys.UPDATE_OPS): 38 | """Defines the default arg scope for inception models. 39 | 40 | Args: 41 | weight_decay: The weight decay to use for regularizing the model. 42 | use_batch_norm: "If `True`, batch_norm is applied after each convolution. 43 | batch_norm_decay: Decay for batch norm moving average. 44 | batch_norm_epsilon: Small float added to variance to avoid dividing by zero 45 | in batch norm. 46 | activation_fn: Activation function for conv2d. 47 | batch_norm_updates_collections: Collection for the update ops for 48 | batch norm. 49 | 50 | Returns: 51 | An `arg_scope` to use for the inception models. 52 | """ 53 | batch_norm_params = { 54 | # Decay for the moving averages. 55 | 'decay': batch_norm_decay, 56 | # epsilon to prevent 0s in variance. 57 | 'epsilon': batch_norm_epsilon, 58 | # collection containing update_ops. 59 | 'updates_collections': batch_norm_updates_collections, 60 | # use fused batch norm if possible. 61 | 'fused': None, 62 | } 63 | if use_batch_norm: 64 | normalizer_fn = slim.batch_norm 65 | normalizer_params = batch_norm_params 66 | else: 67 | normalizer_fn = None 68 | normalizer_params = {} 69 | # Set weight_decay for weights in Conv and FC layers. 70 | with slim.arg_scope([slim.conv2d, slim.fully_connected], 71 | weights_regularizer=slim.l2_regularizer(weight_decay)): 72 | with slim.arg_scope( 73 | [slim.conv2d], 74 | weights_initializer=slim.variance_scaling_initializer(), 75 | activation_fn=activation_fn, 76 | normalizer_fn=normalizer_fn, 77 | normalizer_params=normalizer_params) as sc: 78 | return sc -------------------------------------------------------------------------------- /frontends/mobilenet_v2.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Implementation of Mobilenet V2. 16 | 17 | Architecture: https://arxiv.org/abs/1801.04381 18 | 19 | The base model gives 72.2% accuracy on ImageNet, with 300MMadds, 20 | 3.4 M parameters. 
21 | """ 22 | 23 | from __future__ import absolute_import 24 | from __future__ import division 25 | from __future__ import print_function 26 | 27 | import copy 28 | import functools 29 | 30 | import tensorflow as tf 31 | 32 | from frontends import conv_blocks as ops 33 | from frontends import mobilenet_base as lib 34 | 35 | slim = tf.contrib.slim 36 | op = lib.op 37 | 38 | expand_input = ops.expand_input_by_factor 39 | 40 | # pyformat: disable 41 | # Architecture: https://arxiv.org/abs/1801.04381 42 | V2_DEF = dict( 43 | defaults={ 44 | # Note: these parameters of batch norm affect the architecture 45 | # that's why they are here and not in training_scope. 46 | (slim.batch_norm,): {'center': True, 'scale': True}, 47 | (slim.conv2d, slim.fully_connected, slim.separable_conv2d): { 48 | 'normalizer_fn': slim.batch_norm, 'activation_fn': tf.nn.relu6 49 | }, 50 | (ops.expanded_conv,): { 51 | 'expansion_size': expand_input(6), 52 | 'split_expansion': 1, 53 | 'normalizer_fn': slim.batch_norm, 54 | 'residual': True 55 | }, 56 | (slim.conv2d, slim.separable_conv2d): {'padding': 'SAME'} 57 | }, 58 | spec=[ 59 | op(slim.conv2d, stride=2, num_outputs=32, kernel_size=[3, 3]), 60 | op(ops.expanded_conv, 61 | expansion_size=expand_input(1, divisible_by=1), 62 | num_outputs=16), 63 | op(ops.expanded_conv, stride=2, num_outputs=24), 64 | op(ops.expanded_conv, stride=1, num_outputs=24), 65 | op(ops.expanded_conv, stride=2, num_outputs=32), 66 | op(ops.expanded_conv, stride=1, num_outputs=32), 67 | op(ops.expanded_conv, stride=1, num_outputs=32), 68 | op(ops.expanded_conv, stride=2, num_outputs=64), 69 | op(ops.expanded_conv, stride=1, num_outputs=64), 70 | op(ops.expanded_conv, stride=1, num_outputs=64), 71 | op(ops.expanded_conv, stride=1, num_outputs=64), 72 | op(ops.expanded_conv, stride=1, num_outputs=96), 73 | op(ops.expanded_conv, stride=1, num_outputs=96), 74 | op(ops.expanded_conv, stride=1, num_outputs=96), 75 | op(ops.expanded_conv, stride=2, num_outputs=160), 76 | op(ops.expanded_conv, stride=1, num_outputs=160), 77 | op(ops.expanded_conv, stride=1, num_outputs=160), 78 | op(ops.expanded_conv, stride=1, num_outputs=320), 79 | op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=1280) 80 | ], 81 | ) 82 | # pyformat: enable 83 | 84 | 85 | @slim.add_arg_scope 86 | def mobilenet(input_tensor, 87 | num_classes=1001, 88 | depth_multiplier=1.0, 89 | scope='MobilenetV2', 90 | conv_defs=None, 91 | finegrain_classification_mode=False, 92 | min_depth=None, 93 | divisible_by=None, 94 | **kwargs): 95 | """Creates mobilenet V2 network. 96 | 97 | Inference mode is created by default. To create training use training_scope 98 | below. 99 | 100 | with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope()): 101 | logits, endpoints = mobilenet_v2.mobilenet(input_tensor) 102 | 103 | Args: 104 | input_tensor: The input tensor 105 | num_classes: number of classes 106 | depth_multiplier: The multiplier applied to scale number of 107 | channels in each layer. Note: this is called depth multiplier in the 108 | paper but the name is kept for consistency with slim's model builder. 109 | scope: Scope of the operator 110 | conv_defs: Allows to override default conv def. 111 | finegrain_classification_mode: When set to True, the model 112 | will keep the last layer large even for small multipliers. Following 113 | https://arxiv.org/abs/1801.04381 114 | suggests that it improves performance for ImageNet-type of problems. 115 | *Note* ignored if final_endpoint makes the builder exit earlier. 
116 | min_depth: If provided, will ensure that all layers will have that 117 | many channels after application of depth multiplier. 118 | divisible_by: If provided will ensure that all layers # channels 119 | will be divisible by this number. 120 | **kwargs: passed directly to mobilenet.mobilenet: 121 | prediction_fn- what prediction function to use. 122 | reuse-: whether to reuse variables (if reuse set to true, scope 123 | must be given). 124 | Returns: 125 | logits/endpoints pair 126 | 127 | Raises: 128 | ValueError: On invalid arguments 129 | """ 130 | if conv_defs is None: 131 | conv_defs = V2_DEF 132 | if 'multiplier' in kwargs: 133 | raise ValueError('mobilenetv2 doesn\'t support generic ' 134 | 'multiplier parameter use "depth_multiplier" instead.') 135 | if finegrain_classification_mode: 136 | conv_defs = copy.deepcopy(conv_defs) 137 | if depth_multiplier < 1: 138 | conv_defs['spec'][-1].params['num_outputs'] /= depth_multiplier 139 | 140 | depth_args = {} 141 | # NB: do not set depth_args unless they are provided to avoid overriding 142 | # whatever default depth_multiplier might have thanks to arg_scope. 143 | if min_depth is not None: 144 | depth_args['min_depth'] = min_depth 145 | if divisible_by is not None: 146 | depth_args['divisible_by'] = divisible_by 147 | 148 | with slim.arg_scope((lib.depth_multiplier,), **depth_args): 149 | return lib.mobilenet( 150 | input_tensor, 151 | num_classes=num_classes, 152 | conv_defs=conv_defs, 153 | scope=scope, 154 | multiplier=depth_multiplier, 155 | **kwargs) 156 | 157 | 158 | def wrapped_partial(func, *args, **kwargs): 159 | partial_func = functools.partial(func, *args, **kwargs) 160 | functools.update_wrapper(partial_func, func) 161 | return partial_func 162 | 163 | 164 | # Wrappers for mobilenet v2 with depth-multipliers. Be noticed that 165 | # 'finegrain_classification_mode' is set to True, which means the embedding 166 | # layer will not be shrinked when given a depth-multiplier < 1.0. 167 | mobilenet_v2_140 = wrapped_partial(mobilenet, depth_multiplier=1.4) 168 | mobilenet_v2_050 = wrapped_partial(mobilenet, depth_multiplier=0.50, 169 | finegrain_classification_mode=True) 170 | mobilenet_v2_035 = wrapped_partial(mobilenet, depth_multiplier=0.35, 171 | finegrain_classification_mode=True) 172 | 173 | 174 | @slim.add_arg_scope 175 | def mobilenet_base(input_tensor, depth_multiplier=1.0, **kwargs): 176 | """Creates base of the mobilenet (no pooling and no logits) .""" 177 | return mobilenet(input_tensor, 178 | depth_multiplier=depth_multiplier, 179 | base_only=True, **kwargs) 180 | 181 | 182 | def training_scope(**kwargs): 183 | """Defines MobilenetV2 training scope. 184 | 185 | Usage: 186 | with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope()): 187 | logits, endpoints = mobilenet_v2.mobilenet(input_tensor) 188 | 189 | with slim. 190 | 191 | Args: 192 | **kwargs: Passed to mobilenet.training_scope. The following parameters 193 | are supported: 194 | weight_decay- The weight decay to use for regularizing the model. 195 | stddev- Standard deviation for initialization, if negative uses xavier. 196 | dropout_keep_prob- dropout keep probability 197 | bn_decay- decay for the batch norm moving averages. 198 | 199 | Returns: 200 | An `arg_scope` to use for the mobilenet v2 model. 
201 | """ 202 | return lib.training_scope(**kwargs) 203 | 204 | 205 | __all__ = ['training_scope', 'mobilenet_base', 'mobilenet', 'V2_DEF'] -------------------------------------------------------------------------------- /frontends/resnet_utils.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Contains building blocks for various versions of Residual Networks. 16 | 17 | Residual networks (ResNets) were proposed in: 18 | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun 19 | Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015 20 | 21 | More variants were introduced in: 22 | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun 23 | Identity Mappings in Deep Residual Networks. arXiv: 1603.05027, 2016 24 | 25 | We can obtain different ResNet variants by changing the network depth, width, 26 | and form of residual unit. This module implements the infrastructure for 27 | building them. Concrete ResNet units and full ResNet networks are implemented in 28 | the accompanying resnet_v1.py and resnet_v2.py modules. 29 | 30 | Compared to https://github.com/KaimingHe/deep-residual-networks, in the current 31 | implementation we subsample the output activations in the last residual unit of 32 | each block, instead of subsampling the input activations in the first residual 33 | unit of each block. The two implementations give identical results but our 34 | implementation is more memory efficient. 35 | """ 36 | from __future__ import absolute_import 37 | from __future__ import division 38 | from __future__ import print_function 39 | 40 | import collections 41 | import tensorflow as tf 42 | 43 | slim = tf.contrib.slim 44 | 45 | 46 | class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])): 47 | """A named tuple describing a ResNet block. 48 | 49 | Its parts are: 50 | scope: The scope of the `Block`. 51 | unit_fn: The ResNet unit function which takes as input a `Tensor` and 52 | returns another `Tensor` with the output of the ResNet unit. 53 | args: A list of length equal to the number of units in the `Block`. The list 54 | contains one (depth, depth_bottleneck, stride) tuple for each unit in the 55 | block to serve as argument to unit_fn. 56 | """ 57 | 58 | 59 | def subsample(inputs, factor, scope=None): 60 | """Subsamples the input along the spatial dimensions. 61 | 62 | Args: 63 | inputs: A `Tensor` of size [batch, height_in, width_in, channels]. 64 | factor: The subsampling factor. 65 | scope: Optional variable_scope. 66 | 67 | Returns: 68 | output: A `Tensor` of size [batch, height_out, width_out, channels] with the 69 | input, either intact (if factor == 1) or subsampled (if factor > 1). 
70 | """ 71 | if factor == 1: 72 | return inputs 73 | else: 74 | return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope) 75 | 76 | 77 | def conv2d_same(inputs, num_outputs, kernel_size, stride, rate=1, scope=None): 78 | """Strided 2-D convolution with 'SAME' padding. 79 | 80 | When stride > 1, then we do explicit zero-padding, followed by conv2d with 81 | 'VALID' padding. 82 | 83 | Note that 84 | 85 | net = conv2d_same(inputs, num_outputs, 3, stride=stride) 86 | 87 | is equivalent to 88 | 89 | net = slim.conv2d(inputs, num_outputs, 3, stride=1, padding='SAME') 90 | net = subsample(net, factor=stride) 91 | 92 | whereas 93 | 94 | net = slim.conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME') 95 | 96 | is different when the input's height or width is even, which is why we add the 97 | current function. For more details, see ResnetUtilsTest.testConv2DSameEven(). 98 | 99 | Args: 100 | inputs: A 4-D tensor of size [batch, height_in, width_in, channels]. 101 | num_outputs: An integer, the number of output filters. 102 | kernel_size: An int with the kernel_size of the filters. 103 | stride: An integer, the output stride. 104 | rate: An integer, rate for atrous convolution. 105 | scope: Scope. 106 | 107 | Returns: 108 | output: A 4-D tensor of size [batch, height_out, width_out, channels] with 109 | the convolution output. 110 | """ 111 | if stride == 1: 112 | return slim.conv2d(inputs, num_outputs, kernel_size, stride=1, rate=rate, 113 | padding='SAME', scope=scope) 114 | else: 115 | kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1) 116 | pad_total = kernel_size_effective - 1 117 | pad_beg = pad_total // 2 118 | pad_end = pad_total - pad_beg 119 | inputs = tf.pad(inputs, 120 | [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]]) 121 | return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride, 122 | rate=rate, padding='VALID', scope=scope) 123 | 124 | 125 | @slim.add_arg_scope 126 | def stack_blocks_dense(net, blocks, multi_grid, output_stride=None, 127 | outputs_collections=None): 128 | """Stacks ResNet `Blocks` and controls output feature density. 129 | 130 | First, this function creates scopes for the ResNet in the form of 131 | 'block_name/unit_1', 'block_name/unit_2', etc. 132 | 133 | Second, this function allows the user to explicitly control the ResNet 134 | output_stride, which is the ratio of the input to output spatial resolution. 135 | This is useful for dense prediction tasks such as semantic segmentation or 136 | object detection. 137 | 138 | Most ResNets consist of 4 ResNet blocks and subsample the activations by a 139 | factor of 2 when transitioning between consecutive ResNet blocks. This results 140 | to a nominal ResNet output_stride equal to 8. If we set the output_stride to 141 | half the nominal network stride (e.g., output_stride=4), then we compute 142 | responses twice. 143 | 144 | Control of the output feature density is implemented by atrous convolution. 145 | 146 | Args: 147 | net: A `Tensor` of size [batch, height, width, channels]. 148 | blocks: A list of length equal to the number of ResNet `Blocks`. Each 149 | element is a ResNet `Block` object describing the units in the `Block`. 150 | output_stride: If `None`, then the output will be computed at the nominal 151 | network stride. If output_stride is not `None`, it specifies the requested 152 | ratio of input to output spatial resolution, which needs to be equal to 153 | the product of unit strides from the start up to some level of the ResNet. 
154 | For example, if the ResNet employs units with strides 1, 2, 1, 3, 4, 1, 155 | then valid values for the output_stride are 1, 2, 6, 24 or None (which 156 | is equivalent to output_stride=24). 157 | outputs_collections: Collection to add the ResNet block outputs. 158 | 159 | Returns: 160 | net: Output tensor with stride equal to the specified output_stride. 161 | 162 | Raises: 163 | ValueError: If the target output_stride is not valid. 164 | """ 165 | # The current_stride variable keeps track of the effective stride of the 166 | # activations. This allows us to invoke atrous convolution whenever applying 167 | # the next residual unit would result in the activations having stride larger 168 | # than the target output_stride. 169 | current_stride = 1 170 | 171 | # The atrous convolution rate parameter. 172 | rate = 1 173 | 174 | for block in blocks: 175 | with tf.variable_scope(block.scope, 'block', [net]) as sc: 176 | for i, unit in enumerate(block.args): 177 | if output_stride is not None and current_stride > output_stride: 178 | raise ValueError('The target output_stride cannot be reached.') 179 | 180 | with tf.variable_scope('unit_%d' % (i + 1), values=[net]): 181 | # If we have reached the target output_stride, then we need to employ 182 | # atrous convolution with stride=1 and multiply the atrous rate by the 183 | # current unit's stride for use in subsequent layers. 184 | if output_stride is not None and current_stride == output_stride: 185 | # Only uses atrous convolutions with multi-graid rates in the last (block4) block 186 | if block.scope == "block4": 187 | net = block.unit_fn(net, rate=rate * multi_grid[i], **dict(unit, stride=1)) 188 | else: 189 | net = block.unit_fn(net, rate=rate, **dict(unit, stride=1)) 190 | rate *= unit.get('stride', 1) 191 | else: 192 | net = block.unit_fn(net, rate=1, **unit) 193 | current_stride *= unit.get('stride', 1) 194 | net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net) 195 | 196 | if output_stride is not None and current_stride != output_stride: 197 | raise ValueError('The target output_stride cannot be reached.') 198 | 199 | return net 200 | 201 | 202 | def resnet_arg_scope(weight_decay=0.0001, 203 | is_training=True, 204 | batch_norm_decay=0.997, 205 | batch_norm_epsilon=1e-5, 206 | batch_norm_scale=True, 207 | activation_fn=tf.nn.relu, 208 | use_batch_norm=True): 209 | """Defines the default ResNet arg scope. 210 | 211 | TODO(gpapan): The batch-normalization related default values above are 212 | appropriate for use in conjunction with the reference ResNet models 213 | released at https://github.com/KaimingHe/deep-residual-networks. When 214 | training ResNets from scratch, they might need to be tuned. 215 | 216 | Args: 217 | weight_decay: The weight decay to use for regularizing the model. 218 | batch_norm_decay: The moving average decay when estimating layer activation 219 | statistics in batch normalization. 220 | batch_norm_epsilon: Small constant to prevent division by zero when 221 | normalizing activations by their variance in batch normalization. 222 | batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the 223 | activations in the batch normalization layer. 224 | activation_fn: The activation function which is used in ResNet. 225 | use_batch_norm: Whether or not to use batch normalization. 226 | 227 | Returns: 228 | An `arg_scope` to use for the resnet models. 
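  Usage (an illustrative sketch; it assumes the resnet_v1 module from this
  repository has been imported alongside these utilities):

    with slim.arg_scope(resnet_arg_scope(weight_decay=1e-4, is_training=True)):
      net, end_points = resnet_v1.resnet_v1_50(inputs, output_stride=16)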
229 | """ 230 | batch_norm_params = { 231 | 'decay': batch_norm_decay, 232 | 'epsilon': batch_norm_epsilon, 233 | 'scale': batch_norm_scale, 234 | 'updates_collections': None, 235 | 'is_training': is_training, 236 | 'fused': True, # Use fused batch norm if possible. 237 | } 238 | 239 | with slim.arg_scope( 240 | [slim.conv2d], 241 | weights_regularizer=slim.l2_regularizer(weight_decay), 242 | weights_initializer=slim.variance_scaling_initializer(), 243 | activation_fn=activation_fn, 244 | normalizer_fn=slim.batch_norm if use_batch_norm else None, 245 | normalizer_params=batch_norm_params): 246 | with slim.arg_scope([slim.batch_norm], **batch_norm_params): 247 | # The following implies padding='SAME' for pool1, which makes feature 248 | # alignment easier for dense prediction tasks. This is also used in 249 | # https://github.com/facebook/fb.resnet.torch. However the accompanying 250 | # code of 'Deep Residual Learning for Image Recognition' uses 251 | # padding='VALID' for pool1. You can switch to that choice by setting 252 | # slim.arg_scope([slim.max_pool2d], padding='VALID'). 253 | with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc: 254 | return arg_sc -------------------------------------------------------------------------------- /frontends/resnet_v1.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | from frontends import resnet_utils 4 | 5 | resnet_arg_scope = resnet_utils.resnet_arg_scope 6 | 7 | @slim.add_arg_scope 8 | def bottleneck(inputs, depth, depth_bottleneck, stride, rate=1, 9 | outputs_collections=None, scope=None): 10 | """Bottleneck residual unit variant with BN after convolutions. 11 | This is the original residual unit proposed in [1]. See Fig. 1(a) of [2] for 12 | its definition. Note that we use here the bottleneck variant which has an 13 | extra bottleneck layer. 14 | When putting together two consecutive ResNet blocks that use this unit, one 15 | should use stride = 2 in the last unit of the first block. 16 | Args: 17 | inputs: A tensor of size [batch, height, width, channels]. 18 | depth: The depth of the ResNet unit output. 19 | depth_bottleneck: The depth of the bottleneck layers. 20 | stride: The ResNet unit's stride. Determines the amount of downsampling of 21 | the units output compared to its input. 22 | rate: An integer, rate for atrous convolution. 23 | outputs_collections: Collection to add the ResNet unit output. 24 | scope: Optional variable_scope. 25 | Returns: 26 | The ResNet unit's output. 
27 | """ 28 | with tf.variable_scope(scope, 'bottleneck_v1', [inputs]) as sc: 29 | depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4) 30 | if depth == depth_in: 31 | shortcut = resnet_utils.subsample(inputs, stride, 'shortcut') 32 | else: 33 | shortcut = slim.conv2d(inputs, depth, [1, 1], stride=stride, 34 | activation_fn=None, scope='shortcut') 35 | residual = slim.conv2d(inputs, depth_bottleneck, [1, 1], stride=1, 36 | scope='conv1') 37 | residual = resnet_utils.conv2d_same(residual, depth_bottleneck, 3, stride, 38 | rate=rate, scope='conv2') 39 | residual = slim.conv2d(residual, depth, [1, 1], stride=1, 40 | activation_fn=None, scope='conv3') 41 | 42 | output = tf.nn.relu(shortcut + residual) 43 | 44 | return slim.utils.collect_named_outputs(outputs_collections, 45 | sc.original_name_scope, 46 | output) 47 | 48 | 49 | def resnet_v1(inputs, 50 | blocks, 51 | num_classes=None, 52 | is_training=True, 53 | global_pool=True, 54 | output_stride=None, 55 | include_root_block=True, 56 | spatial_squeeze=True, 57 | reuse=None, 58 | scope=None): 59 | """Generator for v1 ResNet models. 60 | 61 | This function generates a family of ResNet v1 models. See the resnet_v1_*() 62 | methods for specific model instantiations, obtained by selecting different 63 | block instantiations that produce ResNets of various depths. 64 | 65 | Training for image classification on Imagenet is usually done with [224, 224] 66 | inputs, resulting in [7, 7] feature maps at the output of the last ResNet 67 | block for the ResNets defined in [1] that have nominal stride equal to 32. 68 | However, for dense prediction tasks we advise that one uses inputs with 69 | spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In 70 | this case the feature maps at the ResNet output will have spatial shape 71 | [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1] 72 | and corners exactly aligned with the input image corners, which greatly 73 | facilitates alignment of the features to the image. Using as input [225, 225] 74 | images results in [8, 8] feature maps at the output of the last ResNet block. 75 | 76 | For dense prediction tasks, the ResNet needs to run in fully-convolutional 77 | (FCN) mode and global_pool needs to be set to False. The ResNets in [1, 2] all 78 | have nominal stride equal to 32 and a good choice in FCN mode is to use 79 | output_stride=16 in order to increase the density of the computed features at 80 | small computational and memory overhead, cf. http://arxiv.org/abs/1606.00915. 81 | 82 | Args: 83 | inputs: A tensor of size [batch, height_in, width_in, channels]. 84 | blocks: A list of length equal to the number of ResNet blocks. Each element 85 | is a resnet_utils.Block object describing the units in the block. 86 | num_classes: Number of predicted classes for classification tasks. If None 87 | we return the features before the logit layer. 88 | is_training: whether is training or not. 89 | global_pool: If True, we perform global average pooling before computing the 90 | logits. Set to True for image classification, False for dense prediction. 91 | output_stride: If None, then the output will be computed at the nominal 92 | network stride. If output_stride is not None, it specifies the requested 93 | ratio of input to output spatial resolution. 94 | include_root_block: If True, include the initial convolution followed by 95 | max-pooling, if False excludes it. 
96 | spatial_squeeze: if True, logits is of shape [B, C], if false logits is 97 | of shape [B, 1, 1, C], where B is batch_size and C is number of classes. 98 | reuse: whether or not the network and its variables should be reused. To be 99 | able to reuse 'scope' must be given. 100 | scope: Optional variable_scope. 101 | 102 | Returns: 103 | net: A rank-4 tensor of size [batch, height_out, width_out, channels_out]. 104 | If global_pool is False, then height_out and width_out are reduced by a 105 | factor of output_stride compared to the respective height_in and width_in, 106 | else both height_out and width_out equal one. If num_classes is None, then 107 | net is the output of the last ResNet block, potentially after global 108 | average pooling. If num_classes is not None, net contains the pre-softmax 109 | activations. 110 | end_points: A dictionary from components of the network to the corresponding 111 | activation. 112 | 113 | Raises: 114 | ValueError: If the target output_stride is not valid. 115 | """ 116 | with tf.variable_scope(scope, 'resnet_v1', [inputs], reuse=reuse) as sc: 117 | end_points_collection = sc.name + '_end_points' 118 | with slim.arg_scope([slim.conv2d, bottleneck, 119 | resnet_utils.stack_blocks_dense], 120 | outputs_collections=end_points_collection): 121 | with slim.arg_scope([slim.batch_norm], is_training=is_training): 122 | net = inputs 123 | if include_root_block: 124 | if output_stride is not None: 125 | if output_stride % 4 != 0: 126 | raise ValueError('The output_stride needs to be a multiple of 4.') 127 | output_stride /= 4 128 | net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1') 129 | net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1') 130 | 131 | net = slim.utils.collect_named_outputs(end_points_collection, 'pool2', net) 132 | 133 | net = resnet_utils.stack_blocks_dense(net, blocks, output_stride) 134 | end_points = slim.utils.convert_collection_to_dict(end_points_collection) 135 | 136 | end_points['pool3'] = end_points[scope + '/block1'] 137 | end_points['pool4'] = end_points[scope + '/block2'] 138 | end_points['pool5'] = net 139 | return net, end_points 140 | 141 | 142 | resnet_v1.default_image_size = 224 143 | 144 | def resnet_v1_50(inputs, 145 | num_classes=None, 146 | is_training=True, 147 | global_pool=True, 148 | output_stride=None, 149 | spatial_squeeze=True, 150 | reuse=None, 151 | scope='resnet_v1_50'): 152 | """ResNet-50 model of [1]. See resnet_v1() for arg and return description.""" 153 | blocks = [ 154 | resnet_utils.Block( 155 | 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), 156 | resnet_utils.Block( 157 | 'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]), 158 | resnet_utils.Block( 159 | 'block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]), 160 | resnet_utils.Block( 161 | 'block4', bottleneck, [(2048, 512, 1)] * 3) 162 | ] 163 | return resnet_v1(inputs, blocks, num_classes, is_training, 164 | global_pool=global_pool, output_stride=output_stride, 165 | include_root_block=True, spatial_squeeze=spatial_squeeze, 166 | reuse=reuse, scope=scope) 167 | 168 | 169 | resnet_v1_50.default_image_size = resnet_v1.default_image_size 170 | 171 | 172 | def resnet_v1_101(inputs, 173 | num_classes=None, 174 | is_training=True, 175 | global_pool=True, 176 | output_stride=None, 177 | spatial_squeeze=True, 178 | reuse=None, 179 | scope='resnet_v1_101'): 180 | """ResNet-101 model of [1]. 
See resnet_v1() for arg and return description.""" 181 | blocks = [ 182 | resnet_utils.Block( 183 | 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), 184 | resnet_utils.Block( 185 | 'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]), 186 | resnet_utils.Block( 187 | 'block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]), 188 | resnet_utils.Block( 189 | 'block4', bottleneck, [(2048, 512, 1)] * 3) 190 | ] 191 | return resnet_v1(inputs, blocks, num_classes, is_training, 192 | global_pool=global_pool, output_stride=output_stride, 193 | include_root_block=True, spatial_squeeze=spatial_squeeze, 194 | reuse=reuse, scope=scope) 195 | 196 | 197 | resnet_v1_101.default_image_size = resnet_v1.default_image_size 198 | 199 | 200 | def resnet_v1_152(inputs, 201 | num_classes=None, 202 | is_training=True, 203 | global_pool=True, 204 | output_stride=None, 205 | spatial_squeeze=True, 206 | reuse=None, 207 | scope='resnet_v1_152'): 208 | """ResNet-152 model of [1]. See resnet_v1() for arg and return description.""" 209 | blocks = [ 210 | resnet_utils.Block( 211 | 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), 212 | resnet_utils.Block( 213 | 'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]), 214 | resnet_utils.Block( 215 | 'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]), 216 | resnet_utils.Block( 217 | 'block4', bottleneck, [(2048, 512, 1)] * 3)] 218 | return resnet_v1(inputs, blocks, num_classes, is_training, 219 | global_pool=global_pool, output_stride=output_stride, 220 | include_root_block=True, spatial_squeeze=spatial_squeeze, 221 | reuse=reuse, scope=scope) 222 | 223 | 224 | resnet_v1_152.default_image_size = resnet_v1.default_image_size 225 | 226 | 227 | def resnet_v1_200(inputs, 228 | num_classes=None, 229 | is_training=True, 230 | global_pool=True, 231 | output_stride=None, 232 | spatial_squeeze=True, 233 | reuse=None, 234 | scope='resnet_v1_200'): 235 | """ResNet-200 model of [2]. 
See resnet_v1() for arg and return description.""" 236 | blocks = [ 237 | resnet_utils.Block( 238 | 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), 239 | resnet_utils.Block( 240 | 'block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]), 241 | resnet_utils.Block( 242 | 'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]), 243 | resnet_utils.Block( 244 | 'block4', bottleneck, [(2048, 512, 1)] * 3)] 245 | return resnet_v1(inputs, blocks, num_classes, is_training, 246 | global_pool=global_pool, output_stride=output_stride, 247 | include_root_block=True, spatial_squeeze=spatial_squeeze, 248 | reuse=reuse, scope=scope) 249 | 250 | 251 | resnet_v1_200.default_image_size = resnet_v1.default_image_size 252 | 253 | 254 | if __name__ == '__main__': 255 | input = tf.placeholder(tf.float32, shape=(None, 224, 224, 3), name='input') 256 | with slim.arg_scope(resnet_arg_scope()) as sc: 257 | logits = resnet_v1_50(input) -------------------------------------------------------------------------------- /frontends/se_resnext.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | import math 4 | 5 | USE_FUSED_BN = True 6 | BN_EPSILON = 9.999999747378752e-06 7 | BN_MOMENTUM = 0.99 8 | 9 | VAR_LIST = [] 10 | 11 | # input image order: BGR, range [0-255] 12 | # mean_value: 104, 117, 123 13 | # only subtract mean is used 14 | def constant_xavier_initializer(shape, group, dtype=tf.float32, uniform=True): 15 | """Initializer function.""" 16 | if not dtype.is_floating: 17 | raise TypeError('Cannot create initializer for non-floating point type.') 18 | # Estimating fan_in and fan_out is not possible to do perfectly, but we try. 19 | # This is the right thing for matrix multiply and convolutions. 20 | if shape: 21 | fan_in = float(shape[-2]) if len(shape) > 1 else float(shape[-1]) 22 | fan_out = float(shape[-1])/group 23 | else: 24 | fan_in = 1.0 25 | fan_out = 1.0 26 | for dim in shape[:-2]: 27 | fan_in *= float(dim) 28 | fan_out *= float(dim) 29 | 30 | # Average number of inputs and output connections. 31 | n = (fan_in + fan_out) / 2.0 32 | if uniform: 33 | # To get stddev = math.sqrt(factor / n) need to adjust for uniform. 34 | limit = math.sqrt(3.0 * 1.0 / n) 35 | return tf.random_uniform(shape, -limit, limit, dtype, seed=None) 36 | else: 37 | # To get stddev = math.sqrt(factor / n) need to adjust for truncated. 38 | trunc_stddev = math.sqrt(1.3 * 1.0 / n) 39 | return tf.truncated_normal(shape, 0.0, trunc_stddev, dtype, seed=None) 40 | 41 | # for root block, use dummy input_filters, e.g. 
128 rather than 64 for the first block 42 | def se_bottleneck_block(inputs, input_filters, name_prefix, is_training, group, data_format='channels_last', need_reduce=True, is_root=False, reduced_scale=16): 43 | bn_axis = -1 if data_format == 'channels_last' else 1 44 | strides_to_use = 1 45 | residuals = inputs 46 | if need_reduce: 47 | strides_to_use = 1 if is_root else 2 48 | proj_mapping = tf.layers.conv2d(inputs, input_filters, (1, 1), use_bias=False, 49 | name=name_prefix + '_1x1_proj', strides=(strides_to_use, strides_to_use), 50 | padding='valid', data_format=data_format, activation=None, 51 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 52 | bias_initializer=tf.zeros_initializer()) 53 | residuals = tf.layers.batch_normalization(proj_mapping, momentum=BN_MOMENTUM, 54 | name=name_prefix + '_1x1_proj/bn', axis=bn_axis, 55 | epsilon=BN_EPSILON, training=is_training, reuse=None, fused=USE_FUSED_BN) 56 | 57 | reduced_inputs = tf.layers.conv2d(inputs, input_filters // 2, (1, 1), use_bias=False, 58 | name=name_prefix + '_1x1_reduce', strides=(1, 1), 59 | padding='valid', data_format=data_format, activation=None, 60 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 61 | bias_initializer=tf.zeros_initializer()) 62 | reduced_inputs_bn = tf.layers.batch_normalization(reduced_inputs, momentum=BN_MOMENTUM, 63 | name=name_prefix + '_1x1_reduce/bn', axis=bn_axis, 64 | epsilon=BN_EPSILON, training=is_training, reuse=None, fused=USE_FUSED_BN) 65 | reduced_inputs_relu = tf.nn.relu(reduced_inputs_bn, name=name_prefix + '_1x1_reduce/relu') 66 | 67 | if data_format == 'channels_first': 68 | reduced_inputs_relu = tf.pad(reduced_inputs_relu, paddings = [[0, 0], [0, 0], [1, 1], [1, 1]]) 69 | weight_shape = [3, 3, reduced_inputs_relu.get_shape().as_list()[1]//group, input_filters // 2] 70 | weight_ = tf.Variable(constant_xavier_initializer(weight_shape, group=group, dtype=tf.float32), trainable=is_training, name=name_prefix + '_3x3/kernel') 71 | weight_groups = tf.split(weight_, num_or_size_splits=group, axis=-1, name=name_prefix + '_weight_split') 72 | xs = tf.split(reduced_inputs_relu, num_or_size_splits=group, axis=1, name=name_prefix + '_inputs_split') 73 | else: 74 | reduced_inputs_relu = tf.pad(reduced_inputs_relu, paddings = [[0, 0], [1, 1], [1, 1], [0, 0]]) 75 | weight_shape = [3, 3, reduced_inputs_relu.get_shape().as_list()[-1]//group, input_filters // 2] 76 | weight_ = tf.Variable(constant_xavier_initializer(weight_shape, group=group, dtype=tf.float32), trainable=is_training, name=name_prefix + '_3x3/kernel') 77 | weight_groups = tf.split(weight_, num_or_size_splits=group, axis=-1, name=name_prefix + '_weight_split') 78 | xs = tf.split(reduced_inputs_relu, num_or_size_splits=group, axis=-1, name=name_prefix + '_inputs_split') 79 | 80 | convolved = [tf.nn.convolution(x, weight, padding='VALID', strides=[strides_to_use, strides_to_use], name=name_prefix + '_group_conv', 81 | data_format=('NCHW' if data_format == 'channels_first' else 'NHWC')) for (x, weight) in zip(xs, weight_groups)] 82 | 83 | if data_format == 'channels_first': 84 | conv3_inputs = tf.concat(convolved, axis=1, name=name_prefix + '_concat') 85 | else: 86 | conv3_inputs = tf.concat(convolved, axis=-1, name=name_prefix + '_concat') 87 | 88 | conv3_inputs_bn = tf.layers.batch_normalization(conv3_inputs, momentum=BN_MOMENTUM, name=name_prefix + '_3x3/bn', 89 | axis=bn_axis, epsilon=BN_EPSILON, training=is_training, reuse=None, fused=USE_FUSED_BN) 90 | conv3_inputs_relu = tf.nn.relu(conv3_inputs_bn, 
name=name_prefix + '_3x3/relu') 91 | 92 | 93 | increase_inputs = tf.layers.conv2d(conv3_inputs_relu, input_filters, (1, 1), use_bias=False, 94 | name=name_prefix + '_1x1_increase', strides=(1, 1), 95 | padding='valid', data_format=data_format, activation=None, 96 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 97 | bias_initializer=tf.zeros_initializer()) 98 | increase_inputs_bn = tf.layers.batch_normalization(increase_inputs, momentum=BN_MOMENTUM, 99 | name=name_prefix + '_1x1_increase/bn', axis=bn_axis, 100 | epsilon=BN_EPSILON, training=is_training, reuse=None, fused=USE_FUSED_BN) 101 | 102 | if data_format == 'channels_first': 103 | pooled_inputs = tf.reduce_mean(increase_inputs_bn, [2, 3], name=name_prefix + '_global_pool', keep_dims=True) 104 | else: 105 | pooled_inputs = tf.reduce_mean(increase_inputs_bn, [1, 2], name=name_prefix + '_global_pool', keep_dims=True) 106 | 107 | down_inputs = tf.layers.conv2d(pooled_inputs, input_filters // reduced_scale, (1, 1), use_bias=True, 108 | name=name_prefix + '_1x1_down', strides=(1, 1), 109 | padding='valid', data_format=data_format, activation=None, 110 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 111 | bias_initializer=tf.zeros_initializer()) 112 | down_inputs_relu = tf.nn.relu(down_inputs, name=name_prefix + '_1x1_down/relu') 113 | 114 | up_inputs = tf.layers.conv2d(down_inputs_relu, input_filters, (1, 1), use_bias=True, 115 | name=name_prefix + '_1x1_up', strides=(1, 1), 116 | padding='valid', data_format=data_format, activation=None, 117 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 118 | bias_initializer=tf.zeros_initializer()) 119 | prob_outputs = tf.nn.sigmoid(up_inputs, name=name_prefix + '_prob') 120 | 121 | rescaled_feat = tf.multiply(prob_outputs, increase_inputs_bn, name=name_prefix + '_mul') 122 | pre_act = tf.add(residuals, rescaled_feat, name=name_prefix + '_add') 123 | return tf.nn.relu(pre_act, name=name_prefix + '/relu') 124 | #return tf.nn.relu(residuals + prob_outputs * increase_inputs_bn, name=name_prefix + '/relu') 125 | 126 | def se_resnext(input_image, scope, is_training = False, group=16, data_format='channels_last', net_depth=50): 127 | end_points = dict() 128 | 129 | bn_axis = -1 if data_format == 'channels_last' else 1 130 | # the input image should in BGR order, note that this is not the common case in Tensorflow 131 | # convert from RGB to BGR 132 | if data_format == 'channels_last': 133 | image_channels = tf.unstack(input_image, axis=-1) 134 | swaped_input_image = tf.stack([image_channels[2], image_channels[1], image_channels[0]], axis=-1) 135 | else: 136 | image_channels = tf.unstack(input_image, axis=1) 137 | swaped_input_image = tf.stack([image_channels[2], image_channels[1], image_channels[0]], axis=1) 138 | #swaped_input_image = input_image 139 | 140 | if net_depth not in [50, 101]: 141 | raise TypeError('Only ResNeXt50 or ResNeXt101 are currently supported.') 142 | input_depth = [256, 512, 1024, 2048] # the input depth of the the first block is dummy input 143 | num_units = [3, 4, 6, 3] if net_depth==50 else [3, 4, 23, 3] 144 | 145 | block_name_prefix = ['conv2_{}', 'conv3_{}', 'conv4_{}', 'conv5_{}'] 146 | 147 | if data_format == 'channels_first': 148 | swaped_input_image = tf.pad(swaped_input_image, paddings = [[0, 0], [0, 0], [3, 3], [3, 3]]) 149 | else: 150 | swaped_input_image = tf.pad(swaped_input_image, paddings = [[0, 0], [3, 3], [3, 3], [0, 0]]) 151 | 152 | inputs_features = tf.layers.conv2d(swaped_input_image, input_depth[0]//4, (7, 7), use_bias=False, 
153 | name='conv1/7x7_s2', strides=(2, 2), 154 | padding='valid', data_format=data_format, activation=None, 155 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 156 | bias_initializer=tf.zeros_initializer()) 157 | VAR_LIST.append('conv1/7x7_s2') 158 | 159 | inputs_features = tf.layers.batch_normalization(inputs_features, momentum=BN_MOMENTUM, 160 | name='conv1/7x7_s2/bn', axis=bn_axis, 161 | epsilon=BN_EPSILON, training=is_training, reuse=None, fused=USE_FUSED_BN) 162 | inputs_features = tf.nn.relu(inputs_features, name='conv1/relu_7x7_s2') 163 | 164 | inputs_features = tf.layers.max_pooling2d(inputs_features, [3, 3], [2, 2], padding='same', data_format=data_format, name='pool1/3x3_s2') 165 | 166 | is_root = True 167 | for ind, num_unit in enumerate(num_units): 168 | need_reduce = True 169 | for unit_index in range(1, num_unit+1): 170 | inputs_features = se_bottleneck_block(inputs_features, input_depth[ind], block_name_prefix[ind].format(unit_index), is_training=is_training, group=group, data_format=data_format, need_reduce=need_reduce, is_root=is_root) 171 | need_reduce = False 172 | end_points['pool' + str(ind)] = inputs_features 173 | is_root = False 174 | 175 | if data_format == 'channels_first': 176 | pooled_inputs = tf.reduce_mean(inputs_features, [2, 3], name='pool5/7x7_s1', keep_dims=True) 177 | else: 178 | pooled_inputs = tf.reduce_mean(inputs_features, [1, 2], name='pool5/7x7_s1', keep_dims=True) 179 | 180 | pooled_inputs = tf.layers.flatten(pooled_inputs) 181 | 182 | # logits_output = tf.layers.dense(pooled_inputs, num_classes, 183 | # kernel_initializer=tf.contrib.layers.xavier_initializer(), 184 | # bias_initializer=tf.zeros_initializer(), use_bias=True) 185 | 186 | logits_output = None 187 | 188 | return logits_output, end_points, VAR_LIST 189 | -------------------------------------------------------------------------------- /iou_vs_epochs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awangenh/Weed-Mapping/72526ebbc2abe3b9d35672689de25a321e36b039/iou_vs_epochs.png -------------------------------------------------------------------------------- /models/AdapNet.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | 3 | import tensorflow as tf 4 | from tensorflow.contrib import slim 5 | import numpy as np 6 | from frontends import resnet_v2 7 | import os, sys 8 | 9 | 10 | def Upsampling(inputs,scale): 11 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 12 | 13 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3], stride=1): 14 | """ 15 | Basic conv block for Encoder-Decoder 16 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 17 | """ 18 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 19 | net = slim.conv2d(net, n_filters, kernel_size, stride=stride, activation_fn=None, normalizer_fn=None) 20 | return net 21 | 22 | def ResNetBlock_1(inputs, filters_1, filters_2): 23 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 24 | net = slim.conv2d(net, filters_1, [1, 1], activation_fn=None, normalizer_fn=None) 25 | 26 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 27 | net = slim.conv2d(net, filters_1, [3, 3], activation_fn=None, normalizer_fn=None) 28 | 29 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 30 | net = slim.conv2d(net, filters_2, [1, 1], activation_fn=None, normalizer_fn=None) 31 | 32 | net = tf.add(inputs, net) 33 | 34 | return net 35 | 36 
| def ResNetBlock_2(inputs, filters_1, filters_2, s=1): 37 | net_1 = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 38 | net_1 = slim.conv2d(net_1, filters_1, [1, 1], stride=s, activation_fn=None, normalizer_fn=None) 39 | 40 | net_1 = tf.nn.relu(slim.batch_norm(net_1, fused=True)) 41 | net_1 = slim.conv2d(net_1, filters_1, [3, 3], activation_fn=None, normalizer_fn=None) 42 | 43 | net_1 = tf.nn.relu(slim.batch_norm(net_1, fused=True)) 44 | net_1 = slim.conv2d(net_1, filters_2, [1, 1], activation_fn=None, normalizer_fn=None) 45 | 46 | net_2 = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 47 | net_2 = slim.conv2d(net_2, filters_2, [1, 1], stride=s, activation_fn=None, normalizer_fn=None) 48 | 49 | net = tf.add(net_1, net_2) 50 | 51 | return net 52 | 53 | 54 | def MultiscaleBlock_1(inputs, filters_1, filters_2, filters_3, p, d): 55 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 56 | net = slim.conv2d(net, filters_1, [1, 1], activation_fn=None, normalizer_fn=None) 57 | 58 | scale_1 = tf.nn.relu(slim.batch_norm(net, fused=True)) 59 | scale_1 = slim.conv2d(scale_1, filters_3 // 2, [3, 3], rate=p, activation_fn=None, normalizer_fn=None) 60 | scale_2 = tf.nn.relu(slim.batch_norm(net, fused=True)) 61 | scale_2 = slim.conv2d(scale_2, filters_3 // 2, [3, 3], rate=d, activation_fn=None, normalizer_fn=None) 62 | net = tf.concat((scale_1, scale_2), axis=-1) 63 | 64 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 65 | net = slim.conv2d(net, filters_2, [1, 1], activation_fn=None, normalizer_fn=None) 66 | 67 | net = tf.add(inputs, net) 68 | 69 | return net 70 | 71 | 72 | def MultiscaleBlock_2(inputs, filters_1, filters_2, filters_3, p, d): 73 | net_1 = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 74 | net_1 = slim.conv2d(net_1, filters_1, [1, 1], activation_fn=None, normalizer_fn=None) 75 | 76 | scale_1 = tf.nn.relu(slim.batch_norm(net_1, fused=True)) 77 | scale_1 = slim.conv2d(scale_1, filters_3 // 2, [3, 3], rate=p, activation_fn=None, normalizer_fn=None) 78 | scale_2 = tf.nn.relu(slim.batch_norm(net_1, fused=True)) 79 | scale_2 = slim.conv2d(scale_2, filters_3 // 2, [3, 3], rate=d, activation_fn=None, normalizer_fn=None) 80 | net_1 = tf.concat((scale_1, scale_2), axis=-1) 81 | 82 | net_1 = tf.nn.relu(slim.batch_norm(net_1, fused=True)) 83 | net_1 = slim.conv2d(net_1, filters_2, [1, 1], activation_fn=None, normalizer_fn=None) 84 | 85 | net_2 = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 86 | net_2 = slim.conv2d(net_2, filters_2, [1, 1], activation_fn=None, normalizer_fn=None) 87 | 88 | net = tf.add(net_1, net_2) 89 | 90 | return net 91 | 92 | 93 | 94 | 95 | 96 | 97 | def build_adaptnet(inputs, num_classes): 98 | """ 99 | Builds the AdaptNet model. 100 | 101 | Arguments: 102 | inputs: The input tensor= 103 | preset_model: Which model you want to use. 
Select which ResNet model to use for feature extraction 104 | num_classes: Number of classes 105 | 106 | Returns: 107 | AdaptNet model 108 | """ 109 | net = ConvBlock(inputs, n_filters=64, kernel_size=[3, 3]) 110 | net = ConvBlock(net, n_filters=64, kernel_size=[7, 7], stride=2) 111 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 112 | 113 | net = ResNetBlock_2(net, filters_1=64, filters_2=256, s=1) 114 | net = ResNetBlock_1(net, filters_1=64, filters_2=256) 115 | net = ResNetBlock_1(net, filters_1=64, filters_2=256) 116 | 117 | net = ResNetBlock_2(net, filters_1=128, filters_2=512, s=2) 118 | net = ResNetBlock_1(net, filters_1=128, filters_2=512) 119 | net = ResNetBlock_1(net, filters_1=128, filters_2=512) 120 | 121 | skip_connection = ConvBlock(net, n_filters=12, kernel_size=[1, 1]) 122 | 123 | 124 | net = MultiscaleBlock_1(net, filters_1=128, filters_2=512, filters_3=64, p=1, d=2) 125 | 126 | net = ResNetBlock_2(net, filters_1=256, filters_2=1024, s=2) 127 | net = ResNetBlock_1(net, filters_1=256, filters_2=1024) 128 | net = MultiscaleBlock_1(net, filters_1=256, filters_2=1024, filters_3=64, p=1, d=2) 129 | net = MultiscaleBlock_1(net, filters_1=256, filters_2=1024, filters_3=64, p=1, d=4) 130 | net = MultiscaleBlock_1(net, filters_1=256, filters_2=1024, filters_3=64, p=1, d=8) 131 | net = MultiscaleBlock_1(net, filters_1=256, filters_2=1024, filters_3=64, p=1, d=16) 132 | 133 | net = MultiscaleBlock_2(net, filters_1=512, filters_2=2048, filters_3=512, p=2, d=4) 134 | net = MultiscaleBlock_1(net, filters_1=512, filters_2=2048, filters_3=512, p=2, d=8) 135 | net = MultiscaleBlock_1(net, filters_1=512, filters_2=2048, filters_3=512, p=2, d=16) 136 | 137 | net = ConvBlock(net, n_filters=12, kernel_size=[1, 1]) 138 | net = Upsampling(net, scale=2) 139 | 140 | net = tf.add(skip_connection, net) 141 | 142 | net = Upsampling(net, scale=8) 143 | 144 | 145 | 146 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 147 | 148 | return net 149 | 150 | 151 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 152 | inputs=tf.to_float(inputs) 153 | num_channels = inputs.get_shape().as_list()[-1] 154 | if len(means) != num_channels: 155 | raise ValueError('len(means) must match the number of channels') 156 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 157 | for i in range(num_channels): 158 | channels[i] -= means[i] 159 | return tf.concat(axis=3, values=channels) -------------------------------------------------------------------------------- /models/BiSeNet.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | 3 | import tensorflow as tf 4 | from tensorflow.contrib import slim 5 | from builders import frontend_builder 6 | import numpy as np 7 | import os, sys 8 | 9 | def Upsampling(inputs,scale): 10 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 11 | 12 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 13 | """ 14 | Basic conv transpose block for Encoder-Decoder upsampling 15 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 16 | """ 17 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 18 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 19 | return net 20 | 21 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3], strides=1): 22 | """ 23 | Basic conv block for Encoder-Decoder 24 | 
Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 25 | """ 26 | net = slim.conv2d(inputs, n_filters, kernel_size, stride=[strides, strides], activation_fn=None, normalizer_fn=None) 27 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 28 | return net 29 | 30 | def AttentionRefinementModule(inputs, n_filters): 31 | 32 | # Global average pooling 33 | net = tf.reduce_mean(inputs, [1, 2], keep_dims=True) 34 | 35 | net = slim.conv2d(net, n_filters, kernel_size=[1, 1]) 36 | net = slim.batch_norm(net, fused=True) 37 | net = tf.sigmoid(net) 38 | 39 | net = tf.multiply(inputs, net) 40 | 41 | return net 42 | 43 | def FeatureFusionModule(input_1, input_2, n_filters): 44 | inputs = tf.concat([input_1, input_2], axis=-1) 45 | inputs = ConvBlock(inputs, n_filters=n_filters, kernel_size=[3, 3]) 46 | 47 | # Global average pooling 48 | net = tf.reduce_mean(inputs, [1, 2], keep_dims=True) 49 | 50 | net = slim.conv2d(net, n_filters, kernel_size=[1, 1]) 51 | net = tf.nn.relu(net) 52 | net = slim.conv2d(net, n_filters, kernel_size=[1, 1]) 53 | net = tf.sigmoid(net) 54 | 55 | net = tf.multiply(inputs, net) 56 | 57 | net = tf.add(inputs, net) 58 | 59 | return net 60 | 61 | 62 | def build_bisenet(inputs, num_classes, preset_model='BiSeNet', frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 63 | """ 64 | Builds the BiSeNet model. 65 | 66 | Arguments: 67 | inputs: The input tensor= 68 | preset_model: Which model you want to use. Select which ResNet model to use for feature extraction 69 | num_classes: Number of classes 70 | 71 | Returns: 72 | BiSeNet model 73 | """ 74 | 75 | ### The spatial path 76 | ### The number of feature maps for each convolution is not specified in the paper 77 | ### It was chosen here to be equal to the number of feature maps of a classification 78 | ### model at each corresponding stage 79 | spatial_net = ConvBlock(inputs, n_filters=64, kernel_size=[3, 3], strides=2) 80 | spatial_net = ConvBlock(spatial_net, n_filters=128, kernel_size=[3, 3], strides=2) 81 | spatial_net = ConvBlock(spatial_net, n_filters=256, kernel_size=[3, 3], strides=2) 82 | 83 | 84 | ### Context path 85 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 86 | 87 | net_4 = AttentionRefinementModule(end_points['pool4'], n_filters=512) 88 | 89 | net_5 = AttentionRefinementModule(end_points['pool5'], n_filters=2048) 90 | 91 | global_channels = tf.reduce_mean(net_5, [1, 2], keep_dims=True) 92 | net_5_scaled = tf.multiply(global_channels, net_5) 93 | 94 | ### Combining the paths 95 | net_4 = Upsampling(net_4, scale=2) 96 | net_5_scaled = Upsampling(net_5_scaled, scale=4) 97 | 98 | context_net = tf.concat([net_4, net_5_scaled], axis=-1) 99 | 100 | net = FeatureFusionModule(input_1=spatial_net, input_2=context_net, n_filters=num_classes) 101 | 102 | 103 | ### Final upscaling and finish 104 | net = Upsampling(net, scale=8) 105 | 106 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 107 | 108 | return net, init_fn 109 | 110 | -------------------------------------------------------------------------------- /models/DDSC.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | 3 | import tensorflow as tf 4 | from tensorflow.contrib import slim 5 | from builders import frontend_builder 6 | import numpy as np 7 | import os, sys 8 | 9 | def Upsampling(inputs,scale): 10 | return 
tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 11 | 12 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 13 | """ 14 | Basic conv transpose block for Encoder-Decoder upsampling 15 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 16 | """ 17 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 18 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 19 | return net 20 | 21 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 22 | """ 23 | Basic conv block for Encoder-Decoder 24 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 25 | """ 26 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 27 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 28 | return net 29 | 30 | def GroupedConvolutionBlock(inputs, grouped_channels, cardinality=32): 31 | group_list = [] 32 | 33 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 34 | 35 | for c in range(cardinality): 36 | x = net[:, :, :, c * grouped_channels:(c + 1) * grouped_channels] 37 | 38 | x = slim.conv2d(x, grouped_channels, kernel_size=[3, 3]) 39 | 40 | group_list.append(x) 41 | 42 | group_merge = tf.concat(group_list, axis=-1) 43 | 44 | return group_merge 45 | 46 | def ResNeXtBlock(inputs, n_filters_out, bottleneck_factor=2, cardinality=32): 47 | 48 | assert not (n_filters_out // 2) % cardinality 49 | grouped_channels = (n_filters_out // 2) // cardinality 50 | 51 | net = ConvBlock(inputs, n_filters=n_filters_out / bottleneck_factor, kernel_size=[1, 1]) 52 | net = GroupedConvolutionBlock(net, grouped_channels, cardinality=32) 53 | net = ConvBlock(net, n_filters=n_filters_out, kernel_size=[1, 1]) 54 | 55 | 56 | net = tf.add(inputs, net) 57 | 58 | return net 59 | 60 | def EncoderAdaptionBlock(inputs, n_filters, bottleneck_factor=2, cardinality=32): 61 | 62 | net = ConvBlock(inputs, n_filters, kernel_size=[3, 3]) 63 | net = ResNeXtBlock(net, n_filters_out=n_filters, bottleneck_factor=bottleneck_factor) 64 | net = ResNeXtBlock(net, n_filters_out=n_filters, bottleneck_factor=bottleneck_factor) 65 | net = ResNeXtBlock(net, n_filters_out=n_filters, bottleneck_factor=bottleneck_factor) 66 | net = ConvBlock(net, n_filters, kernel_size=[3, 3]) 67 | 68 | return net 69 | 70 | 71 | def SemanticFeatureGenerationBlock(inputs, D_features, D_prime_features, O_features, bottleneck_factor=2, cardinality=32): 72 | 73 | d_1 = ConvBlock(inputs, D_features, kernel_size=[3, 3]) 74 | pool_1 = slim.pool(d_1, [5, 5], stride=[1, 1], pooling_type='MAX') 75 | d_prime_1 = ConvBlock(pool_1, D_prime_features, kernel_size=[3, 3]) 76 | 77 | d_2 = ConvBlock(pool_1, D_features, kernel_size=[3, 3]) 78 | pool_2 = slim.pool(d_2, [5, 5], stride=[1, 1], pooling_type='MAX') 79 | d_prime_2 = ConvBlock(pool_2, D_prime_features, kernel_size=[3, 3]) 80 | 81 | d_3 = ConvBlock(pool_2, D_features, kernel_size=[3, 3]) 82 | pool_3 = slim.pool(d_3, [5, 5], stride=[1, 1], pooling_type='MAX') 83 | d_prime_3 = ConvBlock(pool_3, D_prime_features, kernel_size=[3, 3]) 84 | 85 | d_4 = ConvBlock(pool_3, D_features, kernel_size=[3, 3]) 86 | pool_4 = slim.pool(d_4, [5, 5], stride=[1, 1], pooling_type='MAX') 87 | d_prime_4 = ConvBlock(pool_4, D_prime_features, kernel_size=[3, 3]) 88 | 89 | 90 | net = tf.concat([d_prime_1, d_prime_2, d_prime_3, d_prime_4], axis=-1) 91 | 92 | net = ConvBlock(net, n_filters=D_features, kernel_size=[3, 3]) 93 | 94 | net = ResNeXtBlock(net, 
n_filters_out=D_features, bottleneck_factor=bottleneck_factor) 95 | net = ResNeXtBlock(net, n_filters_out=D_features, bottleneck_factor=bottleneck_factor) 96 | net = ResNeXtBlock(net, n_filters_out=D_features, bottleneck_factor=bottleneck_factor) 97 | net = ResNeXtBlock(net, n_filters_out=D_features, bottleneck_factor=bottleneck_factor) 98 | 99 | net = ConvBlock(net, O_features, kernel_size=[3, 3]) 100 | 101 | return net 102 | 103 | 104 | 105 | def build_ddsc(inputs, num_classes, preset_model='DDSC', frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 106 | """ 107 | Builds the Dense Decoder Shortcut Connections model. 108 | 109 | Arguments: 110 | inputs: The input tensor= 111 | preset_model: Which model you want to use. Select which ResNet model to use for feature extraction 112 | num_classes: Number of classes 113 | 114 | Returns: 115 | Dense Decoder Shortcut Connections model 116 | """ 117 | 118 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 119 | 120 | ### Adapting features for all stages 121 | decoder_4 = EncoderAdaptionBlock(end_points['pool5'], n_filters=1024) 122 | decoder_3 = EncoderAdaptionBlock(end_points['pool4'], n_filters=512) 123 | decoder_2 = EncoderAdaptionBlock(end_points['pool3'], n_filters=256) 124 | decoder_1 = EncoderAdaptionBlock(end_points['pool2'], n_filters=128) 125 | 126 | decoder_4 = SemanticFeatureGenerationBlock(decoder_4, D_features=1024, D_prime_features = 1024 / 4, O_features=1024) 127 | 128 | ### Fusing features from 3 and 4 129 | decoder_4 = ConvBlock(decoder_4, n_filters=512, kernel_size=[3, 3]) 130 | decoder_4 = Upsampling(decoder_4, scale=2) 131 | 132 | decoder_3 = ConvBlock(decoder_3, n_filters=512, kernel_size=[3, 3]) 133 | 134 | decoder_3 = tf.add_n([decoder_4, decoder_3]) 135 | 136 | decoder_3 = SemanticFeatureGenerationBlock(decoder_3, D_features=512, D_prime_features = 512 / 4, O_features=512) 137 | 138 | ### Fusing features from 2, 3, 4 139 | decoder_4 = ConvBlock(decoder_4, n_filters=256, kernel_size=[3, 3]) 140 | decoder_4 = Upsampling(decoder_4, scale=4) 141 | 142 | decoder_3 = ConvBlock(decoder_3, n_filters=256, kernel_size=[3, 3]) 143 | decoder_3 = Upsampling(decoder_3, scale=2) 144 | 145 | decoder_2 = ConvBlock(decoder_2, n_filters=256, kernel_size=[3, 3]) 146 | 147 | decoder_2 = tf.add_n([decoder_4, decoder_3, decoder_2]) 148 | 149 | decoder_2 = SemanticFeatureGenerationBlock(decoder_2, D_features=256, D_prime_features = 256 / 4, O_features=256) 150 | 151 | ### Fusing features from 1, 2, 3, 4 152 | decoder_4 = ConvBlock(decoder_4, n_filters=128, kernel_size=[3, 3]) 153 | decoder_4 = Upsampling(decoder_4, scale=8) 154 | 155 | decoder_3 = ConvBlock(decoder_3, n_filters=128, kernel_size=[3, 3]) 156 | decoder_3 = Upsampling(decoder_3, scale=4) 157 | 158 | decoder_2 = ConvBlock(decoder_2, n_filters=128, kernel_size=[3, 3]) 159 | decoder_2 = Upsampling(decoder_2, scale=2) 160 | 161 | decoder_1 = ConvBlock(decoder_1, n_filters=128, kernel_size=[3, 3]) 162 | 163 | decoder_1 = tf.add_n([decoder_4, decoder_3, decoder_2, decoder_1]) 164 | 165 | decoder_1 = SemanticFeatureGenerationBlock(decoder_1, D_features=128, D_prime_features = 128 / 4, O_features=num_classes) 166 | 167 | 168 | ### Final upscaling and finish 169 | net = Upsampling(decoder_1, scale=4) 170 | 171 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 172 | 173 | return net, init_fn 174 | 175 | 
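# Usage sketch (illustrative only; the placeholder input shape and class count
# below are arbitrary examples, not values prescribed by this repository):
#   inputs = tf.placeholder(tf.float32, shape=[None, 512, 512, 3])
#   net, init_fn = build_ddsc(inputs, num_classes=3, frontend="ResNet101")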
-------------------------------------------------------------------------------- /models/DeepLabV3.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | 3 | import tensorflow as tf 4 | from tensorflow.contrib import slim 5 | import numpy as np 6 | from builders import frontend_builder 7 | import os, sys 8 | 9 | def Upsampling(inputs,feature_map_shape): 10 | return tf.image.resize_bilinear(inputs, size=feature_map_shape) 11 | 12 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 13 | """ 14 | Basic conv transpose block for Encoder-Decoder upsampling 15 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 16 | """ 17 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 18 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 19 | return net 20 | 21 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 22 | """ 23 | Basic conv block for Encoder-Decoder 24 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 25 | """ 26 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 27 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 28 | return net 29 | 30 | def AtrousSpatialPyramidPoolingModule(inputs, depth=256): 31 | """ 32 | 33 | ASPP consists of (a) one 1×1 convolution and three 3×3 convolutions with rates = (6, 12, 18) when output stride = 16 34 | (all with 256 filters and batch normalization), and (b) the image-level features as described in the paper 35 | 36 | """ 37 | 38 | feature_map_size = tf.shape(inputs) 39 | 40 | # Global average pooling 41 | image_features = tf.reduce_mean(inputs, [1, 2], keep_dims=True) 42 | 43 | image_features = slim.conv2d(image_features, depth, [1, 1], activation_fn=None) 44 | image_features = tf.image.resize_bilinear(image_features, (feature_map_size[1], feature_map_size[2])) 45 | 46 | atrous_pool_block_1 = slim.conv2d(inputs, depth, [1, 1], activation_fn=None) 47 | 48 | atrous_pool_block_6 = slim.conv2d(inputs, depth, [3, 3], rate=6, activation_fn=None) 49 | 50 | atrous_pool_block_12 = slim.conv2d(inputs, depth, [3, 3], rate=12, activation_fn=None) 51 | 52 | atrous_pool_block_18 = slim.conv2d(inputs, depth, [3, 3], rate=18, activation_fn=None) 53 | 54 | net = tf.concat((image_features, atrous_pool_block_1, atrous_pool_block_6, atrous_pool_block_12, atrous_pool_block_18), axis=3) 55 | net = slim.conv2d(net, depth, [1, 1], scope="conv_1x1_output", activation_fn=None) 56 | 57 | return net 58 | 59 | 60 | 61 | 62 | 63 | def build_deeplabv3(inputs, num_classes, preset_model='DeepLabV3', frontend="Res101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 64 | """ 65 | Builds the DeepLabV3 model. 66 | 67 | Arguments: 68 | inputs: The input tensor= 69 | preset_model: Which model you want to use. 
Select which ResNet model to use for feature extraction 70 | num_classes: Number of classes 71 | 72 | Returns: 73 | DeepLabV3 model 74 | """ 75 | 76 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 77 | 78 | label_size = tf.shape(inputs)[1:3] 79 | 80 | net = AtrousSpatialPyramidPoolingModule(end_points['pool4']) 81 | 82 | net = Upsampling(net, label_size) 83 | 84 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 85 | 86 | return net, init_fn 87 | 88 | 89 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 90 | inputs=tf.to_float(inputs) 91 | num_channels = inputs.get_shape().as_list()[-1] 92 | if len(means) != num_channels: 93 | raise ValueError('len(means) must match the number of channels') 94 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 95 | for i in range(num_channels): 96 | channels[i] -= means[i] 97 | return tf.concat(axis=3, values=channels) -------------------------------------------------------------------------------- /models/DeepLabV3_plus.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | 3 | import tensorflow as tf 4 | from tensorflow.contrib import slim 5 | from builders import frontend_builder 6 | import numpy as np 7 | import os, sys 8 | 9 | def Upsampling(inputs,feature_map_shape): 10 | return tf.image.resize_bilinear(inputs, size=tf.cast(feature_map_shape, tf.int32)) 11 | 12 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 13 | """ 14 | Basic conv transpose block for Encoder-Decoder upsampling 15 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 16 | """ 17 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 18 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 19 | return net 20 | 21 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 22 | """ 23 | Basic conv block for Encoder-Decoder 24 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 25 | """ 26 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 27 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 28 | return net 29 | 30 | def AtrousSpatialPyramidPoolingModule(inputs, depth=256): 31 | """ 32 | 33 | ASPP consists of (a) one 1×1 convolution and three 3×3 convolutions with rates = (6, 12, 18) when output stride = 16 34 | (all with 256 filters and batch normalization), and (b) the image-level features as described in the paper 35 | 36 | """ 37 | 38 | feature_map_size = tf.shape(inputs) 39 | 40 | # Global average pooling 41 | image_features = tf.reduce_mean(inputs, [1, 2], keep_dims=True) 42 | 43 | image_features = slim.conv2d(image_features, depth, [1, 1], activation_fn=None) 44 | image_features = tf.image.resize_bilinear(image_features, (feature_map_size[1], feature_map_size[2])) 45 | 46 | atrous_pool_block_1 = slim.conv2d(inputs, depth, [1, 1], activation_fn=None) 47 | 48 | atrous_pool_block_6 = slim.conv2d(inputs, depth, [3, 3], rate=6, activation_fn=None) 49 | 50 | atrous_pool_block_12 = slim.conv2d(inputs, depth, [3, 3], rate=12, activation_fn=None) 51 | 52 | atrous_pool_block_18 = slim.conv2d(inputs, depth, [3, 3], rate=18, activation_fn=None) 53 | 54 | net = tf.concat((image_features, atrous_pool_block_1, atrous_pool_block_6, atrous_pool_block_12, atrous_pool_block_18), axis=3) 55 | 56 | 
return net 57 | 58 | 59 | 60 | 61 | 62 | def build_deeplabv3_plus(inputs, num_classes, preset_model='DeepLabV3+', frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 63 | """ 64 | Builds the DeepLabV3 model. 65 | 66 | Arguments: 67 | inputs: The input tensor= 68 | preset_model: Which model you want to use. Select which ResNet model to use for feature extraction 69 | num_classes: Number of classes 70 | 71 | Returns: 72 | DeepLabV3 model 73 | """ 74 | 75 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 76 | 77 | 78 | label_size = tf.shape(inputs)[1:3] 79 | 80 | encoder_features = end_points['pool2'] 81 | 82 | net = AtrousSpatialPyramidPoolingModule(end_points['pool4']) 83 | net = slim.conv2d(net, 256, [1, 1], scope="conv_1x1_output", activation_fn=None) 84 | decoder_features = Upsampling(net, label_size / 4) 85 | 86 | encoder_features = slim.conv2d(encoder_features, 48, [1, 1], activation_fn=tf.nn.relu, normalizer_fn=None) 87 | 88 | net = tf.concat((encoder_features, decoder_features), axis=3) 89 | 90 | net = slim.conv2d(net, 256, [3, 3], activation_fn=tf.nn.relu, normalizer_fn=None) 91 | net = slim.conv2d(net, 256, [3, 3], activation_fn=tf.nn.relu, normalizer_fn=None) 92 | 93 | net = Upsampling(net, label_size) 94 | 95 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 96 | 97 | return net, init_fn 98 | 99 | 100 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 101 | inputs=tf.to_float(inputs) 102 | num_channels = inputs.get_shape().as_list()[-1] 103 | if len(means) != num_channels: 104 | raise ValueError('len(means) must match the number of channels') 105 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 106 | for i in range(num_channels): 107 | channels[i] -= means[i] 108 | return tf.concat(axis=3, values=channels) 109 | -------------------------------------------------------------------------------- /models/DenseASPP.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | from builders import frontend_builder 4 | import os, sys 5 | 6 | 7 | def Upsampling(inputs,scale): 8 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 9 | 10 | 11 | 12 | def DilatedConvBlock(inputs, n_filters, rate=1, kernel_size=[3, 3]): 13 | """ 14 | Basic dilated conv block 15 | Apply successivly BatchNormalization, ReLU nonlinearity, dilated convolution 16 | """ 17 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 18 | net = slim.conv2d(net, n_filters, kernel_size, rate=rate, activation_fn=None, normalizer_fn=None) 19 | return net 20 | 21 | 22 | 23 | def build_dense_aspp(inputs, num_classes, preset_model='DenseASPP', frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 24 | 25 | 26 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 27 | 28 | init_features = end_points['pool3'] 29 | 30 | ### First block, rate = 3 31 | d_3_features = DilatedConvBlock(init_features, n_filters=256, kernel_size=[1, 1]) 32 | d_3 = DilatedConvBlock(d_3_features, n_filters=64, rate=3, kernel_size=[3, 3]) 33 | 34 | ### Second block, rate = 6 35 | d_4 = tf.concat([init_features, d_3], axis=-1) 36 | d_4 = DilatedConvBlock(d_4, n_filters=256, 
kernel_size=[1, 1]) 37 | d_4 = DilatedConvBlock(d_4, n_filters=64, rate=6, kernel_size=[3, 3]) 38 | 39 | ### Third block, rate = 12 40 | d_5 = tf.concat([init_features, d_3, d_4], axis=-1) 41 | d_5 = DilatedConvBlock(d_5, n_filters=256, kernel_size=[1, 1]) 42 | d_5 = DilatedConvBlock(d_5, n_filters=64, rate=12, kernel_size=[3, 3]) 43 | 44 | ### Fourth block, rate = 18 45 | d_6 = tf.concat([init_features, d_3, d_4, d_5], axis=-1) 46 | d_6 = DilatedConvBlock(d_6, n_filters=256, kernel_size=[1, 1]) 47 | d_6 = DilatedConvBlock(d_6, n_filters=64, rate=18, kernel_size=[3, 3]) 48 | 49 | ### Fifth block, rate = 24 50 | d_7 = tf.concat([init_features, d_3, d_4, d_5, d_6], axis=-1) 51 | d_7 = DilatedConvBlock(d_7, n_filters=256, kernel_size=[1, 1]) 52 | d_7 = DilatedConvBlock(d_7, n_filters=64, rate=24, kernel_size=[3, 3]) 53 | 54 | full_block = tf.concat([init_features, d_3, d_4, d_5, d_6, d_7], axis=-1) 55 | 56 | net = slim.conv2d(full_block, num_classes, [1, 1], activation_fn=None, scope='logits') 57 | 58 | net = Upsampling(net, scale=8) 59 | 60 | return net, init_fn -------------------------------------------------------------------------------- /models/Encoder_Decoder.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import os,time,cv2 3 | import tensorflow as tf 4 | import tensorflow.contrib.slim as slim 5 | import numpy as np 6 | 7 | def conv_block(inputs, n_filters, kernel_size=[3, 3], dropout_p=0.0): 8 | """ 9 | Basic conv block for Encoder-Decoder 10 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 11 | Dropout (if dropout_p > 0) on the inputs 12 | """ 13 | conv = slim.conv2d(inputs, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 14 | out = tf.nn.relu(slim.batch_norm(conv, fused=True)) 15 | if dropout_p != 0.0: 16 | out = slim.dropout(out, keep_prob=(1.0-dropout_p)) 17 | return out 18 | 19 | def conv_transpose_block(inputs, n_filters, kernel_size=[3, 3], dropout_p=0.0): 20 | """ 21 | Basic conv transpose block for Encoder-Decoder upsampling 22 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 23 | Dropout (if dropout_p > 0) on the inputs 24 | """ 25 | conv = slim.conv2d_transpose(inputs, n_filters, kernel_size=[3, 3], stride=[2, 2], activation_fn=None) 26 | out = tf.nn.relu(slim.batch_norm(conv)) 27 | if dropout_p != 0.0: 28 | out = slim.dropout(out, keep_prob=(1.0-dropout_p)) 29 | return out 30 | 31 | def build_encoder_decoder(inputs, num_classes, preset_model = "Encoder-Decoder", dropout_p=0.5, scope=None): 32 | """ 33 | Builds the Encoder-Decoder model. Inspired by SegNet with some modifications 34 | Optionally includes skip connections 35 | 36 | Arguments: 37 | inputs: the input tensor 38 | n_classes: number of classes 39 | dropout_p: dropout rate applied after each convolution (0. for not using) 40 | 41 | Returns: 42 | Encoder-Decoder model 43 | """ 44 | 45 | 46 | if preset_model == "Encoder-Decoder": 47 | has_skip = False 48 | elif preset_model == "Encoder-Decoder-Skip": 49 | has_skip = True 50 | else: 51 | raise ValueError("Unsupported Encoder-Decoder model '%s'. 
This function only supports Encoder-Decoder and Encoder-Decoder-Skip" % (preset_model)) 52 | 53 | ##################### 54 | # Downsampling path # 55 | ##################### 56 | net = conv_block(inputs, 64) 57 | net = conv_block(net, 64) 58 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 59 | skip_1 = net 60 | 61 | net = conv_block(net, 128) 62 | net = conv_block(net, 128) 63 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 64 | skip_2 = net 65 | 66 | net = conv_block(net, 256) 67 | net = conv_block(net, 256) 68 | net = conv_block(net, 256) 69 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 70 | skip_3 = net 71 | 72 | net = conv_block(net, 512) 73 | net = conv_block(net, 512) 74 | net = conv_block(net, 512) 75 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 76 | skip_4 = net 77 | 78 | net = conv_block(net, 512) 79 | net = conv_block(net, 512) 80 | net = conv_block(net, 512) 81 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 82 | 83 | 84 | ##################### 85 | # Upsampling path # 86 | ##################### 87 | net = conv_transpose_block(net, 512) 88 | net = conv_block(net, 512) 89 | net = conv_block(net, 512) 90 | net = conv_block(net, 512) 91 | if has_skip: 92 | net = tf.add(net, skip_4) 93 | 94 | net = conv_transpose_block(net, 512) 95 | net = conv_block(net, 512) 96 | net = conv_block(net, 512) 97 | net = conv_block(net, 256) 98 | if has_skip: 99 | net = tf.add(net, skip_3) 100 | 101 | net = conv_transpose_block(net, 256) 102 | net = conv_block(net, 256) 103 | net = conv_block(net, 256) 104 | net = conv_block(net, 128) 105 | if has_skip: 106 | net = tf.add(net, skip_2) 107 | 108 | net = conv_transpose_block(net, 128) 109 | net = conv_block(net, 128) 110 | net = conv_block(net, 64) 111 | if has_skip: 112 | net = tf.add(net, skip_1) 113 | 114 | net = conv_transpose_block(net, 64) 115 | net = conv_block(net, 64) 116 | net = conv_block(net, 64) 117 | 118 | ##################### 119 | # Softmax # 120 | ##################### 121 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 122 | return net -------------------------------------------------------------------------------- /models/FC_DenseNet_Tiramisu.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import os,time,cv2 3 | import tensorflow as tf 4 | import tensorflow.contrib.slim as slim 5 | import numpy as np 6 | 7 | def preact_conv(inputs, n_filters, kernel_size=[3, 3], dropout_p=0.2): 8 | """ 9 | Basic pre-activation layer for DenseNets 10 | Apply successivly BatchNormalization, ReLU nonlinearity, Convolution and 11 | Dropout (if dropout_p > 0) on the inputs 12 | """ 13 | preact = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 14 | conv = slim.conv2d(preact, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 15 | if dropout_p != 0.0: 16 | conv = slim.dropout(conv, keep_prob=(1.0-dropout_p)) 17 | return conv 18 | 19 | def DenseBlock(stack, n_layers, growth_rate, dropout_p, scope=None): 20 | """ 21 | DenseBlock for DenseNet and FC-DenseNet 22 | Arguments: 23 | stack: input 4D tensor 24 | n_layers: number of internal layers 25 | growth_rate: number of feature maps per internal layer 26 | Returns: 27 | stack: current stack of feature maps (4D tensor) 28 | new_features: 4D tensor containing only the new feature maps generated 29 | in this block 30 | """ 31 | with tf.name_scope(scope) as sc: 32 | new_features = [] 33 | for j in 
range(n_layers): 34 | # Compute new feature maps 35 | layer = preact_conv(stack, growth_rate, dropout_p=dropout_p) 36 | new_features.append(layer) 37 | # Stack new layer 38 | stack = tf.concat([stack, layer], axis=-1) 39 | new_features = tf.concat(new_features, axis=-1) 40 | return stack, new_features 41 | 42 | 43 | def TransitionDown(inputs, n_filters, dropout_p=0.2, scope=None): 44 | """ 45 | Transition Down (TD) for FC-DenseNet 46 | Apply 1x1 BN + ReLU + conv then 2x2 max pooling 47 | """ 48 | with tf.name_scope(scope) as sc: 49 | l = preact_conv(inputs, n_filters, kernel_size=[1, 1], dropout_p=dropout_p) 50 | l = slim.pool(l, [2, 2], stride=[2, 2], pooling_type='MAX') 51 | return l 52 | 53 | 54 | def TransitionUp(block_to_upsample, skip_connection, n_filters_keep, scope=None): 55 | """ 56 | Transition Up for FC-DenseNet 57 | Performs upsampling on block_to_upsample by a factor 2 and concatenates it with the skip_connection 58 | """ 59 | with tf.name_scope(scope) as sc: 60 | # Upsample 61 | l = slim.conv2d_transpose(block_to_upsample, n_filters_keep, kernel_size=[3, 3], stride=[2, 2], activation_fn=None) 62 | # Concatenate with skip connection 63 | l = tf.concat([l, skip_connection], axis=-1) 64 | return l 65 | 66 | def build_fc_densenet(inputs, num_classes, preset_model='FC-DenseNet56', n_filters_first_conv=48, n_pool=5, growth_rate=12, n_layers_per_block=4, dropout_p=0.2, scope=None): 67 | """ 68 | Builds the FC-DenseNet model 69 | 70 | Arguments: 71 | inputs: the input tensor 72 | preset_model: The model you want to use 73 | n_classes: number of classes 74 | n_filters_first_conv: number of filters for the first convolution applied 75 | n_pool: number of pooling layers = number of transition down = number of transition up 76 | growth_rate: number of new feature maps created by each layer in a dense block 77 | n_layers_per_block: number of layers per block. Can be an int or a list of size 2 * n_pool + 1 78 | dropout_p: dropout rate applied after each convolution (0. for not using) 79 | 80 | Returns: 81 | Fc-DenseNet model 82 | """ 83 | 84 | if preset_model == 'FC-DenseNet56': 85 | n_pool=5 86 | growth_rate=12 87 | n_layers_per_block=4 88 | elif preset_model == 'FC-DenseNet67': 89 | n_pool=5 90 | growth_rate=16 91 | n_layers_per_block=5 92 | elif preset_model == 'FC-DenseNet103': 93 | n_pool=5 94 | growth_rate=16 95 | n_layers_per_block=[4, 5, 7, 10, 12, 15, 12, 10, 7, 5, 4] 96 | else: 97 | raise ValueError("Unsupported FC-DenseNet model '%s'. This function only supports FC-DenseNet56, FC-DenseNet67, and FC-DenseNet103" % (preset_model)) 98 | 99 | if type(n_layers_per_block) == list: 100 | assert (len(n_layers_per_block) == 2 * n_pool + 1) 101 | elif type(n_layers_per_block) == int: 102 | n_layers_per_block = [n_layers_per_block] * (2 * n_pool + 1) 103 | else: 104 | raise ValueError 105 | 106 | with tf.variable_scope(scope, preset_model, [inputs]) as sc: 107 | 108 | ##################### 109 | # First Convolution # 110 | ##################### 111 | # We perform a first convolution. 
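# (Worked example of the channel bookkeeping below; a sketch assuming the default
#  FC-DenseNet56 preset selected above: n_filters_first_conv=48, growth_rate=12,
#  n_layers_per_block=4, n_pool=5. The first convolution yields 48 feature maps,
#  and each dense block on the downsampling path adds
#  growth_rate * n_layers_per_block[i] = 12 * 4 = 48 maps, so the stack width
#  grows 48 -> 96 -> 144 -> 192 -> 240 -> 288 before the bottleneck block.)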
112 | stack = slim.conv2d(inputs, n_filters_first_conv, [3, 3], scope='first_conv', activation_fn=None) 113 | 114 | n_filters = n_filters_first_conv 115 | 116 | ##################### 117 | # Downsampling path # 118 | ##################### 119 | 120 | skip_connection_list = [] 121 | 122 | for i in range(n_pool): 123 | # Dense Block 124 | stack, _ = DenseBlock(stack, n_layers_per_block[i], growth_rate, dropout_p, scope='denseblock%d' % (i+1)) 125 | n_filters += growth_rate * n_layers_per_block[i] 126 | # At the end of the dense block, the current stack is stored in the skip_connections list 127 | skip_connection_list.append(stack) 128 | 129 | # Transition Down 130 | stack = TransitionDown(stack, n_filters, dropout_p, scope='transitiondown%d'%(i+1)) 131 | 132 | skip_connection_list = skip_connection_list[::-1] 133 | 134 | ##################### 135 | # Bottleneck # 136 | ##################### 137 | 138 | # Dense Block 139 | # We will only upsample the new feature maps 140 | stack, block_to_upsample = DenseBlock(stack, n_layers_per_block[n_pool], growth_rate, dropout_p, scope='denseblock%d' % (n_pool + 1)) 141 | 142 | 143 | ####################### 144 | # Upsampling path # 145 | ####################### 146 | 147 | for i in range(n_pool): 148 | # Transition Up ( Upsampling + concatenation with the skip connection) 149 | n_filters_keep = growth_rate * n_layers_per_block[n_pool + i] 150 | stack = TransitionUp(block_to_upsample, skip_connection_list[i], n_filters_keep, scope='transitionup%d' % (n_pool + i + 1)) 151 | 152 | # Dense Block 153 | # We will only upsample the new feature maps 154 | stack, block_to_upsample = DenseBlock(stack, n_layers_per_block[n_pool + i + 1], growth_rate, dropout_p, scope='denseblock%d' % (n_pool + i + 2)) 155 | 156 | 157 | ##################### 158 | # Softmax # 159 | ##################### 160 | net = slim.conv2d(stack, num_classes, [1, 1], activation_fn=None, scope='logits') 161 | return net -------------------------------------------------------------------------------- /models/FRRN.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | 4 | def Upsampling(inputs,scale): 5 | return tf.image.resize_nearest_neighbor(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 6 | 7 | def Unpooling(inputs,scale): 8 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 9 | 10 | def ResidualUnit(inputs, n_filters=48, filter_size=3): 11 | """ 12 | A local residual unit 13 | 14 | Arguments: 15 | inputs: The input tensor 16 | n_filters: Number of output feature maps for each conv 17 | filter_size: Size of convolution kernel 18 | 19 | Returns: 20 | Output of local residual block 21 | """ 22 | 23 | net = slim.conv2d(inputs, n_filters, filter_size, activation_fn=None) 24 | net = slim.batch_norm(net, fused=True) 25 | net = tf.nn.relu(net) 26 | net = slim.conv2d(net, n_filters, filter_size, activation_fn=None) 27 | net = slim.batch_norm(net, fused=True) 28 | 29 | return net 30 | 31 | def FullResolutionResidualUnit(pool_stream, res_stream, n_filters_3, n_filters_1, pool_scale): 32 | """ 33 | A full resolution residual unit 34 | 35 | Arguments: 36 | pool_stream: The inputs from the pooling stream 37 | res_stream: The inputs from the residual stream 38 | n_filters_3: Number of output feature maps for each 3x3 conv 39 | n_filters_1: Number of output feature maps for each 1x1 conv 40 | pool_scale: scale of the pooling layer i.e 
window size and stride 41 | 42 | Returns: 43 | Output of full resolution residual block 44 | """ 45 | 46 | G = tf.concat([pool_stream, slim.pool(res_stream, [pool_scale, pool_scale], stride=[pool_scale, pool_scale], pooling_type='MAX')], axis=-1) 47 | 48 | 49 | 50 | net = slim.conv2d(G, n_filters_3, kernel_size=3, activation_fn=None) 51 | net = slim.batch_norm(net, fused=True) 52 | net = tf.nn.relu(net) 53 | net = slim.conv2d(net, n_filters_3, kernel_size=3, activation_fn=None) 54 | net = slim.batch_norm(net, fused=True) 55 | pool_stream_out = tf.nn.relu(net) 56 | 57 | net = slim.conv2d(pool_stream_out, n_filters_1, kernel_size=1, activation_fn=None) 58 | net = Upsampling(net, scale=pool_scale) 59 | res_stream_out = tf.add(res_stream, net) 60 | 61 | return pool_stream_out, res_stream_out 62 | 63 | 64 | 65 | def build_frrn(inputs, num_classes, preset_model='FRRN-A'): 66 | """ 67 | Builds the Full Resolution Residual Network model. 68 | 69 | Arguments: 70 | inputs: The input tensor 71 | preset_model: Which model you want to use. Select FRRN-A or FRRN-B 72 | num_classes: Number of classes 73 | 74 | Returns: 75 | FRRN model 76 | """ 77 | 78 | if preset_model == 'FRRN-A': 79 | 80 | ##################### 81 | # Initial Stage 82 | ##################### 83 | net = slim.conv2d(inputs, 48, kernel_size=5, activation_fn=None) 84 | net = slim.batch_norm(net, fused=True) 85 | net = tf.nn.relu(net) 86 | 87 | net = ResidualUnit(net, n_filters=48, filter_size=3) 88 | net = ResidualUnit(net, n_filters=48, filter_size=3) 89 | net = ResidualUnit(net, n_filters=48, filter_size=3) 90 | 91 | 92 | ##################### 93 | # Downsampling Path 94 | ##################### 95 | pool_stream = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 96 | res_stream = slim.conv2d(net, 32, kernel_size=1, activation_fn=None) 97 | 98 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 99 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 100 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 101 | 102 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 103 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 104 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 105 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 106 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 107 | 108 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 109 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=8) 110 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=8) 111 | 112 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 113 | pool_stream, res_stream = 
FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=16) 114 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=16) 115 | 116 | ##################### 117 | # Upsampling Path 118 | ##################### 119 | pool_stream = Unpooling(pool_stream, 2) 120 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=8) 121 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=8) 122 | 123 | pool_stream = Unpooling(pool_stream, 2) 124 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 125 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 126 | 127 | pool_stream = Unpooling(pool_stream, 2) 128 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 129 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 130 | 131 | pool_stream = Unpooling(pool_stream, 2) 132 | 133 | ##################### 134 | # Final Stage 135 | ##################### 136 | net = tf.concat([pool_stream, res_stream], axis=-1) 137 | net = ResidualUnit(net, n_filters=48, filter_size=3) 138 | net = ResidualUnit(net, n_filters=48, filter_size=3) 139 | net = ResidualUnit(net, n_filters=48, filter_size=3) 140 | 141 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 142 | return net 143 | 144 | 145 | elif preset_model == 'FRRN-B': 146 | ##################### 147 | # Initial Stage 148 | ##################### 149 | net = slim.conv2d(inputs, 48, kernel_size=5, activation_fn=None) 150 | net = slim.batch_norm(net, fused=True) 151 | net = tf.nn.relu(net) 152 | 153 | net = ResidualUnit(net, n_filters=48, filter_size=3) 154 | net = ResidualUnit(net, n_filters=48, filter_size=3) 155 | net = ResidualUnit(net, n_filters=48, filter_size=3) 156 | 157 | 158 | ##################### 159 | # Downsampling Path 160 | ##################### 161 | pool_stream = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 162 | res_stream = slim.conv2d(net, 32, kernel_size=1, activation_fn=None) 163 | 164 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 165 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 166 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 167 | 168 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 169 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 170 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 171 | pool_stream, res_stream = 
FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 172 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 173 | 174 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 175 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=8) 176 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=8) 177 | 178 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 179 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=16) 180 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=16) 181 | 182 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 183 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=32) 184 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=32) 185 | 186 | ##################### 187 | # Upsampling Path 188 | ##################### 189 | pool_stream = Unpooling(pool_stream, 2) 190 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=16) 191 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=16) 192 | 193 | pool_stream = Unpooling(pool_stream, 2) 194 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=8) 195 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=8) 196 | 197 | pool_stream = Unpooling(pool_stream, 2) 198 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 199 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 200 | 201 | pool_stream = Unpooling(pool_stream, 2) 202 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 203 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 204 | 205 | pool_stream = Unpooling(pool_stream, 2) 206 | 207 | ##################### 208 | # Final Stage 209 | ##################### 210 | net = tf.concat([pool_stream, res_stream], axis=-1) 211 | net = ResidualUnit(net, n_filters=48, filter_size=3) 212 | net = ResidualUnit(net, n_filters=48, filter_size=3) 213 | net = ResidualUnit(net, n_filters=48, filter_size=3) 214 | 215 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 216 | return net 217 | 218 | else: 219 | raise ValueError("Unsupported FRRN model '%s'. 
This function only supports FRRN-A and FRRN-B" % (preset_model)) 220 | -------------------------------------------------------------------------------- /models/GCN.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | from builders import frontend_builder 4 | import os, sys 5 | 6 | def Upsampling(inputs,scale): 7 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 8 | 9 | 10 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 11 | """ 12 | Basic deconv block for GCN 13 | Apply Transposed Convolution for feature map upscaling 14 | """ 15 | net = slim.conv2d_transpose(inputs, n_filters, kernel_size=[3, 3], stride=[2, 2], activation_fn=None) 16 | return net 17 | 18 | def BoundaryRefinementBlock(inputs, n_filters, kernel_size=[3, 3]): 19 | """ 20 | Boundary Refinement Block for GCN 21 | """ 22 | net = slim.conv2d(inputs, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 23 | net = tf.nn.relu(net) 24 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 25 | net = tf.add(inputs, net) 26 | return net 27 | 28 | def GlobalConvBlock(inputs, n_filters=21, size=3): 29 | """ 30 | Global Conv Block for GCN 31 | """ 32 | 33 | net_1 = slim.conv2d(inputs, n_filters, [size, 1], activation_fn=None, normalizer_fn=None) 34 | net_1 = slim.conv2d(net_1, n_filters, [1, size], activation_fn=None, normalizer_fn=None) 35 | 36 | net_2 = slim.conv2d(inputs, n_filters, [1, size], activation_fn=None, normalizer_fn=None) 37 | net_2 = slim.conv2d(net_2, n_filters, [size, 1], activation_fn=None, normalizer_fn=None) 38 | 39 | net = tf.add(net_1, net_2) 40 | 41 | return net 42 | 43 | 44 | def build_gcn(inputs, num_classes, preset_model='GCN', frontend="ResNet101", weight_decay=1e-5, is_training=True, upscaling_method="bilinear", pretrained_dir="models"): 45 | """ 46 | Builds the GCN model. 47 | 48 | Arguments: 49 | inputs: The input tensor 50 | preset_model: Which model you want to use. 
Select which ResNet model to use for feature extraction 51 | num_classes: Number of classes 52 | 53 | Returns: 54 | GCN model 55 | """ 56 | 57 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 58 | 59 | 60 | 61 | 62 | res = [end_points['pool5'], end_points['pool4'], 63 | end_points['pool3'], end_points['pool2']] 64 | 65 | down_5 = GlobalConvBlock(res[0], n_filters=21, size=3) 66 | down_5 = BoundaryRefinementBlock(down_5, n_filters=21, kernel_size=[3, 3]) 67 | down_5 = ConvUpscaleBlock(down_5, n_filters=21, kernel_size=[3, 3], scale=2) 68 | 69 | down_4 = GlobalConvBlock(res[1], n_filters=21, size=3) 70 | down_4 = BoundaryRefinementBlock(down_4, n_filters=21, kernel_size=[3, 3]) 71 | down_4 = tf.add(down_4, down_5) 72 | down_4 = BoundaryRefinementBlock(down_4, n_filters=21, kernel_size=[3, 3]) 73 | down_4 = ConvUpscaleBlock(down_4, n_filters=21, kernel_size=[3, 3], scale=2) 74 | 75 | down_3 = GlobalConvBlock(res[2], n_filters=21, size=3) 76 | down_3 = BoundaryRefinementBlock(down_3, n_filters=21, kernel_size=[3, 3]) 77 | down_3 = tf.add(down_3, down_4) 78 | down_3 = BoundaryRefinementBlock(down_3, n_filters=21, kernel_size=[3, 3]) 79 | down_3 = ConvUpscaleBlock(down_3, n_filters=21, kernel_size=[3, 3], scale=2) 80 | 81 | down_2 = GlobalConvBlock(res[3], n_filters=21, size=3) 82 | down_2 = BoundaryRefinementBlock(down_2, n_filters=21, kernel_size=[3, 3]) 83 | down_2 = tf.add(down_2, down_3) 84 | down_2 = BoundaryRefinementBlock(down_2, n_filters=21, kernel_size=[3, 3]) 85 | down_2 = ConvUpscaleBlock(down_2, n_filters=21, kernel_size=[3, 3], scale=2) 86 | 87 | net = BoundaryRefinementBlock(down_2, n_filters=21, kernel_size=[3, 3]) 88 | net = ConvUpscaleBlock(net, n_filters=21, kernel_size=[3, 3], scale=2) 89 | net = BoundaryRefinementBlock(net, n_filters=21, kernel_size=[3, 3]) 90 | 91 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 92 | 93 | return net, init_fn 94 | 95 | 96 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 97 | inputs=tf.to_float(inputs) 98 | num_channels = inputs.get_shape().as_list()[-1] 99 | if len(means) != num_channels: 100 | raise ValueError('len(means) must match the number of channels') 101 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 102 | for i in range(num_channels): 103 | channels[i] -= means[i] 104 | return tf.concat(axis=3, values=channels) 105 | -------------------------------------------------------------------------------- /models/ICNet.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | import numpy as np 4 | from frontends import frontend_builder 5 | import os, sys 6 | 7 | def Upsampling_by_shape(inputs, feature_map_shape): 8 | return tf.image.resize_bilinear(inputs, size=feature_map_shape) 9 | 10 | def Upsampling_by_scale(inputs, scale): 11 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 12 | 13 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 14 | """ 15 | Basic conv transpose block for Encoder-Decoder upsampling 16 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 17 | """ 18 | net = slim.conv2d_transpose(inputs, n_filters, kernel_size=[3, 3], stride=[2, 2], activation_fn=None) 19 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 20 | return net 21 | 22 
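# Shape sketch for the resize helpers defined above (illustrative only), for a
# 4D tensor x of shape [N, H, W, C]:
#   Upsampling_by_scale(x, scale=2)                  -> [N, 2*H, 2*W, C]
#   Upsampling_by_shape(x, feature_map_shape=[h, w]) -> [N, h, w, C]
# ConvUpscaleBlock instead learns the 2x upsampling with a stride-2 transposed
# convolution followed by batch normalization and ReLU.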
| def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 23 | """ 24 | Basic conv block for Encoder-Decoder 25 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 26 | """ 27 | net = slim.conv2d(inputs, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 28 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 29 | return net 30 | 31 | def InterpBlock(net, level, feature_map_shape, pooling_type): 32 | 33 | # Compute the kernel and stride sizes according to how large the final feature map will be 34 | # When the kernel size and strides are equal, then we can compute the final feature map size 35 | # by simply dividing the current size by the kernel or stride size 36 | # The final feature map sizes are 1x1, 2x2, 3x3, and 6x6. We round to the closest integer 37 | kernel_size = [int(np.round(float(feature_map_shape[0]) / float(level))), int(np.round(float(feature_map_shape[1]) / float(level)))] 38 | stride_size = kernel_size 39 | 40 | net = slim.pool(net, kernel_size, stride=stride_size, pooling_type='MAX') 41 | net = slim.conv2d(net, 512, [1, 1], activation_fn=None) 42 | net = slim.batch_norm(net, fused=True) 43 | net = tf.nn.relu(net) 44 | net = Upsampling_by_shape(net, feature_map_shape) 45 | return net 46 | 47 | def PyramidPoolingModule_ICNet(inputs, feature_map_shape, pooling_type): 48 | """ 49 | Build the Pyramid Pooling Module. 50 | """ 51 | 52 | interp_block1 = InterpBlock(inputs, 1, feature_map_shape, pooling_type) 53 | interp_block2 = InterpBlock(inputs, 2, feature_map_shape, pooling_type) 54 | interp_block3 = InterpBlock(inputs, 3, feature_map_shape, pooling_type) 55 | interp_block6 = InterpBlock(inputs, 6, feature_map_shape, pooling_type) 56 | 57 | res = tf.add([inputs, interp_block6, interp_block3, interp_block2, interp_block1]) 58 | return res 59 | 60 | def CFFBlock(F1, F2, num_classes): 61 | F1_big = Upsampling_by_scale(F1, scale=2) 62 | F1_out = slim.conv2d(F1_big, num_classes, [1, 1], activation_fn=None) 63 | 64 | F1_big = slim.conv2d(F1_big, 2048, [3, 3], rate=2, activation_fn=None) 65 | F1_big = slim.batch_norm(F1_big, fused=True) 66 | 67 | F2_proj = slim.conv2d(F2, 512, [1, 1], rate=1, activation_fn=None) 68 | F2_proj = slim.batch_norm(F2_proj, fused=True) 69 | 70 | F2_out = tf.add([F1_big, F2_proj]) 71 | F2_out = tf.nn.relu(F2_out) 72 | 73 | return F1_out, F2_out 74 | 75 | 76 | def build_icnet(inputs, label_size, num_classes, preset_model='ICNet', pooling_type = "MAX", 77 | frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 78 | """ 79 | Builds the ICNet model. 80 | 81 | Arguments: 82 | inputs: The input tensor 83 | label_size: Size of the final label tensor. We need to know this for proper upscaling 84 | preset_model: Which model you want to use. 
Select which ResNet model to use for feature extraction 85 | num_classes: Number of classes 86 | pooling_type: Max or Average pooling 87 | 88 | Returns: 89 | ICNet model 90 | """ 91 | 92 | inputs_4 = tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*4, tf.shape(inputs)[2]*4]) 93 | inputs_2 = tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*2, tf.shape(inputs)[2]*2]) 94 | inputs_1 = inputs 95 | 96 | if frontend == 'Res50': 97 | with slim.arg_scope(resnet_v2.resnet_arg_scope(weight_decay=weight_decay)): 98 | logits_32, end_points_32 = resnet_v2.resnet_v2_50(inputs_4, is_training=is_training, scope='resnet_v2_50') 99 | logits_16, end_points_16 = resnet_v2.resnet_v2_50(inputs_2, is_training=is_training, scope='resnet_v2_50') 100 | logits_8, end_points_8 = resnet_v2.resnet_v2_50(inputs_1, is_training=is_training, scope='resnet_v2_50') 101 | resnet_scope='resnet_v2_50' 102 | # ICNet requires pre-trained ResNet weights 103 | init_fn = slim.assign_from_checkpoint_fn(os.path.join(pretrained_dir, 'resnet_v2_50.ckpt'), slim.get_model_variables('resnet_v2_50')) 104 | elif frontend == 'Res101': 105 | with slim.arg_scope(resnet_v2.resnet_arg_scope(weight_decay=weight_decay)): 106 | logits_32, end_points_32 = resnet_v2.resnet_v2_101(inputs_4, is_training=is_training, scope='resnet_v2_101') 107 | logits_16, end_points_16 = resnet_v2.resnet_v2_101(inputs_2, is_training=is_training, scope='resnet_v2_101') 108 | logits_8, end_points_8 = resnet_v2.resnet_v2_101(inputs_1, is_training=is_training, scope='resnet_v2_101') 109 | resnet_scope='resnet_v2_101' 110 | # ICNet requires pre-trained ResNet weights 111 | init_fn = slim.assign_from_checkpoint_fn(os.path.join(pretrained_dir, 'resnet_v2_101.ckpt'), slim.get_model_variables('resnet_v2_101')) 112 | elif frontend == 'Res152': 113 | with slim.arg_scope(resnet_v2.resnet_arg_scope(weight_decay=weight_decay)): 114 | logits_32, end_points_32 = resnet_v2.resnet_v2_152(inputs_4, is_training=is_training, scope='resnet_v2_152') 115 | logits_16, end_points_16 = resnet_v2.resnet_v2_152(inputs_2, is_training=is_training, scope='resnet_v2_152') 116 | logits_8, end_points_8 = resnet_v2.resnet_v2_152(inputs_1, is_training=is_training, scope='resnet_v2_152') 117 | resnet_scope='resnet_v2_152' 118 | # ICNet requires pre-trained ResNet weights 119 | init_fn = slim.assign_from_checkpoint_fn(os.path.join(pretrained_dir, 'resnet_v2_152.ckpt'), slim.get_model_variables('resnet_v2_152')) 120 | else: 121 | raise ValueError("Unsupported ResNet model '%s'. 
This function only supports ResNet 50, ResNet 101, and ResNet 152" % (frontend)) 122 | 123 | 124 | 125 | feature_map_shape = [int(x / 32.0) for x in label_size] 126 | block_32 = PyramidPoolingModule(end_points_32['pool3'], feature_map_shape=feature_map_shape, pooling_type=pooling_type) 127 | 128 | out_16, block_16 = CFFBlock(psp_32, end_points_16['pool3']) 129 | out_8, block_8 = CFFBlock(block_16, end_points_8['pool3']) 130 | out_4 = Upsampling_by_scale(out_8, scale=2) 131 | out_4 = slim.conv2d(out_4, num_classes, [1, 1], activation_fn=None) 132 | 133 | out_full = Upsampling_by_scale(out_4, scale=2) 134 | 135 | out_full = slim.conv2d(out_full, num_classes, [1, 1], activation_fn=None, scope='logits') 136 | 137 | net = tf.concat([out_16, out_8, out_4, out_final]) 138 | 139 | return net, init_fn 140 | 141 | 142 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 143 | inputs=tf.to_float(inputs) 144 | num_channels = inputs.get_shape().as_list()[-1] 145 | if len(means) != num_channels: 146 | raise ValueError('len(means) must match the number of channels') 147 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 148 | for i in range(num_channels): 149 | channels[i] -= means[i] 150 | return tf.concat(axis=3, values=channels) -------------------------------------------------------------------------------- /models/MobileUNet.py: -------------------------------------------------------------------------------- 1 | import os,time,cv2 2 | import tensorflow as tf 3 | import tensorflow.contrib.slim as slim 4 | import numpy as np 5 | 6 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 7 | """ 8 | Builds the conv block for MobileNets 9 | Apply successivly a 2D convolution, BatchNormalization relu 10 | """ 11 | # Skip pointwise by setting num_outputs=Non 12 | net = slim.conv2d(inputs, n_filters, kernel_size=[1, 1], activation_fn=None) 13 | net = slim.batch_norm(net, fused=True) 14 | net = tf.nn.relu(net) 15 | return net 16 | 17 | def DepthwiseSeparableConvBlock(inputs, n_filters, kernel_size=[3, 3]): 18 | """ 19 | Builds the Depthwise Separable conv block for MobileNets 20 | Apply successivly a 2D separable convolution, BatchNormalization relu, conv, BatchNormalization, relu 21 | """ 22 | # Skip pointwise by setting num_outputs=None 23 | net = slim.separable_convolution2d(inputs, num_outputs=None, depth_multiplier=1, kernel_size=[3, 3], activation_fn=None) 24 | 25 | net = slim.batch_norm(net, fused=True) 26 | net = tf.nn.relu(net) 27 | net = slim.conv2d(net, n_filters, kernel_size=[1, 1], activation_fn=None) 28 | net = slim.batch_norm(net, fused=True) 29 | net = tf.nn.relu(net) 30 | return net 31 | 32 | def conv_transpose_block(inputs, n_filters, kernel_size=[3, 3]): 33 | """ 34 | Basic conv transpose block for Encoder-Decoder upsampling 35 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 36 | """ 37 | net = slim.conv2d_transpose(inputs, n_filters, kernel_size=[3, 3], stride=[2, 2], activation_fn=None) 38 | net = tf.nn.relu(slim.batch_norm(net)) 39 | return net 40 | 41 | def build_mobile_unet(inputs, preset_model, num_classes): 42 | 43 | has_skip = False 44 | if preset_model == "MobileUNet": 45 | has_skip = False 46 | elif preset_model == "MobileUNet-Skip": 47 | has_skip = True 48 | else: 49 | raise ValueError("Unsupported MobileUNet model '%s'. 
This function only supports MobileUNet and MobileUNet-Skip" % (preset_model)) 50 | 51 | ##################### 52 | # Downsampling path # 53 | ##################### 54 | net = ConvBlock(inputs, 64) 55 | net = DepthwiseSeparableConvBlock(net, 64) 56 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 57 | skip_1 = net 58 | 59 | net = DepthwiseSeparableConvBlock(net, 128) 60 | net = DepthwiseSeparableConvBlock(net, 128) 61 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 62 | skip_2 = net 63 | 64 | net = DepthwiseSeparableConvBlock(net, 256) 65 | net = DepthwiseSeparableConvBlock(net, 256) 66 | net = DepthwiseSeparableConvBlock(net, 256) 67 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 68 | skip_3 = net 69 | 70 | net = DepthwiseSeparableConvBlock(net, 512) 71 | net = DepthwiseSeparableConvBlock(net, 512) 72 | net = DepthwiseSeparableConvBlock(net, 512) 73 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 74 | skip_4 = net 75 | 76 | net = DepthwiseSeparableConvBlock(net, 512) 77 | net = DepthwiseSeparableConvBlock(net, 512) 78 | net = DepthwiseSeparableConvBlock(net, 512) 79 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 80 | 81 | 82 | ##################### 83 | # Upsampling path # 84 | ##################### 85 | net = conv_transpose_block(net, 512) 86 | net = DepthwiseSeparableConvBlock(net, 512) 87 | net = DepthwiseSeparableConvBlock(net, 512) 88 | net = DepthwiseSeparableConvBlock(net, 512) 89 | if has_skip: 90 | net = tf.add(net, skip_4) 91 | 92 | net = conv_transpose_block(net, 512) 93 | net = DepthwiseSeparableConvBlock(net, 512) 94 | net = DepthwiseSeparableConvBlock(net, 512) 95 | net = DepthwiseSeparableConvBlock(net, 256) 96 | if has_skip: 97 | net = tf.add(net, skip_3) 98 | 99 | net = conv_transpose_block(net, 256) 100 | net = DepthwiseSeparableConvBlock(net, 256) 101 | net = DepthwiseSeparableConvBlock(net, 256) 102 | net = DepthwiseSeparableConvBlock(net, 128) 103 | if has_skip: 104 | net = tf.add(net, skip_2) 105 | 106 | net = conv_transpose_block(net, 128) 107 | net = DepthwiseSeparableConvBlock(net, 128) 108 | net = DepthwiseSeparableConvBlock(net, 64) 109 | if has_skip: 110 | net = tf.add(net, skip_1) 111 | 112 | net = conv_transpose_block(net, 64) 113 | net = DepthwiseSeparableConvBlock(net, 64) 114 | net = DepthwiseSeparableConvBlock(net, 64) 115 | 116 | ##################### 117 | # Softmax # 118 | ##################### 119 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 120 | return net -------------------------------------------------------------------------------- /models/PSPNet.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | import numpy as np 4 | from builders import frontend_builder 5 | import os, sys 6 | 7 | def Upsampling(inputs,feature_map_shape): 8 | return tf.image.resize_bilinear(inputs, size=feature_map_shape) 9 | 10 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 11 | """ 12 | Basic conv transpose block for Encoder-Decoder upsampling 13 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 14 | """ 15 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 16 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 17 | return net 18 | 19 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 20 | """ 21 | Basic conv block for 
Encoder-Decoder 22 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 23 | """ 24 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 25 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 26 | return net 27 | 28 | def InterpBlock(net, level, feature_map_shape, pooling_type): 29 | 30 | # Compute the kernel and stride sizes according to how large the final feature map will be 31 | # When the kernel size and strides are equal, then we can compute the final feature map size 32 | # by simply dividing the current size by the kernel or stride size 33 | # The final feature map sizes are 1x1, 2x2, 3x3, and 6x6. We round to the closest integer 34 | kernel_size = [int(np.round(float(feature_map_shape[0]) / float(level))), int(np.round(float(feature_map_shape[1]) / float(level)))] 35 | stride_size = kernel_size 36 | 37 | net = slim.pool(net, kernel_size, stride=stride_size, pooling_type='MAX') 38 | net = slim.conv2d(net, 512, [1, 1], activation_fn=None) 39 | net = slim.batch_norm(net, fused=True) 40 | net = tf.nn.relu(net) 41 | net = Upsampling(net, feature_map_shape) 42 | return net 43 | 44 | def PyramidPoolingModule(inputs, feature_map_shape, pooling_type): 45 | """ 46 | Build the Pyramid Pooling Module. 47 | """ 48 | 49 | interp_block1 = InterpBlock(inputs, 1, feature_map_shape, pooling_type) 50 | interp_block2 = InterpBlock(inputs, 2, feature_map_shape, pooling_type) 51 | interp_block3 = InterpBlock(inputs, 3, feature_map_shape, pooling_type) 52 | interp_block6 = InterpBlock(inputs, 6, feature_map_shape, pooling_type) 53 | 54 | res = tf.concat([inputs, interp_block6, interp_block3, interp_block2, interp_block1], axis=-1) 55 | return res 56 | 57 | 58 | 59 | def build_pspnet(inputs, label_size, num_classes, preset_model='PSPNet', frontend="ResNet101", pooling_type = "MAX", 60 | weight_decay=1e-5, upscaling_method="conv", is_training=True, pretrained_dir="models"): 61 | """ 62 | Builds the PSPNet model. 63 | 64 | Arguments: 65 | inputs: The input tensor 66 | label_size: Size of the final label tensor. We need to know this for proper upscaling 67 | preset_model: Which model you want to use. 
Select which ResNet model to use for feature extraction 68 | num_classes: Number of classes 69 | pooling_type: Max or Average pooling 70 | 71 | Returns: 72 | PSPNet model 73 | """ 74 | 75 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 76 | 77 | feature_map_shape = [int(x / 8.0) for x in label_size] 78 | print(feature_map_shape) 79 | psp = PyramidPoolingModule(end_points['pool3'], feature_map_shape=feature_map_shape, pooling_type=pooling_type) 80 | 81 | net = slim.conv2d(psp, 512, [3, 3], activation_fn=None) 82 | net = slim.batch_norm(net, fused=True) 83 | net = tf.nn.relu(net) 84 | 85 | if upscaling_method.lower() == "conv": 86 | net = ConvUpscaleBlock(net, 256, kernel_size=[3, 3], scale=2) 87 | net = ConvBlock(net, 256) 88 | net = ConvUpscaleBlock(net, 128, kernel_size=[3, 3], scale=2) 89 | net = ConvBlock(net, 128) 90 | net = ConvUpscaleBlock(net, 64, kernel_size=[3, 3], scale=2) 91 | net = ConvBlock(net, 64) 92 | elif upscaling_method.lower() == "bilinear": 93 | net = Upsampling(net, label_size) 94 | 95 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 96 | 97 | return net, init_fn 98 | 99 | 100 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 101 | inputs=tf.to_float(inputs) 102 | num_channels = inputs.get_shape().as_list()[-1] 103 | if len(means) != num_channels: 104 | raise ValueError('len(means) must match the number of channels') 105 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 106 | for i in range(num_channels): 107 | channels[i] -= means[i] 108 | return tf.concat(axis=3, values=channels) -------------------------------------------------------------------------------- /models/RefineNet.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | from builders import frontend_builder 4 | import os, sys 5 | 6 | def Upsampling(inputs,scale): 7 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 8 | 9 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 10 | """ 11 | Basic conv block for Encoder-Decoder 12 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 13 | """ 14 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 15 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 16 | return net 17 | 18 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 19 | """ 20 | Basic conv transpose block for Encoder-Decoder upsampling 21 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 22 | """ 23 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 24 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 25 | return net 26 | 27 | 28 | def ResidualConvUnit(inputs,n_filters=256,kernel_size=3): 29 | """ 30 | A local residual unit designed to fine-tune the pretrained ResNet weights 31 | 32 | Arguments: 33 | inputs: The input tensor 34 | n_filters: Number of output feature maps for each conv 35 | kernel_size: Size of convolution kernel 36 | 37 | Returns: 38 | Output of local residual block 39 | """ 40 | net=tf.nn.relu(inputs) 41 | net=slim.conv2d(net, n_filters, kernel_size, activation_fn=None) 42 | net=tf.nn.relu(net) 43 | net=slim.conv2d(net,n_filters,kernel_size, activation_fn=None) 44 | 
net=tf.add(net,inputs) 45 | return net 46 | 47 | def ChainedResidualPooling(inputs,n_filters=256): 48 | """ 49 | Chained residual pooling aims to capture background 50 | context from a large image region. This component is 51 | built as a chain of 2 pooling blocks, each consisting 52 | of one max-pooling layer and one convolution layer. One pooling 53 | block takes the output of the previous pooling block as 54 | input. The output feature maps of all pooling blocks are 55 | fused together with the input feature map through summation 56 | of residual connections. 57 | 58 | Arguments: 59 | inputs: The input tensor 60 | n_filters: Number of output feature maps for each conv 61 | 62 | Returns: 63 | Double-pooled feature maps 64 | """ 65 | 66 | net_relu=tf.nn.relu(inputs) 67 | net=slim.max_pool2d(net_relu, [5, 5],stride=1,padding='SAME') 68 | net=slim.conv2d(net,n_filters,3, activation_fn=None) 69 | net_sum_1=tf.add(net,net_relu) 70 | 71 | net = slim.max_pool2d(net, [5, 5], stride=1, padding='SAME') 72 | net = slim.conv2d(net, n_filters, 3, activation_fn=None) 73 | net_sum_2=tf.add(net,net_sum_1) 74 | 75 | return net_sum_2 76 | 77 | 78 | def MultiResolutionFusion(high_inputs=None,low_inputs=None,n_filters=256): 79 | """ 80 | Fuse together all path inputs. This block first applies convolutions 81 | for input adaptation, which generate feature maps of the same feature dimension 82 | (the smallest one among the inputs), and then up-samples all (smaller) feature maps to 83 | the largest resolution of the inputs. Finally, all features maps are fused by summation. 84 | 85 | Arguments: 86 | high_inputs: The input tensors that have the higher resolution 87 | low_inputs: The input tensors that have the lower resolution 88 | n_filters: Number of output feature maps for each conv 89 | 90 | Returns: 91 | Fused feature maps at higher resolution 92 | 93 | """ 94 | 95 | if high_inputs is None: # RefineNet block 4 96 | 97 | fuse = slim.conv2d(low_inputs, n_filters, 3, activation_fn=None) 98 | 99 | return fuse 100 | 101 | else: 102 | 103 | conv_low = slim.conv2d(low_inputs, n_filters, 3, activation_fn=None) 104 | conv_high = slim.conv2d(high_inputs, n_filters, 3, activation_fn=None) 105 | 106 | conv_low_up = Upsampling(conv_low,2) 107 | 108 | return tf.add(conv_low_up, conv_high) 109 | 110 | 111 | def RefineBlock(high_inputs=None,low_inputs=None): 112 | """ 113 | A RefineNet Block which combines together the ResidualConvUnits, 114 | fuses the feature maps using MultiResolutionFusion, and then gets 115 | large-scale context with the ResidualConvUnit. 
116 | 117 | Arguments: 118 | high_inputs: The input tensors that have the higher resolution 119 | low_inputs: The input tensors that have the lower resolution 120 | 121 | Returns: 122 | RefineNet block for a single path i.e one resolution 123 | 124 | """ 125 | 126 | if low_inputs is None: # block 4 127 | rcu_new_low= ResidualConvUnit(high_inputs, n_filters=512) 128 | rcu_new_low = ResidualConvUnit(rcu_new_low, n_filters=512) 129 | 130 | fuse = MultiResolutionFusion(high_inputs=None, low_inputs=rcu_new_low, n_filters=512) 131 | fuse_pooling = ChainedResidualPooling(fuse, n_filters=512) 132 | output = ResidualConvUnit(fuse_pooling, n_filters=512) 133 | return output 134 | else: 135 | rcu_high= ResidualConvUnit(high_inputs, n_filters=256) 136 | rcu_high = ResidualConvUnit(rcu_high, n_filters=256) 137 | 138 | fuse = MultiResolutionFusion(rcu_high, low_inputs,n_filters=256) 139 | fuse_pooling = ChainedResidualPooling(fuse, n_filters=256) 140 | output = ResidualConvUnit(fuse_pooling, n_filters=256) 141 | return output 142 | 143 | 144 | 145 | def build_refinenet(inputs, num_classes, preset_model='RefineNet', frontend="ResNet101", weight_decay=1e-5, upscaling_method="bilinear", pretrained_dir="models", is_training=True): 146 | """ 147 | Builds the RefineNet model. 148 | 149 | Arguments: 150 | inputs: The input tensor 151 | preset_model: Which model you want to use. Select which ResNet model to use for feature extraction 152 | num_classes: Number of classes 153 | 154 | Returns: 155 | RefineNet model 156 | """ 157 | 158 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 159 | 160 | 161 | 162 | 163 | high = [end_points['pool5'], end_points['pool4'], 164 | end_points['pool3'], end_points['pool2']] 165 | 166 | low = [None, None, None, None] 167 | 168 | # Get the feature maps to the proper size with bottleneck 169 | high[0]=slim.conv2d(high[0], 512, 1) 170 | high[1]=slim.conv2d(high[1], 256, 1) 171 | high[2]=slim.conv2d(high[2], 256, 1) 172 | high[3]=slim.conv2d(high[3], 256, 1) 173 | 174 | # RefineNet 175 | low[0]=RefineBlock(high_inputs=high[0],low_inputs=None) # Only input ResNet 1/32 176 | low[1]=RefineBlock(high[1],low[0]) # High input = ResNet 1/16, Low input = Previous 1/16 177 | low[2]=RefineBlock(high[2],low[1]) # High input = ResNet 1/8, Low input = Previous 1/8 178 | low[3]=RefineBlock(high[3],low[2]) # High input = ResNet 1/4, Low input = Previous 1/4 179 | 180 | # g[3]=Upsampling(g[3],scale=4) 181 | 182 | net = low[3] 183 | 184 | net = ResidualConvUnit(net) 185 | net = ResidualConvUnit(net) 186 | 187 | if upscaling_method.lower() == "conv": 188 | net = ConvUpscaleBlock(net, 128, kernel_size=[3, 3], scale=2) 189 | net = ConvBlock(net, 128) 190 | net = ConvUpscaleBlock(net, 64, kernel_size=[3, 3], scale=2) 191 | net = ConvBlock(net, 64) 192 | elif upscaling_method.lower() == "bilinear": 193 | net = Upsampling(net, scale=4) 194 | 195 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 196 | 197 | return net, init_fn 198 | 199 | 200 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 201 | inputs=tf.to_float(inputs) 202 | num_channels = inputs.get_shape().as_list()[-1] 203 | if len(means) != num_channels: 204 | raise ValueError('len(means) must match the number of channels') 205 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 206 | for i in range(num_channels): 207 | channels[i] -= means[i] 208 | return 
tf.concat(axis=3, values=channels) 209 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awangenh/Weed-Mapping/72526ebbc2abe3b9d35672689de25a321e36b039/models/__init__.py -------------------------------------------------------------------------------- /models/custom_model.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import os,time,cv2 3 | import tensorflow as tf 4 | import tensorflow.contrib.slim as slim 5 | import numpy as np 6 | from builders import frontend_builder 7 | 8 | def conv_block(inputs, n_filters, filter_size=[3, 3], dropout_p=0.0): 9 | """ 10 | Basic conv block for Encoder-Decoder 11 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 12 | Dropout (if dropout_p > 0) on the inputs 13 | """ 14 | conv = slim.conv2d(inputs, n_filters, filter_size, activation_fn=None, normalizer_fn=None) 15 | out = tf.nn.relu(slim.batch_norm(conv, fused=True)) 16 | if dropout_p != 0.0: 17 | out = slim.dropout(out, keep_prob=(1.0-dropout_p)) 18 | return out 19 | 20 | def conv_transpose_block(inputs, n_filters, strides=2, filter_size=[3, 3], dropout_p=0.0): 21 | """ 22 | Basic conv transpose block for Encoder-Decoder upsampling 23 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 24 | Dropout (if dropout_p > 0) on the inputs 25 | """ 26 | conv = slim.conv2d_transpose(inputs, n_filters, kernel_size=[3, 3], stride=[strides, strides]) 27 | out = tf.nn.relu(slim.batch_norm(conv, fused=True)) 28 | if dropout_p != 0.0: 29 | out = slim.dropout(out, keep_prob=(1.0-dropout_p)) 30 | return out 31 | 32 | def build_custom(inputs, num_classes, frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 33 | 34 | 35 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, is_training=is_training) 36 | 37 | up_1 = conv_transpose_block(end_points["pool2"], strides=4, n_filters=64) 38 | up_2 = conv_transpose_block(end_points["pool3"], strides=8, n_filters=64) 39 | up_3 = conv_transpose_block(end_points["pool4"], strides=16, n_filters=64) 40 | up_4 = conv_transpose_block(end_points["pool5"], strides=32, n_filters=64) 41 | 42 | features = tf.concat([up_1, up_2, up_3, up_4], axis=-1) 43 | 44 | features = conv_block(inputs=features, n_filters=256, filter_size=[1, 1]) 45 | 46 | features = conv_block(inputs=features, n_filters=64, filter_size=[3, 3]) 47 | features = conv_block(inputs=features, n_filters=64, filter_size=[3, 3]) 48 | features = conv_block(inputs=features, n_filters=64, filter_size=[3, 3]) 49 | 50 | 51 | net = slim.conv2d(features, num_classes, [1, 1], scope='logits') 52 | return net -------------------------------------------------------------------------------- /predict.py: -------------------------------------------------------------------------------- 1 | import os,time,cv2, sys, math 2 | import tensorflow as tf 3 | import argparse 4 | import numpy as np 5 | 6 | from utils import utils, helpers 7 | from builders import model_builder 8 | 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument('--image', type=str, default=None, required=True, help='The image you want to predict on. 
') 11 | parser.add_argument('--checkpoint_path', type=str, default=None, required=True, help='The path to the latest checkpoint weights for your model.') 12 | parser.add_argument('--crop_height', type=int, default=512, help='Height of cropped input image to network') 13 | parser.add_argument('--crop_width', type=int, default=512, help='Width of cropped input image to network') 14 | parser.add_argument('--model', type=str, default=None, required=True, help='The model you are using') 15 | parser.add_argument('--dataset', type=str, default="CamVid", required=False, help='The dataset you are using') 16 | args = parser.parse_args() 17 | 18 | class_names_list, label_values = helpers.get_label_info(os.path.join(args.dataset, "class_dict.csv")) 19 | 20 | num_classes = len(label_values) 21 | 22 | print("\n***** Begin prediction *****") 23 | print("Dataset -->", args.dataset) 24 | print("Model -->", args.model) 25 | print("Crop Height -->", args.crop_height) 26 | print("Crop Width -->", args.crop_width) 27 | print("Num Classes -->", num_classes) 28 | print("Image -->", args.image) 29 | 30 | # Initializing network 31 | config = tf.ConfigProto() 32 | config.gpu_options.allow_growth = True 33 | sess=tf.Session(config=config) 34 | 35 | net_input = tf.placeholder(tf.float32,shape=[None,None,None,3]) 36 | net_output = tf.placeholder(tf.float32,shape=[None,None,None,num_classes]) 37 | 38 | network, _ = model_builder.build_model(args.model, net_input=net_input, 39 | num_classes=num_classes, 40 | crop_width=args.crop_width, 41 | crop_height=args.crop_height, 42 | is_training=False) 43 | 44 | sess.run(tf.global_variables_initializer()) 45 | 46 | print('Loading model checkpoint weights') 47 | saver=tf.train.Saver(max_to_keep=1000) 48 | saver.restore(sess, args.checkpoint_path) 49 | 50 | 51 | print("Testing image " + args.image) 52 | 53 | loaded_image = utils.load_image(args.image) 54 | resized_image =cv2.resize(loaded_image, (args.crop_width, args.crop_height)) 55 | input_image = np.expand_dims(np.float32(resized_image[:args.crop_height, :args.crop_width]),axis=0)/255.0 56 | 57 | st = time.time() 58 | output_image = sess.run(network,feed_dict={net_input:input_image}) 59 | 60 | run_time = time.time()-st 61 | 62 | output_image = np.array(output_image[0,:,:,:]) 63 | output_image = helpers.reverse_one_hot(output_image) 64 | 65 | out_vis_image = helpers.colour_code_segmentation(output_image, label_values) 66 | file_name = utils.filepath_to_name(args.image) 67 | cv2.imwrite("%s_pred.png"%(file_name),cv2.cvtColor(np.uint8(out_vis_image), cv2.COLOR_RGB2BGR)) 68 | 69 | print("") 70 | print("Finished!") 71 | print("Wrote image " + "%s_pred.png"%(file_name)) 72 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import os,time,cv2, sys, math 2 | import tensorflow as tf 3 | import argparse 4 | import numpy as np 5 | 6 | from utils import utils, helpers 7 | from builders import model_builder 8 | 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument('--checkpoint_path', type=str, default=None, required=True, help='The path to the latest checkpoint weights for your model.') 11 | parser.add_argument('--crop_height', type=int, default=512, help='Height of cropped input image to network') 12 | parser.add_argument('--crop_width', type=int, default=512, help='Width of cropped input image to network') 13 | parser.add_argument('--model', type=str, default=None, required=True, help='The model you 
are using') 14 | parser.add_argument('--dataset', type=str, default="CamVid", required=False, help='The dataset you are using') 15 | args = parser.parse_args() 16 | 17 | # Get the names of the classes so we can record the evaluation results 18 | print("Retrieving dataset information ...") 19 | class_names_list, label_values = helpers.get_label_info(os.path.join(args.dataset, "class_dict.csv")) 20 | class_names_string = "" 21 | for class_name in class_names_list: 22 | if not class_name == class_names_list[-1]: 23 | class_names_string = class_names_string + class_name + ", " 24 | else: 25 | class_names_string = class_names_string + class_name 26 | 27 | num_classes = len(label_values) 28 | 29 | # Initializing network 30 | config = tf.ConfigProto() 31 | config.gpu_options.allow_growth = True 32 | sess=tf.Session(config=config) 33 | 34 | net_input = tf.placeholder(tf.float32,shape=[None,None,None,3]) 35 | net_output = tf.placeholder(tf.float32,shape=[None,None,None,num_classes]) 36 | 37 | network, _ = model_builder.build_model(args.model, net_input=net_input, num_classes=num_classes, crop_width=args.crop_width, crop_height=args.crop_height, is_training=False) 38 | 39 | sess.run(tf.global_variables_initializer()) 40 | 41 | print('Loading model checkpoint weights ...') 42 | saver=tf.train.Saver(max_to_keep=1000) 43 | saver.restore(sess, args.checkpoint_path) 44 | 45 | # Load the data 46 | print("Loading the data ...") 47 | train_input_names,train_output_names, val_input_names, val_output_names, test_input_names, test_output_names = utils.prepare_data(dataset_dir=args.dataset) 48 | 49 | # Create directories if needed 50 | if not os.path.isdir("%s"%("Test")): 51 | os.makedirs("%s"%("Test")) 52 | 53 | target=open("%s/test_scores.csv"%("Test"),'w') 54 | target.write("test_name, test_accuracy, precision, recall, f1 score, mean iou, %s\n" % (class_names_string)) 55 | scores_list = [] 56 | class_scores_list = [] 57 | precision_list = [] 58 | recall_list = [] 59 | f1_list = [] 60 | iou_list = [] 61 | run_times_list = [] 62 | 63 | # Run testing on ALL test images 64 | for ind in range(len(test_input_names)): 65 | sys.stdout.write("\rRunning test image %d / %d"%(ind+1, len(test_input_names))) 66 | sys.stdout.flush() 67 | 68 | input_image = np.expand_dims(np.float32(utils.load_image(test_input_names[ind])[:args.crop_height, :args.crop_width]),axis=0)/255.0 69 | gt = utils.load_image(test_output_names[ind])[:args.crop_height, :args.crop_width] 70 | gt = helpers.reverse_one_hot(helpers.one_hot_it(gt, label_values)) 71 | 72 | st = time.time() 73 | output_image = sess.run(network,feed_dict={net_input:input_image}) 74 | 75 | run_times_list.append(time.time()-st) 76 | 77 | output_image = np.array(output_image[0,:,:,:]) 78 | output_image = helpers.reverse_one_hot(output_image) 79 | out_vis_image = helpers.colour_code_segmentation(output_image, label_values) 80 | 81 | accuracy, class_accuracies, prec, rec, f1, iou = utils.evaluate_segmentation(pred=output_image, label=gt, num_classes=num_classes) 82 | 83 | file_name = utils.filepath_to_name(test_input_names[ind]) 84 | target.write("%s, %f, %f, %f, %f, %f"%(file_name, accuracy, prec, rec, f1, iou)) 85 | for item in class_accuracies: 86 | target.write(", %f"%(item)) 87 | target.write("\n") 88 | 89 | scores_list.append(accuracy) 90 | class_scores_list.append(class_accuracies) 91 | precision_list.append(prec) 92 | recall_list.append(rec) 93 | f1_list.append(f1) 94 | iou_list.append(iou) 95 | 96 | gt = helpers.colour_code_segmentation(gt, label_values) 97 | 98 | 
cv2.imwrite("%s/%s_pred.png"%("Test", file_name),cv2.cvtColor(np.uint8(out_vis_image), cv2.COLOR_RGB2BGR)) 99 | cv2.imwrite("%s/%s_gt.png"%("Test", file_name),cv2.cvtColor(np.uint8(gt), cv2.COLOR_RGB2BGR)) 100 | 101 | 102 | target.close() 103 | 104 | avg_score = np.mean(scores_list) 105 | class_avg_scores = np.mean(class_scores_list, axis=0) 106 | avg_precision = np.mean(precision_list) 107 | avg_recall = np.mean(recall_list) 108 | avg_f1 = np.mean(f1_list) 109 | avg_iou = np.mean(iou_list) 110 | avg_time = np.mean(run_times_list) 111 | print("Average test accuracy = ", avg_score) 112 | print("Average per class test accuracies = \n") 113 | for index, item in enumerate(class_avg_scores): 114 | print("%s = %f" % (class_names_list[index], item)) 115 | print("Average precision = ", avg_precision) 116 | print("Average recall = ", avg_recall) 117 | print("Average F1 score = ", avg_f1) 118 | print("Average mean IoU score = ", avg_iou) 119 | print("Average run time = ", avg_time) 120 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import os,time,cv2, sys, math 3 | import tensorflow as tf 4 | import tensorflow.contrib.slim as slim 5 | import numpy as np 6 | import time, datetime 7 | import argparse 8 | import random 9 | import os, sys 10 | import subprocess 11 | 12 | # use 'Agg' on matplotlib so that plots could be generated even without Xserver 13 | # running 14 | import matplotlib 15 | matplotlib.use('Agg') 16 | 17 | from utils import utils, helpers 18 | from builders import model_builder 19 | 20 | import matplotlib.pyplot as plt 21 | 22 | def str2bool(v): 23 | if v.lower() in ('yes', 'true', 't', 'y', '1'): 24 | return True 25 | elif v.lower() in ('no', 'false', 'f', 'n', '0'): 26 | return False 27 | else: 28 | raise argparse.ArgumentTypeError('Boolean value expected.') 29 | 30 | parser = argparse.ArgumentParser() 31 | parser.add_argument('--num_epochs', type=int, default=300, help='Number of epochs to train for') 32 | parser.add_argument('--epoch_start_i', type=int, default=0, help='Start counting epochs from this number') 33 | parser.add_argument('--checkpoint_step', type=int, default=5, help='How often to save checkpoints (epochs)') 34 | parser.add_argument('--validation_step', type=int, default=1, help='How often to perform validation (epochs)') 35 | parser.add_argument('--image', type=str, default=None, help='The image you want to predict on. 
Only valid in "predict" mode.') 36 | parser.add_argument('--continue_training', type=str2bool, default=False, help='Whether to continue training from a checkpoint') 37 | parser.add_argument('--dataset', type=str, default="CamVid", help='Dataset you are using.') 38 | parser.add_argument('--crop_height', type=int, default=512, help='Height of cropped input image to network') 39 | parser.add_argument('--crop_width', type=int, default=512, help='Width of cropped input image to network') 40 | parser.add_argument('--batch_size', type=int, default=1, help='Number of images in each batch') 41 | parser.add_argument('--num_val_images', type=int, default=20, help='The number of images to used for validations') 42 | parser.add_argument('--h_flip', type=str2bool, default=False, help='Whether to randomly flip the image horizontally for data augmentation') 43 | parser.add_argument('--v_flip', type=str2bool, default=False, help='Whether to randomly flip the image vertically for data augmentation') 44 | parser.add_argument('--brightness', type=float, default=None, help='Whether to randomly change the image brightness for data augmentation. Specifies the max bightness change as a factor between 0.0 and 1.0. For example, 0.1 represents a max brightness change of 10%% (+-).') 45 | parser.add_argument('--rotation', type=float, default=None, help='Whether to randomly rotate the image for data augmentation. Specifies the max rotation angle in degrees.') 46 | parser.add_argument('--model', type=str, default="FC-DenseNet56", help='The model you are using. See model_builder.py for supported models') 47 | parser.add_argument('--frontend', type=str, default="ResNet101", help='The frontend you are using. See frontend_builder.py for supported models') 48 | args = parser.parse_args() 49 | 50 | 51 | def data_augmentation(input_image, output_image): 52 | # Data augmentation 53 | input_image, output_image = utils.random_crop(input_image, output_image, args.crop_height, args.crop_width) 54 | 55 | if args.h_flip and random.randint(0,1): 56 | input_image = cv2.flip(input_image, 1) 57 | output_image = cv2.flip(output_image, 1) 58 | if args.v_flip and random.randint(0,1): 59 | input_image = cv2.flip(input_image, 0) 60 | output_image = cv2.flip(output_image, 0) 61 | if args.brightness: 62 | factor = 1.0 + random.uniform(-1.0*args.brightness, args.brightness) 63 | table = np.array([((i / 255.0) * factor) * 255 for i in np.arange(0, 256)]).astype(np.uint8) 64 | input_image = cv2.LUT(input_image, table) 65 | if args.rotation: 66 | angle = random.uniform(-1*args.rotation, args.rotation) 67 | if args.rotation: 68 | M = cv2.getRotationMatrix2D((input_image.shape[1]//2, input_image.shape[0]//2), angle, 1.0) 69 | input_image = cv2.warpAffine(input_image, M, (input_image.shape[1], input_image.shape[0]), flags=cv2.INTER_NEAREST) 70 | output_image = cv2.warpAffine(output_image, M, (output_image.shape[1], output_image.shape[0]), flags=cv2.INTER_NEAREST) 71 | 72 | return input_image, output_image 73 | 74 | 75 | # Get the names of the classes so we can record the evaluation results 76 | class_names_list, label_values = helpers.get_label_info(os.path.join(args.dataset, "class_dict.csv")) 77 | class_names_string = "" 78 | for class_name in class_names_list: 79 | if not class_name == class_names_list[-1]: 80 | class_names_string = class_names_string + class_name + ", " 81 | else: 82 | class_names_string = class_names_string + class_name 83 | 84 | num_classes = len(label_values) 85 | 86 | config = tf.ConfigProto() 87 | 
config.gpu_options.allow_growth = True 88 | sess=tf.Session(config=config) 89 | 90 | 91 | # Compute your softmax cross entropy loss 92 | net_input = tf.placeholder(tf.float32,shape=[None,None,None,3]) 93 | net_output = tf.placeholder(tf.float32,shape=[None,None,None,num_classes]) 94 | 95 | network, init_fn = model_builder.build_model(model_name=args.model, frontend=args.frontend, net_input=net_input, num_classes=num_classes, crop_width=args.crop_width, crop_height=args.crop_height, is_training=True) 96 | 97 | loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=network, labels=net_output)) 98 | 99 | opt = tf.train.RMSPropOptimizer(learning_rate=0.0001, decay=0.995).minimize(loss, var_list=[var for var in tf.trainable_variables()]) 100 | 101 | saver=tf.train.Saver(max_to_keep=1000) 102 | sess.run(tf.global_variables_initializer()) 103 | 104 | utils.count_params() 105 | 106 | # If a pre-trained ResNet is required, load the weights. 107 | # This must be done AFTER the variables are initialized with sess.run(tf.global_variables_initializer()) 108 | if init_fn is not None: 109 | init_fn(sess) 110 | 111 | # Load a previous checkpoint if desired 112 | model_checkpoint_name = "checkpoints/latest_model_" + args.model + "_" + args.dataset + ".ckpt" 113 | if args.continue_training: 114 | print('Loaded latest model checkpoint') 115 | saver.restore(sess, model_checkpoint_name) 116 | 117 | # Load the data 118 | print("Loading the data ...") 119 | train_input_names,train_output_names, val_input_names, val_output_names, test_input_names, test_output_names = utils.prepare_data(dataset_dir=args.dataset) 120 | 121 | 122 | 123 | print("\n***** Begin training *****") 124 | print("Dataset -->", args.dataset) 125 | print("Model -->", args.model) 126 | print("Crop Height -->", args.crop_height) 127 | print("Crop Width -->", args.crop_width) 128 | print("Num Epochs -->", args.num_epochs) 129 | print("Batch Size -->", args.batch_size) 130 | print("Num Classes -->", num_classes) 131 | 132 | print("Data Augmentation:") 133 | print("\tVertical Flip -->", args.v_flip) 134 | print("\tHorizontal Flip -->", args.h_flip) 135 | print("\tBrightness Alteration -->", args.brightness) 136 | print("\tRotation -->", args.rotation) 137 | print("") 138 | 139 | avg_loss_per_epoch = [] 140 | avg_scores_per_epoch = [] 141 | avg_iou_per_epoch = [] 142 | 143 | # Which validation images do we want 144 | val_indices = [] 145 | num_vals = min(args.num_val_images, len(val_input_names)) 146 | 147 | # Set random seed to make sure models are validated on the same validation images. 148 | # So you can compare the results of different models more intuitively. 
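# NOTE (illustrative sketch, not part of the original train.py): seeding
# Python's RNG immediately before sampling makes the drawn validation subset
# identical on every run, so per-epoch validation scores stay comparable
# between experiments. For example:
#
#     import random
#     random.seed(16)
#     first_draw = random.sample(range(100), 5)
#     random.seed(16)
#     second_draw = random.sample(range(100), 5)
#     assert first_draw == second_draw   # same validation indices every run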
149 | random.seed(16) 150 | val_indices=random.sample(range(0,len(val_input_names)),num_vals) 151 | 152 | # Do the training here 153 | for epoch in range(args.epoch_start_i, args.num_epochs): 154 | 155 | current_losses = [] 156 | 157 | cnt=0 158 | 159 | # Equivalent to shuffling 160 | id_list = np.random.permutation(len(train_input_names)) 161 | 162 | num_iters = int(np.floor(len(id_list) / args.batch_size)) 163 | st = time.time() 164 | epoch_st=time.time() 165 | for i in range(num_iters): 166 | # st=time.time() 167 | 168 | input_image_batch = [] 169 | output_image_batch = [] 170 | 171 | # Collect a batch of images 172 | for j in range(args.batch_size): 173 | index = i*args.batch_size + j 174 | id = id_list[index] 175 | input_image = utils.load_image(train_input_names[id]) 176 | output_image = utils.load_image(train_output_names[id]) 177 | 178 | with tf.device('/cpu:0'): 179 | input_image, output_image = data_augmentation(input_image, output_image) 180 | 181 | 182 | # Prep the data. Make sure the labels are in one-hot format 183 | input_image = np.float32(input_image) / 255.0 184 | output_image = np.float32(helpers.one_hot_it(label=output_image, label_values=label_values)) 185 | 186 | input_image_batch.append(np.expand_dims(input_image, axis=0)) 187 | output_image_batch.append(np.expand_dims(output_image, axis=0)) 188 | 189 | if args.batch_size == 1: 190 | input_image_batch = input_image_batch[0] 191 | output_image_batch = output_image_batch[0] 192 | else: 193 | input_image_batch = np.squeeze(np.stack(input_image_batch, axis=1)) 194 | output_image_batch = np.squeeze(np.stack(output_image_batch, axis=1)) 195 | 196 | # Do the training 197 | _,current=sess.run([opt,loss],feed_dict={net_input:input_image_batch,net_output:output_image_batch}) 198 | current_losses.append(current) 199 | cnt = cnt + args.batch_size 200 | if cnt % 20 == 0: 201 | string_print = "Epoch = %d Count = %d Current_Loss = %.4f Time = %.2f"%(epoch,cnt,current,time.time()-st) 202 | utils.LOG(string_print) 203 | st = time.time() 204 | 205 | mean_loss = np.mean(current_losses) 206 | avg_loss_per_epoch.append(mean_loss) 207 | 208 | # Create directories if needed 209 | if not os.path.isdir("%s/%04d"%("checkpoints",epoch)): 210 | os.makedirs("%s/%04d"%("checkpoints",epoch)) 211 | 212 | # Save latest checkpoint to same file name 213 | print("Saving latest checkpoint") 214 | saver.save(sess,model_checkpoint_name) 215 | 216 | if val_indices != 0 and epoch % args.checkpoint_step == 0: 217 | print("Saving checkpoint for this epoch") 218 | saver.save(sess,"%s/%04d/model.ckpt"%("checkpoints",epoch)) 219 | 220 | 221 | if epoch % args.validation_step == 0: 222 | print("Performing validation") 223 | target=open("%s/%04d/val_scores.csv"%("checkpoints",epoch),'w') 224 | target.write("val_name, avg_accuracy, precision, recall, f1 score, mean iou, %s\n" % (class_names_string)) 225 | 226 | 227 | scores_list = [] 228 | class_scores_list = [] 229 | precision_list = [] 230 | recall_list = [] 231 | f1_list = [] 232 | iou_list = [] 233 | 234 | 235 | # Do the validation on a small set of validation images 236 | for ind in val_indices: 237 | 238 | input_image = np.expand_dims(np.float32(utils.load_image(val_input_names[ind])[:args.crop_height, :args.crop_width]),axis=0)/255.0 239 | gt = utils.load_image(val_output_names[ind])[:args.crop_height, :args.crop_width] 240 | gt = helpers.reverse_one_hot(helpers.one_hot_it(gt, label_values)) 241 | 242 | # st = time.time() 243 | 244 | output_image = sess.run(network,feed_dict={net_input:input_image}) 245 | 246 
| 247 | output_image = np.array(output_image[0,:,:,:]) 248 | output_image = helpers.reverse_one_hot(output_image) 249 | out_vis_image = helpers.colour_code_segmentation(output_image, label_values) 250 | 251 | accuracy, class_accuracies, prec, rec, f1, iou = utils.evaluate_segmentation(pred=output_image, label=gt, num_classes=num_classes) 252 | 253 | file_name = utils.filepath_to_name(val_input_names[ind]) 254 | target.write("%s, %f, %f, %f, %f, %f"%(file_name, accuracy, prec, rec, f1, iou)) 255 | for item in class_accuracies: 256 | target.write(", %f"%(item)) 257 | target.write("\n") 258 | 259 | scores_list.append(accuracy) 260 | class_scores_list.append(class_accuracies) 261 | precision_list.append(prec) 262 | recall_list.append(rec) 263 | f1_list.append(f1) 264 | iou_list.append(iou) 265 | 266 | gt = helpers.colour_code_segmentation(gt, label_values) 267 | 268 | file_name = os.path.basename(val_input_names[ind]) 269 | file_name = os.path.splitext(file_name)[0] 270 | cv2.imwrite("%s/%04d/%s_pred.png"%("checkpoints",epoch, file_name),cv2.cvtColor(np.uint8(out_vis_image), cv2.COLOR_RGB2BGR)) 271 | cv2.imwrite("%s/%04d/%s_gt.png"%("checkpoints",epoch, file_name),cv2.cvtColor(np.uint8(gt), cv2.COLOR_RGB2BGR)) 272 | 273 | 274 | target.close() 275 | 276 | avg_score = np.mean(scores_list) 277 | class_avg_scores = np.mean(class_scores_list, axis=0) 278 | avg_scores_per_epoch.append(avg_score) 279 | avg_precision = np.mean(precision_list) 280 | avg_recall = np.mean(recall_list) 281 | avg_f1 = np.mean(f1_list) 282 | avg_iou = np.mean(iou_list) 283 | avg_iou_per_epoch.append(avg_iou) 284 | 285 | print("\nAverage validation accuracy for epoch # %04d = %f"% (epoch, avg_score)) 286 | print("Average per class validation accuracies for epoch # %04d:"% (epoch)) 287 | for index, item in enumerate(class_avg_scores): 288 | print("%s = %f" % (class_names_list[index], item)) 289 | print("Validation precision = ", avg_precision) 290 | print("Validation recall = ", avg_recall) 291 | print("Validation F1 score = ", avg_f1) 292 | print("Validation IoU score = ", avg_iou) 293 | 294 | epoch_time=time.time()-epoch_st 295 | remain_time=epoch_time*(args.num_epochs-1-epoch) 296 | m, s = divmod(remain_time, 60) 297 | h, m = divmod(m, 60) 298 | if s!=0: 299 | train_time="Remaining training time = %d hours %d minutes %d seconds\n"%(h,m,s) 300 | else: 301 | train_time="Remaining training time : Training completed.\n" 302 | utils.LOG(train_time) 303 | scores_list = [] 304 | 305 | 306 | fig1, ax1 = plt.subplots(figsize=(11, 8)) 307 | 308 | ax1.plot(range(epoch+1), avg_scores_per_epoch) 309 | ax1.set_title("Average validation accuracy vs epochs") 310 | ax1.set_xlabel("Epoch") 311 | ax1.set_ylabel("Avg. val. 
accuracy") 312 | 313 | 314 | plt.savefig('accuracy_vs_epochs.png') 315 | 316 | plt.clf() 317 | 318 | fig2, ax2 = plt.subplots(figsize=(11, 8)) 319 | 320 | ax2.plot(range(epoch+1), avg_loss_per_epoch) 321 | ax2.set_title("Average loss vs epochs") 322 | ax2.set_xlabel("Epoch") 323 | ax2.set_ylabel("Current loss") 324 | 325 | plt.savefig('loss_vs_epochs.png') 326 | 327 | plt.clf() 328 | 329 | fig3, ax3 = plt.subplots(figsize=(11, 8)) 330 | 331 | ax3.plot(range(epoch+1), avg_iou_per_epoch) 332 | ax3.set_title("Average IoU vs epochs") 333 | ax3.set_xlabel("Epoch") 334 | ax3.set_ylabel("Current IoU") 335 | 336 | plt.savefig('iou_vs_epochs.png') 337 | 338 | 339 | 340 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awangenh/Weed-Mapping/72526ebbc2abe3b9d35672689de25a321e36b039/utils/__init__.py -------------------------------------------------------------------------------- /utils/get_pretrained_checkpoints.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import argparse 3 | 4 | parser = argparse.ArgumentParser() 5 | parser.add_argument('--model', type=str, default="ALL", help='Which model weights to download') 6 | args = parser.parse_args() 7 | 8 | 9 | if args.model == "ResNet50" or args.model == "ALL": 10 | subprocess.check_output(['wget','http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz', "-P", "models"]) 11 | try: 12 | subprocess.check_output(['tar', '-xvf', 'models/resnet_v2_50_2017_04_14.tar.gz', "-C", "models"]) 13 | subprocess.check_output(['rm', 'models/resnet_v2_50_2017_04_14.tar.gz']) 14 | except Exception as e: 15 | print(e) 16 | pass 17 | 18 | if args.model == "ResNet101" or args.model == "ALL": 19 | subprocess.check_output(['wget','http://download.tensorflow.org/models/resnet_v2_101_2017_04_14.tar.gz', "-P", "models"]) 20 | try: 21 | subprocess.check_output(['tar', '-xvf', 'models/resnet_v2_101_2017_04_14.tar.gz', "-C", "models"]) 22 | subprocess.check_output(['rm', 'models/resnet_v2_101_2017_04_14.tar.gz']) 23 | except Exception as e: 24 | print(e) 25 | pass 26 | 27 | if args.model == "ResNet152" or args.model == "ALL": 28 | subprocess.check_output(['wget','http://download.tensorflow.org/models/resnet_v2_152_2017_04_14.tar.gz', "-P", "models"]) 29 | try: 30 | subprocess.check_output(['tar', '-xvf', 'models/resnet_v2_152_2017_04_14.tar.gz', "-C", "models"]) 31 | subprocess.check_output(['rm', 'models/resnet_v2_152_2017_04_14.tar.gz']) 32 | except Exception as e: 33 | print(e) 34 | pass 35 | 36 | if args.model == "MobileNetV2" or args.model == "ALL": 37 | subprocess.check_output(['wget','https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.4_224.tgz', "-P", "models"]) 38 | try: 39 | subprocess.check_output(['tar', '-xvf', 'models/mobilenet_v2_1.4_224.tgz', "-C", "models"]) 40 | subprocess.check_output(['rm', 'models/mobilenet_v2_1.4_224.tgz']) 41 | except Exception as e: 42 | print(e) 43 | pass 44 | 45 | if args.model == "InceptionV4" or args.model == "ALL": 46 | subprocess.check_output( 47 | ['wget', 'http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz', "-P", "models"]) 48 | try: 49 | subprocess.check_output(['tar', '-xvf', 'models/inception_v4_2016_09_09.tar.gz', "-C", "models"]) 50 | subprocess.check_output(['rm', 'models/inception_v4_2016_09_09.tar.gz']) 51 | except Exception as e: 52 | print(e) 53 
| pass 54 | -------------------------------------------------------------------------------- /utils/helpers.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import itertools 4 | import operator 5 | import os, csv 6 | import tensorflow as tf 7 | 8 | import time, datetime 9 | 10 | def get_label_info(csv_path): 11 | """ 12 | Retrieve the class names and label values for the selected dataset. 13 | Must be in CSV format! 14 | 15 | # Arguments 16 | csv_path: The file path of the class dictionairy 17 | 18 | # Returns 19 | Two lists: one for the class names and the other for the label values 20 | """ 21 | filename, file_extension = os.path.splitext(csv_path) 22 | if not file_extension == ".csv": 23 | return ValueError("File is not a CSV!") 24 | 25 | class_names = [] 26 | label_values = [] 27 | with open(csv_path, 'r') as csvfile: 28 | file_reader = csv.reader(csvfile, delimiter=',') 29 | header = next(file_reader) 30 | for row in file_reader: 31 | class_names.append(row[0]) 32 | label_values.append([int(row[1]), int(row[2]), int(row[3])]) 33 | # print(class_dict) 34 | return class_names, label_values 35 | 36 | 37 | def one_hot_it(label, label_values): 38 | """ 39 | Convert a segmentation image label array to one-hot format 40 | by replacing each pixel value with a vector of length num_classes 41 | 42 | # Arguments 43 | label: The 2D array segmentation image label 44 | label_values 45 | 46 | # Returns 47 | A 2D array with the same width and hieght as the input, but 48 | with a depth size of num_classes 49 | """ 50 | # st = time.time() 51 | # w = label.shape[0] 52 | # h = label.shape[1] 53 | # num_classes = len(class_dict) 54 | # x = np.zeros([w,h,num_classes]) 55 | # unique_labels = sortedlist((class_dict.values())) 56 | # for i in range(0, w): 57 | # for j in range(0, h): 58 | # index = unique_labels.index(list(label[i][j][:])) 59 | # x[i,j,index]=1 60 | # print("Time 1 = ", time.time() - st) 61 | 62 | # st = time.time() 63 | # https://stackoverflow.com/questions/46903885/map-rgb-semantic-maps-to-one-hot-encodings-and-vice-versa-in-tensorflow 64 | # https://stackoverflow.com/questions/14859458/how-to-check-if-all-values-in-the-columns-of-a-numpy-matrix-are-the-same 65 | semantic_map = [] 66 | for colour in label_values: 67 | # colour_map = np.full((label.shape[0], label.shape[1], label.shape[2]), colour, dtype=int) 68 | equality = np.equal(label, colour) 69 | class_map = np.all(equality, axis = -1) 70 | semantic_map.append(class_map) 71 | semantic_map = np.stack(semantic_map, axis=-1) 72 | # print("Time 2 = ", time.time() - st) 73 | 74 | return semantic_map 75 | 76 | def reverse_one_hot(image): 77 | """ 78 | Transform a 2D array in one-hot format (depth is num_classes), 79 | to a 2D array with only 1 channel, where each pixel value is 80 | the classified class key. 81 | 82 | # Arguments 83 | image: The one-hot format image 84 | 85 | # Returns 86 | A 2D array with the same width and hieght as the input, but 87 | with a depth size of 1, where each pixel value is the classified 88 | class key. 
89 | """ 90 | # w = image.shape[0] 91 | # h = image.shape[1] 92 | # x = np.zeros([w,h,1]) 93 | 94 | # for i in range(0, w): 95 | # for j in range(0, h): 96 | # index, value = max(enumerate(image[i, j, :]), key=operator.itemgetter(1)) 97 | # x[i, j] = index 98 | 99 | x = np.argmax(image, axis = -1) 100 | return x 101 | 102 | 103 | def colour_code_segmentation(image, label_values): 104 | """ 105 | Given a 1-channel array of class keys, colour code the segmentation results. 106 | 107 | # Arguments 108 | image: single channel array where each value represents the class key. 109 | label_values 110 | 111 | # Returns 112 | Colour coded image for segmentation visualization 113 | """ 114 | 115 | # w = image.shape[0] 116 | # h = image.shape[1] 117 | # x = np.zeros([w,h,3]) 118 | # colour_codes = label_values 119 | # for i in range(0, w): 120 | # for j in range(0, h): 121 | # x[i, j, :] = colour_codes[int(image[i, j])] 122 | 123 | colour_codes = np.array(label_values) 124 | x = colour_codes[image.astype(int)] 125 | 126 | return x 127 | 128 | # class_dict = get_class_dict("CamVid/class_dict.csv") 129 | # gt = cv2.imread("CamVid/test_labels/0001TP_007170_L.png",-1) 130 | # gt = reverse_one_hot(one_hot_it(gt, class_dict)) 131 | # gt = colour_code_segmentation(gt, class_dict) 132 | 133 | # file_name = "gt_test.png" 134 | # cv2.imwrite(file_name,np.uint8(gt)) -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function, division 2 | import os,time,cv2, sys, math 3 | import tensorflow as tf 4 | import tensorflow.contrib.slim as slim 5 | import numpy as np 6 | import time, datetime 7 | import os, random 8 | from scipy.misc import imread 9 | import ast 10 | from sklearn.metrics import precision_score, \ 11 | recall_score, confusion_matrix, classification_report, \ 12 | accuracy_score, f1_score 13 | 14 | from utils import helpers 15 | 16 | def prepare_data(dataset_dir): 17 | train_input_names=[] 18 | train_output_names=[] 19 | val_input_names=[] 20 | val_output_names=[] 21 | test_input_names=[] 22 | test_output_names=[] 23 | for file in os.listdir(dataset_dir + "/train"): 24 | cwd = os.getcwd() 25 | train_input_names.append(cwd + "/" + dataset_dir + "/train/" + file) 26 | for file in os.listdir(dataset_dir + "/train_labels"): 27 | cwd = os.getcwd() 28 | train_output_names.append(cwd + "/" + dataset_dir + "/train_labels/" + file) 29 | for file in os.listdir(dataset_dir + "/val"): 30 | cwd = os.getcwd() 31 | val_input_names.append(cwd + "/" + dataset_dir + "/val/" + file) 32 | for file in os.listdir(dataset_dir + "/val_labels"): 33 | cwd = os.getcwd() 34 | val_output_names.append(cwd + "/" + dataset_dir + "/val_labels/" + file) 35 | for file in os.listdir(dataset_dir + "/test"): 36 | cwd = os.getcwd() 37 | test_input_names.append(cwd + "/" + dataset_dir + "/test/" + file) 38 | for file in os.listdir(dataset_dir + "/test_labels"): 39 | cwd = os.getcwd() 40 | test_output_names.append(cwd + "/" + dataset_dir + "/test_labels/" + file) 41 | train_input_names.sort(),train_output_names.sort(), val_input_names.sort(), val_output_names.sort(), test_input_names.sort(), test_output_names.sort() 42 | return train_input_names,train_output_names, val_input_names, val_output_names, test_input_names, test_output_names 43 | 44 | def load_image(path): 45 | image = cv2.cvtColor(cv2.imread(path,-1), cv2.COLOR_BGR2RGB) 46 | return image 47 | 48 | # Takes an absolute file 
path and returns the name of the file without th extension 49 | def filepath_to_name(full_name): 50 | file_name = os.path.basename(full_name) 51 | file_name = os.path.splitext(file_name)[0] 52 | return file_name 53 | 54 | # Print with time. To console or file 55 | def LOG(X, f=None): 56 | time_stamp = datetime.datetime.now().strftime("[%Y-%m-%d %H:%M:%S]") 57 | if not f: 58 | print(time_stamp + " " + X) 59 | else: 60 | f.write(time_stamp + " " + X) 61 | 62 | 63 | # Count total number of parameters in the model 64 | def count_params(): 65 | total_parameters = 0 66 | for variable in tf.trainable_variables(): 67 | shape = variable.get_shape() 68 | variable_parameters = 1 69 | for dim in shape: 70 | variable_parameters *= dim.value 71 | total_parameters += variable_parameters 72 | print("This model has %d trainable parameters"% (total_parameters)) 73 | 74 | # Subtracts the mean images from ImageNet 75 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 76 | inputs=tf.to_float(inputs) 77 | num_channels = inputs.get_shape().as_list()[-1] 78 | if len(means) != num_channels: 79 | raise ValueError('len(means) must match the number of channels') 80 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 81 | for i in range(num_channels): 82 | channels[i] -= means[i] 83 | return tf.concat(axis=3, values=channels) 84 | 85 | def _lovasz_grad(gt_sorted): 86 | """ 87 | Computes gradient of the Lovasz extension w.r.t sorted errors 88 | See Alg. 1 in paper 89 | """ 90 | gts = tf.reduce_sum(gt_sorted) 91 | intersection = gts - tf.cumsum(gt_sorted) 92 | union = gts + tf.cumsum(1. - gt_sorted) 93 | jaccard = 1. - intersection / union 94 | jaccard = tf.concat((jaccard[0:1], jaccard[1:] - jaccard[:-1]), 0) 95 | return jaccard 96 | 97 | def _flatten_probas(probas, labels, ignore=None, order='BHWC'): 98 | """ 99 | Flattens predictions in the batch 100 | """ 101 | if order == 'BCHW': 102 | probas = tf.transpose(probas, (0, 2, 3, 1), name="BCHW_to_BHWC") 103 | order = 'BHWC' 104 | if order != 'BHWC': 105 | raise NotImplementedError('Order {} unknown'.format(order)) 106 | C = probas.shape[3] 107 | probas = tf.reshape(probas, (-1, C)) 108 | labels = tf.reshape(labels, (-1,)) 109 | if ignore is None: 110 | return probas, labels 111 | valid = tf.not_equal(labels, ignore) 112 | vprobas = tf.boolean_mask(probas, valid, name='valid_probas') 113 | vlabels = tf.boolean_mask(labels, valid, name='valid_labels') 114 | return vprobas, vlabels 115 | 116 | def _lovasz_softmax_flat(probas, labels, only_present=True): 117 | """ 118 | Multi-class Lovasz-Softmax loss 119 | probas: [P, C] Variable, class probabilities at each prediction (between 0 and 1) 120 | labels: [P] Tensor, ground truth labels (between 0 and C - 1) 121 | only_present: average only on classes present in ground truth 122 | """ 123 | C = probas.shape[1] 124 | losses = [] 125 | present = [] 126 | for c in range(C): 127 | fg = tf.cast(tf.equal(labels, c), probas.dtype) # foreground for class c 128 | if only_present: 129 | present.append(tf.reduce_sum(fg) > 0) 130 | errors = tf.abs(fg - probas[:, c]) 131 | errors_sorted, perm = tf.nn.top_k(errors, k=tf.shape(errors)[0], name="descending_sort_{}".format(c)) 132 | fg_sorted = tf.gather(fg, perm) 133 | grad = _lovasz_grad(fg_sorted) 134 | losses.append( 135 | tf.tensordot(errors_sorted, tf.stop_gradient(grad), 1, name="loss_class_{}".format(c)) 136 | ) 137 | losses_tensor = tf.stack(losses) 138 | if only_present: 139 | present = tf.stack(present) 140 | losses_tensor = 
tf.boolean_mask(losses_tensor, present) 141 | return losses_tensor 142 | 143 | def lovasz_softmax(probas, labels, only_present=True, per_image=False, ignore=None, order='BHWC'): 144 | """ 145 | Multi-class Lovasz-Softmax loss 146 | probas: [B, H, W, C] or [B, C, H, W] Variable, class probabilities at each prediction (between 0 and 1) 147 | labels: [B, H, W] Tensor, ground truth labels (between 0 and C - 1) 148 | only_present: average only on classes present in ground truth 149 | per_image: compute the loss per image instead of per batch 150 | ignore: void class labels 151 | order: use BHWC or BCHW 152 | """ 153 | probas = tf.nn.softmax(probas, 3) 154 | labels = helpers.reverse_one_hot(labels) 155 | 156 | if per_image: 157 | def treat_image(prob, lab): 158 | prob, lab = tf.expand_dims(prob, 0), tf.expand_dims(lab, 0) 159 | prob, lab = _flatten_probas(prob, lab, ignore, order) 160 | return _lovasz_softmax_flat(prob, lab, only_present=only_present) 161 | losses = tf.map_fn(treat_image, (probas, labels), dtype=tf.float32) 162 | else: 163 | losses = _lovasz_softmax_flat(*_flatten_probas(probas, labels, ignore, order), only_present=only_present) 164 | return losses 165 | 166 | 167 | # Randomly crop the image to a specific size. For data augmentation 168 | def random_crop(image, label, crop_height, crop_width): 169 | if (image.shape[0] != label.shape[0]) or (image.shape[1] != label.shape[1]): 170 | raise Exception('Image and label must have the same dimensions!') 171 | 172 | if (crop_width <= image.shape[1]) and (crop_height <= image.shape[0]): 173 | x = random.randint(0, image.shape[1]-crop_width) 174 | y = random.randint(0, image.shape[0]-crop_height) 175 | 176 | if len(label.shape) == 3: 177 | return image[y:y+crop_height, x:x+crop_width, :], label[y:y+crop_height, x:x+crop_width, :] 178 | else: 179 | return image[y:y+crop_height, x:x+crop_width, :], label[y:y+crop_height, x:x+crop_width] 180 | else: 181 | raise Exception('Crop shape (%d, %d) exceeds image dimensions (%d, %d)!' % (crop_height, crop_width, image.shape[0], image.shape[1])) 182 | 183 | # Compute the average segmentation accuracy across all classes 184 | def compute_global_accuracy(pred, label): 185 | total = len(label) 186 | count = 0.0 187 | for i in range(total): 188 | if pred[i] == label[i]: 189 | count = count + 1.0 190 | return float(count) / float(total) 191 | 192 | # Compute the class-specific segmentation accuracy 193 | def compute_class_accuracies(pred, label, num_classes): 194 | total = [] 195 | for val in range(num_classes): 196 | total.append((label == val).sum()) 197 | 198 | count = [0.0] * num_classes 199 | for i in range(len(label)): 200 | if pred[i] == label[i]: 201 | count[int(pred[i])] = count[int(pred[i])] + 1.0 202 | 203 | # If there are no pixels from a certain class in the GT, 204 | # it returns NAN because of divide by zero 205 | # Replace the nans with a 1.0. 
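# NOTE (illustrative sketch, not part of the original utils.py): with this
# convention a class that has no ground-truth pixels is reported as perfectly
# accurate (1.0) rather than as 0/0 = NaN. A tiny worked example with
# num_classes = 3:
#
#     pred  = [0, 0, 1, 1]
#     label = [0, 1, 1, 1]
#     # class 0: 1 of 1 ground-truth pixels correct -> 1.0
#     # class 1: 2 of 3 ground-truth pixels correct -> 0.666...
#     # class 2: absent from the ground truth       -> reported as 1.0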
206 | accuracies = [] 207 | for i in range(len(total)): 208 | if total[i] == 0: 209 | accuracies.append(1.0) 210 | else: 211 | accuracies.append(count[i] / total[i]) 212 | 213 | return accuracies 214 | 215 | 216 | def compute_mean_iou(pred, label): 217 | 218 | unique_labels = np.unique(label) 219 | num_unique_labels = len(unique_labels); 220 | 221 | I = np.zeros(num_unique_labels) 222 | U = np.zeros(num_unique_labels) 223 | 224 | for index, val in enumerate(unique_labels): 225 | pred_i = pred == val 226 | label_i = label == val 227 | 228 | I[index] = float(np.sum(np.logical_and(label_i, pred_i))) 229 | U[index] = float(np.sum(np.logical_or(label_i, pred_i))) 230 | 231 | 232 | mean_iou = np.mean(I / U) 233 | return mean_iou 234 | 235 | 236 | def evaluate_segmentation(pred, label, num_classes, score_averaging="weighted"): 237 | flat_pred = pred.flatten() 238 | flat_label = label.flatten() 239 | 240 | global_accuracy = compute_global_accuracy(flat_pred, flat_label) 241 | class_accuracies = compute_class_accuracies(flat_pred, flat_label, num_classes) 242 | 243 | prec = precision_score(flat_pred, flat_label, average=score_averaging) 244 | rec = recall_score(flat_pred, flat_label, average=score_averaging) 245 | f1 = f1_score(flat_pred, flat_label, average=score_averaging) 246 | 247 | iou = compute_mean_iou(flat_pred, flat_label) 248 | 249 | return global_accuracy, class_accuracies, prec, rec, f1, iou 250 | 251 | 252 | def compute_class_weights(labels_dir, label_values): 253 | ''' 254 | Arguments: 255 | labels_dir(list): Directory where the image segmentation labels are 256 | num_classes(int): the number of classes of pixels in all images 257 | 258 | Returns: 259 | class_weights(list): a list of class weights where each index represents each class label and the element is the class weight for that label. 260 | 261 | ''' 262 | image_files = [os.path.join(labels_dir, file) for file in os.listdir(labels_dir) if file.endswith('.png')] 263 | 264 | num_classes = len(label_values) 265 | 266 | class_pixels = np.zeros(num_classes) 267 | 268 | total_pixels = 0.0 269 | 270 | for n in range(len(image_files)): 271 | image = imread(image_files[n]) 272 | 273 | for index, colour in enumerate(label_values): 274 | class_map = np.all(np.equal(image, colour), axis = -1) 275 | class_map = class_map.astype(np.float32) 276 | class_pixels[index] += np.sum(class_map) 277 | 278 | 279 | print("\rProcessing image: " + str(n) + " / " + str(len(image_files)), end="") 280 | sys.stdout.flush() 281 | 282 | total_pixels = float(np.sum(class_pixels)) 283 | index_to_delete = np.argwhere(class_pixels==0.0) 284 | class_pixels = np.delete(class_pixels, index_to_delete) 285 | 286 | class_weights = total_pixels / class_pixels 287 | class_weights = class_weights / np.sum(class_weights) 288 | 289 | return class_weights 290 | 291 | # Compute the memory usage, for debugging 292 | def memory(): 293 | import os 294 | import psutil 295 | pid = os.getpid() 296 | py = psutil.Process(pid) 297 | memoryUse = py.memory_info()[0]/2.**30 # Memory use in GB 298 | print('Memory usage in GBs:', memoryUse) 299 | 300 | --------------------------------------------------------------------------------
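A quick way to sanity-check the evaluation helpers in utils/utils.py is to run evaluate_segmentation on a pair of tiny class-index maps. The sketch below is an illustration only: the toy arrays are made up, and it assumes the repository root is on the Python path with the dependencies imported by utils/utils.py (TensorFlow 1.x, OpenCV, scikit-learn, and an older SciPy that still provides scipy.misc.imread) installed.

import numpy as np
from utils.utils import evaluate_segmentation

# Two 2x3 class-index maps with three classes; one class-1 pixel is
# mispredicted as class 0.
label = np.array([[0, 0, 1],
                  [1, 2, 2]])
pred  = np.array([[0, 0, 0],
                  [1, 2, 2]])

acc, class_accs, prec, rec, f1, iou = evaluate_segmentation(pred=pred, label=label, num_classes=3)

print(acc)         # global pixel accuracy: 5/6 ~ 0.833
print(class_accs)  # per-class accuracy: [1.0, 0.5, 1.0]
print(iou)         # mean IoU over classes present in the ground truth: ~0.72

Note that compute_class_accuracies reports classes that never occur in the ground truth with an accuracy of 1.0, and compute_mean_iou averages IoU only over the labels that actually appear in the ground-truth map, so the returned metrics depend on which classes are present in each evaluated image.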