├── LICENSE ├── README.md ├── matlab ├── +dagnn │ └── Combine.m ├── DepthMapPrediction.m ├── error_metrics.m ├── evaluateMake3D.m ├── evaluateNYU.m └── setupMatConvNet.m └── tensorflow ├── models ├── __init__.py ├── fcrn.py └── network.py └── predict.py /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2016, Iro Laina 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 24 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Deeper Depth Prediction with Fully Convolutional Residual Networks 2 | 3 | By [Iro Laina](http://campar.in.tum.de/Main/IroLaina), [Christian Rupprecht](http://campar.in.tum.de/Main/ChristianRupprecht), [Vasileios Belagiannis](http://www.robots.ox.ac.uk/~vb/), [Federico Tombari](http://campar.in.tum.de/Main/FedericoTombari), [Nassir Navab](http://campar.in.tum.de/Main/NassirNavab). 4 | 5 | ## Contents 6 | 0. [Introduction](#introduction) 7 | 0. [Quick Guide](#quick-guide) 8 | 0. [Models](#models) 9 | 0. [Results](#results) 10 | 0. [Citation](#citation) 11 | 0. [License](#license) 12 | 13 | 14 | ## Introduction 15 | 16 | This repository contains the CNN models trained for depth prediction from a single RGB image, as described in the paper "[Deeper Depth Prediction with Fully Convolutional Residual Networks](https://arxiv.org/abs/1606.00373)". The provided models are those used to obtain the results reported in the paper on the benchmark datasets NYU Depth v2 (indoor scenes) and Make3D (outdoor scenes). Moreover, the provided code can be used for inference on arbitrary images. 17 | 18 | 19 | ## Quick Guide 20 | 21 | The trained models are currently provided in two frameworks, MatConvNet and TensorFlow. Please read below for more information on how to get started. 22 | 23 | ### TensorFlow 24 | The code provided in the *tensorflow* folder requires a working installation of the [TensorFlow](https://www.tensorflow.org/) library (any platform). 25 | The model's graph is constructed in ```fcrn.py``` and the corresponding weights can be downloaded using the link below.
The implementation is based on [ethereon's](https://github.com/ethereon/caffe-tensorflow) Caffe-to-TensorFlow conversion tool. 26 | ```predict.py``` provides sample code for using the network to predict the depth map of an input image. Use ```python predict.py NYU_FCRN.ckpt yourimage.jpg``` to try the code. 27 | 28 | ### MatConvNet 29 | 30 | **Prerequisites** 31 | 32 | The code provided in the *matlab* folder requires the [MatConvNet toolbox](http://www.vlfeat.org/matconvnet/) for CNNs. A version of the library equal to or newer than 1.0-beta20 must be successfully compiled, either with or without GPU support. 33 | Furthermore, the user should modify ``` matconvnet_path = '../matconvnet-1.0-beta20' ``` within `evaluateNYU.m` and `evaluateMake3D.m` so that it points to the directory where the library is stored. 34 | 35 | **How-to** 36 | 37 | To acquire the predicted depth maps and run the evaluation on the NYU or Make3D *test sets*, simply run `evaluateNYU.m` or `evaluateMake3D.m` respectively. Please note that all required data and models will then be downloaded automatically (if they do not already exist) and no further user intervention is needed, except for setting the options `opts` and `netOpts` as preferred. Make sure that you have enough free disk space (up to 5 GB). The predictions will eventually be saved in a .mat file in the specified directory. 38 | 39 | Alternatively, one can run `DepthMapPrediction.m` to manually use a trained model in test mode and predict the depth maps of arbitrary images. 40 | 41 | ## Models 42 | 43 | The models are fully convolutional and apply the residual learning idea also to the upsampling CNN layers. Here we provide the fastest variant, in which interleaving of feature maps is used for upsampling. For this reason, a custom layer `+dagnn/Combine.m` is provided. 44 | 45 | The trained models - namely **ResNet-UpProj** in the paper - can also be downloaded here: 46 | 47 | - NYU Depth v2: [MatConvNet model](http://campar.in.tum.de/files/rupprecht/depthpred/NYU_ResNet-UpProj.zip), [TensorFlow model (.npy)](http://campar.in.tum.de/files/rupprecht/depthpred/NYU_ResNet-UpProj.npy), [TensorFlow model (.ckpt)](http://campar.in.tum.de/files/rupprecht/depthpred/NYU_FCRN-checkpoint.zip) 48 | - Make3D: [MatConvNet model](http://campar.in.tum.de/files/rupprecht/depthpred/Make3D_ResNet-UpProj.zip), TensorFlow model (soon) 49 | 50 | 51 | ## Results 52 | 53 | **NEW!** The predictions for the validation set of the NYU Depth v2 dataset can also be downloaded [here](http://campar.in.tum.de/files/rupprecht/depthpred/predictions_NYUval.mat) (.mat). 54 | 55 | In the following tables, we report the results that should be obtained after evaluation and also compare against other recent methods for depth prediction from a single image.
56 | - Error metrics on NYU Depth v2: 57 | 58 | | State of the art on NYU | rel | rms | log10 | 59 | |-----------------------------|:-----:|:-----:|:-----:| 60 | | [Roy & Todorovic](http://web.engr.oregonstate.edu/~sinisa/research/publications/cvpr16_NRF.pdf) (_CVPR 2016_) | 0.187 | 0.744 | 0.078 | 61 | | [Eigen & Fergus](http://cs.nyu.edu/~deigen/dnl/) (_ICCV 2015_) | 0.158 | 0.641 | - | 62 | | **Ours** | **0.127** | **0.573** | **0.055** | 63 | 64 | - Error metrics on Make3D: 65 | 66 | | State of the art on Make3D | rel | rms | log10 | 67 | |-----------------------------|:-----:|:-----:|:-----:| 68 | | [Liu et al.](https://bitbucket.org/fayao/dcnf-fcsp) (_CVPR 2015_) | 0.314 | 8.60 | 0.119 | 69 | | [Li et al.](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Li_Depth_and_Surface_2015_CVPR_paper.pdf) (_CVPR 2015_) | 0.278 | 7.19 | 0.092 | 70 | | **Ours** | **0.175** | **4.45** | **0.072** | 71 | 72 | - Qualitative results: 73 | ![Results](http://campar.in.tum.de/files/rupprecht/depthpred/images.jpg) 74 | 75 | ## Citation 76 | 77 | If you use this method in your research, please cite: 78 | 79 | @inproceedings{laina2016deeper, 80 | title={Deeper depth prediction with fully convolutional residual networks}, 81 | author={Laina, Iro and Rupprecht, Christian and Belagiannis, Vasileios and Tombari, Federico and Navab, Nassir}, 82 | booktitle={3D Vision (3DV), 2016 Fourth International Conference on}, 83 | pages={239--248}, 84 | year={2016}, 85 | organization={IEEE} 86 | } 87 | 88 | ## License 89 | 90 | Simplified BSD License 91 | 92 | Copyright (c) 2016, Iro Laina 93 | All rights reserved. 94 | 95 | Redistribution and use in source and binary forms, with or without 96 | modification, are permitted provided that the following conditions are met: 97 | 98 | * Redistributions of source code must retain the above copyright notice, this 99 | list of conditions and the following disclaimer. 100 | 101 | * Redistributions in binary form must reproduce the above copyright notice, 102 | this list of conditions and the following disclaimer in the documentation 103 | and/or other materials provided with the distribution. 104 | 105 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 106 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 107 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 108 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 109 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 110 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 111 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 112 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 113 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 114 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
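As context for the custom `+dagnn/Combine.m` layer listed below: it doubles the spatial resolution of feature maps by interleaving the responses of four convolutions, which is the core of the fast up-projection blocks. A minimal NumPy sketch of the same interleaving (array names are illustrative, not part of the repository):

```python
import numpy as np

def combine(a, c, b, d):
    # Mirror the forward pass of +dagnn/Combine.m: interleave four
    # H x W x C maps into one 2H x 2W x C map. In MATLAB terms,
    # inputs{1..4} = A, C, B, D fill the odd/even row-column grids.
    h, w, ch = a.shape
    y = np.zeros((2 * h, 2 * w, ch), dtype=a.dtype)
    y[0::2, 0::2] = a  # odd rows, odd cols   (Y(1:2:end, 1:2:end))
    y[1::2, 0::2] = c  # even rows, odd cols  (Y(2:2:end, 1:2:end))
    y[0::2, 1::2] = b  # odd rows, even cols  (Y(1:2:end, 2:2:end))
    y[1::2, 1::2] = d  # even rows, even cols (Y(2:2:end, 2:2:end))
    return y
```

The backward pass is simply the inverse: the four strided slices of the output gradient are routed back to the corresponding inputs.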
115 | -------------------------------------------------------------------------------- /matlab/+dagnn/Combine.m: -------------------------------------------------------------------------------- 1 | classdef Combine < dagnn.ElementWise 2 | 3 | methods 4 | function outputs = forward(self, inputs, params) 5 | % double the size of the feature maps by combining the four responses 6 | Y = zeros(size(inputs{1},1)*2, size(inputs{1},2)*2, size(inputs{1},3), size(inputs{1},4), 'like', inputs{1}); 7 | Y(1:2:end, 1:2:end, :, :) = inputs{1}; %A 8 | Y(2:2:end, 1:2:end, :, :) = inputs{2}; %C 9 | Y(1:2:end, 2:2:end, :, :) = inputs{3}; %B 10 | Y(2:2:end, 2:2:end, :, :) = inputs{4}; %D 11 | outputs{1} = Y; 12 | end 13 | 14 | function [derInputs, derParams] = backward(self, inputs, params, derOutputs) 15 | % split the feature map into four feature maps of half size 16 | derInputs{1} = derOutputs{1}(1:2:end, 1:2:end, :, :); 17 | derInputs{2} = derOutputs{1}(2:2:end, 1:2:end, :, :); 18 | derInputs{3} = derOutputs{1}(1:2:end, 2:2:end, :, :); 19 | derInputs{4} = derOutputs{1}(2:2:end, 2:2:end, :, :); 20 | derParams = {} ; 21 | end 22 | 23 | function outputSizes = getOutputSizes(obj, inputSizes) 24 | outputSizes{1}(1) = 2*inputSizes{1}(1); 25 | outputSizes{1}(2) = 2*inputSizes{1}(2); 26 | outputSizes{1}(3) = inputSizes{1}(3); 27 | outputSizes{1}(4) = inputSizes{1}(4); 28 | end 29 | 30 | function obj = Combine(varargin) 31 | obj.load(varargin) ; 32 | end 33 | end 34 | end 35 | -------------------------------------------------------------------------------- /matlab/DepthMapPrediction.m: -------------------------------------------------------------------------------- 1 | function pred = DepthMapPrediction(imdb, net, varargin) 2 | 3 | % Depth prediction (inference) using a trained model. 4 | % Inputs (imdb) can come from either the NYU Depth v2 or the Make3D dataset, 5 | % along with the corresponding trained model (net). Additionally, the 6 | % evaluation can be run for any single image. The MatConvNet library has to 7 | % be set up already for this function to work properly. 8 | % ------------------------------------------------------------------------- 9 | % Inputs: 10 | % - imdb: a structure with fields 'images' and 'depths' in the case of 11 | % the benchmark datasets with known ground truth. imdb could 12 | % alternatively be any single RGB image of size NxMx3 in [0,255] 13 | % or a tensor of D input images NxMx3xD. 14 | % - net: a trained model of type struct (suitable to be converted to a 15 | % DagNN object and successively processed using the DagNN 16 | % wrapper). For testing on arbitrary images, use the NYU model for 17 | % indoor and the Make3D model for outdoor scenes, respectively.
18 | % ------------------------------------------------------------------------- 19 | 20 | opts.gpu = false; % Set to true (false) for GPU (CPU only) support 21 | opts.plot = false; % Set to true to visualize the predictions during inference 22 | opts = vl_argparse(opts, varargin); 23 | 24 | % Set network properties 25 | net = dagnn.DagNN.loadobj(net); 26 | net.mode = 'test'; 27 | out = net.getVarIndex('prediction'); 28 | if opts.gpu 29 | net.move('gpu'); 30 | end 31 | 32 | % Check input 33 | if isa(imdb, 'struct') 34 | % case of benchmark datasets (NYU, Make3D) 35 | images = imdb.images; 36 | groundTruth = imdb.depths; 37 | else 38 | % case of arbitrary image(s) 39 | images = imdb; 40 | images = imresize(images, net.meta.normalization.imageSize(1:2)); 41 | groundTruth = []; 42 | end 43 | 44 | % Get output size for initialization 45 | varSizes = net.getVarSizes({'data', net.meta.normalization.imageSize}); % get variable sizes 46 | pred = zeros(varSizes{out}(1), varSizes{out}(2), varSizes{out}(3), size(images, 4)); % initialize 47 | 48 | if opts.plot, figure(); end 49 | 50 | fprintf('predicting...\n'); 51 | for i = 1:size(images, 4) 52 | % get input image 53 | im = single(images(:,:,:,i)); 54 | if opts.gpu 55 | im = gpuArray(im); 56 | end 57 | 58 | % run the CNN 59 | inputs = {'data', im}; 60 | net.eval(inputs) ; 61 | 62 | % obtain prediction 63 | pred(:,:,i) = gather(net.vars(out).value); 64 | 65 | % visualize results 66 | if opts.plot 67 | colormap jet 68 | if ~isempty(groundTruth) 69 | subplot(1,3,1), imagesc(uint8(images(:,:,:,i))), title('RGB Input'), axis off 70 | subplot(1,3,2), imagesc(groundTruth(:,:,i)), title('Depth Ground Truth'), axis off 71 | subplot(1,3,3), imagesc(pred(:,:,i)), title('Depth Prediction'), axis off 72 | else 73 | subplot(1,2,1), imagesc(uint8(images(:,:,:,i))), title('RGB Input'), axis off 74 | subplot(1,2,2), imagesc(pred(:,:,i)), title('Depth Prediction'), axis off 75 | end 76 | drawnow; 77 | end 78 | end 79 | -------------------------------------------------------------------------------- /matlab/error_metrics.m: -------------------------------------------------------------------------------- 1 | function results = error_metrics(pred, gt, mask) 2 | 3 | % Compute error metrics on benchmark datasets 4 | % ------------------------------------------------------------------------- 5 | 6 | % make sure predictions and ground truth have the same dimensions 7 | if ~isequal(size(pred), size(gt)) 8 | pred = imresize(pred, [size(gt,1), size(gt,2)], 'bilinear'); 9 | end 10 | 11 | if isempty(mask) 12 | n_pxls = numel(gt); 13 | else 14 | n_pxls = sum(mask(:)); % average over valid pixels only 15 | end 16 | 17 | fprintf('\n Errors computed over the entire test set \n'); 18 | fprintf('------------------------------------------\n'); 19 | 20 | % Mean Absolute Relative Error 21 | rel = abs(gt(:) - pred(:)) ./ gt(:); % compute errors 22 | rel(~mask) = 0; % mask out invalid ground truth pixels 23 | rel = sum(rel) / n_pxls; % average over all pixels 24 | fprintf('Mean Absolute Relative Error: %4f\n', rel); 25 | 26 | % Root Mean Squared Error 27 | rms = (gt(:) - pred(:)).^2; 28 | rms(~mask) = 0; 29 | rms = sqrt(sum(rms) / n_pxls); 30 | fprintf('Root Mean Squared Error: %4f\n', rms); 31 | 32 | % LOG10 Error 33 | lg10 = abs(log10(gt(:)) - log10(pred(:))); 34 | lg10(~mask) = 0; 35 | lg10 = sum(lg10) / n_pxls ; 36 | fprintf('Mean Log10 Error: %4f\n', lg10); 37 | 38 | results.rel = rel; 39 | results.rms = rms; 40 | results.log10 = lg10; 41 |
--------------------------------------------------------------------------------
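For cross-checking outside MATLAB, the same three metrics can also be computed with NumPy. A minimal sketch (not part of the repository), assuming `pred` and `gt` are equal-shaped arrays and `mask` is a boolean array of valid pixels:

```python
import numpy as np

def error_metrics(pred, gt, mask=None):
    # Mirror matlab/error_metrics.m: rel / rms / log10 errors,
    # averaged over valid (masked) pixels only.
    if mask is None:
        mask = np.ones(gt.shape, dtype=bool)
    pred, gt = pred[mask], gt[mask]
    rel = np.mean(np.abs(gt - pred) / gt)                  # mean absolute relative error
    rms = np.sqrt(np.mean((gt - pred) ** 2))               # root mean squared error
    lg10 = np.mean(np.abs(np.log10(gt) - np.log10(pred)))  # mean log10 error
    return {'rel': rel, 'rms': rms, 'log10': lg10}
```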
/matlab/evaluateMake3D.m: -------------------------------------------------------------------------------- 1 | function evaluateMake3D 2 | 3 | % Evaluation of depth prediction on the Make3D dataset. 4 | 5 | % ------------------------------------------------------------------------- 6 | % Setup MatConvNet 7 | % ------------------------------------------------------------------------- 8 | 9 | % Set your matconvnet path here: 10 | matconvnet_path = '../../matconvnet-1.0-beta20'; 11 | setupMatConvNet(matconvnet_path); 12 | 13 | % ------------------------------------------------------------------------- 14 | % Options 15 | % ------------------------------------------------------------------------- 16 | 17 | opts.dataDir = fullfile(pwd, 'Make3D'); % working directory 18 | opts.interp = 'nearest'; % interpolation method applied during resizing 19 | opts.imageSize = [460,345]; % desired image size for evaluation 20 | 21 | netOpts.gpu = true; % set to true to enable GPU support 22 | netOpts.plot = true; % set to true to visualize the predictions during inference 23 | 24 | % ------------------------------------------------------------------------- 25 | % Prepare data 26 | % ------------------------------------------------------------------------- 27 | 28 | imdb = get_Make3D(opts); 29 | net = get_model(opts); 30 | 31 | % Test set 32 | testSet.images = imdb.images(:,:,:, imdb.set == 2); 33 | testSet.depths = imdb.depths(:,:, imdb.set == 2); 34 | 35 | % resize images to the input resolution (equal to round(opts.imageSize/2)) 36 | testSet.images = imresize(testSet.images, net.meta.normalization.imageSize(1:2), opts.interp); 37 | % resize depth to opts.imageSize resolution 38 | testSet.depths = imresize(testSet.depths, opts.imageSize, opts.interp); 39 | 40 | % ------------------------------------------------------------------------- 41 | % Evaluate network 42 | % ------------------------------------------------------------------------- 43 | 44 | % Get predictions 45 | predictions = DepthMapPrediction(testSet, net, netOpts); 46 | predictions = squeeze(predictions); % remove singleton dimensions 47 | predictions = imresize(predictions, [size(testSet.depths,1), size(testSet.depths,2)], 'bilinear'); % rescale 48 | 49 | % Error calculation 50 | c1_mask = testSet.depths > 0 & testSet.depths < 70; 51 | errors = error_metrics(predictions, testSet.depths, c1_mask); 52 | 53 | % Save results 54 | fprintf('\nsaving predictions...'); 55 | save(fullfile(opts.dataDir, 'results.mat'), 'predictions', 'errors', '-v7.3'); 56 | fprintf('done!\n'); 57 | 58 | 59 | function imdb = get_Make3D(opts) 60 | % ------------------------------------------------------------------------- 61 | % Download required data (test only) 62 | % ------------------------------------------------------------------------- 63 | 64 | opts.dataDirImages = fullfile(opts.dataDir, 'data', 'Test134'); 65 | opts.dataDirDepths = fullfile(opts.dataDir, 'data', 'Gridlaserdata'); 66 | 67 | % Download test set 68 | if ~exist(opts.dataDirImages, 'dir') 69 | fprintf('downloading Make3D testing images (~190 MB)...'); 70 | mkdir(opts.dataDirImages); 71 | untar('http://www.cs.cornell.edu/~asaxena/learningdepth/Test134.tar.gz', fileparts(opts.dataDirImages)); 72 | fprintf('done.\n'); 73 | end 74 | 75 | if ~exist(opts.dataDirDepths, 'dir') 76 | fprintf('downloading Make3D testing depth maps (~22 MB)...'); 77 | mkdir(opts.dataDirDepths); 78 |
untar('http://www.cs.cornell.edu/~asaxena/learningdepth/Test134Depth.tar.gz', fileparts(opts.dataDirDepths)); 79 | fprintf('done.\n'); 80 | end 81 | 82 | fprintf('preparing testing data...'); 83 | img_files = dir(fullfile(opts.dataDirImages, 'img-*.jpg')); 84 | depth_files = dir(fullfile(opts.dataDirDepths, 'depth_sph_corr-*.mat')); 85 | 86 | % Verify that the correct number of files has been found 87 | assert(numel(img_files)==134, 'Incorrect number of Make3D test images. \n'); 88 | assert(numel(depth_files)==134, 'Incorrect number of Make3D test depths. \n'); 89 | 90 | % Read dataset files and store necessary information to imdb structure 91 | for i = 1:numel(img_files) 92 | imdb.images(:,:,:,i) = single(imread(fullfile(opts.dataDirImages, img_files(i).name))); % get RGB image 93 | gt = load(fullfile(opts.dataDirDepths, depth_files(i).name)); 94 | imdb.depths(:,:,i) = single(gt.Position3DGrid(:,:,4)); % get depth channel 95 | imdb.set(i) = 2; 96 | end 97 | fprintf(' done!\n'); 98 | 99 | 100 | 101 | function net = get_model(opts) 102 | % ------------------------------------------------------------------------- 103 | % Download trained models 104 | % ------------------------------------------------------------------------- 105 | 106 | opts.dataDir = fullfile(opts.dataDir, 'models'); 107 | if ~exist(opts.dataDir, 'dir'), mkdir(opts.dataDir); end 108 | 109 | filename = fullfile(opts.dataDir, 'Make3D_ResNet-UpProj.mat'); 110 | if ~exist(filename, 'file') 111 | url = 'http://campar.in.tum.de/files/rupprecht/depthpred/Make3D_ResNet-UpProj.zip'; 112 | fprintf('downloading trained model: %s\n', url); 113 | unzip(url, opts.dataDir); 114 | end 115 | 116 | net = load(filename); 117 | 118 | -------------------------------------------------------------------------------- /matlab/evaluateNYU.m: -------------------------------------------------------------------------------- 1 | function evaluateNYU 2 | 3 | % Evaluation of depth prediction on the NYU Depth v2 dataset. 4 | 5 | % ------------------------------------------------------------------------- 6 | % Setup MatConvNet 7 | % ------------------------------------------------------------------------- 8 | 9 | % Set your matconvnet path here: 10 | matconvnet_path = '../../matconvnet-1.0-beta20'; 11 | setupMatConvNet(matconvnet_path); 12 | 13 | % ------------------------------------------------------------------------- 14 | % Options 15 | % ------------------------------------------------------------------------- 16 | 17 | opts.dataDir = fullfile(pwd, 'NYU'); % working directory 18 | opts.interp = 'nearest'; % interpolation method applied during resizing 19 | 20 | netOpts.gpu = true; % set to true to enable GPU support 21 | netOpts.plot = true; % set to true to visualize the predictions during inference 22 | 23 | % ------------------------------------------------------------------------- 24 | % Prepare data 25 | % ------------------------------------------------------------------------- 26 | 27 | imdb = get_NYUDepth_v2(opts); 28 | net = get_model(opts); 29 | 30 | % Test set 31 | testSet.images = imdb.images(:,:,:, imdb.set == 2); 32 | testSet.depths = imdb.depths(:,:, imdb.set == 2); 33 | 34 | % Prepare input for evaluation through the network, in accordance with the 35 | % way the model was trained for the NYU dataset. No processing is applied 36 | % to the ground truth.
37 | meta = net.meta.normalization; % information about input 38 | res = meta.imageSize(1:2) + 2*meta.border; 39 | testSet.images = imresize(testSet.images, res, opts.interp); % resize 40 | testSet.images = testSet.images(1+meta.border(1):end-meta.border(1), 1+meta.border(2):end-meta.border(2), :, :); % center crop 41 | 42 | % ------------------------------------------------------------------------- 43 | % Evaluate network 44 | % ------------------------------------------------------------------------- 45 | 46 | % Get predictions 47 | predictions = DepthMapPrediction(testSet, net, netOpts); 48 | predictions = squeeze(predictions); % remove singleton dimensions 49 | predictions = imresize(predictions, [size(testSet.depths,1), size(testSet.depths,2)], 'bilinear'); %rescale 50 | 51 | % Error calculation 52 | errors = error_metrics(predictions, testSet.depths, []); 53 | 54 | % Save results 55 | fprintf('\nsaving predictions...'); 56 | save(fullfile(opts.dataDir, 'results.mat'), 'predictions', 'errors', '-v7.3'); 57 | fprintf('done!\n'); 58 | 59 | 60 | 61 | function imdb = get_NYUDepth_v2(opts) 62 | % ------------------------------------------------------------------------- 63 | % Download required data 64 | % ------------------------------------------------------------------------- 65 | 66 | opts.dataDir = fullfile(opts.dataDir, 'data'); 67 | if ~exist(opts.dataDir, 'dir'), mkdir(opts.dataDir); end 68 | 69 | % Download dataset 70 | filename = fullfile(opts.dataDir, 'nyu_depth_v2_labeled.mat'); 71 | if ~exist(filename, 'file') 72 | url = 'http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat'; 73 | fprintf('downloading dataset (~2.8 GB): %s\n', url); 74 | websave(filename, url); 75 | end 76 | 77 | % Download official train/test split 78 | filename_splits = fullfile(opts.dataDir, 'splits.mat'); 79 | if ~exist(filename_splits, 'file') 80 | url_split = 'http://horatio.cs.nyu.edu/mit/silberman/indoor_seg_sup/splits.mat'; 81 | fprintf('downloading train/test split: %s\n', url_split); 82 | websave(filename_splits, url_split); 83 | end 84 | 85 | % Load dataset and splits 86 | fprintf('loading data to workspace...'); 87 | data = load(filename); 88 | splits = load(filename_splits); 89 | 90 | % Store necessary information to imdb structure 91 | imdb.images = single(data.images); %(no mean subtraction has been performed) 92 | imdb.depths = single(data.depths); %depth filled-in values 93 | imdb.set(splits.trainNdxs) = 1; %training indices (ignored for inference) 94 | imdb.set(splits.testNdxs) = 2; %testing indices (on which evaluation is performed) 95 | fprintf(' done!\n'); 96 | 97 | 98 | 99 | function net = get_model(opts) 100 | % ------------------------------------------------------------------------- 101 | % Download trained models 102 | % ------------------------------------------------------------------------- 103 | 104 | opts.dataDir = fullfile(opts.dataDir, 'models'); 105 | if ~exist(opts.dataDir, 'dir'), mkdir(opts.dataDir); end 106 | 107 | filename = fullfile(opts.dataDir, 'NYU_ResNet-UpProj.mat'); 108 | if ~exist(filename, 'file') 109 | url = 'http://campar.in.tum.de/files/rupprecht/depthpred/NYU_ResNet-UpProj.zip'; 110 | fprintf('downloading trained model: %s\n', url); 111 | unzip(url, opts.dataDir); 112 | end 113 | 114 | net = load(filename); 115 | 116 | -------------------------------------------------------------------------------- /matlab/setupMatConvNet.m: -------------------------------------------------------------------------------- 1 | function 
setupMatConvNet(matconvnet_path) 2 | 3 | % check path 4 | if ~exist(fullfile(matconvnet_path, 'matlab', 'vl_setupnn.m'), 'file') 5 | error('Could not find MatConvNet in "%s"!\nPlease point matconvnet_path to the correct directory.\n', matconvnet_path); 6 | end 7 | 8 | % check if it is the right version (beta-20) 9 | mcnv = getMatConvNetVersion(matconvnet_path); 10 | if ~strcmp(mcnv, '1.0-beta20') 11 | error('Your MatConvNet version (%s) is not the required version 1.0-beta20. Please download and compile the right version.', mcnv); 12 | end 13 | 14 | % if everything is fine, then set up 15 | run(fullfile(matconvnet_path, 'matlab', 'vl_setupnn.m')); 16 | 17 | 18 | 19 | function versionName = getMatConvNetVersion(matconvnet_path) 20 | 21 | fid = fopen(fullfile(matconvnet_path, 'Makefile'), 'rt'); 22 | s = textscan(fid, '%s', 'delimiter', '\n'); 23 | fclose(fid); 24 | idxs = find(~cellfun(@isempty,strfind(s{1}, 'VER = '))); 25 | mcnVersion = s{1}(idxs(1)); 26 | versionName = mcnVersion{1}(7:end); -------------------------------------------------------------------------------- /tensorflow/models/__init__.py: -------------------------------------------------------------------------------- 1 | from .fcrn import ResNet50UpProj 2 | -------------------------------------------------------------------------------- /tensorflow/models/fcrn.py: -------------------------------------------------------------------------------- 1 | from .network import Network 2 | 3 | class ResNet50UpProj(Network): 4 | def setup(self): 5 | (self.feed('data') 6 | .conv(7, 7, 64, 2, 2, relu=False, name='conv1') 7 | .batch_normalization(relu=True, name='bn_conv1') 8 | .max_pool(3, 3, 2, 2, name='pool1') 9 | .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch1') 10 | .batch_normalization(name='bn2a_branch1')) 11 | 12 | (self.feed('pool1') 13 | .conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2a_branch2a') 14 | .batch_normalization(relu=True, name='bn2a_branch2a') 15 | .conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2a_branch2b') 16 | .batch_normalization(relu=True, name='bn2a_branch2b') 17 | .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch2c') 18 | .batch_normalization(name='bn2a_branch2c')) 19 | 20 | (self.feed('bn2a_branch1', 21 | 'bn2a_branch2c') 22 | .add(name='res2a') 23 | .relu(name='res2a_relu') 24 | .conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2b_branch2a') 25 | .batch_normalization(relu=True, name='bn2b_branch2a') 26 | .conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2b_branch2b') 27 | .batch_normalization(relu=True, name='bn2b_branch2b') 28 | .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2b_branch2c') 29 | .batch_normalization(name='bn2b_branch2c')) 30 | 31 | (self.feed('res2a_relu', 32 | 'bn2b_branch2c') 33 | .add(name='res2b') 34 | .relu(name='res2b_relu') 35 | .conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2c_branch2a') 36 | .batch_normalization(relu=True, name='bn2c_branch2a') 37 | .conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2c_branch2b') 38 | .batch_normalization(relu=True, name='bn2c_branch2b') 39 | .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2c_branch2c') 40 | .batch_normalization(name='bn2c_branch2c')) 41 | 42 | (self.feed('res2b_relu', 43 | 'bn2c_branch2c') 44 | .add(name='res2c') 45 | .relu(name='res2c_relu') 46 | .conv(1, 1, 512, 2, 2, biased=False, relu=False, name='res3a_branch1') 47 | .batch_normalization(name='bn3a_branch1')) 48 | 49 | (self.feed('res2c_relu') 50 | .conv(1, 1, 128, 2,
2, biased=False, relu=False, name='res3a_branch2a') 51 | .batch_normalization(relu=True, name='bn3a_branch2a') 52 | .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3a_branch2b') 53 | .batch_normalization(relu=True, name='bn3a_branch2b') 54 | .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3a_branch2c') 55 | .batch_normalization(name='bn3a_branch2c')) 56 | 57 | (self.feed('bn3a_branch1', 58 | 'bn3a_branch2c') 59 | .add(name='res3a') 60 | .relu(name='res3a_relu') 61 | .conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3b_branch2a') 62 | .batch_normalization(relu=True, name='bn3b_branch2a') 63 | .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3b_branch2b') 64 | .batch_normalization(relu=True, name='bn3b_branch2b') 65 | .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3b_branch2c') 66 | .batch_normalization(name='bn3b_branch2c')) 67 | 68 | (self.feed('res3a_relu', 69 | 'bn3b_branch2c') 70 | .add(name='res3b') 71 | .relu(name='res3b_relu') 72 | .conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3c_branch2a') 73 | .batch_normalization(relu=True, name='bn3c_branch2a') 74 | .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3c_branch2b') 75 | .batch_normalization(relu=True, name='bn3c_branch2b') 76 | .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3c_branch2c') 77 | .batch_normalization(name='bn3c_branch2c')) 78 | 79 | (self.feed('res3b_relu', 80 | 'bn3c_branch2c') 81 | .add(name='res3c') 82 | .relu(name='res3c_relu') 83 | .conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3d_branch2a') 84 | .batch_normalization(relu=True, name='bn3d_branch2a') 85 | .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3d_branch2b') 86 | .batch_normalization(relu=True, name='bn3d_branch2b') 87 | .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3d_branch2c') 88 | .batch_normalization(name='bn3d_branch2c')) 89 | 90 | (self.feed('res3c_relu', 91 | 'bn3d_branch2c') 92 | .add(name='res3d') 93 | .relu(name='res3d_relu') 94 | .conv(1, 1, 1024, 2, 2, biased=False, relu=False, name='res4a_branch1') 95 | .batch_normalization(name='bn4a_branch1')) 96 | 97 | (self.feed('res3d_relu') 98 | .conv(1, 1, 256, 2, 2, biased=False, relu=False, name='res4a_branch2a') 99 | .batch_normalization(relu=True, name='bn4a_branch2a') 100 | .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4a_branch2b') 101 | .batch_normalization(relu=True, name='bn4a_branch2b') 102 | .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4a_branch2c') 103 | .batch_normalization(name='bn4a_branch2c')) 104 | 105 | (self.feed('bn4a_branch1', 106 | 'bn4a_branch2c') 107 | .add(name='res4a') 108 | .relu(name='res4a_relu') 109 | .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b_branch2a') 110 | .batch_normalization(relu=True, name='bn4b_branch2a') 111 | .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b_branch2b') 112 | .batch_normalization(relu=True, name='bn4b_branch2b') 113 | .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b_branch2c') 114 | .batch_normalization(name='bn4b_branch2c')) 115 | 116 | (self.feed('res4a_relu', 117 | 'bn4b_branch2c') 118 | .add(name='res4b') 119 | .relu(name='res4b_relu') 120 | .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4c_branch2a') 121 | .batch_normalization(relu=True, name='bn4c_branch2a') 122 | .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4c_branch2b') 123 | .batch_normalization(relu=True, name='bn4c_branch2b') 124 | .conv(1, 1, 1024, 1, 1, biased=False, relu=False, 
name='res4c_branch2c') 125 | .batch_normalization(name='bn4c_branch2c')) 126 | 127 | (self.feed('res4b_relu', 128 | 'bn4c_branch2c') 129 | .add(name='res4c') 130 | .relu(name='res4c_relu') 131 | .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4d_branch2a') 132 | .batch_normalization(relu=True, name='bn4d_branch2a') 133 | .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4d_branch2b') 134 | .batch_normalization(relu=True, name='bn4d_branch2b') 135 | .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4d_branch2c') 136 | .batch_normalization(name='bn4d_branch2c')) 137 | 138 | (self.feed('res4c_relu', 139 | 'bn4d_branch2c') 140 | .add(name='res4d') 141 | .relu(name='res4d_relu') 142 | .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4e_branch2a') 143 | .batch_normalization(relu=True, name='bn4e_branch2a') 144 | .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4e_branch2b') 145 | .batch_normalization(relu=True, name='bn4e_branch2b') 146 | .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4e_branch2c') 147 | .batch_normalization(name='bn4e_branch2c')) 148 | 149 | (self.feed('res4d_relu', 150 | 'bn4e_branch2c') 151 | .add(name='res4e') 152 | .relu(name='res4e_relu') 153 | .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4f_branch2a') 154 | .batch_normalization(relu=True, name='bn4f_branch2a') 155 | .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4f_branch2b') 156 | .batch_normalization(relu=True, name='bn4f_branch2b') 157 | .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4f_branch2c') 158 | .batch_normalization(name='bn4f_branch2c')) 159 | 160 | (self.feed('res4e_relu', 161 | 'bn4f_branch2c') 162 | .add(name='res4f') 163 | .relu(name='res4f_relu') 164 | .conv(1, 1, 2048, 2, 2, biased=False, relu=False, name='res5a_branch1') 165 | .batch_normalization(name='bn5a_branch1')) 166 | 167 | (self.feed('res4f_relu') 168 | .conv(1, 1, 512, 2, 2, biased=False, relu=False, name='res5a_branch2a') 169 | .batch_normalization(relu=True, name='bn5a_branch2a') 170 | .conv(3, 3, 512, 1, 1, biased=False, relu=False, name='res5a_branch2b') 171 | .batch_normalization(relu=True, name='bn5a_branch2b') 172 | .conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5a_branch2c') 173 | .batch_normalization(name='bn5a_branch2c')) 174 | 175 | (self.feed('bn5a_branch1', 176 | 'bn5a_branch2c') 177 | .add(name='res5a') 178 | .relu(name='res5a_relu') 179 | .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5b_branch2a') 180 | .batch_normalization(relu=True, name='bn5b_branch2a') 181 | .conv(3, 3, 512, 1, 1, biased=False, relu=False, name='res5b_branch2b') 182 | .batch_normalization(relu=True, name='bn5b_branch2b') 183 | .conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5b_branch2c') 184 | .batch_normalization(name='bn5b_branch2c')) 185 | 186 | (self.feed('res5a_relu', 187 | 'bn5b_branch2c') 188 | .add(name='res5b') 189 | .relu(name='res5b_relu') 190 | .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5c_branch2a') 191 | .batch_normalization(relu=True, name='bn5c_branch2a') 192 | .conv(3, 3, 512, 1, 1, biased=False, relu=False, name='res5c_branch2b') 193 | .batch_normalization(relu=True, name='bn5c_branch2b') 194 | .conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5c_branch2c') 195 | .batch_normalization(name='bn5c_branch2c')) 196 | 197 | (self.feed('res5b_relu', 198 | 'bn5c_branch2c') 199 | .add(name='res5c') 200 | .relu(name='res5c_relu') 201 | .conv(1, 1, 1024, 1, 1, biased=True, relu=False, name='layer1') 202 
| .batch_normalization(relu=False, name='layer1_BN') 203 | .up_project([3, 3, 1024, 512], id = '2x', stride = 1, BN=True) 204 | .up_project([3, 3, 512, 256], id = '4x', stride = 1, BN=True) 205 | .up_project([3, 3, 256, 128], id = '8x', stride = 1, BN=True) 206 | .up_project([3, 3, 128, 64], id = '16x', stride = 1, BN=True) 207 | .dropout(name = 'drop', keep_prob = 1.) 208 | .conv(3, 3, 1, 1, 1, name = 'ConvPred')) 209 | -------------------------------------------------------------------------------- /tensorflow/models/network.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | 4 | # ---------------------------------------------------------------------------------- 5 | # Commonly used layers and operations based on ethereon's implementation 6 | # https://github.com/ethereon/caffe-tensorflow 7 | # Slight modifications may apply. FCRN-specific operations have also been appended. 8 | # ---------------------------------------------------------------------------------- 9 | # Thanks to *Helisa Dhamo* for the model conversion and integration into TensorFlow. 10 | # ---------------------------------------------------------------------------------- 11 | 12 | DEFAULT_PADDING = 'SAME' 13 | 14 | 15 | def get_incoming_shape(incoming): 16 | """ Returns the incoming data shape """ 17 | if isinstance(incoming, tf.Tensor): 18 | return incoming.get_shape().as_list() 19 | elif isinstance(incoming, (np.ndarray, list, tuple)): 20 | return np.shape(incoming) 21 | else: 22 | raise Exception("Invalid incoming layer.") 23 | 24 | 25 | def interleave(tensors, axis): 26 | old_shape = get_incoming_shape(tensors[0])[1:] 27 | new_shape = [-1] + old_shape 28 | new_shape[axis] *= len(tensors) 29 | return tf.reshape(tf.stack(tensors, axis + 1), new_shape) 30 | 31 | def layer(op): 32 | '''Decorator for composable network layers.''' 33 | 34 | def layer_decorated(self, *args, **kwargs): 35 | # Automatically set a name if not provided. 36 | name = kwargs.setdefault('name', self.get_unique_name(op.__name__)) 37 | 38 | # Figure out the layer inputs. 39 | if len(self.terminals) == 0: 40 | raise RuntimeError('No input variables found for layer %s.' % name) 41 | elif len(self.terminals) == 1: 42 | layer_input = self.terminals[0] 43 | else: 44 | layer_input = list(self.terminals) 45 | # Perform the operation and get the output. 46 | layer_output = op(self, layer_input, *args, **kwargs) 47 | # Add to layer LUT. 48 | self.layers[name] = layer_output 49 | # This output is now the input for the next layer. 50 | self.feed(layer_output) 51 | # Return self for chained calls. 52 | return self 53 | 54 | return layer_decorated 55 | 56 | 57 | class Network(object): 58 | 59 | def __init__(self, inputs, batch, keep_prob, is_training, trainable = True): 60 | # The input nodes for this network 61 | self.inputs = inputs 62 | # The current list of terminal nodes 63 | self.terminals = [] 64 | # Mapping from layer names to layers 65 | self.layers = dict(inputs) 66 | # If true, the resulting variables are set as trainable 67 | self.trainable = trainable 68 | self.batch_size = batch 69 | self.keep_prob = keep_prob 70 | self.is_training = is_training 71 | self.setup() 72 | 73 | 74 | def setup(self): 75 | '''Construct the network. ''' 76 | raise NotImplementedError('Must be implemented by the subclass.') 77 | 78 | def load(self, data_path, session, ignore_missing=False): 79 | '''Load network weights.
80 | data_path: The path to the numpy-serialized network weights 81 | session: The current TensorFlow session 82 | ignore_missing: If true, serialized weights for missing layers are ignored. 83 | ''' 84 | data_dict = np.load(data_path, encoding='latin1').item() 85 | for op_name in data_dict: 86 | with tf.variable_scope(op_name, reuse=True): 87 | for param_name, data in iter(data_dict[op_name].items()): 88 | try: 89 | var = tf.get_variable(param_name) 90 | session.run(var.assign(data)) 91 | 92 | except ValueError: 93 | if not ignore_missing: 94 | raise 95 | 96 | def feed(self, *args): 97 | '''Set the input(s) for the next operation by replacing the terminal nodes. 98 | The arguments can be either layer names or the actual layers. 99 | ''' 100 | assert len(args) != 0 101 | self.terminals = [] 102 | for fed_layer in args: 103 | if isinstance(fed_layer, str): 104 | try: 105 | fed_layer = self.layers[fed_layer] 106 | except KeyError: 107 | raise KeyError('Unknown layer name fed: %s' % fed_layer) 108 | self.terminals.append(fed_layer) 109 | return self 110 | 111 | def get_output(self): 112 | '''Returns the current network output.''' 113 | return self.terminals[-1] 114 | 115 | def get_layer_output(self, name): 116 | return self.layers[name] 117 | 118 | def get_unique_name(self, prefix): 119 | '''Returns an index-suffixed unique name for the given prefix. 120 | This is used for auto-generating layer names based on the type-prefix. 121 | ''' 122 | ident = sum(t.startswith(prefix) for t, _ in self.layers.items()) + 1 123 | return '%s_%d' % (prefix, ident) 124 | 125 | def make_var(self, name, shape): 126 | '''Creates a new TensorFlow variable.''' 127 | return tf.get_variable(name, shape, dtype = 'float32', trainable=self.trainable) 128 | 129 | def validate_padding(self, padding): 130 | '''Verifies that the padding is one of the supported ones.''' 131 | assert padding in ('SAME', 'VALID') 132 | 133 | @layer 134 | def conv(self, 135 | input_data, 136 | k_h, 137 | k_w, 138 | c_o, 139 | s_h, 140 | s_w, 141 | name, 142 | relu=True, 143 | padding=DEFAULT_PADDING, 144 | group=1, 145 | biased=True): 146 | 147 | # Verify that the padding is acceptable 148 | self.validate_padding(padding) 149 | # Get the number of channels in the input 150 | c_i = input_data.get_shape()[-1] 151 | 152 | if (padding == 'SAME'): 153 | input_data = tf.pad(input_data, [[0, 0], [(k_h - 1)//2, (k_h - 1)//2], [(k_w - 1)//2, (k_w - 1)//2], [0, 0]], "CONSTANT") 154 | 155 | # Verify that the grouping parameter is valid 156 | assert c_i % group == 0 157 | assert c_o % group == 0 158 | # Convolution for a given input and kernel 159 | convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding='VALID') 160 | 161 | with tf.variable_scope(name) as scope: 162 | kernel = self.make_var('weights', shape=[k_h, k_w, c_i // group, c_o]) 163 | 164 | if group == 1: 165 | # This is the common-case. Convolve the input without any further complications. 
166 | output = convolve(input_data, kernel) 167 | else: 168 | # Split the input into groups and then convolve each of them independently 169 | 170 | input_groups = tf.split(3, group, input_data) 171 | kernel_groups = tf.split(3, group, kernel) 172 | output_groups = [convolve(i, k) for i, k in zip(input_groups, kernel_groups)] 173 | # Concatenate the groups 174 | output = tf.concat(3, output_groups) 175 | 176 | # Add the biases 177 | if biased: 178 | biases = self.make_var('biases', [c_o]) 179 | output = tf.nn.bias_add(output, biases) 180 | if relu: 181 | # ReLU non-linearity 182 | output = tf.nn.relu(output, name=scope.name) 183 | 184 | return output 185 | 186 | @layer 187 | def relu(self, input_data, name): 188 | return tf.nn.relu(input_data, name=name) 189 | 190 | @layer 191 | def max_pool(self, input_data, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING): 192 | self.validate_padding(padding) 193 | return tf.nn.max_pool(input_data, 194 | ksize=[1, k_h, k_w, 1], 195 | strides=[1, s_h, s_w, 1], 196 | padding=padding, 197 | name=name) 198 | 199 | @layer 200 | def avg_pool(self, input_data, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING): 201 | self.validate_padding(padding) 202 | return tf.nn.avg_pool(input_data, 203 | ksize=[1, k_h, k_w, 1], 204 | strides=[1, s_h, s_w, 1], 205 | padding=padding, 206 | name=name) 207 | 208 | @layer 209 | def lrn(self, input_data, radius, alpha, beta, name, bias=1.0): 210 | return tf.nn.local_response_normalization(input_data, 211 | depth_radius=radius, 212 | alpha=alpha, 213 | beta=beta, 214 | bias=bias, 215 | name=name) 216 | 217 | @layer 218 | def concat(self, inputs, axis, name): 219 | return tf.concat(concat_dim=axis, values=inputs, name=name) 220 | 221 | @layer 222 | def add(self, inputs, name): 223 | return tf.add_n(inputs, name=name) 224 | 225 | @layer 226 | def fc(self, input_data, num_out, name, relu=True): 227 | with tf.variable_scope(name) as scope: 228 | input_shape = input_data.get_shape() 229 | if input_shape.ndims == 4: 230 | # The input is spatial. Vectorize it first. 231 | dim = 1 232 | for d in input_shape[1:].as_list(): 233 | dim *= d 234 | feed_in = tf.reshape(input_data, [-1, dim]) 235 | else: 236 | feed_in, dim = (input_data, input_shape[-1].value) 237 | weights = self.make_var('weights', shape=[dim, num_out]) 238 | biases = self.make_var('biases', [num_out]) 239 | op = tf.nn.relu_layer if relu else tf.nn.xw_plus_b 240 | fc = op(feed_in, weights, biases, name=scope.name) 241 | return fc 242 | 243 | @layer 244 | def softmax(self, input_data, name): 245 | input_shape = map(lambda v: v.value, input_data.get_shape()) 246 | if len(input_shape) > 2: 247 | # For certain models (like NiN), the singleton spatial dimensions 248 | # need to be explicitly squeezed, since they're not broadcast-able 249 | # in TensorFlow's NHWC ordering (unlike Caffe's NCHW). 
250 | if input_shape[1] == 1 and input_shape[2] == 1: 251 | input_data = tf.squeeze(input_data, squeeze_dims=[1, 2]) 252 | else: 253 | raise ValueError('Rank 2 tensor input expected for softmax!') 254 | return tf.nn.softmax(input_data, name) 255 | 256 | @layer 257 | def batch_normalization(self, input_data, name, scale_offset=True, relu=False): 258 | 259 | with tf.variable_scope(name) as scope: 260 | shape = [input_data.get_shape()[-1]] 261 | pop_mean = tf.get_variable("mean", shape, initializer = tf.constant_initializer(0.0), trainable=False) 262 | pop_var = tf.get_variable("variance", shape, initializer = tf.constant_initializer(1.0), trainable=False) 263 | epsilon = 1e-4 264 | decay = 0.999 265 | if scale_offset: 266 | scale = tf.get_variable("scale", shape, initializer = tf.constant_initializer(1.0)) 267 | offset = tf.get_variable("offset", shape, initializer = tf.constant_initializer(0.0)) 268 | else: 269 | scale, offset = (None, None) 270 | if self.is_training: 271 | batch_mean, batch_var = tf.nn.moments(input_data, [0, 1, 2]) 272 | 273 | train_mean = tf.assign(pop_mean, 274 | pop_mean * decay + batch_mean * (1 - decay)) 275 | train_var = tf.assign(pop_var, 276 | pop_var * decay + batch_var * (1 - decay)) 277 | with tf.control_dependencies([train_mean, train_var]): 278 | output = tf.nn.batch_normalization(input_data, 279 | batch_mean, batch_var, offset, scale, epsilon, name = name) 280 | else: 281 | output = tf.nn.batch_normalization(input_data, 282 | pop_mean, pop_var, offset, scale, epsilon, name = name) 283 | 284 | if relu: 285 | output = tf.nn.relu(output) 286 | 287 | return output 288 | 289 | @layer 290 | def dropout(self, input_data, keep_prob, name): 291 | return tf.nn.dropout(input_data, keep_prob, name=name) 292 | 293 | 294 | def unpool_as_conv(self, size, input_data, id, stride = 1, ReLU = False, BN = True): 295 | 296 | # Model upconvolutions (unpooling + convolution) as interleaving feature 297 | # maps of four convolutions (A,B,C,D). Building block for up-projections. 
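        # Note: the four branches below implement the "fast up-convolution"
        # from the paper: 2x unpooling followed by a 5x5 convolution is
        # equivalent to four smaller convolutions (3x3, 2x3, 3x2, 2x2)
        # applied directly to the low-resolution input, because the unpooled
        # map is zero at three out of every four locations. The asymmetric
        # tf.pad calls align branches B, C and D with their output positions;
        # interleave() then reassembles the full-resolution map.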
298 | 299 | 300 | # Convolution A (3x3) 301 | # -------------------------------------------------- 302 | layerName = "layer%s_ConvA" % (id) 303 | self.feed(input_data) 304 | self.conv( 3, 3, size[3], stride, stride, name = layerName, padding = 'SAME', relu = False) 305 | outputA = self.get_output() 306 | 307 | # Convolution B (2x3) 308 | # -------------------------------------------------- 309 | layerName = "layer%s_ConvB" % (id) 310 | padded_input_B = tf.pad(input_data, [[0, 0], [1, 0], [1, 1], [0, 0]], "CONSTANT") 311 | self.feed(padded_input_B) 312 | self.conv(2, 3, size[3], stride, stride, name = layerName, padding = 'VALID', relu = False) 313 | outputB = self.get_output() 314 | 315 | # Convolution C (3x2) 316 | # -------------------------------------------------- 317 | layerName = "layer%s_ConvC" % (id) 318 | padded_input_C = tf.pad(input_data, [[0, 0], [1, 1], [1, 0], [0, 0]], "CONSTANT") 319 | self.feed(padded_input_C) 320 | self.conv(3, 2, size[3], stride, stride, name = layerName, padding = 'VALID', relu = False) 321 | outputC = self.get_output() 322 | 323 | # Convolution D (2x2) 324 | # -------------------------------------------------- 325 | layerName = "layer%s_ConvD" % (id) 326 | padded_input_D = tf.pad(input_data, [[0, 0], [1, 0], [1, 0], [0, 0]], "CONSTANT") 327 | self.feed(padded_input_D) 328 | self.conv(2, 2, size[3], stride, stride, name = layerName, padding = 'VALID', relu = False) 329 | outputD = self.get_output() 330 | 331 | # Interleaving elements of the four feature maps 332 | # -------------------------------------------------- 333 | left = interleave([outputA, outputB], axis=1) # columns 334 | right = interleave([outputC, outputD], axis=1) # columns 335 | Y = interleave([left, right], axis=2) # rows 336 | 337 | if BN: 338 | layerName = "layer%s_BN" % (id) 339 | self.feed(Y) 340 | self.batch_normalization(name = layerName, scale_offset = True, relu = False) 341 | Y = self.get_output() 342 | 343 | if ReLU: 344 | Y = tf.nn.relu(Y, name = layerName) 345 | 346 | return Y 347 | 348 | 349 | def up_project(self, size, id, stride = 1, BN = True): 350 | 351 | # Create residual upsampling layer (UpProjection) 352 | 353 | input_data = self.get_output() 354 | 355 | # Branch 1 356 | id_br1 = "%s_br1" % (id) 357 | 358 | # Interleaving Convs of 1st branch 359 | out = self.unpool_as_conv(size, input_data, id_br1, stride, ReLU=True, BN=True) 360 | 361 | # Convolution following the upProjection on the 1st branch 362 | layerName = "layer%s_Conv" % (id) 363 | self.feed(out) 364 | self.conv(size[0], size[1], size[3], stride, stride, name = layerName, relu = False) 365 | 366 | if BN: 367 | layerName = "layer%s_BN" % (id) 368 | self.batch_normalization(name = layerName, scale_offset=True, relu = False) 369 | 370 | # Output of 1st branch 371 | branch1_output = self.get_output() 372 | 373 | 374 | # Branch 2 375 | id_br2 = "%s_br2" % (id) 376 | # Interleaving convolutions and output of 2nd branch 377 | branch2_output = self.unpool_as_conv(size, input_data, id_br2, stride, ReLU=False) 378 | 379 | 380 | # sum branches 381 | layerName = "layer%s_Sum" % (id) 382 | output = tf.add_n([branch1_output, branch2_output], name = layerName) 383 | # ReLU 384 | layerName = "layer%s_ReLU" % (id) 385 | output = tf.nn.relu(output, name=layerName) 386 | 387 | self.feed(output) 388 | return self 389 | -------------------------------------------------------------------------------- /tensorflow/predict.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 
import os 3 | import numpy as np 4 | import tensorflow as tf 5 | from matplotlib import pyplot as plt 6 | from PIL import Image 7 | 8 | import models 9 | 10 | def predict(model_data_path, image_path): 11 | 12 | 13 | # Default input size 14 | height = 228 15 | width = 304 16 | channels = 3 17 | batch_size = 1 18 | 19 | # Read image 20 | img = Image.open(image_path) 21 | img = img.resize([width,height], Image.ANTIALIAS) 22 | img = np.array(img).astype('float32') 23 | img = np.expand_dims(np.asarray(img), axis = 0) 24 | 25 | # Create a placeholder for the input image 26 | input_node = tf.placeholder(tf.float32, shape=(None, height, width, channels)) 27 | 28 | # Construct the network 29 | net = models.ResNet50UpProj({'data': input_node}, batch_size, 1, False) 30 | 31 | with tf.Session() as sess: 32 | 33 | # Load the converted parameters 34 | print('Loading the model') 35 | 36 | # Use to load from ckpt file 37 | saver = tf.train.Saver() 38 | saver.restore(sess, model_data_path) 39 | 40 | # Use to load from npy file 41 | #net.load(model_data_path, sess) 42 | 43 | # Evaluate the network for the given image 44 | pred = sess.run(net.get_output(), feed_dict={input_node: img}) 45 | 46 | # Plot result 47 | fig = plt.figure() 48 | ii = plt.imshow(pred[0,:,:,0], interpolation='nearest') 49 | fig.colorbar(ii) 50 | plt.show() 51 | 52 | return pred 53 | 54 | 55 | def main(): 56 | # Parse arguments 57 | parser = argparse.ArgumentParser() 58 | parser.add_argument('model_path', help='Converted parameters for the model') 59 | parser.add_argument('image_path', help='Path to the input image') 60 | args = parser.parse_args() 61 | 62 | # Predict the image 63 | pred = predict(args.model_path, args.image_path) 64 | 65 | os._exit(0) 66 | 67 | if __name__ == '__main__': 68 | main() 69 | 70 | 71 | 72 | 73 | 74 | --------------------------------------------------------------------------------
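The script above can also be adapted for programmatic, multi-image use. A minimal sketch (not part of the repository; the file names are placeholders and the `.npy` branch assumes the converted NYU weights from the Models section of the README):

```python
import numpy as np
import tensorflow as tf
from PIL import Image

import models

# Build the graph once and reuse it for several images.
input_node = tf.placeholder(tf.float32, shape=(None, 228, 304, 3))
net = models.ResNet50UpProj({'data': input_node}, batch=1, keep_prob=1, is_training=False)

with tf.Session() as sess:
    # Load the converted .npy weights instead of a .ckpt checkpoint.
    net.load('NYU_ResNet-UpProj.npy', sess)     # placeholder path
    for path in ['image1.jpg', 'image2.jpg']:   # placeholder image files
        img = np.asarray(Image.open(path).resize((304, 228)), dtype=np.float32)
        depth = sess.run(net.get_output(), feed_dict={input_node: img[None]})
        np.save(path + '_depth.npy', depth[0, :, :, 0])  # save the predicted depth map
```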