├── README.md
├── co_transforms.py
├── datasets
│   ├── __init__.py
│   ├── listdataset.py
│   ├── scenelistdataset.py
│   └── stillbox.py
├── images
│   ├── dataset.gif
│   └── still.gif
├── loss.py
├── models
│   ├── DepthNet.py
│   ├── __init__.py
│   └── utils.py
├── requirements.txt
├── run_inference.py
├── terminal_logger.py
├── train.py
└── util.py

/README.md:
--------------------------------------------------------------------------------
# DepthNet training on Still Box

### [Project page](http://perso.ensta-paristech.fr/~pinard/depthnet/)

This code can replicate the results of our paper published at UAVg-17.
If you use this repo in your work, please cite us with the following BibTeX:

```
@Article{isprs-annals-IV-2-W3-67-2017,
AUTHOR = {Pinard, C. and Chevalley, L. and Manzanera, A. and Filliat, D.},
TITLE = {END-TO-END DEPTH FROM MOTION WITH STABILIZED MONOCULAR VIDEOS},
JOURNAL = {ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences},
VOLUME = {IV-2/W3},
YEAR = {2017},
PAGES = {67--74},
URL = {https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W3/67/2017/},
DOI = {10.5194/isprs-annals-IV-2-W3-67-2017}
}
```

![depthnet](images/still.gif)

**[End-to-end depth from motion with stabilized monocular videos](https://hal.archives-ouvertes.fr/hal-01587652v1)**

* This code shows how the translational movement of the camera alone can be leveraged to compute a very precise depth map, even at more than 300 times the displacement.
* Thus, for a camera movement of 30cm (the nominal displacement used here), you can see as far as 100m.

See our second paper for information about using this code on real videos with speed estimation:

**[Multi range Real-time depth inference from a monocular stabilized footage using a Fully Convolutional Neural Network](https://hal.archives-ouvertes.fr/hal-01587658v1)**

*Click below for the video*

[![youtube video](http://img.youtube.com/vi/nU-Gv_I7zhg/0.jpg)](http://www.youtube.com/watch?v=nU-Gv_I7zhg)

## DepthNet

DepthNet is a network designed to infer a depth map directly from a pair of stabilized images.

* No information is given about the movement direction
* DepthNet is fully convolutional, which makes it completely robust to optical center faults
* This network only works for pinhole-like pictures

## Still Box

![stillbox](images/dataset.gif)

Still Box is a dataset created specifically for supervised training of depth map inference for stabilized aerial footage. It tries to mimic typical drone footage of static scenes, and depth is **impossible** to infer from a single image, as shapes come in all kinds of sizes and positions.

* You can download it [here](https://stillbox.ensta.fr)
* The dataset webpage also provides a tutorial on how to read the data (a short sketch is also shown below)
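As a minimal sketch of how the scene metadata can be read: the field names below mirror what `datasets/stillbox.py` and `datasets/scenelistdataset.py` in this repo expect (one subfolder per dataset part, each with a `metadata.json`); the path is a placeholder, and the dataset webpage remains the authoritative reference.

```python
import json
from pathlib import Path

root = Path('/path/to/still_box/64')  # one resolution of the dataset

for sub_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    with open(sub_dir / 'metadata.json') as f:
        metadata = json.load(f)
    for scene in metadata['scenes']:
        imgs = scene['imgs']            # image file names, in temporal order
        depths = scene['depth']         # ground-truth depth maps (numpy files), one per image
        speed = scene['speed']          # camera speed vector for this scene
        time_step = scene['time_step']  # time elapsed between two consecutive frames
        # the displacement between frames i and i+shift is shift * time_step * speed
```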
## Training

### Requirements

```
[sudo] pip3 install -r requirements.txt
```

If you want to log some outputs from the validation set with the `--log-output` option, you need the OpenCV Python bindings to convert depth to RGB with a rainbow colormap.
> *If you don't have OpenCV, grayscale images will be logged*

### Usage

Best results can be obtained by training on *still box 64* and then fine-tuning successively up to the resolution you target. Here are the parameters used for the paper *(please note how learning rate and batch size are changed; training was done on a single GTX 980Ti)*.

```
python3 train.py -j8 --lr 0.01 /path/to/still_box/64/ --log-output --activation-function elu --bn
```

```
python3 train.py -j8 --lr 0.01 /path/to/still_box/128/ --log-output --activation-function elu --bn --pretrained /path/to/DepthNet64
```

```
python3 train.py -j8 --lr 0.001 /path/to/still_box/256/ --log-output --activation-function elu --bn -b64 --pretrained /path/to/DepthNet128
```

```
python3 train.py -j8 --lr 0.001 /path/to/still_box/512/ --log-output --activation-function elu --bn -b16 --pretrained /path/to/DepthNet256
```

> **Note**: You can skip the 128 and 256 trainings if you don't have time; results will only be slightly worse. However, you need to do the 64 training first, as stated in our first paper. This might have something to do with either the size of the 64 dataset (in terms of number of scenes) or the fact that feature maps are reduced down to 1x1, making the last convolution equivalent to a fully connected operation.

### Pretrained networks

Best results were obtained with ELU for the depth activation (not mentioned in the original paper), along with BatchNorm.

|Name | training set | Error (m)| Download |
|:----------------------------|-------------:|---------:|----------|
|`DepthNet_elu_bn_64.pth.tar` | 64| 4.65 |[Link](http://perso.ensta-paristech.fr/~pinard/depthnet/pretrained/DepthNet_elu_bn_64.pth.tar) |
|`DepthNet_elu_bn_128.pth.tar`| 128| 3.08 |[Link](http://perso.ensta-paristech.fr/~pinard/depthnet/pretrained/DepthNet_elu_bn_128.pth.tar)|
|`DepthNet_elu_bn_256.pth.tar`| 256| 2.29 |[Link](http://perso.ensta-paristech.fr/~pinard/depthnet/pretrained/DepthNet_elu_bn_256.pth.tar)|
|`DepthNet_elu_bn_512.pth.tar`| 512| 1.97 |[Link](http://perso.ensta-paristech.fr/~pinard/depthnet/pretrained/DepthNet_elu_bn_512.pth.tar)|

All the networks have the same size and the same structure.


### Custom FOV and focal length

Every image in Still Box has a 90° FOV (field of view); the focal length (in pixels) is then respectively:

* 32px for 64x64 images
* 64px for 128x128 images
* 128px for 256x256 images
* 256px for 512x512 images

Training is not flexible with respect to focal length, so for a custom focal length you will have to run a dedicated training.

If you need to use a custom focal length and FOV, you can simply crop and resize the pictures.

Say you have a picture of width `w` with an associated FOV `fov`. To get an equivalent image from one of the datasets, you can first crop the Still Box pictures so that the FOV matches `fov` (cropping doesn't affect focal length in pixels), and then resize them to `w`. Note that DepthNet can take rectangular pictures as input.

`cropped_w = w/tan(pi*fov/360)`

We naturally recommend doing this operation offline; the metadata from `metadata.json` won't need to be altered.
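All of these quantities follow from the standard pinhole model. Below is a minimal sketch (plain Python, not part of this repository) that you can use to sanity-check focal lengths, FOVs and crop/resize widths for your own camera:

```python
from math import atan, pi, tan


def focal_from_fov(width_px, fov_deg):
    """Focal length (pixels) of a pinhole camera with the given image width and horizontal FOV."""
    return width_px / (2 * tan(pi * fov_deg / 360))


def fov_from_focal(width_px, focal_px):
    """Horizontal FOV (degrees) of a pinhole camera with the given image width and focal length."""
    return 360 * atan(width_px / (2 * focal_px)) / pi


def width_for_fov(focal_px, fov_deg):
    """Image width (pixels) that spans the given horizontal FOV at the given focal length."""
    return 2 * focal_px * tan(pi * fov_deg / 360)


# Still Box images have a 90° FOV, so the focal length is half the image width:
for width in (64, 128, 256, 512):
    print(width, focal_from_fov(width, 90))  # ~32, 64, 128 and 256 pixels
```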
#### With pretrained DepthNet

Thanks to its fully convolutional architecture, DepthNet is flexible with respect to FOV as long as you can resize your test pictures and the FOV stays below 90° (or the maximum FOV encountered during training). Referring back to our width `w` and FOV `fov`, for a network trained with a particular focal length `f`, the width to resize to is:

`resized_w = 2*f*tan(pi*fov/360)`

That way, you won't have to run a dedicated training or even download the Still Box dataset.

----
> **/!\ These equations are only valid for pinhole-equivalent cameras. Be sure to correct distortion before using DepthNet.**

## Testing Inference

The `run_inference.py` script lets you run inference on a folder of images and save the depth maps with different visualizations.

A simple Still Box scene of `512x512` pictures for testing can be downloaded [here](http://perso.ensta-paristech.fr/~pinard/stub_box.zip).
Otherwise, any folder with a list of jpg images will do, provided you follow the guidelines above.

```bash
python3 run_inference.py --output-depth --no-resize --dataset-dir /path/to/stub_box --pretrained /path/to/DepthNet512 --frame-shift 3 --output-dir /path/to/save/outputs
```


## Visualise training

Training can be visualized via tensorboard by launching this command in another terminal:
```
tensorboard --logdir=/path/to/DepthNet/Results
```

You can then access the board from any computer on the local network by pointing a web browser at `machine_ip:6006`, just as with a regular tensorboard server. More info [here](https://www.tensorflow.org/get_started/summaries_and_tensorboard)
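## Loading a pretrained checkpoint in your own code

The checkpoints listed above store their hyperparameters alongside the weights; `run_inference.py` and `train.py` use them to rebuild the network. Below is a minimal sketch mirroring what `run_inference.py` does (the checkpoint path is a placeholder):

```python
import torch

from models import DepthNet

# 'bn', 'activation_function', 'clamp' and 'state_dict' are the keys saved by train.py
weights = torch.load('/path/to/DepthNet_elu_bn_512.pth.tar', map_location='cpu')

depth_net = DepthNet(batch_norm=weights['bn'],
                     depth_activation=weights['activation_function'],
                     clamp=weights['clamp'])
depth_net.load_state_dict(weights['state_dict'])
depth_net.eval()

# DepthNet expects the two stabilized frames concatenated along the channel
# dimension (6 channels), normalized as in run_inference.py:
# tensor = (tensor / 255 - 0.5) / 0.2
```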
--------------------------------------------------------------------------------
/co_transforms.py:
--------------------------------------------------------------------------------
 1 | from __future__ import division
 2 | import torch
 3 | import random
 4 | import numpy as np
 5 | import types
 6 | 
 7 | '''Set of random transform routines that take both input and target as arguments,
 8 | in order to have random but coherent transformations.
 9 | inputs are ndarray pairs and targets are ndarrays'''
10 | 
11 | 
12 | class Compose(object):
13 |     """Compose several co_transforms together.
14 |     For example:
15 |     >>> co_transforms.Compose([
16 |     >>>     co_transforms.CenterCrop(10),
17 |     >>>     co_transforms.ToTensor(),
18 |     >>> ])
19 |     """
20 | 
21 |     def __init__(self, co_transforms):
22 |         self.co_transforms = co_transforms
23 | 
24 |     def __call__(self, input, target, displacement):
25 |         for t in self.co_transforms:
26 |             input, target, displacement = t(input, target, displacement)
27 |         return input, target, displacement
28 | 
29 | 
30 | class ArrayToTensor(object):
31 |     """Converts a numpy.ndarray (H x W x C) to a torch.FloatTensor of shape (C x H x W)."""
32 | 
33 |     def __call__(self, array):
34 |         assert(isinstance(array, np.ndarray))
35 |         if array.ndim == 3:
36 |             array = np.transpose(array, (2, 0, 1))
37 |         # handle numpy array
38 |         tensor = torch.from_numpy(array)
39 |         # put it from HWC to CHW format
40 |         return tensor.float()
41 | 
42 | 
43 | class Clip(object):
44 | 
45 |     def __init__(self, x, y):
46 |         self.x = x
47 |         self.y = y
48 | 
49 |     def __call__(self, array):
50 |         assert(isinstance(array, np.ndarray))
51 |         return np.clip(array, self.x, self.y)
52 | 
53 | 
54 | class Lambda(object):
55 |     """Applies a lambda as a transform"""
56 | 
57 |     def __init__(self, lambd):
58 |         assert isinstance(lambd, types.LambdaType)
59 |         self.lambd = lambd
60 | 
61 |     def __call__(self, input, target, displacement):
62 |         return self.lambd(input, target, displacement)
63 | 
64 | 
65 | class RandomHorizontalFlip(object):
66 |     """Randomly horizontally flips the given numpy array with a probability of 0.5"""
67 | 
68 |     def __call__(self, inputs, target, displacement):
69 |         if random.random() < 0.5:
70 |             inputs[0] = np.copy(np.fliplr(inputs[0]))
71 |             inputs[1] = np.copy(np.fliplr(inputs[1]))
72 |             target = np.copy(np.fliplr(target))
73 |             displacement[0] *= -1
74 |         return inputs, target, displacement
75 | 
76 | 
77 | class RandomVerticalFlip(object):
78 |     """Randomly vertically flips the given numpy array with a probability of 0.5"""
79 | 
80 |     def __call__(self, inputs, target, displacement):
81 |         if random.random() < 0.5:
82 |             inputs[0] = np.copy(np.flipud(inputs[0]))
83 |             inputs[1] = np.copy(np.flipud(inputs[1]))
84 |             target = np.copy(np.flipud(target))
85 |             displacement[1] *= -1
86 |         return inputs, target, displacement
--------------------------------------------------------------------------------
/datasets/__init__.py:
--------------------------------------------------------------------------------
 1 | from .stillbox import still_box
--------------------------------------------------------------------------------
/datasets/listdataset.py:
--------------------------------------------------------------------------------
 1 | import torch.utils.data as data
 2 | from imageio import imread
 3 | import numpy as np
 4 | 
 5 | 
 6 | def default_loader(root, path_imgs, path_depth):
 7 |     imgs = [imread(root/path) for path in path_imgs]
 8 |     depth = np.load(root/path_depth)
 9 |     return [imgs, depth]
10 | 
11 | 
12 | class ListDataset(data.Dataset):
13 |     def __init__(self, root, path_list, transform=None, target_transform=None,
14 |                  co_transform=None, loader=default_loader):
15 | 
16 |         self.root = root
17 |         self.path_list = path_list
18 |         self.transform = transform
19 |         self.target_transform = target_transform
20 |         self.co_transform = co_transform
21 |         self.loader = loader
22 | 
23 |     def __getitem__(self, index):
24 |         inputs, target, displacement = self.path_list[index]
25 |         inputs, target = self.loader(self.root, inputs, target)
26 |         if self.co_transform is not None:
27 |             inputs, target, displacement = self.co_transform(inputs, target, 
displacement) 28 | if self.transform is not None: 29 | inputs[0] = self.transform(inputs[0]) 30 | inputs[1] = self.transform(inputs[1]) 31 | if self.target_transform is not None: 32 | target = self.target_transform(target) 33 | 34 | return inputs, target, displacement 35 | 36 | def __len__(self): 37 | return len(self.path_list) 38 | -------------------------------------------------------------------------------- /datasets/scenelistdataset.py: -------------------------------------------------------------------------------- 1 | import torch.utils.data as data 2 | from imageio import imread 3 | import numpy as np 4 | 5 | 6 | def default_loader(root, path_imgs, path_depth): 7 | imgs = [imread(root/path) for path in path_imgs] 8 | depth = np.load(root/path_depth) 9 | return [imgs, depth] 10 | 11 | 12 | class SceneListDataset(data.Dataset): 13 | def __init__(self, root, scene_list, shift=3, transform=None, target_transform=None, 14 | co_transform=None, loader=default_loader): 15 | 16 | self.root = root 17 | self.scene_list = scene_list 18 | self.indices = [] 19 | for i, scene in enumerate(scene_list): 20 | self.indices.extend([i for j in scene['imgs']]) 21 | self.transform = transform 22 | self.target_transform = target_transform 23 | self.co_transform = co_transform 24 | self.loader = loader 25 | self.shift = shift 26 | 27 | def __getitem__(self, index): 28 | scene = self.scene_list[self.indices[index]] 29 | 30 | i1 = np.random.randint(0, len(scene['imgs'])) 31 | shift = round(2*self.shift*np.random.uniform()) 32 | i2 = min(len(scene['imgs'])-1, i1+shift) 33 | displacement = scene['time_step']*np.array(scene['speed']).astype(np.float32)*self.shift 34 | 35 | if np.random.uniform() > 0.5: 36 | # swap i1 and i2 37 | i1, i2 = i2, i1 38 | displacement *= -1 39 | 40 | inputs = [scene['imgs'][i1], scene['imgs'][i2]] 41 | target = scene['depth'][i2] 42 | inputs, target = self.loader(self.root/scene['subdir'], inputs, target) 43 | 44 | if i1 == i2: 45 | target.fill(100) 46 | else: 47 | target *= self.shift/np.abs(i2-i1) 48 | if self.co_transform is not None: 49 | inputs, target, displacement = self.co_transform(inputs, target, displacement) 50 | if self.transform is not None: 51 | inputs[0] = self.transform(inputs[0]) 52 | inputs[1] = self.transform(inputs[1]) 53 | if self.target_transform is not None: 54 | target = self.target_transform(target) 55 | return inputs, target, displacement 56 | 57 | def __len__(self): 58 | return len(self.indices) 59 | -------------------------------------------------------------------------------- /datasets/stillbox.py: -------------------------------------------------------------------------------- 1 | import random 2 | import math 3 | from .listdataset import ListDataset 4 | from .scenelistdataset import SceneListDataset 5 | import json 6 | from path import Path 7 | import numpy as np 8 | 9 | 10 | def make_dataset(root_dir, split=0, shift=3, seed=None): 11 | """Will search for subfolder and will read metadata json files.""" 12 | global args 13 | random.seed(seed) 14 | scenes = [] 15 | for sub_dir in root_dir.dirs(): 16 | metadata_path = sub_dir/'metadata.json' 17 | with open(metadata_path, 'r') as f: 18 | metadata = json.load(f) 19 | for scene in metadata['scenes']: 20 | scene['subdir'] = sub_dir.basename() 21 | scenes.extend(metadata['scenes']) 22 | 23 | assert(len(scenes) > 0) 24 | random.shuffle(scenes) 25 | split_index = math.floor(len(scenes)*split/100) 26 | assert(split_index >= 0 and split_index <= len(scenes)) 27 | train_scenes = scenes[:split_index] 28 | 
test_images = [] 29 | if split_index < len(scenes): 30 | for scene in scenes[split_index+1:]: 31 | imgs = scene['imgs'] 32 | for i in range(len(imgs)-shift): 33 | img_pair = [str(scene['subdir']/imgs[i]), str(scene['subdir']/imgs[i+shift])] 34 | depth = str(scene['subdir']/scene['depth'][i + shift]) 35 | displacement = np.array(scene['speed']).astype(np.float32)*shift*scene['time_step'] 36 | test_images.append( 37 | [img_pair, 38 | depth, 39 | displacement] 40 | ) 41 | return (train_scenes, test_images) 42 | 43 | 44 | def still_box(root, transform=None, target_transform=None, 45 | co_transform=None, split=80, shift=3, seed=None): 46 | root = Path(root) 47 | train_scenes, test_list = make_dataset(root, split, shift, seed) 48 | train_dataset = SceneListDataset(root, train_scenes, shift, transform, target_transform, co_transform) 49 | test_dataset = ListDataset(root, test_list, transform, target_transform) 50 | 51 | return train_dataset, test_dataset -------------------------------------------------------------------------------- /images/dataset.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ClementPinard/DepthNet/3c753fc21b06c9be307d73c8e7a0c61f2ea56cc3/images/dataset.gif -------------------------------------------------------------------------------- /images/still.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ClementPinard/DepthNet/3c753fc21b06c9be307d73c8e7a0c61f2ea56cc3/images/still.gif -------------------------------------------------------------------------------- /loss.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.nn.functional as F 3 | 4 | 5 | def depth_metric_reconstruction_loss(depth, target, weights=None, loss='L1', normalize=False): 6 | def one_scale(depth, target, loss_function, normalize): 7 | b, h, w = depth.size() 8 | 9 | target_scaled = F.interpolate(target.unsqueeze(1), size=(h, w), mode='area')[:,0] 10 | 11 | diff = depth-target_scaled 12 | 13 | if normalize: 14 | diff = diff/target_scaled 15 | 16 | return loss_function(diff, depth.detach()*0) 17 | 18 | if weights is not None: 19 | assert(len(weights) == len(depth)) 20 | else: 21 | weights = [1 for d in depth] 22 | if type(depth) not in [list, tuple]: 23 | depth = [depth] 24 | 25 | if type(loss) is str: 26 | assert(loss in ['L1', 'MSE', 'SmoothL1']) 27 | 28 | if loss == 'L1': 29 | loss_function = nn.L1Loss() 30 | elif loss == 'MSE': 31 | loss_function = nn.MSELoss() 32 | elif loss == 'SmoothL1': 33 | loss_function = nn.SmoothL1Loss() 34 | else: 35 | loss_function = loss 36 | 37 | loss_output = 0 38 | for d, w in zip(depth, weights): 39 | loss_output += w*one_scale(d, target, loss_function, normalize) 40 | return loss_output -------------------------------------------------------------------------------- /models/DepthNet.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import torch.nn as nn 3 | from models.utils import conv, deconv, predict_depth, post_process_depth, adaptative_cat, init_modules 4 | 5 | 6 | class DepthNet(nn.Module): 7 | 8 | def __init__(self, batch_norm=False, with_confidence=False, clamp=False, depth_activation=None): 9 | super(DepthNet, self).__init__() 10 | 11 | self.clamp = clamp 12 | if depth_activation == 'elu': 13 | self.depth_activation = lambda x: nn.functional.elu(x) + 1 14 | else: 15 | 
self.depth_activation = depth_activation 16 | 17 | self.conv1 = conv( 6, 32, stride=2, batch_norm=batch_norm) 18 | self.conv2 = conv( 32, 64, stride=2, batch_norm=batch_norm) 19 | self.conv3 = conv( 64, 128, stride=2, batch_norm=batch_norm) 20 | self.conv3_1 = conv(128, 128, batch_norm=batch_norm) 21 | self.conv4 = conv(128, 256, stride=2, batch_norm=batch_norm) 22 | self.conv4_1 = conv(256, 256, batch_norm=batch_norm) 23 | self.conv5 = conv(256, 256, stride=2, batch_norm=batch_norm) 24 | self.conv5_1 = conv(256, 256, batch_norm=batch_norm) 25 | self.conv6 = conv(256, 512, stride=2, batch_norm=batch_norm) 26 | self.conv6_1 = conv(512, 512, batch_norm=batch_norm) 27 | 28 | self.deconv5 = deconv(512, 256, batch_norm=batch_norm) 29 | self.deconv4 = deconv(513, 128, batch_norm=batch_norm) 30 | self.deconv3 = deconv(385, 64, batch_norm=batch_norm) 31 | self.deconv2 = deconv(193, 32, batch_norm=batch_norm) 32 | 33 | self.predict_depth6 = predict_depth(512, with_confidence) 34 | self.predict_depth5 = predict_depth(513, with_confidence) 35 | self.predict_depth4 = predict_depth(385, with_confidence) 36 | self.predict_depth3 = predict_depth(193, with_confidence) 37 | self.predict_depth2 = predict_depth( 97, with_confidence) 38 | 39 | self.upsampled_depth6_to_5 = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False) 40 | self.upsampled_depth5_to_4 = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False) 41 | self.upsampled_depth4_to_3 = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False) 42 | self.upsampled_depth3_to_2 = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False) 43 | 44 | init_modules(self) 45 | 46 | def forward(self, x): 47 | out_conv2 = self.conv2(self.conv1(x)) 48 | out_conv3 = self.conv3_1(self.conv3(out_conv2)) 49 | out_conv4 = self.conv4_1(self.conv4(out_conv3)) 50 | out_conv5 = self.conv5_1(self.conv5(out_conv4)) 51 | out_conv6 = self.conv6_1(self.conv6(out_conv5)) 52 | 53 | out6 = self.predict_depth6(out_conv6) 54 | depth6 = post_process_depth(out6, clamp=self.clamp, activation_function=self.depth_activation) 55 | depth6_up = self.upsampled_depth6_to_5(out6) 56 | out_deconv5 = self.deconv5(out_conv6) 57 | 58 | concat5 = adaptative_cat(out_conv5, out_deconv5, depth6_up) 59 | out5 = self.predict_depth5(concat5) 60 | depth5 = post_process_depth(out5, clamp=self.clamp, activation_function=self.depth_activation) 61 | depth5_up = self.upsampled_depth5_to_4(out5) 62 | out_deconv4 = self.deconv4(concat5) 63 | 64 | concat4 = adaptative_cat(out_conv4, out_deconv4, depth5_up) 65 | out4 = self.predict_depth4(concat4) 66 | depth4 = post_process_depth(out4, clamp=self.clamp, activation_function=self.depth_activation) 67 | depth4_up = self.upsampled_depth4_to_3(out4) 68 | out_deconv3 = self.deconv3(concat4) 69 | 70 | concat3 = adaptative_cat(out_conv3, out_deconv3, depth4_up) 71 | out3 = self.predict_depth3(concat3) 72 | depth3 = post_process_depth(out3, clamp=self.clamp, activation_function=self.depth_activation) 73 | depth3_up = self.upsampled_depth3_to_2(out3) 74 | out_deconv2 = self.deconv2(concat3) 75 | 76 | concat2 = adaptative_cat(out_conv2, out_deconv2, depth3_up) 77 | out2 = self.predict_depth2(concat2) 78 | depth2 = post_process_depth(out2, clamp=self.clamp, activation_function=self.depth_activation) 79 | 80 | if self.training: 81 | return [depth2, depth3, depth4, depth5, depth6] 82 | else: 83 | return depth2 -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | from .DepthNet import 
DepthNet -------------------------------------------------------------------------------- /models/utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import torch.nn as nn 5 | from torch.nn.init import xavier_normal_, constant_ 6 | 7 | 8 | def conv(in_planes, out_planes, stride=1, batch_norm=False): 9 | if batch_norm: 10 | return nn.Sequential( 11 | nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False), 12 | nn.BatchNorm2d(out_planes, eps=1e-3), 13 | nn.ReLU(inplace=True) 14 | ) 15 | else: 16 | return nn.Sequential( 17 | nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=True), 18 | nn.ReLU(inplace=True) 19 | ) 20 | 21 | 22 | def deconv(in_planes, out_planes, batch_norm=False): 23 | if batch_norm: 24 | return nn.Sequential( 25 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1, bias=True), 26 | nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1, bias=False), 27 | nn.BatchNorm2d(out_planes, eps=1e-3), 28 | nn.ReLU(inplace=True) 29 | ) 30 | else: 31 | return nn.Sequential( 32 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1, bias=True), 33 | nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1, bias=True), 34 | nn.ReLU(inplace=True) 35 | ) 36 | 37 | 38 | def predict_depth(in_planes, with_confidence): 39 | return nn.Conv2d(in_planes, 2 if with_confidence else 1, kernel_size=3, stride=1, padding=1, bias=True) 40 | 41 | 42 | def post_process_depth(depth, activation_function=None, clamp=False): 43 | if activation_function is not None: 44 | depth = activation_function(depth) 45 | 46 | if clamp: 47 | depth = depth.clamp(10, 60) 48 | 49 | return depth[:,0] 50 | 51 | 52 | def adaptative_cat(out_conv, out_deconv, out_depth_up): 53 | out_deconv = out_deconv[:, :, :out_conv.size(2), :out_conv.size(3)] 54 | out_depth_up = out_depth_up[:, :, :out_conv.size(2), :out_conv.size(3)] 55 | return torch.cat((out_conv, out_deconv, out_depth_up), 1) 56 | 57 | 58 | def init_modules(net): 59 | for m in net.modules(): 60 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 61 | xavier_normal_(m.weight) 62 | if m.bias is not None: 63 | constant_(m.bias, 0) 64 | elif isinstance(m, nn.BatchNorm2d): 65 | constant_(m.weight, 1) 66 | constant_(m.bias, 0) -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch>=0.4.1 2 | imageio 3 | argparse 4 | tensorboardX 5 | blessings 6 | progressbar2 7 | path.py 8 | -------------------------------------------------------------------------------- /run_inference.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | from scipy.misc import imread, imsave, imresize 4 | import numpy as np 5 | from path import Path 6 | import argparse 7 | from tqdm import tqdm 8 | 9 | import torch.nn.functional as F 10 | from models import DepthNet 11 | from util import tensor2array 12 | 13 | parser = argparse.ArgumentParser(description='Inference script for DepthNet img must be with no rotation', 14 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 15 | parser.add_argument("--output-disp", action='store_true', help="save disparity img") 16 | parser.add_argument("--output-depth", action='store_true', help="save depth img") 17 | 
parser.add_argument("--output-raw", action='store_true', help="save raw numpy depth array") 18 | 19 | parser.add_argument("--pretrained", required=True, type=str, help="pretrained DepthNet path") 20 | parser.add_argument("--frame-shift", default=1, type=int, help="temporal shift between imgs of the pairs feeded to the network") 21 | parser.add_argument("--img-height", default=512, type=int, help="Image height") 22 | parser.add_argument("--img-width", default=512, type=int, help="Image width") 23 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done") 24 | 25 | parser.add_argument("--dataset-list", default=None, type=str, help="Dataset list file") 26 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory") 27 | parser.add_argument("--output-dir", default='output', type=str, help="Output directory") 28 | 29 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob") 30 | 31 | 32 | @torch.no_grad() 33 | def main(): 34 | args = parser.parse_args() 35 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 36 | if not(args.output_disp or args.output_depth): 37 | print('You must at least output one value !') 38 | return 39 | 40 | weights = torch.load(args.pretrained) 41 | depth_net = DepthNet(batch_norm=weights['bn'], 42 | depth_activation=weights['activation_function'], 43 | clamp=weights['clamp']).to(device) 44 | print("running inference with {} ...".format(weights['arch'])) 45 | depth_net.load_state_dict(weights['state_dict']) 46 | depth_net.eval() 47 | 48 | dataset_dir = Path(args.dataset_dir) 49 | output_dir = Path(args.output_dir) 50 | output_dir.makedirs_p() 51 | 52 | if args.dataset_list is not None: 53 | with open(args.dataset_list, 'r') as f: 54 | test_files = [dataset_dir/file for file in f.read().splitlines()] 55 | else: 56 | test_files = sorted(sum([dataset_dir.files('*.{}'.format(ext)) for ext in args.img_exts], [])) 57 | 58 | print('{} files to test'.format(len(test_files))) 59 | 60 | for file1, file2 in tqdm(zip(test_files[:-args.frame_shift], test_files[args.frame_shift:])): 61 | 62 | img1 = imread(file1).astype(np.float32) 63 | img2 = imread(file2).astype(np.float32) 64 | 65 | h,w,_ = img1.shape 66 | assert(img1.shape == img2.shape), "img1 and img2 must be the same size" 67 | if (not args.no_resize) and (h != args.img_height or w != args.img_width): 68 | img1 = imresize(img1, (args.img_height, args.img_width)).astype(np.float32) 69 | img2 = imresize(img2, (args.img_height, args.img_width)).astype(np.float32) 70 | imgs = np.concatenate([np.transpose(img1, (2, 0, 1)), np.transpose(img2, (2, 0, 1))]) 71 | 72 | tensor_imgs = torch.from_numpy(imgs).unsqueeze(0).to(device) 73 | tensor_imgs = ((tensor_imgs/255 - 0.5)/0.2) 74 | 75 | output_depth = depth_net(tensor_imgs) 76 | 77 | upscaled_output = F.interpolate(output_depth.unsqueeze(1), (h,w), mode='bilinear', align_corners=False)[0,0] 78 | 79 | if args.output_disp: 80 | disp = 1/upscaled_output 81 | disp = (255*tensor2array(disp, max_value=None, colormap='bone')).astype(np.uint8) 82 | imsave(output_dir/'{}_disp{}'.format(file2.namebase, file2.ext), disp.transpose(1,2,0)) 83 | if args.output_depth: 84 | depth = (255*tensor2array(upscaled_output, max_value=100, colormap='rainbow')).astype(np.uint8) 85 | imsave(output_dir/'{}_depth{}'.format(file2.namebase, file2.ext), depth.transpose(1,2,0)) 86 | if args.output_raw: 87 | np.save(output_dir/'{}_depth.npy'.format(file2.namebase), 
output_depth.cpu()) 88 | 89 | 90 | if __name__ == '__main__': 91 | main() 92 | -------------------------------------------------------------------------------- /terminal_logger.py: -------------------------------------------------------------------------------- 1 | from blessings import Terminal 2 | import progressbar 3 | import sys 4 | 5 | 6 | class TermLogger(object): 7 | def __init__(self, n_epochs, train_size, test_size): 8 | self.n_epochs = n_epochs 9 | self.train_size = train_size 10 | self.test_size = test_size 11 | self.t = Terminal() 12 | s = 10 13 | e = 1 # epoch bar position 14 | tr = 3 # train bar position 15 | ts = 6 # test bar position 16 | h = self.t.height 17 | 18 | for i in range(10): 19 | print('') 20 | self.epoch_bar = progressbar.ProgressBar(max_value=n_epochs, fd=Writer(self.t, (0, h-s+e))) 21 | 22 | self.train_writer = Writer(self.t, (0, h-s+tr)) 23 | self.train_bar_writer = Writer(self.t, (0, h-s+tr+1)) 24 | 25 | self.test_writer = Writer(self.t, (0, h-s+ts)) 26 | self.test_bar_writer = Writer(self.t, (0, h-s+ts+1)) 27 | 28 | self.reset_train_bar() 29 | self.reset_test_bar() 30 | 31 | def reset_train_bar(self): 32 | self.train_bar = progressbar.ProgressBar(max_value=self.train_size, fd=self.train_bar_writer) 33 | 34 | def reset_test_bar(self): 35 | self.test_bar = progressbar.ProgressBar(max_value=self.test_size, fd=self.test_bar_writer) 36 | 37 | 38 | class Writer(object): 39 | """Create an object with a write method that writes to a 40 | specific place on the screen, defined at instantiation. 41 | 42 | This is the glue between blessings and progressbar. 43 | """ 44 | 45 | def __init__(self, t, location): 46 | """ 47 | Input: location - tuple of ints (x, y), the position 48 | of the bar in the terminal 49 | """ 50 | self.location = location 51 | self.t = t 52 | 53 | def write(self, string): 54 | with self.t.location(*self.location): 55 | sys.stdout.write("\033[K") 56 | print(string) 57 | 58 | def flush(self): 59 | return -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import time 3 | import csv 4 | import os 5 | 6 | import torch 7 | import torch.backends.cudnn as cudnn 8 | import torch.optim 9 | import torch.utils.data 10 | import torchvision.transforms as transforms 11 | import co_transforms 12 | import models 13 | import datasets 14 | from loss import depth_metric_reconstruction_loss as metric_loss 15 | from terminal_logger import TermLogger 16 | from tensorboardX import SummaryWriter 17 | 18 | import util 19 | from util import AverageMeter 20 | 21 | 22 | parser = argparse.ArgumentParser(description='PyTorch DepthNet Training on Still Box dataset') 23 | util.set_arguments(parser) 24 | 25 | best_error = -1 26 | n_iter = 0 27 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 28 | 29 | 30 | def main(): 31 | global args, best_error, viz 32 | args = util.set_params(parser) 33 | 34 | train_writer = SummaryWriter(args.save_path/'train') 35 | val_writer = SummaryWriter(args.save_path/'val') 36 | output_writers = [] 37 | if args.log_output: 38 | for i in range(3): 39 | output_writers.append(SummaryWriter(args.save_path/'val'/str(i))) 40 | torch.manual_seed(args.seed) 41 | 42 | # Data loading code 43 | mean = [0.5, 0.5, 0.5] 44 | std = [0.2, 0.2, 0.2] 45 | normalize = transforms.Normalize(mean=mean, 46 | std=std) 47 | input_transform = transforms.Compose([ 48 | co_transforms.ArrayToTensor(), 
49 | transforms.Normalize(mean=[0, 0, 0], std=[255, 255, 255]), 50 | normalize 51 | ]) 52 | target_transform = transforms.Compose([ 53 | co_transforms.Clip(0, 100), 54 | co_transforms.ArrayToTensor() 55 | ]) 56 | co_transform = co_transforms.Compose([ 57 | co_transforms.RandomVerticalFlip(), 58 | co_transforms.RandomHorizontalFlip() 59 | ]) 60 | 61 | print("=> fetching scenes in '{}'".format(args.data)) 62 | train_set, val_set = datasets.still_box( 63 | args.data, 64 | transform=input_transform, 65 | target_transform=target_transform, 66 | co_transform=co_transform, 67 | split=args.split, 68 | seed=args.seed 69 | ) 70 | print('{} samples found, {} train scenes and {} validation samples '.format(len(val_set)+len(train_set), 71 | len(train_set), 72 | len(val_set))) 73 | train_loader = torch.utils.data.DataLoader( 74 | train_set, batch_size=args.batch_size, shuffle=True, 75 | num_workers=args.workers, pin_memory=True) 76 | val_loader = torch.utils.data.DataLoader( 77 | val_set, batch_size=args.batch_size, 78 | shuffle=False, 79 | num_workers=args.workers, pin_memory=True) 80 | if args.epoch_size == 0: 81 | args.epoch_size = len(train_loader) 82 | # create model 83 | if args.pretrained: 84 | data = torch.load(args.pretrained) 85 | assert(not data['with_confidence']) 86 | print("=> using pre-trained model '{}'".format(data['arch'])) 87 | model = models.DepthNet(batch_norm=data['bn'], clamp=args.clamp, depth_activation=args.activation_function) 88 | model.load_state_dict(data['state_dict']) 89 | else: 90 | print("=> creating model '{}'".format(args.arch)) 91 | model = models.DepthNet(batch_norm=args.bn, clamp=args.clamp, depth_activation=args.activation_function) 92 | 93 | model = model.to(device) 94 | model = torch.nn.DataParallel(model) 95 | cudnn.benchmark = True 96 | 97 | assert(args.solver in ['adam', 'sgd']) 98 | print('=> setting {} solver'.format(args.solver)) 99 | if args.solver == 'adam': 100 | optimizer = torch.optim.Adam(model.parameters(), args.lr, 101 | betas=(args.momentum, args.beta), 102 | weight_decay=args.weight_decay) 103 | elif args.solver == 'sgd': 104 | optimizer = torch.optim.SGD(model.parameters(), args.lr, 105 | momentum=args.momentum, 106 | weight_decay=args.weight_decay, 107 | dampening=args.momentum) 108 | 109 | scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, 110 | milestones=[19,30,44,53], 111 | gamma=0.3) 112 | 113 | with open(os.path.join(args.save_path, args.log_summary), 'w') as csvfile: 114 | writer = csv.writer(csvfile, delimiter='\t') 115 | writer.writerow(['train_loss', 'train_depth_error', 'normalized_train_depth_error', 'depth_error', 'normalized_depth_error']) 116 | 117 | with open(os.path.join(args.save_path, args.log_full), 'w') as csvfile: 118 | writer = csv.writer(csvfile, delimiter='\t') 119 | writer.writerow(['train_loss', 'train_depth_error']) 120 | 121 | term_logger = TermLogger(n_epochs=args.epochs, train_size=min(len(train_loader), args.epoch_size), test_size=len(val_loader)) 122 | term_logger.epoch_bar.start() 123 | 124 | if args.evaluate: 125 | depth_error, normalized = validate(val_loader, model, 0, term_logger, output_writers) 126 | term_logger.test_writer.write(' * Depth error : {:.3f}, normalized : {:.3f}'.format(depth_error, normalized)) 127 | return 128 | 129 | for epoch in range(args.epochs): 130 | term_logger.epoch_bar.update(epoch) 131 | scheduler.step() 132 | 133 | # train for one epoch 134 | term_logger.reset_train_bar() 135 | term_logger.train_bar.start() 136 | train_loss, train_error, train_normalized_error = 
train(train_loader, model, optimizer, args.epoch_size, term_logger, train_writer) 137 | term_logger.train_writer.write(' * Avg Loss : {:.3f}, Avg Depth error : {:.3f}, normalized : {:.3f}' 138 | .format(train_loss, train_error, train_normalized_error)) 139 | train_writer.add_scalar('metric_error', train_error, epoch) 140 | train_writer.add_scalar('metric_normalized_error', train_normalized_error, epoch) 141 | 142 | # evaluate on validation set 143 | term_logger.reset_test_bar() 144 | term_logger.test_bar.start() 145 | depth_error, normalized = validate(val_loader, model, epoch, term_logger, output_writers) 146 | term_logger.test_writer.write(' * Depth error : {:.3f}, normalized : {:.3f}'.format(depth_error, normalized)) 147 | val_writer.add_scalar('metric_error', depth_error, epoch) 148 | val_writer.add_scalar('metric_normalized_error', normalized, epoch) 149 | 150 | if best_error < 0: 151 | best_error = depth_error 152 | 153 | # remember lowest error and save checkpoint 154 | is_best = depth_error < best_error 155 | best_error = min(depth_error, best_error) 156 | util.save_checkpoint( 157 | args.save_path, { 158 | 'epoch': epoch + 1, 159 | 'arch': args.arch, 160 | 'state_dict': model.state_dict(), 161 | 'best_error': best_error, 162 | 'bn': args.bn, 163 | 'with_confidence': False, 164 | 'activation_function': args.activation_function, 165 | 'clamp': args.clamp, 166 | 'mean': mean, 167 | 'std': std 168 | }, 169 | is_best) 170 | 171 | with open(os.path.join(args.save_path, args.log_summary), 'a') as csvfile: 172 | writer = csv.writer(csvfile, delimiter='\t') 173 | writer.writerow([train_loss, train_error, depth_error]) 174 | term_logger.epoch_bar.finish() 175 | 176 | 177 | def train(train_loader, model, optimizer, epoch_size, term_logger, train_writer): 178 | global n_iter, args 179 | batch_time = AverageMeter() 180 | data_time = AverageMeter() 181 | losses = AverageMeter() 182 | depth2_metric_errors = AverageMeter() 183 | depth2_normalized_errors = AverageMeter() 184 | 185 | # switch to train mode 186 | model.train() 187 | 188 | end = time.time() 189 | 190 | for i, (input, target, _) in enumerate(train_loader): 191 | # measure data loading time 192 | data_time.update(time.time() - end) 193 | target = target.to(device) 194 | input = torch.cat(input,1).to(device) 195 | 196 | # compute output 197 | output = model(input) 198 | 199 | loss = metric_loss(output, target, weights=(0.32, 0.08, 0.02, 0.01, 0.005), loss=args.loss) 200 | depth2_norm_error = metric_loss(output[0], target, normalize=True) 201 | depth2_metric_error = metric_loss(output[0], target, normalize=False) 202 | # record loss and EPE 203 | losses.update(loss.item(), target.size(0)) 204 | train_writer.add_scalar('train_loss', loss.item(), n_iter) 205 | depth2_metric_errors.update(depth2_metric_error.item(), target.size(0)) 206 | depth2_normalized_errors.update(depth2_norm_error.item(), target.size(0)) 207 | 208 | # compute gradient and do SGD step 209 | optimizer.zero_grad() 210 | loss.backward() 211 | optimizer.step() 212 | 213 | # measure elapsed time 214 | batch_time.update(time.time() - end) 215 | end = time.time() 216 | 217 | with open(os.path.join(args.save_path, args.log_full), 'a') as csvfile: 218 | writer = csv.writer(csvfile, delimiter='\t') 219 | writer.writerow([loss.item(), depth2_metric_error.item()]) 220 | term_logger.train_bar.update(i+1) 221 | if i % args.print_freq == 0: 222 | term_logger.train_writer.write( 223 | 'Train: Time {batch_time.val:.3f} ({batch_time.avg:.3f}) ' 224 | 'Data {data_time.val:.3f} 
({data_time.avg:.3f}) ' 225 | 'Loss {loss.val:.4f} ({loss.avg:.4f}) ' 226 | 'Depth error {depth2_error.val:.3f} ({depth2_error.avg:.3f})\r' 227 | .format(batch_time=batch_time, data_time=data_time, 228 | loss=losses, depth2_error=depth2_metric_errors)) 229 | if i >= epoch_size - 1: 230 | break 231 | n_iter += 1 232 | 233 | return losses.avg, depth2_metric_errors.avg, depth2_normalized_errors.avg 234 | 235 | 236 | @torch.no_grad() 237 | def validate(val_loader, model, epoch, logger, output_writers=[]): 238 | batch_time = AverageMeter() 239 | depth2_metric_errors = AverageMeter() 240 | depth2_norm_errors = AverageMeter() 241 | log_outputs = len(output_writers) > 0 242 | # switch to evaluate mode 243 | model.eval() 244 | 245 | end = time.time() 246 | 247 | for i, (input, target, _) in enumerate(val_loader): 248 | target = target.to(device) 249 | input = torch.cat(input, 1).to(device) 250 | # compute output 251 | output = model(input) 252 | if log_outputs and i < len(output_writers): # log first output of 3 first batches 253 | if epoch == 0: 254 | output_writers[i].add_image('GroundTruth', util.tensor2array(target[0], max_value=100), 0) 255 | output_writers[i].add_image('Inputs', util.tensor2array(input[0,:3]), 0) 256 | output_writers[i].add_image('Inputs', util.tensor2array(input[0,3:]), 1) 257 | output_writers[i].add_image('DepthNet Outputs', util.tensor2array(output[0], max_value=100), epoch) 258 | depth2_norm_error = metric_loss(output, target, normalize=True) 259 | depth2_metric_error = metric_loss(output, target, normalize=False) 260 | # record depth error 261 | depth2_norm_errors.update(depth2_norm_error.item(), target.size(0)) 262 | depth2_metric_errors.update(depth2_metric_error.item(), target.size(0)) 263 | 264 | # measure elapsed time 265 | batch_time.update(time.time() - end) 266 | end = time.time() 267 | logger.test_bar.update(i+1) 268 | if i % args.print_freq == 0: 269 | logger.test_writer.write( 270 | 'Validation: ' 271 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f}) ' 272 | 'Depth error {depth2_error.val:.3f} ({depth2_error.avg:.3f})' 273 | .format(batch_time=batch_time, 274 | depth2_error=depth2_metric_errors)) 275 | 276 | return depth2_metric_errors.avg, depth2_norm_errors.avg 277 | 278 | 279 | if __name__ == '__main__': 280 | main() 281 | -------------------------------------------------------------------------------- /util.py: -------------------------------------------------------------------------------- 1 | import shutil 2 | import datetime 3 | import torch 4 | from torch.autograd import Variable 5 | from path import Path 6 | import numpy as np 7 | 8 | 9 | def set_arguments(parser): 10 | parser.add_argument('data', metavar='DIR', 11 | help='path to dataset') 12 | parser.add_argument('--activation-function', default=None, 13 | help='activation function to apply to DepthNet') 14 | parser.add_argument('--bn', action='store_true', 15 | help='activate batchNorm (overwritten if pretrained model)') 16 | parser.add_argument('--clamp', action='store_true', 17 | help='activate depth clamping to (10,60) in forward pass') 18 | parser.add_argument('--solver', default='sgd', choices=['adam', 'sgd'], 19 | help='solvers: adam | sgd') 20 | parser.add_argument('-j', '--workers', default=4, type=int, metavar='N', 21 | help='number of data loading workers (default: 4)') 22 | parser.add_argument('--epochs', default=55, type=int, metavar='N', 23 | help='number of total epochs to run (default: 55') 24 | parser.add_argument('--epoch-size', default=0, type=int, metavar='N', 25 | help='manual 
epoch size (will match dataset size if not set)') 26 | parser.add_argument('-b', '--batch-size', default=256, type=int, 27 | metavar='N', help='mini-batch size (default: 256)') 28 | parser.add_argument('--lr', '--learning-rate', default=0.01, type=float, 29 | metavar='LR', help='initial learning rate') 30 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', 31 | help='momentum for sgd, alpha parameter for adam') 32 | parser.add_argument('--beta', default=0.999, type=float, metavar='M', 33 | help='beta parameters for adam') 34 | parser.add_argument('--weight-decay', '--wd', default=4e-4, type=float, 35 | metavar='W', help='weight decay (default: 4e-4)') 36 | parser.add_argument('--print-freq', '-p', default=10, type=int, 37 | metavar='N', help='print frequency (default: 10)') 38 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', 39 | help='evaluate model on validation set') 40 | parser.add_argument('--pretrained', dest='pretrained', default=None, 41 | help='path to pre-trained model') 42 | parser.add_argument('--seed', default=0, type=int, help='seed for random functions, test/train split, network initialization') 43 | parser.add_argument('-s', '--split', default=90, type=float, metavar='%', 44 | help='split percentage of train samples vs test (default: 90)') 45 | parser.add_argument('--log-summary', default='progress_log_summary.csv', 46 | help='csv where to save per-epoch train and test stats') 47 | parser.add_argument('--log-full', default='progress_log_full.csv', 48 | help='csv where to save per-gradient descent train stats') 49 | parser.add_argument('--no-date', action='store_true', 50 | help='don\'t append date timestamp to folder') 51 | parser.add_argument('--loss', default='L1', help='loss function to apply to multiScaleCriterion : L1 (default)| SmoothL1| MSE') 52 | parser.add_argument('--log-output', action='store_true', help='logs in tensorboard some outputs of the network during test phase. 
Needs OpenCV 3') 53 | 54 | 55 | def set_params(parser, with_confidence=False): 56 | args = parser.parse_args() 57 | args.data = Path(args.data) 58 | folder_name = args.data.normpath().name 59 | arch_string = 'DepthNet' 60 | if with_confidence: 61 | arch_string += '_confidence' 62 | if args.activation_function is not None: 63 | arch_string += '_'+args.activation_function 64 | if args.bn: 65 | arch_string += '_bn' 66 | if args.clamp: 67 | arch_string += '_clamp' 68 | args.arch = arch_string 69 | 70 | save_path = '{},{}epochs{},b{},lr{}'.format( 71 | args.solver, 72 | args.epochs, 73 | ',epochSize'+str(args.epoch_size) if args.epoch_size > 0 else '', 74 | args.batch_size, 75 | args.lr) 76 | save_path = Path(save_path) 77 | if not args.no_date: 78 | timestamp = datetime.datetime.now().strftime("%m-%d-%H:%M") 79 | save_path = save_path/timestamp 80 | args.save_path = Path('Results')/arch_string/folder_name/save_path 81 | print('=> will save everything to {}'.format(save_path)) 82 | args.save_path.makedirs_p() 83 | return args 84 | 85 | 86 | def save_checkpoint(save_path, state, is_best, filename='checkpoint.pth.tar'): 87 | torch.save(state, save_path/filename) 88 | if is_best: 89 | shutil.copyfile(save_path/filename, save_path/'model_best.pth.tar') 90 | 91 | 92 | class AverageMeter(object): 93 | """Computes and stores the average and current value.""" 94 | 95 | def __init__(self): 96 | self.reset() 97 | 98 | def reset(self): 99 | self.val = 0 100 | self.avg = 0 101 | self.sum = 0 102 | self.count = 0 103 | 104 | def update(self, val, n=1): 105 | self.val = val 106 | self.sum += val * n 107 | self.count += n 108 | self.avg = self.sum / self.count 109 | 110 | 111 | def adjust_learning_rate(optimizer, epoch): 112 | # Set the learning rate to the initial LR decayed by 2 after 300K iterations, 400K and 500K 113 | 114 | if epoch == 19 or epoch == 44: 115 | for param_group in optimizer.param_groups: 116 | param_group['lr'] = param_group['lr']/2 117 | if epoch == 30 or epoch == 53: 118 | for param_group in optimizer.param_groups: 119 | param_group['lr'] = param_group['lr']/5 120 | 121 | 122 | def tensor2array(tensor, max_value=255, colormap='rainbow'): 123 | tensor = tensor.detach().cpu() 124 | if max_value is None: 125 | max_value = tensor.max().item() 126 | if tensor.ndimension() == 2 or tensor.size(0) == 1: 127 | try: 128 | import cv2 129 | if int(cv2.__version__[0]) >= 3: 130 | color_cvt = cv2.COLOR_BGR2RGB 131 | else: # 2.4 132 | color_cvt = cv2.cv.CV_BGR2RGB 133 | if colormap == 'rainbow': 134 | colormap = cv2.COLORMAP_RAINBOW 135 | elif colormap == 'bone': 136 | colormap = cv2.COLORMAP_BONE 137 | array = (255*tensor.squeeze().numpy()/max_value).clip(0, 255).astype(np.uint8) 138 | colored_array = cv2.applyColorMap(array, colormap) 139 | array = cv2.cvtColor(colored_array, color_cvt).astype(np.float32)/255 140 | except ImportError: 141 | if tensor.ndimension() == 2: 142 | tensor.unsqueeze_(2) 143 | array = (tensor.expand(tensor.size(0), tensor.size(1), 3).numpy()/max_value).clip(0,1) 144 | array = array.transpose(2, 0, 1) 145 | 146 | elif tensor.ndimension() == 3: 147 | assert(tensor.size(0) == 3) 148 | array = 0.5 + tensor.numpy()*0.5 149 | return array --------------------------------------------------------------------------------