├── README.md
├── co_transforms.py
├── datasets
│   ├── __init__.py
│   ├── listdataset.py
│   ├── scenelistdataset.py
│   └── stillbox.py
├── images
│   ├── dataset.gif
│   └── still.gif
├── loss.py
├── models
│   ├── DepthNet.py
│   ├── __init__.py
│   └── utils.py
├── requirements.txt
├── run_inference.py
├── terminal_logger.py
├── train.py
└── util.py

/README.md:
--------------------------------------------------------------------------------
# DepthNet training on Still Box

### [Project page](http://perso.ensta-paristech.fr/~pinard/depthnet/)

This code can replicate the results of our paper published at UAVg-17.
If you use this repo in your work, please cite us with the following BibTeX:

```
@Article{isprs-annals-IV-2-W3-67-2017,
AUTHOR = {Pinard, C. and Chevalley, L. and Manzanera, A. and Filliat, D.},
TITLE = {END-TO-END DEPTH FROM MOTION WITH STABILIZED MONOCULAR VIDEOS},
JOURNAL = {ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences},
VOLUME = {IV-2/W3},
YEAR = {2017},
PAGES = {67--74},
URL = {https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W3/67/2017/},
DOI = {10.5194/isprs-annals-IV-2-W3-67-2017}
}
```

![depthnet](images/still.gif)

**[End-to-end depth from motion with stabilized monocular videos](https://hal.archives-ouvertes.fr/hal-01587652v1)**

* This code shows how the translational movement of the camera alone can be leveraged to compute a very precise depth map, even at more than 300 times the displacement.
* Thus, for a camera movement of 30cm (the nominal displacement used here), you can see as far as 100m.

See our second paper for information about using this code on real videos with speed estimation:

**[Multi range Real-time depth inference from a monocular stabilized footage using a Fully Convolutional Neural Network](https://hal.archives-ouvertes.fr/hal-01587658v1)**

*Click below for the video*

[![youtube video](http://img.youtube.com/vi/nU-Gv_I7zhg/0.jpg)](http://www.youtube.com/watch?v=nU-Gv_I7zhg)

## DepthNet

DepthNet is a network designed to infer a depth map directly from a pair of stabilized images.

* No information is given about the movement direction
* DepthNet is fully convolutional, which makes it completely robust to optical center faults
* This network only works for pinhole-like pictures

## Still Box

![stillbox](images/dataset.gif)

Still Box is a dataset created specifically for supervised training of depth map inference for stabilized aerial footage. It tries to mimic typical drone footage of static scenes, and depth is **impossible** to infer from a single image, as shapes come in all kinds of sizes and positions.

* You can download it [here](https://stillbox.ensta.fr)
* The dataset webpage also provides a tutorial on how to read the data (a short sketch is also shown below)
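As a minimal sketch of how the scene metadata can be read: the field names below mirror what `datasets/stillbox.py` and `datasets/scenelistdataset.py` in this repo expect (one subfolder per dataset part, each with a `metadata.json`); the path is a placeholder, and the dataset webpage remains the authoritative reference.

```python
import json
from pathlib import Path

root = Path('/path/to/still_box/64')  # one resolution of the dataset

for sub_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    with open(sub_dir / 'metadata.json') as f:
        metadata = json.load(f)
    for scene in metadata['scenes']:
        imgs = scene['imgs']            # image file names, in temporal order
        depths = scene['depth']         # ground-truth depth maps (numpy files), one per image
        speed = scene['speed']          # camera speed vector for this scene
        time_step = scene['time_step']  # time elapsed between two consecutive frames
        # the displacement between frames i and i+shift is shift * time_step * speed
```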
## Training

### Requirements

```
[sudo] pip3 install -r requirements.txt
```

If you want to log some outputs from the validation set with the `--log-output` option, you need the OpenCV Python bindings to convert depth to RGB with a rainbow colormap.
> *If you don't have OpenCV, grayscale images will be logged*

### Usage

Best results can be obtained by training on *still box 64* and then fine-tuning successively up to the resolution you target. Here are the parameters used for the paper *(please note how learning rate and batch size are changed; training was done on a single GTX 980Ti)*.

```
python3 train.py -j8 --lr 0.01 /path/to/still_box/64/ --log-output --activation-function elu --bn
```

```
python3 train.py -j8 --lr 0.01 /path/to/still_box/128/ --log-output --activation-function elu --bn --pretrained /path/to/DepthNet64
```

```
python3 train.py -j8 --lr 0.001 /path/to/still_box/256/ --log-output --activation-function elu --bn -b64 --pretrained /path/to/DepthNet128
```

```
python3 train.py -j8 --lr 0.001 /path/to/still_box/512/ --log-output --activation-function elu --bn -b16 --pretrained /path/to/DepthNet256
```

> **Note**: You can skip the 128 and 256 trainings if you don't have time; results will only be slightly worse. However, you need to do the 64 training first, as stated in our first paper. This might have something to do with either the size of the 64 dataset (in terms of number of scenes) or the fact that feature maps are reduced down to 1x1, making the last convolution equivalent to a fully connected operation.

### Pretrained networks

Best results were obtained with ELU for the depth activation (not mentioned in the original paper), along with BatchNorm.

|Name | training set | Error (m)| Download |
|:----------------------------|-------------:|---------:|----------|
|`DepthNet_elu_bn_64.pth.tar` | 64| 4.65 |[Link](http://perso.ensta-paristech.fr/~pinard/depthnet/pretrained/DepthNet_elu_bn_64.pth.tar) |
|`DepthNet_elu_bn_128.pth.tar`| 128| 3.08 |[Link](http://perso.ensta-paristech.fr/~pinard/depthnet/pretrained/DepthNet_elu_bn_128.pth.tar)|
|`DepthNet_elu_bn_256.pth.tar`| 256| 2.29 |[Link](http://perso.ensta-paristech.fr/~pinard/depthnet/pretrained/DepthNet_elu_bn_256.pth.tar)|
|`DepthNet_elu_bn_512.pth.tar`| 512| 1.97 |[Link](http://perso.ensta-paristech.fr/~pinard/depthnet/pretrained/DepthNet_elu_bn_512.pth.tar)|

All the networks have the same size and the same structure.


### Custom FOV and focal length

Every image in Still Box has a 90° FOV (field of view); the focal length (in pixels) is then respectively:

* 32px for 64x64 images
* 64px for 128x128 images
* 128px for 256x256 images
* 256px for 512x512 images

Training is not flexible with respect to focal length, so for a custom focal length you will have to run a dedicated training.

If you need to use a custom focal length and FOV, you can simply crop and resize the pictures.

Say you have a picture of width `w` with an associated FOV `fov`. To get an equivalent image from one of the datasets, you can first crop the Still Box pictures so that the FOV matches `fov` (cropping doesn't affect focal length in pixels), and then resize them to `w`. Note that DepthNet can take rectangular pictures as input.

`cropped_w = w/tan(pi*fov/360)`

We naturally recommend doing this operation offline; the metadata from `metadata.json` won't need to be altered.
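All of these quantities follow from the standard pinhole model. Below is a minimal sketch (plain Python, not part of this repository) that you can use to sanity-check focal lengths, FOVs and crop/resize widths for your own camera:

```python
from math import atan, pi, tan


def focal_from_fov(width_px, fov_deg):
    """Focal length (pixels) of a pinhole camera with the given image width and horizontal FOV."""
    return width_px / (2 * tan(pi * fov_deg / 360))


def fov_from_focal(width_px, focal_px):
    """Horizontal FOV (degrees) of a pinhole camera with the given image width and focal length."""
    return 360 * atan(width_px / (2 * focal_px)) / pi


def width_for_fov(focal_px, fov_deg):
    """Image width (pixels) that spans the given horizontal FOV at the given focal length."""
    return 2 * focal_px * tan(pi * fov_deg / 360)


# Still Box images have a 90° FOV, so the focal length is half the image width:
for width in (64, 128, 256, 512):
    print(width, focal_from_fov(width, 90))  # ~32, 64, 128 and 256 pixels
```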
#### With pretrained DepthNet

Thanks to its fully convolutional architecture, DepthNet is flexible with respect to FOV as long as you can resize your test pictures and the FOV stays below 90° (or the maximum FOV encountered during training). Referring back to our width `w` and FOV `fov`, for a network trained with a particular focal length `f`, the width to resize to is:

`resized_w = 2*f*tan(pi*fov/360)`

That way, you won't have to run a dedicated training or even download the Still Box dataset.

----
> **/!\ These equations are only valid for pinhole-equivalent cameras. Be sure to correct distortion before using DepthNet.**

## Testing Inference

The `run_inference.py` script lets you run inference on a folder of images and save the depth maps with different visualizations.

A simple Still Box scene of `512x512` pictures for testing can be downloaded [here](http://perso.ensta-paristech.fr/~pinard/stub_box.zip).
Otherwise, any folder with a list of jpg images will do, provided you follow the guidelines above.

```bash
python3 run_inference.py --output-depth --no-resize --dataset-dir /path/to/stub_box --pretrained /path/to/DepthNet512 --frame-shift 3 --output-dir /path/to/save/outputs
```


## Visualise training

Training can be visualized via tensorboard by launching this command in another terminal:
```
tensorboard --logdir=/path/to/DepthNet/Results
```

You can then access the board from any computer on the local network by pointing a web browser at `machine_ip:6006`, just as with a regular tensorboard server. More info [here](https://www.tensorflow.org/get_started/summaries_and_tensorboard)
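## Loading a pretrained checkpoint in your own code

The checkpoints listed above store their hyperparameters alongside the weights; `run_inference.py` and `train.py` use them to rebuild the network. Below is a minimal sketch mirroring what `run_inference.py` does (the checkpoint path is a placeholder):

```python
import torch

from models import DepthNet

# 'bn', 'activation_function', 'clamp' and 'state_dict' are the keys saved by train.py
weights = torch.load('/path/to/DepthNet_elu_bn_512.pth.tar', map_location='cpu')

depth_net = DepthNet(batch_norm=weights['bn'],
                     depth_activation=weights['activation_function'],
                     clamp=weights['clamp'])
depth_net.load_state_dict(weights['state_dict'])
depth_net.eval()

# DepthNet expects the two stabilized frames concatenated along the channel
# dimension (6 channels), normalized as in run_inference.py:
# tensor = (tensor / 255 - 0.5) / 0.2
```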
--------------------------------------------------------------------------------
/co_transforms.py:
--------------------------------------------------------------------------------
 1 | from __future__ import division
 2 | import torch
 3 | import random
 4 | import numpy as np
 5 | import types
 6 | 
 7 | '''Set of random transform routines that take both input and target as arguments,
 8 | in order to have random but coherent transformations.
 9 | inputs are ndarray pairs and targets are ndarrays'''
10 | 
11 | 
12 | class Compose(object):
13 |     """Compose several co_transforms together.
14 |     For example:
15 |     >>> co_transforms.Compose([
16 |     >>>     co_transforms.CenterCrop(10),
17 |     >>>     co_transforms.ToTensor(),
18 |     >>> ])
19 |     """
20 | 
21 |     def __init__(self, co_transforms):
22 |         self.co_transforms = co_transforms
23 | 
24 |     def __call__(self, input, target, displacement):
25 |         for t in self.co_transforms:
26 |             input, target, displacement = t(input, target, displacement)
27 |         return input, target, displacement
28 | 
29 | 
30 | class ArrayToTensor(object):
31 |     """Converts a numpy.ndarray (H x W x C) to a torch.FloatTensor of shape (C x H x W)."""
32 | 
33 |     def __call__(self, array):
34 |         assert(isinstance(array, np.ndarray))
35 |         if array.ndim == 3:
36 |             array = np.transpose(array, (2, 0, 1))
37 |         # handle numpy array
38 |         tensor = torch.from_numpy(array)
39 |         # put it from HWC to CHW format
40 |         return tensor.float()
41 | 
42 | 
43 | class Clip(object):
44 | 
45 |     def __init__(self, x, y):
46 |         self.x = x
47 |         self.y = y
48 | 
49 |     def __call__(self, array):
50 |         assert(isinstance(array, np.ndarray))
51 |         return np.clip(array, self.x, self.y)
52 | 
53 | 
54 | class Lambda(object):
55 |     """Applies a lambda as a transform"""
56 | 
57 |     def __init__(self, lambd):
58 |         assert isinstance(lambd, types.LambdaType)
59 |         self.lambd = lambd
60 | 
61 |     def __call__(self, input, target, displacement):
62 |         return self.lambd(input, target, displacement)
63 | 
64 | 
65 | class RandomHorizontalFlip(object):
66 |     """Randomly horizontally flips the given numpy array with a probability of 0.5"""
67 | 
68 |     def __call__(self, inputs, target, displacement):
69 |         if random.random() < 0.5:
70 |             inputs[0] = np.copy(np.fliplr(inputs[0]))
71 |             inputs[1] = np.copy(np.fliplr(inputs[1]))
72 |             target = np.copy(np.fliplr(target))
73 |             displacement[0] *= -1
74 |         return inputs, target, displacement
75 | 
76 | 
77 | class RandomVerticalFlip(object):
78 |     """Randomly vertically flips the given numpy array with a probability of 0.5"""
79 | 
80 |     def __call__(self, inputs, target, displacement):
81 |         if random.random() < 0.5:
82 |             inputs[0] = np.copy(np.flipud(inputs[0]))
83 |             inputs[1] = np.copy(np.flipud(inputs[1]))
84 |             target = np.copy(np.flipud(target))
85 |             displacement[1] *= -1
86 |         return inputs, target, displacement
--------------------------------------------------------------------------------
/datasets/__init__.py:
--------------------------------------------------------------------------------
 1 | from .stillbox import still_box
--------------------------------------------------------------------------------
/datasets/listdataset.py:
--------------------------------------------------------------------------------
 1 | import torch.utils.data as data
 2 | from imageio import imread
 3 | import numpy as np
 4 | 
 5 | 
 6 | def default_loader(root, path_imgs, path_depth):
 7 |     imgs = [imread(root/path) for path in path_imgs]
 8 |     depth = np.load(root/path_depth)
 9 |     return [imgs, depth]
10 | 
11 | 
12 | class ListDataset(data.Dataset):
13 |     def __init__(self, root, path_list, transform=None, target_transform=None,
14 |                  co_transform=None, loader=default_loader):
15 | 
16 |         self.root = root
17 |         self.path_list = path_list
18 |         self.transform = transform
19 |         self.target_transform = target_transform
20 |         self.co_transform = co_transform
21 |         self.loader = loader
22 | 
23 |     def __getitem__(self, index):
24 |         inputs, target, displacement = self.path_list[index]
25 |         inputs, target = self.loader(self.root, inputs, target)
26 |         if self.co_transform is not None:
27 |             inputs, target, displacement = self.co_transform(inputs, target, 
displacement) 28 | if self.transform is not None: 29 | inputs[0] = self.transform(inputs[0]) 30 | inputs[1] = self.transform(inputs[1]) 31 | if self.target_transform is not None: 32 | target = self.target_transform(target) 33 | 34 | return inputs, target, displacement 35 | 36 | def __len__(self): 37 | return len(self.path_list) 38 | -------------------------------------------------------------------------------- /datasets/scenelistdataset.py: -------------------------------------------------------------------------------- 1 | import torch.utils.data as data 2 | from imageio import imread 3 | import numpy as np 4 | 5 | 6 | def default_loader(root, path_imgs, path_depth): 7 | imgs = [imread(root/path) for path in path_imgs] 8 | depth = np.load(root/path_depth) 9 | return [imgs, depth] 10 | 11 | 12 | class SceneListDataset(data.Dataset): 13 | def __init__(self, root, scene_list, shift=3, transform=None, target_transform=None, 14 | co_transform=None, loader=default_loader): 15 | 16 | self.root = root 17 | self.scene_list = scene_list 18 | self.indices = [] 19 | for i, scene in enumerate(scene_list): 20 | self.indices.extend([i for j in scene['imgs']]) 21 | self.transform = transform 22 | self.target_transform = target_transform 23 | self.co_transform = co_transform 24 | self.loader = loader 25 | self.shift = shift 26 | 27 | def __getitem__(self, index): 28 | scene = self.scene_list[self.indices[index]] 29 | 30 | i1 = np.random.randint(0, len(scene['imgs'])) 31 | shift = round(2*self.shift*np.random.uniform()) 32 | i2 = min(len(scene['imgs'])-1, i1+shift) 33 | displacement = scene['time_step']*np.array(scene['speed']).astype(np.float32)*self.shift 34 | 35 | if np.random.uniform() > 0.5: 36 | # swap i1 and i2 37 | i1, i2 = i2, i1 38 | displacement *= -1 39 | 40 | inputs = [scene['imgs'][i1], scene['imgs'][i2]] 41 | target = scene['depth'][i2] 42 | inputs, target = self.loader(self.root/scene['subdir'], inputs, target) 43 | 44 | if i1 == i2: 45 | target.fill(100) 46 | else: 47 | target *= self.shift/np.abs(i2-i1) 48 | if self.co_transform is not None: 49 | inputs, target, displacement = self.co_transform(inputs, target, displacement) 50 | if self.transform is not None: 51 | inputs[0] = self.transform(inputs[0]) 52 | inputs[1] = self.transform(inputs[1]) 53 | if self.target_transform is not None: 54 | target = self.target_transform(target) 55 | return inputs, target, displacement 56 | 57 | def __len__(self): 58 | return len(self.indices) 59 | -------------------------------------------------------------------------------- /datasets/stillbox.py: -------------------------------------------------------------------------------- 1 | import random 2 | import math 3 | from .listdataset import ListDataset 4 | from .scenelistdataset import SceneListDataset 5 | import json 6 | from path import Path 7 | import numpy as np 8 | 9 | 10 | def make_dataset(root_dir, split=0, shift=3, seed=None): 11 | """Will search for subfolder and will read metadata json files.""" 12 | global args 13 | random.seed(seed) 14 | scenes = [] 15 | for sub_dir in root_dir.dirs(): 16 | metadata_path = sub_dir/'metadata.json' 17 | with open(metadata_path, 'r') as f: 18 | metadata = json.load(f) 19 | for scene in metadata['scenes']: 20 | scene['subdir'] = sub_dir.basename() 21 | scenes.extend(metadata['scenes']) 22 | 23 | assert(len(scenes) > 0) 24 | random.shuffle(scenes) 25 | split_index = math.floor(len(scenes)*split/100) 26 | assert(split_index >= 0 and split_index <= len(scenes)) 27 | train_scenes = scenes[:split_index] 28 | 
test_images = [] 29 | if split_index < len(scenes): 30 | for scene in scenes[split_index+1:]: 31 | imgs = scene['imgs'] 32 | for i in range(len(imgs)-shift): 33 | img_pair = [str(scene['subdir']/imgs[i]), str(scene['subdir']/imgs[i+shift])] 34 | depth = str(scene['subdir']/scene['depth'][i + shift]) 35 | displacement = np.array(scene['speed']).astype(np.float32)*shift*scene['time_step'] 36 | test_images.append( 37 | [img_pair, 38 | depth, 39 | displacement] 40 | ) 41 | return (train_scenes, test_images) 42 | 43 | 44 | def still_box(root, transform=None, target_transform=None, 45 | co_transform=None, split=80, shift=3, seed=None): 46 | root = Path(root) 47 | train_scenes, test_list = make_dataset(root, split, shift, seed) 48 | train_dataset = SceneListDataset(root, train_scenes, shift, transform, target_transform, co_transform) 49 | test_dataset = ListDataset(root, test_list, transform, target_transform) 50 | 51 | return train_dataset, test_dataset -------------------------------------------------------------------------------- /images/dataset.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ClementPinard/DepthNet/3c753fc21b06c9be307d73c8e7a0c61f2ea56cc3/images/dataset.gif -------------------------------------------------------------------------------- /images/still.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ClementPinard/DepthNet/3c753fc21b06c9be307d73c8e7a0c61f2ea56cc3/images/still.gif -------------------------------------------------------------------------------- /loss.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.nn.functional as F 3 | 4 | 5 | def depth_metric_reconstruction_loss(depth, target, weights=None, loss='L1', normalize=False): 6 | def one_scale(depth, target, loss_function, normalize): 7 | b, h, w = depth.size() 8 | 9 | target_scaled = F.interpolate(target.unsqueeze(1), size=(h, w), mode='area')[:,0] 10 | 11 | diff = depth-target_scaled 12 | 13 | if normalize: 14 | diff = diff/target_scaled 15 | 16 | return loss_function(diff, depth.detach()*0) 17 | 18 | if weights is not None: 19 | assert(len(weights) == len(depth)) 20 | else: 21 | weights = [1 for d in depth] 22 | if type(depth) not in [list, tuple]: 23 | depth = [depth] 24 | 25 | if type(loss) is str: 26 | assert(loss in ['L1', 'MSE', 'SmoothL1']) 27 | 28 | if loss == 'L1': 29 | loss_function = nn.L1Loss() 30 | elif loss == 'MSE': 31 | loss_function = nn.MSELoss() 32 | elif loss == 'SmoothL1': 33 | loss_function = nn.SmoothL1Loss() 34 | else: 35 | loss_function = loss 36 | 37 | loss_output = 0 38 | for d, w in zip(depth, weights): 39 | loss_output += w*one_scale(d, target, loss_function, normalize) 40 | return loss_output -------------------------------------------------------------------------------- /models/DepthNet.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import torch.nn as nn 3 | from models.utils import conv, deconv, predict_depth, post_process_depth, adaptative_cat, init_modules 4 | 5 | 6 | class DepthNet(nn.Module): 7 | 8 | def __init__(self, batch_norm=False, with_confidence=False, clamp=False, depth_activation=None): 9 | super(DepthNet, self).__init__() 10 | 11 | self.clamp = clamp 12 | if depth_activation == 'elu': 13 | self.depth_activation = lambda x: nn.functional.elu(x) + 1 14 | else: 15 | 
self.depth_activation = depth_activation 16 | 17 | self.conv1 = conv( 6, 32, stride=2, batch_norm=batch_norm) 18 | self.conv2 = conv( 32, 64, stride=2, batch_norm=batch_norm) 19 | self.conv3 = conv( 64, 128, stride=2, batch_norm=batch_norm) 20 | self.conv3_1 = conv(128, 128, batch_norm=batch_norm) 21 | self.conv4 = conv(128, 256, stride=2, batch_norm=batch_norm) 22 | self.conv4_1 = conv(256, 256, batch_norm=batch_norm) 23 | self.conv5 = conv(256, 256, stride=2, batch_norm=batch_norm) 24 | self.conv5_1 = conv(256, 256, batch_norm=batch_norm) 25 | self.conv6 = conv(256, 512, stride=2, batch_norm=batch_norm) 26 | self.conv6_1 = conv(512, 512, batch_norm=batch_norm) 27 | 28 | self.deconv5 = deconv(512, 256, batch_norm=batch_norm) 29 | self.deconv4 = deconv(513, 128, batch_norm=batch_norm) 30 | self.deconv3 = deconv(385, 64, batch_norm=batch_norm) 31 | self.deconv2 = deconv(193, 32, batch_norm=batch_norm) 32 | 33 | self.predict_depth6 = predict_depth(512, with_confidence) 34 | self.predict_depth5 = predict_depth(513, with_confidence) 35 | self.predict_depth4 = predict_depth(385, with_confidence) 36 | self.predict_depth3 = predict_depth(193, with_confidence) 37 | self.predict_depth2 = predict_depth( 97, with_confidence) 38 | 39 | self.upsampled_depth6_to_5 = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False) 40 | self.upsampled_depth5_to_4 = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False) 41 | self.upsampled_depth4_to_3 = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False) 42 | self.upsampled_depth3_to_2 = nn.ConvTranspose2d(1, 1, 4, 2, 1, bias=False) 43 | 44 | init_modules(self) 45 | 46 | def forward(self, x): 47 | out_conv2 = self.conv2(self.conv1(x)) 48 | out_conv3 = self.conv3_1(self.conv3(out_conv2)) 49 | out_conv4 = self.conv4_1(self.conv4(out_conv3)) 50 | out_conv5 = self.conv5_1(self.conv5(out_conv4)) 51 | out_conv6 = self.conv6_1(self.conv6(out_conv5)) 52 | 53 | out6 = self.predict_depth6(out_conv6) 54 | depth6 = post_process_depth(out6, clamp=self.clamp, activation_function=self.depth_activation) 55 | depth6_up = self.upsampled_depth6_to_5(out6) 56 | out_deconv5 = self.deconv5(out_conv6) 57 | 58 | concat5 = adaptative_cat(out_conv5, out_deconv5, depth6_up) 59 | out5 = self.predict_depth5(concat5) 60 | depth5 = post_process_depth(out5, clamp=self.clamp, activation_function=self.depth_activation) 61 | depth5_up = self.upsampled_depth5_to_4(out5) 62 | out_deconv4 = self.deconv4(concat5) 63 | 64 | concat4 = adaptative_cat(out_conv4, out_deconv4, depth5_up) 65 | out4 = self.predict_depth4(concat4) 66 | depth4 = post_process_depth(out4, clamp=self.clamp, activation_function=self.depth_activation) 67 | depth4_up = self.upsampled_depth4_to_3(out4) 68 | out_deconv3 = self.deconv3(concat4) 69 | 70 | concat3 = adaptative_cat(out_conv3, out_deconv3, depth4_up) 71 | out3 = self.predict_depth3(concat3) 72 | depth3 = post_process_depth(out3, clamp=self.clamp, activation_function=self.depth_activation) 73 | depth3_up = self.upsampled_depth3_to_2(out3) 74 | out_deconv2 = self.deconv2(concat3) 75 | 76 | concat2 = adaptative_cat(out_conv2, out_deconv2, depth3_up) 77 | out2 = self.predict_depth2(concat2) 78 | depth2 = post_process_depth(out2, clamp=self.clamp, activation_function=self.depth_activation) 79 | 80 | if self.training: 81 | return [depth2, depth3, depth4, depth5, depth6] 82 | else: 83 | return depth2 -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | from .DepthNet import 
DepthNet -------------------------------------------------------------------------------- /models/utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import torch.nn as nn 5 | from torch.nn.init import xavier_normal_, constant_ 6 | 7 | 8 | def conv(in_planes, out_planes, stride=1, batch_norm=False): 9 | if batch_norm: 10 | return nn.Sequential( 11 | nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False), 12 | nn.BatchNorm2d(out_planes, eps=1e-3), 13 | nn.ReLU(inplace=True) 14 | ) 15 | else: 16 | return nn.Sequential( 17 | nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=True), 18 | nn.ReLU(inplace=True) 19 | ) 20 | 21 | 22 | def deconv(in_planes, out_planes, batch_norm=False): 23 | if batch_norm: 24 | return nn.Sequential( 25 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1, bias=True), 26 | nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1, bias=False), 27 | nn.BatchNorm2d(out_planes, eps=1e-3), 28 | nn.ReLU(inplace=True) 29 | ) 30 | else: 31 | return nn.Sequential( 32 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1, bias=True), 33 | nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1, bias=True), 34 | nn.ReLU(inplace=True) 35 | ) 36 | 37 | 38 | def predict_depth(in_planes, with_confidence): 39 | return nn.Conv2d(in_planes, 2 if with_confidence else 1, kernel_size=3, stride=1, padding=1, bias=True) 40 | 41 | 42 | def post_process_depth(depth, activation_function=None, clamp=False): 43 | if activation_function is not None: 44 | depth = activation_function(depth) 45 | 46 | if clamp: 47 | depth = depth.clamp(10, 60) 48 | 49 | return depth[:,0] 50 | 51 | 52 | def adaptative_cat(out_conv, out_deconv, out_depth_up): 53 | out_deconv = out_deconv[:, :, :out_conv.size(2), :out_conv.size(3)] 54 | out_depth_up = out_depth_up[:, :, :out_conv.size(2), :out_conv.size(3)] 55 | return torch.cat((out_conv, out_deconv, out_depth_up), 1) 56 | 57 | 58 | def init_modules(net): 59 | for m in net.modules(): 60 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 61 | xavier_normal_(m.weight) 62 | if m.bias is not None: 63 | constant_(m.bias, 0) 64 | elif isinstance(m, nn.BatchNorm2d): 65 | constant_(m.weight, 1) 66 | constant_(m.bias, 0) -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch>=0.4.1 2 | imageio 3 | argparse 4 | tensorboardX 5 | blessings 6 | progressbar2 7 | path.py 8 | -------------------------------------------------------------------------------- /run_inference.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | from scipy.misc import imread, imsave, imresize 4 | import numpy as np 5 | from path import Path 6 | import argparse 7 | from tqdm import tqdm 8 | 9 | import torch.nn.functional as F 10 | from models import DepthNet 11 | from util import tensor2array 12 | 13 | parser = argparse.ArgumentParser(description='Inference script for DepthNet img must be with no rotation', 14 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 15 | parser.add_argument("--output-disp", action='store_true', help="save disparity img") 16 | parser.add_argument("--output-depth", action='store_true', help="save depth img") 17 | 
parser.add_argument("--output-raw", action='store_true', help="save raw numpy depth array") 18 | 19 | parser.add_argument("--pretrained", required=True, type=str, help="pretrained DepthNet path") 20 | parser.add_argument("--frame-shift", default=1, type=int, help="temporal shift between imgs of the pairs feeded to the network") 21 | parser.add_argument("--img-height", default=512, type=int, help="Image height") 22 | parser.add_argument("--img-width", default=512, type=int, help="Image width") 23 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done") 24 | 25 | parser.add_argument("--dataset-list", default=None, type=str, help="Dataset list file") 26 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory") 27 | parser.add_argument("--output-dir", default='output', type=str, help="Output directory") 28 | 29 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob") 30 | 31 | 32 | @torch.no_grad() 33 | def main(): 34 | args = parser.parse_args() 35 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 36 | if not(args.output_disp or args.output_depth): 37 | print('You must at least output one value !') 38 | return 39 | 40 | weights = torch.load(args.pretrained) 41 | depth_net = DepthNet(batch_norm=weights['bn'], 42 | depth_activation=weights['activation_function'], 43 | clamp=weights['clamp']).to(device) 44 | print("running inference with {} ...".format(weights['arch'])) 45 | depth_net.load_state_dict(weights['state_dict']) 46 | depth_net.eval() 47 | 48 | dataset_dir = Path(args.dataset_dir) 49 | output_dir = Path(args.output_dir) 50 | output_dir.makedirs_p() 51 | 52 | if args.dataset_list is not None: 53 | with open(args.dataset_list, 'r') as f: 54 | test_files = [dataset_dir/file for file in f.read().splitlines()] 55 | else: 56 | test_files = sorted(sum([dataset_dir.files('*.{}'.format(ext)) for ext in args.img_exts], [])) 57 | 58 | print('{} files to test'.format(len(test_files))) 59 | 60 | for file1, file2 in tqdm(zip(test_files[:-args.frame_shift], test_files[args.frame_shift:])): 61 | 62 | img1 = imread(file1).astype(np.float32) 63 | img2 = imread(file2).astype(np.float32) 64 | 65 | h,w,_ = img1.shape 66 | assert(img1.shape == img2.shape), "img1 and img2 must be the same size" 67 | if (not args.no_resize) and (h != args.img_height or w != args.img_width): 68 | img1 = imresize(img1, (args.img_height, args.img_width)).astype(np.float32) 69 | img2 = imresize(img2, (args.img_height, args.img_width)).astype(np.float32) 70 | imgs = np.concatenate([np.transpose(img1, (2, 0, 1)), np.transpose(img2, (2, 0, 1))]) 71 | 72 | tensor_imgs = torch.from_numpy(imgs).unsqueeze(0).to(device) 73 | tensor_imgs = ((tensor_imgs/255 - 0.5)/0.2) 74 | 75 | output_depth = depth_net(tensor_imgs) 76 | 77 | upscaled_output = F.interpolate(output_depth.unsqueeze(1), (h,w), mode='bilinear', align_corners=False)[0,0] 78 | 79 | if args.output_disp: 80 | disp = 1/upscaled_output 81 | disp = (255*tensor2array(disp, max_value=None, colormap='bone')).astype(np.uint8) 82 | imsave(output_dir/'{}_disp{}'.format(file2.namebase, file2.ext), disp.transpose(1,2,0)) 83 | if args.output_depth: 84 | depth = (255*tensor2array(upscaled_output, max_value=100, colormap='rainbow')).astype(np.uint8) 85 | imsave(output_dir/'{}_depth{}'.format(file2.namebase, file2.ext), depth.transpose(1,2,0)) 86 | if args.output_raw: 87 | np.save(output_dir/'{}_depth.npy'.format(file2.namebase), 
output_depth.cpu()) 88 | 89 | 90 | if __name__ == '__main__': 91 | main() 92 | -------------------------------------------------------------------------------- /terminal_logger.py: -------------------------------------------------------------------------------- 1 | from blessings import Terminal 2 | import progressbar 3 | import sys 4 | 5 | 6 | class TermLogger(object): 7 | def __init__(self, n_epochs, train_size, test_size): 8 | self.n_epochs = n_epochs 9 | self.train_size = train_size 10 | self.test_size = test_size 11 | self.t = Terminal() 12 | s = 10 13 | e = 1 # epoch bar position 14 | tr = 3 # train bar position 15 | ts = 6 # test bar position 16 | h = self.t.height 17 | 18 | for i in range(10): 19 | print('') 20 | self.epoch_bar = progressbar.ProgressBar(max_value=n_epochs, fd=Writer(self.t, (0, h-s+e))) 21 | 22 | self.train_writer = Writer(self.t, (0, h-s+tr)) 23 | self.train_bar_writer = Writer(self.t, (0, h-s+tr+1)) 24 | 25 | self.test_writer = Writer(self.t, (0, h-s+ts)) 26 | self.test_bar_writer = Writer(self.t, (0, h-s+ts+1)) 27 | 28 | self.reset_train_bar() 29 | self.reset_test_bar() 30 | 31 | def reset_train_bar(self): 32 | self.train_bar = progressbar.ProgressBar(max_value=self.train_size, fd=self.train_bar_writer) 33 | 34 | def reset_test_bar(self): 35 | self.test_bar = progressbar.ProgressBar(max_value=self.test_size, fd=self.test_bar_writer) 36 | 37 | 38 | class Writer(object): 39 | """Create an object with a write method that writes to a 40 | specific place on the screen, defined at instantiation. 41 | 42 | This is the glue between blessings and progressbar. 43 | """ 44 | 45 | def __init__(self, t, location): 46 | """ 47 | Input: location - tuple of ints (x, y), the position 48 | of the bar in the terminal 49 | """ 50 | self.location = location 51 | self.t = t 52 | 53 | def write(self, string): 54 | with self.t.location(*self.location): 55 | sys.stdout.write("\033[K") 56 | print(string) 57 | 58 | def flush(self): 59 | return -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import time 3 | import csv 4 | import os 5 | 6 | import torch 7 | import torch.backends.cudnn as cudnn 8 | import torch.optim 9 | import torch.utils.data 10 | import torchvision.transforms as transforms 11 | import co_transforms 12 | import models 13 | import datasets 14 | from loss import depth_metric_reconstruction_loss as metric_loss 15 | from terminal_logger import TermLogger 16 | from tensorboardX import SummaryWriter 17 | 18 | import util 19 | from util import AverageMeter 20 | 21 | 22 | parser = argparse.ArgumentParser(description='PyTorch DepthNet Training on Still Box dataset') 23 | util.set_arguments(parser) 24 | 25 | best_error = -1 26 | n_iter = 0 27 | device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu") 28 | 29 | 30 | def main(): 31 | global args, best_error, viz 32 | args = util.set_params(parser) 33 | 34 | train_writer = SummaryWriter(args.save_path/'train') 35 | val_writer = SummaryWriter(args.save_path/'val') 36 | output_writers = [] 37 | if args.log_output: 38 | for i in range(3): 39 | output_writers.append(SummaryWriter(args.save_path/'val'/str(i))) 40 | torch.manual_seed(args.seed) 41 | 42 | # Data loading code 43 | mean = [0.5, 0.5, 0.5] 44 | std = [0.2, 0.2, 0.2] 45 | normalize = transforms.Normalize(mean=mean, 46 | std=std) 47 | input_transform = transforms.Compose([ 48 | co_transforms.ArrayToTensor(), 
49 | transforms.Normalize(mean=[0, 0, 0], std=[255, 255, 255]), 50 | normalize 51 | ]) 52 | target_transform = transforms.Compose([ 53 | co_transforms.Clip(0, 100), 54 | co_transforms.ArrayToTensor() 55 | ]) 56 | co_transform = co_transforms.Compose([ 57 | co_transforms.RandomVerticalFlip(), 58 | co_transforms.RandomHorizontalFlip() 59 | ]) 60 | 61 | print("=> fetching scenes in '{}'".format(args.data)) 62 | train_set, val_set = datasets.still_box( 63 | args.data, 64 | transform=input_transform, 65 | target_transform=target_transform, 66 | co_transform=co_transform, 67 | split=args.split, 68 | seed=args.seed 69 | ) 70 | print('{} samples found, {} train scenes and {} validation samples '.format(len(val_set)+len(train_set), 71 | len(train_set), 72 | len(val_set))) 73 | train_loader = torch.utils.data.DataLoader( 74 | train_set, batch_size=args.batch_size, shuffle=True, 75 | num_workers=args.workers, pin_memory=True) 76 | val_loader = torch.utils.data.DataLoader( 77 | val_set, batch_size=args.batch_size, 78 | shuffle=False, 79 | num_workers=args.workers, pin_memory=True) 80 | if args.epoch_size == 0: 81 | args.epoch_size = len(train_loader) 82 | # create model 83 | if args.pretrained: 84 | data = torch.load(args.pretrained) 85 | assert(not data['with_confidence']) 86 | print("=> using pre-trained model '{}'".format(data['arch'])) 87 | model = models.DepthNet(batch_norm=data['bn'], clamp=args.clamp, depth_activation=args.activation_function) 88 | model.load_state_dict(data['state_dict']) 89 | else: 90 | print("=> creating model '{}'".format(args.arch)) 91 | model = models.DepthNet(batch_norm=args.bn, clamp=args.clamp, depth_activation=args.activation_function) 92 | 93 | model = model.to(device) 94 | model = torch.nn.DataParallel(model) 95 | cudnn.benchmark = True 96 | 97 | assert(args.solver in ['adam', 'sgd']) 98 | print('=> setting {} solver'.format(args.solver)) 99 | if args.solver == 'adam': 100 | optimizer = torch.optim.Adam(model.parameters(), args.lr, 101 | betas=(args.momentum, args.beta), 102 | weight_decay=args.weight_decay) 103 | elif args.solver == 'sgd': 104 | optimizer = torch.optim.SGD(model.parameters(), args.lr, 105 | momentum=args.momentum, 106 | weight_decay=args.weight_decay, 107 | dampening=args.momentum) 108 | 109 | scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, 110 | milestones=[19,30,44,53], 111 | gamma=0.3) 112 | 113 | with open(os.path.join(args.save_path, args.log_summary), 'w') as csvfile: 114 | writer = csv.writer(csvfile, delimiter='\t') 115 | writer.writerow(['train_loss', 'train_depth_error', 'normalized_train_depth_error', 'depth_error', 'normalized_depth_error']) 116 | 117 | with open(os.path.join(args.save_path, args.log_full), 'w') as csvfile: 118 | writer = csv.writer(csvfile, delimiter='\t') 119 | writer.writerow(['train_loss', 'train_depth_error']) 120 | 121 | term_logger = TermLogger(n_epochs=args.epochs, train_size=min(len(train_loader), args.epoch_size), test_size=len(val_loader)) 122 | term_logger.epoch_bar.start() 123 | 124 | if args.evaluate: 125 | depth_error, normalized = validate(val_loader, model, 0, term_logger, output_writers) 126 | term_logger.test_writer.write(' * Depth error : {:.3f}, normalized : {:.3f}'.format(depth_error, normalized)) 127 | return 128 | 129 | for epoch in range(args.epochs): 130 | term_logger.epoch_bar.update(epoch) 131 | scheduler.step() 132 | 133 | # train for one epoch 134 | term_logger.reset_train_bar() 135 | term_logger.train_bar.start() 136 | train_loss, train_error, train_normalized_error = 
train(train_loader, model, optimizer, args.epoch_size, term_logger, train_writer) 137 | term_logger.train_writer.write(' * Avg Loss : {:.3f}, Avg Depth error : {:.3f}, normalized : {:.3f}' 138 | .format(train_loss, train_error, train_normalized_error)) 139 | train_writer.add_scalar('metric_error', train_error, epoch) 140 | train_writer.add_scalar('metric_normalized_error', train_normalized_error, epoch) 141 | 142 | # evaluate on validation set 143 | term_logger.reset_test_bar() 144 | term_logger.test_bar.start() 145 | depth_error, normalized = validate(val_loader, model, epoch, term_logger, output_writers) 146 | term_logger.test_writer.write(' * Depth error : {:.3f}, normalized : {:.3f}'.format(depth_error, normalized)) 147 | val_writer.add_scalar('metric_error', depth_error, epoch) 148 | val_writer.add_scalar('metric_normalized_error', normalized, epoch) 149 | 150 | if best_error < 0: 151 | best_error = depth_error 152 | 153 | # remember lowest error and save checkpoint 154 | is_best = depth_error < best_error 155 | best_error = min(depth_error, best_error) 156 | util.save_checkpoint( 157 | args.save_path, { 158 | 'epoch': epoch + 1, 159 | 'arch': args.arch, 160 | 'state_dict': model.state_dict(), 161 | 'best_error': best_error, 162 | 'bn': args.bn, 163 | 'with_confidence': False, 164 | 'activation_function': args.activation_function, 165 | 'clamp': args.clamp, 166 | 'mean': mean, 167 | 'std': std 168 | }, 169 | is_best) 170 | 171 | with open(os.path.join(args.save_path, args.log_summary), 'a') as csvfile: 172 | writer = csv.writer(csvfile, delimiter='\t') 173 | writer.writerow([train_loss, train_error, depth_error]) 174 | term_logger.epoch_bar.finish() 175 | 176 | 177 | def train(train_loader, model, optimizer, epoch_size, term_logger, train_writer): 178 | global n_iter, args 179 | batch_time = AverageMeter() 180 | data_time = AverageMeter() 181 | losses = AverageMeter() 182 | depth2_metric_errors = AverageMeter() 183 | depth2_normalized_errors = AverageMeter() 184 | 185 | # switch to train mode 186 | model.train() 187 | 188 | end = time.time() 189 | 190 | for i, (input, target, _) in enumerate(train_loader): 191 | # measure data loading time 192 | data_time.update(time.time() - end) 193 | target = target.to(device) 194 | input = torch.cat(input,1).to(device) 195 | 196 | # compute output 197 | output = model(input) 198 | 199 | loss = metric_loss(output, target, weights=(0.32, 0.08, 0.02, 0.01, 0.005), loss=args.loss) 200 | depth2_norm_error = metric_loss(output[0], target, normalize=True) 201 | depth2_metric_error = metric_loss(output[0], target, normalize=False) 202 | # record loss and EPE 203 | losses.update(loss.item(), target.size(0)) 204 | train_writer.add_scalar('train_loss', loss.item(), n_iter) 205 | depth2_metric_errors.update(depth2_metric_error.item(), target.size(0)) 206 | depth2_normalized_errors.update(depth2_norm_error.item(), target.size(0)) 207 | 208 | # compute gradient and do SGD step 209 | optimizer.zero_grad() 210 | loss.backward() 211 | optimizer.step() 212 | 213 | # measure elapsed time 214 | batch_time.update(time.time() - end) 215 | end = time.time() 216 | 217 | with open(os.path.join(args.save_path, args.log_full), 'a') as csvfile: 218 | writer = csv.writer(csvfile, delimiter='\t') 219 | writer.writerow([loss.item(), depth2_metric_error.item()]) 220 | term_logger.train_bar.update(i+1) 221 | if i % args.print_freq == 0: 222 | term_logger.train_writer.write( 223 | 'Train: Time {batch_time.val:.3f} ({batch_time.avg:.3f}) ' 224 | 'Data {data_time.val:.3f} 
({data_time.avg:.3f}) ' 225 | 'Loss {loss.val:.4f} ({loss.avg:.4f}) ' 226 | 'Depth error {depth2_error.val:.3f} ({depth2_error.avg:.3f})\r' 227 | .format(batch_time=batch_time, data_time=data_time, 228 | loss=losses, depth2_error=depth2_metric_errors)) 229 | if i >= epoch_size - 1: 230 | break 231 | n_iter += 1 232 | 233 | return losses.avg, depth2_metric_errors.avg, depth2_normalized_errors.avg 234 | 235 | 236 | @torch.no_grad() 237 | def validate(val_loader, model, epoch, logger, output_writers=[]): 238 | batch_time = AverageMeter() 239 | depth2_metric_errors = AverageMeter() 240 | depth2_norm_errors = AverageMeter() 241 | log_outputs = len(output_writers) > 0 242 | # switch to evaluate mode 243 | model.eval() 244 | 245 | end = time.time() 246 | 247 | for i, (input, target, _) in enumerate(val_loader): 248 | target = target.to(device) 249 | input = torch.cat(input, 1).to(device) 250 | # compute output 251 | output = model(input) 252 | if log_outputs and i < len(output_writers): # log first output of 3 first batches 253 | if epoch == 0: 254 | output_writers[i].add_image('GroundTruth', util.tensor2array(target[0], max_value=100), 0) 255 | output_writers[i].add_image('Inputs', util.tensor2array(input[0,:3]), 0) 256 | output_writers[i].add_image('Inputs', util.tensor2array(input[0,3:]), 1) 257 | output_writers[i].add_image('DepthNet Outputs', util.tensor2array(output[0], max_value=100), epoch) 258 | depth2_norm_error = metric_loss(output, target, normalize=True) 259 | depth2_metric_error = metric_loss(output, target, normalize=False) 260 | # record depth error 261 | depth2_norm_errors.update(depth2_norm_error.item(), target.size(0)) 262 | depth2_metric_errors.update(depth2_metric_error.item(), target.size(0)) 263 | 264 | # measure elapsed time 265 | batch_time.update(time.time() - end) 266 | end = time.time() 267 | logger.test_bar.update(i+1) 268 | if i % args.print_freq == 0: 269 | logger.test_writer.write( 270 | 'Validation: ' 271 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f}) ' 272 | 'Depth error {depth2_error.val:.3f} ({depth2_error.avg:.3f})' 273 | .format(batch_time=batch_time, 274 | depth2_error=depth2_metric_errors)) 275 | 276 | return depth2_metric_errors.avg, depth2_norm_errors.avg 277 | 278 | 279 | if __name__ == '__main__': 280 | main() 281 | -------------------------------------------------------------------------------- /util.py: -------------------------------------------------------------------------------- 1 | import shutil 2 | import datetime 3 | import torch 4 | from torch.autograd import Variable 5 | from path import Path 6 | import numpy as np 7 | 8 | 9 | def set_arguments(parser): 10 | parser.add_argument('data', metavar='DIR', 11 | help='path to dataset') 12 | parser.add_argument('--activation-function', default=None, 13 | help='activation function to apply to DepthNet') 14 | parser.add_argument('--bn', action='store_true', 15 | help='activate batchNorm (overwritten if pretrained model)') 16 | parser.add_argument('--clamp', action='store_true', 17 | help='activate depth clamping to (10,60) in forward pass') 18 | parser.add_argument('--solver', default='sgd', choices=['adam', 'sgd'], 19 | help='solvers: adam | sgd') 20 | parser.add_argument('-j', '--workers', default=4, type=int, metavar='N', 21 | help='number of data loading workers (default: 4)') 22 | parser.add_argument('--epochs', default=55, type=int, metavar='N', 23 | help='number of total epochs to run (default: 55') 24 | parser.add_argument('--epoch-size', default=0, type=int, metavar='N', 25 | help='manual 
epoch size (will match dataset size if not set)') 26 | parser.add_argument('-b', '--batch-size', default=256, type=int, 27 | metavar='N', help='mini-batch size (default: 256)') 28 | parser.add_argument('--lr', '--learning-rate', default=0.01, type=float, 29 | metavar='LR', help='initial learning rate') 30 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', 31 | help='momentum for sgd, alpha parameter for adam') 32 | parser.add_argument('--beta', default=0.999, type=float, metavar='M', 33 | help='beta parameters for adam') 34 | parser.add_argument('--weight-decay', '--wd', default=4e-4, type=float, 35 | metavar='W', help='weight decay (default: 4e-4)') 36 | parser.add_argument('--print-freq', '-p', default=10, type=int, 37 | metavar='N', help='print frequency (default: 10)') 38 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', 39 | help='evaluate model on validation set') 40 | parser.add_argument('--pretrained', dest='pretrained', default=None, 41 | help='path to pre-trained model') 42 | parser.add_argument('--seed', default=0, type=int, help='seed for random functions, test/train split, network initialization') 43 | parser.add_argument('-s', '--split', default=90, type=float, metavar='%', 44 | help='split percentage of train samples vs test (default: 90)') 45 | parser.add_argument('--log-summary', default='progress_log_summary.csv', 46 | help='csv where to save per-epoch train and test stats') 47 | parser.add_argument('--log-full', default='progress_log_full.csv', 48 | help='csv where to save per-gradient descent train stats') 49 | parser.add_argument('--no-date', action='store_true', 50 | help='don\'t append date timestamp to folder') 51 | parser.add_argument('--loss', default='L1', help='loss function to apply to multiScaleCriterion : L1 (default)| SmoothL1| MSE') 52 | parser.add_argument('--log-output', action='store_true', help='logs in tensorboard some outputs of the network during test phase. 
Needs OpenCV 3') 53 | 54 | 55 | def set_params(parser, with_confidence=False): 56 | args = parser.parse_args() 57 | args.data = Path(args.data) 58 | folder_name = args.data.normpath().name 59 | arch_string = 'DepthNet' 60 | if with_confidence: 61 | arch_string += '_confidence' 62 | if args.activation_function is not None: 63 | arch_string += '_'+args.activation_function 64 | if args.bn: 65 | arch_string += '_bn' 66 | if args.clamp: 67 | arch_string += '_clamp' 68 | args.arch = arch_string 69 | 70 | save_path = '{},{}epochs{},b{},lr{}'.format( 71 | args.solver, 72 | args.epochs, 73 | ',epochSize'+str(args.epoch_size) if args.epoch_size > 0 else '', 74 | args.batch_size, 75 | args.lr) 76 | save_path = Path(save_path) 77 | if not args.no_date: 78 | timestamp = datetime.datetime.now().strftime("%m-%d-%H:%M") 79 | save_path = save_path/timestamp 80 | args.save_path = Path('Results')/arch_string/folder_name/save_path 81 | print('=> will save everything to {}'.format(save_path)) 82 | args.save_path.makedirs_p() 83 | return args 84 | 85 | 86 | def save_checkpoint(save_path, state, is_best, filename='checkpoint.pth.tar'): 87 | torch.save(state, save_path/filename) 88 | if is_best: 89 | shutil.copyfile(save_path/filename, save_path/'model_best.pth.tar') 90 | 91 | 92 | class AverageMeter(object): 93 | """Computes and stores the average and current value.""" 94 | 95 | def __init__(self): 96 | self.reset() 97 | 98 | def reset(self): 99 | self.val = 0 100 | self.avg = 0 101 | self.sum = 0 102 | self.count = 0 103 | 104 | def update(self, val, n=1): 105 | self.val = val 106 | self.sum += val * n 107 | self.count += n 108 | self.avg = self.sum / self.count 109 | 110 | 111 | def adjust_learning_rate(optimizer, epoch): 112 | # Set the learning rate to the initial LR decayed by 2 after 300K iterations, 400K and 500K 113 | 114 | if epoch == 19 or epoch == 44: 115 | for param_group in optimizer.param_groups: 116 | param_group['lr'] = param_group['lr']/2 117 | if epoch == 30 or epoch == 53: 118 | for param_group in optimizer.param_groups: 119 | param_group['lr'] = param_group['lr']/5 120 | 121 | 122 | def tensor2array(tensor, max_value=255, colormap='rainbow'): 123 | tensor = tensor.detach().cpu() 124 | if max_value is None: 125 | max_value = tensor.max().item() 126 | if tensor.ndimension() == 2 or tensor.size(0) == 1: 127 | try: 128 | import cv2 129 | if int(cv2.__version__[0]) >= 3: 130 | color_cvt = cv2.COLOR_BGR2RGB 131 | else: # 2.4 132 | color_cvt = cv2.cv.CV_BGR2RGB 133 | if colormap == 'rainbow': 134 | colormap = cv2.COLORMAP_RAINBOW 135 | elif colormap == 'bone': 136 | colormap = cv2.COLORMAP_BONE 137 | array = (255*tensor.squeeze().numpy()/max_value).clip(0, 255).astype(np.uint8) 138 | colored_array = cv2.applyColorMap(array, colormap) 139 | array = cv2.cvtColor(colored_array, color_cvt).astype(np.float32)/255 140 | except ImportError: 141 | if tensor.ndimension() == 2: 142 | tensor.unsqueeze_(2) 143 | array = (tensor.expand(tensor.size(0), tensor.size(1), 3).numpy()/max_value).clip(0,1) 144 | array = array.transpose(2, 0, 1) 145 | 146 | elif tensor.ndimension() == 3: 147 | assert(tensor.size(0) == 3) 148 | array = 0.5 + tensor.numpy()*0.5 149 | return array --------------------------------------------------------------------------------