├── .gitignore ├── LICENSE ├── README.md ├── custom_transforms.py ├── data ├── cityscapes_loader.py ├── kitti_raw_loader.py ├── prepare_train_data.py ├── static_frames.txt └── test_scenes.txt ├── datasets ├── __init__.py ├── general_sequence_folders.py ├── sequence_folders.py ├── stacked_sequence_folders.py ├── validation_flow.py └── validation_folders.py ├── evaluate_flow.py ├── flowutils ├── __init__.py ├── flow_io.py ├── flow_viz.py ├── flowlib.py └── pfm.py ├── inverse_warp.py ├── kitti_eval ├── depth_evaluation_utils.py ├── pose_evaluation_utils.py └── test_files_eigen.txt ├── logger.py ├── loss_functions.py ├── mnist.py ├── mnist_eval.py ├── models ├── DispNetS.py ├── DispNetS6.py ├── DispResNet6.py ├── DispResNetS6.py ├── FlowNetC6.py ├── MaskNet6.py ├── MaskResNet6.py ├── PoseExpNet.py ├── PoseNet6.py ├── PoseNetB6.py ├── __init__.py ├── back2future.py ├── submodules.py └── utils.py ├── requirements.txt ├── run_inference.py ├── sintel_eval ├── pose_evaluation_utils.py └── sintel_io.py ├── ssim.py ├── stillbox_eval ├── depth_evaluation_utils.py └── test_files_90.txt ├── submit_flow.py ├── test_back2future.py ├── test_disp.py ├── test_flow.py ├── test_flownetc.py ├── test_make3d.py ├── test_mask.py ├── test_pose.py ├── test_sintel_pose.py ├── train.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | *.pth 3 | *.tar 4 | *.sub 5 | *.npy 6 | *.jpg 7 | *.png 8 | *.zip 9 | main.sh 10 | checkpoints* 11 | visualize/* 12 | !visualize/*.py 13 | log/* 14 | config/* 15 | models/spynet_models/* 16 | dockers/* 17 | test_script.sh 18 | kitti_data/* 19 | datasets/mnist/ 20 | datasets/svhn/ 21 | results/* 22 | pretrained/* 23 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Anurag Ranjan 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Competitive Collaboration 2 | This is an official repository of 3 | **Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation**. The project was formerly referred by **Adversarial Collaboration**. 
4 | 5 | ### News 6 | - **16 August '19:** `skimage` dependencies are removed in favour of `PIL`, and are supported in the [`pil` branch](https://github.com/anuragranj/cc/tree/pil). If you discover bugs, please file an issue, or send a pull request. This will eventually be merged with `master` if users are satisfied. 7 | - **11 March '19:** We recently ported the entire code to `pytorch-1.0`, so if you discover bugs, please file an issue. 8 | 9 | [[Project Page]](http://research.nvidia.com/publication/2018-05_Adversarial-Collaboration-Joint) 10 | [[Arxiv]](https://arxiv.org/abs/1805.09806) 11 | 12 | **Skip to:** 13 | - [Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation](#jointcc) 14 | - [Mixed Domain Learning using MNIST+SVHN](#mnist) 15 | - [Download Pretrained Models and Evaluation Data](#downloads) 16 | 17 | ### Prerequisites 18 | Python3 and pytorch are required. Third party libraries can be installed (in a `python3 ` virtualenv) using: 19 | 20 | ```bash 21 | pip3 install -r requirements.txt 22 | ``` 23 | 24 | ## Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation 25 | 26 | ### Preparing training data 27 | 28 | #### KITTI 29 | For [KITTI](http://www.cvlibs.net/datasets/kitti/raw_data.php), first download the dataset using this [script](http://www.cvlibs.net/download.php?file=raw_data_downloader.zip) provided on the official website, and then run the following command. 30 | 31 | ```bash 32 | python3 data/prepare_train_data.py /path/to/raw/kitti/dataset/ --dataset-format 'kitti' --dump-root /path/to/resulting/formatted/data/ --width 832 --height 256 --num-threads 1 --static-frames data/static_frames.txt --with-gt 33 | ``` 34 | 35 | For testing optical flow ground truths on KITTI, download [KITTI2015](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=flow) dataset. You need to download 1) `stereo 2015/flow 2015/scene flow 2015` data set (2 GB), 2) `multi-view extension` (14 GB), and 3) `calibration files` (1 MB) . In addition, download semantic labels from [here](https://keeper.mpdl.mpg.de/f/239c2dda94e54c449401/?dl=1). You should have the following directory structure: 36 | ``` 37 | kitti2015 38 | | data_scene_flow 39 | | data_scene_flow_calib 40 | | data_scene_flow_multiview 41 | | semantic_labels 42 | ``` 43 | 44 | #### Cityscapes 45 | 46 | For [Cityscapes](https://www.cityscapes-dataset.com/), download the following packages: 1) `leftImg8bit_sequence_trainvaltest.zip`, 2) `camera_trainvaltest.zip`. You will probably need to contact the administrators to be able to get it. 47 | 48 | ```bash 49 | python3 data/prepare_train_data.py /path/to/cityscapes/dataset/ --dataset-format 'cityscapes' --dump-root /path/to/resulting/formatted/data/ --width 832 --height 342 --num-threads 1 50 | ``` 51 | 52 | Notice that for Cityscapes the `img_height` is set to 342 because we crop out the bottom part of the image that contains the car logo, and the resulting image will have height 256. 53 | 54 | ### Training an experiment 55 | 56 | Once the data are formatted following the above instructions, you should be able to run a training experiment. Every experiment you run gets logged in `experiment_recorder.md`. 
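For reference, the formatted dataset produced by `prepare_train_data.py` (and read back by `datasets/sequence_folders.py` during training) is organised as one folder per scene, each holding the resized frames and a `cam.txt` with the 3x3 intrinsics, plus `train.txt`/`val.txt` split files at the root. A rough sketch (the scene name below is illustrative):
```
formatted_data
|   train.txt
|   val.txt
|   2011_09_26_drive_0001_sync_02
|   |   cam.txt
|   |   0000000001.jpg
|   |   0000000002.jpg
|   |   ...
```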
57 | 
58 | ```bash 
59 | python3 train.py /path/to/formatted/data --dispnet DispResNet6 --posenet PoseNetB6 \ 
60 | --masknet MaskNet6 --flownet Back2Future --pretrained-disp /path/to/pretrained/dispnet \ 
61 | --pretrained-pose /path/to/pretrained/posenet --pretrained-flow /path/to/pretrained/flownet \ 
62 | --pretrained-mask /path/to/pretrained/masknet -b4 -m0.1 -pf 0.5 -pc 1.0 -s0.1 -c0.3 \ 
63 | --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997 --with-flow-gt \ 
64 | --with-depth-gt --epochs 100 --smoothness-type edgeaware --fix-masknet --fix-flownet \ 
65 | --log-terminal --name EXPERIMENT_NAME 
66 | ``` 
67 | 
68 | 
69 | You can then start a `tensorboard` session in this folder by 
70 | ```bash 
71 | tensorboard --logdir=checkpoints/ 
72 | ``` 
73 | and visualize the training progress by opening [http://localhost:6006](http://localhost:6006) in your browser. 
74 | 
75 | ### Evaluation 
76 | 
77 | Disparity evaluation 
78 | ```bash 
79 | python3 test_disp.py --dispnet DispResNet6 --pretrained-dispnet /path/to/dispnet --pretrained-posenet /path/to/posenet --dataset-dir /path/to/KITTI_raw --dataset-list /path/to/test_files_list 
80 | ``` 
81 | 
82 | The test file list is available in the `kitti_eval` folder. For a fair comparison with the [original paper's evaluation code](https://github.com/tinghuiz/SfMLearner/blob/master/kitti_eval/eval_depth.py), don't specify a posenet. If you do specify one, it will be used to resolve the scale factor ambiguity; the only ground truth it relies on is the vehicle speed, which is far more realistic for real-world evaluation, but you will obviously get worse numbers. 
83 | 
84 | For pose evaluation, you need to download the [KITTI Odometry](http://www.cvlibs.net/datasets/kitti/eval_odometry.php) dataset. 
85 | ```bash 
86 | python test_pose.py pretrained/pose_model_best.pth.tar --img-width 832 --img-height 256 --dataset-dir /path/to/kitti/odometry/ --sequences 09 --posenet PoseNetB6 
87 | ``` 
88 | 
89 | Optical Flow evaluation 
90 | ```bash 
91 | python test_flow.py --pretrained-disp /path/to/dispnet --pretrained-pose /path/to/posenet --pretrained-mask /path/to/masknet --pretrained-flow /path/to/flownet --kitti-dir /path/to/kitti2015/dataset 
92 | ``` 
93 | 
94 | Mask evaluation 
95 | ```bash 
96 | python test_mask.py --pretrained-disp /path/to/dispnet --pretrained-pose /path/to/posenet --pretrained-mask /path/to/masknet --pretrained-flow /path/to/flownet --kitti-dir /path/to/kitti2015/dataset 
97 | ``` 
98 | 
99 | 
100 | ## Mixed Domain Learning using MNIST+SVHN 
101 | 
102 | #### Training 
103 | For learning classification using Competitive Collaboration with two agents, Alice and Bob, run: 
104 | ```bash 
105 | python3 mnist.py path/to/download/mnist/svhn/datasets/ --name EXP_NAME --log-output --log-terminal --epoch-size 1000 --epochs 400 --wr 1000 
106 | ``` 
107 | 
108 | #### Evaluation 
109 | To evaluate the performance of Alice, Bob and the Moderator trained using CC, run: 
110 | ```bash 
111 | python3 mnist_eval.py path/to/mnist/svhn/datasets --pretrained-alice pretrained/mnist_svhn/alice.pth.tar --pretrained-bob pretrained/mnist_svhn/bob.pth.tar --pretrained-mod pretrained/mnist_svhn/mod.pth.tar 
112 | ``` 
113 | 
114 | 
115 | ## Downloads 
116 | #### Pretrained Models 
117 | - [DispNet, PoseNet, MaskNet and FlowNet](https://keeper.mpdl.mpg.de/f/72e946daa4e0481fb735/?dl=1) in joint unsupervised learning of depth, camera motion, optical flow and motion segmentation. 
118 | - [Alice, Bob and Moderator](https://keeper.mpdl.mpg.de/f/d0c7d4ebd0d74b84bf10/?dl=1) in Mixed Domain Classification 
119 | 
120 | #### Evaluation Data 
121 | - [Semantic Labels for KITTI](https://keeper.mpdl.mpg.de/f/239c2dda94e54c449401/?dl=1) 
122 | 
123 | ## Acknowledgements 
124 | We thank Frederik Kunstner for verifying the convergence proofs. We are grateful to Clement Pinard for his [GitHub repository](https://github.com/ClementPinard/SfmLearner-Pytorch), which we use as our initial code base. We thank Georgios Pavlakos for helping us with several revisions of the paper. We thank Joel Janai for preparing optical flow visualizations, and Clement Gorard for his Make3d evaluation code. 
125 | 
126 | 
127 | ## References 
128 | *Anurag Ranjan, Varun Jampani, Lukas Balles, Deqing Sun, Kihwan Kim, Jonas Wulff and Michael J. Black.* **Competitive Collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation.** CVPR 2019. 
129 | 
-------------------------------------------------------------------------------- /custom_transforms.py: -------------------------------------------------------------------------------- 
1 | from __future__ import division 
2 | import torch 
3 | import random 
4 | import numpy as np 
5 | from scipy.misc import imresize, imrotate 
6 | 
7 | '''Set of random transform routines that take a list of images and an intrinsics matrix as arguments, 
8 | in order to apply random but coherent transformations.''' 
9 | 
10 | 
11 | class Compose(object): 
12 |     def __init__(self, transforms): 
13 |         self.transforms = transforms 
14 | 
15 |     def __call__(self, images, intrinsics): 
16 |         for t in self.transforms: 
17 |             images, intrinsics = t(images, intrinsics) 
18 |         return images, intrinsics 
19 | 
20 | 
21 | class Normalize(object): 
22 |     def __init__(self, mean, std): 
23 |         self.mean = mean 
24 |         self.std = std 
25 | 
26 |     def __call__(self, images, intrinsics): 
27 |         for tensor in images: 
28 |             for t, m, s in zip(tensor, self.mean, self.std): 
29 |                 t.sub_(m).div_(s) 
30 |         return images, intrinsics 
31 | 
32 | 
33 | class NormalizeLocally(object): 
34 | 
35 |     def __call__(self, images, intrinsics): 
36 |         image_tensor = torch.stack(images) 
37 |         assert(image_tensor.size(1)==3)  # 3 channel image 
38 |         mean = image_tensor.transpose(0,1).contiguous().view(3, -1).mean(1) 
39 |         std = image_tensor.transpose(0,1).contiguous().view(3, -1).std(1) 
40 | 
41 |         for tensor in images: 
42 |             for t, m, s in zip(tensor, mean, std): 
43 |                 t.sub_(m).div_(s) 
44 |         return images, intrinsics 
45 | 
46 | 
47 | class ArrayToTensor(object): 
48 |     """Converts a list of numpy.ndarray (H x W x C) along with an intrinsics matrix to a list of torch.FloatTensor of shape (C x H x W) with an intrinsics tensor.""" 
49 | 
50 |     def __call__(self, images, intrinsics): 
51 |         tensors = [] 
52 |         for im in images: 
53 |             # put it from HWC to CHW format 
54 |             im = np.transpose(im, (2, 0, 1)) 
55 |             # handle numpy array 
56 |             tensors.append(torch.from_numpy(im).float()/255) 
57 |         return tensors, intrinsics 
58 | 
59 | 
60 | class RandomHorizontalFlip(object): 
61 |     """Randomly horizontally flips the given numpy array with a probability of 0.5""" 
62 | 
63 |     def __call__(self, images, intrinsics): 
64 |         assert intrinsics is not None 
65 |         if random.random() < 0.5: 
66 |             output_intrinsics = np.copy(intrinsics) 
67 |             output_images = [np.copy(np.fliplr(im)) for im in images] 
68 |             w = output_images[0].shape[1] 
69 |             output_intrinsics[0,2] = w - output_intrinsics[0,2] 
70 |         else: 
71 |             output_images = images 
72 |             output_intrinsics = intrinsics 
73 |         return output_images, output_intrinsics 
74 | 
75 | 
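# Illustrative usage sketch (not part of the original module): these transforms are
# meant to be chained with Compose so that every image of a sample and its intrinsics
# matrix undergo the same random transformation. The normalization values below are
# placeholders; the training script defines its own.
#
#   train_transform = Compose([RandomHorizontalFlip(),
#                              RandomScaleCrop(),
#                              ArrayToTensor(),
#                              Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])])
#   tensor_imgs, intrinsics = train_transform(list_of_hwc_arrays, intrinsics_3x3)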
class RandomRotate(object): 76 | """Randomly rotates images up to 10 degrees and crop them to keep same size as before.""" 77 | def __call__(self, images, intrinsics): 78 | if np.random.random() > 0.5: 79 | return images, intrinsics 80 | else: 81 | assert intrinsics is not None 82 | rot = np.random.uniform(0,10) 83 | rotated_images = [imrotate(im, rot) for im in images] 84 | 85 | return rotated_images, intrinsics 86 | 87 | 88 | 89 | 90 | class RandomScaleCrop(object): 91 | """Randomly zooms images up to 15% and crop them to keep same size as before.""" 92 | def __init__(self, h=0, w=0): 93 | self.h = h 94 | self.w = w 95 | 96 | def __call__(self, images, intrinsics): 97 | assert intrinsics is not None 98 | output_intrinsics = np.copy(intrinsics) 99 | 100 | in_h, in_w, _ = images[0].shape 101 | x_scaling, y_scaling = np.random.uniform(1,1.1,2) 102 | scaled_h, scaled_w = int(in_h * y_scaling), int(in_w * x_scaling) 103 | 104 | output_intrinsics[0] *= x_scaling 105 | output_intrinsics[1] *= y_scaling 106 | scaled_images = [imresize(im, (scaled_h, scaled_w)) for im in images] 107 | 108 | if self.h and self.w: 109 | in_h, in_w = self.h, self.w 110 | 111 | offset_y = np.random.randint(scaled_h - in_h + 1) 112 | offset_x = np.random.randint(scaled_w - in_w + 1) 113 | cropped_images = [im[offset_y:offset_y + in_h, offset_x:offset_x + in_w] for im in scaled_images] 114 | 115 | output_intrinsics[0,2] -= offset_x 116 | output_intrinsics[1,2] -= offset_y 117 | 118 | return cropped_images, output_intrinsics 119 | 120 | class Scale(object): 121 | """Scales images to a particular size""" 122 | def __init__(self, h, w): 123 | self.h = h 124 | self.w = w 125 | 126 | def __call__(self, images, intrinsics): 127 | assert intrinsics is not None 128 | output_intrinsics = np.copy(intrinsics) 129 | 130 | in_h, in_w, _ = images[0].shape 131 | scaled_h, scaled_w = self.h , self.w 132 | 133 | output_intrinsics[0] *= (scaled_w / in_w) 134 | output_intrinsics[1] *= (scaled_h / in_h) 135 | scaled_images = [imresize(im, (scaled_h, scaled_w)) for im in images] 136 | 137 | return scaled_images, output_intrinsics 138 | -------------------------------------------------------------------------------- /data/cityscapes_loader.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import json 3 | import numpy as np 4 | import scipy.misc 5 | from path import Path 6 | from tqdm import tqdm 7 | 8 | 9 | class cityscapes_loader(object): 10 | def __init__(self, 11 | dataset_dir, 12 | split='train', 13 | crop_bottom=True, # Get rid of the car logo 14 | img_height=171, 15 | img_width=416): 16 | self.dataset_dir = Path(dataset_dir) 17 | self.split = split 18 | # Crop out the bottom 25% of the image to remove the car logo 19 | self.crop_bottom = crop_bottom 20 | self.img_height = img_height 21 | self.img_width = img_width 22 | self.min_speed = 2 23 | self.scenes = (self.dataset_dir/'leftImg8bit_sequence'/split).dirs() 24 | print('Total scenes collected: {}'.format(len(self.scenes))) 25 | 26 | def collect_scenes(self, city): 27 | img_files = sorted(city.files('*.png')) 28 | scenes = {} 29 | connex_scenes = {} 30 | connex_scene_data_list = [] 31 | for f in img_files: 32 | scene_id,frame_id = f.basename().split('_')[1:3] 33 | if scene_id not in scenes.keys(): 34 | scenes[scene_id] = [] 35 | scenes[scene_id].append(frame_id) 36 | 37 | # divide scenes into connexe sequences 38 | for scene_id in scenes.keys(): 39 | previous = None 40 | connex_scenes[scene_id] = [] 41 | for 
id in scenes[scene_id]: 42 | if previous is None or int(id) - int(previous) > 1: 43 | current_list = [] 44 | connex_scenes[scene_id].append(current_list) 45 | current_list.append(id) 46 | previous = id 47 | 48 | # create scene data dicts, and subsample scene every two frames 49 | for scene_id in connex_scenes.keys(): 50 | intrinsics = self.load_intrinsics(city, scene_id) 51 | for subscene in connex_scenes[scene_id]: 52 | frame_speeds = [self.load_speed(city, scene_id, frame_id) for frame_id in subscene] 53 | connex_scene_data_list.append({'city':city, 54 | 'scene_id': scene_id, 55 | 'rel_path': city.basename()+'_'+scene_id+'_'+subscene[0]+'_0', 56 | 'intrinsics': intrinsics, 57 | 'frame_ids':subscene[0::2], 58 | 'speeds':frame_speeds[0::2]}) 59 | connex_scene_data_list.append({'city':city, 60 | 'scene_id': scene_id, 61 | 'rel_path': city.basename()+'_'+scene_id+'_'+subscene[0]+'_1', 62 | 'intrinsics': intrinsics, 63 | 'frame_ids': subscene[1::2], 64 | 'speeds': frame_speeds[1::2]}) 65 | return connex_scene_data_list 66 | 67 | def load_intrinsics(self, city, scene_id): 68 | city_name = city.basename() 69 | camera_folder = self.dataset_dir/'camera'/self.split/city_name 70 | camera_file = camera_folder.files('{}_{}_*_camera.json'.format(city_name, scene_id))[0] 71 | frame_id = camera_file.split('_')[2] 72 | frame_path = city/'{}_{}_{}_leftImg8bit.png'.format(city_name, scene_id, frame_id) 73 | 74 | with open(camera_file, 'r') as f: 75 | camera = json.load(f) 76 | fx = camera['intrinsic']['fx'] 77 | fy = camera['intrinsic']['fy'] 78 | u0 = camera['intrinsic']['u0'] 79 | v0 = camera['intrinsic']['v0'] 80 | intrinsics = np.array([[fx, 0, u0], 81 | [0, fy, v0], 82 | [0, 0, 1]]) 83 | 84 | img = scipy.misc.imread(frame_path) 85 | h,w,_ = img.shape 86 | zoom_y = self.img_height/h 87 | zoom_x = self.img_width/w 88 | 89 | intrinsics[0] *= zoom_x 90 | intrinsics[1] *= zoom_y 91 | return intrinsics 92 | 93 | def load_speed(self, city, scene_id, frame_id): 94 | city_name = city.basename() 95 | vehicle_folder = self.dataset_dir/'vehicle_sequence'/self.split/city_name 96 | vehicle_file = vehicle_folder/'{}_{}_{}_vehicle.json'.format(city_name, scene_id, frame_id) 97 | with open(vehicle_file, 'r') as f: 98 | vehicle = json.load(f) 99 | return vehicle['speed'] 100 | 101 | def get_scene_imgs(self, scene_data): 102 | cum_speed = np.zeros(3) 103 | print(scene_data['city'].basename(), scene_data['scene_id'], scene_data['frame_ids'][0]) 104 | for i,frame_id in enumerate(scene_data['frame_ids']): 105 | cum_speed += scene_data['speeds'][i] 106 | speed_mag = np.linalg.norm(cum_speed) 107 | if speed_mag > self.min_speed: 108 | yield self.load_image(scene_data['city'], scene_data['scene_id'], frame_id), frame_id 109 | cum_speed *= 0 110 | 111 | def load_image(self, city, scene_id, frame_id): 112 | img_file = city/'{}_{}_{}_leftImg8bit.png'.format(city.basename(), 113 | scene_id, 114 | frame_id) 115 | if not img_file.isfile(): 116 | return None 117 | img = scipy.misc.imread(img_file) 118 | img = scipy.misc.imresize(img, (self.img_height, self.img_width))[:int(self.img_height*0.75)] 119 | return img 120 | -------------------------------------------------------------------------------- /data/kitti_raw_loader.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from path import Path 3 | import scipy.misc 4 | from collections import Counter 5 | 6 | 7 | class KittiRawLoader(object): 8 | def __init__(self, 9 | dataset_dir, 10 | static_frames_file=None, 11 | 
img_height=128, 12 | img_width=416, 13 | min_speed=2, 14 | get_gt=False): 15 | dir_path = Path(__file__).realpath().dirname() 16 | test_scene_file = dir_path/'test_scenes.txt' 17 | 18 | self.from_speed = static_frames_file is None 19 | if static_frames_file is not None: 20 | static_frames_file = Path(static_frames_file) 21 | self.collect_static_frames(static_frames_file) 22 | 23 | with open(test_scene_file, 'r') as f: 24 | test_scenes = f.readlines() 25 | self.test_scenes = [t[:-1] for t in test_scenes] 26 | self.dataset_dir = Path(dataset_dir) 27 | self.img_height = img_height 28 | self.img_width = img_width 29 | self.cam_ids = ['02', '03'] 30 | self.date_list = ['2011_09_26', '2011_09_28', '2011_09_29', '2011_09_30', '2011_10_03'] 31 | self.min_speed = min_speed 32 | self.get_gt = get_gt 33 | self.collect_train_folders() 34 | 35 | def collect_static_frames(self, static_frames_file): 36 | with open(static_frames_file, 'r') as f: 37 | frames = f.readlines() 38 | self.static_frames = {} 39 | for fr in frames: 40 | if fr == '\n': 41 | continue 42 | date, drive, frame_id = fr.split(' ') 43 | curr_fid = '%.10d' % (np.int(frame_id[:-1])) 44 | if drive not in self.static_frames.keys(): 45 | self.static_frames[drive] = [] 46 | self.static_frames[drive].append(curr_fid) 47 | 48 | def collect_train_folders(self): 49 | self.scenes = [] 50 | for date in self.date_list: 51 | drive_set = (self.dataset_dir/date).dirs() 52 | for dr in drive_set: 53 | if dr.name[:-5] not in self.test_scenes: 54 | self.scenes.append(dr) 55 | 56 | def collect_scenes(self, drive): 57 | train_scenes = [] 58 | for c in self.cam_ids: 59 | oxts = sorted((drive/'oxts'/'data').files('*.txt')) 60 | scene_data = {'cid': c, 'dir': drive, 'speed': [], 'frame_id': [], 'rel_path': drive.name + '_' + c} 61 | for n, f in enumerate(oxts): 62 | metadata = np.genfromtxt(f) 63 | speed = metadata[8:11] 64 | scene_data['speed'].append(speed) 65 | scene_data['frame_id'].append('{:010d}'.format(n)) 66 | sample = self.load_image(scene_data, 0) 67 | if sample is None: 68 | return [] 69 | scene_data['P_rect'] = self.get_P_rect(scene_data, sample[1], sample[2]) 70 | scene_data['intrinsics'] = scene_data['P_rect'][:,:3] 71 | 72 | train_scenes.append(scene_data) 73 | return train_scenes 74 | 75 | def get_scene_imgs(self, scene_data): 76 | def construct_sample(scene_data, i, frame_id): 77 | sample = [self.load_image(scene_data, i)[0], frame_id] 78 | if self.get_gt: 79 | sample.append(self.generate_depth_map(scene_data, i)) 80 | return sample 81 | 82 | if self.from_speed: 83 | cum_speed = np.zeros(3) 84 | for i, speed in enumerate(scene_data['speed']): 85 | cum_speed += speed 86 | speed_mag = np.linalg.norm(cum_speed) 87 | if speed_mag > self.min_speed: 88 | frame_id = scene_data['frame_id'][i] 89 | yield construct_sample(scene_data, i, frame_id) 90 | cum_speed *= 0 91 | else: # from static frame file 92 | drive = str(scene_data['dir'].name) 93 | for (i,frame_id) in enumerate(scene_data['frame_id']): 94 | if (drive not in self.static_frames.keys()) or (frame_id not in self.static_frames[drive]): 95 | yield construct_sample(scene_data, i, frame_id) 96 | 97 | def get_P_rect(self, scene_data, zoom_x, zoom_y): 98 | #print(zoom_x, zoom_y) 99 | calib_file = scene_data['dir'].parent/'calib_cam_to_cam.txt' 100 | 101 | filedata = self.read_raw_calib_file(calib_file) 102 | P_rect = np.reshape(filedata['P_rect_' + scene_data['cid']], (3, 4)) 103 | P_rect[0] *= zoom_x 104 | P_rect[1] *= zoom_y 105 | return P_rect 106 | 107 | def load_image(self, scene_data, 
tgt_idx): 108 | img_file = scene_data['dir']/'image_{}'.format(scene_data['cid'])/'data'/scene_data['frame_id'][tgt_idx]+'.png' 109 | if not img_file.isfile(): 110 | return None 111 | img = scipy.misc.imread(img_file) 112 | zoom_y = self.img_height/img.shape[0] 113 | zoom_x = self.img_width/img.shape[1] 114 | img = scipy.misc.imresize(img, (self.img_height, self.img_width)) 115 | return img, zoom_x, zoom_y 116 | 117 | def read_raw_calib_file(self, filepath): 118 | # From https://github.com/utiasSTARS/pykitti/blob/master/pykitti/utils.py 119 | """Read in a calibration file and parse into a dictionary.""" 120 | data = {} 121 | 122 | with open(filepath, 'r') as f: 123 | for line in f.readlines(): 124 | key, value = line.split(':', 1) 125 | # The only non-float values in these files are dates, which 126 | # we don't care about anyway 127 | try: 128 | data[key] = np.array([float(x) for x in value.split()]) 129 | except ValueError: 130 | pass 131 | return data 132 | 133 | def generate_depth_map(self, scene_data, tgt_idx): 134 | # compute projection matrix velodyne->image plane 135 | 136 | def sub2ind(matrixSize, rowSub, colSub): 137 | m, n = matrixSize 138 | return rowSub * (n-1) + colSub - 1 139 | 140 | R_cam2rect = np.eye(4) 141 | 142 | calib_dir = scene_data['dir'].parent 143 | cam2cam = self.read_raw_calib_file(calib_dir/'calib_cam_to_cam.txt') 144 | velo2cam = self.read_raw_calib_file(calib_dir/'calib_velo_to_cam.txt') 145 | velo2cam = np.hstack((velo2cam['R'].reshape(3,3), velo2cam['T'][..., np.newaxis])) 146 | velo2cam = np.vstack((velo2cam, np.array([0, 0, 0, 1.0]))) 147 | P_rect = scene_data['P_rect'] 148 | R_cam2rect[:3,:3] = cam2cam['R_rect_00'].reshape(3,3) 149 | 150 | P_velo2im = np.dot(np.dot(P_rect, R_cam2rect), velo2cam) 151 | 152 | velo_file_name = scene_data['dir']/'velodyne_points'/'data'/'{}.bin'.format(scene_data['frame_id'][tgt_idx]) 153 | 154 | # load velodyne points and remove all behind image plane (approximation) 155 | # each row of the velodyne data is forward, left, up, reflectance 156 | velo = np.fromfile(velo_file_name, dtype=np.float32).reshape(-1, 4) 157 | velo[:,3] = 1 158 | velo = velo[velo[:, 0] >= 0, :] 159 | 160 | # project the points to the camera 161 | velo_pts_im = np.dot(P_velo2im, velo.T).T 162 | velo_pts_im[:, :2] = velo_pts_im[:,:2] / velo_pts_im[:,-1:] 163 | 164 | # check if in bounds 165 | # use minus 1 to get the exact same value as KITTI matlab code 166 | velo_pts_im[:, 0] = np.round(velo_pts_im[:,0]) - 1 167 | velo_pts_im[:, 1] = np.round(velo_pts_im[:,1]) - 1 168 | 169 | val_inds = (velo_pts_im[:, 0] >= 0) & (velo_pts_im[:, 1] >= 0) 170 | val_inds = val_inds & (velo_pts_im[:,0] < self.img_width) & (velo_pts_im[:,1] < self.img_height) 171 | velo_pts_im = velo_pts_im[val_inds, :] 172 | 173 | # project to image 174 | depth = np.zeros((self.img_height, self.img_width)).astype(np.float32) 175 | depth[velo_pts_im[:, 1].astype(np.int), velo_pts_im[:, 0].astype(np.int)] = velo_pts_im[:, 2] 176 | 177 | # find the duplicate points and choose the closest depth 178 | inds = sub2ind(depth.shape, velo_pts_im[:, 1], velo_pts_im[:, 0]) 179 | dupe_inds = [item for item, count in Counter(inds).items() if count > 1] 180 | for dd in dupe_inds: 181 | pts = np.where(inds == dd)[0] 182 | x_loc = int(velo_pts_im[pts[0], 0]) 183 | y_loc = int(velo_pts_im[pts[0], 1]) 184 | depth[y_loc, x_loc] = velo_pts_im[pts, 2].min() 185 | depth[depth < 0] = 0 186 | return depth 187 | -------------------------------------------------------------------------------- 
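# Note on generate_depth_map above (added for clarity): raw velodyne points
# X = (x, y, z, 1)^T are projected into the image by chaining the calibration
# transforms,
#
#     P_velo2im = P_rect . R_cam2rect . T_velo2cam      # (3x4) = (3x4)(4x4)(4x4)
#     (u*w, v*w, w)^T = P_velo2im . X
#
# pixel coordinates u, v follow from dividing by the third component, while that
# third component itself (depth in the rectified camera frame) is the value
# written into the sparse depth map.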
/data/prepare_train_data.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import argparse 3 | import scipy.misc 4 | import numpy as np 5 | from joblib import Parallel, delayed 6 | from tqdm import tqdm 7 | from path import Path 8 | 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument("dataset_dir", metavar='DIR', 11 | help='path to original dataset') 12 | parser.add_argument("--dataset-format", type=str, required=True, choices=["kitti", "cityscapes"]) 13 | parser.add_argument("--static-frames", default=None, 14 | help="list of imgs to discard for being static, if not set will discard them based on speed \ 15 | (careful, on KITTI some frames have incorrect speed)") 16 | parser.add_argument("--with-gt", action='store_true', 17 | help="If available (e.g. with KITTI), will store ground truth along with images, for validation") 18 | parser.add_argument("--dump-root", type=str, required=True, help="Where to dump the data") 19 | parser.add_argument("--height", type=int, default=128, help="image height") 20 | parser.add_argument("--width", type=int, default=416, help="image width") 21 | parser.add_argument("--num-threads", type=int, default=4, help="number of threads to use") 22 | 23 | args = parser.parse_args() 24 | 25 | 26 | def dump_example(scene): 27 | scene_list = data_loader.collect_scenes(scene) 28 | for scene_data in scene_list: 29 | dump_dir = args.dump_root/scene_data['rel_path'] 30 | dump_dir.makedirs_p() 31 | intrinsics = scene_data['intrinsics'] 32 | fx = intrinsics[0, 0] 33 | fy = intrinsics[1, 1] 34 | cx = intrinsics[0, 2] 35 | cy = intrinsics[1, 2] 36 | 37 | dump_cam_file = dump_dir/'cam.txt' 38 | with open(dump_cam_file, 'w') as f: 39 | f.write('%f,0.,%f,0.,%f,%f,0.,0.,1.' 
% (fx, cx, fy, cy)) 40 | 41 | for sample in data_loader.get_scene_imgs(scene_data): 42 | assert(len(sample) >= 2) 43 | img, frame_nb = sample[0], sample[1] 44 | dump_img_file = dump_dir/'{}.jpg'.format(frame_nb) 45 | scipy.misc.imsave(dump_img_file, img) 46 | if len(sample) == 3: 47 | dump_depth_file = dump_dir/'{}.npy'.format(frame_nb) 48 | np.save(dump_depth_file, sample[2]) 49 | 50 | if len(dump_dir.files('*.jpg')) < 3: 51 | dump_dir.rmtree() 52 | 53 | 54 | def main(): 55 | args.dump_root = Path(args.dump_root) 56 | args.dump_root.mkdir_p() 57 | 58 | global data_loader 59 | 60 | if args.dataset_format == 'kitti': 61 | from kitti_raw_loader import KittiRawLoader 62 | data_loader = KittiRawLoader(args.dataset_dir, 63 | static_frames_file=args.static_frames, 64 | img_height=args.height, 65 | img_width=args.width, 66 | get_gt=args.with_gt) 67 | 68 | if args.dataset_format == 'cityscapes': 69 | from cityscapes_loader import cityscapes_loader 70 | data_loader = cityscapes_loader(args.dataset_dir, 71 | img_height=args.height, 72 | img_width=args.width) 73 | 74 | print('Retrieving frames') 75 | Parallel(n_jobs=args.num_threads)(delayed(dump_example)(scene) for scene in tqdm(data_loader.scenes)) 76 | # Split into train/val 77 | print('Generating train val lists') 78 | np.random.seed(8964) 79 | subfolders = args.dump_root.dirs() 80 | with open(args.dump_root / 'train.txt', 'w') as tf: 81 | with open(args.dump_root / 'val.txt', 'w') as vf: 82 | for s in tqdm(subfolders): 83 | if np.random.random() < 0.1: 84 | vf.write('{}\n'.format(s.name)) 85 | else: 86 | tf.write('{}\n'.format(s.name)) 87 | # remove useless groundtruth data for training comment if you don't want to erase it 88 | for gt_file in s.files('*.npy'): 89 | gt_file.remove_p() 90 | 91 | 92 | if __name__ == '__main__': 93 | main() 94 | -------------------------------------------------------------------------------- /data/test_scenes.txt: -------------------------------------------------------------------------------- 1 | 2011_09_26_drive_0117 2 | 2011_09_28_drive_0002 3 | 2011_09_26_drive_0052 4 | 2011_09_30_drive_0016 5 | 2011_09_26_drive_0059 6 | 2011_09_26_drive_0027 7 | 2011_09_26_drive_0020 8 | 2011_09_26_drive_0009 9 | 2011_09_26_drive_0013 10 | 2011_09_26_drive_0101 11 | 2011_09_26_drive_0046 12 | 2011_09_26_drive_0029 13 | 2011_09_26_drive_0064 14 | 2011_09_26_drive_0048 15 | 2011_10_03_drive_0027 16 | 2011_09_26_drive_0002 17 | 2011_09_26_drive_0036 18 | 2011_09_29_drive_0071 19 | 2011_10_03_drive_0047 20 | 2011_09_30_drive_0027 21 | 2011_09_26_drive_0086 22 | 2011_09_26_drive_0084 23 | 2011_09_26_drive_0096 24 | 2011_09_30_drive_0018 25 | 2011_09_26_drive_0106 26 | 2011_09_26_drive_0056 27 | 2011_09_26_drive_0023 28 | 2011_09_26_drive_0093 29 | -------------------------------------------------------------------------------- /datasets/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/anuragranj/cc/2b4e36292c18f8ee68ad5d210a4190f9adf881dc/datasets/__init__.py -------------------------------------------------------------------------------- /datasets/general_sequence_folders.py: -------------------------------------------------------------------------------- 1 | import torch.utils.data as data 2 | import numpy as np 3 | from scipy.misc import imread 4 | from path import Path 5 | import random 6 | 7 | def crawl_folders(folders_list, sequence_length): 8 | sequence_set = [] 9 | demi_length = (sequence_length-1)//2 10 | for folder in folders_list: 11 | 
#intrinsics = np.genfromtxt(folder/'cam.txt', delimiter=',').astype(np.float32).reshape((3, 3)) 12 | imgs = sorted(folder.files('*.jpg')) 13 | if len(imgs) < sequence_length: 14 | continue 15 | for i in range(demi_length, len(imgs)-demi_length): 16 | sample = {'tgt': imgs[i], 'ref_imgs': []} 17 | for j in range(-demi_length, demi_length + 1): 18 | if j != 0: 19 | sample['ref_imgs'].append(imgs[i+j]) 20 | sequence_set.append(sample) 21 | random.shuffle(sequence_set) 22 | return sequence_set 23 | 24 | 25 | def load_as_float(path): 26 | return imread(path).astype(np.float32) 27 | 28 | 29 | class SequenceFolder(data.Dataset): 30 | """A sequence data loader where the files are arranged in this way: 31 | root/scene_1/0000000.jpg 32 | root/scene_1/0000001.jpg 33 | .. 34 | root/scene_1/cam.txt 35 | root/scene_2/0000000.jpg 36 | . 37 | 38 | transform functions must take in a list a images and a numpy array (usually intrinsics matrix) 39 | """ 40 | 41 | def __init__(self, root, seed=None, train=True, sequence_length=3, transform=None, target_transform=None): 42 | np.random.seed(seed) 43 | random.seed(seed) 44 | self.root = Path(root) 45 | #scene_list_path = self.root/'train.txt' if train else self.root/'val.txt' 46 | self.scenes = self.root.dirs() 47 | self.samples = crawl_folders(self.scenes, sequence_length) 48 | self.transform = transform 49 | 50 | def __getitem__(self, index): 51 | sample = self.samples[index] 52 | tgt_img = load_as_float(sample['tgt']) 53 | ref_imgs = [load_as_float(ref_img) for ref_img in sample['ref_imgs']] 54 | if self.transform is not None: 55 | imgs, intrinsics = self.transform([tgt_img] + ref_imgs, np.copy(sample['intrinsics'])) 56 | tgt_img = imgs[0] 57 | ref_imgs = imgs[1:] 58 | else: 59 | intrinsics = np.copy(sample['intrinsics']) 60 | return tgt_img, ref_imgs, intrinsics, np.linalg.inv(intrinsics) 61 | 62 | def __len__(self): 63 | return len(self.samples) 64 | -------------------------------------------------------------------------------- /datasets/sequence_folders.py: -------------------------------------------------------------------------------- 1 | import torch.utils.data as data 2 | import numpy as np 3 | from scipy.misc import imread 4 | from path import Path 5 | import random 6 | 7 | 8 | def crawl_folders(folders_list, sequence_length): 9 | sequence_set = [] 10 | demi_length = (sequence_length-1)//2 11 | for folder in folders_list: 12 | intrinsics = np.genfromtxt(folder/'cam.txt', delimiter=',').astype(np.float32).reshape((3, 3)) 13 | imgs = sorted(folder.files('*.jpg')) 14 | if len(imgs) < sequence_length: 15 | continue 16 | for i in range(demi_length, len(imgs)-demi_length): 17 | sample = {'intrinsics': intrinsics, 'tgt': imgs[i], 'ref_imgs': []} 18 | for j in range(-demi_length, demi_length + 1): 19 | if j != 0: 20 | sample['ref_imgs'].append(imgs[i+j]) 21 | sequence_set.append(sample) 22 | random.shuffle(sequence_set) 23 | return sequence_set 24 | 25 | 26 | def load_as_float(path): 27 | return imread(path).astype(np.float32) 28 | 29 | 30 | class SequenceFolder(data.Dataset): 31 | """A sequence data loader where the files are arranged in this way: 32 | root/scene_1/0000000.jpg 33 | root/scene_1/0000001.jpg 34 | .. 35 | root/scene_1/cam.txt 36 | root/scene_2/0000000.jpg 37 | . 
38 | 39 | transform functions must take in a list a images and a numpy array (usually intrinsics matrix) 40 | """ 41 | 42 | def __init__(self, root, seed=None, train=True, sequence_length=3, transform=None, target_transform=None): 43 | np.random.seed(seed) 44 | random.seed(seed) 45 | self.root = Path(root) 46 | scene_list_path = self.root/'train.txt' if train else self.root/'val.txt' 47 | self.scenes = [self.root/folder[:-1] for folder in open(scene_list_path)] 48 | self.samples = crawl_folders(self.scenes, sequence_length) 49 | self.transform = transform 50 | 51 | def __getitem__(self, index): 52 | sample = self.samples[index] 53 | tgt_img = load_as_float(sample['tgt']) 54 | ref_imgs = [load_as_float(ref_img) for ref_img in sample['ref_imgs']] 55 | if self.transform is not None: 56 | imgs, intrinsics = self.transform([tgt_img] + ref_imgs, np.copy(sample['intrinsics'])) 57 | tgt_img = imgs[0] 58 | ref_imgs = imgs[1:] 59 | else: 60 | intrinsics = np.copy(sample['intrinsics']) 61 | return tgt_img, ref_imgs, intrinsics, np.linalg.inv(intrinsics) 62 | 63 | def __len__(self): 64 | return len(self.samples) 65 | -------------------------------------------------------------------------------- /datasets/stacked_sequence_folders.py: -------------------------------------------------------------------------------- 1 | import torch.utils.data as data 2 | import numpy as np 3 | from scipy.misc import imread 4 | from path import Path 5 | import random 6 | 7 | 8 | def crawl_folders(folders_list, sequence_length): 9 | sequence_set = [] 10 | demi_length = (sequence_length-1)//2 11 | for folder in folders_list: 12 | intrinsics = [np.genfromtxt(cam_file, delimiter=',').astype(np.float32).reshape((3, 3)) for cam_file in sorted(folder.files('*_cam.txt'))] 13 | imgs = sorted(folder.files('*.jpg')) 14 | for i in range(len(imgs)): 15 | sample = {'intrinsics': intrinsics[i], 'img_stack': imgs[i]} 16 | sequence_set.append(sample) 17 | random.shuffle(sequence_set) 18 | return sequence_set 19 | 20 | 21 | def load_as_float(path, sequence_length): 22 | stack = imread(path).astype(np.float32) 23 | h,w,_ = stack.shape 24 | w_img = int(w/(sequence_length)) 25 | imgs = [stack[:,i*w_img:(i+1)*w_img] for i in range(sequence_length)] 26 | tgt_index = sequence_length//2 27 | return([imgs[tgt_index]] + imgs[:tgt_index] + imgs[tgt_index+1:]) 28 | 29 | 30 | class SequenceFolder(data.Dataset): 31 | """A sequence data loader where the images are arranged in this way: 32 | root/scene_1/0000000.jpg 33 | root/scene_1/0000000_cam.txt 34 | root/scene_1/0000001.jpg 35 | root/scene_1/0000001_cam.txt 36 | . 
37 | root/scene_2/0000000.jpg 38 | root/scene_2/0000000_cam.txt 39 | """ 40 | 41 | def __init__(self, root, seed=None, train=True, sequence_length=3, transform=None, target_transform=None): 42 | np.random.seed(seed) 43 | random.seed(seed) 44 | self.root = Path(root) 45 | self.samples = [] 46 | frames_list_path = self.root/'train.txt' if train else self.root/'val.txt' 47 | self.scenes = self.root.dirs() 48 | self.sequence_length = sequence_length 49 | for frame_path in open(frames_list_path): 50 | a,b = frame_path[:-1].split(' ') 51 | base_path = (self.root/a)/b 52 | intrinsics = np.genfromtxt(base_path+'_cam.txt', delimiter=',').astype(np.float32).reshape((3, 3)) 53 | sample = {'intrinsics': intrinsics, 'img_stack': base_path+'.jpg'} 54 | self.samples.append(sample) 55 | self.transform = transform 56 | 57 | def __getitem__(self, index): 58 | sample = self.samples[index] 59 | imgs = load_as_float(sample['img_stack'], self.sequence_length) 60 | if self.transform is not None: 61 | imgs, intrinsics = self.transform(imgs, np.copy(sample['intrinsics'])) 62 | else: 63 | intrinsics = sample['intrinsics'] 64 | return imgs[0], imgs[1:], intrinsics, np.linalg.inv(intrinsics) 65 | 66 | def __len__(self): 67 | return len(self.samples) 68 | -------------------------------------------------------------------------------- /datasets/validation_folders.py: -------------------------------------------------------------------------------- 1 | import torch.utils.data as data 2 | import numpy as np 3 | from scipy.misc import imread 4 | from path import Path 5 | import torch 6 | 7 | 8 | def crawl_folders(folders_list): 9 | imgs = [] 10 | depth = [] 11 | for folder in folders_list: 12 | current_imgs = sorted(folder.files('*.jpg')) 13 | current_depth = [] 14 | for img in current_imgs: 15 | d = img.dirname()/(img.name[:-4] + '.npy') 16 | assert(d.isfile()), "depth file {} not found".format(str(d)) 17 | depth.append(d) 18 | imgs.extend(current_imgs) 19 | depth.extend(current_depth) 20 | return imgs, depth 21 | 22 | def crawl_folders_seq(folders_list, sequence_length): 23 | imgs1 = [] 24 | imgs2 = [] 25 | depth = [] 26 | for folder in folders_list: 27 | current_imgs = sorted(folder.files('*.jpg')) 28 | current_imgs1 = current_imgs[:-1] 29 | current_imgs2 = current_imgs[1:] 30 | current_depth = [] 31 | for (img1,img2) in zip(current_imgs1, current_imgs2): 32 | d = img1.dirname()/(img1.name[:-4] + '.npy') 33 | assert(d.isfile()), "depth file {} not found".format(str(d)) 34 | depth.append(d) 35 | imgs1.extend(current_imgs1) 36 | imgs2.extend(current_imgs2) 37 | depth.extend(current_depth) 38 | return imgs1, imgs2, depth 39 | 40 | 41 | def load_as_float(path): 42 | return imread(path).astype(np.float32) 43 | 44 | 45 | class ValidationSet(data.Dataset): 46 | """A sequence data loader where the files are arranged in this way: 47 | root/scene_1/0000000.jpg 48 | root/scene_1/0000000.npy 49 | root/scene_1/0000001.jpg 50 | root/scene_1/0000001.npy 51 | .. 52 | root/scene_2/0000000.jpg 53 | root/scene_2/0000000.npy 54 | . 
55 | 56 | transform functions must take in a list a images and a numpy array which can be None 57 | """ 58 | 59 | def __init__(self, root, transform=None): 60 | self.root = Path(root) 61 | scene_list_path = self.root/'val.txt' 62 | self.scenes = [self.root/folder[:-1] for folder in open(scene_list_path)] 63 | self.imgs, self.depth = crawl_folders(self.scenes) 64 | self.transform = transform 65 | 66 | def __getitem__(self, index): 67 | img = load_as_float(self.imgs[index]) 68 | depth = np.load(self.depth[index]).astype(np.float32) 69 | if self.transform is not None: 70 | img, _ = self.transform([img], None) 71 | img = img[0] 72 | return img, depth 73 | 74 | def __len__(self): 75 | return len(self.imgs) 76 | 77 | class ValidationSetSeq(data.Dataset): 78 | """A sequence data loader where the files are arranged in this way: 79 | root/scene_1/0000000.jpg 80 | root/scene_1/0000000.npy 81 | root/scene_1/0000001.jpg 82 | root/scene_1/0000001.npy 83 | .. 84 | root/scene_2/0000000.jpg 85 | root/scene_2/0000000.npy 86 | . 87 | 88 | transform functions must take in a list a images and a numpy array which can be None 89 | """ 90 | 91 | def __init__(self, root, transform=None): 92 | self.root = Path(root) 93 | scene_list_path = self.root/'val.txt' 94 | self.scenes = [self.root/folder[:-1] for folder in open(scene_list_path)] 95 | self.imgs1, self.imgs2, self.depth = crawl_folders_seq(self.scenes) 96 | self.transform = transform 97 | 98 | def __getitem__(self, index): 99 | img1 = load_as_float(self.imgs1[index]) 100 | img2 = load_as_float(self.imgs2[index]) 101 | depth = np.load(self.depth[index]).astype(np.float32) 102 | if self.transform is not None: 103 | img, _ = self.transform([img1, img2], None) 104 | img1, img2 = img[0], img[1] 105 | return (img1, img2), depth 106 | 107 | def __len__(self): 108 | return len(self.imgs1) 109 | -------------------------------------------------------------------------------- /evaluate_flow.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 
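# Note (added for clarity, not part of the original header): compute_err below
# reports the two standard KITTI flow metrics per image pair:
#   - epe_total: mean end-point error sqrt((u_gt - u_pred)^2 + (v_gt - v_pred)^2)
#     over pixels with valid ground truth;
#   - outliers (Fl): fraction of valid pixels whose EPE exceeds both 3 px and 5%
#     of the ground-truth flow magnitude (tau = [3, 0.05]).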
4 | 5 | import argparse 6 | import os 7 | from tqdm import tqdm 8 | import numpy as np 9 | from path import Path 10 | from flowutils import flow_io 11 | from logger import AverageMeter 12 | epsilon = 1e-8 13 | parser = argparse.ArgumentParser(description='Benchmark optical flow predictions', 14 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 15 | parser.add_argument('--output-dir', dest='output_dir', type=str, default=None, help='path to output directory') 16 | parser.add_argument('--gt-dir', dest='gt_dir', type=str, default=None, help='path to gt directory') 17 | parser.add_argument('-N', dest='N', type=int, default=200, help='number of samples') 18 | 19 | 20 | def main(): 21 | global args 22 | args = parser.parse_args() 23 | 24 | args.output_dir = Path(args.output_dir) 25 | args.gt_dir = Path(args.gt_dir) 26 | 27 | error_names = ['epe_total', 'outliers'] 28 | errors = AverageMeter(i=len(error_names)) 29 | 30 | for i in tqdm(range(args.N)): 31 | gt_flow_path = args.gt_dir.joinpath(str(i).zfill(6)+'_10.png') 32 | output_flow_path = args.output_dir.joinpath(str(i).zfill(6)+'_10.png') 33 | u_gt,v_gt,valid_gt = flow_io.flow_read_png(gt_flow_path) 34 | u_pred,v_pred,valid_pred = flow_io.flow_read_png(output_flow_path) 35 | 36 | _errors = compute_err(u_gt, v_gt, valid_gt, u_pred, v_pred, valid_pred) 37 | errors.update(_errors) 38 | 39 | 40 | print("Results") 41 | print("\t {:>10}, {:>10} ".format(*error_names)) 42 | print("Errors \t {:10.4f}, {:10.4f}".format(*errors.avg)) 43 | 44 | def compute_err(u_gt, v_gt, valid_gt, u_pred, v_pred, valid_pred, tau=[3,0.05]): 45 | epe = np.sqrt(np.power((u_gt - u_pred), 2) + np.power((v_gt - v_pred), 2)) 46 | epe = epe * valid_gt 47 | aepe = epe.sum() / valid_gt.sum() 48 | F_mag = np.sqrt(np.power(u_gt, 2)+ np.power(v_gt, 2)) 49 | E_0 = (epe > tau[0])#.type_as(epe) 50 | E_1 = ((epe / (F_mag+epsilon) ) > tau[1])#.type_as(epe) 51 | n_err = E_0 * E_1 * valid_gt 52 | f_err = n_err.sum()/valid_gt.sum() 53 | return [aepe, f_err] 54 | 55 | 56 | if __name__ == '__main__': 57 | main() 58 | -------------------------------------------------------------------------------- /flowutils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/anuragranj/cc/2b4e36292c18f8ee68ad5d210a4190f9adf881dc/flowutils/__init__.py -------------------------------------------------------------------------------- /flowutils/flow_io.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python2 2 | 3 | """ 4 | I/O script to save and load the data coming with the MPI-Sintel low-level 5 | computer vision benchmark. 6 | 7 | For more details about the benchmark, please visit www.mpi-sintel.de 8 | 9 | CHANGELOG: 10 | v1.0 (2015/02/03): First release 11 | 12 | Copyright (c) 2015 Jonas Wulff 13 | Max Planck Institute for Intelligent Systems, Tuebingen, Germany 14 | 15 | """ 16 | 17 | # Requirements: Numpy as PIL/Pillow 18 | import numpy as np 19 | try: 20 | import png 21 | has_png = True 22 | except: 23 | has_png = False 24 | png=None 25 | 26 | 27 | 28 | # Check for endianness, based on Daniel Scharstein's optical flow code. 29 | # Using little-endian architecture, these two should be equal. 30 | TAG_FLOAT = 202021.25 31 | TAG_CHAR = 'PIEH'.encode() 32 | 33 | def flow_read(filename, return_validity=False): 34 | """ Read optical flow from file, return (U,V) tuple. 35 | 36 | Original code by Deqing Sun, adapted from Daniel Scharstein. 
37 | """ 38 | f = open(filename,'rb') 39 | check = np.fromfile(f,dtype=np.float32,count=1)[0] 40 | assert check == TAG_FLOAT, ' flow_read:: Wrong tag in flow file (should be: {0}, is: {1}). Big-endian machine? '.format(TAG_FLOAT,check) 41 | width = np.fromfile(f,dtype=np.int32,count=1)[0] 42 | height = np.fromfile(f,dtype=np.int32,count=1)[0] 43 | size = width*height 44 | assert width > 0 and height > 0 and size > 1 and size < 100000000, ' flow_read:: Wrong input size (width = {0}, height = {1}).'.format(width,height) 45 | tmp = np.fromfile(f,dtype=np.float32,count=-1).reshape((height,width*2)) 46 | u = tmp[:,np.arange(width)*2] 47 | v = tmp[:,np.arange(width)*2 + 1] 48 | 49 | if return_validity: 50 | valid = u<1e19 51 | u[valid==0] = 0 52 | v[valid==0] = 0 53 | return u,v,valid 54 | else: 55 | return u,v 56 | 57 | def flow_write(filename,uv,v=None): 58 | """ Write optical flow to file. 59 | 60 | If v is None, uv is assumed to contain both u and v channels, 61 | stacked in depth. 62 | 63 | Original code by Deqing Sun, adapted from Daniel Scharstein. 64 | """ 65 | nBands = 2 66 | 67 | if v is None: 68 | uv_ = np.array(uv) 69 | assert(uv_.ndim==3) 70 | if uv_.shape[0] == 2: 71 | u = uv_[0,:,:] 72 | v = uv_[1,:,:] 73 | elif uv_.shape[2] == 2: 74 | u = uv_[:,:,0] 75 | v = uv_[:,:,1] 76 | else: 77 | raise UVError('Wrong format for flow input') 78 | else: 79 | u = uv 80 | 81 | assert(u.shape == v.shape) 82 | height,width = u.shape 83 | f = open(filename,'wb') 84 | # write the header 85 | f.write(TAG_CHAR) 86 | np.array(width).astype(np.int32).tofile(f) 87 | np.array(height).astype(np.int32).tofile(f) 88 | # arrange into matrix form 89 | tmp = np.zeros((height, width*nBands)) 90 | tmp[:,np.arange(width)*2] = u 91 | tmp[:,np.arange(width)*2 + 1] = v 92 | tmp.astype(np.float32).tofile(f) 93 | f.close() 94 | 95 | 96 | def flow_read_png(fpath): 97 | """ 98 | Read KITTI optical flow, returns u,v,valid mask 99 | 100 | """ 101 | if not has_png: 102 | print('Error. Please install the PyPNG library') 103 | return 104 | 105 | R = png.Reader(fpath) 106 | width,height,data,_ = R.asDirect() 107 | # This only worked with python2. 108 | #I = np.array(map(lambda x:x,data)).reshape((height,width,3)) 109 | I = np.array([x for x in data]).reshape((height,width,3)) 110 | u_ = I[:,:,0] 111 | v_ = I[:,:,1] 112 | valid = I[:,:,2] 113 | 114 | u = (u_.astype('float64')-2**15)/64.0 115 | v = (v_.astype('float64')-2**15)/64.0 116 | 117 | return u,v,valid 118 | 119 | 120 | def flow_write_png(fpath,u,v,valid=None): 121 | """ 122 | Write KITTI optical flow. 123 | 124 | """ 125 | if not has_png: 126 | print('Error. 
Please install the PyPNG library') 127 | return 128 | 129 | 130 | if valid==None: 131 | valid_ = np.ones(u.shape,dtype='uint16') 132 | else: 133 | valid_ = valid.astype('uint16') 134 | 135 | 136 | u = u.astype('float64') 137 | v = v.astype('float64') 138 | 139 | u_ = ((u*64.0)+2**15).astype('uint16') 140 | v_ = ((v*64.0)+2**15).astype('uint16') 141 | 142 | I = np.dstack((u_,v_,valid_)) 143 | 144 | W = png.Writer(width=u.shape[1], 145 | height=u.shape[0], 146 | bitdepth=16, 147 | planes=3) 148 | 149 | with open(fpath,'wb') as fil: 150 | W.write(fil,I.reshape((-1,3*u.shape[1]))) 151 | -------------------------------------------------------------------------------- /flowutils/flow_viz.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from torchvision.transforms import ToTensor 4 | 5 | def batchComputeFlowImage(uv): 6 | flow_im = torch.zeros(uv.size(0), 3, uv.size(2), uv.size(3) ) 7 | uv_np = uv.numpy() 8 | for i in range(uv.size(0)): 9 | flow_im[i] = ToTensor()(computeFlowImage(uv_np[i][0], uv_np[i][1])) 10 | return flow_im 11 | 12 | def computeFlowImage(u,v,logscale=True,scaledown=6,output=False): 13 | """ 14 | topleft is zero, u is horiz, v is vertical 15 | red is 3 o'clock, yellow is 6, light blue is 9, blue/purple is 12 16 | """ 17 | colorwheel = makecolorwheel() 18 | ncols = colorwheel.shape[0] 19 | 20 | radius = np.sqrt(u**2 + v**2) 21 | if output: 22 | print("Maximum flow magnitude: %04f" % np.max(radius)) 23 | if logscale: 24 | radius = np.log(radius + 1) 25 | if output: 26 | print("Maximum flow magnitude (after log): %0.4f" % np.max(radius)) 27 | radius = radius / scaledown 28 | if output: 29 | print("Maximum flow magnitude (after scaledown): %0.4f" % np.max(radius)) 30 | rot = np.arctan2(-v, -u) / np.pi 31 | 32 | fk = (rot+1)/2 * (ncols-1) # -1~1 maped to 0~ncols 33 | k0 = fk.astype(np.uint8) # 0, 1, 2, ..., ncols 34 | 35 | k1 = k0+1 36 | k1[k1 == ncols] = 0 37 | 38 | f = fk - k0 39 | 40 | ncolors = colorwheel.shape[1] 41 | img = np.zeros(u.shape+(ncolors,)) 42 | for i in range(ncolors): 43 | tmp = colorwheel[:,i] 44 | col0 = tmp[k0] 45 | col1 = tmp[k1] 46 | col = (1-f)*col0 + f*col1 47 | 48 | idx = radius <= 1 49 | # increase saturation with radius 50 | col[idx] = 1 - radius[idx]*(1-col[idx]) 51 | # out of range 52 | col[~idx] *= 0.75 53 | img[:,:,i] = np.floor(255*col).astype(np.uint8) 54 | 55 | return img.astype(np.uint8) 56 | 57 | 58 | def makecolorwheel(): 59 | # Create a colorwheel for visualization 60 | RY = 15 61 | YG = 6 62 | GC = 4 63 | CB = 11 64 | BM = 13 65 | MR = 6 66 | 67 | ncols = RY + YG + GC + CB + BM + MR 68 | 69 | colorwheel = np.zeros((ncols,3)) 70 | 71 | col = 0 72 | # RY 73 | colorwheel[0:RY,0] = 1 74 | colorwheel[0:RY,1] = np.arange(0,1,1./RY) 75 | col += RY 76 | 77 | # YG 78 | colorwheel[col:col+YG,0] = np.arange(1,0,-1./YG) 79 | colorwheel[col:col+YG,1] = 1 80 | col += YG 81 | 82 | # GC 83 | colorwheel[col:col+GC,1] = 1 84 | colorwheel[col:col+GC,2] = np.arange(0,1,1./GC) 85 | col += GC 86 | 87 | # CB 88 | colorwheel[col:col+CB,1] = np.arange(1,0,-1./CB) 89 | colorwheel[col:col+CB,2] = 1 90 | col += CB 91 | 92 | # BM 93 | colorwheel[col:col+BM,2] = 1 94 | colorwheel[col:col+BM,0] = np.arange(0,1,1./BM) 95 | col += BM 96 | 97 | # MR 98 | colorwheel[col:col+MR,2] = np.arange(1,0,-1./MR) 99 | colorwheel[col:col+MR,0] = 1 100 | 101 | return colorwheel 102 | -------------------------------------------------------------------------------- /flowutils/pfm.py: 
-------------------------------------------------------------------------------- 1 | import re 2 | import numpy as np 3 | import sys 4 | 5 | 6 | def readPFM(file): 7 | file = open(file, 'rb') 8 | 9 | color = None 10 | width = None 11 | height = None 12 | scale = None 13 | endian = None 14 | 15 | header = file.readline().rstrip() 16 | if header == 'PF': 17 | color = True 18 | elif header == 'Pf': 19 | color = False 20 | else: 21 | raise Exception('Not a PFM file.') 22 | 23 | dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline()) 24 | if dim_match: 25 | width, height = map(int, dim_match.groups()) 26 | else: 27 | raise Exception('Malformed PFM header.') 28 | 29 | scale = float(file.readline().rstrip()) 30 | if scale < 0: # little-endian 31 | endian = '<' 32 | scale = -scale 33 | else: 34 | endian = '>' # big-endian 35 | 36 | data = np.fromfile(file, endian + 'f') 37 | shape = (height, width, 3) if color else (height, width) 38 | 39 | data = np.reshape(data, shape) 40 | data = np.flipud(data) 41 | return data, scale 42 | 43 | 44 | def writePFM(file, image, scale=1): 45 | file = open(file, 'wb') 46 | 47 | color = None 48 | 49 | if image.dtype.name != 'float32': 50 | raise Exception('Image dtype must be float32.') 51 | 52 | image = np.flipud(image) 53 | 54 | if len(image.shape) == 3 and image.shape[2] == 3: # color image 55 | color = True 56 | elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1: # greyscale 57 | color = False 58 | else: 59 | raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.') 60 | 61 | file.write('PF\n' if color else 'Pf\n') 62 | file.write('%d %d\n' % (image.shape[1], image.shape[0])) 63 | 64 | endian = image.dtype.byteorder 65 | 66 | if endian == '<' or endian == '=' and sys.byteorder == 'little': 67 | scale = -scale 68 | 69 | file.write('%f\n' % scale) 70 | 71 | image.tofile(file) -------------------------------------------------------------------------------- /kitti_eval/depth_evaluation_utils.py: -------------------------------------------------------------------------------- 1 | # Mostly based on the code written by Clement Godard: 2 | # https://github.com/mrharicot/monodepth/blob/master/utils/evaluation_utils.py 3 | import numpy as np 4 | # import pandas as pd 5 | import datetime 6 | from collections import Counter 7 | from path import Path 8 | from scipy.misc import imread 9 | from tqdm import tqdm 10 | 11 | width_to_focal = dict() 12 | width_to_focal[1242] = 721.5377 13 | width_to_focal[1241] = 718.856 14 | width_to_focal[1224] = 707.0493 15 | width_to_focal[1238] = 718.3351 16 | 17 | 18 | class test_framework_KITTI(object): 19 | def __init__(self, root, test_files, seq_length=3, min_depth=1e-3, max_depth=100, step=1): 20 | self.root = root 21 | self.min_depth, self.max_depth = min_depth, max_depth 22 | self.calib_dirs, self.gt_files, self.img_files, self.displacements, self.cams = read_scene_data(self.root, test_files, seq_length, step) 23 | 24 | def __getitem__(self, i): 25 | tgt = imread(self.img_files[i][0]).astype(np.float32) 26 | depth = generate_depth_map(self.calib_dirs[i], self.gt_files[i], tgt.shape[:2], self.cams[i]) 27 | return {'tgt': tgt, 28 | 'ref': [imread(img).astype(np.float32) for img in self.img_files[i][1]], 29 | 'path':self.img_files[i][0], 30 | 'gt_depth': depth, 31 | 'displacements': np.array(self.displacements[i]), 32 | 'mask': generate_mask(depth, self.min_depth, self.max_depth) 33 | } 34 | 35 | def __len__(self): 36 | return len(self.img_files) 37 | 38 | 39 | 
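# Usage sketch (illustrative; presumably driven by test_disp.py): the framework is
# indexed like a dataset and yields everything needed for Eigen-split evaluation.
#
#   framework = test_framework_KITTI(kitti_raw_root, test_files, seq_length=3)
#   for sample in framework:
#       gt = sample['gt_depth']        # sparse ground truth from the velodyne scan
#       valid = sample['mask']         # depth-range mask combined with the Garg crop
#       # sample['displacements'] holds the ground-truth distance travelled between
#       # the target and each reference frame, usable to resolve scale ambiguity.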
############################################################################### 40 | # EIGEN 41 | 42 | def read_text_lines(file_path): 43 | f = open(file_path, 'r') 44 | lines = f.readlines() 45 | f.close() 46 | lines = [l.rstrip() for l in lines] 47 | return lines 48 | 49 | 50 | def get_displacements(oxts_root, index, shifts): 51 | with open(oxts_root/'timestamps.txt') as f: 52 | timestamps = [datetime.datetime.strptime(ts[:-3], "%Y-%m-%d %H:%M:%S.%f").timestamp() for ts in f.read().splitlines()] 53 | oxts_data = np.genfromtxt(oxts_root/'data'/'{:010d}.txt'.format(index)) 54 | speed = np.linalg.norm(oxts_data[8:11]) 55 | assert(all(index+shift < len(timestamps) and index+shift >= 0 for shift in shifts)), str([index+shift for shift in shifts]) 56 | return [speed*abs(timestamps[index] - timestamps[index + shift]) for shift in shifts] 57 | 58 | 59 | def read_scene_data(data_root, test_list, seq_length=3, step=1): 60 | data_root = Path(data_root) 61 | gt_files = [] 62 | calib_dirs = [] 63 | im_files = [] 64 | cams = [] 65 | displacements = [] 66 | demi_length = (seq_length - 1) // 2 67 | shift_range = [step*i for i in list(range(-demi_length,0)) + list(range(1, demi_length + 1))] 68 | 69 | print('getting test metadata ... ') 70 | for sample in tqdm(test_list): 71 | tgt_img_path = data_root/sample 72 | date, scene, cam_id, _, index = sample[:-4].split('/') 73 | 74 | ref_imgs_path = [tgt_img_path.dirname()/'{:010d}.png'.format(int(index) + shift) for shift in shift_range] 75 | 76 | caped_shift_range = shift_range[:] # ensures ref_imgs are present, if not, set shift to 0 so that it will be discarded later 77 | for i,img in enumerate(ref_imgs_path): 78 | if not img.isfile(): 79 | ref_imgs_path[i] = tgt_img_path 80 | caped_shift_range[i] = 0 81 | 82 | vel_path = data_root/date/scene/'velodyne_points'/'data'/'{}.bin'.format(index[:10]) 83 | 84 | if tgt_img_path.isfile(): 85 | gt_files.append(vel_path) 86 | calib_dirs.append(data_root/date) 87 | im_files.append([tgt_img_path,ref_imgs_path]) 88 | cams.append(int(cam_id[-2:])) 89 | displacements.append(get_displacements(data_root/date/scene/'oxts', int(index), caped_shift_range)) 90 | else: 91 | print('{} missing'.format(tgt_img_path)) 92 | # print(num_probs, 'files missing') 93 | 94 | return calib_dirs, gt_files, im_files, displacements, cams 95 | 96 | 97 | def load_velodyne_points(file_name): 98 | # adapted from https://github.com/hunse/kitti 99 | points = np.fromfile(file_name, dtype=np.float32).reshape(-1, 4) 100 | points[:,3] = 1 101 | return points 102 | 103 | 104 | def read_calib_file(path): 105 | # taken from https://github.com/hunse/kitti 106 | float_chars = set("0123456789.e+- ") 107 | data = {} 108 | with open(path, 'r') as f: 109 | for line in f.readlines(): 110 | key, value = line.split(':', 1) 111 | value = value.strip() 112 | data[key] = value 113 | if float_chars.issuperset(value): 114 | # try to cast to float array 115 | try: 116 | data[key] = np.array(list(map(float, value.split(' ')))) 117 | except ValueError: 118 | # casting error: data[key] already eq. 
value, so pass 119 | pass 120 | 121 | return data 122 | 123 | 124 | def get_focal_length_baseline(calib_dir, cam=2): 125 | cam2cam = read_calib_file(calib_dir + 'calib_cam_to_cam.txt') 126 | P2_rect = cam2cam['P_rect_02'].reshape(3,4) 127 | P3_rect = cam2cam['P_rect_03'].reshape(3,4) 128 | 129 | # cam 2 is left of camera 0 -6cm 130 | # cam 3 is to the right +54cm 131 | b2 = P2_rect[0,3] / -P2_rect[0,0] 132 | b3 = P3_rect[0,3] / -P3_rect[0,0] 133 | baseline = b3-b2 134 | 135 | if cam == 2: 136 | focal_length = P2_rect[0,0] 137 | elif cam == 3: 138 | focal_length = P3_rect[0,0] 139 | 140 | return focal_length, baseline 141 | 142 | 143 | def sub2ind(matrixSize, rowSub, colSub): 144 | m, n = matrixSize 145 | return rowSub * (n-1) + colSub - 1 146 | 147 | 148 | def generate_depth_map(calib_dir, velo_file_name, im_shape, cam=2): 149 | # load calibration files 150 | cam2cam = read_calib_file(calib_dir/'calib_cam_to_cam.txt') 151 | velo2cam = read_calib_file(calib_dir/'calib_velo_to_cam.txt') 152 | velo2cam = np.hstack((velo2cam['R'].reshape(3,3), velo2cam['T'][..., np.newaxis])) 153 | velo2cam = np.vstack((velo2cam, np.array([0, 0, 0, 1.0]))) 154 | 155 | # compute projection matrix velodyne->image plane 156 | R_cam2rect = np.eye(4) 157 | R_cam2rect[:3,:3] = cam2cam['R_rect_00'].reshape(3,3) 158 | P_rect = cam2cam['P_rect_0'+str(cam)].reshape(3,4) 159 | P_velo2im = np.dot(np.dot(P_rect, R_cam2rect), velo2cam) 160 | 161 | # load velodyne points and remove all behind image plane (approximation) 162 | # each row of the velodyne data is forward, left, up, reflectance 163 | velo = load_velodyne_points(velo_file_name) 164 | velo = velo[velo[:, 0] >= 0, :] 165 | 166 | # project the points to the camera 167 | velo_pts_im = np.dot(P_velo2im, velo.T).T 168 | velo_pts_im[:, :2] = velo_pts_im[:,:2] / velo_pts_im[:,-1:] 169 | 170 | # check if in bounds 171 | # use minus 1 to get the exact same value as KITTI matlab code 172 | velo_pts_im[:, 0] = np.round(velo_pts_im[:,0]) - 1 173 | velo_pts_im[:, 1] = np.round(velo_pts_im[:,1]) - 1 174 | val_inds = (velo_pts_im[:, 0] >= 0) & (velo_pts_im[:, 1] >= 0) 175 | val_inds = val_inds & (velo_pts_im[:,0] < im_shape[1]) & (velo_pts_im[:,1] < im_shape[0]) 176 | velo_pts_im = velo_pts_im[val_inds, :] 177 | 178 | # project to image 179 | depth = np.zeros((im_shape)) 180 | depth[velo_pts_im[:, 1].astype(np.int), velo_pts_im[:, 0].astype(np.int)] = velo_pts_im[:, 2] 181 | 182 | # find the duplicate points and choose the closest depth 183 | inds = sub2ind(depth.shape, velo_pts_im[:, 1], velo_pts_im[:, 0]) 184 | dupe_inds = [item for item, count in Counter(inds).items() if count > 1] 185 | for dd in dupe_inds: 186 | pts = np.where(inds == dd)[0] 187 | x_loc = int(velo_pts_im[pts[0], 0]) 188 | y_loc = int(velo_pts_im[pts[0], 1]) 189 | depth[y_loc, x_loc] = velo_pts_im[pts, 2].min() 190 | depth[depth < 0] = 0 191 | return depth 192 | 193 | 194 | def generate_mask(gt_depth, min_depth, max_depth): 195 | mask = np.logical_and(gt_depth > min_depth, 196 | gt_depth < max_depth) 197 | # crop used by Garg ECCV16 to reprocude Eigen NIPS14 results 198 | # if used on gt_size 370x1224 produces a crop of [-218, -3, 44, 1180] 199 | gt_height, gt_width = gt_depth.shape 200 | crop = np.array([0.40810811 * gt_height, 0.99189189 * gt_height, 201 | 0.03594771 * gt_width, 0.96405229 * gt_width]).astype(np.int32) 202 | 203 | crop_mask = np.zeros(mask.shape) 204 | crop_mask[crop[0]:crop[1],crop[2]:crop[3]] = 1 205 | mask = np.logical_and(mask, crop_mask) 206 | return mask 207 | 
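The helpers above only recover a ground-truth depth map and a validity mask (depth range plus the Garg crop); the error computation itself is done by the evaluation scripts. As a minimal sketch of how such a sample is typically scored (standard Eigen-style depth metrics with per-image median scaling; the function below is illustrative and not code from this repository):

```python
import numpy as np


def compute_depth_errors_sketch(gt_depth, pred_depth, mask):
    # Score one sample using the ground truth and mask produced above.
    gt = gt_depth[mask]
    pred = pred_depth[mask]
    pred = pred * (np.median(gt) / np.median(pred))   # resolve the unknown monocular scale
    pred = np.clip(pred, 1e-3, 100)                   # same depth range as the framework defaults
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, a1


# e.g., with a sample from the framework above and a resized network prediction `pred`:
#   errors = compute_depth_errors_sketch(sample['gt_depth'], pred, sample['mask'])
```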
-------------------------------------------------------------------------------- /kitti_eval/pose_evaluation_utils.py: -------------------------------------------------------------------------------- 1 | # Mostly based on the code written by Clement Godard: 2 | # https://github.com/mrharicot/monodepth/blob/master/utils/evaluation_utils.py 3 | import numpy as np 4 | # import pandas as pd 5 | from path import Path 6 | from scipy.misc import imread 7 | from tqdm import tqdm 8 | 9 | 10 | class test_framework_KITTI(object): 11 | def __init__(self, root, sequence_set, seq_length=3, step=1): 12 | self.root = root 13 | self.img_files, self.poses, self.sample_indices = read_scene_data(self.root, sequence_set, seq_length, step) 14 | 15 | def generator(self): 16 | for img_list, pose_list, sample_list in zip(self.img_files, self.poses, self.sample_indices): 17 | for snippet_indices in sample_list: 18 | imgs = [imread(img_list[i]).astype(np.float32) for i in snippet_indices] 19 | 20 | poses = np.stack(pose_list[i] for i in snippet_indices) 21 | first_pose = poses[0] 22 | poses[:,:,-1] -= first_pose[:,-1] 23 | compensated_poses = np.linalg.inv(first_pose[:,:3]) @ poses 24 | 25 | yield {'imgs': imgs, 26 | 'path': img_list[0], 27 | 'poses': compensated_poses 28 | } 29 | 30 | def __iter__(self): 31 | return self.generator() 32 | 33 | def __len__(self): 34 | return sum(len(imgs) for imgs in self.img_files) 35 | 36 | 37 | def read_scene_data(data_root, sequence_set, seq_length=3, step=1): 38 | data_root = Path(data_root) 39 | im_sequences = [] 40 | poses_sequences = [] 41 | indices_sequences = [] 42 | demi_length = (seq_length - 1) // 2 43 | shift_range = np.array([step*i for i in range(-demi_length, demi_length + 1)]).reshape(1, -1) 44 | 45 | sequences = set() 46 | for seq in sequence_set: 47 | corresponding_dirs = set((data_root/'sequences').dirs(seq)) 48 | sequences = sequences | corresponding_dirs 49 | 50 | print('getting test metadata for theses sequences : {}'.format(sequences)) 51 | for sequence in tqdm(sequences): 52 | poses = np.genfromtxt(data_root/'poses'/'{}.txt'.format(sequence.name)).astype(np.float64).reshape(-1, 3, 4) 53 | imgs = sorted((sequence/'image_2').files('*.png')) 54 | # construct 5-snippet sequences 55 | tgt_indices = np.arange(demi_length, len(imgs) - demi_length).reshape(-1, 1) 56 | snippet_indices = shift_range + tgt_indices 57 | im_sequences.append(imgs) 58 | poses_sequences.append(poses) 59 | indices_sequences.append(snippet_indices) 60 | return im_sequences, poses_sequences, indices_sequences -------------------------------------------------------------------------------- /logger.py: -------------------------------------------------------------------------------- 1 | from blessings import Terminal 2 | import progressbar 3 | import sys 4 | 5 | 6 | class TermLogger(object): 7 | def __init__(self, n_epochs, train_size, valid_size): 8 | self.n_epochs = n_epochs 9 | self.train_size = train_size 10 | self.valid_size = valid_size 11 | self.t = Terminal() 12 | s = 10 13 | e = 1 # epoch bar position 14 | tr = 3 # train bar position 15 | ts = 6 # valid bar position 16 | h = self.t.height 17 | 18 | for i in range(10): 19 | print('') 20 | self.epoch_bar = progressbar.ProgressBar(maxval=n_epochs, fd=Writer(self.t, (0, h-s+e))) 21 | 22 | self.train_writer = Writer(self.t, (0, h-s+tr)) 23 | self.train_bar_writer = Writer(self.t, (0, h-s+tr+1)) 24 | 25 | self.valid_writer = Writer(self.t, (0, h-s+ts)) 26 | self.valid_bar_writer = Writer(self.t, (0, h-s+ts+1)) 27 | 28 | self.reset_train_bar() 
29 | self.reset_valid_bar() 30 | 31 | def reset_train_bar(self): 32 | self.train_bar = progressbar.ProgressBar(maxval=self.train_size, fd=self.train_bar_writer).start() 33 | 34 | def reset_valid_bar(self): 35 | self.valid_bar = progressbar.ProgressBar(maxval=self.valid_size, fd=self.valid_bar_writer).start() 36 | 37 | 38 | class Writer(object): 39 | """Create an object with a write method that writes to a 40 | specific place on the screen, defined at instantiation. 41 | 42 | This is the glue between blessings and progressbar. 43 | """ 44 | 45 | def __init__(self, t, location): 46 | """ 47 | Input: location - tuple of ints (x, y), the position 48 | of the bar in the terminal 49 | """ 50 | self.location = location 51 | self.t = t 52 | 53 | def write(self, string): 54 | with self.t.location(*self.location): 55 | sys.stdout.write("\033[K") 56 | print(string) 57 | 58 | def flush(self): 59 | return 60 | 61 | 62 | class AverageMeter(object): 63 | """Computes and stores the average and current value""" 64 | 65 | def __init__(self, i=1, precision=3): 66 | self.meters = i 67 | self.precision = precision 68 | self.reset(self.meters) 69 | 70 | def reset(self, i): 71 | self.val = [0]*i 72 | self.avg = [0]*i 73 | self.sum = [0]*i 74 | self.count = 0 75 | 76 | def update(self, val, n=1): 77 | if not isinstance(val, list): 78 | val = [val] 79 | assert(len(val) == self.meters) 80 | self.count += n 81 | for i,v in enumerate(val): 82 | self.val[i] = v 83 | self.sum[i] += v * n 84 | self.avg[i] = self.sum[i] / self.count 85 | 86 | def __repr__(self): 87 | val = ' '.join(['{:.{}f}'.format(v, self.precision) for v in self.val]) 88 | avg = ' '.join(['{:.{}f}'.format(a, self.precision) for a in self.avg]) 89 | return '{} ({})'.format(val, avg) 90 | -------------------------------------------------------------------------------- /mnist_eval.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 
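#
# Sketch of the decision rule evaluated below (illustrative, mirrors validate()):
# a moderator network routes each image either to alice_net or to bob_net, so the
# final prediction is, roughly,
#
#     use_alice = torch.sigmoid(mod_net(img)) > 0.5        # (B, 1) routing decision
#     pred = torch.where(use_alice.squeeze(1),
#                        alice_net(img).argmax(1),          # alice's class
#                        bob_net(img).argmax(1))            # bob's class
#
# and the "Picking Percentage" printed at the end is the fraction of samples
# routed to alice (the remainder go to bob).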
4 | 5 | import argparse 6 | import time 7 | import csv 8 | import datetime 9 | import os 10 | from tqdm import tqdm 11 | import numpy as np 12 | 13 | import torch 14 | from torch.autograd import Variable 15 | import torch.backends.cudnn as cudnn 16 | import torch.optim 17 | import torch.nn as nn 18 | import torch.utils.data 19 | import torchvision 20 | import torch.nn.functional as F 21 | 22 | from logger import TermLogger, AverageMeter 23 | from path import Path 24 | from itertools import chain 25 | from tensorboardX import SummaryWriter 26 | 27 | from utils import tensor2array, save_checkpoint 28 | 29 | parser = argparse.ArgumentParser(description='MNIST and SVHN training', 30 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 31 | parser.add_argument('data', metavar='DIR', 32 | help='path to dataset') 33 | parser.add_argument('-j', '--workers', default=4, type=int, metavar='N', 34 | help='number of data loading workers') 35 | parser.add_argument('-b', '--batch-size', default=100, type=int, 36 | metavar='N', help='mini-batch size') 37 | 38 | parser.add_argument('--pretrained-alice', dest='pretrained_alice', default=None, metavar='PATH', 39 | help='path to pre-trained alice model') 40 | parser.add_argument('--pretrained-bob', dest='pretrained_bob', default=None, metavar='PATH', 41 | help='path to pre-trained bob model') 42 | parser.add_argument('--pretrained-mod', dest='pretrained_mod', default=None, metavar='PATH', 43 | help='path to pre-trained moderator') 44 | 45 | class LeNet(nn.Module): 46 | def __init__(self, nout=10): 47 | super(LeNet, self).__init__() 48 | self.conv1 = nn.Conv2d(1, 40, 3, 1) 49 | self.conv2 = nn.Conv2d(40, 40, 3, 1) 50 | self.fc1 = nn.Linear(40*5*5, 40) 51 | self.fc2 = nn.Linear(40, nout) 52 | 53 | def forward(self, x): 54 | x = F.relu(self.conv1(x)) 55 | x = F.max_pool2d(x, 2, 2) 56 | x = F.relu(self.conv2(x)) 57 | x = F.max_pool2d(x, 2, 2) 58 | x = x.view(-1, 40*5*5) 59 | x = F.relu(self.fc1(x)) 60 | x = self.fc2(x) 61 | return x 62 | 63 | def name(self): 64 | return "LeNet" 65 | 66 | def main(): 67 | global args 68 | args = parser.parse_args() 69 | 70 | args.data = Path(args.data) 71 | 72 | print("=> fetching dataset") 73 | mnist_transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(), 74 | torchvision.transforms.Normalize((0.1307,), (0.3081,))]) 75 | valset_mnist = torchvision.datasets.MNIST(args.data/'mnist', train=False, transform=mnist_transform, target_transform=None, download=True) 76 | 77 | svhn_transform = torchvision.transforms.Compose([torchvision.transforms.Resize(size=(28,28)), 78 | torchvision.transforms.Grayscale(), 79 | torchvision.transforms.ToTensor()]) 80 | valset_svhn = torchvision.datasets.SVHN(args.data/'svhn', split='test', transform=svhn_transform, target_transform=None, download=True) 81 | val_set = torch.utils.data.ConcatDataset([valset_mnist, valset_svhn]) 82 | 83 | 84 | print('{} Test samples found in MNIST'.format(len(valset_mnist))) 85 | print('{} Test samples found in SVHN'.format(len(valset_svhn))) 86 | 87 | val_loader = torch.utils.data.DataLoader( 88 | val_set, batch_size=args.batch_size, shuffle=False, 89 | num_workers=args.workers, pin_memory=True, drop_last=False) 90 | 91 | val_loader_mnist = torch.utils.data.DataLoader( 92 | valset_mnist, batch_size=args.batch_size, shuffle=False, 93 | num_workers=args.workers, pin_memory=True, drop_last=False) 94 | 95 | val_loader_svhn = torch.utils.data.DataLoader( 96 | valset_svhn, batch_size=args.batch_size, shuffle=False, 97 | num_workers=args.workers, 
pin_memory=True, drop_last=False) 98 | 99 | # create model 100 | print("=> creating model") 101 | 102 | alice_net = LeNet() 103 | bob_net = LeNet() 104 | mod_net = LeNet(nout=1) 105 | 106 | print("=> using pre-trained weights from {}".format(args.pretrained_alice)) 107 | weights = torch.load(args.pretrained_alice) 108 | alice_net.load_state_dict(weights['state_dict']) 109 | 110 | print("=> using pre-trained weights from {}".format(args.pretrained_bob)) 111 | weights = torch.load(args.pretrained_bob) 112 | bob_net.load_state_dict(weights['state_dict']) 113 | 114 | print("=> using pre-trained weights from {}".format(args.pretrained_mod)) 115 | weights = torch.load(args.pretrained_mod) 116 | mod_net.load_state_dict(weights['state_dict']) 117 | 118 | cudnn.benchmark = True 119 | alice_net = alice_net.cuda() 120 | bob_net = bob_net.cuda() 121 | mod_net = mod_net.cuda() 122 | 123 | # evaluate on validation set 124 | errors_mnist, error_names_mnist, mod_count_mnist = validate(val_loader_mnist, alice_net, bob_net, mod_net) 125 | errors_svhn, error_names_svhn, mod_count_svhn = validate(val_loader_svhn, alice_net, bob_net, mod_net) 126 | errors_total, error_names_total, _ = validate(val_loader, alice_net, bob_net, mod_net) 127 | 128 | accuracy_string_mnist = ', '.join('{} : {:.3f}'.format(name, 100*(error)) for name, error in zip(error_names_mnist, errors_mnist)) 129 | accuracy_string_svhn = ', '.join('{} : {:.3f}'.format(name, 100*(error)) for name, error in zip(error_names_svhn, errors_svhn)) 130 | accuracy_string_total = ', '.join('{} : {:.3f}'.format(name, 100*(error)) for name, error in zip(error_names_total, errors_total)) 131 | 132 | print("MNIST Error") 133 | print(accuracy_string_mnist) 134 | print("MNIST Picking Percentage- Alice {:.3f}, Bob {:.3f}".format(mod_count_mnist[0]*100, (1-mod_count_mnist[0])*100)) 135 | 136 | print("SVHN Error") 137 | print(accuracy_string_svhn) 138 | print("SVHN Picking Percentage for Alice {:.3f}, Bob {:.3f}".format(mod_count_svhn[0]*100, (1-mod_count_svhn[0])*100)) 139 | 140 | print("TOTAL Error") 141 | print(accuracy_string_total) 142 | 143 | def validate(val_loader, alice_net, bob_net, mod_net): 144 | global args 145 | accuracy = AverageMeter(i=3, precision=4) 146 | mod_count = AverageMeter() 147 | 148 | # switch to evaluate mode 149 | alice_net.eval() 150 | bob_net.eval() 151 | mod_net.eval() 152 | 153 | for i, (img, target) in enumerate(tqdm(val_loader)): 154 | img_var = Variable(img.cuda(), volatile=True) 155 | target_var = Variable(target.cuda(), volatile=True) 156 | 157 | pred_alice = alice_net(img_var) 158 | pred_bob = bob_net(img_var) 159 | pred_mod = F.sigmoid(mod_net(img_var)) 160 | _ , pred_alice_label = torch.max(pred_alice.data, 1) 161 | _ , pred_bob_label = torch.max(pred_bob.data, 1) 162 | pred_label = (pred_mod.squeeze().data > 0.5).type_as(pred_alice_label) * pred_alice_label + (pred_mod.squeeze().data <= 0.5).type_as(pred_bob_label) * pred_bob_label 163 | 164 | total_accuracy = (pred_label.cpu() == target).sum().item() / img.size(0) 165 | alice_accuracy = (pred_alice_label.cpu() == target).sum().item() / img.size(0) 166 | bob_accuracy = (pred_bob_label.cpu() == target).sum().item() / img.size(0) 167 | accuracy.update([total_accuracy, alice_accuracy, bob_accuracy]) 168 | mod_count.update((pred_mod.cpu().data > 0.5).sum().item() / img.size(0)) 169 | 170 | return list(map(lambda x: 1-x, accuracy.avg)), ['Total', 'alice', 'bob'] , mod_count.avg 171 | 172 | 173 | 174 | if __name__ == '__main__': 175 | # import sys 176 | # with 
open("experiment_recorder.md", "a") as f: 177 | # f.write('\n python3 ' + ' '.join(sys.argv)) 178 | main() 179 | -------------------------------------------------------------------------------- /models/DispNetS.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | def downsample_conv(in_planes, out_planes, kernel_size=3): 6 | return nn.Sequential( 7 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=2, padding=(kernel_size-1)//2), 8 | nn.ReLU(inplace=True), 9 | nn.Conv2d(out_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2), 10 | nn.ReLU(inplace=True) 11 | ) 12 | 13 | 14 | def predict_disp(in_planes): 15 | return nn.Sequential( 16 | nn.Conv2d(in_planes, 1, kernel_size=3, padding=1), 17 | nn.Sigmoid() 18 | ) 19 | 20 | 21 | def conv(in_planes, out_planes): 22 | return nn.Sequential( 23 | nn.Conv2d(in_planes, out_planes, kernel_size=3, padding=1), 24 | nn.ReLU(inplace=True) 25 | ) 26 | 27 | 28 | def upconv(in_planes, out_planes): 29 | return nn.Sequential( 30 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=3, stride=2, padding=1, output_padding=1), 31 | nn.ReLU(inplace=True) 32 | ) 33 | 34 | 35 | def crop_like(input, ref): 36 | assert(input.size(2) >= ref.size(2) and input.size(3) >= ref.size(3)) 37 | return input[:, :, :ref.size(2), :ref.size(3)] 38 | 39 | 40 | class DispNetS(nn.Module): 41 | 42 | def __init__(self, alpha=10, beta=0.01): 43 | super(DispNetS, self).__init__() 44 | 45 | self.alpha = alpha 46 | self.beta = beta 47 | 48 | conv_planes = [32, 64, 128, 256, 512, 512, 512] 49 | self.conv1 = downsample_conv(3, conv_planes[0], kernel_size=7) 50 | self.conv2 = downsample_conv(conv_planes[0], conv_planes[1], kernel_size=5) 51 | self.conv3 = downsample_conv(conv_planes[1], conv_planes[2]) 52 | self.conv4 = downsample_conv(conv_planes[2], conv_planes[3]) 53 | self.conv5 = downsample_conv(conv_planes[3], conv_planes[4]) 54 | self.conv6 = downsample_conv(conv_planes[4], conv_planes[5]) 55 | self.conv7 = downsample_conv(conv_planes[5], conv_planes[6]) 56 | 57 | upconv_planes = [512, 512, 256, 128, 64, 32, 16] 58 | self.upconv7 = upconv(conv_planes[6], upconv_planes[0]) 59 | self.upconv6 = upconv(upconv_planes[0], upconv_planes[1]) 60 | self.upconv5 = upconv(upconv_planes[1], upconv_planes[2]) 61 | self.upconv4 = upconv(upconv_planes[2], upconv_planes[3]) 62 | self.upconv3 = upconv(upconv_planes[3], upconv_planes[4]) 63 | self.upconv2 = upconv(upconv_planes[4], upconv_planes[5]) 64 | self.upconv1 = upconv(upconv_planes[5], upconv_planes[6]) 65 | 66 | self.iconv7 = conv(upconv_planes[0] + conv_planes[5], upconv_planes[0]) 67 | self.iconv6 = conv(upconv_planes[1] + conv_planes[4], upconv_planes[1]) 68 | self.iconv5 = conv(upconv_planes[2] + conv_planes[3], upconv_planes[2]) 69 | self.iconv4 = conv(upconv_planes[3] + conv_planes[2], upconv_planes[3]) 70 | self.iconv3 = conv(1 + upconv_planes[4] + conv_planes[1], upconv_planes[4]) 71 | self.iconv2 = conv(1 + upconv_planes[5] + conv_planes[0], upconv_planes[5]) 72 | self.iconv1 = conv(1 + upconv_planes[6], upconv_planes[6]) 73 | 74 | self.predict_disp4 = predict_disp(upconv_planes[3]) 75 | self.predict_disp3 = predict_disp(upconv_planes[4]) 76 | self.predict_disp2 = predict_disp(upconv_planes[5]) 77 | self.predict_disp1 = predict_disp(upconv_planes[6]) 78 | 79 | def init_weights(self): 80 | for m in self.modules(): 81 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 82 | 
nn.init.xavier_uniform(m.weight.data) 83 | if m.bias is not None: 84 | m.bias.data.zero_() 85 | 86 | def forward(self, x): 87 | out_conv1 = self.conv1(x) 88 | out_conv2 = self.conv2(out_conv1) 89 | out_conv3 = self.conv3(out_conv2) 90 | out_conv4 = self.conv4(out_conv3) 91 | out_conv5 = self.conv5(out_conv4) 92 | out_conv6 = self.conv6(out_conv5) 93 | out_conv7 = self.conv7(out_conv6) 94 | 95 | out_upconv7 = crop_like(self.upconv7(out_conv7), out_conv6) 96 | concat7 = torch.cat((out_upconv7, out_conv6), 1) 97 | out_iconv7 = self.iconv7(concat7) 98 | 99 | out_upconv6 = crop_like(self.upconv6(out_iconv7), out_conv5) 100 | concat6 = torch.cat((out_upconv6, out_conv5), 1) 101 | out_iconv6 = self.iconv6(concat6) 102 | 103 | out_upconv5 = crop_like(self.upconv5(out_iconv6), out_conv4) 104 | concat5 = torch.cat((out_upconv5, out_conv4), 1) 105 | out_iconv5 = self.iconv5(concat5) 106 | 107 | out_upconv4 = crop_like(self.upconv4(out_iconv5), out_conv3) 108 | concat4 = torch.cat((out_upconv4, out_conv3), 1) 109 | out_iconv4 = self.iconv4(concat4) 110 | disp4 = self.alpha * self.predict_disp4(out_iconv4) + self.beta 111 | 112 | out_upconv3 = crop_like(self.upconv3(out_iconv4), out_conv2) 113 | disp4_up = crop_like(nn.functional.upsample(disp4, scale_factor=2, mode='bilinear'), out_conv2) 114 | concat3 = torch.cat((out_upconv3, out_conv2, disp4_up), 1) 115 | out_iconv3 = self.iconv3(concat3) 116 | disp3 = self.alpha * self.predict_disp3(out_iconv3) + self.beta 117 | 118 | out_upconv2 = crop_like(self.upconv2(out_iconv3), out_conv1) 119 | disp3_up = crop_like(nn.functional.upsample(disp3, scale_factor=2, mode='bilinear'), out_conv1) 120 | concat2 = torch.cat((out_upconv2, out_conv1, disp3_up), 1) 121 | out_iconv2 = self.iconv2(concat2) 122 | disp2 = self.alpha * self.predict_disp2(out_iconv2) + self.beta 123 | 124 | out_upconv1 = crop_like(self.upconv1(out_iconv2), x) 125 | disp2_up = crop_like(nn.functional.upsample(disp2, scale_factor=2, mode='bilinear'), x) 126 | concat1 = torch.cat((out_upconv1, disp2_up), 1) 127 | out_iconv1 = self.iconv1(concat1) 128 | disp1 = self.alpha * self.predict_disp1(out_iconv1) + self.beta 129 | 130 | if self.training: 131 | return disp1, disp2, disp3, disp4 132 | else: 133 | return disp1 134 | -------------------------------------------------------------------------------- /models/DispNetS6.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | def downsample_conv(in_planes, out_planes, kernel_size=3): 6 | return nn.Sequential( 7 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=2, padding=(kernel_size-1)//2), 8 | nn.ReLU(inplace=True), 9 | nn.Conv2d(out_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2), 10 | nn.ReLU(inplace=True) 11 | ) 12 | 13 | 14 | def predict_disp(in_planes): 15 | return nn.Sequential( 16 | nn.Conv2d(in_planes, 1, kernel_size=3, padding=1), 17 | nn.Sigmoid() 18 | ) 19 | 20 | 21 | def conv(in_planes, out_planes): 22 | return nn.Sequential( 23 | nn.Conv2d(in_planes, out_planes, kernel_size=3, padding=1), 24 | nn.ReLU(inplace=True) 25 | ) 26 | 27 | 28 | def upconv(in_planes, out_planes): 29 | return nn.Sequential( 30 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=3, stride=2, padding=1, output_padding=1), 31 | nn.ReLU(inplace=True) 32 | ) 33 | 34 | 35 | def crop_like(input, ref): 36 | assert(input.size(2) >= ref.size(2) and input.size(3) >= ref.size(3)) 37 | return input[:, :, :ref.size(2), :ref.size(3)] 38 | 39 | 40 | 
class DispNetS6(nn.Module): 41 | 42 | def __init__(self, alpha=10, beta=0.01): 43 | super(DispNetS6, self).__init__() 44 | 45 | self.alpha = alpha 46 | self.beta = beta 47 | 48 | conv_planes = [32, 64, 128, 256, 512, 512, 512] 49 | self.conv1 = downsample_conv(3, conv_planes[0], kernel_size=7) 50 | self.conv2 = downsample_conv(conv_planes[0], conv_planes[1], kernel_size=5) 51 | self.conv3 = downsample_conv(conv_planes[1], conv_planes[2]) 52 | self.conv4 = downsample_conv(conv_planes[2], conv_planes[3]) 53 | self.conv5 = downsample_conv(conv_planes[3], conv_planes[4]) 54 | self.conv6 = downsample_conv(conv_planes[4], conv_planes[5]) 55 | self.conv7 = downsample_conv(conv_planes[5], conv_planes[6]) 56 | 57 | upconv_planes = [512, 512, 256, 128, 64, 32, 16] 58 | self.upconv7 = upconv(conv_planes[6], upconv_planes[0]) 59 | self.upconv6 = upconv(upconv_planes[0], upconv_planes[1]) 60 | self.upconv5 = upconv(upconv_planes[1], upconv_planes[2]) 61 | self.upconv4 = upconv(upconv_planes[2], upconv_planes[3]) 62 | self.upconv3 = upconv(upconv_planes[3], upconv_planes[4]) 63 | self.upconv2 = upconv(upconv_planes[4], upconv_planes[5]) 64 | self.upconv1 = upconv(upconv_planes[5], upconv_planes[6]) 65 | 66 | self.iconv7 = conv(upconv_planes[0] + conv_planes[5], upconv_planes[0]) 67 | self.iconv6 = conv(upconv_planes[1] + conv_planes[4], upconv_planes[1]) 68 | self.iconv5 = conv(upconv_planes[2] + conv_planes[3], upconv_planes[2]) 69 | self.iconv4 = conv(upconv_planes[3] + conv_planes[2], upconv_planes[3]) 70 | self.iconv3 = conv(1 + upconv_planes[4] + conv_planes[1], upconv_planes[4]) 71 | self.iconv2 = conv(1 + upconv_planes[5] + conv_planes[0], upconv_planes[5]) 72 | self.iconv1 = conv(1 + upconv_planes[6], upconv_planes[6]) 73 | 74 | self.predict_disp6 = predict_disp(upconv_planes[1]) 75 | self.predict_disp5 = predict_disp(upconv_planes[2]) 76 | self.predict_disp4 = predict_disp(upconv_planes[3]) 77 | self.predict_disp3 = predict_disp(upconv_planes[4]) 78 | self.predict_disp2 = predict_disp(upconv_planes[5]) 79 | self.predict_disp1 = predict_disp(upconv_planes[6]) 80 | 81 | def init_weights(self): 82 | for m in self.modules(): 83 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 84 | nn.init.xavier_uniform(m.weight.data) 85 | if m.bias is not None: 86 | m.bias.data.zero_() 87 | 88 | def forward(self, x): 89 | out_conv1 = self.conv1(x) 90 | out_conv2 = self.conv2(out_conv1) 91 | out_conv3 = self.conv3(out_conv2) 92 | out_conv4 = self.conv4(out_conv3) 93 | out_conv5 = self.conv5(out_conv4) 94 | out_conv6 = self.conv6(out_conv5) 95 | out_conv7 = self.conv7(out_conv6) 96 | 97 | out_upconv7 = crop_like(self.upconv7(out_conv7), out_conv6) 98 | concat7 = torch.cat((out_upconv7, out_conv6), 1) 99 | out_iconv7 = self.iconv7(concat7) 100 | 101 | out_upconv6 = crop_like(self.upconv6(out_iconv7), out_conv5) 102 | concat6 = torch.cat((out_upconv6, out_conv5), 1) 103 | out_iconv6 = self.iconv6(concat6) 104 | disp6 = self.alpha * self.predict_disp6(out_iconv6) + self.beta 105 | 106 | out_upconv5 = crop_like(self.upconv5(out_iconv6), out_conv4) 107 | concat5 = torch.cat((out_upconv5, out_conv4), 1) 108 | out_iconv5 = self.iconv5(concat5) 109 | disp5 = self.alpha * self.predict_disp5(out_iconv5) + self.beta 110 | 111 | out_upconv4 = crop_like(self.upconv4(out_iconv5), out_conv3) 112 | concat4 = torch.cat((out_upconv4, out_conv3), 1) 113 | out_iconv4 = self.iconv4(concat4) 114 | disp4 = self.alpha * self.predict_disp4(out_iconv4) + self.beta 115 | 116 | out_upconv3 = 
crop_like(self.upconv3(out_iconv4), out_conv2) 117 | disp4_up = crop_like(nn.functional.upsample(disp4, scale_factor=2, mode='bilinear'), out_conv2) 118 | concat3 = torch.cat((out_upconv3, out_conv2, disp4_up), 1) 119 | out_iconv3 = self.iconv3(concat3) 120 | disp3 = self.alpha * self.predict_disp3(out_iconv3) + self.beta 121 | 122 | out_upconv2 = crop_like(self.upconv2(out_iconv3), out_conv1) 123 | disp3_up = crop_like(nn.functional.upsample(disp3, scale_factor=2, mode='bilinear'), out_conv1) 124 | concat2 = torch.cat((out_upconv2, out_conv1, disp3_up), 1) 125 | out_iconv2 = self.iconv2(concat2) 126 | disp2 = self.alpha * self.predict_disp2(out_iconv2) + self.beta 127 | 128 | out_upconv1 = crop_like(self.upconv1(out_iconv2), x) 129 | disp2_up = crop_like(nn.functional.upsample(disp2, scale_factor=2, mode='bilinear'), x) 130 | concat1 = torch.cat((out_upconv1, disp2_up), 1) 131 | out_iconv1 = self.iconv1(concat1) 132 | disp1 = self.alpha * self.predict_disp1(out_iconv1) + self.beta 133 | 134 | if self.training: 135 | return disp1, disp2, disp3, disp4, disp5, disp6 136 | else: 137 | return disp1 138 | -------------------------------------------------------------------------------- /models/DispResNet6.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 4 | # based on github.com/ClementPinard/SfMLearner-Pytorch 5 | 6 | import torch 7 | import torch.nn as nn 8 | 9 | def conv3x3(in_planes, out_planes, stride=1): 10 | """3x3 convolution with padding""" 11 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 12 | padding=1, bias=False) 13 | 14 | class BasicBlock(nn.Module): 15 | expansion = 1 16 | 17 | def __init__(self, inplanes, planes, stride=1, downsample=None): 18 | super(BasicBlock, self).__init__() 19 | self.conv1 = conv3x3(inplanes, planes, stride) 20 | #self.bn1 = nn.BatchNorm2d(planes) 21 | self.relu = nn.ReLU(inplace=True) 22 | self.conv2 = conv3x3(planes, planes) 23 | #self.bn2 = nn.BatchNorm2d(planes) 24 | self.downsample = downsample 25 | self.stride = stride 26 | 27 | def forward(self, x): 28 | residual = x 29 | 30 | out = self.conv1(x) 31 | #out = self.bn1(out) 32 | out = self.relu(out) 33 | 34 | out = self.conv2(out) 35 | #out = self.bn2(out) 36 | 37 | if self.downsample is not None: 38 | residual = self.downsample(x) 39 | 40 | out += residual 41 | out = self.relu(out) 42 | 43 | return out 44 | 45 | def make_layer(inplanes, block, planes, blocks, stride=1): 46 | downsample = None 47 | if stride != 1 or inplanes != planes * block.expansion: 48 | downsample = nn.Sequential( 49 | nn.Conv2d(inplanes, planes * block.expansion, 50 | kernel_size=1, stride=stride, bias=False), 51 | nn.BatchNorm2d(planes * block.expansion), 52 | ) 53 | 54 | layers = [] 55 | layers.append(block(inplanes, planes, stride, downsample)) 56 | inplanes = planes * block.expansion 57 | for i in range(1, blocks): 58 | layers.append(block(inplanes, planes)) 59 | 60 | return nn.Sequential(*layers) 61 | 62 | def downsample_conv(in_planes, out_planes, kernel_size=3): 63 | return nn.Sequential( 64 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=2, padding=(kernel_size-1)//2), 65 | nn.ReLU(inplace=True), 66 | nn.Conv2d(out_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2), 67 | nn.ReLU(inplace=True) 68 | ) 69 | 70 | 71 | def predict_disp(in_planes): 72 | return nn.Sequential( 73 | nn.Conv2d(in_planes, 1, kernel_size=3, padding=1), 74 | 
nn.Sigmoid() 75 | ) 76 | 77 | 78 | def conv(in_planes, out_planes): 79 | return nn.Sequential( 80 | nn.Conv2d(in_planes, out_planes, kernel_size=3, padding=1), 81 | nn.ReLU(inplace=True) 82 | ) 83 | 84 | 85 | def upconv(in_planes, out_planes): 86 | return nn.Sequential( 87 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=3, stride=2, padding=1, output_padding=1), 88 | nn.ReLU(inplace=True) 89 | ) 90 | 91 | 92 | def crop_like(input, ref): 93 | assert(input.size(2) >= ref.size(2) and input.size(3) >= ref.size(3)) 94 | return input[:, :, :ref.size(2), :ref.size(3)] 95 | 96 | 97 | class DispResNet6(nn.Module): 98 | 99 | def __init__(self, alpha=10, beta=0.01): 100 | super(DispResNet6, self).__init__() 101 | 102 | self.alpha = alpha 103 | self.beta = beta 104 | 105 | conv_planes = [32, 64, 128, 256, 512, 512, 512] 106 | self.conv1 = downsample_conv(3, conv_planes[0], kernel_size=7) 107 | self.conv2 = make_layer(conv_planes[0], BasicBlock, conv_planes[1], blocks=2, stride=2) 108 | self.conv3 = make_layer(conv_planes[1], BasicBlock, conv_planes[2], blocks=2, stride=2) 109 | self.conv4 = make_layer(conv_planes[2], BasicBlock, conv_planes[3], blocks=2, stride=2) 110 | self.conv5 = make_layer(conv_planes[3], BasicBlock, conv_planes[4], blocks=2, stride=2) 111 | self.conv6 = make_layer(conv_planes[4], BasicBlock, conv_planes[5], blocks=2, stride=2) 112 | self.conv7 = make_layer(conv_planes[5], BasicBlock, conv_planes[6], blocks=2, stride=2) 113 | 114 | upconv_planes = [512, 512, 256, 128, 64, 32, 16] 115 | self.upconv7 = upconv(conv_planes[6], upconv_planes[0]) 116 | self.upconv6 = upconv(upconv_planes[0], upconv_planes[1]) 117 | self.upconv5 = upconv(upconv_planes[1], upconv_planes[2]) 118 | self.upconv4 = upconv(upconv_planes[2], upconv_planes[3]) 119 | self.upconv3 = upconv(upconv_planes[3], upconv_planes[4]) 120 | self.upconv2 = upconv(upconv_planes[4], upconv_planes[5]) 121 | self.upconv1 = upconv(upconv_planes[5], upconv_planes[6]) 122 | 123 | self.iconv7 = make_layer(upconv_planes[0] + conv_planes[5], BasicBlock, upconv_planes[0], blocks=1, stride=1) 124 | self.iconv6 = make_layer(upconv_planes[1] + conv_planes[4], BasicBlock, upconv_planes[1], blocks=1, stride=1) 125 | self.iconv5 = make_layer(upconv_planes[2] + conv_planes[3], BasicBlock, upconv_planes[2], blocks=1, stride=1) 126 | self.iconv4 = make_layer(upconv_planes[3] + conv_planes[2], BasicBlock, upconv_planes[3], blocks=1, stride=1) 127 | self.iconv3 = make_layer(1 + upconv_planes[4] + conv_planes[1], BasicBlock, upconv_planes[4], blocks=1, stride=1) 128 | self.iconv2 = make_layer(1 + upconv_planes[5] + conv_planes[0], BasicBlock, upconv_planes[5], blocks=1, stride=1) 129 | self.iconv1 = make_layer(1 + upconv_planes[6], BasicBlock, upconv_planes[6], blocks=1, stride=1) 130 | 131 | self.predict_disp6 = predict_disp(upconv_planes[1]) 132 | self.predict_disp5 = predict_disp(upconv_planes[2]) 133 | self.predict_disp4 = predict_disp(upconv_planes[3]) 134 | self.predict_disp3 = predict_disp(upconv_planes[4]) 135 | self.predict_disp2 = predict_disp(upconv_planes[5]) 136 | self.predict_disp1 = predict_disp(upconv_planes[6]) 137 | 138 | def init_weights(self): 139 | for m in self.modules(): 140 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 141 | nn.init.xavier_uniform(m.weight.data) 142 | if m.bias is not None: 143 | m.bias.data.zero_() 144 | 145 | def forward(self, x): 146 | out_conv1 = self.conv1(x) 147 | out_conv2 = self.conv2(out_conv1) 148 | out_conv3 = self.conv3(out_conv2) 149 | out_conv4 = 
self.conv4(out_conv3) 150 | out_conv5 = self.conv5(out_conv4) 151 | out_conv6 = self.conv6(out_conv5) 152 | out_conv7 = self.conv7(out_conv6) 153 | 154 | out_upconv7 = crop_like(self.upconv7(out_conv7), out_conv6) 155 | concat7 = torch.cat((out_upconv7, out_conv6), 1) 156 | out_iconv7 = self.iconv7(concat7) 157 | 158 | out_upconv6 = crop_like(self.upconv6(out_iconv7), out_conv5) 159 | concat6 = torch.cat((out_upconv6, out_conv5), 1) 160 | out_iconv6 = self.iconv6(concat6) 161 | disp6 = self.alpha * self.predict_disp6(out_iconv6) + self.beta 162 | 163 | out_upconv5 = crop_like(self.upconv5(out_iconv6), out_conv4) 164 | concat5 = torch.cat((out_upconv5, out_conv4), 1) 165 | out_iconv5 = self.iconv5(concat5) 166 | disp5 = self.alpha * self.predict_disp5(out_iconv5) + self.beta 167 | 168 | out_upconv4 = crop_like(self.upconv4(out_iconv5), out_conv3) 169 | concat4 = torch.cat((out_upconv4, out_conv3), 1) 170 | out_iconv4 = self.iconv4(concat4) 171 | disp4 = self.alpha * self.predict_disp4(out_iconv4) + self.beta 172 | 173 | out_upconv3 = crop_like(self.upconv3(out_iconv4), out_conv2) 174 | disp4_up = crop_like(nn.functional.upsample(disp4, scale_factor=2, mode='bilinear'), out_conv2) 175 | concat3 = torch.cat((out_upconv3, out_conv2, disp4_up), 1) 176 | out_iconv3 = self.iconv3(concat3) 177 | disp3 = self.alpha * self.predict_disp3(out_iconv3) + self.beta 178 | 179 | out_upconv2 = crop_like(self.upconv2(out_iconv3), out_conv1) 180 | disp3_up = crop_like(nn.functional.upsample(disp3, scale_factor=2, mode='bilinear'), out_conv1) 181 | concat2 = torch.cat((out_upconv2, out_conv1, disp3_up), 1) 182 | out_iconv2 = self.iconv2(concat2) 183 | disp2 = self.alpha * self.predict_disp2(out_iconv2) + self.beta 184 | 185 | out_upconv1 = crop_like(self.upconv1(out_iconv2), x) 186 | disp2_up = crop_like(nn.functional.upsample(disp2, scale_factor=2, mode='bilinear'), x) 187 | concat1 = torch.cat((out_upconv1, disp2_up), 1) 188 | out_iconv1 = self.iconv1(concat1) 189 | disp1 = self.alpha * self.predict_disp1(out_iconv1) + self.beta 190 | 191 | if self.training: 192 | return disp1, disp2, disp3, disp4, disp5, disp6 193 | else: 194 | return disp1 195 | -------------------------------------------------------------------------------- /models/DispResNetS6.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 
4 | # based on github.com/ClementPinard/SfMLearner-Pytorch 5 | 6 | import torch 7 | import torch.nn as nn 8 | 9 | def conv3x3(in_planes, out_planes, stride=1): 10 | """3x3 convolution with padding""" 11 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 12 | padding=1, bias=False) 13 | 14 | class BasicBlock(nn.Module): 15 | expansion = 1 16 | 17 | def __init__(self, inplanes, planes, stride=1, downsample=None): 18 | super(BasicBlock, self).__init__() 19 | self.conv1 = conv3x3(inplanes, planes, stride) 20 | #self.bn1 = nn.BatchNorm2d(planes) 21 | self.relu = nn.ReLU(inplace=True) 22 | self.conv2 = conv3x3(planes, planes) 23 | #self.bn2 = nn.BatchNorm2d(planes) 24 | self.downsample = downsample 25 | self.stride = stride 26 | 27 | def forward(self, x): 28 | residual = x 29 | 30 | out = self.conv1(x) 31 | #out = self.bn1(out) 32 | out = self.relu(out) 33 | 34 | out = self.conv2(out) 35 | #out = self.bn2(out) 36 | 37 | if self.downsample is not None: 38 | residual = self.downsample(x) 39 | 40 | out += residual 41 | out = self.relu(out) 42 | 43 | return out 44 | 45 | def make_layer(inplanes, block, planes, blocks, stride=1): 46 | downsample = None 47 | if stride != 1 or inplanes != planes * block.expansion: 48 | downsample = nn.Sequential( 49 | nn.Conv2d(inplanes, planes * block.expansion, 50 | kernel_size=1, stride=stride, bias=False), 51 | nn.BatchNorm2d(planes * block.expansion), 52 | ) 53 | 54 | layers = [] 55 | layers.append(block(inplanes, planes, stride, downsample)) 56 | inplanes = planes * block.expansion 57 | for i in range(1, blocks): 58 | layers.append(block(inplanes, planes)) 59 | 60 | return nn.Sequential(*layers) 61 | 62 | def downsample_conv(in_planes, out_planes, kernel_size=3): 63 | return nn.Sequential( 64 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=2, padding=(kernel_size-1)//2), 65 | nn.ReLU(inplace=True), 66 | nn.Conv2d(out_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2), 67 | nn.ReLU(inplace=True) 68 | ) 69 | 70 | 71 | def predict_disp(in_planes): 72 | return nn.Sequential( 73 | nn.Conv2d(in_planes, 1, kernel_size=3, padding=1), 74 | nn.Sigmoid() 75 | ) 76 | 77 | 78 | def conv(in_planes, out_planes): 79 | return nn.Sequential( 80 | nn.Conv2d(in_planes, out_planes, kernel_size=3, padding=1), 81 | nn.ReLU(inplace=True) 82 | ) 83 | 84 | 85 | def upconv(in_planes, out_planes): 86 | return nn.Sequential( 87 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=3, stride=2, padding=1, output_padding=1), 88 | nn.ReLU(inplace=True) 89 | ) 90 | 91 | 92 | def crop_like(input, ref): 93 | assert(input.size(2) >= ref.size(2) and input.size(3) >= ref.size(3)) 94 | return input[:, :, :ref.size(2), :ref.size(3)] 95 | 96 | 97 | class DispResNetS6(nn.Module): 98 | 99 | def __init__(self, alpha=10, beta=0.01): 100 | super(DispResNetS6, self).__init__() 101 | 102 | self.alpha = alpha 103 | self.beta = beta 104 | 105 | conv_planes = [32, 64, 128, 256, 512, 512, 512] 106 | self.conv1 = downsample_conv(3, conv_planes[0], kernel_size=7) 107 | self.conv2 = make_layer(conv_planes[0], BasicBlock, conv_planes[1], blocks=2, stride=2) 108 | self.conv3 = make_layer(conv_planes[1], BasicBlock, conv_planes[2], blocks=2, stride=2) 109 | self.conv4 = make_layer(conv_planes[2], BasicBlock, conv_planes[3], blocks=3, stride=2) 110 | self.conv5 = make_layer(conv_planes[3], BasicBlock, conv_planes[4], blocks=3, stride=2) 111 | self.conv6 = make_layer(conv_planes[4], BasicBlock, conv_planes[5], blocks=3, stride=2) 112 | self.conv7 = 
make_layer(conv_planes[5], BasicBlock, conv_planes[6], blocks=3, stride=2) 113 | 114 | upconv_planes = [512, 512, 256, 128, 64, 32, 16] 115 | self.upconv7 = upconv(conv_planes[6], upconv_planes[0]) 116 | self.upconv6 = upconv(upconv_planes[0], upconv_planes[1]) 117 | self.upconv5 = upconv(upconv_planes[1], upconv_planes[2]) 118 | self.upconv4 = upconv(upconv_planes[2], upconv_planes[3]) 119 | self.upconv3 = upconv(upconv_planes[3], upconv_planes[4]) 120 | self.upconv2 = upconv(upconv_planes[4], upconv_planes[5]) 121 | self.upconv1 = upconv(upconv_planes[5], upconv_planes[6]) 122 | 123 | self.iconv7 = make_layer(upconv_planes[0] + conv_planes[5], BasicBlock, upconv_planes[0], blocks=2, stride=1) 124 | self.iconv6 = make_layer(upconv_planes[1] + conv_planes[4], BasicBlock, upconv_planes[1], blocks=2, stride=1) 125 | self.iconv5 = make_layer(upconv_planes[2] + conv_planes[3], BasicBlock, upconv_planes[2], blocks=2, stride=1) 126 | self.iconv4 = make_layer(upconv_planes[3] + conv_planes[2], BasicBlock, upconv_planes[3], blocks=2, stride=1) 127 | self.iconv3 = make_layer(1 + upconv_planes[4] + conv_planes[1], BasicBlock, upconv_planes[4], blocks=1, stride=1) 128 | self.iconv2 = make_layer(1 + upconv_planes[5] + conv_planes[0], BasicBlock, upconv_planes[5], blocks=1, stride=1) 129 | self.iconv1 = make_layer(1 + upconv_planes[6], BasicBlock, upconv_planes[6], blocks=1, stride=1) 130 | 131 | self.predict_disp6 = predict_disp(upconv_planes[1]) 132 | self.predict_disp5 = predict_disp(upconv_planes[2]) 133 | self.predict_disp4 = predict_disp(upconv_planes[3]) 134 | self.predict_disp3 = predict_disp(upconv_planes[4]) 135 | self.predict_disp2 = predict_disp(upconv_planes[5]) 136 | self.predict_disp1 = predict_disp(upconv_planes[6]) 137 | 138 | def init_weights(self): 139 | for m in self.modules(): 140 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 141 | nn.init.xavier_uniform(m.weight.data) 142 | if m.bias is not None: 143 | m.bias.data.zero_() 144 | 145 | def forward(self, x): 146 | out_conv1 = self.conv1(x) 147 | out_conv2 = self.conv2(out_conv1) 148 | out_conv3 = self.conv3(out_conv2) 149 | out_conv4 = self.conv4(out_conv3) 150 | out_conv5 = self.conv5(out_conv4) 151 | out_conv6 = self.conv6(out_conv5) 152 | out_conv7 = self.conv7(out_conv6) 153 | 154 | out_upconv7 = crop_like(self.upconv7(out_conv7), out_conv6) 155 | concat7 = torch.cat((out_upconv7, out_conv6), 1) 156 | out_iconv7 = self.iconv7(concat7) 157 | 158 | out_upconv6 = crop_like(self.upconv6(out_iconv7), out_conv5) 159 | concat6 = torch.cat((out_upconv6, out_conv5), 1) 160 | out_iconv6 = self.iconv6(concat6) 161 | disp6 = self.alpha * self.predict_disp6(out_iconv6) + self.beta 162 | 163 | out_upconv5 = crop_like(self.upconv5(out_iconv6), out_conv4) 164 | concat5 = torch.cat((out_upconv5, out_conv4), 1) 165 | out_iconv5 = self.iconv5(concat5) 166 | disp5 = self.alpha * self.predict_disp5(out_iconv5) + self.beta 167 | 168 | out_upconv4 = crop_like(self.upconv4(out_iconv5), out_conv3) 169 | concat4 = torch.cat((out_upconv4, out_conv3), 1) 170 | out_iconv4 = self.iconv4(concat4) 171 | disp4 = self.alpha * self.predict_disp4(out_iconv4) + self.beta 172 | 173 | out_upconv3 = crop_like(self.upconv3(out_iconv4), out_conv2) 174 | disp4_up = crop_like(nn.functional.upsample(disp4, scale_factor=2, mode='bilinear'), out_conv2) 175 | concat3 = torch.cat((out_upconv3, out_conv2, disp4_up), 1) 176 | out_iconv3 = self.iconv3(concat3) 177 | disp3 = self.alpha * self.predict_disp3(out_iconv3) + self.beta 178 | 179 | out_upconv2 = 
crop_like(self.upconv2(out_iconv3), out_conv1) 180 | disp3_up = crop_like(nn.functional.upsample(disp3, scale_factor=2, mode='bilinear'), out_conv1) 181 | concat2 = torch.cat((out_upconv2, out_conv1, disp3_up), 1) 182 | out_iconv2 = self.iconv2(concat2) 183 | disp2 = self.alpha * self.predict_disp2(out_iconv2) + self.beta 184 | 185 | out_upconv1 = crop_like(self.upconv1(out_iconv2), x) 186 | disp2_up = crop_like(nn.functional.upsample(disp2, scale_factor=2, mode='bilinear'), x) 187 | concat1 = torch.cat((out_upconv1, disp2_up), 1) 188 | out_iconv1 = self.iconv1(concat1) 189 | disp1 = self.alpha * self.predict_disp1(out_iconv1) + self.beta 190 | 191 | if self.training: 192 | return disp1, disp2, disp3, disp4, disp5, disp6 193 | else: 194 | return disp1 195 | -------------------------------------------------------------------------------- /models/FlowNetC6.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 4 | # based on github.com/NVIDIA/FlowNet2-Pytorch 5 | 6 | import torch 7 | import torch.nn as nn 8 | from torch.nn import init 9 | 10 | import math 11 | import numpy as np 12 | 13 | # from .correlation_package.modules.correlation import Correlation 14 | from spatial_correlation_sampler import spatial_correlation_sample 15 | from .submodules import conv, deconv, predict_flow 16 | 'Parameter count , 39,175,298 ' 17 | 18 | def correlate(input1, input2): 19 | out_corr = spatial_correlation_sample(input1, 20 | input2, 21 | kernel_size=1, 22 | patch_size=21, 23 | stride=1, 24 | padding=0, 25 | dilation_patch=2) 26 | # collate dimensions 1 and 2 in order to be treated as a 27 | # regular 4D tensor 28 | b, ph, pw, h, w = out_corr.size() 29 | out_corr = out_corr.view(b, ph * pw, h, w)/input1.size(1) 30 | return out_corr 31 | 32 | class FlowNetC6(nn.Module): 33 | def __init__(self, nlevels=5, batchNorm=False, div_flow = 20, full_res=True, pretrained=True): 34 | super(FlowNetC6,self).__init__() 35 | 36 | #assert(nlevels==5) 37 | self.batchNorm = batchNorm 38 | self.div_flow = div_flow 39 | self.full_res = full_res 40 | 41 | self.conv1 = conv(self.batchNorm, 3, 64, kernel_size=7, stride=2) 42 | self.conv2 = conv(self.batchNorm, 64, 128, kernel_size=5, stride=2) 43 | self.conv3 = conv(self.batchNorm, 128, 256, kernel_size=5, stride=2) 44 | self.conv_redir = conv(self.batchNorm, 256, 32, kernel_size=1, stride=1) 45 | 46 | # if args.fp16: 47 | # self.corr = nn.Sequential( 48 | # tofp32(), 49 | # Correlation(pad_size=20, kernel_size=1, max_displacement=20, stride1=1, stride2=2, corr_multiply=1), 50 | # tofp16()) 51 | # else: 52 | self.corr = correlate # Correlation(pad_size=20, kernel_size=1, max_displacement=20, stride1=1, stride2=2, corr_multiply=1) 53 | 54 | self.corr_activation = nn.LeakyReLU(0.1,inplace=True) 55 | self.conv3_1 = conv(self.batchNorm, 473, 256) 56 | self.conv4 = conv(self.batchNorm, 256, 512, stride=2) 57 | self.conv4_1 = conv(self.batchNorm, 512, 512) 58 | self.conv5 = conv(self.batchNorm, 512, 512, stride=2) 59 | self.conv5_1 = conv(self.batchNorm, 512, 512) 60 | self.conv6 = conv(self.batchNorm, 512, 1024, stride=2) 61 | self.conv6_1 = conv(self.batchNorm,1024, 1024) 62 | 63 | self.deconv5 = deconv(1024,512) 64 | self.deconv4 = deconv(1026,256) 65 | self.deconv3 = deconv(770,128) 66 | self.deconv2 = deconv(386,64) 67 | self.deconv1 = deconv(194,32) 68 | 69 | self.predict_flow6 = predict_flow(1024) 70 | self.predict_flow5 = predict_flow(1026) 71 | 
self.predict_flow4 = predict_flow(770) 72 | self.predict_flow3 = predict_flow(386) 73 | self.predict_flow2 = predict_flow(194) 74 | self.predict_flow1 = predict_flow(98) 75 | 76 | self.upsampled_flow6_to_5 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=True) 77 | self.upsampled_flow5_to_4 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=True) 78 | self.upsampled_flow4_to_3 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=True) 79 | self.upsampled_flow3_to_2 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=True) 80 | self.upsampled_flow2_to_1 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=True) 81 | 82 | self.upsample1 = nn.Upsample(scale_factor=2, mode='bilinear') 83 | 84 | def init_weights(self): 85 | for m in self.modules(): 86 | if isinstance(m, nn.Conv2d): 87 | if m.bias is not None: 88 | init.uniform(m.bias) 89 | init.xavier_uniform(m.weight) 90 | 91 | if isinstance(m, nn.ConvTranspose2d): 92 | if m.bias is not None: 93 | init.uniform(m.bias) 94 | init.xavier_uniform(m.weight) 95 | # init_deconv_bilinear(m.weight) 96 | 97 | 98 | 99 | def forward(self, x1,x2): 100 | 101 | out_conv1a = self.conv1(x1) 102 | out_conv2a = self.conv2(out_conv1a) 103 | out_conv3a = self.conv3(out_conv2a) 104 | 105 | # FlownetC bottom input stream 106 | out_conv1b = self.conv1(x2) 107 | out_conv2b = self.conv2(out_conv1b) 108 | out_conv3b = self.conv3(out_conv2b) 109 | 110 | # Merge streams 111 | out_corr = self.corr(out_conv3a, out_conv3b) 112 | out_corr = self.corr_activation(out_corr) 113 | 114 | # Redirect top input stream and concatenate 115 | out_conv_redir = self.conv_redir(out_conv3a) 116 | 117 | in_conv3_1 = torch.cat((out_conv_redir, out_corr), 1) 118 | 119 | # Merged conv layers 120 | out_conv3_1 = self.conv3_1(in_conv3_1) 121 | out_conv4 = self.conv4_1(self.conv4(out_conv3_1)) 122 | out_conv5 = self.conv5_1(self.conv5(out_conv4)) 123 | out_conv6 = self.conv6_1(self.conv6(out_conv5)) 124 | 125 | flow6 = self.predict_flow6(out_conv6) 126 | out_deconv5 = self.deconv5(out_conv6) 127 | flow6_up = self.upsampled_flow6_to_5(flow6) 128 | 129 | concat5 = torch.cat((out_conv5,out_deconv5,flow6_up),1) 130 | 131 | flow5 = self.predict_flow5(concat5) 132 | out_deconv4 = self.deconv4(concat5) 133 | flow5_up = self.upsampled_flow5_to_4(flow5) 134 | concat4 = torch.cat((out_conv4,out_deconv4,flow5_up),1) 135 | 136 | flow4 = self.predict_flow4(concat4) 137 | out_deconv3 = self.deconv3(concat4) 138 | flow4_up = self.upsampled_flow4_to_3(flow4) 139 | concat3 = torch.cat((out_conv3_1,out_deconv3,flow4_up),1) 140 | 141 | flow3 = self.predict_flow3(concat3) 142 | out_deconv2 = self.deconv2(concat3) 143 | flow3_up = self.upsampled_flow3_to_2(flow3) 144 | concat2 = torch.cat((out_conv2a,out_deconv2,flow3_up),1) 145 | 146 | flow2 = self.predict_flow2(concat2) 147 | out_deconv1 = self.deconv1(concat2) 148 | flow2_up = self.upsampled_flow2_to_1(flow2) 149 | concat1 = torch.cat((out_conv1a,out_deconv1,flow2_up), 1) 150 | 151 | flow1 = self.predict_flow1(concat1) 152 | #out_convs = [out_conv2a, out_conv2b, out_conv3a, out_conv3b] 153 | if self.full_res: 154 | flow1 = self.div_flow*self.upsample1(flow1) 155 | flow2 = self.div_flow*self.upsample1(flow2) 156 | flow3 = self.div_flow*self.upsample1(flow3) 157 | flow4 = self.div_flow*self.upsample1(flow4) 158 | flow5 = self.div_flow*self.upsample1(flow5) 159 | flow6 = self.div_flow*self.upsample1(flow6) 160 | 161 | if self.training: 162 | return flow1, flow2,flow3,flow4,flow5,flow6 #, out_convs 163 | else: 164 | return flow1 165 | -------------------------------------------------------------------------------- 
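FlowNetC6 above takes two RGB frames; in training mode it returns six flow predictions from fine to coarse, while in eval mode only the finest flow is returned (upsampled to full resolution and scaled by `div_flow` when `full_res=True`). The 473 input channels of `conv3_1` are the 441 correlation channels (a 21 x 21 displacement patch with dilation 2, i.e. displacements of up to ±20 pixels in steps of 2 at 1/8 resolution) plus the 32 `conv_redir` channels. A minimal usage sketch follows; the input size is an assumption (any size divisible by 64 should work), and the `spatial-correlation-sampler` package imported at the top of the file must be installed.

```python
import torch
from models.FlowNetC6 import FlowNetC6

model = FlowNetC6(batchNorm=False, div_flow=20, full_res=True)
model.init_weights()
model.eval()

# two consecutive frames, e.g. 256 x 832 crops
# (move model and inputs to .cuda() if the correlation package was built CUDA-only)
img1 = torch.randn(1, 3, 256, 832)
img2 = torch.randn(1, 3, 256, 832)

with torch.no_grad():
    flow = model(img1, img2)   # eval mode: finest prediction only, (1, 2, 256, 832)

model.train()
flows = model(img1, img2)      # training mode: tuple (flow1, ..., flow6), fine to coarse
```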
/models/MaskNet6.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | def conv(in_planes, out_planes, kernel_size=3): 6 | return nn.Sequential( 7 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=2), 8 | nn.ReLU(inplace=True) 9 | ) 10 | 11 | 12 | def upconv(in_planes, out_planes): 13 | return nn.Sequential( 14 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1), 15 | nn.ReLU(inplace=True) 16 | ) 17 | 18 | 19 | class MaskNet6(nn.Module): 20 | 21 | def __init__(self, nb_ref_imgs=4, output_exp=True): 22 | super(MaskNet6, self).__init__() 23 | self.nb_ref_imgs = nb_ref_imgs 24 | self.output_exp = output_exp 25 | 26 | conv_planes = [16, 32, 64, 128, 256, 256, 256, 256] 27 | self.conv1 = conv(3*(1+self.nb_ref_imgs), conv_planes[0], kernel_size=7) 28 | self.conv2 = conv(conv_planes[0], conv_planes[1], kernel_size=5) 29 | self.conv3 = conv(conv_planes[1], conv_planes[2]) 30 | self.conv4 = conv(conv_planes[2], conv_planes[3]) 31 | self.conv5 = conv(conv_planes[3], conv_planes[4]) 32 | self.conv6 = conv(conv_planes[4], conv_planes[5]) 33 | #self.conv7 = conv(conv_planes[5], conv_planes[6]) 34 | #self.conv8 = conv(conv_planes[6], conv_planes[7]) 35 | 36 | #self.pose_pred = nn.Conv2d(conv_planes[7], 6*self.nb_ref_imgs, kernel_size=1, padding=0) 37 | 38 | if self.output_exp: 39 | upconv_planes = [256, 256, 128, 64, 32, 16] 40 | self.deconv6 = upconv(conv_planes[5], upconv_planes[0]) 41 | self.deconv5 = upconv(upconv_planes[0]+conv_planes[4], upconv_planes[1]) 42 | self.deconv4 = upconv(upconv_planes[1]+conv_planes[3], upconv_planes[2]) 43 | self.deconv3 = upconv(upconv_planes[2]+conv_planes[2], upconv_planes[3]) 44 | self.deconv2 = upconv(upconv_planes[3]+conv_planes[1], upconv_planes[4]) 45 | self.deconv1 = upconv(upconv_planes[4]+conv_planes[0], upconv_planes[5]) 46 | 47 | self.pred_mask6 = nn.Conv2d(upconv_planes[0], self.nb_ref_imgs, kernel_size=3, padding=1) 48 | self.pred_mask5 = nn.Conv2d(upconv_planes[1], self.nb_ref_imgs, kernel_size=3, padding=1) 49 | self.pred_mask4 = nn.Conv2d(upconv_planes[2], self.nb_ref_imgs, kernel_size=3, padding=1) 50 | self.pred_mask3 = nn.Conv2d(upconv_planes[3], self.nb_ref_imgs, kernel_size=3, padding=1) 51 | self.pred_mask2 = nn.Conv2d(upconv_planes[4], self.nb_ref_imgs, kernel_size=3, padding=1) 52 | self.pred_mask1 = nn.Conv2d(upconv_planes[5], self.nb_ref_imgs, kernel_size=3, padding=1) 53 | 54 | def init_weights(self): 55 | for m in self.modules(): 56 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 57 | nn.init.xavier_uniform(m.weight.data) 58 | if m.bias is not None: 59 | m.bias.data.zero_() 60 | 61 | def init_mask_weights(self): 62 | for m in self.modules(): 63 | if isinstance(m, nn.ConvTranspose2d): 64 | nn.init.xavier_uniform(m.weight.data) 65 | if m.bias is not None: 66 | m.bias.data.zero_() 67 | 68 | for module in [self.pred_mask1, self.pred_mask2, self.pred_mask3, self.pred_mask4, self.pred_mask5, self.pred_mask6]: 69 | for m in module.modules(): 70 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 71 | nn.init.xavier_uniform(m.weight.data) 72 | if m.bias is not None: 73 | m.bias.data.zero_() 74 | 75 | # for mod in [self.conv1, self.conv2, self.conv3, self.conv4, self.conv5, self.conv6, self.conv7, self.conv8, self.pose_pred]: 76 | # for fparams in mod.parameters(): 77 | # fparams.requires_grad = False 78 | 79 | 80 | def forward(self, target_image, 
ref_imgs): 81 | assert(len(ref_imgs) == self.nb_ref_imgs) 82 | input = [target_image] 83 | input.extend(ref_imgs) 84 | input = torch.cat(input, 1) 85 | out_conv1 = self.conv1(input) 86 | out_conv2 = self.conv2(out_conv1) 87 | out_conv3 = self.conv3(out_conv2) 88 | out_conv4 = self.conv4(out_conv3) 89 | out_conv5 = self.conv5(out_conv4) 90 | out_conv6 = self.conv6(out_conv5) 91 | #out_conv7 = self.conv7(out_conv6) 92 | #out_conv8 = self.conv8(out_conv7) 93 | 94 | #pose = self.pose_pred(out_conv8) 95 | #pose = pose.mean(3).mean(2) 96 | #pose = 0.01 * pose.view(pose.size(0), self.nb_ref_imgs, 6) 97 | 98 | if self.output_exp: 99 | out_upconv6 = self.deconv6(out_conv6 )#[:, :, 0:out_conv5.size(2), 0:out_conv5.size(3)] 100 | out_upconv5 = self.deconv5(torch.cat((out_upconv6, out_conv5), 1))#[:, :, 0:out_conv4.size(2), 0:out_conv4.size(3)] 101 | out_upconv4 = self.deconv4(torch.cat((out_upconv5, out_conv4), 1))#[:, :, 0:out_conv3.size(2), 0:out_conv3.size(3)] 102 | out_upconv3 = self.deconv3(torch.cat((out_upconv4, out_conv3), 1))#[:, :, 0:out_conv2.size(2), 0:out_conv2.size(3)] 103 | out_upconv2 = self.deconv2(torch.cat((out_upconv3, out_conv2), 1))#[:, :, 0:out_conv1.size(2), 0:out_conv1.size(3)] 104 | out_upconv1 = self.deconv1(torch.cat((out_upconv2, out_conv1), 1))#[:, :, 0:input.size(2), 0:input.size(3)] 105 | 106 | exp_mask6 = nn.functional.sigmoid(self.pred_mask6(out_upconv6)) 107 | exp_mask5 = nn.functional.sigmoid(self.pred_mask5(out_upconv5)) 108 | exp_mask4 = nn.functional.sigmoid(self.pred_mask4(out_upconv4)) 109 | exp_mask3 = nn.functional.sigmoid(self.pred_mask3(out_upconv3)) 110 | exp_mask2 = nn.functional.sigmoid(self.pred_mask2(out_upconv2)) 111 | exp_mask1 = nn.functional.sigmoid(self.pred_mask1(out_upconv1)) 112 | else: 113 | exp_mask6 = None 114 | exp_mask5 = None 115 | exp_mask4 = None 116 | exp_mask3 = None 117 | exp_mask2 = None 118 | exp_mask1 = None 119 | 120 | if self.training: 121 | return exp_mask1, exp_mask2, exp_mask3, exp_mask4, exp_mask5, exp_mask6 122 | else: 123 | return exp_mask1 124 | -------------------------------------------------------------------------------- /models/MaskResNet6.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 
4 | 5 | import torch 6 | import torch.nn as nn 7 | 8 | def conv3x3(in_planes, out_planes, stride=1): 9 | """3x3 convolution with padding""" 10 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 11 | padding=1, bias=False) 12 | 13 | def conv(in_planes, out_planes, kernel_size=3, stride=2): 14 | return nn.Sequential( 15 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=stride), 16 | nn.ReLU(inplace=True) 17 | ) 18 | 19 | 20 | def upconv(in_planes, out_planes): 21 | return nn.Sequential( 22 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1), 23 | nn.ReLU(inplace=True) 24 | ) 25 | 26 | class BasicBlock(nn.Module): 27 | expansion = 1 28 | 29 | def __init__(self, inplanes, planes, stride=1, downsample=None): 30 | super(BasicBlock, self).__init__() 31 | self.conv1 = conv3x3(inplanes, planes, stride) 32 | self.relu = nn.ReLU(inplace=True) 33 | self.conv2 = conv3x3(planes, planes) 34 | self.downsample = downsample 35 | self.stride = stride 36 | 37 | def forward(self, x): 38 | residual = x 39 | 40 | out = self.conv1(x) 41 | out = self.relu(out) 42 | out = self.conv2(out) 43 | 44 | if self.downsample is not None: 45 | residual = self.downsample(x) 46 | 47 | out += residual 48 | out = self.relu(out) 49 | 50 | return out 51 | 52 | def make_layer(inplanes, block, planes, blocks, stride=1): 53 | downsample = None 54 | if stride != 1 or inplanes != planes * block.expansion: 55 | downsample = nn.Sequential( 56 | nn.Conv2d(inplanes, planes * block.expansion, 57 | kernel_size=1, stride=stride, bias=False), 58 | nn.BatchNorm2d(planes * block.expansion), 59 | ) 60 | 61 | layers = [] 62 | layers.append(block(inplanes, planes, stride, downsample)) 63 | inplanes = planes * block.expansion 64 | for i in range(1, blocks): 65 | layers.append(block(inplanes, planes)) 66 | 67 | return nn.Sequential(*layers) 68 | 69 | class MaskResNet6(nn.Module): 70 | 71 | def __init__(self, nb_ref_imgs=4, output_exp=True): 72 | super(MaskResNet6, self).__init__() 73 | self.nb_ref_imgs = nb_ref_imgs 74 | self.output_exp = output_exp 75 | 76 | conv_planes = [16, 32, 64, 128, 256, 256, 256, 256] 77 | self.conv1 = conv(3*(1+self.nb_ref_imgs), conv_planes[0], kernel_size=7, stride=2) 78 | self.conv2 = make_layer(conv_planes[0], BasicBlock, conv_planes[1], blocks=2, stride=2) 79 | self.conv3 = make_layer(conv_planes[1], BasicBlock, conv_planes[2], blocks=2, stride=2) 80 | self.conv4 = make_layer(conv_planes[2], BasicBlock, conv_planes[3], blocks=2, stride=2) 81 | self.conv5 = make_layer(conv_planes[3], BasicBlock, conv_planes[4], blocks=2, stride=2) 82 | self.conv6 = make_layer(conv_planes[4], BasicBlock, conv_planes[5], blocks=2, stride=2) 83 | 84 | if self.output_exp: 85 | upconv_planes = [256, 256, 128, 64, 32, 16] 86 | self.deconv6 = upconv(conv_planes[5], upconv_planes[0]) 87 | self.deconv5 = upconv(upconv_planes[0]+conv_planes[4], upconv_planes[1]) 88 | self.deconv4 = upconv(upconv_planes[1]+conv_planes[3], upconv_planes[2]) 89 | self.deconv3 = upconv(upconv_planes[2]+conv_planes[2], upconv_planes[3]) 90 | self.deconv2 = upconv(upconv_planes[3]+conv_planes[1], upconv_planes[4]) 91 | self.deconv1 = upconv(upconv_planes[4]+conv_planes[0], upconv_planes[5]) 92 | 93 | self.pred_mask6 = nn.Conv2d(upconv_planes[0], self.nb_ref_imgs, kernel_size=3, padding=1) 94 | self.pred_mask5 = nn.Conv2d(upconv_planes[1], self.nb_ref_imgs, kernel_size=3, padding=1) 95 | self.pred_mask4 = nn.Conv2d(upconv_planes[2], self.nb_ref_imgs, kernel_size=3, padding=1) 
96 | self.pred_mask3 = nn.Conv2d(upconv_planes[3], self.nb_ref_imgs, kernel_size=3, padding=1) 97 | self.pred_mask2 = nn.Conv2d(upconv_planes[4], self.nb_ref_imgs, kernel_size=3, padding=1) 98 | self.pred_mask1 = nn.Conv2d(upconv_planes[5], self.nb_ref_imgs, kernel_size=3, padding=1) 99 | 100 | def init_weights(self): 101 | for m in self.modules(): 102 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 103 | nn.init.xavier_uniform(m.weight.data) 104 | if m.bias is not None: 105 | m.bias.data.zero_() 106 | 107 | def init_mask_weights(self): 108 | for m in self.modules(): 109 | if isinstance(m, nn.ConvTranspose2d): 110 | nn.init.xavier_uniform(m.weight.data) 111 | if m.bias is not None: 112 | m.bias.data.zero_() 113 | 114 | for module in [self.pred_mask1, self.pred_mask2, self.pred_mask3, self.pred_mask4, self.pred_mask5, self.pred_mask6]: 115 | for m in module.modules(): 116 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 117 | nn.init.xavier_uniform(m.weight.data) 118 | if m.bias is not None: 119 | m.bias.data.zero_() 120 | 121 | 122 | 123 | def forward(self, target_image, ref_imgs): 124 | assert(len(ref_imgs) == self.nb_ref_imgs) 125 | input = [target_image] 126 | input.extend(ref_imgs) 127 | input = torch.cat(input, 1) 128 | out_conv1 = self.conv1(input) 129 | out_conv2 = self.conv2(out_conv1) 130 | out_conv3 = self.conv3(out_conv2) 131 | out_conv4 = self.conv4(out_conv3) 132 | out_conv5 = self.conv5(out_conv4) 133 | out_conv6 = self.conv6(out_conv5) 134 | 135 | if self.output_exp: 136 | out_upconv6 = self.deconv6(out_conv6 )#[:, :, 0:out_conv5.size(2), 0:out_conv5.size(3)] 137 | out_upconv5 = self.deconv5(torch.cat((out_upconv6, out_conv5), 1))#[:, :, 0:out_conv4.size(2), 0:out_conv4.size(3)] 138 | out_upconv4 = self.deconv4(torch.cat((out_upconv5, out_conv4), 1))#[:, :, 0:out_conv3.size(2), 0:out_conv3.size(3)] 139 | out_upconv3 = self.deconv3(torch.cat((out_upconv4, out_conv3), 1))#[:, :, 0:out_conv2.size(2), 0:out_conv2.size(3)] 140 | out_upconv2 = self.deconv2(torch.cat((out_upconv3, out_conv2), 1))#[:, :, 0:out_conv1.size(2), 0:out_conv1.size(3)] 141 | out_upconv1 = self.deconv1(torch.cat((out_upconv2, out_conv1), 1))#[:, :, 0:input.size(2), 0:input.size(3)] 142 | 143 | exp_mask6 = nn.functional.sigmoid(self.pred_mask6(out_upconv6)) 144 | exp_mask5 = nn.functional.sigmoid(self.pred_mask5(out_upconv5)) 145 | exp_mask4 = nn.functional.sigmoid(self.pred_mask4(out_upconv4)) 146 | exp_mask3 = nn.functional.sigmoid(self.pred_mask3(out_upconv3)) 147 | exp_mask2 = nn.functional.sigmoid(self.pred_mask2(out_upconv2)) 148 | exp_mask1 = nn.functional.sigmoid(self.pred_mask1(out_upconv1)) 149 | else: 150 | exp_mask6 = None 151 | exp_mask5 = None 152 | exp_mask4 = None 153 | exp_mask3 = None 154 | exp_mask2 = None 155 | exp_mask1 = None 156 | 157 | if self.training: 158 | return exp_mask1, exp_mask2, exp_mask3, exp_mask4, exp_mask5, exp_mask6 159 | else: 160 | return exp_mask1 161 | -------------------------------------------------------------------------------- /models/PoseExpNet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | def conv(in_planes, out_planes, kernel_size=3): 6 | return nn.Sequential( 7 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=2), 8 | nn.ReLU(inplace=True) 9 | ) 10 | 11 | 12 | def upconv(in_planes, out_planes): 13 | return nn.Sequential( 14 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, 
stride=2, padding=1), 15 | nn.ReLU(inplace=True) 16 | ) 17 | 18 | 19 | class PoseExpNet(nn.Module): 20 | 21 | def __init__(self, nb_ref_imgs=2, output_exp=False): 22 | super(PoseExpNet, self).__init__() 23 | self.nb_ref_imgs = nb_ref_imgs 24 | self.output_exp = output_exp 25 | 26 | conv_planes = [16, 32, 64, 128, 256, 256, 256] 27 | self.conv1 = conv(3*(1+self.nb_ref_imgs), conv_planes[0], kernel_size=7) 28 | self.conv2 = conv(conv_planes[0], conv_planes[1], kernel_size=5) 29 | self.conv3 = conv(conv_planes[1], conv_planes[2]) 30 | self.conv4 = conv(conv_planes[2], conv_planes[3]) 31 | self.conv5 = conv(conv_planes[3], conv_planes[4]) 32 | self.conv6 = conv(conv_planes[4], conv_planes[5]) 33 | self.conv7 = conv(conv_planes[5], conv_planes[6]) 34 | 35 | self.pose_pred = nn.Conv2d(conv_planes[6], 6*self.nb_ref_imgs, kernel_size=1, padding=0) 36 | 37 | if self.output_exp: 38 | upconv_planes = [256, 128, 64, 32, 16] 39 | self.upconv5 = upconv(conv_planes[4], upconv_planes[0]) 40 | self.upconv4 = upconv(upconv_planes[0], upconv_planes[1]) 41 | self.upconv3 = upconv(upconv_planes[1], upconv_planes[2]) 42 | self.upconv2 = upconv(upconv_planes[2], upconv_planes[3]) 43 | self.upconv1 = upconv(upconv_planes[3], upconv_planes[4]) 44 | 45 | self.predict_mask4 = nn.Conv2d(upconv_planes[1], self.nb_ref_imgs, kernel_size=3, padding=1) 46 | self.predict_mask3 = nn.Conv2d(upconv_planes[2], self.nb_ref_imgs, kernel_size=3, padding=1) 47 | self.predict_mask2 = nn.Conv2d(upconv_planes[3], self.nb_ref_imgs, kernel_size=3, padding=1) 48 | self.predict_mask1 = nn.Conv2d(upconv_planes[4], self.nb_ref_imgs, kernel_size=3, padding=1) 49 | 50 | def init_weights(self): 51 | for m in self.modules(): 52 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 53 | nn.init.xavier_uniform(m.weight.data) 54 | if m.bias is not None: 55 | m.bias.data.zero_() 56 | 57 | def forward(self, target_image, ref_imgs): 58 | assert(len(ref_imgs) == self.nb_ref_imgs) 59 | input = [target_image] 60 | input.extend(ref_imgs) 61 | input = torch.cat(input, 1) 62 | out_conv1 = self.conv1(input) 63 | out_conv2 = self.conv2(out_conv1) 64 | out_conv3 = self.conv3(out_conv2) 65 | out_conv4 = self.conv4(out_conv3) 66 | out_conv5 = self.conv5(out_conv4) 67 | out_conv6 = self.conv6(out_conv5) 68 | out_conv7 = self.conv7(out_conv6) 69 | 70 | pose = self.pose_pred(out_conv7) 71 | pose = pose.mean(3).mean(2) 72 | pose = 0.01 * pose.view(pose.size(0), self.nb_ref_imgs, 6) 73 | 74 | if self.output_exp: 75 | out_upconv5 = self.upconv5(out_conv5 )[:, :, 0:out_conv4.size(2), 0:out_conv4.size(3)] 76 | out_upconv4 = self.upconv4(out_upconv5)[:, :, 0:out_conv3.size(2), 0:out_conv3.size(3)] 77 | out_upconv3 = self.upconv3(out_upconv4)[:, :, 0:out_conv2.size(2), 0:out_conv2.size(3)] 78 | out_upconv2 = self.upconv2(out_upconv3)[:, :, 0:out_conv1.size(2), 0:out_conv1.size(3)] 79 | out_upconv1 = self.upconv1(out_upconv2)[:, :, 0:input.size(2), 0:input.size(3)] 80 | 81 | exp_mask4 = nn.functional.sigmoid(self.predict_mask4(out_upconv4)) 82 | exp_mask3 = nn.functional.sigmoid(self.predict_mask3(out_upconv3)) 83 | exp_mask2 = nn.functional.sigmoid(self.predict_mask2(out_upconv2)) 84 | exp_mask1 = nn.functional.sigmoid(self.predict_mask1(out_upconv1)) 85 | else: 86 | exp_mask4 = None 87 | exp_mask3 = None 88 | exp_mask2 = None 89 | exp_mask1 = None 90 | 91 | if self.training: 92 | return [exp_mask1, exp_mask2, exp_mask3, exp_mask4], pose 93 | else: 94 | return exp_mask1, pose 95 | 
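To summarize the contract of `PoseExpNet.forward` above: the pose branch produces one 6-DoF vector per reference image, already scaled by 0.01, and the optional explainability branch produces one mask channel per reference image at four scales while training (only the finest mask in eval mode). Below is a small sketch of that contract, not code from the repository; the batch size and the 128x416 input size are illustrative assumptions borrowed from the `run_inference.py` defaults.

```python
import torch
from models import PoseExpNet

pose_net = PoseExpNet(nb_ref_imgs=2, output_exp=True).cuda()
pose_net.init_weights()

# Illustrative batch: one target frame and two reference frames.
tgt = torch.rand(4, 3, 128, 416).cuda()
refs = [torch.rand(4, 3, 128, 416).cuda() for _ in range(2)]

pose_net.train()
exp_masks, pose = pose_net(tgt, refs)
# pose: [4, 2, 6] -- one 6-DoF motion per reference image, pre-scaled by 0.01
# exp_masks: [mask1, mask2, mask3, mask4], each with one channel per reference image

pose_net.eval()
with torch.no_grad():
    exp_mask1, pose = pose_net(tgt, refs)   # finest mask only
```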
-------------------------------------------------------------------------------- /models/PoseNet6.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | def conv(in_planes, out_planes, kernel_size=3): 6 | return nn.Sequential( 7 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=2), 8 | nn.ReLU(inplace=True) 9 | ) 10 | 11 | 12 | def upconv(in_planes, out_planes): 13 | return nn.Sequential( 14 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1), 15 | nn.ReLU(inplace=True) 16 | ) 17 | 18 | 19 | class PoseNet6(nn.Module): 20 | 21 | def __init__(self, nb_ref_imgs=2): 22 | super(PoseNet6, self).__init__() 23 | self.nb_ref_imgs = nb_ref_imgs 24 | 25 | conv_planes = [16, 32, 64, 128, 256, 256, 256] 26 | self.conv0 = conv(3*(1+self.nb_ref_imgs), 3*(1+self.nb_ref_imgs), kernel_size=3) 27 | self.conv1 = conv(3*(1+self.nb_ref_imgs), conv_planes[0], kernel_size=7) 28 | self.conv2 = conv(conv_planes[0], conv_planes[1], kernel_size=5) 29 | self.conv3 = conv(conv_planes[1], conv_planes[2]) 30 | self.conv4 = conv(conv_planes[2], conv_planes[3]) 31 | self.conv5 = conv(conv_planes[3], conv_planes[4]) 32 | self.conv6 = conv(conv_planes[4], conv_planes[5]) 33 | self.conv7 = conv(conv_planes[5], conv_planes[6]) 34 | 35 | self.pose_pred = nn.Conv2d(conv_planes[6], 6*self.nb_ref_imgs, kernel_size=1, padding=0) 36 | 37 | def init_weights(self): 38 | for m in self.modules(): 39 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 40 | nn.init.xavier_uniform(m.weight.data) 41 | if m.bias is not None: 42 | m.bias.data.zero_() 43 | 44 | def forward(self, target_image, ref_imgs): 45 | assert(len(ref_imgs) == self.nb_ref_imgs) 46 | input = [target_image] 47 | input.extend(ref_imgs) 48 | input = torch.cat(input, 1) 49 | out_conv0 = self.conv0(input) 50 | out_conv1 = self.conv1(out_conv0) 51 | out_conv2 = self.conv2(out_conv1) 52 | out_conv3 = self.conv3(out_conv2) 53 | out_conv4 = self.conv4(out_conv3) 54 | out_conv5 = self.conv5(out_conv4) 55 | out_conv6 = self.conv6(out_conv5) 56 | out_conv7 = self.conv7(out_conv6) 57 | 58 | pose = self.pose_pred(out_conv7) 59 | pose = pose.mean(3).mean(2) 60 | pose = 0.01 * pose.view(pose.size(0), self.nb_ref_imgs, 6) 61 | 62 | return pose 63 | -------------------------------------------------------------------------------- /models/PoseNetB6.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 
4 | # based on github.com/ClementPinard/SfMLearner-Pytorch 5 | 6 | import torch 7 | import torch.nn as nn 8 | 9 | 10 | def conv(in_planes, out_planes, kernel_size=3): 11 | return nn.Sequential( 12 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=2), 13 | nn.ReLU(inplace=True) 14 | ) 15 | 16 | 17 | def upconv(in_planes, out_planes): 18 | return nn.Sequential( 19 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1), 20 | nn.ReLU(inplace=True) 21 | ) 22 | 23 | 24 | class PoseNetB6(nn.Module): 25 | 26 | def __init__(self, nb_ref_imgs=2): 27 | super(PoseNetB6, self).__init__() 28 | self.nb_ref_imgs = nb_ref_imgs 29 | 30 | conv_planes = [16, 32, 64, 128, 256, 256, 256, 256] 31 | self.conv1 = conv(3*(1+self.nb_ref_imgs), conv_planes[0], kernel_size=7) 32 | self.conv2 = conv(conv_planes[0], conv_planes[1], kernel_size=5) 33 | self.conv3 = conv(conv_planes[1], conv_planes[2]) 34 | self.conv4 = conv(conv_planes[2], conv_planes[3]) 35 | self.conv5 = conv(conv_planes[3], conv_planes[4]) 36 | self.conv6 = conv(conv_planes[4], conv_planes[5]) 37 | self.conv7 = conv(conv_planes[5], conv_planes[6]) 38 | self.conv8 = conv(conv_planes[6], conv_planes[7]) 39 | 40 | self.pose_pred = nn.Conv2d(conv_planes[7], 6*self.nb_ref_imgs, kernel_size=1, padding=0) 41 | 42 | 43 | def init_weights(self): 44 | for m in self.modules(): 45 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 46 | nn.init.xavier_uniform(m.weight.data) 47 | if m.bias is not None: 48 | m.bias.data.zero_() 49 | 50 | def init_mask_weights(self): 51 | for m in self.modules(): 52 | if isinstance(m, nn.ConvTranspose2d): 53 | nn.init.xavier_uniform(m.weight.data) 54 | if m.bias is not None: 55 | m.bias.data.zero_() 56 | 57 | for module in [self.pred_mask1, self.pred_mask2, self.pred_mask3, self.pred_mask4, self.pred_mask5, self.pred_mask6]: 58 | for m in module.modules(): 59 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 60 | nn.init.xavier_uniform(m.weight.data) 61 | if m.bias is not None: 62 | m.bias.data.zero_() 63 | 64 | 65 | def forward(self, target_image, ref_imgs): 66 | assert(len(ref_imgs) == self.nb_ref_imgs) 67 | input = [target_image] 68 | input.extend(ref_imgs) 69 | input = torch.cat(input, 1) 70 | out_conv1 = self.conv1(input) 71 | out_conv2 = self.conv2(out_conv1) 72 | out_conv3 = self.conv3(out_conv2) 73 | out_conv4 = self.conv4(out_conv3) 74 | out_conv5 = self.conv5(out_conv4) 75 | out_conv6 = self.conv6(out_conv5) 76 | out_conv7 = self.conv7(out_conv6) 77 | out_conv8 = self.conv8(out_conv7) 78 | 79 | pose = self.pose_pred(out_conv8) 80 | pose = pose.mean(3).mean(2) 81 | pose = 0.01 * pose.view(pose.size(0), self.nb_ref_imgs, 6) 82 | 83 | return pose 84 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | from .back2future import Model as Back2Future 2 | from .DispNetS import DispNetS 3 | from .DispNetS6 import DispNetS6 4 | from .DispResNet6 import DispResNet6 5 | from .DispResNetS6 import DispResNetS6 6 | from .FlowNetC6 import FlowNetC6 7 | from .MaskNet6 import MaskNet6 8 | from .MaskResNet6 import MaskResNet6 9 | from .PoseExpNet import PoseExpNet 10 | from .PoseNet6 import PoseNet6 11 | from .PoseNetB6 import PoseNetB6 12 | -------------------------------------------------------------------------------- /models/submodules.py: 
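A note on the exports listed in `models/__init__.py` just above: the training and test scripts in this repository instantiate networks by name with `getattr(models, ...)`, as `submit_flow.py` and `test_back2future.py` do further below. The sketch here simply restates that pattern with the argparse defaults from `submit_flow.py`; the keyword arguments are copied from that script and are not guaranteed to apply to every architecture choice.

```python
import models

# Construct the four sub-networks the way submit_flow.py does (its argparse defaults).
disp_net = getattr(models, 'DispResNet6')().cuda()
pose_net = getattr(models, 'PoseNetB6')(nb_ref_imgs=4).cuda()
mask_net = getattr(models, 'MaskNet6')(nb_ref_imgs=4).cuda()
flow_net = getattr(models, 'Back2Future')(nlevels=6).cuda()
```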
-------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch 3 | import numpy as np 4 | 5 | def conv(batchNorm, in_planes, out_planes, kernel_size=3, stride=1): 6 | if batchNorm: 7 | return nn.Sequential( 8 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2, bias=True), 9 | nn.BatchNorm2d(out_planes), 10 | #_leaky_relu() 11 | nn.LeakyReLU(0.1,inplace=True) 12 | ) 13 | else: 14 | return nn.Sequential( 15 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2, bias=True), 16 | #_leaky_relu() 17 | nn.LeakyReLU(0.1,inplace=True) 18 | ) 19 | 20 | def i_conv(batchNorm, in_planes, out_planes, kernel_size=3, stride=1, bias = True): 21 | if batchNorm: 22 | return nn.Sequential( 23 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2, bias=bias), 24 | nn.BatchNorm2d(out_planes), 25 | ) 26 | else: 27 | return nn.Sequential( 28 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2, bias=bias), 29 | ) 30 | 31 | def predict_flow(in_planes): 32 | return nn.Conv2d(in_planes,2,kernel_size=3,stride=1,padding=1,bias=True) 33 | 34 | def deconv(in_planes, out_planes): 35 | return nn.Sequential( 36 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1, bias=True), 37 | #_leaky_relu() 38 | nn.LeakyReLU(0.1,inplace=True) 39 | ) 40 | 41 | class tofp16(nn.Module): 42 | def __init__(self): 43 | super(tofp16, self).__init__() 44 | 45 | def forward(self, input): 46 | return input.half() 47 | 48 | class _leaky_relu(nn.Module): 49 | def __init__(self): 50 | super(_leaky_relu, self).__init__() 51 | 52 | def forward(self, x): 53 | x_neg = 0.1*x 54 | return torch.max(x_neg, x) 55 | 56 | class tofp32(nn.Module): 57 | def __init__(self): 58 | super(tofp32, self).__init__() 59 | 60 | def forward(self, input): 61 | return input.float() 62 | 63 | 64 | def save_grad(grads, name): 65 | def hook(grad): 66 | grads[name] = grad 67 | return hook 68 | -------------------------------------------------------------------------------- /models/utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import math 3 | 4 | import torch 5 | import torch.nn as nn 6 | 7 | 8 | def conv(in_planes, out_planes, stride=1, batch_norm=False): 9 | if batch_norm: 10 | return nn.Sequential( 11 | nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False), 12 | nn.BatchNorm2d(out_planes, eps=1e-3), 13 | nn.ReLU(inplace=True) 14 | ) 15 | else: 16 | return nn.Sequential( 17 | nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=True), 18 | nn.ReLU(inplace=True) 19 | ) 20 | 21 | 22 | def deconv(in_planes, out_planes, batch_norm=False): 23 | if batch_norm: 24 | return nn.Sequential( 25 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1, bias=True), 26 | nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1, bias=False), 27 | nn.BatchNorm2d(out_planes, eps=1e-3), 28 | nn.ReLU(inplace=True) 29 | ) 30 | else: 31 | return nn.Sequential( 32 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1, bias=True), 33 | nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1, bias=True), 34 | nn.ReLU(inplace=True) 35 | ) 36 | 37 | 38 | def predict_depth(in_planes, with_confidence): 39 | return 
nn.Conv2d(in_planes, 2 if with_confidence else 1, kernel_size=3, stride=1, padding=1, bias=True) 40 | 41 | 42 | def post_process_depth(depth, activation_function=None, clamp=False): 43 | if activation_function is not None: 44 | depth = activation_function(depth) 45 | 46 | if clamp: 47 | depth = depth.clamp(10, 80) 48 | 49 | return depth[:,0] 50 | 51 | 52 | def adaptative_cat(out_conv, out_deconv, out_depth_up): 53 | out_deconv = out_deconv[:, :, :out_conv.size(2), :out_conv.size(3)] 54 | out_depth_up = out_depth_up[:, :, :out_conv.size(2), :out_conv.size(3)] 55 | return torch.cat((out_conv, out_deconv, out_depth_up), 1) 56 | 57 | 58 | def init_modules(net): 59 | for m in net.modules(): 60 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d): 61 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 62 | m.weight.data.normal_(0, math.sqrt(2/n)) 63 | if m.bias is not None: 64 | m.bias.data.zero_() 65 | elif isinstance(m, nn.BatchNorm2d): 66 | m.weight.data.fill_(1) 67 | m.bias.data.zero_() 68 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torchvision 2 | scipy 3 | argparse 4 | tensorboardX 5 | blessings 6 | progressbar2 7 | path.py 8 | matplotlib 9 | opencv-python 10 | scikit-image 11 | pypng 12 | tqdm 13 | spatial-correlation-sampler 14 | -------------------------------------------------------------------------------- /run_inference.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | from scipy.misc import imread, imsave, imresize 4 | import numpy as np 5 | from path import Path 6 | import argparse 7 | from tqdm import tqdm 8 | 9 | from models import DispNetS 10 | from utils import tensor2array 11 | 12 | parser = argparse.ArgumentParser(description='Inference script for DispNet learned with \ 13 | Structure from Motion Learner inference on KITTI and CityScapes Dataset', 14 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 15 | parser.add_argument("--output-disp", action='store_true', help="save disparity img") 16 | parser.add_argument("--output-depth", action='store_true', help="save depth img") 17 | parser.add_argument("--pretrained", required=True, type=str, help="pretrained DispNet path") 18 | parser.add_argument("--img-height", default=128, type=int, help="Image height") 19 | parser.add_argument("--img-width", default=416, type=int, help="Image width") 20 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done") 21 | 22 | parser.add_argument("--dataset-list", default=None, type=str, help="Dataset list file") 23 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory") 24 | parser.add_argument("--output-dir", default='output', type=str, help="Output directory") 25 | 26 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob") 27 | 28 | 29 | def main(): 30 | args = parser.parse_args() 31 | if not(args.output_disp or args.output_depth): 32 | print('You must at least output one value !') 33 | return 34 | 35 | disp_net = DispNetS().cuda() 36 | weights = torch.load(args.pretrained) 37 | disp_net.load_state_dict(weights['state_dict']) 38 | disp_net.eval() 39 | 40 | dataset_dir = Path(args.dataset_dir) 41 | output_dir = Path(args.output_dir) 42 | output_dir.makedirs_p() 43 | 44 | if args.dataset_list is not None: 45 | with open(args.dataset_list, 'r') as 
f: 46 | test_files = [dataset_dir/file for file in f.read().splitlines()] 47 | else: 48 | test_files = sum([dataset_dir.files('*.{}'.format(ext)) for ext in args.img_exts], []) 49 | 50 | print('{} files to test'.format(len(test_files))) 51 | 52 | for file in tqdm(test_files): 53 | 54 | img = imread(file).astype(np.float32) 55 | 56 | h,w,_ = img.shape 57 | if (not args.no_resize) and (h != args.img_height or w != args.img_width): 58 | img = imresize(img, (args.img_height, args.img_width)).astype(np.float32) 59 | img = np.transpose(img, (2, 0, 1)) 60 | 61 | tensor_img = torch.from_numpy(img).unsqueeze(0) 62 | tensor_img = ((tensor_img/255 - 0.5)/0.2).cuda() 63 | var_img = torch.autograd.Variable(tensor_img, volatile=True) 64 | 65 | output = disp_net(var_img).data.cpu()[0] 66 | 67 | if args.output_disp: 68 | disp = (255*tensor2array(output, max_value=None, colormap='bone')).astype(np.uint8) 69 | imsave(output_dir/'{}_disp{}'.format(file.namebase,file.ext), disp) 70 | if args.output_depth: 71 | depth = 1/output 72 | depth = (255*tensor2array(depth, max_value=10, colormap='rainbow')).astype(np.uint8) 73 | imsave(output_dir/'{}_depth{}'.format(file.namebase,file.ext), depth) 74 | 75 | 76 | if __name__ == '__main__': 77 | main() 78 | -------------------------------------------------------------------------------- /sintel_eval/pose_evaluation_utils.py: -------------------------------------------------------------------------------- 1 | # Mostly based on the code written by Clement Godard: 2 | # https://github.com/mrharicot/monodepth/blob/master/utils/evaluation_utils.py 3 | import numpy as np 4 | # import pandas as pd 5 | from path import Path 6 | from scipy.misc import imread 7 | from tqdm import tqdm 8 | from .sintel_io import cam_read 9 | 10 | class test_framework_Sintel(object): 11 | def __init__(self, root, sequence_set, seq_length=3, step=1): 12 | self.root = root 13 | self.img_files, self.poses, self.sample_indices = read_scene_data(self.root, sequence_set, seq_length, step) 14 | 15 | def generator(self): 16 | for img_list, pose_list, sample_list in zip(self.img_files, self.poses, self.sample_indices): 17 | for snippet_indices in sample_list: 18 | imgs = [imread(img_list[i]).astype(np.float32) for i in snippet_indices] 19 | poses = [cam_read(pose_list[i], pose_only=True).astype(np.float32) for i in snippet_indices] 20 | poses = np.stack(poses) 21 | first_pose = poses[0] 22 | poses[:,:,-1] -= first_pose[:,-1] 23 | compensated_poses = np.linalg.inv(first_pose[:,:3]) @ poses 24 | 25 | yield {'imgs': imgs, 26 | 'path': img_list[0], 27 | 'poses': compensated_poses 28 | } 29 | 30 | def __iter__(self): 31 | return self.generator() 32 | 33 | def __len__(self): 34 | return sum(len(imgs) for imgs in self.img_files) 35 | 36 | 37 | def read_scene_data(data_root, sequence_set, seq_length=3, step=1): 38 | data_root = Path(data_root) 39 | im_sequences = [] 40 | poses_sequences = [] 41 | indices_sequences = [] 42 | demi_length = (seq_length - 1) // 2 43 | shift_range = np.array([step*i for i in range(-demi_length, demi_length + 1)]).reshape(1, -1) 44 | 45 | sequences = set() 46 | for seq in sequence_set: 47 | corresponding_dirs = set((data_root/'clean').dirs(seq)) 48 | sequences = sequences | corresponding_dirs 49 | 50 | print('getting test metadata for theses sequences : {}'.format(sequences)) 51 | for sequence in tqdm(sequences): 52 | poses = sorted(Path(sequence.replace('/clean/', '/camdata_left/')).files('*.cam')) 53 | # 
np.genfromtxt(data_root/'poses'/'{}.txt'.format(sequence.name)).astype(np.float64).reshape(-1, 3, 4) 54 | imgs = sorted(sequence.files('*.png')) 55 | # construct 5-snippet sequences 56 | tgt_indices = np.arange(demi_length, len(imgs) - demi_length).reshape(-1, 1) 57 | snippet_indices = shift_range + tgt_indices 58 | im_sequences.append(imgs) 59 | poses_sequences.append(poses) 60 | indices_sequences.append(snippet_indices) 61 | return im_sequences, poses_sequences, indices_sequences 62 | -------------------------------------------------------------------------------- /sintel_eval/sintel_io.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env python2 2 | 3 | """ 4 | I/O script to save and load the data coming with the MPI-Sintel low-level 5 | computer vision benchmark. 6 | 7 | For more details about the benchmark, please visit www.mpi-sintel.de 8 | 9 | CHANGELOG: 10 | v1.0 (2015/02/03): First release 11 | 12 | Copyright (c) 2015 Jonas Wulff 13 | Max Planck Institute for Intelligent Systems, Tuebingen, Germany 14 | 15 | """ 16 | 17 | # Requirements: Numpy as PIL/Pillow 18 | import numpy as np 19 | from PIL import Image 20 | 21 | # Check for endianness, based on Daniel Scharstein's optical flow code. 22 | # Using little-endian architecture, these two should be equal. 23 | TAG_FLOAT = 202021.25 24 | TAG_CHAR = 'PIEH' 25 | 26 | def flow_read(filename): 27 | """ Read optical flow from file, return (U,V) tuple. 28 | 29 | Original code by Deqing Sun, adapted from Daniel Scharstein. 30 | """ 31 | f = open(filename,'rb') 32 | check = np.fromfile(f,dtype=np.float32,count=1)[0] 33 | assert check == TAG_FLOAT, ' flow_read:: Wrong tag in flow file (should be: {0}, is: {1}). Big-endian machine? '.format(TAG_FLOAT,check) 34 | width = np.fromfile(f,dtype=np.int32,count=1)[0] 35 | height = np.fromfile(f,dtype=np.int32,count=1)[0] 36 | size = width*height 37 | assert width > 0 and height > 0 and size > 1 and size < 100000000, ' flow_read:: Wrong input size (width = {0}, height = {1}).'.format(width,height) 38 | tmp = np.fromfile(f,dtype=np.float32,count=-1).reshape((height,width*2)) 39 | u = tmp[:,np.arange(width)*2] 40 | v = tmp[:,np.arange(width)*2 + 1] 41 | return u,v 42 | 43 | def flow_write(filename,uv,v=None): 44 | """ Write optical flow to file. 45 | 46 | If v is None, uv is assumed to contain both u and v channels, 47 | stacked in depth. 48 | 49 | Original code by Deqing Sun, adapted from Daniel Scharstein. 50 | """ 51 | nBands = 2 52 | 53 | if v is None: 54 | assert(uv.ndim == 3) 55 | assert(uv.shape[2] == 2) 56 | u = uv[:,:,0] 57 | v = uv[:,:,1] 58 | else: 59 | u = uv 60 | 61 | assert(u.shape == v.shape) 62 | height,width = u.shape 63 | f = open(filename,'wb') 64 | # write the header 65 | f.write(TAG_CHAR) 66 | np.array(width).astype(np.int32).tofile(f) 67 | np.array(height).astype(np.int32).tofile(f) 68 | # arrange into matrix form 69 | tmp = np.zeros((height, width*nBands)) 70 | tmp[:,np.arange(width)*2] = u 71 | tmp[:,np.arange(width)*2 + 1] = v 72 | tmp.astype(np.float32).tofile(f) 73 | f.close() 74 | 75 | 76 | def depth_read(filename): 77 | """ Read depth data from file, return as numpy array. """ 78 | f = open(filename,'rb') 79 | check = np.fromfile(f,dtype=np.float32,count=1)[0] 80 | assert check == TAG_FLOAT, ' depth_read:: Wrong tag in flow file (should be: {0}, is: {1}). Big-endian machine? 
'.format(TAG_FLOAT,check) 81 | width = np.fromfile(f,dtype=np.int32,count=1)[0] 82 | height = np.fromfile(f,dtype=np.int32,count=1)[0] 83 | size = width*height 84 | assert width > 0 and height > 0 and size > 1 and size < 100000000, ' depth_read:: Wrong input size (width = {0}, height = {1}).'.format(width,height) 85 | depth = np.fromfile(f,dtype=np.float32,count=-1).reshape((height,width)) 86 | return depth 87 | 88 | def depth_write(filename, depth): 89 | """ Write depth to file. """ 90 | height,width = depth.shape[:2] 91 | f = open(filename,'wb') 92 | # write the header 93 | f.write(TAG_CHAR) 94 | np.array(width).astype(np.int32).tofile(f) 95 | np.array(height).astype(np.int32).tofile(f) 96 | 97 | depth.astype(np.float32).tofile(f) 98 | f.close() 99 | 100 | 101 | def disparity_write(filename,disparity,bitdepth=16): 102 | """ Write disparity to file. 103 | 104 | bitdepth can be either 16 (default) or 32. 105 | 106 | The maximum disparity is 1024, since the image width in Sintel 107 | is 1024. 108 | """ 109 | d = disparity.copy() 110 | 111 | # Clip disparity. 112 | d[d>1024] = 1024 113 | d[d<0] = 0 114 | 115 | d_r = (d / 4.0).astype('uint8') 116 | d_g = ((d * (2.0**6)) % 256).astype('uint8') 117 | 118 | out = np.zeros((d.shape[0],d.shape[1],3),dtype='uint8') 119 | out[:,:,0] = d_r 120 | out[:,:,1] = d_g 121 | 122 | if bitdepth > 16: 123 | d_b = (d * (2**14) % 256).astype('uint8') 124 | out[:,:,2] = d_b 125 | 126 | Image.fromarray(out,'RGB').save(filename,'PNG') 127 | 128 | 129 | def disparity_read(filename): 130 | """ Return disparity read from filename. """ 131 | f_in = np.array(Image.open(filename)) 132 | d_r = f_in[:,:,0].astype('float64') 133 | d_g = f_in[:,:,1].astype('float64') 134 | d_b = f_in[:,:,2].astype('float64') 135 | 136 | depth = d_r * 4 + d_g / (2**6) + d_b / (2**14) 137 | return depth 138 | 139 | 140 | def cam_read(filename, pose_only=False): 141 | """ Read camera data, return (M,N) tuple. 142 | 143 | M is the intrinsic matrix, N is the extrinsic matrix, so that 144 | 145 | x = M*N*X, 146 | with x being a point in homogeneous image pixel coordinates, X being a 147 | point in homogeneous world coordinates. 148 | """ 149 | f = open(filename,'rb') 150 | check = np.fromfile(f,dtype=np.float32,count=1)[0] 151 | assert check == TAG_FLOAT, ' cam_read:: Wrong tag in flow file (should be: {0}, is: {1}). Big-endian machine? '.format(TAG_FLOAT,check) 152 | M = np.fromfile(f,dtype='float64',count=9).reshape((3,3)) 153 | N = np.fromfile(f,dtype='float64',count=12).reshape((3,4)) 154 | if pose_only: 155 | return N 156 | else: 157 | return M,N 158 | 159 | def cam_write(filename, M, N): 160 | """ Write intrinsic matrix M and extrinsic matrix N to file. """ 161 | f = open(filename,'wb') 162 | # write the header 163 | f.write(TAG_CHAR) 164 | M.astype('float64').tofile(f) 165 | N.astype('float64').tofile(f) 166 | f.close() 167 | 168 | 169 | def segmentation_write(filename,segmentation): 170 | """ Write segmentation to file. 
""" 171 | 172 | segmentation_ = segmentation.astype('int32') 173 | seg_r = np.floor(segmentation_ / (256**2)).astype('uint8') 174 | seg_g = np.floor((segmentation_ % (256**2)) / 256).astype('uint8') 175 | seg_b = np.floor(segmentation_ % 256).astype('uint8') 176 | 177 | out = np.zeros((segmentation.shape[0],segmentation.shape[1],3),dtype='uint8') 178 | out[:,:,0] = seg_r 179 | out[:,:,1] = seg_g 180 | out[:,:,2] = seg_b 181 | 182 | Image.fromarray(out,'RGB').save(filename,'PNG') 183 | 184 | 185 | def segmentation_read(filename): 186 | """ Return disparity read from filename. """ 187 | f_in = np.array(Image.open(filename)) 188 | seg_r = f_in[:,:,0].astype('int32') 189 | seg_g = f_in[:,:,1].astype('int32') 190 | seg_b = f_in[:,:,2].astype('int32') 191 | 192 | segmentation = (seg_r * 256 + seg_g) * 256 + seg_b 193 | return segmentation 194 | -------------------------------------------------------------------------------- /ssim.py: -------------------------------------------------------------------------------- 1 | # Author: Jonas Wulff 2 | 3 | import torch 4 | import torch.nn.functional as F 5 | from torch.autograd import Variable 6 | import numpy as np 7 | from math import exp 8 | 9 | def gaussian(window_size, sigma): 10 | gauss = torch.Tensor([exp(-(x - window_size//2)**2/float(2*sigma**2)) for x in range(window_size)]) 11 | return gauss/gauss.sum() 12 | 13 | def create_window(window_size, channel): 14 | _1D_window = gaussian(window_size, 1.5).unsqueeze(1) 15 | _2D_window = _1D_window.mm(_1D_window.t()).float().unsqueeze(0).unsqueeze(0) 16 | window = Variable(_2D_window.expand(channel, 1, window_size, window_size).contiguous(), requires_grad=False) 17 | return window 18 | 19 | def _ssim(img1, img2, window, window_size, channel, size_average = True): 20 | mu1 = F.conv2d(img1, window, padding = window_size//2, groups = channel) 21 | mu2 = F.conv2d(img2, window, padding = window_size//2, groups = channel) 22 | 23 | mu1_sq = mu1.pow(2) 24 | mu2_sq = mu2.pow(2) 25 | mu1_mu2 = mu1*mu2 26 | 27 | sigma1_sq = F.conv2d(img1*img1, window, padding = window_size//2, groups = channel) - mu1_sq 28 | sigma2_sq = F.conv2d(img2*img2, window, padding = window_size//2, groups = channel) - mu2_sq 29 | sigma12 = F.conv2d(img1*img2, window, padding = window_size//2, groups = channel) - mu1_mu2 30 | 31 | C1 = 0.01**2 32 | C2 = 0.03**2 33 | 34 | ssim_map = ((2*mu1_mu2 + C1)*(2*sigma12 + C2))/((mu1_sq + mu2_sq + C1)*(sigma1_sq + sigma2_sq + C2)) 35 | 36 | return ssim_map 37 | #if size_average: 38 | # return ssim_map.mean() 39 | #else: 40 | # return ssim_map.mean(1).mean(1).mean(1) 41 | 42 | class SSIM(torch.nn.Module): 43 | def __init__(self, window_size = 11, size_average = True): 44 | super(SSIM, self).__init__() 45 | self.window_size = window_size 46 | self.size_average = size_average 47 | self.channel = 1 48 | self.window = create_window(window_size, self.channel) 49 | 50 | def forward(self, img1, img2): 51 | (_, channel, _, _) = img1.size() 52 | 53 | if channel == self.channel and self.window.data.type() == img1.data.type(): 54 | window = self.window 55 | else: 56 | window = create_window(self.window_size, channel) 57 | 58 | if img1.is_cuda: 59 | window = window.cuda(img1.get_device()) 60 | window = window.type_as(img1) 61 | 62 | self.window = window 63 | self.channel = channel 64 | 65 | 66 | return _ssim(img1, img2, window, self.window_size, channel, self.size_average) 67 | 68 | def ssim(img1, img2, window_size = 13, size_average = True): 69 | (_, channel, _, _) = img1.size() 70 | window = 
create_window(window_size, channel) 71 | 72 | if img1.is_cuda: 73 | window = window.cuda(img1.get_device()) 74 | window = window.type_as(img1) 75 | 76 | return _ssim(img1, img2, window, window_size, channel, size_average) 77 | -------------------------------------------------------------------------------- /stillbox_eval/depth_evaluation_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import json 3 | from path import Path 4 | from scipy.misc import imread 5 | from tqdm import tqdm 6 | 7 | 8 | class test_framework_stillbox(object): 9 | def __init__(self, root, test_files, seq_length=3, min_depth=1e-3, max_depth=80, step=1): 10 | self.root = root 11 | self.min_depth, self.max_depth = min_depth, max_depth 12 | self.gt_files, self.img_files, self.displacements = read_scene_data(self.root, test_files, seq_length, step) 13 | 14 | def __getitem__(self, i): 15 | tgt = imread(self.img_files[i][0]).astype(np.float32) 16 | depth = np.load(self.gt_files[i]) 17 | return {'tgt': tgt, 18 | 'ref': [imread(img).astype(np.float32) for img in self.img_files[i][1]], 19 | 'path':self.img_files[i][0], 20 | 'gt_depth': depth, 21 | 'displacements': np.array(self.displacements[i]), 22 | 'mask': generate_mask(depth, self.min_depth, self.max_depth) 23 | } 24 | 25 | def __len__(self): 26 | return len(self.img_files) 27 | 28 | 29 | def get_displacements(scene, index, ref_indices): 30 | speed = np.around(np.linalg.norm(scene['speed']), decimals=3) 31 | assert(all(i < scene['length'] and i >= 0 for i in ref_indices)), str(ref_indices) 32 | return [speed*scene['time_step']*abs(index - i) for i in ref_indices] 33 | 34 | 35 | def read_scene_data(data_root, test_list, seq_length=3, step=1): 36 | data_root = Path(data_root) 37 | metadata_files = {} 38 | for folder in data_root.dirs(): 39 | with open(folder/'metadata.json', 'r') as f: 40 | metadata_files[str(folder.name)] = json.load(f) 41 | gt_files = [] 42 | im_files = [] 43 | displacements = [] 44 | demi_length = (seq_length - 1) // 2 45 | shift_range = [step*i for i in list(range(-demi_length,0)) + list(range(1, demi_length + 1))] 46 | 47 | print('getting test metadata ... 
') 48 | for sample in tqdm(test_list): 49 | folder, file = sample.split('/') 50 | _, scene_index, index = file[:-4].split('_') # filename is in the form 'RGB_XXXX_XX.jpg' 51 | index = int(index) 52 | scene = metadata_files[folder]['scenes'][int(scene_index)] 53 | tgt_img_path = data_root/sample 54 | folder_path = data_root/folder 55 | if tgt_img_path.isfile(): 56 | capped_indices_range = list(map(lambda x: min(max(0, index + x), scene['length'] - 1), shift_range)) 57 | ref_imgs_path = [folder_path/'{}'.format(scene['imgs'][ref_index]) for ref_index in capped_indices_range] 58 | 59 | gt_files.append(folder_path/'{}'.format(scene['depth'][index])) 60 | im_files.append([tgt_img_path,ref_imgs_path]) 61 | displacements.append(get_displacements(scene, index, capped_indices_range)) 62 | else: 63 | print('{} missing'.format(tgt_img_path)) 64 | 65 | return gt_files, im_files, displacements 66 | 67 | 68 | def generate_mask(gt_depth, min_depth, max_depth): 69 | mask = np.logical_and(gt_depth > min_depth, 70 | gt_depth < max_depth) 71 | # crop gt to exclude border values 72 | # if used on gt_size 100x100 produces a crop of [-95, -5, 5, 95] 73 | gt_height, gt_width = gt_depth.shape 74 | crop = np.array([0.05 * gt_height, 0.95 * gt_height, 75 | 0.05 * gt_width, 0.95 * gt_width]).astype(np.int32) 76 | 77 | crop_mask = np.zeros(mask.shape) 78 | crop_mask[crop[0]:crop[1],crop[2]:crop[3]] = 1 79 | mask = np.logical_and(mask, crop_mask) 80 | return mask 81 | -------------------------------------------------------------------------------- /stillbox_eval/test_files_90.txt: -------------------------------------------------------------------------------- 1 | 15/RGB_112_008.jpg 2 | 15/RGB_178_002.jpg 3 | 15/RGB_167_006.jpg 4 | 15/RGB_153_007.jpg 5 | 15/RGB_119_002.jpg 6 | 15/RGB_135_003.jpg 7 | 15/RGB_44_006.jpg 8 | 15/RGB_32_002.jpg 9 | 15/RGB_171_001.jpg 10 | 15/RGB_114_009.jpg 11 | 15/RGB_89_003.jpg 12 | 15/RGB_197_009.jpg 13 | 15/RGB_105_000.jpg 14 | 15/RGB_72_004.jpg 15 | 15/RGB_66_003.jpg 16 | 15/RGB_25_007.jpg 17 | 15/RGB_58_004.jpg 18 | 15/RGB_28_003.jpg 19 | 15/RGB_25_004.jpg 20 | 15/RGB_140_003.jpg 21 | 15/RGB_59_008.jpg 22 | 15/RGB_19_001.jpg 23 | 15/RGB_186_003.jpg 24 | 15/RGB_113_009.jpg 25 | 15/RGB_54_002.jpg 26 | 15/RGB_130_003.jpg 27 | 15/RGB_153_003.jpg 28 | 15/RGB_103_007.jpg 29 | 15/RGB_04_007.jpg 30 | 15/RGB_110_008.jpg 31 | 15/RGB_78_005.jpg 32 | 15/RGB_26_005.jpg 33 | 15/RGB_43_007.jpg 34 | 15/RGB_190_003.jpg 35 | 15/RGB_122_002.jpg 36 | 15/RGB_102_008.jpg 37 | 15/RGB_187_004.jpg 38 | 15/RGB_03_005.jpg 39 | 15/RGB_58_007.jpg 40 | 15/RGB_37_004.jpg 41 | 15/RGB_125_003.jpg 42 | 15/RGB_190_002.jpg 43 | 15/RGB_52_006.jpg 44 | 15/RGB_37_005.jpg 45 | 15/RGB_196_001.jpg 46 | 15/RGB_53_003.jpg 47 | 15/RGB_129_008.jpg 48 | 15/RGB_74_003.jpg 49 | 15/RGB_167_000.jpg 50 | 15/RGB_195_002.jpg 51 | 15/RGB_10_007.jpg 52 | 15/RGB_131_003.jpg 53 | 15/RGB_37_003.jpg 54 | 15/RGB_38_009.jpg 55 | 15/RGB_115_004.jpg 56 | 15/RGB_91_008.jpg 57 | 15/RGB_43_004.jpg 58 | 15/RGB_187_005.jpg 59 | 15/RGB_112_003.jpg 60 | 15/RGB_19_002.jpg 61 | 15/RGB_170_008.jpg 62 | 15/RGB_17_000.jpg 63 | 15/RGB_62_005.jpg 64 | 15/RGB_148_004.jpg 65 | 15/RGB_12_008.jpg 66 | 15/RGB_169_004.jpg 67 | 15/RGB_112_004.jpg 68 | 15/RGB_71_001.jpg 69 | 15/RGB_103_001.jpg 70 | 15/RGB_178_005.jpg 71 | 15/RGB_92_006.jpg 72 | 15/RGB_40_009.jpg 73 | 15/RGB_138_006.jpg 74 | 15/RGB_146_005.jpg 75 | 15/RGB_04_006.jpg 76 | 15/RGB_02_008.jpg 77 | 15/RGB_101_009.jpg 78 | 15/RGB_103_009.jpg 79 | 15/RGB_21_002.jpg 80 | 15/RGB_144_008.jpg 81 | 
15/RGB_163_007.jpg 82 | 15/RGB_06_001.jpg 83 | 15/RGB_105_004.jpg 84 | 15/RGB_199_009.jpg 85 | 15/RGB_149_005.jpg 86 | 15/RGB_63_008.jpg 87 | 15/RGB_21_004.jpg 88 | 15/RGB_03_002.jpg 89 | 15/RGB_51_008.jpg 90 | 15/RGB_110_001.jpg 91 | 15/RGB_172_009.jpg 92 | 15/RGB_158_005.jpg 93 | 15/RGB_49_004.jpg 94 | 15/RGB_173_008.jpg 95 | 15/RGB_99_004.jpg 96 | 15/RGB_24_001.jpg 97 | 15/RGB_03_009.jpg 98 | 15/RGB_41_009.jpg 99 | 15/RGB_91_002.jpg 100 | 15/RGB_132_001.jpg 101 | 15/RGB_95_003.jpg 102 | 15/RGB_167_005.jpg 103 | 15/RGB_176_000.jpg 104 | 15/RGB_142_008.jpg 105 | 15/RGB_107_009.jpg 106 | 15/RGB_122_005.jpg 107 | 15/RGB_48_001.jpg 108 | 15/RGB_103_005.jpg 109 | 15/RGB_98_009.jpg 110 | 15/RGB_162_001.jpg 111 | 15/RGB_08_006.jpg 112 | 15/RGB_169_002.jpg 113 | 15/RGB_57_002.jpg 114 | 15/RGB_86_004.jpg 115 | 15/RGB_138_001.jpg 116 | 15/RGB_05_005.jpg 117 | 15/RGB_95_002.jpg 118 | 15/RGB_28_002.jpg 119 | 15/RGB_110_002.jpg 120 | 15/RGB_102_002.jpg 121 | 15/RGB_136_009.jpg 122 | 15/RGB_28_007.jpg 123 | 15/RGB_43_005.jpg 124 | 15/RGB_39_006.jpg 125 | 15/RGB_126_003.jpg 126 | 15/RGB_62_001.jpg 127 | 15/RGB_82_003.jpg 128 | 15/RGB_75_008.jpg 129 | 15/RGB_16_005.jpg 130 | 15/RGB_94_005.jpg 131 | 15/RGB_198_002.jpg 132 | 15/RGB_90_001.jpg 133 | 15/RGB_22_001.jpg 134 | 15/RGB_90_000.jpg 135 | 15/RGB_155_006.jpg 136 | 15/RGB_124_007.jpg 137 | 15/RGB_168_004.jpg 138 | 15/RGB_96_008.jpg 139 | 15/RGB_100_002.jpg 140 | 15/RGB_131_008.jpg 141 | 15/RGB_74_002.jpg 142 | 15/RGB_141_007.jpg 143 | 15/RGB_139_001.jpg 144 | 15/RGB_102_005.jpg 145 | 15/RGB_182_009.jpg 146 | 15/RGB_37_002.jpg 147 | 15/RGB_67_003.jpg 148 | 15/RGB_60_001.jpg 149 | 15/RGB_186_001.jpg 150 | 15/RGB_171_002.jpg 151 | 15/RGB_155_004.jpg 152 | 15/RGB_50_008.jpg 153 | 15/RGB_34_002.jpg 154 | 15/RGB_132_003.jpg 155 | 15/RGB_147_005.jpg 156 | 15/RGB_99_008.jpg 157 | 15/RGB_110_000.jpg 158 | 15/RGB_114_008.jpg 159 | 15/RGB_159_002.jpg 160 | 15/RGB_76_007.jpg 161 | 15/RGB_116_005.jpg 162 | 15/RGB_67_002.jpg 163 | 15/RGB_80_003.jpg 164 | 15/RGB_30_000.jpg 165 | 15/RGB_137_009.jpg 166 | 15/RGB_130_002.jpg 167 | 15/RGB_90_002.jpg 168 | 15/RGB_34_008.jpg 169 | 15/RGB_137_007.jpg 170 | 15/RGB_45_001.jpg 171 | 15/RGB_131_004.jpg 172 | 15/RGB_06_000.jpg 173 | 15/RGB_68_005.jpg 174 | 15/RGB_104_008.jpg 175 | 15/RGB_193_008.jpg 176 | 15/RGB_182_000.jpg 177 | 15/RGB_129_006.jpg 178 | 15/RGB_107_005.jpg 179 | 15/RGB_158_007.jpg 180 | 15/RGB_192_001.jpg 181 | 15/RGB_18_005.jpg 182 | 15/RGB_90_009.jpg 183 | 15/RGB_18_007.jpg 184 | 15/RGB_94_000.jpg 185 | 15/RGB_09_002.jpg 186 | 15/RGB_94_001.jpg 187 | 15/RGB_46_004.jpg 188 | 15/RGB_126_000.jpg 189 | 15/RGB_146_002.jpg 190 | 15/RGB_161_006.jpg 191 | 15/RGB_154_008.jpg 192 | 15/RGB_94_003.jpg 193 | -------------------------------------------------------------------------------- /submit_flow.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | from tqdm import tqdm 4 | import numpy as np 5 | from path import Path 6 | from tensorboardX import SummaryWriter 7 | import torch 8 | from torch.autograd import Variable 9 | import torch.nn as nn 10 | 11 | import custom_transforms 12 | from inverse_warp import pose2flow 13 | from datasets.validation_flow import KITTI2015Test 14 | import models 15 | from logger import AverageMeter 16 | from PIL import Image 17 | from torchvision.transforms import ToPILImage 18 | from flowutils.flowlib import flow_to_image 19 | from utils import tensor2array 20 | from loss_functions import compute_all_epes 21 | from 
flowutils import flow_io 22 | 23 | 24 | parser = argparse.ArgumentParser(description='Structure from Motion Learner training on KITTI and CityScapes Dataset', 25 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 26 | parser.add_argument('--kitti-dir', dest='kitti_dir', type=str, default='/ps/project/datasets/AllFlowData/kitti/kitti2015', 27 | help='Path to kitti2015 scene flow dataset for optical flow validation') 28 | parser.add_argument('--dispnet', dest='dispnet', type=str, default='DispResNet6', choices=['DispResNet6', 'DispNetS5', 'DispNetS6'], 29 | help='depth network architecture.') 30 | parser.add_argument('--posenet', dest='posenet', type=str, default='PoseNetB6', choices=['PoseNet6','PoseNetB6', 'PoseExpNet5', 'PoseExpNet6'], 31 | help='pose and explainabity mask network architecture. ') 32 | parser.add_argument('--masknet', dest='masknet', type=str, default='MaskNet6', choices=['MaskResNet6', 'MaskNet6', 'PoseExpNet5', 'PoseExpNet6'], 33 | help='pose and explainabity mask network architecture. ') 34 | parser.add_argument('--flownet', dest='flownet', type=str, default='Back2Future', choices=['PWCNet','FlowNetS', 'Back2Future', 'FlowNetC5','FlowNetC6', 'SpyNet'], 35 | help='flow network architecture. Options: FlowNetS | SpyNet') 36 | 37 | parser.add_argument('--DEBUG', action='store_true', help='DEBUG Mode') 38 | parser.add_argument('--THRESH', dest='THRESH', type=float, default=0.01, help='THRESH') 39 | parser.add_argument('--mu', dest='mu', type=float, default=1.0, help='mu') 40 | parser.add_argument('--pretrained-path', dest='pretrained_path', default=None, metavar='PATH', help='path to pre-trained dispnet model') 41 | parser.add_argument('--nlevels', dest='nlevels', type=int, default=6, help='number of levels in multiscale. Options: 4|5') 42 | parser.add_argument('--dataset', dest='dataset', default='kitti2015', help='path to pre-trained Flow net model') 43 | parser.add_argument('--output-dir', dest='output_dir', type=str, default=None, help='path to output directory') 44 | 45 | 46 | def main(): 47 | global args 48 | args = parser.parse_args() 49 | args.pretrained_path = Path(args.pretrained_path) 50 | 51 | if args.output_dir is not None: 52 | args.output_dir = Path(args.output_dir) 53 | args.output_dir.makedirs_p() 54 | 55 | image_dir = args.output_dir/'images' 56 | mask_dir = args.output_dir/'mask' 57 | viz_dir = args.output_dir/'viz' 58 | testing_dir = args.output_dir/'testing' 59 | testing_dir_flo = args.output_dir/'testing_flo' 60 | 61 | image_dir.makedirs_p() 62 | mask_dir.makedirs_p() 63 | viz_dir.makedirs_p() 64 | testing_dir.makedirs_p() 65 | testing_dir_flo.makedirs_p() 66 | 67 | normalize = custom_transforms.Normalize(mean=[0.5, 0.5, 0.5], 68 | std=[0.5, 0.5, 0.5]) 69 | flow_loader_h, flow_loader_w = 256, 832 70 | valid_flow_transform = custom_transforms.Compose([custom_transforms.Scale(h=flow_loader_h, w=flow_loader_w), 71 | custom_transforms.ArrayToTensor(), normalize]) 72 | 73 | val_flow_set = KITTI2015Test(root=args.kitti_dir, 74 | sequence_length=5, transform=valid_flow_transform) 75 | 76 | if args.DEBUG: 77 | print("DEBUG MODE: Using Training Set") 78 | val_flow_set = KITTI2015Test(root=args.kitti_dir, 79 | sequence_length=5, transform=valid_flow_transform, phase='training') 80 | 81 | val_loader = torch.utils.data.DataLoader(val_flow_set, batch_size=1, shuffle=False, 82 | num_workers=2, pin_memory=True, drop_last=True) 83 | 84 | disp_net = getattr(models, args.dispnet)().cuda() 85 | pose_net = getattr(models, args.posenet)(nb_ref_imgs=4).cuda() 86 | 
mask_net = getattr(models, args.masknet)(nb_ref_imgs=4).cuda() 87 | flow_net = getattr(models, args.flownet)(nlevels=args.nlevels).cuda() 88 | 89 | dispnet_weights = torch.load(args.pretrained_path/'dispnet_model_best.pth.tar') 90 | posenet_weights = torch.load(args.pretrained_path/'posenet_model_best.pth.tar') 91 | masknet_weights = torch.load(args.pretrained_path/'masknet_model_best.pth.tar') 92 | flownet_weights = torch.load(args.pretrained_path/'flownet_model_best.pth.tar') 93 | disp_net.load_state_dict(dispnet_weights['state_dict']) 94 | pose_net.load_state_dict(posenet_weights['state_dict']) 95 | flow_net.load_state_dict(flownet_weights['state_dict']) 96 | mask_net.load_state_dict(masknet_weights['state_dict']) 97 | 98 | disp_net.eval() 99 | pose_net.eval() 100 | mask_net.eval() 101 | flow_net.eval() 102 | 103 | for i, (tgt_img, ref_imgs, intrinsics, intrinsics_inv, tgt_img_original) in enumerate(tqdm(val_loader)): 104 | tgt_img_var = Variable(tgt_img.cuda(), volatile=True) 105 | ref_imgs_var = [Variable(img.cuda(), volatile=True) for img in ref_imgs] 106 | intrinsics_var = Variable(intrinsics.cuda(), volatile=True) 107 | intrinsics_inv_var = Variable(intrinsics_inv.cuda(), volatile=True) 108 | 109 | disp = disp_net(tgt_img_var) 110 | depth = 1/disp 111 | pose = pose_net(tgt_img_var, ref_imgs_var) 112 | explainability_mask = mask_net(tgt_img_var, ref_imgs_var) 113 | if args.flownet=='Back2Future': 114 | flow_fwd, _, _ = flow_net(tgt_img_var, ref_imgs_var[1:3]) 115 | else: 116 | flow_fwd = flow_net(tgt_img_var, ref_imgs_var[2]) 117 | flow_cam = pose2flow(depth.squeeze(1), pose[:,2], intrinsics_var, intrinsics_inv_var) 118 | 119 | rigidity_mask = 1 - (1-explainability_mask[:,1])*(1-explainability_mask[:,2]).unsqueeze(1) > 0.5 120 | 121 | rigidity_mask_census_soft = (flow_cam - flow_fwd).abs()#.normalize() 122 | rigidity_mask_census_u = rigidity_mask_census_soft[:,0] < args.THRESH 123 | rigidity_mask_census_v = rigidity_mask_census_soft[:,1] < args.THRESH 124 | rigidity_mask_census = (rigidity_mask_census_u).type_as(flow_fwd) * (rigidity_mask_census_v).type_as(flow_fwd) 125 | rigidity_mask_combined = 1 - (1-rigidity_mask.type_as(explainability_mask))*(1-rigidity_mask_census.type_as(explainability_mask)) 126 | 127 | _, _, h_pred, w_pred = flow_cam.size() 128 | _, _, h_gt, w_gt = tgt_img_original.size() 129 | rigidity_pred_mask = nn.functional.upsample(rigidity_mask_combined, size=(h_pred, w_pred), mode='bilinear') 130 | 131 | non_rigid_pred = (rigidity_pred_mask<=args.THRESH).type_as(flow_fwd).expand_as(flow_fwd) * flow_fwd 132 | rigid_pred = (rigidity_pred_mask>args.THRESH).type_as(flow_cam).expand_as(flow_cam) * flow_cam 133 | total_pred = non_rigid_pred + rigid_pred 134 | 135 | pred_fullres = nn.functional.upsample(total_pred, size=(h_gt, w_gt), mode='bilinear') 136 | pred_fullres[:,0,:,:] = pred_fullres[:,0,:,:] * (w_gt/w_pred) 137 | pred_fullres[:,1,:,:] = pred_fullres[:,1,:,:] * (h_gt/h_pred) 138 | 139 | flow_fwd_fullres = nn.functional.upsample(flow_fwd, size=(h_gt, w_gt), mode='bilinear') 140 | flow_fwd_fullres[:,0,:,:] = flow_fwd_fullres[:,0,:,:] * (w_gt/w_pred) 141 | flow_fwd_fullres[:,1,:,:] = flow_fwd_fullres[:,1,:,:] * (h_gt/h_pred) 142 | 143 | flow_cam_fullres = nn.functional.upsample(flow_cam, size=(h_gt, w_gt), mode='bilinear') 144 | flow_cam_fullres[:,0,:,:] = flow_cam_fullres[:,0,:,:] * (w_gt/w_pred) 145 | flow_cam_fullres[:,1,:,:] = flow_cam_fullres[:,1,:,:] * (h_gt/h_pred) 146 | 147 | tgt_img_np = tgt_img[0].numpy() 148 | rigidity_mask_combined_np = 
rigidity_mask_combined.cpu().data[0].numpy() 149 | 150 | if args.output_dir is not None: 151 | np.save(image_dir/str(i).zfill(3), tgt_img_np ) 152 | np.save(mask_dir/str(i).zfill(3), rigidity_mask_combined_np) 153 | pred_u = pred_fullres[0][0].data.cpu().numpy() 154 | pred_v = pred_fullres[0][1].data.cpu().numpy() 155 | flow_io.flow_write_png(testing_dir/str(i).zfill(6)+'_10.png' ,u=pred_u, v=pred_v) 156 | flow_io.flow_write(testing_dir_flo/str(i).zfill(6)+'_10.flo' ,pred_u, pred_v) 157 | 158 | 159 | 160 | if (args.output_dir is not None): 161 | ind = int(i) 162 | tgt_img_viz = tensor2array(tgt_img[0].cpu()) 163 | depth_viz = tensor2array(disp.data[0].cpu(), max_value=None, colormap='magma') 164 | mask_viz = tensor2array(rigidity_mask_combined.data[0].cpu(), max_value=1, colormap='magma') 165 | row2_viz = flow_to_image(np.hstack((tensor2array(flow_cam_fullres.data[0].cpu()), 166 | tensor2array(flow_fwd_fullres.data[0].cpu()), 167 | tensor2array(pred_fullres.data[0].cpu()) )) ) 168 | 169 | row1_viz = np.hstack((tgt_img_viz, depth_viz, mask_viz)) 170 | 171 | row1_viz_im = Image.fromarray((255*row1_viz.transpose(1,2,0)).astype('uint8')) 172 | row2_viz_im = Image.fromarray((255*row2_viz.transpose(1,2,0)).astype('uint8')) 173 | 174 | row1_viz_im.save(viz_dir/str(i).zfill(3)+'01.png') 175 | row2_viz_im.save(viz_dir/str(i).zfill(3)+'02.png') 176 | 177 | print("Done!") 178 | # print("\t {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10} ".format(*error_names)) 179 | # print("Errors \t {:10.4f}, {:10.4f} {:10.4f}, {:10.4f} {:10.4f}, {:10.4f}".format(*errors.avg)) 180 | 181 | 182 | if __name__ == '__main__': 183 | main() 184 | -------------------------------------------------------------------------------- /test_back2future.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 4 | 5 | import argparse 6 | from loss_functions import compute_epe 7 | import custom_transforms 8 | from datasets.validation_flow import ValidationFlow, ValidationFlowKitti2012 9 | import torch 10 | from torch.autograd import Variable 11 | import models 12 | from logger import AverageMeter 13 | from loss_functions import compute_all_epes 14 | from tqdm import tqdm 15 | 16 | parser = argparse.ArgumentParser(description='Structure from Motion Learner training on KITTI and CityScapes Dataset', 17 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 18 | parser.add_argument('--flownet', dest='flownet', type=str, default='Back2Future', choices=['Back2Future'], 19 | help='flow network architecture. Options: FlowNetS | SpyNet') 20 | parser.add_argument('--nlevels', dest='nlevels', type=int, default=5, 21 | help='number of levels in multiscale. 
Options: 4|5|6') 22 | parser.add_argument('--pretrained-flow', dest='pretrained_flow', default=None, metavar='PATH', 23 | help='path to pre-trained Flow net model') 24 | parser.add_argument('--dataset', dest='dataset', default='kitti2015', choices=['kitti2015', 'kitti2012'], 25 | help='dataset to evaluate on') 26 | 27 | 28 | def main(): 29 | global args 30 | args = parser.parse_args() 31 | normalize = custom_transforms.Normalize(mean=[0.5, 0.5, 0.5], 32 | std=[0.5, 0.5, 0.5]) 33 | flow_loader_h, flow_loader_w = 256, 832 34 | valid_flow_transform = custom_transforms.Compose([custom_transforms.Scale(h=flow_loader_h, w=flow_loader_w), 35 | custom_transforms.ArrayToTensor(), normalize]) 36 | if args.dataset == "kitti2015": 37 | val_flow_set = ValidationFlow(root='/home/anuragr/datasets/kitti/kitti2015', 38 | sequence_length=5, transform=valid_flow_transform) 39 | elif args.dataset == "kitti2012": 40 | val_flow_set = ValidationFlowKitti2012(root='/is/ps2/aranjan/AllFlowData/kitti/kitti2012', 41 | sequence_length=5, transform=valid_flow_transform) 42 | 43 | val_flow_loader = torch.utils.data.DataLoader(val_flow_set, batch_size=1, shuffle=False, 44 | num_workers=2, pin_memory=True, drop_last=True) 45 | 46 | flow_net = getattr(models, args.flownet)(nlevels=args.nlevels).cuda() 47 | 48 | if args.pretrained_flow: 49 | print("=> using pre-trained weights from {}".format(args.pretrained_flow)) 50 | weights = torch.load(args.pretrained_flow) 51 | flow_net.load_state_dict(weights['state_dict'])#, strict=False) 52 | 53 | flow_net = flow_net.cuda() 54 | flow_net.eval() 55 | error_names = ['epe_total', 'epe_non_rigid', 'epe_rigid', 'outliers'] 56 | errors = AverageMeter(i=len(error_names)) 57 | 58 | for i, (tgt_img, ref_imgs, intrinsics, intrinsics_inv, flow_gt, obj_map) in enumerate(tqdm(val_flow_loader)): 59 | tgt_img_var = Variable(tgt_img.cuda(), volatile=True) 60 | if args.dataset=="kitti2015": 61 | ref_imgs_var = [Variable(img.cuda(), volatile=True) for img in ref_imgs] 62 | ref_img_var = ref_imgs_var[1:3] 63 | elif args.dataset=="kitti2012": 64 | ref_img_var = Variable(ref_imgs.cuda(), volatile=True) 65 | 66 | flow_gt_var = Variable(flow_gt.cuda(), volatile=True) 67 | # compute output 68 | flow_fwd, flow_bwd, occ = flow_net(tgt_img_var, ref_img_var) 69 | #epe = compute_epe(gt=flow_gt_var, pred=flow_fwd) 70 | obj_map_gt_var = Variable(obj_map.cuda(), volatile=True) 71 | obj_map_gt_var_expanded = obj_map_gt_var.unsqueeze(1).type_as(flow_fwd) 72 | 73 | epe = compute_all_epes(flow_gt_var, flow_fwd, flow_fwd, (1-obj_map_gt_var_expanded) ) 74 | #print(i, epe) 75 | errors.update(epe) 76 | 77 | print("Average EPE", errors.avg) 78 | 79 | 80 | 81 | if __name__ == '__main__': 82 | main() 83 | -------------------------------------------------------------------------------- /test_disp.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.autograd import Variable 3 | from PIL import Image 4 | from scipy import interpolate 5 | from scipy.misc import imresize 6 | from scipy.ndimage.interpolation import zoom 7 | import numpy as np 8 | from path import Path 9 | import argparse 10 | from tqdm import tqdm 11 | from utils import tensor2array 12 | import models 13 | from loss_functions import spatial_normalize 14 | 15 | parser = argparse.ArgumentParser(description='Script for DispNet testing with corresponding groundTruth', 16 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 17 | parser.add_argument("--dispnet", dest='dispnet', type=str, 
default='DispResNet6', help='dispnet architecture') 18 | parser.add_argument("--posenet", dest='posenet', type=str, default='PoseExpNet', help='posenet architecture') 19 | parser.add_argument("--pretrained-dispnet", required=True, type=str, help="pretrained DispNet path") 20 | parser.add_argument("--pretrained-posenet", default=None, type=str, help="pretrained PoseNet path (for scale factor)") 21 | parser.add_argument("--img-height", default=256, type=int, help="Image height") 22 | parser.add_argument("--img-width", default=832, type=int, help="Image width") 23 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done") 24 | parser.add_argument("--spatial-normalize", action='store_true', help="spatial normalization") 25 | parser.add_argument("--min-depth", default=1e-3) 26 | parser.add_argument("--max-depth", default=80, type=float) 27 | 28 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory") 29 | parser.add_argument("--dataset-list", default=None, type=str, help="Dataset list file") 30 | parser.add_argument("--output-dir", default=None, type=str, help="Output directory for saving predictions in a big 3D numpy file") 31 | 32 | parser.add_argument("--gt-type", default='KITTI', type=str, help="GroundTruth data type", choices=['npy', 'png', 'KITTI', 'stillbox']) 33 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob") 34 | 35 | 36 | def main(): 37 | args = parser.parse_args() 38 | if args.gt_type == 'KITTI': 39 | from kitti_eval.depth_evaluation_utils import test_framework_KITTI as test_framework 40 | elif args.gt_type == 'stillbox': 41 | from stillbox_eval.depth_evaluation_utils import test_framework_stillbox as test_framework 42 | 43 | disp_net = getattr(models, args.dispnet)().cuda() 44 | weights = torch.load(args.pretrained_dispnet) 45 | disp_net.load_state_dict(weights['state_dict']) 46 | disp_net.eval() 47 | 48 | if args.pretrained_posenet is None: 49 | print('no PoseNet specified, scale_factor will be determined by median ratio, which is kiiinda cheating\ 50 | (but consistent with original paper)') 51 | seq_length = 0 52 | else: 53 | weights = torch.load(args.pretrained_posenet) 54 | seq_length = int(weights['state_dict']['conv1.0.weight'].size(1)/3) 55 | pose_net = getattr(models, args.posenet)(nb_ref_imgs=seq_length - 1, output_exp=False).cuda() 56 | pose_net.load_state_dict(weights['state_dict'], strict=False) 57 | 58 | dataset_dir = Path(args.dataset_dir) 59 | if args.dataset_list is not None: 60 | with open(args.dataset_list, 'r') as f: 61 | test_files = list(f.read().splitlines()) 62 | else: 63 | test_files = [file.relpathto(dataset_dir) for file in sum([dataset_dir.files('*.{}'.format(ext)) for ext in args.img_exts], [])] 64 | 65 | framework = test_framework(dataset_dir, test_files, seq_length, args.min_depth, args.max_depth) 66 | 67 | print('{} files to test'.format(len(test_files))) 68 | errors = np.zeros((2, 7, len(test_files)), np.float32) 69 | if args.output_dir is not None: 70 | output_dir = Path(args.output_dir) 71 | viz_dir = output_dir/'viz' 72 | output_dir.makedirs_p() 73 | viz_dir.makedirs_p() 74 | 75 | for j, sample in enumerate(tqdm(framework)): 76 | tgt_img = sample['tgt'] 77 | 78 | ref_imgs = sample['ref'] 79 | 80 | h,w,_ = tgt_img.shape 81 | if (not args.no_resize) and (h != args.img_height or w != args.img_width): 82 | tgt_img = imresize(tgt_img, (args.img_height, args.img_width)).astype(np.float32) 83 | ref_imgs = [imresize(img, 
(args.img_height, args.img_width)).astype(np.float32) for img in ref_imgs] 84 | 85 | tgt_img = np.transpose(tgt_img, (2, 0, 1)) 86 | ref_imgs = [np.transpose(img, (2,0,1)) for img in ref_imgs] 87 | 88 | tgt_img = torch.from_numpy(tgt_img).unsqueeze(0) 89 | tgt_img = ((tgt_img/255 - 0.5)/0.5).cuda() 90 | tgt_img_var = Variable(tgt_img, volatile=True) 91 | 92 | ref_imgs_var = [] 93 | for i, img in enumerate(ref_imgs): 94 | img = torch.from_numpy(img).unsqueeze(0) 95 | img = ((img/255 - 0.5)/0.5).cuda() 96 | ref_imgs_var.append(Variable(img, volatile=True)) 97 | 98 | pred_disp = disp_net(tgt_img_var) 99 | if args.spatial_normalize: 100 | pred_disp = spatial_normalize(pred_disp) 101 | pred_disp = pred_disp.data.cpu().numpy()[0,0] 102 | gt_depth = sample['gt_depth'] 103 | 104 | if args.output_dir is not None: 105 | if j == 0: 106 | predictions = np.zeros((len(test_files), *pred_disp.shape)) 107 | predictions[j] = 1/pred_disp 108 | gt_viz = interp_gt_disp(gt_depth) 109 | gt_viz = torch.FloatTensor(gt_viz) 110 | gt_viz[gt_viz == 0] = 1000 111 | gt_viz = (1/gt_viz).clamp(0,10) 112 | 113 | tgt_img_viz = tensor2array(tgt_img[0].cpu()) 114 | depth_viz = tensor2array(torch.FloatTensor(pred_disp), max_value=None, colormap='hot') 115 | gt_viz = tensor2array(gt_viz, max_value=None, colormap='hot') 116 | tgt_img_viz_im = Image.fromarray((255*tgt_img_viz).astype('uint8')) 117 | tgt_img_viz_im.save(viz_dir/str(j).zfill(4)+'img.png') 118 | depth_viz_im = Image.fromarray((255*depth_viz).astype('uint8')) 119 | depth_viz_im.save(viz_dir/str(j).zfill(4)+'depth.png') 120 | gt_viz_im = Image.fromarray((255*gt_viz).astype('uint8')) 121 | gt_viz_im.save(viz_dir/str(j).zfill(4)+'gt.png') 122 | 123 | 124 | pred_depth = 1/pred_disp 125 | pred_depth_zoomed = zoom(pred_depth, (gt_depth.shape[0]/pred_depth.shape[0],gt_depth.shape[1]/pred_depth.shape[1])).clip(args.min_depth, args.max_depth) 126 | if sample['mask'] is not None: 127 | pred_depth_zoomed = pred_depth_zoomed[sample['mask']] 128 | gt_depth = gt_depth[sample['mask']] 129 | 130 | if seq_length > 0: 131 | _, poses = pose_net(tgt_img_var, ref_imgs_var) 132 | displacements = poses[0,:,:3].norm(2,1).cpu().data.numpy() # shape [1 - seq_length] 133 | 134 | scale_factors = [s1/s2 for s1, s2 in zip(sample['displacements'], displacements) if s1 > 0] 135 | scale_factor = np.mean(scale_factors) if len(scale_factors) > 0 else 0 136 | if len(scale_factors) == 0: 137 | print('not good ! 
', sample['path'], sample['displacements']) 138 | errors[0,:,j] = compute_errors(gt_depth, pred_depth_zoomed*scale_factor) 139 | 140 | scale_factor = np.median(gt_depth)/np.median(pred_depth_zoomed) 141 | errors[1,:,j] = compute_errors(gt_depth, pred_depth_zoomed*scale_factor) 142 | 143 | mean_errors = errors.mean(2) 144 | error_names = ['abs_rel','sq_rel','rms','log_rms','a1','a2','a3'] 145 | if args.pretrained_posenet: 146 | print("Results with scale factor determined by PoseNet : ") 147 | print("{:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}".format(*error_names)) 148 | print("{:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}".format(*mean_errors[0])) 149 | 150 | print("Results with scale factor determined by GT/prediction ratio (like the original paper) : ") 151 | print("{:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}".format(*error_names)) 152 | print("{:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}".format(*mean_errors[1])) 153 | 154 | if args.output_dir is not None: 155 | np.save(output_dir/'predictions.npy', predictions) 156 | 157 | def interp_gt_disp(mat, mask_val=0): 158 | mat[mat==mask_val] = np.nan 159 | x = np.arange(0, mat.shape[1]) 160 | y = np.arange(0, mat.shape[0]) 161 | mat = np.ma.masked_invalid(mat) 162 | xx, yy = np.meshgrid(x, y) 163 | #get only the valid values 164 | x1 = xx[~mat.mask] 165 | y1 = yy[~mat.mask] 166 | newarr = mat[~mat.mask] 167 | 168 | GD1 = interpolate.griddata((x1, y1), newarr.ravel(), (xx, yy), method='linear', fill_value=mask_val) 169 | return GD1 170 | 171 | def compute_errors(gt, pred): 172 | thresh = np.maximum((gt / pred), (pred / gt)) 173 | a1 = (thresh < 1.25 ).mean() 174 | a2 = (thresh < 1.25 ** 2).mean() 175 | a3 = (thresh < 1.25 ** 3).mean() 176 | 177 | rmse = (gt - pred) ** 2 178 | rmse = np.sqrt(rmse.mean()) 179 | 180 | rmse_log = (np.log(gt) - np.log(pred)) ** 2 181 | rmse_log = np.sqrt(rmse_log.mean()) 182 | 183 | abs_rel = np.mean(np.abs(gt - pred) / gt) 184 | 185 | sq_rel = np.mean(((gt - pred)**2) / gt) 186 | 187 | return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3 188 | 189 | 190 | if __name__ == '__main__': 191 | main() 192 | -------------------------------------------------------------------------------- /test_flownetc.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 4 | 5 | import argparse 6 | import custom_transforms 7 | from datasets.validation_flow import ValidationFlowFlowNetC, ValidationFlowKitti2012 8 | import torch 9 | from torch.autograd import Variable 10 | import models 11 | from logger import AverageMeter 12 | from torchvision.transforms import ToPILImage 13 | from tensorboardX import SummaryWriter 14 | import os 15 | from flowutils.flowlib import flow_to_image 16 | from utils import tensor2array 17 | 18 | 19 | parser = argparse.ArgumentParser(description='Test FlowNetC', 20 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 21 | parser.add_argument('--flownet', dest='flownet', type=str, default='FlowNetC5', choices=['FlowNetS', 'FlowNetS5', 'FlowNetS6', 'SpyNet', 'FlowNetC5'], 22 | help='flow network architecture. Options: FlowNetS | FlowNetS5 | FlowNetS6 | SpyNet | FlowNetC5') 23 | parser.add_argument('--nlevels', dest='nlevels', type=int, default=6, 24 | help='number of levels in multiscale. 
Options: 4|5|6') 25 | parser.add_argument('--pretrained-flow', dest='pretrained_flow', default=None, metavar='PATH', 26 | help='path to pre-trained Flow net model') 27 | parser.add_argument('--dataset', dest='dataset', default='kitti2015', choices=['kitti2015', 'kitti2012'], 28 | help='path to pre-trained Flow net model') 29 | 30 | 31 | def compute_epe(gt, pred, op='sub'): 32 | _, _, h_pred, w_pred = pred.size() 33 | bs, nc, h_gt, w_gt = gt.size() 34 | u_gt, v_gt = gt[:,0,:,:], gt[:,1,:,:] 35 | pred = torch.nn.functional.upsample(pred, size=(h_gt, w_gt), mode='bilinear') 36 | u_pred = pred[:,0,:,:] * (w_gt/w_pred) 37 | v_pred = pred[:,1,:,:] * (h_gt/h_pred) 38 | if op=='sub': 39 | epe = torch.sqrt(torch.pow((u_gt - u_pred), 2) + torch.pow((v_gt - v_pred), 2)) 40 | if op=='div': 41 | epe = ((u_gt / u_pred) + (v_gt / v_pred)) 42 | 43 | return epe 44 | 45 | def main(): 46 | global args 47 | args = parser.parse_args() 48 | save_path = 'checkpoints/test_flownetc' 49 | 50 | if not os.path.exists(save_path): 51 | os.makedirs(save_path) 52 | summary_writer = SummaryWriter(save_path) 53 | normalize = custom_transforms.Normalize(mean=[0.5, 0.5, 0.5], 54 | std=[1.0, 1.0, 1.0]) 55 | flow_loader_h, flow_loader_w = 384, 1280 56 | valid_flow_transform = custom_transforms.Compose([custom_transforms.Scale(h=flow_loader_h, w=flow_loader_w), 57 | custom_transforms.ArrayToTensor(), normalize]) 58 | if args.dataset == "kitti2015": 59 | val_flow_set = ValidationFlowFlowNetC(root='/is/ps2/aranjan/AllFlowData/kitti/kitti2015', 60 | sequence_length=5, transform=valid_flow_transform) 61 | elif args.dataset == "kitti2012": 62 | val_flow_set = ValidationFlowKitti2012(root='/is/ps2/aranjan/AllFlowData/kitti/kitti2012', 63 | sequence_length=5, transform=valid_flow_transform) 64 | 65 | val_flow_loader = torch.utils.data.DataLoader(val_flow_set, batch_size=1, shuffle=False, 66 | num_workers=2, pin_memory=True, drop_last=True) 67 | 68 | flow_net = getattr(models, args.flownet)(pretrained=True).cuda() 69 | 70 | flow_net.eval() 71 | error_names = ['epe'] 72 | errors = AverageMeter(i=len(error_names)) 73 | 74 | for i, (tgt_img, ref_imgs, intrinsics, intrinsics_inv, flow_gt, flownet_c_flow, obj_map) in enumerate(val_flow_loader): 75 | tgt_img_var = Variable(tgt_img.cuda(), volatile=True) 76 | if args.dataset=="kitti2015": 77 | ref_imgs_var = [Variable(img.cuda(), volatile=True) for img in ref_imgs] 78 | ref_img_var = ref_imgs_var[2] 79 | elif args.dataset=="kitti2012": 80 | ref_img_var = Variable(ref_imgs.cuda(), volatile=True) 81 | 82 | flow_gt_var = Variable(flow_gt.cuda(), volatile=True) 83 | flownet_c_flow = Variable(flownet_c_flow.cuda(), volatile=True) 84 | 85 | # compute output 86 | flow_fwd = flow_net(tgt_img_var, ref_img_var) 87 | epe = compute_epe(gt=flownet_c_flow, pred=flow_fwd) 88 | scale_factor = compute_epe(gt=flownet_c_flow, pred=flow_fwd, op='div') 89 | #import ipdb 90 | #ipdb.set_trace() 91 | summary_writer.add_image('Frame 1', tensor2array(tgt_img_var.data[0].cpu()) , i) 92 | summary_writer.add_image('Frame 2', tensor2array(ref_img_var.data[0].cpu()) , i) 93 | summary_writer.add_image('Flow Output', flow_to_image(tensor2array(flow_fwd.data[0].cpu())) , i) 94 | summary_writer.add_image('UnFlow Output', flow_to_image(tensor2array(flownet_c_flow.data[0][:2].cpu())) , i) 95 | summary_writer.add_image('gtFlow Output', flow_to_image(tensor2array(flow_gt_var.data[0][:2].cpu())) , i) 96 | summary_writer.add_image('EPE Image w UnFlow', tensor2array(epe.data.cpu()) , i) 97 | summary_writer.add_scalar('EPE mean w 
UnFlow', epe.mean().data.cpu(), i) 98 | summary_writer.add_scalar('EPE max w UnFlow', epe.max().data.cpu(), i) 99 | summary_writer.add_scalar('Scale Factor max w UnFlow', scale_factor.max().data.cpu(), i) 100 | summary_writer.add_scalar('Scale Factor mean w UnFlow', scale_factor.mean().data.cpu(), i) 101 | summary_writer.add_scalar('Flow value max', flow_fwd.max().data.cpu(), i) 102 | print(i, "EPE: ", epe.mean().item()) 103 | 104 | #print(i, epe) 105 | #errors.update(epe) 106 | 107 | print('Done') 108 | #print("Averge EPE",errors.avg ) 109 | 110 | 111 | 112 | if __name__ == '__main__': 113 | main() 114 | -------------------------------------------------------------------------------- /test_make3d.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 4 | # based on github.com/ClementPinard/SfMLearner-Pytorch 5 | 6 | import glob 7 | import torch 8 | import cv2 9 | from torch.autograd import Variable 10 | from PIL import Image 11 | from scipy import interpolate, io 12 | from scipy.misc import imresize, imread 13 | from scipy.ndimage.interpolation import zoom 14 | import numpy as np 15 | from path import Path 16 | import argparse 17 | from tqdm import tqdm 18 | from utils import tensor2array 19 | import models 20 | from loss_functions import spatial_normalize 21 | 22 | parser = argparse.ArgumentParser(description='Script for DispNet testing with corresponding groundTruth', 23 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 24 | parser.add_argument("--dispnet", dest='dispnet', type=str, default='DispResNet6', help='dispnet architecture') 25 | parser.add_argument("--pretrained-dispnet", required=True, type=str, help="pretrained DispNet path") 26 | parser.add_argument("--img-height", default=256, type=int, help="Image height") 27 | parser.add_argument("--img-width", default=256, type=int, help="Image width") 28 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done") 29 | parser.add_argument("--min-depth", default=1e-3) 30 | parser.add_argument("--max-depth", default=70, type=float) 31 | 32 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory") 33 | parser.add_argument("--output-dir", default=None, type=str, help="Output directory for saving predictions in a big 3D numpy file") 34 | 35 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob") 36 | 37 | class test_framework(object): 38 | def __init__(self, root, min_depth=1e-3, max_depth=70): 39 | self.root = root 40 | self.min_depth, self.max_depth = min_depth, max_depth 41 | self.img_files = sorted(glob.glob(root/'Test134/*.jpg')) 42 | self.depth_files = sorted(glob.glob(root/'Gridlaserdata/*.mat')) 43 | 44 | # This test file is corrupted in the original dataset 45 | self.img_files.pop(61) 46 | self.depth_files.pop(61) 47 | 48 | self.ratio = 2 49 | self.h_ratio = 1 / (1.33333 * self.ratio) 50 | self.color_new_height = 1704 // 2 51 | self.depth_new_height = 21 52 | 53 | def __getitem__(self, i): 54 | img = Image.open(self.img_files[i]) 55 | try: 56 | imgarr = np.array(img) 57 | tgt_img = imgarr.astype(np.float32) 58 | except: 59 | imgarr = np.array(img) 60 | tgt_img = imgarr.astype(np.float32) 61 | 62 | tgt_img = tgt_img[ (2272 - self.color_new_height)//2:(2272 + self.color_new_height)//2,:] 63 | 64 | depth_map = io.loadmat(self.depth_files[i]) 65 | depth_gt = 
depth_map["Position3DGrid"][:,:,3] 66 | depth_gt_cropped = depth_gt[(55 - 21)//2:(55 + 21)//2] 67 | return {'tgt': tgt_img, 68 | 'path':self.img_files[i], 69 | 'gt_depth': depth_gt_cropped, 70 | 'mask': np.logical_and(depth_gt_cropped > self.min_depth, depth_gt_cropped < self.max_depth) 71 | } 72 | 73 | def __len__(self): 74 | return len(self.img_files) 75 | 76 | def main(): 77 | args = parser.parse_args() 78 | 79 | disp_net = getattr(models, args.dispnet)().cuda() 80 | weights = torch.load(args.pretrained_dispnet) 81 | disp_net.load_state_dict(weights['state_dict']) 82 | disp_net.eval() 83 | 84 | print('no PoseNet specified, scale_factor will be determined by median ratio, which is kiiinda cheating\ 85 | (but consistent with original paper)') 86 | seq_length = 0 87 | 88 | dataset_dir = Path(args.dataset_dir) 89 | framework = test_framework(dataset_dir, args.min_depth, args.max_depth) 90 | errors = np.zeros((2, 7, len(framework)), np.float32) 91 | if args.output_dir is not None: 92 | output_dir = Path(args.output_dir) 93 | viz_dir = output_dir/'viz' 94 | output_dir.makedirs_p() 95 | viz_dir.makedirs_p() 96 | 97 | for j, sample in enumerate(tqdm(framework)): 98 | tgt_img = sample['tgt'] 99 | 100 | h,w,_ = tgt_img.shape 101 | if (not args.no_resize) and (h != args.img_height or w != args.img_width): 102 | tgt_img = imresize(tgt_img, (args.img_height, args.img_width)).astype(np.float32) 103 | 104 | tgt_img = np.transpose(tgt_img, (2, 0, 1)) 105 | tgt_img = torch.from_numpy(tgt_img).unsqueeze(0) 106 | tgt_img = ((tgt_img/255 - 0.5)/0.5).cuda() 107 | tgt_img_var = Variable(tgt_img, volatile=True) 108 | 109 | pred_disp = disp_net(tgt_img_var) 110 | pred_disp = pred_disp.data.cpu().numpy()[0,0] 111 | gt_depth = sample['gt_depth'] 112 | 113 | if args.output_dir is not None: 114 | if j == 0: 115 | predictions = np.zeros((len(framework), *pred_disp.shape)) 116 | predictions[j] = 1/pred_disp 117 | gt_viz = interp_gt_disp(gt_depth) 118 | gt_viz = torch.FloatTensor(gt_viz) 119 | gt_viz[gt_viz == 0] = 1000 120 | gt_viz = (1/gt_viz).clamp(0,10) 121 | 122 | tgt_img_viz = tensor2array(tgt_img[0].cpu()) 123 | depth_viz = tensor2array(torch.FloatTensor(pred_disp), max_value=None, colormap='hot') 124 | gt_viz = tensor2array(gt_viz, max_value=None, colormap='hot') 125 | tgt_img_viz_im = Image.fromarray((255*tgt_img_viz).astype('uint8')) 126 | tgt_img_viz_im = tgt_img_viz_im.resize(size=(args.img_width, args.img_height), resample=3) 127 | tgt_img_viz_im.save(viz_dir/str(j).zfill(4)+'img.png') 128 | depth_viz_im = Image.fromarray((255*depth_viz).astype('uint8')) 129 | depth_viz_im = depth_viz_im.resize(size=(args.img_width, args.img_height), resample=3) 130 | depth_viz_im.save(viz_dir/str(j).zfill(4)+'depth.png') 131 | gt_viz_im = Image.fromarray((255*gt_viz).astype('uint8')) 132 | gt_viz_im = gt_viz_im.resize(size=(args.img_width, args.img_height), resample=3) 133 | gt_viz_im.save(viz_dir/str(j).zfill(4)+'gt.png') 134 | 135 | all_viz_im = Image.fromarray( np.hstack([np.array(tgt_img_viz_im), np.array(gt_viz_im), np.array(depth_viz_im)]) ) 136 | all_viz_im.save(viz_dir/str(j).zfill(4)+'all.png') 137 | 138 | 139 | pred_depth = 1/pred_disp 140 | pred_depth_zoomed = zoom(pred_depth, (gt_depth.shape[0]/pred_depth.shape[0],gt_depth.shape[1]/pred_depth.shape[1])).clip(args.min_depth, args.max_depth) 141 | if sample['mask'] is not None: 142 | pred_depth_zoomed = pred_depth_zoomed[sample['mask']] 143 | gt_depth = gt_depth[sample['mask']] 144 | 145 | scale_factor = np.median(gt_depth)/np.median(pred_depth_zoomed) 146 | 
pred_depth_zoomed = scale_factor*pred_depth_zoomed 147 | pred_depth_zoomed[pred_depth_zoomed>args.max_depth] = args.max_depth 148 | errors[1,:,j] = compute_errors(gt_depth, pred_depth_zoomed) 149 | 150 | mean_errors = errors.mean(2) 151 | error_names = ['abs_rel','sq_rel','rms','log_rms','a1','a2','a3'] 152 | 153 | print("Results with scale factor determined by GT/prediction ratio (like the original paper) : ") 154 | print("{:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}".format(*error_names)) 155 | print("{:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}".format(*mean_errors[1])) 156 | 157 | if args.output_dir is not None: 158 | np.save(output_dir/'predictions.npy', predictions) 159 | 160 | def interp_gt_disp(mat, mask_val=0): 161 | mat[mat==mask_val] = np.nan 162 | x = np.arange(0, mat.shape[1]) 163 | y = np.arange(0, mat.shape[0]) 164 | mat = np.ma.masked_invalid(mat) 165 | xx, yy = np.meshgrid(x, y) 166 | #get only the valid values 167 | x1 = xx[~mat.mask] 168 | y1 = yy[~mat.mask] 169 | newarr = mat[~mat.mask] 170 | 171 | GD1 = interpolate.griddata((x1, y1), newarr.ravel(), (xx, yy), method='linear', fill_value=mask_val) 172 | return GD1 173 | 174 | def compute_errors(gt, pred): 175 | thresh = np.maximum((gt / pred), (pred / gt)) 176 | a1 = (thresh < 1.25 ).mean() 177 | a2 = (thresh < 1.25 ** 2).mean() 178 | a3 = (thresh < 1.25 ** 3).mean() 179 | 180 | rmse = (gt - pred) ** 2 181 | rmse = np.sqrt(rmse.mean()) 182 | 183 | rmse_log = (np.log10(gt) - np.log10(pred)) ** 2 184 | rmse_log = np.sqrt(rmse_log.mean()) 185 | 186 | abs_rel = np.mean(np.abs(gt - pred) / gt) 187 | 188 | sq_rel = np.mean(((gt - pred)**2) / gt) 189 | 190 | return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3 191 | 192 | 193 | if __name__ == '__main__': 194 | main() 195 | -------------------------------------------------------------------------------- /test_pose.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.autograd import Variable 3 | 4 | from scipy.misc import imresize 5 | import numpy as np 6 | from path import Path 7 | import argparse 8 | from tqdm import tqdm 9 | 10 | import models 11 | from inverse_warp import pose_vec2mat 12 | 13 | 14 | parser = argparse.ArgumentParser(description='Script for PoseNet testing with corresponding groundTruth from KITTI Odometry', 15 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 16 | parser.add_argument("pretrained_posenet", type=str, help="pretrained PoseNet path") 17 | parser.add_argument("--posenet", type=str, default="PoseNetB6", help="PoseNet model path") 18 | parser.add_argument("--img-height", default=256, type=int, help="Image height") 19 | parser.add_argument("--img-width", default=832, type=int, help="Image width") 20 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done") 21 | parser.add_argument("--min-depth", default=1e-3) 22 | parser.add_argument("--max-depth", default=80) 23 | 24 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory") 25 | parser.add_argument("--sequences", default=['09'], type=str, nargs='*', help="sequences to test") 26 | parser.add_argument("--output-dir", default=None, type=str, help="Output directory for saving predictions in a big 3D numpy file") 27 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob") 28 | parser.add_argument("--rotation-mode", default='euler', choices=['euler', 'quat'], type=str) 29 | 30 | 31 
| def main(): 32 | args = parser.parse_args() 33 | from kitti_eval.pose_evaluation_utils import test_framework_KITTI as test_framework 34 | 35 | weights = torch.load(args.pretrained_posenet) 36 | seq_length = int(weights['state_dict']['conv1.0.weight'].size(1)/3) 37 | pose_net = getattr(models, args.posenet)(nb_ref_imgs=seq_length - 1).cuda() 38 | pose_net.load_state_dict(weights['state_dict'], strict=False) 39 | 40 | dataset_dir = Path(args.dataset_dir) 41 | framework = test_framework(dataset_dir, args.sequences, seq_length) 42 | 43 | print('{} snippets to test'.format(len(framework))) 44 | errors = np.zeros((len(framework), 2), np.float32) 45 | if args.output_dir is not None: 46 | output_dir = Path(args.output_dir) 47 | output_dir.makedirs_p() 48 | predictions_array = np.zeros((len(framework), seq_length, 3, 4)) 49 | 50 | for j, sample in enumerate(tqdm(framework)): 51 | imgs = sample['imgs'] 52 | 53 | h,w,_ = imgs[0].shape 54 | if (not args.no_resize) and (h != args.img_height or w != args.img_width): 55 | imgs = [imresize(img, (args.img_height, args.img_width)).astype(np.float32) for img in imgs] 56 | 57 | imgs = [np.transpose(img, (2,0,1)) for img in imgs] 58 | 59 | ref_imgs_var = [] 60 | for i, img in enumerate(imgs): 61 | img = torch.from_numpy(img).unsqueeze(0) 62 | img = ((img/255 - 0.5)/0.5).cuda() 63 | img_var = Variable(img, volatile=True) 64 | if i == len(imgs)//2: 65 | tgt_img_var = img_var 66 | else: 67 | ref_imgs_var.append(Variable(img, volatile=True)) 68 | 69 | if args.posenet in ["PoseNet6", "PoseNetB6"]: 70 | poses = pose_net(tgt_img_var, ref_imgs_var) 71 | else: 72 | _, poses = pose_net(tgt_img_var, ref_imgs_var) 73 | 74 | poses = poses.cpu().data[0] 75 | poses = torch.cat([poses[:len(imgs)//2], torch.zeros(1,6).float(), poses[len(imgs)//2:]]) 76 | 77 | inv_transform_matrices = pose_vec2mat(Variable(poses), rotation_mode=args.rotation_mode).data.numpy().astype(np.float64) 78 | 79 | rot_matrices = np.linalg.inv(inv_transform_matrices[:,:,:3]) 80 | tr_vectors = -rot_matrices @ inv_transform_matrices[:,:,-1:] 81 | 82 | transform_matrices = np.concatenate([rot_matrices, tr_vectors], axis=-1) 83 | 84 | first_inv_transform = inv_transform_matrices[0] 85 | final_poses = first_inv_transform[:,:3] @ transform_matrices 86 | final_poses[:,:,-1:] += first_inv_transform[:,-1:] 87 | 88 | if args.output_dir is not None: 89 | predictions_array[j] = final_poses 90 | 91 | ATE, RE = compute_pose_error(sample['poses'], final_poses) 92 | errors[j] = ATE, RE 93 | 94 | mean_errors = errors.mean(0) 95 | std_errors = errors.std(0) 96 | error_names = ['ATE','RE'] 97 | print('') 98 | print("Results") 99 | print("\t {:>10}, {:>10}".format(*error_names)) 100 | print("mean \t {:10.4f}, {:10.4f}".format(*mean_errors)) 101 | print("std \t {:10.4f}, {:10.4f}".format(*std_errors)) 102 | 103 | if args.output_dir is not None: 104 | np.save(output_dir/'predictions.npy', predictions_array) 105 | 106 | 107 | def compute_pose_error(gt, pred): 108 | RE = 0 109 | snippet_length = gt.shape[0] 110 | scale_factor = np.sum(gt[:,:,-1] * pred[:,:,-1])/np.sum(pred[:,:,-1] ** 2) 111 | ATE = np.linalg.norm((gt[:,:,-1] - scale_factor * pred[:,:,-1]).reshape(-1)) 112 | for gt_pose, pred_pose in zip(gt, pred): 113 | # Residual matrix to which we compute angle's sin and cos 114 | R = gt_pose[:,:3] @ np.linalg.inv(pred_pose[:,:3]) 115 | s = np.linalg.norm([R[0,1]-R[1,0], 116 | R[1,2]-R[2,1], 117 | R[0,2]-R[2,0]]) 118 | c = np.trace(R) - 1 119 | # Note: we actually compute double of cos and sin, but arctan2 is invariant to 
scale 120 | RE += np.arctan2(s,c) 121 | 122 | return ATE/snippet_length, RE/snippet_length 123 | 124 | 125 | if __name__ == '__main__': 126 | main() 127 | -------------------------------------------------------------------------------- /test_sintel_pose.py: -------------------------------------------------------------------------------- 1 | # Author: Anurag Ranjan 2 | # Copyright (c) 2019, Anurag Ranjan 3 | # All rights reserved. 4 | # based on github.com/ClementPinard/SfMLearner-Pytorch 5 | 6 | import torch 7 | from torch.autograd import Variable 8 | 9 | from scipy.misc import imresize 10 | import numpy as np 11 | from path import Path 12 | import argparse 13 | from tqdm import tqdm 14 | 15 | import models 16 | from inverse_warp import pose_vec2mat 17 | 18 | 19 | parser = argparse.ArgumentParser(description='Script for PoseNet testing with corresponding groundTruth from Sintel Odometry', 20 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 21 | parser.add_argument("pretrained_posenet", type=str, help="pretrained PoseNet path") 22 | parser.add_argument("--posenet", type=str, default="PoseNetB6", help="PoseNet model path") 23 | parser.add_argument("--img-height", default=128, type=int, help="Image height") 24 | parser.add_argument("--img-width", default=416, type=int, help="Image width") 25 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done") 26 | parser.add_argument("--min-depth", default=1e-3) 27 | parser.add_argument("--max-depth", default=80) 28 | 29 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory") 30 | parser.add_argument("--sequences", default=['alley_1'], type=str, nargs='*', help="sequences to test") 31 | parser.add_argument("--output-dir", default=None, type=str, help="Output directory for saving predictions in a big 3D numpy file") 32 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob") 33 | parser.add_argument("--rotation-mode", default='euler', choices=['euler', 'quat'], type=str) 34 | 35 | 36 | def main(): 37 | args = parser.parse_args() 38 | from sintel_eval.pose_evaluation_utils import test_framework_Sintel as test_framework 39 | 40 | weights = torch.load(args.pretrained_posenet) 41 | seq_length = int(weights['state_dict']['conv1.0.weight'].size(1)/3) 42 | pose_net = getattr(models, args.posenet)(nb_ref_imgs=seq_length - 1).cuda() 43 | pose_net.load_state_dict(weights['state_dict'], strict=False) 44 | 45 | dataset_dir = Path(args.dataset_dir) 46 | framework = test_framework(dataset_dir, args.sequences, seq_length) 47 | 48 | print('{} snippets to test'.format(len(framework))) 49 | RE = np.zeros((len(framework)), np.float32) 50 | if args.output_dir is not None: 51 | output_dir = Path(args.output_dir) 52 | output_dir.makedirs_p() 53 | predictions_array = np.zeros((len(framework), seq_length, 3, 4)) 54 | 55 | for j, sample in enumerate(tqdm(framework)): 56 | imgs = sample['imgs'] 57 | 58 | h,w,_ = imgs[0].shape 59 | if (not args.no_resize) and (h != args.img_height or w != args.img_width): 60 | imgs = [imresize(img, (args.img_height, args.img_width)).astype(np.float32) for img in imgs] 61 | 62 | imgs = [np.transpose(img, (2,0,1)) for img in imgs] 63 | 64 | ref_imgs_var = [] 65 | for i, img in enumerate(imgs): 66 | img = torch.from_numpy(img).unsqueeze(0) 67 | img = ((img/255 - 0.5)/0.5).cuda() 68 | img_var = Variable(img, volatile=True) 69 | if i == len(imgs)//2: 70 | tgt_img_var = img_var 71 | else: 72 | 
ref_imgs_var.append(Variable(img, volatile=True)) 73 | 74 | if args.posenet in ["PoseNet6", "PoseNetB6"]: 75 | poses = pose_net(tgt_img_var, ref_imgs_var) 76 | else: 77 | _, poses = pose_net(tgt_img_var, ref_imgs_var) 78 | 79 | poses = poses.cpu().data[0] 80 | poses = torch.cat([poses[:len(imgs)//2], torch.zeros(1,6).float(), poses[len(imgs)//2:]]) 81 | 82 | inv_transform_matrices = pose_vec2mat(Variable(poses), rotation_mode=args.rotation_mode).data.numpy().astype(np.float64) 83 | 84 | rot_matrices = np.linalg.inv(inv_transform_matrices[:,:,:3]) 85 | tr_vectors = -rot_matrices @ inv_transform_matrices[:,:,-1:] 86 | 87 | transform_matrices = np.concatenate([rot_matrices, tr_vectors], axis=-1) 88 | 89 | first_inv_transform = inv_transform_matrices[0] 90 | final_poses = first_inv_transform[:,:3] @ transform_matrices 91 | final_poses[:,:,-1:] += first_inv_transform[:,-1:] 92 | 93 | if args.output_dir is not None: 94 | predictions_array[j] = final_poses 95 | 96 | RE[j] = compute_pose_error(sample['poses'], final_poses) 97 | 98 | print('') 99 | print("Results") 100 | print("\t {:>10}".format('RE')) 101 | print("mean \t {:10.4f}".format(RE.mean())) 102 | print("std \t {:10.4f}".format(RE.std())) 103 | 104 | if args.output_dir is not None: 105 | np.save(output_dir/'predictions.npy', predictions_array) 106 | 107 | 108 | def compute_pose_error(gt, pred): 109 | RE = 0 110 | snippet_length = gt.shape[0] 111 | for gt_pose, pred_pose in zip(gt, pred): 112 | # Residual matrix to which we compute angle's sin and cos 113 | R = gt_pose[:,:3] @ np.linalg.inv(pred_pose[:,:3]) 114 | s = np.linalg.norm([R[0,1]-R[1,0], 115 | R[1,2]-R[2,1], 116 | R[0,2]-R[2,0]]) 117 | c = np.trace(R) - 1 118 | # Note: we actually compute double of cos and sin, but arctan2 is invariant to scale 119 | RE += np.arctan2(s,c) 120 | 121 | return RE/snippet_length 122 | 123 | 124 | if __name__ == '__main__': 125 | main() 126 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import shutil 3 | import numpy as np 4 | import torch 5 | from matplotlib import cm 6 | from matplotlib.colors import ListedColormap, LinearSegmentedColormap 7 | 8 | def high_res_colormap(low_res_cmap, resolution=1000, max_value=1): 9 | # Construct the list colormap, with interpolated values for higher resolution 10 | # For a linear segmented colormap, you can just specify the number of points in 11 | # cm.get_cmap(name, lutsize) with the parameter lutsize 12 | x = np.linspace(0,1,low_res_cmap.N) 13 | low_res = low_res_cmap(x) 14 | new_x = np.linspace(0,max_value,resolution) 15 | high_res = np.stack([np.interp(new_x, x, low_res[:,i]) for i in range(low_res.shape[1])], axis=1) 16 | return ListedColormap(high_res) 17 | 18 | 19 | def opencv_rainbow(resolution=1000): 20 | # Construct the opencv equivalent of Rainbow 21 | opencv_rainbow_data = ( 22 | (0.000, (1.00, 0.00, 0.00)), 23 | (0.400, (1.00, 1.00, 0.00)), 24 | (0.600, (0.00, 1.00, 0.00)), 25 | (0.800, (0.00, 0.00, 1.00)), 26 | (1.000, (0.60, 0.00, 1.00)) 27 | ) 28 | 29 | return LinearSegmentedColormap.from_list('opencv_rainbow', opencv_rainbow_data, resolution) 30 | 31 | 32 | COLORMAPS = {'rainbow': opencv_rainbow(), 33 | 'magma': high_res_colormap(cm.get_cmap('magma')), 34 | 'bone': cm.get_cmap('bone', 10000)} 35 | 36 | 37 | def tensor2array(tensor, max_value=None, colormap='rainbow'): 38 | tensor = tensor.detach().cpu() 39 | if max_value is None: 40 | 
max_value = tensor.max().item() 41 | if tensor.ndimension() == 2 or tensor.size(0) == 1: 42 | norm_array = tensor.squeeze().numpy()/max_value 43 | array = COLORMAPS[colormap](norm_array).astype(np.float32) 44 | array = array[:,:,:3] 45 | array = array.transpose(2, 0, 1) 46 | 47 | elif tensor.ndimension() == 3: 48 | if (tensor.size(0) == 3): 49 | array = 0.5 + tensor.numpy()*0.5 50 | elif (tensor.size(0) == 2): 51 | array = tensor.numpy() 52 | 53 | return array 54 | 55 | def save_checkpoint(save_path, dispnet_state, posenet_state, masknet_state, flownet_state, optimizer_state, is_best, filename='checkpoint.pth.tar'): 56 | file_prefixes = ['dispnet', 'posenet', 'masknet', 'flownet', 'optimizer'] 57 | states = [dispnet_state, posenet_state, masknet_state, flownet_state, optimizer_state] 58 | for (prefix, state) in zip(file_prefixes, states): 59 | torch.save(state, save_path/'{}_{}'.format(prefix,filename)) 60 | 61 | if is_best: 62 | for prefix in file_prefixes: 63 | shutil.copyfile(save_path/'{}_{}'.format(prefix,filename), save_path/'{}_model_best.pth.tar'.format(prefix)) 64 | --------------------------------------------------------------------------------
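`utils.py` above provides the `tensor2array` colormapping helper and the per-network `save_checkpoint` routine used by the training and test scripts. The following minimal sketch shows how these two helpers can be exercised on their own; the random tensor, the `checkpoints/example` directory, the output filename, and the empty placeholder state dicts are illustrative assumptions, not part of the repository:

```python
import torch
from PIL import Image
from path import Path

from utils import tensor2array, save_checkpoint

# Colorize a 1xHxW map (e.g. a disparity prediction) into a 3xHxW float array in [0, 1].
disp = torch.rand(1, 256, 832)  # illustrative stand-in for a network output
disp_viz = tensor2array(disp, max_value=None, colormap='magma')
Image.fromarray((255 * disp_viz.transpose(1, 2, 0)).astype('uint8')).save('disp_viz.png')

# Save one checkpoint per sub-network; when is_best is True, '*_model_best.pth.tar'
# copies are also written, which is what the test scripts above load.
save_path = Path('checkpoints/example')  # illustrative output directory
save_path.makedirs_p()
save_checkpoint(save_path,
                dispnet_state={'state_dict': {}},   # placeholder state dicts
                posenet_state={'state_dict': {}},
                masknet_state={'state_dict': {}},
                flownet_state={'state_dict': {}},
                optimizer_state={'state_dict': {}},
                is_best=True)
```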