├── .gitignore
├── LICENSE
├── README.md
├── custom_transforms.py
├── data
│   ├── cityscapes_loader.py
│   ├── kitti_raw_loader.py
│   ├── prepare_train_data.py
│   ├── static_frames.txt
│   └── test_scenes.txt
├── datasets
│   ├── __init__.py
│   ├── general_sequence_folders.py
│   ├── sequence_folders.py
│   ├── stacked_sequence_folders.py
│   ├── validation_flow.py
│   └── validation_folders.py
├── evaluate_flow.py
├── flowutils
│   ├── __init__.py
│   ├── flow_io.py
│   ├── flow_viz.py
│   ├── flowlib.py
│   └── pfm.py
├── inverse_warp.py
├── kitti_eval
│   ├── depth_evaluation_utils.py
│   ├── pose_evaluation_utils.py
│   └── test_files_eigen.txt
├── logger.py
├── loss_functions.py
├── mnist.py
├── mnist_eval.py
├── models
│   ├── DispNetS.py
│   ├── DispNetS6.py
│   ├── DispResNet6.py
│   ├── DispResNetS6.py
│   ├── FlowNetC6.py
│   ├── MaskNet6.py
│   ├── MaskResNet6.py
│   ├── PoseExpNet.py
│   ├── PoseNet6.py
│   ├── PoseNetB6.py
│   ├── __init__.py
│   ├── back2future.py
│   ├── submodules.py
│   └── utils.py
├── requirements.txt
├── run_inference.py
├── sintel_eval
│   ├── pose_evaluation_utils.py
│   └── sintel_io.py
├── ssim.py
├── stillbox_eval
│   ├── depth_evaluation_utils.py
│   └── test_files_90.txt
├── submit_flow.py
├── test_back2future.py
├── test_disp.py
├── test_flow.py
├── test_flownetc.py
├── test_make3d.py
├── test_mask.py
├── test_pose.py
├── test_sintel_pose.py
├── train.py
└── utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.pth
3 | *.tar
4 | *.sub
5 | *.npy
6 | *.jpg
7 | *.png
8 | *.zip
9 | main.sh
10 | checkpoints*
11 | visualize/*
12 | !visualize/*.py
13 | log/*
14 | config/*
15 | models/spynet_models/*
16 | dockers/*
17 | test_script.sh
18 | kitti_data/*
19 | datasets/mnist/
20 | datasets/svhn/
21 | results/*
22 | pretrained/*
23 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Anurag Ranjan
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Competitive Collaboration
2 | This is an official repository of
3 | **Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation**. The project was formerly referred to as **Adversarial Collaboration**.
4 |
5 | ### News
6 | - **16 August '19:** `skimage` dependencies have been removed in favour of `PIL`; that version lives in the [`pil` branch](https://github.com/anuragranj/cc/tree/pil). If you discover bugs, please file an issue or send a pull request. The branch will eventually be merged into `master` if users are satisfied.
7 | - **11 March '19:** We recently ported the entire code to `pytorch-1.0`, so if you discover bugs, please file an issue.
8 |
9 | [[Project Page]](http://research.nvidia.com/publication/2018-05_Adversarial-Collaboration-Joint)
10 | [[Arxiv]](https://arxiv.org/abs/1805.09806)
11 |
12 | **Skip to:**
13 | - [Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation](#jointcc)
14 | - [Mixed Domain Learning using MNIST+SVHN](#mnist)
15 | - [Download Pretrained Models and Evaluation Data](#downloads)
16 |
17 | ### Prerequisites
18 | Python 3 and PyTorch are required. Third-party libraries can be installed (in a `python3` virtualenv) using:
19 |
20 | ```bash
21 | pip3 install -r requirements.txt
22 | ```
23 |
24 | ## Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
25 |
26 | ### Preparing training data
27 |
28 | #### KITTI
29 | For [KITTI](http://www.cvlibs.net/datasets/kitti/raw_data.php), first download the dataset using this [script](http://www.cvlibs.net/download.php?file=raw_data_downloader.zip) provided on the official website, and then run the following command.
30 |
31 | ```bash
32 | python3 data/prepare_train_data.py /path/to/raw/kitti/dataset/ --dataset-format 'kitti' --dump-root /path/to/resulting/formatted/data/ --width 832 --height 256 --num-threads 1 --static-frames data/static_frames.txt --with-gt
33 | ```
34 |
35 | For evaluating optical flow against ground truth on KITTI, download the [KITTI2015](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=flow) dataset. You need to download 1) the `stereo 2015/flow 2015/scene flow 2015` data set (2 GB), 2) the `multi-view extension` (14 GB), and 3) the `calibration files` (1 MB). In addition, download the semantic labels from [here](https://keeper.mpdl.mpg.de/f/239c2dda94e54c449401/?dl=1). You should end up with the following directory structure:
36 | ```
37 | kitti2015
38 | | data_scene_flow
39 | | data_scene_flow_calib
40 | | data_scene_flow_multiview
41 | | semantic_labels
42 | ```
43 |
44 | #### Cityscapes
45 |
46 | For [Cityscapes](https://www.cityscapes-dataset.com/), download the following packages: 1) `leftImg8bit_sequence_trainvaltest.zip`, 2) `camera_trainvaltest.zip`. You will probably need to contact the administrators to get access to them.
47 |
48 | ```bash
49 | python3 data/prepare_train_data.py /path/to/cityscapes/dataset/ --dataset-format 'cityscapes' --dump-root /path/to/resulting/formatted/data/ --width 832 --height 342 --num-threads 1
50 | ```
51 |
52 | Note that for Cityscapes `--height` is set to 342 because the bottom part of the image (which contains the car logo) is cropped out, leaving a final image height of 256.
53 |
54 | ### Training an experiment
55 |
56 | Once the data are formatted following the above instructions, you should be able to run a training experiment. Every experiment you run gets logged in `experiment_recorder.md`.
57 |
58 | ```bash
59 | python3 train.py /path/to/formatted/data --dispnet DispResNet6 --posenet PoseNetB6 \
60 | --masknet MaskNet6 --flownet Back2Future --pretrained-disp /path/to/pretrained/dispnet \
61 | --pretrained-pose /path/to/pretrained/posenet --pretrained-flow /path/to/pretrained/flownet \
62 | --pretrained-mask /path/to/pretrained/masknet -b4 -m0.1 -pf 0.5 -pc 1.0 -s0.1 -c0.3 \
63 | --epoch-size 1000 --log-output -f 0 --nlevels 6 --lr 1e-4 -wssim 0.997 --with-flow-gt \
64 | --with-depth-gt --epochs 100 --smoothness-type edgeaware --fix-masknet --fix-flownet \
65 | --log-terminal --name EXPERIMENT_NAME
66 | ```
67 |
68 |
69 | You can then start a `tensorboard` session in this folder by
70 | ```bash
71 | tensorboard --logdir=checkpoints/
72 | ```
73 | and visualize the training progress by opening [http://localhost:6006](http://localhost:6006) in your browser.
74 |
75 | ### Evaluation
76 |
77 | Disparity evaluation
78 | ```bash
79 | python3 test_disp.py --dispnet DispResNet6 --pretrained-dispnet /path/to/dispnet --pretrained-posenet /path/to/posenet --dataset-dir /path/to/KITTI_raw --dataset-list /path/to/test_files_list
80 | ```
81 |
82 | The test file list is available in the `kitti_eval` folder. For a fair comparison with the [original paper's evaluation code](https://github.com/tinghuiz/SfMLearner/blob/master/kitti_eval/eval_depth.py), do not specify a posenet. If you do specify one, it will be used to resolve the scale-factor ambiguity; the only ground truth then used is the vehicle speed, which is far more realistic for real-world quality measurement, but you will obviously get worse numbers.
83 |
84 | For pose evaluation, you need to download [KITTI Odometry](http://www.cvlibs.net/datasets/kitti/eval_odometry.php) dataset.
85 | ```bash
86 | python test_pose.py pretrained/pose_model_best.pth.tar --img-width 832 --img-height 256 --dataset-dir /path/to/kitti/odometry/ --sequences 09 --posenet PoseNetB6
87 | ```
88 |
89 | Optical Flow evaluation
90 | ```bash
91 | python test_flow.py --pretrained-disp /path/to/dispnet --pretrained-pose /path/to/posenet --pretrained-mask /path/to/masknet --pretrained-flow /path/to/flownet --kitti-dir /path/to/kitti2015/dataset
92 | ```
93 |
94 | Mask evaluation
95 | ```bash
96 | python test_mask.py --pretrained-disp /path/to/dispnet --pretrained-pose /path/to/posenet --pretrained-mask /path/to/masknet --pretrained-flow /path/to/flownet --kitti-dir /path/to/kitti2015/dataset
97 | ```
98 |
99 |
100 | ## Mixed Domain Learning using MNIST+SVHN
101 |
102 | #### Training
103 | To train a classifier using Competitive Collaboration with two agents, Alice and Bob, run:
104 | ```bash
105 | python3 mnist.py path/to/download/mnist/svhn/datasets/ --name EXP_NAME --log-output --log-terminal --epoch-size 1000 --epochs 400 --wr 1000
106 | ```
107 |
108 | #### Evaluation
109 | To evaluate the performance of Alice, Bob and the Moderator trained using CC, run:
110 | ```bash
111 | python3 mnist_eval.py path/to/mnist/svhn/datasets --pretrained-alice pretrained/mnist_svhn/alice.pth.tar --pretrained-bob pretrained/mnist_svhn/bob.pth.tar --pretrained-mod pretrained/mnist_svhn/mod.pth.tar
112 | ```
113 |
114 |
115 | ## Downloads
116 | #### Pretrained Models
117 | - [DispNet, PoseNet, MaskNet and FlowNet](https://keeper.mpdl.mpg.de/f/72e946daa4e0481fb735/?dl=1) in joint unsupervised learning of depth, camera motion, optical flow and motion segmentation.
118 | - [Alice, Bob and Moderator](https://keeper.mpdl.mpg.de/f/d0c7d4ebd0d74b84bf10/?dl=1) in Mixed Domain Classification
119 |
120 | #### Evaluation Data
121 | - [Semantic Labels for KITTI](https://keeper.mpdl.mpg.de/f/239c2dda94e54c449401/?dl=1)
122 |
123 | ## Acknowledgements
124 | We thank Frederik Kunstner for verifying the convergence proofs. We are grateful to Clement Pinard for his [github repository](https://github.com/ClementPinard/SfmLearner-Pytorch), which we used as our initial code base. We thank Georgios Pavlakos for helping us with several revisions of the paper. We thank Joel Janai for preparing optical flow visualizations, and Clement Godard for his Make3d evaluation code.
125 |
126 |
127 | ## References
128 | *Anurag Ranjan, Varun Jampani, Lukas Balles, Deqing Sun, Kihwan Kim, Jonas Wulff and Michael J. Black.* **Competitive Collaboration: Joint unsupervised learning of depth, camera motion, optical flow and motion segmentation.** CVPR 2019.
129 |
--------------------------------------------------------------------------------
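For reference, here is a sketch of the layout that `data/prepare_train_data.py` produces and that the loaders in `datasets/` expect. Scene folder and frame names below are illustrative; `.npy` depth maps are only kept in validation scenes when `--with-gt` is used:

```
/path/to/resulting/formatted/data/
├── train.txt                       # names of the scene folders used for training
├── val.txt                         # names of the scene folders used for validation
├── 2011_09_26_drive_0001_sync_02/  # one folder per scene and camera
│   ├── cam.txt                     # comma-separated 3x3 intrinsics
│   ├── 0000000000.jpg
│   ├── 0000000000.npy              # depth map (validation scenes only)
│   ├── 0000000001.jpg
│   └── ...
└── 2011_09_26_drive_0001_sync_03/
    └── ...
```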
/custom_transforms.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 | import torch
3 | import random
4 | import numpy as np
5 | from scipy.misc import imresize, imrotate
6 |
7 | '''Set of random transform routines that take a list of images and an intrinsics
8 | matrix as arguments, in order to apply random but coherent transformations.'''
9 |
10 |
11 | class Compose(object):
12 | def __init__(self, transforms):
13 | self.transforms = transforms
14 |
15 | def __call__(self, images, intrinsics):
16 | for t in self.transforms:
17 | images, intrinsics = t(images, intrinsics)
18 | return images, intrinsics
19 |
20 |
21 | class Normalize(object):
22 | def __init__(self, mean, std):
23 | self.mean = mean
24 | self.std = std
25 |
26 | def __call__(self, images, intrinsics):
27 | for tensor in images:
28 | for t, m, s in zip(tensor, self.mean, self.std):
29 | t.sub_(m).div_(s)
30 | return images, intrinsics
31 |
32 |
33 | class NormalizeLocally(object):
34 |
35 | def __call__(self, images, intrinsics):
36 | image_tensor = torch.stack(images)
37 | assert(image_tensor.size(1)==3) #3 channel image
38 | mean = image_tensor.transpose(0,1).contiguous().view(3, -1).mean(1)
39 | std = image_tensor.transpose(0,1).contiguous().view(3, -1).std(1)
40 |
41 | for tensor in images:
42 | for t, m, s in zip(tensor, mean, std):
43 | t.sub_(m).div_(s)
44 | return images, intrinsics
45 |
46 |
47 | class ArrayToTensor(object):
48 |     """Converts a list of numpy.ndarray images (H x W x C) to a list of torch.FloatTensor of shape (C x H x W) scaled to [0, 1]; the intrinsics matrix is passed through unchanged."""
49 |
50 | def __call__(self, images, intrinsics):
51 | tensors = []
52 | for im in images:
53 | # put it from HWC to CHW format
54 | im = np.transpose(im, (2, 0, 1))
55 | # handle numpy array
56 | tensors.append(torch.from_numpy(im).float()/255)
57 | return tensors, intrinsics
58 |
59 |
60 | class RandomHorizontalFlip(object):
61 | """Randomly horizontally flips the given numpy array with a probability of 0.5"""
62 |
63 | def __call__(self, images, intrinsics):
64 | assert intrinsics is not None
65 | if random.random() < 0.5:
66 | output_intrinsics = np.copy(intrinsics)
67 | output_images = [np.copy(np.fliplr(im)) for im in images]
68 | w = output_images[0].shape[1]
69 | output_intrinsics[0,2] = w - output_intrinsics[0,2]
70 | else:
71 | output_images = images
72 | output_intrinsics = intrinsics
73 | return output_images, output_intrinsics
74 |
75 | class RandomRotate(object):
76 |     """With a probability of 0.5, rotates all images by a common random angle of up to 10 degrees; intrinsics are returned unchanged."""
77 | def __call__(self, images, intrinsics):
78 | if np.random.random() > 0.5:
79 | return images, intrinsics
80 | else:
81 | assert intrinsics is not None
82 | rot = np.random.uniform(0,10)
83 | rotated_images = [imrotate(im, rot) for im in images]
84 |
85 | return rotated_images, intrinsics
86 |
87 |
88 |
89 |
90 | class RandomScaleCrop(object):
91 |     """Randomly zooms images up to 10% and crops them to keep the same size as before."""
92 | def __init__(self, h=0, w=0):
93 | self.h = h
94 | self.w = w
95 |
96 | def __call__(self, images, intrinsics):
97 | assert intrinsics is not None
98 | output_intrinsics = np.copy(intrinsics)
99 |
100 | in_h, in_w, _ = images[0].shape
101 | x_scaling, y_scaling = np.random.uniform(1,1.1,2)
102 | scaled_h, scaled_w = int(in_h * y_scaling), int(in_w * x_scaling)
103 |
104 | output_intrinsics[0] *= x_scaling
105 | output_intrinsics[1] *= y_scaling
106 | scaled_images = [imresize(im, (scaled_h, scaled_w)) for im in images]
107 |
108 | if self.h and self.w:
109 | in_h, in_w = self.h, self.w
110 |
111 | offset_y = np.random.randint(scaled_h - in_h + 1)
112 | offset_x = np.random.randint(scaled_w - in_w + 1)
113 | cropped_images = [im[offset_y:offset_y + in_h, offset_x:offset_x + in_w] for im in scaled_images]
114 |
115 | output_intrinsics[0,2] -= offset_x
116 | output_intrinsics[1,2] -= offset_y
117 |
118 | return cropped_images, output_intrinsics
119 |
120 | class Scale(object):
121 | """Scales images to a particular size"""
122 | def __init__(self, h, w):
123 | self.h = h
124 | self.w = w
125 |
126 | def __call__(self, images, intrinsics):
127 | assert intrinsics is not None
128 | output_intrinsics = np.copy(intrinsics)
129 |
130 | in_h, in_w, _ = images[0].shape
131 | scaled_h, scaled_w = self.h , self.w
132 |
133 | output_intrinsics[0] *= (scaled_w / in_w)
134 | output_intrinsics[1] *= (scaled_h / in_h)
135 | scaled_images = [imresize(im, (scaled_h, scaled_w)) for im in images]
136 |
137 | return scaled_images, output_intrinsics
138 |
--------------------------------------------------------------------------------
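A minimal usage sketch for the transforms above, showing how they are meant to be composed on a list of images plus an intrinsics matrix. The image sizes, intrinsics and normalization statistics below are placeholders, not values taken from the training scripts:

```python
import numpy as np
import custom_transforms

# Three dummy H x W x 3 float images and a placeholder 3x3 intrinsics matrix.
images = [np.random.uniform(0, 255, (256, 832, 3)).astype(np.float32) for _ in range(3)]
intrinsics = np.array([[241.7, 0., 416.0],
                       [0., 246.3, 128.0],
                       [0., 0., 1.]], dtype=np.float32)

transform = custom_transforms.Compose([
    custom_transforms.RandomHorizontalFlip(),  # flips images and mirrors the principal point
    custom_transforms.RandomScaleCrop(),       # zooms up to 10%, crops back, adjusts intrinsics
    custom_transforms.ArrayToTensor(),         # HWC numpy -> CHW torch tensor in [0, 1]
    custom_transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # placeholder stats
])

tensors, out_intrinsics = transform(images, intrinsics)
print(tensors[0].shape, out_intrinsics.shape)  # torch.Size([3, 256, 832]) (3, 3)
```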
/data/cityscapes_loader.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 | import json
3 | import numpy as np
4 | import scipy.misc
5 | from path import Path
6 | from tqdm import tqdm
7 |
8 |
9 | class cityscapes_loader(object):
10 | def __init__(self,
11 | dataset_dir,
12 | split='train',
13 | crop_bottom=True, # Get rid of the car logo
14 | img_height=171,
15 | img_width=416):
16 | self.dataset_dir = Path(dataset_dir)
17 | self.split = split
18 | # Crop out the bottom 25% of the image to remove the car logo
19 | self.crop_bottom = crop_bottom
20 | self.img_height = img_height
21 | self.img_width = img_width
22 | self.min_speed = 2
23 | self.scenes = (self.dataset_dir/'leftImg8bit_sequence'/split).dirs()
24 | print('Total scenes collected: {}'.format(len(self.scenes)))
25 |
26 | def collect_scenes(self, city):
27 | img_files = sorted(city.files('*.png'))
28 | scenes = {}
29 | connex_scenes = {}
30 | connex_scene_data_list = []
31 | for f in img_files:
32 | scene_id,frame_id = f.basename().split('_')[1:3]
33 | if scene_id not in scenes.keys():
34 | scenes[scene_id] = []
35 | scenes[scene_id].append(frame_id)
36 |
37 |         # divide scenes into connected sequences
38 | for scene_id in scenes.keys():
39 | previous = None
40 | connex_scenes[scene_id] = []
41 | for id in scenes[scene_id]:
42 | if previous is None or int(id) - int(previous) > 1:
43 | current_list = []
44 | connex_scenes[scene_id].append(current_list)
45 | current_list.append(id)
46 | previous = id
47 |
48 | # create scene data dicts, and subsample scene every two frames
49 | for scene_id in connex_scenes.keys():
50 | intrinsics = self.load_intrinsics(city, scene_id)
51 | for subscene in connex_scenes[scene_id]:
52 | frame_speeds = [self.load_speed(city, scene_id, frame_id) for frame_id in subscene]
53 | connex_scene_data_list.append({'city':city,
54 | 'scene_id': scene_id,
55 | 'rel_path': city.basename()+'_'+scene_id+'_'+subscene[0]+'_0',
56 | 'intrinsics': intrinsics,
57 | 'frame_ids':subscene[0::2],
58 | 'speeds':frame_speeds[0::2]})
59 | connex_scene_data_list.append({'city':city,
60 | 'scene_id': scene_id,
61 | 'rel_path': city.basename()+'_'+scene_id+'_'+subscene[0]+'_1',
62 | 'intrinsics': intrinsics,
63 | 'frame_ids': subscene[1::2],
64 | 'speeds': frame_speeds[1::2]})
65 | return connex_scene_data_list
66 |
67 | def load_intrinsics(self, city, scene_id):
68 | city_name = city.basename()
69 | camera_folder = self.dataset_dir/'camera'/self.split/city_name
70 | camera_file = camera_folder.files('{}_{}_*_camera.json'.format(city_name, scene_id))[0]
71 | frame_id = camera_file.split('_')[2]
72 | frame_path = city/'{}_{}_{}_leftImg8bit.png'.format(city_name, scene_id, frame_id)
73 |
74 | with open(camera_file, 'r') as f:
75 | camera = json.load(f)
76 | fx = camera['intrinsic']['fx']
77 | fy = camera['intrinsic']['fy']
78 | u0 = camera['intrinsic']['u0']
79 | v0 = camera['intrinsic']['v0']
80 | intrinsics = np.array([[fx, 0, u0],
81 | [0, fy, v0],
82 | [0, 0, 1]])
83 |
84 | img = scipy.misc.imread(frame_path)
85 | h,w,_ = img.shape
86 | zoom_y = self.img_height/h
87 | zoom_x = self.img_width/w
88 |
89 | intrinsics[0] *= zoom_x
90 | intrinsics[1] *= zoom_y
91 | return intrinsics
92 |
93 | def load_speed(self, city, scene_id, frame_id):
94 | city_name = city.basename()
95 | vehicle_folder = self.dataset_dir/'vehicle_sequence'/self.split/city_name
96 | vehicle_file = vehicle_folder/'{}_{}_{}_vehicle.json'.format(city_name, scene_id, frame_id)
97 | with open(vehicle_file, 'r') as f:
98 | vehicle = json.load(f)
99 | return vehicle['speed']
100 |
101 | def get_scene_imgs(self, scene_data):
102 | cum_speed = np.zeros(3)
103 | print(scene_data['city'].basename(), scene_data['scene_id'], scene_data['frame_ids'][0])
104 | for i,frame_id in enumerate(scene_data['frame_ids']):
105 | cum_speed += scene_data['speeds'][i]
106 | speed_mag = np.linalg.norm(cum_speed)
107 | if speed_mag > self.min_speed:
108 | yield self.load_image(scene_data['city'], scene_data['scene_id'], frame_id), frame_id
109 | cum_speed *= 0
110 |
111 | def load_image(self, city, scene_id, frame_id):
112 | img_file = city/'{}_{}_{}_leftImg8bit.png'.format(city.basename(),
113 | scene_id,
114 | frame_id)
115 | if not img_file.isfile():
116 | return None
117 | img = scipy.misc.imread(img_file)
118 | img = scipy.misc.imresize(img, (self.img_height, self.img_width))[:int(self.img_height*0.75)]
119 | return img
120 |
--------------------------------------------------------------------------------
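`load_intrinsics` above rescales the camera matrix to the loader's target resolution: focal lengths and principal point are multiplied by the same zoom factors applied to the pixels. A standalone sketch of that rule (the camera values are illustrative, not read from a real `camera.json`):

```python
import numpy as np

# Illustrative full-resolution Cityscapes-style intrinsics.
fx, fy, u0, v0 = 2262.52, 2265.30, 1096.98, 513.14
K = np.array([[fx, 0., u0],
              [0., fy, v0],
              [0., 0., 1.]])

h, w = 1024, 2048                   # original image size
img_height, img_width = 171, 416    # loader's target size

zoom_y = img_height / h
zoom_x = img_width / w

K[0] *= zoom_x                      # fx and u0 follow the horizontal zoom
K[1] *= zoom_y                      # fy and v0 follow the vertical zoom
print(K)
```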
/data/kitti_raw_loader.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from path import Path
3 | import scipy.misc
4 | from collections import Counter
5 |
6 |
7 | class KittiRawLoader(object):
8 | def __init__(self,
9 | dataset_dir,
10 | static_frames_file=None,
11 | img_height=128,
12 | img_width=416,
13 | min_speed=2,
14 | get_gt=False):
15 | dir_path = Path(__file__).realpath().dirname()
16 | test_scene_file = dir_path/'test_scenes.txt'
17 |
18 | self.from_speed = static_frames_file is None
19 | if static_frames_file is not None:
20 | static_frames_file = Path(static_frames_file)
21 | self.collect_static_frames(static_frames_file)
22 |
23 | with open(test_scene_file, 'r') as f:
24 | test_scenes = f.readlines()
25 | self.test_scenes = [t[:-1] for t in test_scenes]
26 | self.dataset_dir = Path(dataset_dir)
27 | self.img_height = img_height
28 | self.img_width = img_width
29 | self.cam_ids = ['02', '03']
30 | self.date_list = ['2011_09_26', '2011_09_28', '2011_09_29', '2011_09_30', '2011_10_03']
31 | self.min_speed = min_speed
32 | self.get_gt = get_gt
33 | self.collect_train_folders()
34 |
35 | def collect_static_frames(self, static_frames_file):
36 | with open(static_frames_file, 'r') as f:
37 | frames = f.readlines()
38 | self.static_frames = {}
39 | for fr in frames:
40 | if fr == '\n':
41 | continue
42 | date, drive, frame_id = fr.split(' ')
43 | curr_fid = '%.10d' % (np.int(frame_id[:-1]))
44 | if drive not in self.static_frames.keys():
45 | self.static_frames[drive] = []
46 | self.static_frames[drive].append(curr_fid)
47 |
48 | def collect_train_folders(self):
49 | self.scenes = []
50 | for date in self.date_list:
51 | drive_set = (self.dataset_dir/date).dirs()
52 | for dr in drive_set:
53 | if dr.name[:-5] not in self.test_scenes:
54 | self.scenes.append(dr)
55 |
56 | def collect_scenes(self, drive):
57 | train_scenes = []
58 | for c in self.cam_ids:
59 | oxts = sorted((drive/'oxts'/'data').files('*.txt'))
60 | scene_data = {'cid': c, 'dir': drive, 'speed': [], 'frame_id': [], 'rel_path': drive.name + '_' + c}
61 | for n, f in enumerate(oxts):
62 | metadata = np.genfromtxt(f)
63 | speed = metadata[8:11]
64 | scene_data['speed'].append(speed)
65 | scene_data['frame_id'].append('{:010d}'.format(n))
66 | sample = self.load_image(scene_data, 0)
67 | if sample is None:
68 | return []
69 | scene_data['P_rect'] = self.get_P_rect(scene_data, sample[1], sample[2])
70 | scene_data['intrinsics'] = scene_data['P_rect'][:,:3]
71 |
72 | train_scenes.append(scene_data)
73 | return train_scenes
74 |
75 | def get_scene_imgs(self, scene_data):
76 | def construct_sample(scene_data, i, frame_id):
77 | sample = [self.load_image(scene_data, i)[0], frame_id]
78 | if self.get_gt:
79 | sample.append(self.generate_depth_map(scene_data, i))
80 | return sample
81 |
82 | if self.from_speed:
83 | cum_speed = np.zeros(3)
84 | for i, speed in enumerate(scene_data['speed']):
85 | cum_speed += speed
86 | speed_mag = np.linalg.norm(cum_speed)
87 | if speed_mag > self.min_speed:
88 | frame_id = scene_data['frame_id'][i]
89 | yield construct_sample(scene_data, i, frame_id)
90 | cum_speed *= 0
91 | else: # from static frame file
92 | drive = str(scene_data['dir'].name)
93 | for (i,frame_id) in enumerate(scene_data['frame_id']):
94 | if (drive not in self.static_frames.keys()) or (frame_id not in self.static_frames[drive]):
95 | yield construct_sample(scene_data, i, frame_id)
96 |
97 | def get_P_rect(self, scene_data, zoom_x, zoom_y):
98 | #print(zoom_x, zoom_y)
99 | calib_file = scene_data['dir'].parent/'calib_cam_to_cam.txt'
100 |
101 | filedata = self.read_raw_calib_file(calib_file)
102 | P_rect = np.reshape(filedata['P_rect_' + scene_data['cid']], (3, 4))
103 | P_rect[0] *= zoom_x
104 | P_rect[1] *= zoom_y
105 | return P_rect
106 |
107 | def load_image(self, scene_data, tgt_idx):
108 | img_file = scene_data['dir']/'image_{}'.format(scene_data['cid'])/'data'/scene_data['frame_id'][tgt_idx]+'.png'
109 | if not img_file.isfile():
110 | return None
111 | img = scipy.misc.imread(img_file)
112 | zoom_y = self.img_height/img.shape[0]
113 | zoom_x = self.img_width/img.shape[1]
114 | img = scipy.misc.imresize(img, (self.img_height, self.img_width))
115 | return img, zoom_x, zoom_y
116 |
117 | def read_raw_calib_file(self, filepath):
118 | # From https://github.com/utiasSTARS/pykitti/blob/master/pykitti/utils.py
119 | """Read in a calibration file and parse into a dictionary."""
120 | data = {}
121 |
122 | with open(filepath, 'r') as f:
123 | for line in f.readlines():
124 | key, value = line.split(':', 1)
125 | # The only non-float values in these files are dates, which
126 | # we don't care about anyway
127 | try:
128 | data[key] = np.array([float(x) for x in value.split()])
129 | except ValueError:
130 | pass
131 | return data
132 |
133 | def generate_depth_map(self, scene_data, tgt_idx):
134 | # compute projection matrix velodyne->image plane
135 |
136 | def sub2ind(matrixSize, rowSub, colSub):
137 | m, n = matrixSize
138 | return rowSub * (n-1) + colSub - 1
139 |
140 | R_cam2rect = np.eye(4)
141 |
142 | calib_dir = scene_data['dir'].parent
143 | cam2cam = self.read_raw_calib_file(calib_dir/'calib_cam_to_cam.txt')
144 | velo2cam = self.read_raw_calib_file(calib_dir/'calib_velo_to_cam.txt')
145 | velo2cam = np.hstack((velo2cam['R'].reshape(3,3), velo2cam['T'][..., np.newaxis]))
146 | velo2cam = np.vstack((velo2cam, np.array([0, 0, 0, 1.0])))
147 | P_rect = scene_data['P_rect']
148 | R_cam2rect[:3,:3] = cam2cam['R_rect_00'].reshape(3,3)
149 |
150 | P_velo2im = np.dot(np.dot(P_rect, R_cam2rect), velo2cam)
151 |
152 | velo_file_name = scene_data['dir']/'velodyne_points'/'data'/'{}.bin'.format(scene_data['frame_id'][tgt_idx])
153 |
154 | # load velodyne points and remove all behind image plane (approximation)
155 | # each row of the velodyne data is forward, left, up, reflectance
156 | velo = np.fromfile(velo_file_name, dtype=np.float32).reshape(-1, 4)
157 | velo[:,3] = 1
158 | velo = velo[velo[:, 0] >= 0, :]
159 |
160 | # project the points to the camera
161 | velo_pts_im = np.dot(P_velo2im, velo.T).T
162 | velo_pts_im[:, :2] = velo_pts_im[:,:2] / velo_pts_im[:,-1:]
163 |
164 | # check if in bounds
165 | # use minus 1 to get the exact same value as KITTI matlab code
166 | velo_pts_im[:, 0] = np.round(velo_pts_im[:,0]) - 1
167 | velo_pts_im[:, 1] = np.round(velo_pts_im[:,1]) - 1
168 |
169 | val_inds = (velo_pts_im[:, 0] >= 0) & (velo_pts_im[:, 1] >= 0)
170 | val_inds = val_inds & (velo_pts_im[:,0] < self.img_width) & (velo_pts_im[:,1] < self.img_height)
171 | velo_pts_im = velo_pts_im[val_inds, :]
172 |
173 | # project to image
174 | depth = np.zeros((self.img_height, self.img_width)).astype(np.float32)
175 | depth[velo_pts_im[:, 1].astype(np.int), velo_pts_im[:, 0].astype(np.int)] = velo_pts_im[:, 2]
176 |
177 | # find the duplicate points and choose the closest depth
178 | inds = sub2ind(depth.shape, velo_pts_im[:, 1], velo_pts_im[:, 0])
179 | dupe_inds = [item for item, count in Counter(inds).items() if count > 1]
180 | for dd in dupe_inds:
181 | pts = np.where(inds == dd)[0]
182 | x_loc = int(velo_pts_im[pts[0], 0])
183 | y_loc = int(velo_pts_im[pts[0], 1])
184 | depth[y_loc, x_loc] = velo_pts_im[pts, 2].min()
185 | depth[depth < 0] = 0
186 | return depth
187 |
--------------------------------------------------------------------------------
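A compact sketch of the projection chain used by `generate_depth_map` above: homogeneous velodyne points are mapped through `P_velo2im = P_rect @ R_cam2rect @ velo2cam` and then divided by depth to obtain pixel coordinates. The calibration matrices below are simplified placeholders, not real KITTI values:

```python
import numpy as np

P_rect = np.array([[721.5, 0., 609.6, 44.9],   # placeholder 3x4 rectified projection matrix
                   [0., 721.5, 172.9, 0.2],
                   [0., 0., 1., 0.003]])
R_cam2rect = np.eye(4)                         # rectifying rotation (placeholder: identity)
velo2cam = np.array([[0., -1., 0., 0.],        # nominal velodyne->camera axes swap:
                     [0., 0., -1., 0.],        #   x_cam = -y_velo, y_cam = -z_velo, z_cam = x_velo
                     [1., 0., 0., 0.],
                     [0., 0., 0., 1.]])

P_velo2im = P_rect @ R_cam2rect @ velo2cam     # overall 3x4 projection

velo = np.array([[5.0, 1.0, -0.5, 1.0],        # homogeneous lidar points (forward, left, up, 1)
                 [10.0, -2.0, 0.3, 1.0]])
pts = (P_velo2im @ velo.T).T
pixels = pts[:, :2] / pts[:, 2:3]              # (u, v) image coordinates
depths = pts[:, 2]                             # depth along the camera's optical axis
print(pixels, depths)
```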
/data/prepare_train_data.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 | import argparse
3 | import scipy.misc
4 | import numpy as np
5 | from joblib import Parallel, delayed
6 | from tqdm import tqdm
7 | from path import Path
8 |
9 | parser = argparse.ArgumentParser()
10 | parser.add_argument("dataset_dir", metavar='DIR',
11 | help='path to original dataset')
12 | parser.add_argument("--dataset-format", type=str, required=True, choices=["kitti", "cityscapes"])
13 | parser.add_argument("--static-frames", default=None,
14 | help="list of imgs to discard for being static, if not set will discard them based on speed \
15 | (careful, on KITTI some frames have incorrect speed)")
16 | parser.add_argument("--with-gt", action='store_true',
17 | help="If available (e.g. with KITTI), will store ground truth along with images, for validation")
18 | parser.add_argument("--dump-root", type=str, required=True, help="Where to dump the data")
19 | parser.add_argument("--height", type=int, default=128, help="image height")
20 | parser.add_argument("--width", type=int, default=416, help="image width")
21 | parser.add_argument("--num-threads", type=int, default=4, help="number of threads to use")
22 |
23 | args = parser.parse_args()
24 |
25 |
26 | def dump_example(scene):
27 | scene_list = data_loader.collect_scenes(scene)
28 | for scene_data in scene_list:
29 | dump_dir = args.dump_root/scene_data['rel_path']
30 | dump_dir.makedirs_p()
31 | intrinsics = scene_data['intrinsics']
32 | fx = intrinsics[0, 0]
33 | fy = intrinsics[1, 1]
34 | cx = intrinsics[0, 2]
35 | cy = intrinsics[1, 2]
36 |
37 | dump_cam_file = dump_dir/'cam.txt'
38 | with open(dump_cam_file, 'w') as f:
39 | f.write('%f,0.,%f,0.,%f,%f,0.,0.,1.' % (fx, cx, fy, cy))
40 |
41 | for sample in data_loader.get_scene_imgs(scene_data):
42 | assert(len(sample) >= 2)
43 | img, frame_nb = sample[0], sample[1]
44 | dump_img_file = dump_dir/'{}.jpg'.format(frame_nb)
45 | scipy.misc.imsave(dump_img_file, img)
46 | if len(sample) == 3:
47 | dump_depth_file = dump_dir/'{}.npy'.format(frame_nb)
48 | np.save(dump_depth_file, sample[2])
49 |
50 | if len(dump_dir.files('*.jpg')) < 3:
51 | dump_dir.rmtree()
52 |
53 |
54 | def main():
55 | args.dump_root = Path(args.dump_root)
56 | args.dump_root.mkdir_p()
57 |
58 | global data_loader
59 |
60 | if args.dataset_format == 'kitti':
61 | from kitti_raw_loader import KittiRawLoader
62 | data_loader = KittiRawLoader(args.dataset_dir,
63 | static_frames_file=args.static_frames,
64 | img_height=args.height,
65 | img_width=args.width,
66 | get_gt=args.with_gt)
67 |
68 | if args.dataset_format == 'cityscapes':
69 | from cityscapes_loader import cityscapes_loader
70 | data_loader = cityscapes_loader(args.dataset_dir,
71 | img_height=args.height,
72 | img_width=args.width)
73 |
74 | print('Retrieving frames')
75 | Parallel(n_jobs=args.num_threads)(delayed(dump_example)(scene) for scene in tqdm(data_loader.scenes))
76 | # Split into train/val
77 | print('Generating train val lists')
78 | np.random.seed(8964)
79 | subfolders = args.dump_root.dirs()
80 | with open(args.dump_root / 'train.txt', 'w') as tf:
81 | with open(args.dump_root / 'val.txt', 'w') as vf:
82 | for s in tqdm(subfolders):
83 | if np.random.random() < 0.1:
84 | vf.write('{}\n'.format(s.name))
85 | else:
86 | tf.write('{}\n'.format(s.name))
87 |                 # remove groundtruth data not needed for training; comment this out if you don't want to erase it
88 | for gt_file in s.files('*.npy'):
89 | gt_file.remove_p()
90 |
91 |
92 | if __name__ == '__main__':
93 | main()
94 |
--------------------------------------------------------------------------------
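`dump_example` above writes each scene's intrinsics to `cam.txt` as a single comma-separated, row-major 3x3 matrix, and the loaders in `datasets/sequence_folders.py` read it back with `np.genfromtxt`. A small round-trip sketch with placeholder intrinsics:

```python
import numpy as np

fx, fy, cx, cy = 241.7, 246.3, 416.0, 128.0   # placeholder values

# What dump_example writes:
with open('cam.txt', 'w') as f:
    f.write('%f,0.,%f,0.,%f,%f,0.,0.,1.' % (fx, cx, fy, cy))

# What crawl_folders in datasets/sequence_folders.py reads back:
intrinsics = np.genfromtxt('cam.txt', delimiter=',').astype(np.float32).reshape((3, 3))
print(intrinsics)
# [[241.7   0.  416. ]
#  [  0.  246.3 128. ]
#  [  0.    0.    1. ]]
```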
/data/test_scenes.txt:
--------------------------------------------------------------------------------
1 | 2011_09_26_drive_0117
2 | 2011_09_28_drive_0002
3 | 2011_09_26_drive_0052
4 | 2011_09_30_drive_0016
5 | 2011_09_26_drive_0059
6 | 2011_09_26_drive_0027
7 | 2011_09_26_drive_0020
8 | 2011_09_26_drive_0009
9 | 2011_09_26_drive_0013
10 | 2011_09_26_drive_0101
11 | 2011_09_26_drive_0046
12 | 2011_09_26_drive_0029
13 | 2011_09_26_drive_0064
14 | 2011_09_26_drive_0048
15 | 2011_10_03_drive_0027
16 | 2011_09_26_drive_0002
17 | 2011_09_26_drive_0036
18 | 2011_09_29_drive_0071
19 | 2011_10_03_drive_0047
20 | 2011_09_30_drive_0027
21 | 2011_09_26_drive_0086
22 | 2011_09_26_drive_0084
23 | 2011_09_26_drive_0096
24 | 2011_09_30_drive_0018
25 | 2011_09_26_drive_0106
26 | 2011_09_26_drive_0056
27 | 2011_09_26_drive_0023
28 | 2011_09_26_drive_0093
29 |
--------------------------------------------------------------------------------
/datasets/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anuragranj/cc/2b4e36292c18f8ee68ad5d210a4190f9adf881dc/datasets/__init__.py
--------------------------------------------------------------------------------
/datasets/general_sequence_folders.py:
--------------------------------------------------------------------------------
1 | import torch.utils.data as data
2 | import numpy as np
3 | from scipy.misc import imread
4 | from path import Path
5 | import random
6 |
7 | def crawl_folders(folders_list, sequence_length):
8 | sequence_set = []
9 | demi_length = (sequence_length-1)//2
10 | for folder in folders_list:
11 | #intrinsics = np.genfromtxt(folder/'cam.txt', delimiter=',').astype(np.float32).reshape((3, 3))
12 | imgs = sorted(folder.files('*.jpg'))
13 | if len(imgs) < sequence_length:
14 | continue
15 | for i in range(demi_length, len(imgs)-demi_length):
16 | sample = {'tgt': imgs[i], 'ref_imgs': []}
17 | for j in range(-demi_length, demi_length + 1):
18 | if j != 0:
19 | sample['ref_imgs'].append(imgs[i+j])
20 | sequence_set.append(sample)
21 | random.shuffle(sequence_set)
22 | return sequence_set
23 |
24 |
25 | def load_as_float(path):
26 | return imread(path).astype(np.float32)
27 |
28 |
29 | class SequenceFolder(data.Dataset):
30 | """A sequence data loader where the files are arranged in this way:
31 | root/scene_1/0000000.jpg
32 | root/scene_1/0000001.jpg
33 | ..
34 | root/scene_1/cam.txt
35 | root/scene_2/0000000.jpg
36 | .
37 |
38 |     transform functions must take in a list of images and a numpy array (usually an intrinsics matrix)
39 | """
40 |
41 | def __init__(self, root, seed=None, train=True, sequence_length=3, transform=None, target_transform=None):
42 | np.random.seed(seed)
43 | random.seed(seed)
44 | self.root = Path(root)
45 | #scene_list_path = self.root/'train.txt' if train else self.root/'val.txt'
46 | self.scenes = self.root.dirs()
47 | self.samples = crawl_folders(self.scenes, sequence_length)
48 | self.transform = transform
49 |
50 | def __getitem__(self, index):
51 | sample = self.samples[index]
52 | tgt_img = load_as_float(sample['tgt'])
53 | ref_imgs = [load_as_float(ref_img) for ref_img in sample['ref_imgs']]
54 | if self.transform is not None:
55 | imgs, intrinsics = self.transform([tgt_img] + ref_imgs, np.copy(sample['intrinsics']))
56 | tgt_img = imgs[0]
57 | ref_imgs = imgs[1:]
58 | else:
59 | intrinsics = np.copy(sample['intrinsics'])
60 | return tgt_img, ref_imgs, intrinsics, np.linalg.inv(intrinsics)
61 |
62 | def __len__(self):
63 | return len(self.samples)
64 |
--------------------------------------------------------------------------------
/datasets/sequence_folders.py:
--------------------------------------------------------------------------------
1 | import torch.utils.data as data
2 | import numpy as np
3 | from scipy.misc import imread
4 | from path import Path
5 | import random
6 |
7 |
8 | def crawl_folders(folders_list, sequence_length):
9 | sequence_set = []
10 | demi_length = (sequence_length-1)//2
11 | for folder in folders_list:
12 | intrinsics = np.genfromtxt(folder/'cam.txt', delimiter=',').astype(np.float32).reshape((3, 3))
13 | imgs = sorted(folder.files('*.jpg'))
14 | if len(imgs) < sequence_length:
15 | continue
16 | for i in range(demi_length, len(imgs)-demi_length):
17 | sample = {'intrinsics': intrinsics, 'tgt': imgs[i], 'ref_imgs': []}
18 | for j in range(-demi_length, demi_length + 1):
19 | if j != 0:
20 | sample['ref_imgs'].append(imgs[i+j])
21 | sequence_set.append(sample)
22 | random.shuffle(sequence_set)
23 | return sequence_set
24 |
25 |
26 | def load_as_float(path):
27 | return imread(path).astype(np.float32)
28 |
29 |
30 | class SequenceFolder(data.Dataset):
31 | """A sequence data loader where the files are arranged in this way:
32 | root/scene_1/0000000.jpg
33 | root/scene_1/0000001.jpg
34 | ..
35 | root/scene_1/cam.txt
36 | root/scene_2/0000000.jpg
37 | .
38 |
39 |     transform functions must take in a list of images and a numpy array (usually an intrinsics matrix)
40 | """
41 |
42 | def __init__(self, root, seed=None, train=True, sequence_length=3, transform=None, target_transform=None):
43 | np.random.seed(seed)
44 | random.seed(seed)
45 | self.root = Path(root)
46 | scene_list_path = self.root/'train.txt' if train else self.root/'val.txt'
47 | self.scenes = [self.root/folder[:-1] for folder in open(scene_list_path)]
48 | self.samples = crawl_folders(self.scenes, sequence_length)
49 | self.transform = transform
50 |
51 | def __getitem__(self, index):
52 | sample = self.samples[index]
53 | tgt_img = load_as_float(sample['tgt'])
54 | ref_imgs = [load_as_float(ref_img) for ref_img in sample['ref_imgs']]
55 | if self.transform is not None:
56 | imgs, intrinsics = self.transform([tgt_img] + ref_imgs, np.copy(sample['intrinsics']))
57 | tgt_img = imgs[0]
58 | ref_imgs = imgs[1:]
59 | else:
60 | intrinsics = np.copy(sample['intrinsics'])
61 | return tgt_img, ref_imgs, intrinsics, np.linalg.inv(intrinsics)
62 |
63 | def __len__(self):
64 | return len(self.samples)
65 |
--------------------------------------------------------------------------------
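A minimal sketch of how `SequenceFolder` is typically wired to a PyTorch `DataLoader` together with the transforms from `custom_transforms.py`. The data path, batch size, worker count and normalization statistics are placeholders, not the values used by `train.py`:

```python
import torch.utils.data
import custom_transforms
from datasets.sequence_folders import SequenceFolder

train_transform = custom_transforms.Compose([
    custom_transforms.RandomHorizontalFlip(),
    custom_transforms.RandomScaleCrop(),
    custom_transforms.ArrayToTensor(),
    custom_transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # placeholder stats
])

train_set = SequenceFolder('/path/to/formatted/data', seed=0, train=True,
                           sequence_length=3, transform=train_transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=4, shuffle=True,
                                           num_workers=4, pin_memory=True)

for tgt_img, ref_imgs, intrinsics, intrinsics_inv in train_loader:
    # tgt_img: B x 3 x H x W; ref_imgs: list of (sequence_length - 1) tensors of the same shape
    # intrinsics, intrinsics_inv: B x 3 x 3
    break
```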
/datasets/stacked_sequence_folders.py:
--------------------------------------------------------------------------------
1 | import torch.utils.data as data
2 | import numpy as np
3 | from scipy.misc import imread
4 | from path import Path
5 | import random
6 |
7 |
8 | def crawl_folders(folders_list, sequence_length):
9 | sequence_set = []
10 | demi_length = (sequence_length-1)//2
11 | for folder in folders_list:
12 | intrinsics = [np.genfromtxt(cam_file, delimiter=',').astype(np.float32).reshape((3, 3)) for cam_file in sorted(folder.files('*_cam.txt'))]
13 | imgs = sorted(folder.files('*.jpg'))
14 | for i in range(len(imgs)):
15 | sample = {'intrinsics': intrinsics[i], 'img_stack': imgs[i]}
16 | sequence_set.append(sample)
17 | random.shuffle(sequence_set)
18 | return sequence_set
19 |
20 |
21 | def load_as_float(path, sequence_length):
22 | stack = imread(path).astype(np.float32)
23 | h,w,_ = stack.shape
24 | w_img = int(w/(sequence_length))
25 | imgs = [stack[:,i*w_img:(i+1)*w_img] for i in range(sequence_length)]
26 | tgt_index = sequence_length//2
27 | return([imgs[tgt_index]] + imgs[:tgt_index] + imgs[tgt_index+1:])
28 |
29 |
30 | class SequenceFolder(data.Dataset):
31 | """A sequence data loader where the images are arranged in this way:
32 | root/scene_1/0000000.jpg
33 | root/scene_1/0000000_cam.txt
34 | root/scene_1/0000001.jpg
35 | root/scene_1/0000001_cam.txt
36 | .
37 | root/scene_2/0000000.jpg
38 | root/scene_2/0000000_cam.txt
39 | """
40 |
41 | def __init__(self, root, seed=None, train=True, sequence_length=3, transform=None, target_transform=None):
42 | np.random.seed(seed)
43 | random.seed(seed)
44 | self.root = Path(root)
45 | self.samples = []
46 | frames_list_path = self.root/'train.txt' if train else self.root/'val.txt'
47 | self.scenes = self.root.dirs()
48 | self.sequence_length = sequence_length
49 | for frame_path in open(frames_list_path):
50 | a,b = frame_path[:-1].split(' ')
51 | base_path = (self.root/a)/b
52 | intrinsics = np.genfromtxt(base_path+'_cam.txt', delimiter=',').astype(np.float32).reshape((3, 3))
53 | sample = {'intrinsics': intrinsics, 'img_stack': base_path+'.jpg'}
54 | self.samples.append(sample)
55 | self.transform = transform
56 |
57 | def __getitem__(self, index):
58 | sample = self.samples[index]
59 | imgs = load_as_float(sample['img_stack'], self.sequence_length)
60 | if self.transform is not None:
61 | imgs, intrinsics = self.transform(imgs, np.copy(sample['intrinsics']))
62 | else:
63 | intrinsics = sample['intrinsics']
64 | return imgs[0], imgs[1:], intrinsics, np.linalg.inv(intrinsics)
65 |
66 | def __len__(self):
67 | return len(self.samples)
68 |
--------------------------------------------------------------------------------
/datasets/validation_folders.py:
--------------------------------------------------------------------------------
1 | import torch.utils.data as data
2 | import numpy as np
3 | from scipy.misc import imread
4 | from path import Path
5 | import torch
6 |
7 |
8 | def crawl_folders(folders_list):
9 | imgs = []
10 | depth = []
11 | for folder in folders_list:
12 | current_imgs = sorted(folder.files('*.jpg'))
13 | current_depth = []
14 | for img in current_imgs:
15 | d = img.dirname()/(img.name[:-4] + '.npy')
16 | assert(d.isfile()), "depth file {} not found".format(str(d))
17 | depth.append(d)
18 | imgs.extend(current_imgs)
19 | depth.extend(current_depth)
20 | return imgs, depth
21 |
22 | def crawl_folders_seq(folders_list, sequence_length=3):
23 | imgs1 = []
24 | imgs2 = []
25 | depth = []
26 | for folder in folders_list:
27 | current_imgs = sorted(folder.files('*.jpg'))
28 | current_imgs1 = current_imgs[:-1]
29 | current_imgs2 = current_imgs[1:]
30 | current_depth = []
31 | for (img1,img2) in zip(current_imgs1, current_imgs2):
32 | d = img1.dirname()/(img1.name[:-4] + '.npy')
33 | assert(d.isfile()), "depth file {} not found".format(str(d))
34 | depth.append(d)
35 | imgs1.extend(current_imgs1)
36 | imgs2.extend(current_imgs2)
37 | depth.extend(current_depth)
38 | return imgs1, imgs2, depth
39 |
40 |
41 | def load_as_float(path):
42 | return imread(path).astype(np.float32)
43 |
44 |
45 | class ValidationSet(data.Dataset):
46 | """A sequence data loader where the files are arranged in this way:
47 | root/scene_1/0000000.jpg
48 | root/scene_1/0000000.npy
49 | root/scene_1/0000001.jpg
50 | root/scene_1/0000001.npy
51 | ..
52 | root/scene_2/0000000.jpg
53 | root/scene_2/0000000.npy
54 | .
55 |
56 |     transform functions must take in a list of images and a numpy array which can be None
57 | """
58 |
59 | def __init__(self, root, transform=None):
60 | self.root = Path(root)
61 | scene_list_path = self.root/'val.txt'
62 | self.scenes = [self.root/folder[:-1] for folder in open(scene_list_path)]
63 | self.imgs, self.depth = crawl_folders(self.scenes)
64 | self.transform = transform
65 |
66 | def __getitem__(self, index):
67 | img = load_as_float(self.imgs[index])
68 | depth = np.load(self.depth[index]).astype(np.float32)
69 | if self.transform is not None:
70 | img, _ = self.transform([img], None)
71 | img = img[0]
72 | return img, depth
73 |
74 | def __len__(self):
75 | return len(self.imgs)
76 |
77 | class ValidationSetSeq(data.Dataset):
78 | """A sequence data loader where the files are arranged in this way:
79 | root/scene_1/0000000.jpg
80 | root/scene_1/0000000.npy
81 | root/scene_1/0000001.jpg
82 | root/scene_1/0000001.npy
83 | ..
84 | root/scene_2/0000000.jpg
85 | root/scene_2/0000000.npy
86 | .
87 |
88 |     transform functions must take in a list of images and a numpy array which can be None
89 | """
90 |
91 | def __init__(self, root, transform=None):
92 | self.root = Path(root)
93 | scene_list_path = self.root/'val.txt'
94 | self.scenes = [self.root/folder[:-1] for folder in open(scene_list_path)]
95 | self.imgs1, self.imgs2, self.depth = crawl_folders_seq(self.scenes)
96 | self.transform = transform
97 |
98 | def __getitem__(self, index):
99 | img1 = load_as_float(self.imgs1[index])
100 | img2 = load_as_float(self.imgs2[index])
101 | depth = np.load(self.depth[index]).astype(np.float32)
102 | if self.transform is not None:
103 | img, _ = self.transform([img1, img2], None)
104 | img1, img2 = img[0], img[1]
105 | return (img1, img2), depth
106 |
107 | def __len__(self):
108 | return len(self.imgs1)
109 |
--------------------------------------------------------------------------------
/evaluate_flow.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 |
5 | import argparse
6 | import os
7 | from tqdm import tqdm
8 | import numpy as np
9 | from path import Path
10 | from flowutils import flow_io
11 | from logger import AverageMeter
12 | epsilon = 1e-8
13 | parser = argparse.ArgumentParser(description='Benchmark optical flow predictions',
14 | formatter_class=argparse.ArgumentDefaultsHelpFormatter)
15 | parser.add_argument('--output-dir', dest='output_dir', type=str, default=None, help='path to output directory')
16 | parser.add_argument('--gt-dir', dest='gt_dir', type=str, default=None, help='path to gt directory')
17 | parser.add_argument('-N', dest='N', type=int, default=200, help='number of samples')
18 |
19 |
20 | def main():
21 | global args
22 | args = parser.parse_args()
23 |
24 | args.output_dir = Path(args.output_dir)
25 | args.gt_dir = Path(args.gt_dir)
26 |
27 | error_names = ['epe_total', 'outliers']
28 | errors = AverageMeter(i=len(error_names))
29 |
30 | for i in tqdm(range(args.N)):
31 | gt_flow_path = args.gt_dir.joinpath(str(i).zfill(6)+'_10.png')
32 | output_flow_path = args.output_dir.joinpath(str(i).zfill(6)+'_10.png')
33 | u_gt,v_gt,valid_gt = flow_io.flow_read_png(gt_flow_path)
34 | u_pred,v_pred,valid_pred = flow_io.flow_read_png(output_flow_path)
35 |
36 | _errors = compute_err(u_gt, v_gt, valid_gt, u_pred, v_pred, valid_pred)
37 | errors.update(_errors)
38 |
39 |
40 | print("Results")
41 | print("\t {:>10}, {:>10} ".format(*error_names))
42 | print("Errors \t {:10.4f}, {:10.4f}".format(*errors.avg))
43 |
44 | def compute_err(u_gt, v_gt, valid_gt, u_pred, v_pred, valid_pred, tau=[3,0.05]):
45 | epe = np.sqrt(np.power((u_gt - u_pred), 2) + np.power((v_gt - v_pred), 2))
46 | epe = epe * valid_gt
47 | aepe = epe.sum() / valid_gt.sum()
48 | F_mag = np.sqrt(np.power(u_gt, 2)+ np.power(v_gt, 2))
49 | E_0 = (epe > tau[0])#.type_as(epe)
50 | E_1 = ((epe / (F_mag+epsilon) ) > tau[1])#.type_as(epe)
51 | n_err = E_0 * E_1 * valid_gt
52 | f_err = n_err.sum()/valid_gt.sum()
53 | return [aepe, f_err]
54 |
55 |
56 | if __name__ == '__main__':
57 | main()
58 |
--------------------------------------------------------------------------------
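`compute_err` above implements the usual KITTI flow metrics: the average end-point error (EPE) over valid pixels, and the outlier fraction, where a pixel counts as an outlier only if its EPE exceeds both 3 px and 5% of the ground-truth flow magnitude (`tau=[3, 0.05]`). A tiny worked example:

```python
import numpy as np
from evaluate_flow import compute_err

# Two valid pixels: one predicted perfectly, one off by 5 px on a 10 px ground-truth flow.
u_gt   = np.array([[10.0, 10.0]]); v_gt   = np.array([[0.0, 0.0]])
u_pred = np.array([[10.0, 15.0]]); v_pred = np.array([[0.0, 0.0]])
valid  = np.array([[1.0, 1.0]])

aepe, f_err = compute_err(u_gt, v_gt, valid, u_pred, v_pred, valid)
print(aepe, f_err)  # 2.5 0.5 -> mean EPE (0 + 5)/2; one outlier of two (5 px > 3 and 5/10 > 0.05)
```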
/flowutils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/anuragranj/cc/2b4e36292c18f8ee68ad5d210a4190f9adf881dc/flowutils/__init__.py
--------------------------------------------------------------------------------
/flowutils/flow_io.py:
--------------------------------------------------------------------------------
1 | #! /usr/bin/env python2
2 |
3 | """
4 | I/O script to save and load the data coming with the MPI-Sintel low-level
5 | computer vision benchmark.
6 |
7 | For more details about the benchmark, please visit www.mpi-sintel.de
8 |
9 | CHANGELOG:
10 | v1.0 (2015/02/03): First release
11 |
12 | Copyright (c) 2015 Jonas Wulff
13 | Max Planck Institute for Intelligent Systems, Tuebingen, Germany
14 |
15 | """
16 |
17 | # Requirements: Numpy and PyPNG
18 | import numpy as np
19 | try:
20 | import png
21 | has_png = True
22 | except:
23 | has_png = False
24 | png=None
25 |
26 |
27 |
28 | # Check for endianness, based on Daniel Scharstein's optical flow code.
29 | # Using little-endian architecture, these two should be equal.
30 | TAG_FLOAT = 202021.25
31 | TAG_CHAR = 'PIEH'.encode()
32 |
33 | def flow_read(filename, return_validity=False):
34 | """ Read optical flow from file, return (U,V) tuple.
35 |
36 | Original code by Deqing Sun, adapted from Daniel Scharstein.
37 | """
38 | f = open(filename,'rb')
39 | check = np.fromfile(f,dtype=np.float32,count=1)[0]
40 | assert check == TAG_FLOAT, ' flow_read:: Wrong tag in flow file (should be: {0}, is: {1}). Big-endian machine? '.format(TAG_FLOAT,check)
41 | width = np.fromfile(f,dtype=np.int32,count=1)[0]
42 | height = np.fromfile(f,dtype=np.int32,count=1)[0]
43 | size = width*height
44 | assert width > 0 and height > 0 and size > 1 and size < 100000000, ' flow_read:: Wrong input size (width = {0}, height = {1}).'.format(width,height)
45 | tmp = np.fromfile(f,dtype=np.float32,count=-1).reshape((height,width*2))
46 | u = tmp[:,np.arange(width)*2]
47 | v = tmp[:,np.arange(width)*2 + 1]
48 |
49 | if return_validity:
50 | valid = u<1e19
51 | u[valid==0] = 0
52 | v[valid==0] = 0
53 | return u,v,valid
54 | else:
55 | return u,v
56 |
57 | def flow_write(filename,uv,v=None):
58 | """ Write optical flow to file.
59 |
60 | If v is None, uv is assumed to contain both u and v channels,
61 | stacked in depth.
62 |
63 | Original code by Deqing Sun, adapted from Daniel Scharstein.
64 | """
65 | nBands = 2
66 |
67 | if v is None:
68 | uv_ = np.array(uv)
69 | assert(uv_.ndim==3)
70 | if uv_.shape[0] == 2:
71 | u = uv_[0,:,:]
72 | v = uv_[1,:,:]
73 | elif uv_.shape[2] == 2:
74 | u = uv_[:,:,0]
75 | v = uv_[:,:,1]
76 | else:
77 |             raise ValueError('Wrong format for flow input')
78 | else:
79 | u = uv
80 |
81 | assert(u.shape == v.shape)
82 | height,width = u.shape
83 | f = open(filename,'wb')
84 | # write the header
85 | f.write(TAG_CHAR)
86 | np.array(width).astype(np.int32).tofile(f)
87 | np.array(height).astype(np.int32).tofile(f)
88 | # arrange into matrix form
89 | tmp = np.zeros((height, width*nBands))
90 | tmp[:,np.arange(width)*2] = u
91 | tmp[:,np.arange(width)*2 + 1] = v
92 | tmp.astype(np.float32).tofile(f)
93 | f.close()
94 |
95 |
96 | def flow_read_png(fpath):
97 | """
98 | Read KITTI optical flow, returns u,v,valid mask
99 |
100 | """
101 | if not has_png:
102 | print('Error. Please install the PyPNG library')
103 | return
104 |
105 | R = png.Reader(fpath)
106 | width,height,data,_ = R.asDirect()
107 | # This only worked with python2.
108 | #I = np.array(map(lambda x:x,data)).reshape((height,width,3))
109 | I = np.array([x for x in data]).reshape((height,width,3))
110 | u_ = I[:,:,0]
111 | v_ = I[:,:,1]
112 | valid = I[:,:,2]
113 |
114 | u = (u_.astype('float64')-2**15)/64.0
115 | v = (v_.astype('float64')-2**15)/64.0
116 |
117 | return u,v,valid
118 |
119 |
120 | def flow_write_png(fpath,u,v,valid=None):
121 | """
122 | Write KITTI optical flow.
123 |
124 | """
125 | if not has_png:
126 | print('Error. Please install the PyPNG library')
127 | return
128 |
129 |
130 |     if valid is None:
131 | valid_ = np.ones(u.shape,dtype='uint16')
132 | else:
133 | valid_ = valid.astype('uint16')
134 |
135 |
136 | u = u.astype('float64')
137 | v = v.astype('float64')
138 |
139 | u_ = ((u*64.0)+2**15).astype('uint16')
140 | v_ = ((v*64.0)+2**15).astype('uint16')
141 |
142 | I = np.dstack((u_,v_,valid_))
143 |
144 | W = png.Writer(width=u.shape[1],
145 | height=u.shape[0],
146 | bitdepth=16,
147 | planes=3)
148 |
149 | with open(fpath,'wb') as fil:
150 | W.write(fil,I.reshape((-1,3*u.shape[1])))
151 |
--------------------------------------------------------------------------------
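A small round-trip sketch for the KITTI flow PNG format handled above: flow values are stored in 16-bit channels as `flow * 64 + 2**15`, so the round trip is exact up to 1/64 px. This requires the PyPNG package, and the file name is just an example:

```python
import numpy as np
from flowutils import flow_io

u = np.random.uniform(-10, 10, (8, 16))
v = np.random.uniform(-10, 10, (8, 16))

flow_io.flow_write_png('example_10.png', u, v)      # valid mask defaults to all ones
u2, v2, valid = flow_io.flow_read_png('example_10.png')

print(np.abs(u - u2).max() < 1/64, valid.min())     # True 1
```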
/flowutils/flow_viz.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | from torchvision.transforms import ToTensor
4 |
5 | def batchComputeFlowImage(uv):
6 | flow_im = torch.zeros(uv.size(0), 3, uv.size(2), uv.size(3) )
7 | uv_np = uv.numpy()
8 | for i in range(uv.size(0)):
9 | flow_im[i] = ToTensor()(computeFlowImage(uv_np[i][0], uv_np[i][1]))
10 | return flow_im
11 |
12 | def computeFlowImage(u,v,logscale=True,scaledown=6,output=False):
13 | """
14 | topleft is zero, u is horiz, v is vertical
15 | red is 3 o'clock, yellow is 6, light blue is 9, blue/purple is 12
16 | """
17 | colorwheel = makecolorwheel()
18 | ncols = colorwheel.shape[0]
19 |
20 | radius = np.sqrt(u**2 + v**2)
21 | if output:
22 | print("Maximum flow magnitude: %04f" % np.max(radius))
23 | if logscale:
24 | radius = np.log(radius + 1)
25 | if output:
26 | print("Maximum flow magnitude (after log): %0.4f" % np.max(radius))
27 | radius = radius / scaledown
28 | if output:
29 | print("Maximum flow magnitude (after scaledown): %0.4f" % np.max(radius))
30 | rot = np.arctan2(-v, -u) / np.pi
31 |
32 |     fk = (rot+1)/2 * (ncols-1) # -1~1 mapped to 0~ncols-1
33 |     k0 = fk.astype(np.uint8)       # 0, 1, 2, ..., ncols-1
34 |
35 | k1 = k0+1
36 | k1[k1 == ncols] = 0
37 |
38 | f = fk - k0
39 |
40 | ncolors = colorwheel.shape[1]
41 | img = np.zeros(u.shape+(ncolors,))
42 | for i in range(ncolors):
43 | tmp = colorwheel[:,i]
44 | col0 = tmp[k0]
45 | col1 = tmp[k1]
46 | col = (1-f)*col0 + f*col1
47 |
48 | idx = radius <= 1
49 | # increase saturation with radius
50 | col[idx] = 1 - radius[idx]*(1-col[idx])
51 | # out of range
52 | col[~idx] *= 0.75
53 | img[:,:,i] = np.floor(255*col).astype(np.uint8)
54 |
55 | return img.astype(np.uint8)
56 |
57 |
58 | def makecolorwheel():
59 | # Create a colorwheel for visualization
60 | RY = 15
61 | YG = 6
62 | GC = 4
63 | CB = 11
64 | BM = 13
65 | MR = 6
66 |
67 | ncols = RY + YG + GC + CB + BM + MR
68 |
69 | colorwheel = np.zeros((ncols,3))
70 |
71 | col = 0
72 | # RY
73 | colorwheel[0:RY,0] = 1
74 | colorwheel[0:RY,1] = np.arange(0,1,1./RY)
75 | col += RY
76 |
77 | # YG
78 | colorwheel[col:col+YG,0] = np.arange(1,0,-1./YG)
79 | colorwheel[col:col+YG,1] = 1
80 | col += YG
81 |
82 | # GC
83 | colorwheel[col:col+GC,1] = 1
84 | colorwheel[col:col+GC,2] = np.arange(0,1,1./GC)
85 | col += GC
86 |
87 | # CB
88 | colorwheel[col:col+CB,1] = np.arange(1,0,-1./CB)
89 | colorwheel[col:col+CB,2] = 1
90 | col += CB
91 |
92 | # BM
93 | colorwheel[col:col+BM,2] = 1
94 | colorwheel[col:col+BM,0] = np.arange(0,1,1./BM)
95 | col += BM
96 |
97 | # MR
98 | colorwheel[col:col+MR,2] = np.arange(1,0,-1./MR)
99 | colorwheel[col:col+MR,0] = 1
100 |
101 | return colorwheel
102 |
--------------------------------------------------------------------------------
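A minimal sketch of visualizing a flow field with the helpers above, using a purely synthetic flow:

```python
import numpy as np
import torch
from flowutils.flow_viz import computeFlowImage, batchComputeFlowImage

# Synthetic flow: horizontal motion that grows from left to right, no vertical motion.
h, w = 64, 128
u = np.tile(np.linspace(-5, 5, w), (h, 1)).astype(np.float32)
v = np.zeros((h, w), dtype=np.float32)

rgb = computeFlowImage(u, v)                      # H x W x 3 uint8 color-wheel image
print(rgb.shape, rgb.dtype)

uv = torch.from_numpy(np.stack([u, v]))[None]     # batched variant expects B x 2 x H x W on CPU
ims = batchComputeFlowImage(uv)                   # B x 3 x H x W float tensor in [0, 1]
print(ims.shape)
```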
/flowutils/pfm.py:
--------------------------------------------------------------------------------
1 | import re
2 | import numpy as np
3 | import sys
4 |
5 |
6 | def readPFM(file):
7 | file = open(file, 'rb')
8 |
9 | color = None
10 | width = None
11 | height = None
12 | scale = None
13 | endian = None
14 |
15 |     header = file.readline().rstrip().decode('ascii')
16 | if header == 'PF':
17 | color = True
18 | elif header == 'Pf':
19 | color = False
20 | else:
21 | raise Exception('Not a PFM file.')
22 |
23 |     dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('ascii'))
24 | if dim_match:
25 | width, height = map(int, dim_match.groups())
26 | else:
27 | raise Exception('Malformed PFM header.')
28 |
29 |     scale = float(file.readline().rstrip().decode('ascii'))
30 | if scale < 0: # little-endian
31 | endian = '<'
32 | scale = -scale
33 | else:
34 | endian = '>' # big-endian
35 |
36 | data = np.fromfile(file, endian + 'f')
37 | shape = (height, width, 3) if color else (height, width)
38 |
39 | data = np.reshape(data, shape)
40 | data = np.flipud(data)
41 | return data, scale
42 |
43 |
44 | def writePFM(file, image, scale=1):
45 | file = open(file, 'wb')
46 |
47 | color = None
48 |
49 | if image.dtype.name != 'float32':
50 | raise Exception('Image dtype must be float32.')
51 |
52 | image = np.flipud(image)
53 |
54 | if len(image.shape) == 3 and image.shape[2] == 3: # color image
55 | color = True
56 | elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1: # greyscale
57 | color = False
58 | else:
59 | raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.')
60 |
61 |     file.write(('PF\n' if color else 'Pf\n').encode('ascii'))
62 |     file.write(('%d %d\n' % (image.shape[1], image.shape[0])).encode('ascii'))
63 |
64 | endian = image.dtype.byteorder
65 |
66 | if endian == '<' or endian == '=' and sys.byteorder == 'little':
67 | scale = -scale
68 |
69 |     file.write(('%f\n' % scale).encode('ascii'))
70 |
71 | image.tofile(file)
--------------------------------------------------------------------------------
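A short round-trip sketch for the PFM helpers above. `writePFM` requires a float32 array and marks little-endian data with a negative scale in the header; the file name is illustrative:

```python
import numpy as np
from flowutils.pfm import readPFM, writePFM

depth = np.random.rand(4, 6).astype(np.float32)   # a small single-channel "image"

writePFM('example.pfm', depth)                    # header 'Pf', dimensions, scale, then raw floats
data, scale = readPFM('example.pfm')

print(np.allclose(data, depth), scale)            # True 1.0 (on a little-endian machine)
```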
/kitti_eval/depth_evaluation_utils.py:
--------------------------------------------------------------------------------
1 | # Mostly based on the code written by Clement Godard:
2 | # https://github.com/mrharicot/monodepth/blob/master/utils/evaluation_utils.py
3 | import numpy as np
4 | # import pandas as pd
5 | import datetime
6 | from collections import Counter
7 | from path import Path
8 | from scipy.misc import imread
9 | from tqdm import tqdm
10 |
11 | width_to_focal = dict()
12 | width_to_focal[1242] = 721.5377
13 | width_to_focal[1241] = 718.856
14 | width_to_focal[1224] = 707.0493
15 | width_to_focal[1238] = 718.3351
16 |
17 |
18 | class test_framework_KITTI(object):
19 | def __init__(self, root, test_files, seq_length=3, min_depth=1e-3, max_depth=100, step=1):
20 | self.root = root
21 | self.min_depth, self.max_depth = min_depth, max_depth
22 | self.calib_dirs, self.gt_files, self.img_files, self.displacements, self.cams = read_scene_data(self.root, test_files, seq_length, step)
23 |
24 | def __getitem__(self, i):
25 | tgt = imread(self.img_files[i][0]).astype(np.float32)
26 | depth = generate_depth_map(self.calib_dirs[i], self.gt_files[i], tgt.shape[:2], self.cams[i])
27 | return {'tgt': tgt,
28 | 'ref': [imread(img).astype(np.float32) for img in self.img_files[i][1]],
29 | 'path':self.img_files[i][0],
30 | 'gt_depth': depth,
31 | 'displacements': np.array(self.displacements[i]),
32 | 'mask': generate_mask(depth, self.min_depth, self.max_depth)
33 | }
34 |
35 | def __len__(self):
36 | return len(self.img_files)
37 |
38 |
39 | ###############################################################################
40 | # EIGEN
41 |
42 | def read_text_lines(file_path):
43 | f = open(file_path, 'r')
44 | lines = f.readlines()
45 | f.close()
46 | lines = [l.rstrip() for l in lines]
47 | return lines
48 |
49 |
50 | def get_displacements(oxts_root, index, shifts):
51 | with open(oxts_root/'timestamps.txt') as f:
52 | timestamps = [datetime.datetime.strptime(ts[:-3], "%Y-%m-%d %H:%M:%S.%f").timestamp() for ts in f.read().splitlines()]
53 | oxts_data = np.genfromtxt(oxts_root/'data'/'{:010d}.txt'.format(index))
54 | speed = np.linalg.norm(oxts_data[8:11])
55 | assert(all(index+shift < len(timestamps) and index+shift >= 0 for shift in shifts)), str([index+shift for shift in shifts])
56 | return [speed*abs(timestamps[index] - timestamps[index + shift]) for shift in shifts]
57 |
58 |
59 | def read_scene_data(data_root, test_list, seq_length=3, step=1):
60 | data_root = Path(data_root)
61 | gt_files = []
62 | calib_dirs = []
63 | im_files = []
64 | cams = []
65 | displacements = []
66 | demi_length = (seq_length - 1) // 2
67 | shift_range = [step*i for i in list(range(-demi_length,0)) + list(range(1, demi_length + 1))]
68 |
69 | print('getting test metadata ... ')
70 | for sample in tqdm(test_list):
71 | tgt_img_path = data_root/sample
72 | date, scene, cam_id, _, index = sample[:-4].split('/')
73 |
74 | ref_imgs_path = [tgt_img_path.dirname()/'{:010d}.png'.format(int(index) + shift) for shift in shift_range]
75 |
76 |         caped_shift_range = shift_range[:]  # if a ref img is missing, its shift is set to 0 below so that it is discarded later
77 | for i,img in enumerate(ref_imgs_path):
78 | if not img.isfile():
79 | ref_imgs_path[i] = tgt_img_path
80 | caped_shift_range[i] = 0
81 |
82 | vel_path = data_root/date/scene/'velodyne_points'/'data'/'{}.bin'.format(index[:10])
83 |
84 | if tgt_img_path.isfile():
85 | gt_files.append(vel_path)
86 | calib_dirs.append(data_root/date)
87 | im_files.append([tgt_img_path,ref_imgs_path])
88 | cams.append(int(cam_id[-2:]))
89 | displacements.append(get_displacements(data_root/date/scene/'oxts', int(index), caped_shift_range))
90 | else:
91 | print('{} missing'.format(tgt_img_path))
92 | # print(num_probs, 'files missing')
93 |
94 | return calib_dirs, gt_files, im_files, displacements, cams
95 |
96 |
97 | def load_velodyne_points(file_name):
98 | # adapted from https://github.com/hunse/kitti
99 | points = np.fromfile(file_name, dtype=np.float32).reshape(-1, 4)
100 | points[:,3] = 1
101 | return points
102 |
103 |
104 | def read_calib_file(path):
105 | # taken from https://github.com/hunse/kitti
106 | float_chars = set("0123456789.e+- ")
107 | data = {}
108 | with open(path, 'r') as f:
109 | for line in f.readlines():
110 | key, value = line.split(':', 1)
111 | value = value.strip()
112 | data[key] = value
113 | if float_chars.issuperset(value):
114 | # try to cast to float array
115 | try:
116 | data[key] = np.array(list(map(float, value.split(' '))))
117 | except ValueError:
118 | # casting error: data[key] already eq. value, so pass
119 | pass
120 |
121 | return data
122 |
123 |
124 | def get_focal_length_baseline(calib_dir, cam=2):
125 | cam2cam = read_calib_file(calib_dir + 'calib_cam_to_cam.txt')
126 | P2_rect = cam2cam['P_rect_02'].reshape(3,4)
127 | P3_rect = cam2cam['P_rect_03'].reshape(3,4)
128 |
129 |     # cam 2 is to the left of cam 0 (-6cm)
130 |     # cam 3 is to the right (+54cm)
131 | b2 = P2_rect[0,3] / -P2_rect[0,0]
132 | b3 = P3_rect[0,3] / -P3_rect[0,0]
133 | baseline = b3-b2
134 |
135 | if cam == 2:
136 | focal_length = P2_rect[0,0]
137 | elif cam == 3:
138 | focal_length = P3_rect[0,0]
139 |
140 | return focal_length, baseline
141 |
142 |
143 | def sub2ind(matrixSize, rowSub, colSub):
144 | m, n = matrixSize
145 | return rowSub * (n-1) + colSub - 1
146 |
147 |
148 | def generate_depth_map(calib_dir, velo_file_name, im_shape, cam=2):
149 | # load calibration files
150 | cam2cam = read_calib_file(calib_dir/'calib_cam_to_cam.txt')
151 | velo2cam = read_calib_file(calib_dir/'calib_velo_to_cam.txt')
152 | velo2cam = np.hstack((velo2cam['R'].reshape(3,3), velo2cam['T'][..., np.newaxis]))
153 | velo2cam = np.vstack((velo2cam, np.array([0, 0, 0, 1.0])))
154 |
155 | # compute projection matrix velodyne->image plane
156 | R_cam2rect = np.eye(4)
157 | R_cam2rect[:3,:3] = cam2cam['R_rect_00'].reshape(3,3)
158 | P_rect = cam2cam['P_rect_0'+str(cam)].reshape(3,4)
159 | P_velo2im = np.dot(np.dot(P_rect, R_cam2rect), velo2cam)
160 |
161 | # load velodyne points and remove all behind image plane (approximation)
162 | # each row of the velodyne data is forward, left, up, reflectance
163 | velo = load_velodyne_points(velo_file_name)
164 | velo = velo[velo[:, 0] >= 0, :]
165 |
166 | # project the points to the camera
167 | velo_pts_im = np.dot(P_velo2im, velo.T).T
168 | velo_pts_im[:, :2] = velo_pts_im[:,:2] / velo_pts_im[:,-1:]
169 |
170 | # check if in bounds
171 | # use minus 1 to get the exact same value as KITTI matlab code
172 | velo_pts_im[:, 0] = np.round(velo_pts_im[:,0]) - 1
173 | velo_pts_im[:, 1] = np.round(velo_pts_im[:,1]) - 1
174 | val_inds = (velo_pts_im[:, 0] >= 0) & (velo_pts_im[:, 1] >= 0)
175 | val_inds = val_inds & (velo_pts_im[:,0] < im_shape[1]) & (velo_pts_im[:,1] < im_shape[0])
176 | velo_pts_im = velo_pts_im[val_inds, :]
177 |
178 | # project to image
179 | depth = np.zeros((im_shape))
180 | depth[velo_pts_im[:, 1].astype(np.int), velo_pts_im[:, 0].astype(np.int)] = velo_pts_im[:, 2]
181 |
182 | # find the duplicate points and choose the closest depth
183 | inds = sub2ind(depth.shape, velo_pts_im[:, 1], velo_pts_im[:, 0])
184 | dupe_inds = [item for item, count in Counter(inds).items() if count > 1]
185 | for dd in dupe_inds:
186 | pts = np.where(inds == dd)[0]
187 | x_loc = int(velo_pts_im[pts[0], 0])
188 | y_loc = int(velo_pts_im[pts[0], 1])
189 | depth[y_loc, x_loc] = velo_pts_im[pts, 2].min()
190 | depth[depth < 0] = 0
191 | return depth
192 |
193 |
194 | def generate_mask(gt_depth, min_depth, max_depth):
195 | mask = np.logical_and(gt_depth > min_depth,
196 | gt_depth < max_depth)
197 |     # crop used by Garg ECCV16 to reproduce Eigen NIPS14 results
198 | # if used on gt_size 370x1224 produces a crop of [-218, -3, 44, 1180]
199 | gt_height, gt_width = gt_depth.shape
200 | crop = np.array([0.40810811 * gt_height, 0.99189189 * gt_height,
201 | 0.03594771 * gt_width, 0.96405229 * gt_width]).astype(np.int32)
202 |
203 | crop_mask = np.zeros(mask.shape)
204 | crop_mask[crop[0]:crop[1],crop[2]:crop[3]] = 1
205 | mask = np.logical_and(mask, crop_mask)
206 | return mask
207 |
--------------------------------------------------------------------------------
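A sketch of how the depth test framework above is typically consumed; 'kitti_raw/' is a placeholder for the KITTI raw dataset root, and the Eigen split file is the one shipped in kitti_eval/.

    from kitti_eval.depth_evaluation_utils import test_framework_KITTI, read_text_lines

    test_files = read_text_lines('kitti_eval/test_files_eigen.txt')
    framework = test_framework_KITTI('kitti_raw/', test_files, seq_length=3,
                                     min_depth=1e-3, max_depth=80)

    sample = framework[0]
    tgt_img = sample['tgt']        # H x W x 3 float32 target image
    gt_depth = sample['gt_depth']  # sparse depth projected from the velodyne scan
    valid = sample['mask']         # Eigen crop combined with the depth-range mask
    print(tgt_img.shape, gt_depth[valid].mean())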
/kitti_eval/pose_evaluation_utils.py:
--------------------------------------------------------------------------------
1 | # Mostly based on the code written by Clement Godard:
2 | # https://github.com/mrharicot/monodepth/blob/master/utils/evaluation_utils.py
3 | import numpy as np
4 | # import pandas as pd
5 | from path import Path
6 | from scipy.misc import imread
7 | from tqdm import tqdm
8 |
9 |
10 | class test_framework_KITTI(object):
11 | def __init__(self, root, sequence_set, seq_length=3, step=1):
12 | self.root = root
13 | self.img_files, self.poses, self.sample_indices = read_scene_data(self.root, sequence_set, seq_length, step)
14 |
15 | def generator(self):
16 | for img_list, pose_list, sample_list in zip(self.img_files, self.poses, self.sample_indices):
17 | for snippet_indices in sample_list:
18 | imgs = [imread(img_list[i]).astype(np.float32) for i in snippet_indices]
19 |
20 |                 poses = np.stack([pose_list[i] for i in snippet_indices])
21 | first_pose = poses[0]
22 | poses[:,:,-1] -= first_pose[:,-1]
23 | compensated_poses = np.linalg.inv(first_pose[:,:3]) @ poses
24 |
25 | yield {'imgs': imgs,
26 | 'path': img_list[0],
27 | 'poses': compensated_poses
28 | }
29 |
30 | def __iter__(self):
31 | return self.generator()
32 |
33 | def __len__(self):
34 | return sum(len(imgs) for imgs in self.img_files)
35 |
36 |
37 | def read_scene_data(data_root, sequence_set, seq_length=3, step=1):
38 | data_root = Path(data_root)
39 | im_sequences = []
40 | poses_sequences = []
41 | indices_sequences = []
42 | demi_length = (seq_length - 1) // 2
43 | shift_range = np.array([step*i for i in range(-demi_length, demi_length + 1)]).reshape(1, -1)
44 |
45 | sequences = set()
46 | for seq in sequence_set:
47 | corresponding_dirs = set((data_root/'sequences').dirs(seq))
48 | sequences = sequences | corresponding_dirs
49 |
50 |     print('getting test metadata for these sequences: {}'.format(sequences))
51 | for sequence in tqdm(sequences):
52 | poses = np.genfromtxt(data_root/'poses'/'{}.txt'.format(sequence.name)).astype(np.float64).reshape(-1, 3, 4)
53 | imgs = sorted((sequence/'image_2').files('*.png'))
54 | # construct 5-snippet sequences
55 | tgt_indices = np.arange(demi_length, len(imgs) - demi_length).reshape(-1, 1)
56 | snippet_indices = shift_range + tgt_indices
57 | im_sequences.append(imgs)
58 | poses_sequences.append(poses)
59 | indices_sequences.append(snippet_indices)
60 | return im_sequences, poses_sequences, indices_sequences
--------------------------------------------------------------------------------
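A sketch of iterating the pose test framework above; 'kitti_odometry/' is a placeholder for a dataset root that contains the 'sequences/' and 'poses/' folders.

    from kitti_eval.pose_evaluation_utils import test_framework_KITTI

    framework = test_framework_KITTI('kitti_odometry/', sequence_set=['09'], seq_length=5, step=1)

    for sample in framework:
        imgs = sample['imgs']    # seq_length images centred on the target frame
        poses = sample['poses']  # ground-truth 3x4 poses, compensated so the first one is [I | 0]
        break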
/logger.py:
--------------------------------------------------------------------------------
1 | from blessings import Terminal
2 | import progressbar
3 | import sys
4 |
5 |
6 | class TermLogger(object):
7 | def __init__(self, n_epochs, train_size, valid_size):
8 | self.n_epochs = n_epochs
9 | self.train_size = train_size
10 | self.valid_size = valid_size
11 | self.t = Terminal()
12 |         s = 10  # number of terminal lines reserved for the bars
13 | e = 1 # epoch bar position
14 | tr = 3 # train bar position
15 | ts = 6 # valid bar position
16 | h = self.t.height
17 |
18 | for i in range(10):
19 | print('')
20 | self.epoch_bar = progressbar.ProgressBar(maxval=n_epochs, fd=Writer(self.t, (0, h-s+e)))
21 |
22 | self.train_writer = Writer(self.t, (0, h-s+tr))
23 | self.train_bar_writer = Writer(self.t, (0, h-s+tr+1))
24 |
25 | self.valid_writer = Writer(self.t, (0, h-s+ts))
26 | self.valid_bar_writer = Writer(self.t, (0, h-s+ts+1))
27 |
28 | self.reset_train_bar()
29 | self.reset_valid_bar()
30 |
31 | def reset_train_bar(self):
32 | self.train_bar = progressbar.ProgressBar(maxval=self.train_size, fd=self.train_bar_writer).start()
33 |
34 | def reset_valid_bar(self):
35 | self.valid_bar = progressbar.ProgressBar(maxval=self.valid_size, fd=self.valid_bar_writer).start()
36 |
37 |
38 | class Writer(object):
39 | """Create an object with a write method that writes to a
40 | specific place on the screen, defined at instantiation.
41 |
42 | This is the glue between blessings and progressbar.
43 | """
44 |
45 | def __init__(self, t, location):
46 | """
47 | Input: location - tuple of ints (x, y), the position
48 | of the bar in the terminal
49 | """
50 | self.location = location
51 | self.t = t
52 |
53 | def write(self, string):
54 | with self.t.location(*self.location):
55 | sys.stdout.write("\033[K")
56 | print(string)
57 |
58 | def flush(self):
59 | return
60 |
61 |
62 | class AverageMeter(object):
63 | """Computes and stores the average and current value"""
64 |
65 | def __init__(self, i=1, precision=3):
66 | self.meters = i
67 | self.precision = precision
68 | self.reset(self.meters)
69 |
70 | def reset(self, i):
71 | self.val = [0]*i
72 | self.avg = [0]*i
73 | self.sum = [0]*i
74 | self.count = 0
75 |
76 | def update(self, val, n=1):
77 | if not isinstance(val, list):
78 | val = [val]
79 | assert(len(val) == self.meters)
80 | self.count += n
81 | for i,v in enumerate(val):
82 | self.val[i] = v
83 | self.sum[i] += v * n
84 | self.avg[i] = self.sum[i] / self.count
85 |
86 | def __repr__(self):
87 | val = ' '.join(['{:.{}f}'.format(v, self.precision) for v in self.val])
88 | avg = ' '.join(['{:.{}f}'.format(a, self.precision) for a in self.avg])
89 | return '{} ({})'.format(val, avg)
90 |
--------------------------------------------------------------------------------
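A minimal sketch of the AverageMeter above, tracking two losses at once with made-up values:

    from logger import AverageMeter

    losses = AverageMeter(i=2, precision=4)
    losses.update([0.52, 0.31], n=8)   # per-batch values and batch size
    losses.update([0.48, 0.29], n=8)
    print(losses)                      # current values (running averages)
    print(losses.avg[0])               # running average of the first meter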
/mnist_eval.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 |
5 | import argparse
6 | import time
7 | import csv
8 | import datetime
9 | import os
10 | from tqdm import tqdm
11 | import numpy as np
12 |
13 | import torch
14 | from torch.autograd import Variable
15 | import torch.backends.cudnn as cudnn
16 | import torch.optim
17 | import torch.nn as nn
18 | import torch.utils.data
19 | import torchvision
20 | import torch.nn.functional as F
21 |
22 | from logger import TermLogger, AverageMeter
23 | from path import Path
24 | from itertools import chain
25 | from tensorboardX import SummaryWriter
26 |
27 | from utils import tensor2array, save_checkpoint
28 |
29 | parser = argparse.ArgumentParser(description='MNIST and SVHN training',
30 | formatter_class=argparse.ArgumentDefaultsHelpFormatter)
31 | parser.add_argument('data', metavar='DIR',
32 | help='path to dataset')
33 | parser.add_argument('-j', '--workers', default=4, type=int, metavar='N',
34 | help='number of data loading workers')
35 | parser.add_argument('-b', '--batch-size', default=100, type=int,
36 | metavar='N', help='mini-batch size')
37 |
38 | parser.add_argument('--pretrained-alice', dest='pretrained_alice', default=None, metavar='PATH',
39 | help='path to pre-trained alice model')
40 | parser.add_argument('--pretrained-bob', dest='pretrained_bob', default=None, metavar='PATH',
41 | help='path to pre-trained bob model')
42 | parser.add_argument('--pretrained-mod', dest='pretrained_mod', default=None, metavar='PATH',
43 | help='path to pre-trained moderator')
44 |
45 | class LeNet(nn.Module):
46 | def __init__(self, nout=10):
47 | super(LeNet, self).__init__()
48 | self.conv1 = nn.Conv2d(1, 40, 3, 1)
49 | self.conv2 = nn.Conv2d(40, 40, 3, 1)
50 | self.fc1 = nn.Linear(40*5*5, 40)
51 | self.fc2 = nn.Linear(40, nout)
52 |
53 | def forward(self, x):
54 | x = F.relu(self.conv1(x))
55 | x = F.max_pool2d(x, 2, 2)
56 | x = F.relu(self.conv2(x))
57 | x = F.max_pool2d(x, 2, 2)
58 | x = x.view(-1, 40*5*5)
59 | x = F.relu(self.fc1(x))
60 | x = self.fc2(x)
61 | return x
62 |
63 | def name(self):
64 | return "LeNet"
65 |
66 | def main():
67 | global args
68 | args = parser.parse_args()
69 |
70 | args.data = Path(args.data)
71 |
72 | print("=> fetching dataset")
73 | mnist_transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor(),
74 | torchvision.transforms.Normalize((0.1307,), (0.3081,))])
75 | valset_mnist = torchvision.datasets.MNIST(args.data/'mnist', train=False, transform=mnist_transform, target_transform=None, download=True)
76 |
77 | svhn_transform = torchvision.transforms.Compose([torchvision.transforms.Resize(size=(28,28)),
78 | torchvision.transforms.Grayscale(),
79 | torchvision.transforms.ToTensor()])
80 | valset_svhn = torchvision.datasets.SVHN(args.data/'svhn', split='test', transform=svhn_transform, target_transform=None, download=True)
81 | val_set = torch.utils.data.ConcatDataset([valset_mnist, valset_svhn])
82 |
83 |
84 | print('{} Test samples found in MNIST'.format(len(valset_mnist)))
85 | print('{} Test samples found in SVHN'.format(len(valset_svhn)))
86 |
87 | val_loader = torch.utils.data.DataLoader(
88 | val_set, batch_size=args.batch_size, shuffle=False,
89 | num_workers=args.workers, pin_memory=True, drop_last=False)
90 |
91 | val_loader_mnist = torch.utils.data.DataLoader(
92 | valset_mnist, batch_size=args.batch_size, shuffle=False,
93 | num_workers=args.workers, pin_memory=True, drop_last=False)
94 |
95 | val_loader_svhn = torch.utils.data.DataLoader(
96 | valset_svhn, batch_size=args.batch_size, shuffle=False,
97 | num_workers=args.workers, pin_memory=True, drop_last=False)
98 |
99 | # create model
100 | print("=> creating model")
101 |
102 | alice_net = LeNet()
103 | bob_net = LeNet()
104 | mod_net = LeNet(nout=1)
105 |
106 | print("=> using pre-trained weights from {}".format(args.pretrained_alice))
107 | weights = torch.load(args.pretrained_alice)
108 | alice_net.load_state_dict(weights['state_dict'])
109 |
110 | print("=> using pre-trained weights from {}".format(args.pretrained_bob))
111 | weights = torch.load(args.pretrained_bob)
112 | bob_net.load_state_dict(weights['state_dict'])
113 |
114 | print("=> using pre-trained weights from {}".format(args.pretrained_mod))
115 | weights = torch.load(args.pretrained_mod)
116 | mod_net.load_state_dict(weights['state_dict'])
117 |
118 | cudnn.benchmark = True
119 | alice_net = alice_net.cuda()
120 | bob_net = bob_net.cuda()
121 | mod_net = mod_net.cuda()
122 |
123 | # evaluate on validation set
124 | errors_mnist, error_names_mnist, mod_count_mnist = validate(val_loader_mnist, alice_net, bob_net, mod_net)
125 | errors_svhn, error_names_svhn, mod_count_svhn = validate(val_loader_svhn, alice_net, bob_net, mod_net)
126 | errors_total, error_names_total, _ = validate(val_loader, alice_net, bob_net, mod_net)
127 |
128 | accuracy_string_mnist = ', '.join('{} : {:.3f}'.format(name, 100*(error)) for name, error in zip(error_names_mnist, errors_mnist))
129 | accuracy_string_svhn = ', '.join('{} : {:.3f}'.format(name, 100*(error)) for name, error in zip(error_names_svhn, errors_svhn))
130 | accuracy_string_total = ', '.join('{} : {:.3f}'.format(name, 100*(error)) for name, error in zip(error_names_total, errors_total))
131 |
132 | print("MNIST Error")
133 | print(accuracy_string_mnist)
134 | print("MNIST Picking Percentage- Alice {:.3f}, Bob {:.3f}".format(mod_count_mnist[0]*100, (1-mod_count_mnist[0])*100))
135 |
136 | print("SVHN Error")
137 | print(accuracy_string_svhn)
138 | print("SVHN Picking Percentage for Alice {:.3f}, Bob {:.3f}".format(mod_count_svhn[0]*100, (1-mod_count_svhn[0])*100))
139 |
140 | print("TOTAL Error")
141 | print(accuracy_string_total)
142 |
143 | def validate(val_loader, alice_net, bob_net, mod_net):
144 | global args
145 | accuracy = AverageMeter(i=3, precision=4)
146 | mod_count = AverageMeter()
147 |
148 | # switch to evaluate mode
149 | alice_net.eval()
150 | bob_net.eval()
151 | mod_net.eval()
152 |
153 | for i, (img, target) in enumerate(tqdm(val_loader)):
154 | img_var = Variable(img.cuda(), volatile=True)
155 | target_var = Variable(target.cuda(), volatile=True)
156 |
157 | pred_alice = alice_net(img_var)
158 | pred_bob = bob_net(img_var)
159 | pred_mod = F.sigmoid(mod_net(img_var))
160 | _ , pred_alice_label = torch.max(pred_alice.data, 1)
161 | _ , pred_bob_label = torch.max(pred_bob.data, 1)
162 | pred_label = (pred_mod.squeeze().data > 0.5).type_as(pred_alice_label) * pred_alice_label + (pred_mod.squeeze().data <= 0.5).type_as(pred_bob_label) * pred_bob_label
163 |
164 | total_accuracy = (pred_label.cpu() == target).sum().item() / img.size(0)
165 | alice_accuracy = (pred_alice_label.cpu() == target).sum().item() / img.size(0)
166 | bob_accuracy = (pred_bob_label.cpu() == target).sum().item() / img.size(0)
167 | accuracy.update([total_accuracy, alice_accuracy, bob_accuracy])
168 | mod_count.update((pred_mod.cpu().data > 0.5).sum().item() / img.size(0))
169 |
170 | return list(map(lambda x: 1-x, accuracy.avg)), ['Total', 'alice', 'bob'] , mod_count.avg
171 |
172 |
173 |
174 | if __name__ == '__main__':
175 | # import sys
176 | # with open("experiment_recorder.md", "a") as f:
177 | # f.write('\n python3 ' + ' '.join(sys.argv))
178 | main()
179 |
--------------------------------------------------------------------------------
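A small sketch of the LeNet heads used above, run on a random MNIST-sized batch; the tensors are placeholders and no checkpoint is loaded.

    import torch
    from mnist_eval import LeNet

    alice = LeNet(nout=10)            # classifier head
    mod = LeNet(nout=1)               # moderator head with a single output

    x = torch.randn(2, 1, 28, 28)     # greyscale 28x28 batch
    logits = alice(x)                 # (2, 10) class scores
    gate = torch.sigmoid(mod(x))      # (2, 1); > 0.5 routes a sample to alice, otherwise to bob
    print(logits.shape, gate.shape)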
/models/DispNetS.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 |
5 | def downsample_conv(in_planes, out_planes, kernel_size=3):
6 | return nn.Sequential(
7 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=2, padding=(kernel_size-1)//2),
8 | nn.ReLU(inplace=True),
9 | nn.Conv2d(out_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2),
10 | nn.ReLU(inplace=True)
11 | )
12 |
13 |
14 | def predict_disp(in_planes):
15 | return nn.Sequential(
16 | nn.Conv2d(in_planes, 1, kernel_size=3, padding=1),
17 | nn.Sigmoid()
18 | )
19 |
20 |
21 | def conv(in_planes, out_planes):
22 | return nn.Sequential(
23 | nn.Conv2d(in_planes, out_planes, kernel_size=3, padding=1),
24 | nn.ReLU(inplace=True)
25 | )
26 |
27 |
28 | def upconv(in_planes, out_planes):
29 | return nn.Sequential(
30 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=3, stride=2, padding=1, output_padding=1),
31 | nn.ReLU(inplace=True)
32 | )
33 |
34 |
35 | def crop_like(input, ref):
36 | assert(input.size(2) >= ref.size(2) and input.size(3) >= ref.size(3))
37 | return input[:, :, :ref.size(2), :ref.size(3)]
38 |
39 |
40 | class DispNetS(nn.Module):
41 |
42 | def __init__(self, alpha=10, beta=0.01):
43 | super(DispNetS, self).__init__()
44 |
45 | self.alpha = alpha
46 | self.beta = beta
47 |
48 | conv_planes = [32, 64, 128, 256, 512, 512, 512]
49 | self.conv1 = downsample_conv(3, conv_planes[0], kernel_size=7)
50 | self.conv2 = downsample_conv(conv_planes[0], conv_planes[1], kernel_size=5)
51 | self.conv3 = downsample_conv(conv_planes[1], conv_planes[2])
52 | self.conv4 = downsample_conv(conv_planes[2], conv_planes[3])
53 | self.conv5 = downsample_conv(conv_planes[3], conv_planes[4])
54 | self.conv6 = downsample_conv(conv_planes[4], conv_planes[5])
55 | self.conv7 = downsample_conv(conv_planes[5], conv_planes[6])
56 |
57 | upconv_planes = [512, 512, 256, 128, 64, 32, 16]
58 | self.upconv7 = upconv(conv_planes[6], upconv_planes[0])
59 | self.upconv6 = upconv(upconv_planes[0], upconv_planes[1])
60 | self.upconv5 = upconv(upconv_planes[1], upconv_planes[2])
61 | self.upconv4 = upconv(upconv_planes[2], upconv_planes[3])
62 | self.upconv3 = upconv(upconv_planes[3], upconv_planes[4])
63 | self.upconv2 = upconv(upconv_planes[4], upconv_planes[5])
64 | self.upconv1 = upconv(upconv_planes[5], upconv_planes[6])
65 |
66 | self.iconv7 = conv(upconv_planes[0] + conv_planes[5], upconv_planes[0])
67 | self.iconv6 = conv(upconv_planes[1] + conv_planes[4], upconv_planes[1])
68 | self.iconv5 = conv(upconv_planes[2] + conv_planes[3], upconv_planes[2])
69 | self.iconv4 = conv(upconv_planes[3] + conv_planes[2], upconv_planes[3])
70 | self.iconv3 = conv(1 + upconv_planes[4] + conv_planes[1], upconv_planes[4])
71 | self.iconv2 = conv(1 + upconv_planes[5] + conv_planes[0], upconv_planes[5])
72 | self.iconv1 = conv(1 + upconv_planes[6], upconv_planes[6])
73 |
74 | self.predict_disp4 = predict_disp(upconv_planes[3])
75 | self.predict_disp3 = predict_disp(upconv_planes[4])
76 | self.predict_disp2 = predict_disp(upconv_planes[5])
77 | self.predict_disp1 = predict_disp(upconv_planes[6])
78 |
79 | def init_weights(self):
80 | for m in self.modules():
81 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
82 | nn.init.xavier_uniform(m.weight.data)
83 | if m.bias is not None:
84 | m.bias.data.zero_()
85 |
86 | def forward(self, x):
87 | out_conv1 = self.conv1(x)
88 | out_conv2 = self.conv2(out_conv1)
89 | out_conv3 = self.conv3(out_conv2)
90 | out_conv4 = self.conv4(out_conv3)
91 | out_conv5 = self.conv5(out_conv4)
92 | out_conv6 = self.conv6(out_conv5)
93 | out_conv7 = self.conv7(out_conv6)
94 |
95 | out_upconv7 = crop_like(self.upconv7(out_conv7), out_conv6)
96 | concat7 = torch.cat((out_upconv7, out_conv6), 1)
97 | out_iconv7 = self.iconv7(concat7)
98 |
99 | out_upconv6 = crop_like(self.upconv6(out_iconv7), out_conv5)
100 | concat6 = torch.cat((out_upconv6, out_conv5), 1)
101 | out_iconv6 = self.iconv6(concat6)
102 |
103 | out_upconv5 = crop_like(self.upconv5(out_iconv6), out_conv4)
104 | concat5 = torch.cat((out_upconv5, out_conv4), 1)
105 | out_iconv5 = self.iconv5(concat5)
106 |
107 | out_upconv4 = crop_like(self.upconv4(out_iconv5), out_conv3)
108 | concat4 = torch.cat((out_upconv4, out_conv3), 1)
109 | out_iconv4 = self.iconv4(concat4)
110 | disp4 = self.alpha * self.predict_disp4(out_iconv4) + self.beta
111 |
112 | out_upconv3 = crop_like(self.upconv3(out_iconv4), out_conv2)
113 | disp4_up = crop_like(nn.functional.upsample(disp4, scale_factor=2, mode='bilinear'), out_conv2)
114 | concat3 = torch.cat((out_upconv3, out_conv2, disp4_up), 1)
115 | out_iconv3 = self.iconv3(concat3)
116 | disp3 = self.alpha * self.predict_disp3(out_iconv3) + self.beta
117 |
118 | out_upconv2 = crop_like(self.upconv2(out_iconv3), out_conv1)
119 | disp3_up = crop_like(nn.functional.upsample(disp3, scale_factor=2, mode='bilinear'), out_conv1)
120 | concat2 = torch.cat((out_upconv2, out_conv1, disp3_up), 1)
121 | out_iconv2 = self.iconv2(concat2)
122 | disp2 = self.alpha * self.predict_disp2(out_iconv2) + self.beta
123 |
124 | out_upconv1 = crop_like(self.upconv1(out_iconv2), x)
125 | disp2_up = crop_like(nn.functional.upsample(disp2, scale_factor=2, mode='bilinear'), x)
126 | concat1 = torch.cat((out_upconv1, disp2_up), 1)
127 | out_iconv1 = self.iconv1(concat1)
128 | disp1 = self.alpha * self.predict_disp1(out_iconv1) + self.beta
129 |
130 | if self.training:
131 | return disp1, disp2, disp3, disp4
132 | else:
133 | return disp1
134 |
--------------------------------------------------------------------------------
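A forward-pass sketch for DispNetS with a random input at a KITTI-like resolution; in eval mode only the full-resolution disparity is returned.

    import torch
    from models.DispNetS import DispNetS

    disp_net = DispNetS(alpha=10, beta=0.01)
    disp_net.init_weights()
    disp_net.eval()

    img = torch.randn(1, 3, 256, 832)
    with torch.no_grad():
        disp = disp_net(img)          # (1, 1, 256, 832), values in (beta, alpha + beta)
    print(disp.shape)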
/models/DispNetS6.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 |
5 | def downsample_conv(in_planes, out_planes, kernel_size=3):
6 | return nn.Sequential(
7 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=2, padding=(kernel_size-1)//2),
8 | nn.ReLU(inplace=True),
9 | nn.Conv2d(out_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2),
10 | nn.ReLU(inplace=True)
11 | )
12 |
13 |
14 | def predict_disp(in_planes):
15 | return nn.Sequential(
16 | nn.Conv2d(in_planes, 1, kernel_size=3, padding=1),
17 | nn.Sigmoid()
18 | )
19 |
20 |
21 | def conv(in_planes, out_planes):
22 | return nn.Sequential(
23 | nn.Conv2d(in_planes, out_planes, kernel_size=3, padding=1),
24 | nn.ReLU(inplace=True)
25 | )
26 |
27 |
28 | def upconv(in_planes, out_planes):
29 | return nn.Sequential(
30 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=3, stride=2, padding=1, output_padding=1),
31 | nn.ReLU(inplace=True)
32 | )
33 |
34 |
35 | def crop_like(input, ref):
36 | assert(input.size(2) >= ref.size(2) and input.size(3) >= ref.size(3))
37 | return input[:, :, :ref.size(2), :ref.size(3)]
38 |
39 |
40 | class DispNetS6(nn.Module):
41 |
42 | def __init__(self, alpha=10, beta=0.01):
43 | super(DispNetS6, self).__init__()
44 |
45 | self.alpha = alpha
46 | self.beta = beta
47 |
48 | conv_planes = [32, 64, 128, 256, 512, 512, 512]
49 | self.conv1 = downsample_conv(3, conv_planes[0], kernel_size=7)
50 | self.conv2 = downsample_conv(conv_planes[0], conv_planes[1], kernel_size=5)
51 | self.conv3 = downsample_conv(conv_planes[1], conv_planes[2])
52 | self.conv4 = downsample_conv(conv_planes[2], conv_planes[3])
53 | self.conv5 = downsample_conv(conv_planes[3], conv_planes[4])
54 | self.conv6 = downsample_conv(conv_planes[4], conv_planes[5])
55 | self.conv7 = downsample_conv(conv_planes[5], conv_planes[6])
56 |
57 | upconv_planes = [512, 512, 256, 128, 64, 32, 16]
58 | self.upconv7 = upconv(conv_planes[6], upconv_planes[0])
59 | self.upconv6 = upconv(upconv_planes[0], upconv_planes[1])
60 | self.upconv5 = upconv(upconv_planes[1], upconv_planes[2])
61 | self.upconv4 = upconv(upconv_planes[2], upconv_planes[3])
62 | self.upconv3 = upconv(upconv_planes[3], upconv_planes[4])
63 | self.upconv2 = upconv(upconv_planes[4], upconv_planes[5])
64 | self.upconv1 = upconv(upconv_planes[5], upconv_planes[6])
65 |
66 | self.iconv7 = conv(upconv_planes[0] + conv_planes[5], upconv_planes[0])
67 | self.iconv6 = conv(upconv_planes[1] + conv_planes[4], upconv_planes[1])
68 | self.iconv5 = conv(upconv_planes[2] + conv_planes[3], upconv_planes[2])
69 | self.iconv4 = conv(upconv_planes[3] + conv_planes[2], upconv_planes[3])
70 | self.iconv3 = conv(1 + upconv_planes[4] + conv_planes[1], upconv_planes[4])
71 | self.iconv2 = conv(1 + upconv_planes[5] + conv_planes[0], upconv_planes[5])
72 | self.iconv1 = conv(1 + upconv_planes[6], upconv_planes[6])
73 |
74 | self.predict_disp6 = predict_disp(upconv_planes[1])
75 | self.predict_disp5 = predict_disp(upconv_planes[2])
76 | self.predict_disp4 = predict_disp(upconv_planes[3])
77 | self.predict_disp3 = predict_disp(upconv_planes[4])
78 | self.predict_disp2 = predict_disp(upconv_planes[5])
79 | self.predict_disp1 = predict_disp(upconv_planes[6])
80 |
81 | def init_weights(self):
82 | for m in self.modules():
83 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
84 | nn.init.xavier_uniform(m.weight.data)
85 | if m.bias is not None:
86 | m.bias.data.zero_()
87 |
88 | def forward(self, x):
89 | out_conv1 = self.conv1(x)
90 | out_conv2 = self.conv2(out_conv1)
91 | out_conv3 = self.conv3(out_conv2)
92 | out_conv4 = self.conv4(out_conv3)
93 | out_conv5 = self.conv5(out_conv4)
94 | out_conv6 = self.conv6(out_conv5)
95 | out_conv7 = self.conv7(out_conv6)
96 |
97 | out_upconv7 = crop_like(self.upconv7(out_conv7), out_conv6)
98 | concat7 = torch.cat((out_upconv7, out_conv6), 1)
99 | out_iconv7 = self.iconv7(concat7)
100 |
101 | out_upconv6 = crop_like(self.upconv6(out_iconv7), out_conv5)
102 | concat6 = torch.cat((out_upconv6, out_conv5), 1)
103 | out_iconv6 = self.iconv6(concat6)
104 | disp6 = self.alpha * self.predict_disp6(out_iconv6) + self.beta
105 |
106 | out_upconv5 = crop_like(self.upconv5(out_iconv6), out_conv4)
107 | concat5 = torch.cat((out_upconv5, out_conv4), 1)
108 | out_iconv5 = self.iconv5(concat5)
109 | disp5 = self.alpha * self.predict_disp5(out_iconv5) + self.beta
110 |
111 | out_upconv4 = crop_like(self.upconv4(out_iconv5), out_conv3)
112 | concat4 = torch.cat((out_upconv4, out_conv3), 1)
113 | out_iconv4 = self.iconv4(concat4)
114 | disp4 = self.alpha * self.predict_disp4(out_iconv4) + self.beta
115 |
116 | out_upconv3 = crop_like(self.upconv3(out_iconv4), out_conv2)
117 | disp4_up = crop_like(nn.functional.upsample(disp4, scale_factor=2, mode='bilinear'), out_conv2)
118 | concat3 = torch.cat((out_upconv3, out_conv2, disp4_up), 1)
119 | out_iconv3 = self.iconv3(concat3)
120 | disp3 = self.alpha * self.predict_disp3(out_iconv3) + self.beta
121 |
122 | out_upconv2 = crop_like(self.upconv2(out_iconv3), out_conv1)
123 | disp3_up = crop_like(nn.functional.upsample(disp3, scale_factor=2, mode='bilinear'), out_conv1)
124 | concat2 = torch.cat((out_upconv2, out_conv1, disp3_up), 1)
125 | out_iconv2 = self.iconv2(concat2)
126 | disp2 = self.alpha * self.predict_disp2(out_iconv2) + self.beta
127 |
128 | out_upconv1 = crop_like(self.upconv1(out_iconv2), x)
129 | disp2_up = crop_like(nn.functional.upsample(disp2, scale_factor=2, mode='bilinear'), x)
130 | concat1 = torch.cat((out_upconv1, disp2_up), 1)
131 | out_iconv1 = self.iconv1(concat1)
132 | disp1 = self.alpha * self.predict_disp1(out_iconv1) + self.beta
133 |
134 | if self.training:
135 | return disp1, disp2, disp3, disp4, disp5, disp6
136 | else:
137 | return disp1
138 |
--------------------------------------------------------------------------------
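The 6-scale variants that follow differ mainly in training mode, where they return six disparity maps from finest to coarsest; a minimal sketch with a random input:

    import torch
    from models.DispNetS6 import DispNetS6

    disp_net = DispNetS6().train()
    disps = disp_net(torch.randn(1, 3, 256, 832))
    print([tuple(d.shape[-2:]) for d in disps])   # full resolution down to 1/32 resolution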
/models/DispResNet6.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 | # based on github.com/ClementPinard/SfMLearner-Pytorch
5 |
6 | import torch
7 | import torch.nn as nn
8 |
9 | def conv3x3(in_planes, out_planes, stride=1):
10 | """3x3 convolution with padding"""
11 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
12 | padding=1, bias=False)
13 |
14 | class BasicBlock(nn.Module):
15 | expansion = 1
16 |
17 | def __init__(self, inplanes, planes, stride=1, downsample=None):
18 | super(BasicBlock, self).__init__()
19 | self.conv1 = conv3x3(inplanes, planes, stride)
20 | #self.bn1 = nn.BatchNorm2d(planes)
21 | self.relu = nn.ReLU(inplace=True)
22 | self.conv2 = conv3x3(planes, planes)
23 | #self.bn2 = nn.BatchNorm2d(planes)
24 | self.downsample = downsample
25 | self.stride = stride
26 |
27 | def forward(self, x):
28 | residual = x
29 |
30 | out = self.conv1(x)
31 | #out = self.bn1(out)
32 | out = self.relu(out)
33 |
34 | out = self.conv2(out)
35 | #out = self.bn2(out)
36 |
37 | if self.downsample is not None:
38 | residual = self.downsample(x)
39 |
40 | out += residual
41 | out = self.relu(out)
42 |
43 | return out
44 |
45 | def make_layer(inplanes, block, planes, blocks, stride=1):
46 | downsample = None
47 | if stride != 1 or inplanes != planes * block.expansion:
48 | downsample = nn.Sequential(
49 | nn.Conv2d(inplanes, planes * block.expansion,
50 | kernel_size=1, stride=stride, bias=False),
51 | nn.BatchNorm2d(planes * block.expansion),
52 | )
53 |
54 | layers = []
55 | layers.append(block(inplanes, planes, stride, downsample))
56 | inplanes = planes * block.expansion
57 | for i in range(1, blocks):
58 | layers.append(block(inplanes, planes))
59 |
60 | return nn.Sequential(*layers)
61 |
62 | def downsample_conv(in_planes, out_planes, kernel_size=3):
63 | return nn.Sequential(
64 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=2, padding=(kernel_size-1)//2),
65 | nn.ReLU(inplace=True),
66 | nn.Conv2d(out_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2),
67 | nn.ReLU(inplace=True)
68 | )
69 |
70 |
71 | def predict_disp(in_planes):
72 | return nn.Sequential(
73 | nn.Conv2d(in_planes, 1, kernel_size=3, padding=1),
74 | nn.Sigmoid()
75 | )
76 |
77 |
78 | def conv(in_planes, out_planes):
79 | return nn.Sequential(
80 | nn.Conv2d(in_planes, out_planes, kernel_size=3, padding=1),
81 | nn.ReLU(inplace=True)
82 | )
83 |
84 |
85 | def upconv(in_planes, out_planes):
86 | return nn.Sequential(
87 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=3, stride=2, padding=1, output_padding=1),
88 | nn.ReLU(inplace=True)
89 | )
90 |
91 |
92 | def crop_like(input, ref):
93 | assert(input.size(2) >= ref.size(2) and input.size(3) >= ref.size(3))
94 | return input[:, :, :ref.size(2), :ref.size(3)]
95 |
96 |
97 | class DispResNet6(nn.Module):
98 |
99 | def __init__(self, alpha=10, beta=0.01):
100 | super(DispResNet6, self).__init__()
101 |
102 | self.alpha = alpha
103 | self.beta = beta
104 |
105 | conv_planes = [32, 64, 128, 256, 512, 512, 512]
106 | self.conv1 = downsample_conv(3, conv_planes[0], kernel_size=7)
107 | self.conv2 = make_layer(conv_planes[0], BasicBlock, conv_planes[1], blocks=2, stride=2)
108 | self.conv3 = make_layer(conv_planes[1], BasicBlock, conv_planes[2], blocks=2, stride=2)
109 | self.conv4 = make_layer(conv_planes[2], BasicBlock, conv_planes[3], blocks=2, stride=2)
110 | self.conv5 = make_layer(conv_planes[3], BasicBlock, conv_planes[4], blocks=2, stride=2)
111 | self.conv6 = make_layer(conv_planes[4], BasicBlock, conv_planes[5], blocks=2, stride=2)
112 | self.conv7 = make_layer(conv_planes[5], BasicBlock, conv_planes[6], blocks=2, stride=2)
113 |
114 | upconv_planes = [512, 512, 256, 128, 64, 32, 16]
115 | self.upconv7 = upconv(conv_planes[6], upconv_planes[0])
116 | self.upconv6 = upconv(upconv_planes[0], upconv_planes[1])
117 | self.upconv5 = upconv(upconv_planes[1], upconv_planes[2])
118 | self.upconv4 = upconv(upconv_planes[2], upconv_planes[3])
119 | self.upconv3 = upconv(upconv_planes[3], upconv_planes[4])
120 | self.upconv2 = upconv(upconv_planes[4], upconv_planes[5])
121 | self.upconv1 = upconv(upconv_planes[5], upconv_planes[6])
122 |
123 | self.iconv7 = make_layer(upconv_planes[0] + conv_planes[5], BasicBlock, upconv_planes[0], blocks=1, stride=1)
124 | self.iconv6 = make_layer(upconv_planes[1] + conv_planes[4], BasicBlock, upconv_planes[1], blocks=1, stride=1)
125 | self.iconv5 = make_layer(upconv_planes[2] + conv_planes[3], BasicBlock, upconv_planes[2], blocks=1, stride=1)
126 | self.iconv4 = make_layer(upconv_planes[3] + conv_planes[2], BasicBlock, upconv_planes[3], blocks=1, stride=1)
127 | self.iconv3 = make_layer(1 + upconv_planes[4] + conv_planes[1], BasicBlock, upconv_planes[4], blocks=1, stride=1)
128 | self.iconv2 = make_layer(1 + upconv_planes[5] + conv_planes[0], BasicBlock, upconv_planes[5], blocks=1, stride=1)
129 | self.iconv1 = make_layer(1 + upconv_planes[6], BasicBlock, upconv_planes[6], blocks=1, stride=1)
130 |
131 | self.predict_disp6 = predict_disp(upconv_planes[1])
132 | self.predict_disp5 = predict_disp(upconv_planes[2])
133 | self.predict_disp4 = predict_disp(upconv_planes[3])
134 | self.predict_disp3 = predict_disp(upconv_planes[4])
135 | self.predict_disp2 = predict_disp(upconv_planes[5])
136 | self.predict_disp1 = predict_disp(upconv_planes[6])
137 |
138 | def init_weights(self):
139 | for m in self.modules():
140 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
141 | nn.init.xavier_uniform(m.weight.data)
142 | if m.bias is not None:
143 | m.bias.data.zero_()
144 |
145 | def forward(self, x):
146 | out_conv1 = self.conv1(x)
147 | out_conv2 = self.conv2(out_conv1)
148 | out_conv3 = self.conv3(out_conv2)
149 | out_conv4 = self.conv4(out_conv3)
150 | out_conv5 = self.conv5(out_conv4)
151 | out_conv6 = self.conv6(out_conv5)
152 | out_conv7 = self.conv7(out_conv6)
153 |
154 | out_upconv7 = crop_like(self.upconv7(out_conv7), out_conv6)
155 | concat7 = torch.cat((out_upconv7, out_conv6), 1)
156 | out_iconv7 = self.iconv7(concat7)
157 |
158 | out_upconv6 = crop_like(self.upconv6(out_iconv7), out_conv5)
159 | concat6 = torch.cat((out_upconv6, out_conv5), 1)
160 | out_iconv6 = self.iconv6(concat6)
161 | disp6 = self.alpha * self.predict_disp6(out_iconv6) + self.beta
162 |
163 | out_upconv5 = crop_like(self.upconv5(out_iconv6), out_conv4)
164 | concat5 = torch.cat((out_upconv5, out_conv4), 1)
165 | out_iconv5 = self.iconv5(concat5)
166 | disp5 = self.alpha * self.predict_disp5(out_iconv5) + self.beta
167 |
168 | out_upconv4 = crop_like(self.upconv4(out_iconv5), out_conv3)
169 | concat4 = torch.cat((out_upconv4, out_conv3), 1)
170 | out_iconv4 = self.iconv4(concat4)
171 | disp4 = self.alpha * self.predict_disp4(out_iconv4) + self.beta
172 |
173 | out_upconv3 = crop_like(self.upconv3(out_iconv4), out_conv2)
174 | disp4_up = crop_like(nn.functional.upsample(disp4, scale_factor=2, mode='bilinear'), out_conv2)
175 | concat3 = torch.cat((out_upconv3, out_conv2, disp4_up), 1)
176 | out_iconv3 = self.iconv3(concat3)
177 | disp3 = self.alpha * self.predict_disp3(out_iconv3) + self.beta
178 |
179 | out_upconv2 = crop_like(self.upconv2(out_iconv3), out_conv1)
180 | disp3_up = crop_like(nn.functional.upsample(disp3, scale_factor=2, mode='bilinear'), out_conv1)
181 | concat2 = torch.cat((out_upconv2, out_conv1, disp3_up), 1)
182 | out_iconv2 = self.iconv2(concat2)
183 | disp2 = self.alpha * self.predict_disp2(out_iconv2) + self.beta
184 |
185 | out_upconv1 = crop_like(self.upconv1(out_iconv2), x)
186 | disp2_up = crop_like(nn.functional.upsample(disp2, scale_factor=2, mode='bilinear'), x)
187 | concat1 = torch.cat((out_upconv1, disp2_up), 1)
188 | out_iconv1 = self.iconv1(concat1)
189 | disp1 = self.alpha * self.predict_disp1(out_iconv1) + self.beta
190 |
191 | if self.training:
192 | return disp1, disp2, disp3, disp4, disp5, disp6
193 | else:
194 | return disp1
195 |
--------------------------------------------------------------------------------
/models/DispResNetS6.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 | # based on github.com/ClementPinard/SfMLearner-Pytorch
5 |
6 | import torch
7 | import torch.nn as nn
8 |
9 | def conv3x3(in_planes, out_planes, stride=1):
10 | """3x3 convolution with padding"""
11 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
12 | padding=1, bias=False)
13 |
14 | class BasicBlock(nn.Module):
15 | expansion = 1
16 |
17 | def __init__(self, inplanes, planes, stride=1, downsample=None):
18 | super(BasicBlock, self).__init__()
19 | self.conv1 = conv3x3(inplanes, planes, stride)
20 | #self.bn1 = nn.BatchNorm2d(planes)
21 | self.relu = nn.ReLU(inplace=True)
22 | self.conv2 = conv3x3(planes, planes)
23 | #self.bn2 = nn.BatchNorm2d(planes)
24 | self.downsample = downsample
25 | self.stride = stride
26 |
27 | def forward(self, x):
28 | residual = x
29 |
30 | out = self.conv1(x)
31 | #out = self.bn1(out)
32 | out = self.relu(out)
33 |
34 | out = self.conv2(out)
35 | #out = self.bn2(out)
36 |
37 | if self.downsample is not None:
38 | residual = self.downsample(x)
39 |
40 | out += residual
41 | out = self.relu(out)
42 |
43 | return out
44 |
45 | def make_layer(inplanes, block, planes, blocks, stride=1):
46 | downsample = None
47 | if stride != 1 or inplanes != planes * block.expansion:
48 | downsample = nn.Sequential(
49 | nn.Conv2d(inplanes, planes * block.expansion,
50 | kernel_size=1, stride=stride, bias=False),
51 | nn.BatchNorm2d(planes * block.expansion),
52 | )
53 |
54 | layers = []
55 | layers.append(block(inplanes, planes, stride, downsample))
56 | inplanes = planes * block.expansion
57 | for i in range(1, blocks):
58 | layers.append(block(inplanes, planes))
59 |
60 | return nn.Sequential(*layers)
61 |
62 | def downsample_conv(in_planes, out_planes, kernel_size=3):
63 | return nn.Sequential(
64 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=2, padding=(kernel_size-1)//2),
65 | nn.ReLU(inplace=True),
66 | nn.Conv2d(out_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2),
67 | nn.ReLU(inplace=True)
68 | )
69 |
70 |
71 | def predict_disp(in_planes):
72 | return nn.Sequential(
73 | nn.Conv2d(in_planes, 1, kernel_size=3, padding=1),
74 | nn.Sigmoid()
75 | )
76 |
77 |
78 | def conv(in_planes, out_planes):
79 | return nn.Sequential(
80 | nn.Conv2d(in_planes, out_planes, kernel_size=3, padding=1),
81 | nn.ReLU(inplace=True)
82 | )
83 |
84 |
85 | def upconv(in_planes, out_planes):
86 | return nn.Sequential(
87 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=3, stride=2, padding=1, output_padding=1),
88 | nn.ReLU(inplace=True)
89 | )
90 |
91 |
92 | def crop_like(input, ref):
93 | assert(input.size(2) >= ref.size(2) and input.size(3) >= ref.size(3))
94 | return input[:, :, :ref.size(2), :ref.size(3)]
95 |
96 |
97 | class DispResNetS6(nn.Module):
98 |
99 | def __init__(self, alpha=10, beta=0.01):
100 | super(DispResNetS6, self).__init__()
101 |
102 | self.alpha = alpha
103 | self.beta = beta
104 |
105 | conv_planes = [32, 64, 128, 256, 512, 512, 512]
106 | self.conv1 = downsample_conv(3, conv_planes[0], kernel_size=7)
107 | self.conv2 = make_layer(conv_planes[0], BasicBlock, conv_planes[1], blocks=2, stride=2)
108 | self.conv3 = make_layer(conv_planes[1], BasicBlock, conv_planes[2], blocks=2, stride=2)
109 | self.conv4 = make_layer(conv_planes[2], BasicBlock, conv_planes[3], blocks=3, stride=2)
110 | self.conv5 = make_layer(conv_planes[3], BasicBlock, conv_planes[4], blocks=3, stride=2)
111 | self.conv6 = make_layer(conv_planes[4], BasicBlock, conv_planes[5], blocks=3, stride=2)
112 | self.conv7 = make_layer(conv_planes[5], BasicBlock, conv_planes[6], blocks=3, stride=2)
113 |
114 | upconv_planes = [512, 512, 256, 128, 64, 32, 16]
115 | self.upconv7 = upconv(conv_planes[6], upconv_planes[0])
116 | self.upconv6 = upconv(upconv_planes[0], upconv_planes[1])
117 | self.upconv5 = upconv(upconv_planes[1], upconv_planes[2])
118 | self.upconv4 = upconv(upconv_planes[2], upconv_planes[3])
119 | self.upconv3 = upconv(upconv_planes[3], upconv_planes[4])
120 | self.upconv2 = upconv(upconv_planes[4], upconv_planes[5])
121 | self.upconv1 = upconv(upconv_planes[5], upconv_planes[6])
122 |
123 | self.iconv7 = make_layer(upconv_planes[0] + conv_planes[5], BasicBlock, upconv_planes[0], blocks=2, stride=1)
124 | self.iconv6 = make_layer(upconv_planes[1] + conv_planes[4], BasicBlock, upconv_planes[1], blocks=2, stride=1)
125 | self.iconv5 = make_layer(upconv_planes[2] + conv_planes[3], BasicBlock, upconv_planes[2], blocks=2, stride=1)
126 | self.iconv4 = make_layer(upconv_planes[3] + conv_planes[2], BasicBlock, upconv_planes[3], blocks=2, stride=1)
127 | self.iconv3 = make_layer(1 + upconv_planes[4] + conv_planes[1], BasicBlock, upconv_planes[4], blocks=1, stride=1)
128 | self.iconv2 = make_layer(1 + upconv_planes[5] + conv_planes[0], BasicBlock, upconv_planes[5], blocks=1, stride=1)
129 | self.iconv1 = make_layer(1 + upconv_planes[6], BasicBlock, upconv_planes[6], blocks=1, stride=1)
130 |
131 | self.predict_disp6 = predict_disp(upconv_planes[1])
132 | self.predict_disp5 = predict_disp(upconv_planes[2])
133 | self.predict_disp4 = predict_disp(upconv_planes[3])
134 | self.predict_disp3 = predict_disp(upconv_planes[4])
135 | self.predict_disp2 = predict_disp(upconv_planes[5])
136 | self.predict_disp1 = predict_disp(upconv_planes[6])
137 |
138 | def init_weights(self):
139 | for m in self.modules():
140 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
141 | nn.init.xavier_uniform(m.weight.data)
142 | if m.bias is not None:
143 | m.bias.data.zero_()
144 |
145 | def forward(self, x):
146 | out_conv1 = self.conv1(x)
147 | out_conv2 = self.conv2(out_conv1)
148 | out_conv3 = self.conv3(out_conv2)
149 | out_conv4 = self.conv4(out_conv3)
150 | out_conv5 = self.conv5(out_conv4)
151 | out_conv6 = self.conv6(out_conv5)
152 | out_conv7 = self.conv7(out_conv6)
153 |
154 | out_upconv7 = crop_like(self.upconv7(out_conv7), out_conv6)
155 | concat7 = torch.cat((out_upconv7, out_conv6), 1)
156 | out_iconv7 = self.iconv7(concat7)
157 |
158 | out_upconv6 = crop_like(self.upconv6(out_iconv7), out_conv5)
159 | concat6 = torch.cat((out_upconv6, out_conv5), 1)
160 | out_iconv6 = self.iconv6(concat6)
161 | disp6 = self.alpha * self.predict_disp6(out_iconv6) + self.beta
162 |
163 | out_upconv5 = crop_like(self.upconv5(out_iconv6), out_conv4)
164 | concat5 = torch.cat((out_upconv5, out_conv4), 1)
165 | out_iconv5 = self.iconv5(concat5)
166 | disp5 = self.alpha * self.predict_disp5(out_iconv5) + self.beta
167 |
168 | out_upconv4 = crop_like(self.upconv4(out_iconv5), out_conv3)
169 | concat4 = torch.cat((out_upconv4, out_conv3), 1)
170 | out_iconv4 = self.iconv4(concat4)
171 | disp4 = self.alpha * self.predict_disp4(out_iconv4) + self.beta
172 |
173 | out_upconv3 = crop_like(self.upconv3(out_iconv4), out_conv2)
174 | disp4_up = crop_like(nn.functional.upsample(disp4, scale_factor=2, mode='bilinear'), out_conv2)
175 | concat3 = torch.cat((out_upconv3, out_conv2, disp4_up), 1)
176 | out_iconv3 = self.iconv3(concat3)
177 | disp3 = self.alpha * self.predict_disp3(out_iconv3) + self.beta
178 |
179 | out_upconv2 = crop_like(self.upconv2(out_iconv3), out_conv1)
180 | disp3_up = crop_like(nn.functional.upsample(disp3, scale_factor=2, mode='bilinear'), out_conv1)
181 | concat2 = torch.cat((out_upconv2, out_conv1, disp3_up), 1)
182 | out_iconv2 = self.iconv2(concat2)
183 | disp2 = self.alpha * self.predict_disp2(out_iconv2) + self.beta
184 |
185 | out_upconv1 = crop_like(self.upconv1(out_iconv2), x)
186 | disp2_up = crop_like(nn.functional.upsample(disp2, scale_factor=2, mode='bilinear'), x)
187 | concat1 = torch.cat((out_upconv1, disp2_up), 1)
188 | out_iconv1 = self.iconv1(concat1)
189 | disp1 = self.alpha * self.predict_disp1(out_iconv1) + self.beta
190 |
191 | if self.training:
192 | return disp1, disp2, disp3, disp4, disp5, disp6
193 | else:
194 | return disp1
195 |
--------------------------------------------------------------------------------
/models/FlowNetC6.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 | # based on github.com/NVIDIA/FlowNet2-Pytorch
5 |
6 | import torch
7 | import torch.nn as nn
8 | from torch.nn import init
9 |
10 | import math
11 | import numpy as np
12 |
13 | # from .correlation_package.modules.correlation import Correlation
14 | from spatial_correlation_sampler import spatial_correlation_sample
15 | from .submodules import conv, deconv, predict_flow
16 | # Parameter count: 39,175,298
17 |
18 | def correlate(input1, input2):
19 | out_corr = spatial_correlation_sample(input1,
20 | input2,
21 | kernel_size=1,
22 | patch_size=21,
23 | stride=1,
24 | padding=0,
25 | dilation_patch=2)
26 | # collate dimensions 1 and 2 in order to be treated as a
27 | # regular 4D tensor
28 | b, ph, pw, h, w = out_corr.size()
29 | out_corr = out_corr.view(b, ph * pw, h, w)/input1.size(1)
30 | return out_corr
31 |
32 | class FlowNetC6(nn.Module):
33 | def __init__(self, nlevels=5, batchNorm=False, div_flow = 20, full_res=True, pretrained=True):
34 | super(FlowNetC6,self).__init__()
35 |
36 | #assert(nlevels==5)
37 | self.batchNorm = batchNorm
38 | self.div_flow = div_flow
39 | self.full_res = full_res
40 |
41 | self.conv1 = conv(self.batchNorm, 3, 64, kernel_size=7, stride=2)
42 | self.conv2 = conv(self.batchNorm, 64, 128, kernel_size=5, stride=2)
43 | self.conv3 = conv(self.batchNorm, 128, 256, kernel_size=5, stride=2)
44 | self.conv_redir = conv(self.batchNorm, 256, 32, kernel_size=1, stride=1)
45 |
46 | # if args.fp16:
47 | # self.corr = nn.Sequential(
48 | # tofp32(),
49 | # Correlation(pad_size=20, kernel_size=1, max_displacement=20, stride1=1, stride2=2, corr_multiply=1),
50 | # tofp16())
51 | # else:
52 | self.corr = correlate # Correlation(pad_size=20, kernel_size=1, max_displacement=20, stride1=1, stride2=2, corr_multiply=1)
53 |
54 | self.corr_activation = nn.LeakyReLU(0.1,inplace=True)
55 | self.conv3_1 = conv(self.batchNorm, 473, 256)
56 | self.conv4 = conv(self.batchNorm, 256, 512, stride=2)
57 | self.conv4_1 = conv(self.batchNorm, 512, 512)
58 | self.conv5 = conv(self.batchNorm, 512, 512, stride=2)
59 | self.conv5_1 = conv(self.batchNorm, 512, 512)
60 | self.conv6 = conv(self.batchNorm, 512, 1024, stride=2)
61 | self.conv6_1 = conv(self.batchNorm,1024, 1024)
62 |
63 | self.deconv5 = deconv(1024,512)
64 | self.deconv4 = deconv(1026,256)
65 | self.deconv3 = deconv(770,128)
66 | self.deconv2 = deconv(386,64)
67 | self.deconv1 = deconv(194,32)
68 |
69 | self.predict_flow6 = predict_flow(1024)
70 | self.predict_flow5 = predict_flow(1026)
71 | self.predict_flow4 = predict_flow(770)
72 | self.predict_flow3 = predict_flow(386)
73 | self.predict_flow2 = predict_flow(194)
74 | self.predict_flow1 = predict_flow(98)
75 |
76 | self.upsampled_flow6_to_5 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=True)
77 | self.upsampled_flow5_to_4 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=True)
78 | self.upsampled_flow4_to_3 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=True)
79 | self.upsampled_flow3_to_2 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=True)
80 | self.upsampled_flow2_to_1 = nn.ConvTranspose2d(2, 2, 4, 2, 1, bias=True)
81 |
82 | self.upsample1 = nn.Upsample(scale_factor=2, mode='bilinear')
83 |
84 | def init_weights(self):
85 | for m in self.modules():
86 | if isinstance(m, nn.Conv2d):
87 | if m.bias is not None:
88 | init.uniform(m.bias)
89 | init.xavier_uniform(m.weight)
90 |
91 | if isinstance(m, nn.ConvTranspose2d):
92 | if m.bias is not None:
93 | init.uniform(m.bias)
94 | init.xavier_uniform(m.weight)
95 | # init_deconv_bilinear(m.weight)
96 |
97 |
98 |
99 | def forward(self, x1,x2):
100 |
101 | out_conv1a = self.conv1(x1)
102 | out_conv2a = self.conv2(out_conv1a)
103 | out_conv3a = self.conv3(out_conv2a)
104 |
105 | # FlownetC bottom input stream
106 | out_conv1b = self.conv1(x2)
107 | out_conv2b = self.conv2(out_conv1b)
108 | out_conv3b = self.conv3(out_conv2b)
109 |
110 | # Merge streams
111 | out_corr = self.corr(out_conv3a, out_conv3b)
112 | out_corr = self.corr_activation(out_corr)
113 |
114 | # Redirect top input stream and concatenate
115 | out_conv_redir = self.conv_redir(out_conv3a)
116 |
117 | in_conv3_1 = torch.cat((out_conv_redir, out_corr), 1)
118 |
119 | # Merged conv layers
120 | out_conv3_1 = self.conv3_1(in_conv3_1)
121 | out_conv4 = self.conv4_1(self.conv4(out_conv3_1))
122 | out_conv5 = self.conv5_1(self.conv5(out_conv4))
123 | out_conv6 = self.conv6_1(self.conv6(out_conv5))
124 |
125 | flow6 = self.predict_flow6(out_conv6)
126 | out_deconv5 = self.deconv5(out_conv6)
127 | flow6_up = self.upsampled_flow6_to_5(flow6)
128 |
129 | concat5 = torch.cat((out_conv5,out_deconv5,flow6_up),1)
130 |
131 | flow5 = self.predict_flow5(concat5)
132 | out_deconv4 = self.deconv4(concat5)
133 | flow5_up = self.upsampled_flow5_to_4(flow5)
134 | concat4 = torch.cat((out_conv4,out_deconv4,flow5_up),1)
135 |
136 | flow4 = self.predict_flow4(concat4)
137 | out_deconv3 = self.deconv3(concat4)
138 | flow4_up = self.upsampled_flow4_to_3(flow4)
139 | concat3 = torch.cat((out_conv3_1,out_deconv3,flow4_up),1)
140 |
141 | flow3 = self.predict_flow3(concat3)
142 | out_deconv2 = self.deconv2(concat3)
143 | flow3_up = self.upsampled_flow3_to_2(flow3)
144 | concat2 = torch.cat((out_conv2a,out_deconv2,flow3_up),1)
145 |
146 | flow2 = self.predict_flow2(concat2)
147 | out_deconv1 = self.deconv1(concat2)
148 | flow2_up = self.upsampled_flow2_to_1(flow2)
149 | concat1 = torch.cat((out_conv1a,out_deconv1,flow2_up), 1)
150 |
151 | flow1 = self.predict_flow1(concat1)
152 | #out_convs = [out_conv2a, out_conv2b, out_conv3a, out_conv3b]
153 | if self.full_res:
154 | flow1 = self.div_flow*self.upsample1(flow1)
155 | flow2 = self.div_flow*self.upsample1(flow2)
156 | flow3 = self.div_flow*self.upsample1(flow3)
157 | flow4 = self.div_flow*self.upsample1(flow4)
158 | flow5 = self.div_flow*self.upsample1(flow5)
159 | flow6 = self.div_flow*self.upsample1(flow6)
160 |
161 | if self.training:
162 | return flow1, flow2,flow3,flow4,flow5,flow6 #, out_convs
163 | else:
164 | return flow1
165 |
--------------------------------------------------------------------------------
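A forward-pass sketch for FlowNetC6 with random image pairs; it assumes the spatial_correlation_sampler package used by correlate() above is installed.

    import torch
    from models.FlowNetC6 import FlowNetC6

    flow_net = FlowNetC6(full_res=True).eval()

    img1 = torch.randn(1, 3, 256, 832)
    img2 = torch.randn(1, 3, 256, 832)
    with torch.no_grad():
        flow = flow_net(img1, img2)   # eval mode -> only the finest flow, scaled by div_flow
    print(flow.shape)                 # (1, 2, 256, 832) with full_res=True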
/models/MaskNet6.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 |
5 | def conv(in_planes, out_planes, kernel_size=3):
6 | return nn.Sequential(
7 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=2),
8 | nn.ReLU(inplace=True)
9 | )
10 |
11 |
12 | def upconv(in_planes, out_planes):
13 | return nn.Sequential(
14 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1),
15 | nn.ReLU(inplace=True)
16 | )
17 |
18 |
19 | class MaskNet6(nn.Module):
20 |
21 | def __init__(self, nb_ref_imgs=4, output_exp=True):
22 | super(MaskNet6, self).__init__()
23 | self.nb_ref_imgs = nb_ref_imgs
24 | self.output_exp = output_exp
25 |
26 | conv_planes = [16, 32, 64, 128, 256, 256, 256, 256]
27 | self.conv1 = conv(3*(1+self.nb_ref_imgs), conv_planes[0], kernel_size=7)
28 | self.conv2 = conv(conv_planes[0], conv_planes[1], kernel_size=5)
29 | self.conv3 = conv(conv_planes[1], conv_planes[2])
30 | self.conv4 = conv(conv_planes[2], conv_planes[3])
31 | self.conv5 = conv(conv_planes[3], conv_planes[4])
32 | self.conv6 = conv(conv_planes[4], conv_planes[5])
33 | #self.conv7 = conv(conv_planes[5], conv_planes[6])
34 | #self.conv8 = conv(conv_planes[6], conv_planes[7])
35 |
36 | #self.pose_pred = nn.Conv2d(conv_planes[7], 6*self.nb_ref_imgs, kernel_size=1, padding=0)
37 |
38 | if self.output_exp:
39 | upconv_planes = [256, 256, 128, 64, 32, 16]
40 | self.deconv6 = upconv(conv_planes[5], upconv_planes[0])
41 | self.deconv5 = upconv(upconv_planes[0]+conv_planes[4], upconv_planes[1])
42 | self.deconv4 = upconv(upconv_planes[1]+conv_planes[3], upconv_planes[2])
43 | self.deconv3 = upconv(upconv_planes[2]+conv_planes[2], upconv_planes[3])
44 | self.deconv2 = upconv(upconv_planes[3]+conv_planes[1], upconv_planes[4])
45 | self.deconv1 = upconv(upconv_planes[4]+conv_planes[0], upconv_planes[5])
46 |
47 | self.pred_mask6 = nn.Conv2d(upconv_planes[0], self.nb_ref_imgs, kernel_size=3, padding=1)
48 | self.pred_mask5 = nn.Conv2d(upconv_planes[1], self.nb_ref_imgs, kernel_size=3, padding=1)
49 | self.pred_mask4 = nn.Conv2d(upconv_planes[2], self.nb_ref_imgs, kernel_size=3, padding=1)
50 | self.pred_mask3 = nn.Conv2d(upconv_planes[3], self.nb_ref_imgs, kernel_size=3, padding=1)
51 | self.pred_mask2 = nn.Conv2d(upconv_planes[4], self.nb_ref_imgs, kernel_size=3, padding=1)
52 | self.pred_mask1 = nn.Conv2d(upconv_planes[5], self.nb_ref_imgs, kernel_size=3, padding=1)
53 |
54 | def init_weights(self):
55 | for m in self.modules():
56 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
57 | nn.init.xavier_uniform(m.weight.data)
58 | if m.bias is not None:
59 | m.bias.data.zero_()
60 |
61 | def init_mask_weights(self):
62 | for m in self.modules():
63 | if isinstance(m, nn.ConvTranspose2d):
64 | nn.init.xavier_uniform(m.weight.data)
65 | if m.bias is not None:
66 | m.bias.data.zero_()
67 |
68 | for module in [self.pred_mask1, self.pred_mask2, self.pred_mask3, self.pred_mask4, self.pred_mask5, self.pred_mask6]:
69 | for m in module.modules():
70 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
71 | nn.init.xavier_uniform(m.weight.data)
72 | if m.bias is not None:
73 | m.bias.data.zero_()
74 |
75 | # for mod in [self.conv1, self.conv2, self.conv3, self.conv4, self.conv5, self.conv6, self.conv7, self.conv8, self.pose_pred]:
76 | # for fparams in mod.parameters():
77 | # fparams.requires_grad = False
78 |
79 |
80 | def forward(self, target_image, ref_imgs):
81 | assert(len(ref_imgs) == self.nb_ref_imgs)
82 | input = [target_image]
83 | input.extend(ref_imgs)
84 | input = torch.cat(input, 1)
85 | out_conv1 = self.conv1(input)
86 | out_conv2 = self.conv2(out_conv1)
87 | out_conv3 = self.conv3(out_conv2)
88 | out_conv4 = self.conv4(out_conv3)
89 | out_conv5 = self.conv5(out_conv4)
90 | out_conv6 = self.conv6(out_conv5)
91 | #out_conv7 = self.conv7(out_conv6)
92 | #out_conv8 = self.conv8(out_conv7)
93 |
94 | #pose = self.pose_pred(out_conv8)
95 | #pose = pose.mean(3).mean(2)
96 | #pose = 0.01 * pose.view(pose.size(0), self.nb_ref_imgs, 6)
97 |
98 | if self.output_exp:
99 | out_upconv6 = self.deconv6(out_conv6 )#[:, :, 0:out_conv5.size(2), 0:out_conv5.size(3)]
100 | out_upconv5 = self.deconv5(torch.cat((out_upconv6, out_conv5), 1))#[:, :, 0:out_conv4.size(2), 0:out_conv4.size(3)]
101 | out_upconv4 = self.deconv4(torch.cat((out_upconv5, out_conv4), 1))#[:, :, 0:out_conv3.size(2), 0:out_conv3.size(3)]
102 | out_upconv3 = self.deconv3(torch.cat((out_upconv4, out_conv3), 1))#[:, :, 0:out_conv2.size(2), 0:out_conv2.size(3)]
103 | out_upconv2 = self.deconv2(torch.cat((out_upconv3, out_conv2), 1))#[:, :, 0:out_conv1.size(2), 0:out_conv1.size(3)]
104 | out_upconv1 = self.deconv1(torch.cat((out_upconv2, out_conv1), 1))#[:, :, 0:input.size(2), 0:input.size(3)]
105 |
106 | exp_mask6 = nn.functional.sigmoid(self.pred_mask6(out_upconv6))
107 | exp_mask5 = nn.functional.sigmoid(self.pred_mask5(out_upconv5))
108 | exp_mask4 = nn.functional.sigmoid(self.pred_mask4(out_upconv4))
109 | exp_mask3 = nn.functional.sigmoid(self.pred_mask3(out_upconv3))
110 | exp_mask2 = nn.functional.sigmoid(self.pred_mask2(out_upconv2))
111 | exp_mask1 = nn.functional.sigmoid(self.pred_mask1(out_upconv1))
112 | else:
113 | exp_mask6 = None
114 | exp_mask5 = None
115 | exp_mask4 = None
116 | exp_mask3 = None
117 | exp_mask2 = None
118 | exp_mask1 = None
119 |
120 | if self.training:
121 | return exp_mask1, exp_mask2, exp_mask3, exp_mask4, exp_mask5, exp_mask6
122 | else:
123 | return exp_mask1
124 |
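A minimal usage sketch (not part of the file above; tensor sizes are assumptions). MaskNet6 concatenates the target frame with its reference frames along the channel axis, so conv1 expects 3*(1+nb_ref_imgs) input channels, and because the decoder concatenates skip connections without cropping, the input height and width should be divisible by 2^6 = 64. Importing from the models package also runs models/__init__.py, so the repository's requirements (including spatial-correlation-sampler) need to be installed.

import torch
from models.MaskNet6 import MaskNet6

mask_net = MaskNet6(nb_ref_imgs=4, output_exp=True)
mask_net.train()

tgt = torch.randn(1, 3, 256, 832)                        # target frame
refs = [torch.randn(1, 3, 256, 832) for _ in range(4)]   # nb_ref_imgs reference frames

masks = mask_net(tgt, refs)    # training mode: six masks, finest first
print(masks[0].shape)          # torch.Size([1, 4, 256, 832]) -- one channel per reference frame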
--------------------------------------------------------------------------------
/models/MaskResNet6.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 |
5 | import torch
6 | import torch.nn as nn
7 |
8 | def conv3x3(in_planes, out_planes, stride=1):
9 | """3x3 convolution with padding"""
10 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
11 | padding=1, bias=False)
12 |
13 | def conv(in_planes, out_planes, kernel_size=3, stride=2):
14 | return nn.Sequential(
15 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=stride),
16 | nn.ReLU(inplace=True)
17 | )
18 |
19 |
20 | def upconv(in_planes, out_planes):
21 | return nn.Sequential(
22 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1),
23 | nn.ReLU(inplace=True)
24 | )
25 |
26 | class BasicBlock(nn.Module):
27 | expansion = 1
28 |
29 | def __init__(self, inplanes, planes, stride=1, downsample=None):
30 | super(BasicBlock, self).__init__()
31 | self.conv1 = conv3x3(inplanes, planes, stride)
32 | self.relu = nn.ReLU(inplace=True)
33 | self.conv2 = conv3x3(planes, planes)
34 | self.downsample = downsample
35 | self.stride = stride
36 |
37 | def forward(self, x):
38 | residual = x
39 |
40 | out = self.conv1(x)
41 | out = self.relu(out)
42 | out = self.conv2(out)
43 |
44 | if self.downsample is not None:
45 | residual = self.downsample(x)
46 |
47 | out += residual
48 | out = self.relu(out)
49 |
50 | return out
51 |
52 | def make_layer(inplanes, block, planes, blocks, stride=1):
53 | downsample = None
54 | if stride != 1 or inplanes != planes * block.expansion:
55 | downsample = nn.Sequential(
56 | nn.Conv2d(inplanes, planes * block.expansion,
57 | kernel_size=1, stride=stride, bias=False),
58 | nn.BatchNorm2d(planes * block.expansion),
59 | )
60 |
61 | layers = []
62 | layers.append(block(inplanes, planes, stride, downsample))
63 | inplanes = planes * block.expansion
64 | for i in range(1, blocks):
65 | layers.append(block(inplanes, planes))
66 |
67 | return nn.Sequential(*layers)
68 |
69 | class MaskResNet6(nn.Module):
70 |
71 | def __init__(self, nb_ref_imgs=4, output_exp=True):
72 | super(MaskResNet6, self).__init__()
73 | self.nb_ref_imgs = nb_ref_imgs
74 | self.output_exp = output_exp
75 |
76 | conv_planes = [16, 32, 64, 128, 256, 256, 256, 256]
77 | self.conv1 = conv(3*(1+self.nb_ref_imgs), conv_planes[0], kernel_size=7, stride=2)
78 | self.conv2 = make_layer(conv_planes[0], BasicBlock, conv_planes[1], blocks=2, stride=2)
79 | self.conv3 = make_layer(conv_planes[1], BasicBlock, conv_planes[2], blocks=2, stride=2)
80 | self.conv4 = make_layer(conv_planes[2], BasicBlock, conv_planes[3], blocks=2, stride=2)
81 | self.conv5 = make_layer(conv_planes[3], BasicBlock, conv_planes[4], blocks=2, stride=2)
82 | self.conv6 = make_layer(conv_planes[4], BasicBlock, conv_planes[5], blocks=2, stride=2)
83 |
84 | if self.output_exp:
85 | upconv_planes = [256, 256, 128, 64, 32, 16]
86 | self.deconv6 = upconv(conv_planes[5], upconv_planes[0])
87 | self.deconv5 = upconv(upconv_planes[0]+conv_planes[4], upconv_planes[1])
88 | self.deconv4 = upconv(upconv_planes[1]+conv_planes[3], upconv_planes[2])
89 | self.deconv3 = upconv(upconv_planes[2]+conv_planes[2], upconv_planes[3])
90 | self.deconv2 = upconv(upconv_planes[3]+conv_planes[1], upconv_planes[4])
91 | self.deconv1 = upconv(upconv_planes[4]+conv_planes[0], upconv_planes[5])
92 |
93 | self.pred_mask6 = nn.Conv2d(upconv_planes[0], self.nb_ref_imgs, kernel_size=3, padding=1)
94 | self.pred_mask5 = nn.Conv2d(upconv_planes[1], self.nb_ref_imgs, kernel_size=3, padding=1)
95 | self.pred_mask4 = nn.Conv2d(upconv_planes[2], self.nb_ref_imgs, kernel_size=3, padding=1)
96 | self.pred_mask3 = nn.Conv2d(upconv_planes[3], self.nb_ref_imgs, kernel_size=3, padding=1)
97 | self.pred_mask2 = nn.Conv2d(upconv_planes[4], self.nb_ref_imgs, kernel_size=3, padding=1)
98 | self.pred_mask1 = nn.Conv2d(upconv_planes[5], self.nb_ref_imgs, kernel_size=3, padding=1)
99 |
100 | def init_weights(self):
101 | for m in self.modules():
102 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
103 | nn.init.xavier_uniform(m.weight.data)
104 | if m.bias is not None:
105 | m.bias.data.zero_()
106 |
107 | def init_mask_weights(self):
108 | for m in self.modules():
109 | if isinstance(m, nn.ConvTranspose2d):
110 | nn.init.xavier_uniform(m.weight.data)
111 | if m.bias is not None:
112 | m.bias.data.zero_()
113 |
114 | for module in [self.pred_mask1, self.pred_mask2, self.pred_mask3, self.pred_mask4, self.pred_mask5, self.pred_mask6]:
115 | for m in module.modules():
116 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
117 | nn.init.xavier_uniform(m.weight.data)
118 | if m.bias is not None:
119 | m.bias.data.zero_()
120 |
121 |
122 |
123 | def forward(self, target_image, ref_imgs):
124 | assert(len(ref_imgs) == self.nb_ref_imgs)
125 | input = [target_image]
126 | input.extend(ref_imgs)
127 | input = torch.cat(input, 1)
128 | out_conv1 = self.conv1(input)
129 | out_conv2 = self.conv2(out_conv1)
130 | out_conv3 = self.conv3(out_conv2)
131 | out_conv4 = self.conv4(out_conv3)
132 | out_conv5 = self.conv5(out_conv4)
133 | out_conv6 = self.conv6(out_conv5)
134 |
135 | if self.output_exp:
136 | out_upconv6 = self.deconv6(out_conv6 )#[:, :, 0:out_conv5.size(2), 0:out_conv5.size(3)]
137 | out_upconv5 = self.deconv5(torch.cat((out_upconv6, out_conv5), 1))#[:, :, 0:out_conv4.size(2), 0:out_conv4.size(3)]
138 | out_upconv4 = self.deconv4(torch.cat((out_upconv5, out_conv4), 1))#[:, :, 0:out_conv3.size(2), 0:out_conv3.size(3)]
139 | out_upconv3 = self.deconv3(torch.cat((out_upconv4, out_conv3), 1))#[:, :, 0:out_conv2.size(2), 0:out_conv2.size(3)]
140 | out_upconv2 = self.deconv2(torch.cat((out_upconv3, out_conv2), 1))#[:, :, 0:out_conv1.size(2), 0:out_conv1.size(3)]
141 | out_upconv1 = self.deconv1(torch.cat((out_upconv2, out_conv1), 1))#[:, :, 0:input.size(2), 0:input.size(3)]
142 |
143 | exp_mask6 = nn.functional.sigmoid(self.pred_mask6(out_upconv6))
144 | exp_mask5 = nn.functional.sigmoid(self.pred_mask5(out_upconv5))
145 | exp_mask4 = nn.functional.sigmoid(self.pred_mask4(out_upconv4))
146 | exp_mask3 = nn.functional.sigmoid(self.pred_mask3(out_upconv3))
147 | exp_mask2 = nn.functional.sigmoid(self.pred_mask2(out_upconv2))
148 | exp_mask1 = nn.functional.sigmoid(self.pred_mask1(out_upconv1))
149 | else:
150 | exp_mask6 = None
151 | exp_mask5 = None
152 | exp_mask4 = None
153 | exp_mask3 = None
154 | exp_mask2 = None
155 | exp_mask1 = None
156 |
157 | if self.training:
158 | return exp_mask1, exp_mask2, exp_mask3, exp_mask4, exp_mask5, exp_mask6
159 | else:
160 | return exp_mask1
161 |
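Illustrative sketch of how make_layer assembles one residual stage (not part of the file above; sizes are assumptions): when the stride is 2 or the channel count changes, the first BasicBlock receives a 1x1-conv + BatchNorm shortcut so the residual addition matches the downsampled output.

import torch
from models.MaskResNet6 import BasicBlock, make_layer

stage = make_layer(16, BasicBlock, 32, blocks=2, stride=2)  # first block downsamples and widens 16 -> 32
x = torch.randn(1, 16, 64, 208)
print(stage(x).shape)   # torch.Size([1, 32, 32, 104]): spatial size halved, channels doubled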
--------------------------------------------------------------------------------
/models/PoseExpNet.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 |
5 | def conv(in_planes, out_planes, kernel_size=3):
6 | return nn.Sequential(
7 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=2),
8 | nn.ReLU(inplace=True)
9 | )
10 |
11 |
12 | def upconv(in_planes, out_planes):
13 | return nn.Sequential(
14 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1),
15 | nn.ReLU(inplace=True)
16 | )
17 |
18 |
19 | class PoseExpNet(nn.Module):
20 |
21 | def __init__(self, nb_ref_imgs=2, output_exp=False):
22 | super(PoseExpNet, self).__init__()
23 | self.nb_ref_imgs = nb_ref_imgs
24 | self.output_exp = output_exp
25 |
26 | conv_planes = [16, 32, 64, 128, 256, 256, 256]
27 | self.conv1 = conv(3*(1+self.nb_ref_imgs), conv_planes[0], kernel_size=7)
28 | self.conv2 = conv(conv_planes[0], conv_planes[1], kernel_size=5)
29 | self.conv3 = conv(conv_planes[1], conv_planes[2])
30 | self.conv4 = conv(conv_planes[2], conv_planes[3])
31 | self.conv5 = conv(conv_planes[3], conv_planes[4])
32 | self.conv6 = conv(conv_planes[4], conv_planes[5])
33 | self.conv7 = conv(conv_planes[5], conv_planes[6])
34 |
35 | self.pose_pred = nn.Conv2d(conv_planes[6], 6*self.nb_ref_imgs, kernel_size=1, padding=0)
36 |
37 | if self.output_exp:
38 | upconv_planes = [256, 128, 64, 32, 16]
39 | self.upconv5 = upconv(conv_planes[4], upconv_planes[0])
40 | self.upconv4 = upconv(upconv_planes[0], upconv_planes[1])
41 | self.upconv3 = upconv(upconv_planes[1], upconv_planes[2])
42 | self.upconv2 = upconv(upconv_planes[2], upconv_planes[3])
43 | self.upconv1 = upconv(upconv_planes[3], upconv_planes[4])
44 |
45 | self.predict_mask4 = nn.Conv2d(upconv_planes[1], self.nb_ref_imgs, kernel_size=3, padding=1)
46 | self.predict_mask3 = nn.Conv2d(upconv_planes[2], self.nb_ref_imgs, kernel_size=3, padding=1)
47 | self.predict_mask2 = nn.Conv2d(upconv_planes[3], self.nb_ref_imgs, kernel_size=3, padding=1)
48 | self.predict_mask1 = nn.Conv2d(upconv_planes[4], self.nb_ref_imgs, kernel_size=3, padding=1)
49 |
50 | def init_weights(self):
51 | for m in self.modules():
52 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
53 | nn.init.xavier_uniform(m.weight.data)
54 | if m.bias is not None:
55 | m.bias.data.zero_()
56 |
57 | def forward(self, target_image, ref_imgs):
58 | assert(len(ref_imgs) == self.nb_ref_imgs)
59 | input = [target_image]
60 | input.extend(ref_imgs)
61 | input = torch.cat(input, 1)
62 | out_conv1 = self.conv1(input)
63 | out_conv2 = self.conv2(out_conv1)
64 | out_conv3 = self.conv3(out_conv2)
65 | out_conv4 = self.conv4(out_conv3)
66 | out_conv5 = self.conv5(out_conv4)
67 | out_conv6 = self.conv6(out_conv5)
68 | out_conv7 = self.conv7(out_conv6)
69 |
70 | pose = self.pose_pred(out_conv7)
71 | pose = pose.mean(3).mean(2)
72 | pose = 0.01 * pose.view(pose.size(0), self.nb_ref_imgs, 6)
73 |
74 | if self.output_exp:
75 | out_upconv5 = self.upconv5(out_conv5 )[:, :, 0:out_conv4.size(2), 0:out_conv4.size(3)]
76 | out_upconv4 = self.upconv4(out_upconv5)[:, :, 0:out_conv3.size(2), 0:out_conv3.size(3)]
77 | out_upconv3 = self.upconv3(out_upconv4)[:, :, 0:out_conv2.size(2), 0:out_conv2.size(3)]
78 | out_upconv2 = self.upconv2(out_upconv3)[:, :, 0:out_conv1.size(2), 0:out_conv1.size(3)]
79 | out_upconv1 = self.upconv1(out_upconv2)[:, :, 0:input.size(2), 0:input.size(3)]
80 |
81 | exp_mask4 = nn.functional.sigmoid(self.predict_mask4(out_upconv4))
82 | exp_mask3 = nn.functional.sigmoid(self.predict_mask3(out_upconv3))
83 | exp_mask2 = nn.functional.sigmoid(self.predict_mask2(out_upconv2))
84 | exp_mask1 = nn.functional.sigmoid(self.predict_mask1(out_upconv1))
85 | else:
86 | exp_mask4 = None
87 | exp_mask3 = None
88 | exp_mask2 = None
89 | exp_mask1 = None
90 |
91 | if self.training:
92 | return [exp_mask1, exp_mask2, exp_mask3, exp_mask4], pose
93 | else:
94 | return exp_mask1, pose
95 |
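Illustrative sketch (not part of the file above; input sizes are assumptions): the pose head averages the last feature map spatially and emits one 6-DoF motion vector (3 translation and 3 rotation parameters, scaled by 0.01) per reference image.

import torch
from models.PoseExpNet import PoseExpNet

pose_exp_net = PoseExpNet(nb_ref_imgs=2, output_exp=False)
pose_exp_net.eval()

tgt = torch.randn(1, 3, 128, 416)
refs = [torch.randn(1, 3, 128, 416) for _ in range(2)]

exp_mask, pose = pose_exp_net(tgt, refs)   # exp_mask is None when output_exp=False
print(pose.shape)                          # torch.Size([1, 2, 6]): one 6-DoF motion per reference frame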
--------------------------------------------------------------------------------
/models/PoseNet6.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 |
5 | def conv(in_planes, out_planes, kernel_size=3):
6 | return nn.Sequential(
7 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=2),
8 | nn.ReLU(inplace=True)
9 | )
10 |
11 |
12 | def upconv(in_planes, out_planes):
13 | return nn.Sequential(
14 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1),
15 | nn.ReLU(inplace=True)
16 | )
17 |
18 |
19 | class PoseNet6(nn.Module):
20 |
21 | def __init__(self, nb_ref_imgs=2):
22 | super(PoseNet6, self).__init__()
23 | self.nb_ref_imgs = nb_ref_imgs
24 |
25 | conv_planes = [16, 32, 64, 128, 256, 256, 256]
26 | self.conv0 = conv(3*(1+self.nb_ref_imgs), 3*(1+self.nb_ref_imgs), kernel_size=3)
27 | self.conv1 = conv(3*(1+self.nb_ref_imgs), conv_planes[0], kernel_size=7)
28 | self.conv2 = conv(conv_planes[0], conv_planes[1], kernel_size=5)
29 | self.conv3 = conv(conv_planes[1], conv_planes[2])
30 | self.conv4 = conv(conv_planes[2], conv_planes[3])
31 | self.conv5 = conv(conv_planes[3], conv_planes[4])
32 | self.conv6 = conv(conv_planes[4], conv_planes[5])
33 | self.conv7 = conv(conv_planes[5], conv_planes[6])
34 |
35 | self.pose_pred = nn.Conv2d(conv_planes[6], 6*self.nb_ref_imgs, kernel_size=1, padding=0)
36 |
37 | def init_weights(self):
38 | for m in self.modules():
39 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
40 | nn.init.xavier_uniform(m.weight.data)
41 | if m.bias is not None:
42 | m.bias.data.zero_()
43 |
44 | def forward(self, target_image, ref_imgs):
45 | assert(len(ref_imgs) == self.nb_ref_imgs)
46 | input = [target_image]
47 | input.extend(ref_imgs)
48 | input = torch.cat(input, 1)
49 | out_conv0 = self.conv0(input)
50 | out_conv1 = self.conv1(out_conv0)
51 | out_conv2 = self.conv2(out_conv1)
52 | out_conv3 = self.conv3(out_conv2)
53 | out_conv4 = self.conv4(out_conv3)
54 | out_conv5 = self.conv5(out_conv4)
55 | out_conv6 = self.conv6(out_conv5)
56 | out_conv7 = self.conv7(out_conv6)
57 |
58 | pose = self.pose_pred(out_conv7)
59 | pose = pose.mean(3).mean(2)
60 | pose = 0.01 * pose.view(pose.size(0), self.nb_ref_imgs, 6)
61 |
62 | return pose
63 |
--------------------------------------------------------------------------------
/models/PoseNetB6.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 | # based on github.com/ClementPinard/SfMLearner-Pytorch
5 |
6 | import torch
7 | import torch.nn as nn
8 |
9 |
10 | def conv(in_planes, out_planes, kernel_size=3):
11 | return nn.Sequential(
12 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, padding=(kernel_size-1)//2, stride=2),
13 | nn.ReLU(inplace=True)
14 | )
15 |
16 |
17 | def upconv(in_planes, out_planes):
18 | return nn.Sequential(
19 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1),
20 | nn.ReLU(inplace=True)
21 | )
22 |
23 |
24 | class PoseNetB6(nn.Module):
25 |
26 | def __init__(self, nb_ref_imgs=2):
27 | super(PoseNetB6, self).__init__()
28 | self.nb_ref_imgs = nb_ref_imgs
29 |
30 | conv_planes = [16, 32, 64, 128, 256, 256, 256, 256]
31 | self.conv1 = conv(3*(1+self.nb_ref_imgs), conv_planes[0], kernel_size=7)
32 | self.conv2 = conv(conv_planes[0], conv_planes[1], kernel_size=5)
33 | self.conv3 = conv(conv_planes[1], conv_planes[2])
34 | self.conv4 = conv(conv_planes[2], conv_planes[3])
35 | self.conv5 = conv(conv_planes[3], conv_planes[4])
36 | self.conv6 = conv(conv_planes[4], conv_planes[5])
37 | self.conv7 = conv(conv_planes[5], conv_planes[6])
38 | self.conv8 = conv(conv_planes[6], conv_planes[7])
39 |
40 | self.pose_pred = nn.Conv2d(conv_planes[7], 6*self.nb_ref_imgs, kernel_size=1, padding=0)
41 |
42 |
43 | def init_weights(self):
44 | for m in self.modules():
45 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
46 | nn.init.xavier_uniform(m.weight.data)
47 | if m.bias is not None:
48 | m.bias.data.zero_()
49 |
50 | def init_mask_weights(self):
51 | for m in self.modules():
52 | if isinstance(m, nn.ConvTranspose2d):
53 | nn.init.xavier_uniform(m.weight.data)
54 | if m.bias is not None:
55 | m.bias.data.zero_()
56 |         # NOTE: PoseNetB6 defines no pred_mask layers; the loop below would raise an AttributeError if this method were called, so it is left commented out.
57 |         # for module in [self.pred_mask1, self.pred_mask2, self.pred_mask3, self.pred_mask4, self.pred_mask5, self.pred_mask6]:
58 |         #     for m in module.modules():
59 |         #         if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
60 |         #             nn.init.xavier_uniform(m.weight.data)
61 |         #             if m.bias is not None:
62 |         #                 m.bias.data.zero_()
63 |
64 |
65 | def forward(self, target_image, ref_imgs):
66 | assert(len(ref_imgs) == self.nb_ref_imgs)
67 | input = [target_image]
68 | input.extend(ref_imgs)
69 | input = torch.cat(input, 1)
70 | out_conv1 = self.conv1(input)
71 | out_conv2 = self.conv2(out_conv1)
72 | out_conv3 = self.conv3(out_conv2)
73 | out_conv4 = self.conv4(out_conv3)
74 | out_conv5 = self.conv5(out_conv4)
75 | out_conv6 = self.conv6(out_conv5)
76 | out_conv7 = self.conv7(out_conv6)
77 | out_conv8 = self.conv8(out_conv7)
78 |
79 | pose = self.pose_pred(out_conv8)
80 | pose = pose.mean(3).mean(2)
81 | pose = 0.01 * pose.view(pose.size(0), self.nb_ref_imgs, 6)
82 |
83 | return pose
84 |
--------------------------------------------------------------------------------
/models/__init__.py:
--------------------------------------------------------------------------------
1 | from .back2future import Model as Back2Future
2 | from .DispNetS import DispNetS
3 | from .DispNetS6 import DispNetS6
4 | from .DispResNet6 import DispResNet6
5 | from .DispResNetS6 import DispResNetS6
6 | from .FlowNetC6 import FlowNetC6
7 | from .MaskNet6 import MaskNet6
8 | from .MaskResNet6 import MaskResNet6
9 | from .PoseExpNet import PoseExpNet
10 | from .PoseNet6 import PoseNet6
11 | from .PoseNetB6 import PoseNetB6
12 |
--------------------------------------------------------------------------------
/models/submodules.py:
--------------------------------------------------------------------------------
1 | import torch.nn as nn
2 | import torch
3 | import numpy as np
4 |
5 | def conv(batchNorm, in_planes, out_planes, kernel_size=3, stride=1):
6 | if batchNorm:
7 | return nn.Sequential(
8 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2, bias=True),
9 | nn.BatchNorm2d(out_planes),
10 | #_leaky_relu()
11 | nn.LeakyReLU(0.1,inplace=True)
12 | )
13 | else:
14 | return nn.Sequential(
15 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2, bias=True),
16 | #_leaky_relu()
17 | nn.LeakyReLU(0.1,inplace=True)
18 | )
19 |
20 | def i_conv(batchNorm, in_planes, out_planes, kernel_size=3, stride=1, bias = True):
21 | if batchNorm:
22 | return nn.Sequential(
23 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2, bias=bias),
24 | nn.BatchNorm2d(out_planes),
25 | )
26 | else:
27 | return nn.Sequential(
28 | nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=(kernel_size-1)//2, bias=bias),
29 | )
30 |
31 | def predict_flow(in_planes):
32 | return nn.Conv2d(in_planes,2,kernel_size=3,stride=1,padding=1,bias=True)
33 |
34 | def deconv(in_planes, out_planes):
35 | return nn.Sequential(
36 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1, bias=True),
37 | #_leaky_relu()
38 | nn.LeakyReLU(0.1,inplace=True)
39 | )
40 |
41 | class tofp16(nn.Module):
42 | def __init__(self):
43 | super(tofp16, self).__init__()
44 |
45 | def forward(self, input):
46 | return input.half()
47 |
48 | class _leaky_relu(nn.Module):
49 | def __init__(self):
50 | super(_leaky_relu, self).__init__()
51 |
52 | def forward(self, x):
53 | x_neg = 0.1*x
54 | return torch.max(x_neg, x)
55 |
56 | class tofp32(nn.Module):
57 | def __init__(self):
58 | super(tofp32, self).__init__()
59 |
60 | def forward(self, input):
61 | return input.float()
62 |
63 |
64 | def save_grad(grads, name):
65 | def hook(grad):
66 | grads[name] = grad
67 | return hook
68 |
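Illustrative use of save_grad (not part of the file above): it returns a closure for Tensor.register_hook that stashes the gradient flowing into a tensor under the given key, which is handy for inspecting intermediate gradients.

import torch
from models.submodules import save_grad

grads = {}
x = torch.randn(2, 3, requires_grad=True)
y = 2 * x
y.register_hook(save_grad(grads, 'y'))   # stash dL/dy when backward() runs
y.sum().backward()
print(grads['y'])                        # ones of shape (2, 3), since d(sum)/dy = 1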
--------------------------------------------------------------------------------
/models/utils.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 | import math
3 |
4 | import torch
5 | import torch.nn as nn
6 |
7 |
8 | def conv(in_planes, out_planes, stride=1, batch_norm=False):
9 | if batch_norm:
10 | return nn.Sequential(
11 | nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False),
12 | nn.BatchNorm2d(out_planes, eps=1e-3),
13 | nn.ReLU(inplace=True)
14 | )
15 | else:
16 | return nn.Sequential(
17 | nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=True),
18 | nn.ReLU(inplace=True)
19 | )
20 |
21 |
22 | def deconv(in_planes, out_planes, batch_norm=False):
23 | if batch_norm:
24 | return nn.Sequential(
25 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1, bias=True),
26 | nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1, bias=False),
27 | nn.BatchNorm2d(out_planes, eps=1e-3),
28 | nn.ReLU(inplace=True)
29 | )
30 | else:
31 | return nn.Sequential(
32 | nn.ConvTranspose2d(in_planes, out_planes, kernel_size=4, stride=2, padding=1, bias=True),
33 | nn.Conv2d(out_planes, out_planes, kernel_size=3, stride=1, padding=1, bias=True),
34 | nn.ReLU(inplace=True)
35 | )
36 |
37 |
38 | def predict_depth(in_planes, with_confidence):
39 | return nn.Conv2d(in_planes, 2 if with_confidence else 1, kernel_size=3, stride=1, padding=1, bias=True)
40 |
41 |
42 | def post_process_depth(depth, activation_function=None, clamp=False):
43 | if activation_function is not None:
44 | depth = activation_function(depth)
45 |
46 | if clamp:
47 | depth = depth.clamp(10, 80)
48 |
49 | return depth[:,0]
50 |
51 |
52 | def adaptative_cat(out_conv, out_deconv, out_depth_up):
53 | out_deconv = out_deconv[:, :, :out_conv.size(2), :out_conv.size(3)]
54 | out_depth_up = out_depth_up[:, :, :out_conv.size(2), :out_conv.size(3)]
55 | return torch.cat((out_conv, out_deconv, out_depth_up), 1)
56 |
57 |
58 | def init_modules(net):
59 | for m in net.modules():
60 | if isinstance(m, nn.Conv2d) or isinstance(m, nn.ConvTranspose2d):
61 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
62 | m.weight.data.normal_(0, math.sqrt(2/n))
63 | if m.bias is not None:
64 | m.bias.data.zero_()
65 | elif isinstance(m, nn.BatchNorm2d):
66 | m.weight.data.fill_(1)
67 | m.bias.data.zero_()
68 |
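Illustrative sketch of adaptative_cat (not part of the file above; shapes are assumptions): it crops the deconvolution output and the upsampled depth map to the skip connection's spatial size before concatenating along channels, absorbing the off-by-one sizes transposed convolutions can produce.

import torch
from models.utils import adaptative_cat

skip  = torch.randn(1, 64, 13, 26)   # encoder feature used as skip connection
up    = torch.randn(1, 32, 14, 26)   # deconv output, one row too tall
depth = torch.randn(1, 1, 14, 26)    # upsampled depth prediction
print(adaptative_cat(skip, up, depth).shape)   # torch.Size([1, 97, 13, 26])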
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | torchvision
2 | scipy
3 | argparse
4 | tensorboardX
5 | blessings
6 | progressbar2
7 | path.py
8 | matplotlib
9 | opencv-python
10 | scikit-image
11 | pypng
12 | tqdm
13 | spatial-correlation-sampler
14 |
--------------------------------------------------------------------------------
/run_inference.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | from scipy.misc import imread, imsave, imresize
4 | import numpy as np
5 | from path import Path
6 | import argparse
7 | from tqdm import tqdm
8 |
9 | from models import DispNetS
10 | from utils import tensor2array
11 |
12 | parser = argparse.ArgumentParser(description='Inference script for DispNet trained with \
13 |     Structure from Motion Learner on the KITTI and CityScapes datasets',
14 | formatter_class=argparse.ArgumentDefaultsHelpFormatter)
15 | parser.add_argument("--output-disp", action='store_true', help="save disparity img")
16 | parser.add_argument("--output-depth", action='store_true', help="save depth img")
17 | parser.add_argument("--pretrained", required=True, type=str, help="pretrained DispNet path")
18 | parser.add_argument("--img-height", default=128, type=int, help="Image height")
19 | parser.add_argument("--img-width", default=416, type=int, help="Image width")
20 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done")
21 |
22 | parser.add_argument("--dataset-list", default=None, type=str, help="Dataset list file")
23 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory")
24 | parser.add_argument("--output-dir", default='output', type=str, help="Output directory")
25 |
26 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob")
27 |
28 |
29 | def main():
30 | args = parser.parse_args()
31 | if not(args.output_disp or args.output_depth):
32 |         print('You must output at least one of --output-disp or --output-depth!')
33 | return
34 |
35 | disp_net = DispNetS().cuda()
36 | weights = torch.load(args.pretrained)
37 | disp_net.load_state_dict(weights['state_dict'])
38 | disp_net.eval()
39 |
40 | dataset_dir = Path(args.dataset_dir)
41 | output_dir = Path(args.output_dir)
42 | output_dir.makedirs_p()
43 |
44 | if args.dataset_list is not None:
45 | with open(args.dataset_list, 'r') as f:
46 | test_files = [dataset_dir/file for file in f.read().splitlines()]
47 | else:
48 | test_files = sum([dataset_dir.files('*.{}'.format(ext)) for ext in args.img_exts], [])
49 |
50 | print('{} files to test'.format(len(test_files)))
51 |
52 | for file in tqdm(test_files):
53 |
54 | img = imread(file).astype(np.float32)
55 |
56 | h,w,_ = img.shape
57 | if (not args.no_resize) and (h != args.img_height or w != args.img_width):
58 | img = imresize(img, (args.img_height, args.img_width)).astype(np.float32)
59 | img = np.transpose(img, (2, 0, 1))
60 |
61 | tensor_img = torch.from_numpy(img).unsqueeze(0)
62 | tensor_img = ((tensor_img/255 - 0.5)/0.2).cuda()
63 | var_img = torch.autograd.Variable(tensor_img, volatile=True)
64 |
65 | output = disp_net(var_img).data.cpu()[0]
66 |
67 | if args.output_disp:
68 | disp = (255*tensor2array(output, max_value=None, colormap='bone')).astype(np.uint8)
69 | imsave(output_dir/'{}_disp{}'.format(file.namebase,file.ext), disp)
70 | if args.output_depth:
71 | depth = 1/output
72 | depth = (255*tensor2array(depth, max_value=10, colormap='rainbow')).astype(np.uint8)
73 | imsave(output_dir/'{}_depth{}'.format(file.namebase,file.ext), depth)
74 |
75 |
76 | if __name__ == '__main__':
77 | main()
78 |
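An illustrative invocation (all paths are placeholders): python run_inference.py --pretrained /path/to/dispnet_model_best.pth.tar --dataset-dir /path/to/test_images --output-dir output --output-disp --output-depth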
--------------------------------------------------------------------------------
/sintel_eval/pose_evaluation_utils.py:
--------------------------------------------------------------------------------
1 | # Mostly based on the code written by Clement Godard:
2 | # https://github.com/mrharicot/monodepth/blob/master/utils/evaluation_utils.py
3 | import numpy as np
4 | # import pandas as pd
5 | from path import Path
6 | from scipy.misc import imread
7 | from tqdm import tqdm
8 | from .sintel_io import cam_read
9 |
10 | class test_framework_Sintel(object):
11 | def __init__(self, root, sequence_set, seq_length=3, step=1):
12 | self.root = root
13 | self.img_files, self.poses, self.sample_indices = read_scene_data(self.root, sequence_set, seq_length, step)
14 |
15 | def generator(self):
16 | for img_list, pose_list, sample_list in zip(self.img_files, self.poses, self.sample_indices):
17 | for snippet_indices in sample_list:
18 | imgs = [imread(img_list[i]).astype(np.float32) for i in snippet_indices]
19 | poses = [cam_read(pose_list[i], pose_only=True).astype(np.float32) for i in snippet_indices]
20 | poses = np.stack(poses)
21 | first_pose = poses[0]
22 | poses[:,:,-1] -= first_pose[:,-1]
23 | compensated_poses = np.linalg.inv(first_pose[:,:3]) @ poses
24 |
25 | yield {'imgs': imgs,
26 | 'path': img_list[0],
27 | 'poses': compensated_poses
28 | }
29 |
30 | def __iter__(self):
31 | return self.generator()
32 |
33 | def __len__(self):
34 | return sum(len(imgs) for imgs in self.img_files)
35 |
36 |
37 | def read_scene_data(data_root, sequence_set, seq_length=3, step=1):
38 | data_root = Path(data_root)
39 | im_sequences = []
40 | poses_sequences = []
41 | indices_sequences = []
42 | demi_length = (seq_length - 1) // 2
43 | shift_range = np.array([step*i for i in range(-demi_length, demi_length + 1)]).reshape(1, -1)
44 |
45 | sequences = set()
46 | for seq in sequence_set:
47 | corresponding_dirs = set((data_root/'clean').dirs(seq))
48 | sequences = sequences | corresponding_dirs
49 |
50 |     print('getting test metadata for these sequences: {}'.format(sequences))
51 | for sequence in tqdm(sequences):
52 | poses = sorted(Path(sequence.replace('/clean/', '/camdata_left/')).files('*.cam'))
53 | # np.genfromtxt(data_root/'poses'/'{}.txt'.format(sequence.name)).astype(np.float64).reshape(-1, 3, 4)
54 | imgs = sorted(sequence.files('*.png'))
55 | # construct 5-snippet sequences
56 | tgt_indices = np.arange(demi_length, len(imgs) - demi_length).reshape(-1, 1)
57 | snippet_indices = shift_range + tgt_indices
58 | im_sequences.append(imgs)
59 | poses_sequences.append(poses)
60 | indices_sequences.append(snippet_indices)
61 | return im_sequences, poses_sequences, indices_sequences
62 |
--------------------------------------------------------------------------------
/sintel_eval/sintel_io.py:
--------------------------------------------------------------------------------
1 | #! /usr/bin/env python2
2 |
3 | """
4 | I/O script to save and load the data coming with the MPI-Sintel low-level
5 | computer vision benchmark.
6 |
7 | For more details about the benchmark, please visit www.mpi-sintel.de
8 |
9 | CHANGELOG:
10 | v1.0 (2015/02/03): First release
11 |
12 | Copyright (c) 2015 Jonas Wulff
13 | Max Planck Institute for Intelligent Systems, Tuebingen, Germany
14 |
15 | """
16 |
17 | # Requirements: Numpy and PIL/Pillow
18 | import numpy as np
19 | from PIL import Image
20 |
21 | # Check for endianness, based on Daniel Scharstein's optical flow code.
22 | # Using little-endian architecture, these two should be equal.
23 | TAG_FLOAT = 202021.25
24 | TAG_CHAR = 'PIEH'
25 |
26 | def flow_read(filename):
27 | """ Read optical flow from file, return (U,V) tuple.
28 |
29 | Original code by Deqing Sun, adapted from Daniel Scharstein.
30 | """
31 | f = open(filename,'rb')
32 | check = np.fromfile(f,dtype=np.float32,count=1)[0]
33 | assert check == TAG_FLOAT, ' flow_read:: Wrong tag in flow file (should be: {0}, is: {1}). Big-endian machine? '.format(TAG_FLOAT,check)
34 | width = np.fromfile(f,dtype=np.int32,count=1)[0]
35 | height = np.fromfile(f,dtype=np.int32,count=1)[0]
36 | size = width*height
37 | assert width > 0 and height > 0 and size > 1 and size < 100000000, ' flow_read:: Wrong input size (width = {0}, height = {1}).'.format(width,height)
38 | tmp = np.fromfile(f,dtype=np.float32,count=-1).reshape((height,width*2))
39 | u = tmp[:,np.arange(width)*2]
40 | v = tmp[:,np.arange(width)*2 + 1]
41 | return u,v
42 |
43 | def flow_write(filename,uv,v=None):
44 | """ Write optical flow to file.
45 |
46 | If v is None, uv is assumed to contain both u and v channels,
47 | stacked in depth.
48 |
49 | Original code by Deqing Sun, adapted from Daniel Scharstein.
50 | """
51 | nBands = 2
52 |
53 | if v is None:
54 | assert(uv.ndim == 3)
55 | assert(uv.shape[2] == 2)
56 | u = uv[:,:,0]
57 | v = uv[:,:,1]
58 | else:
59 | u = uv
60 |
61 | assert(u.shape == v.shape)
62 | height,width = u.shape
63 | f = open(filename,'wb')
64 | # write the header
65 | f.write(TAG_CHAR)
66 | np.array(width).astype(np.int32).tofile(f)
67 | np.array(height).astype(np.int32).tofile(f)
68 | # arrange into matrix form
69 | tmp = np.zeros((height, width*nBands))
70 | tmp[:,np.arange(width)*2] = u
71 | tmp[:,np.arange(width)*2 + 1] = v
72 | tmp.astype(np.float32).tofile(f)
73 | f.close()
74 |
75 |
76 | def depth_read(filename):
77 | """ Read depth data from file, return as numpy array. """
78 | f = open(filename,'rb')
79 | check = np.fromfile(f,dtype=np.float32,count=1)[0]
80 |     assert check == TAG_FLOAT, ' depth_read:: Wrong tag in depth file (should be: {0}, is: {1}). Big-endian machine? '.format(TAG_FLOAT,check)
81 | width = np.fromfile(f,dtype=np.int32,count=1)[0]
82 | height = np.fromfile(f,dtype=np.int32,count=1)[0]
83 | size = width*height
84 | assert width > 0 and height > 0 and size > 1 and size < 100000000, ' depth_read:: Wrong input size (width = {0}, height = {1}).'.format(width,height)
85 | depth = np.fromfile(f,dtype=np.float32,count=-1).reshape((height,width))
86 | return depth
87 |
88 | def depth_write(filename, depth):
89 | """ Write depth to file. """
90 | height,width = depth.shape[:2]
91 | f = open(filename,'wb')
92 | # write the header
93 | f.write(TAG_CHAR)
94 | np.array(width).astype(np.int32).tofile(f)
95 | np.array(height).astype(np.int32).tofile(f)
96 |
97 | depth.astype(np.float32).tofile(f)
98 | f.close()
99 |
100 |
101 | def disparity_write(filename,disparity,bitdepth=16):
102 | """ Write disparity to file.
103 |
104 | bitdepth can be either 16 (default) or 32.
105 |
106 | The maximum disparity is 1024, since the image width in Sintel
107 | is 1024.
108 | """
109 | d = disparity.copy()
110 |
111 | # Clip disparity.
112 | d[d>1024] = 1024
113 | d[d<0] = 0
114 |
115 | d_r = (d / 4.0).astype('uint8')
116 | d_g = ((d * (2.0**6)) % 256).astype('uint8')
117 |
118 | out = np.zeros((d.shape[0],d.shape[1],3),dtype='uint8')
119 | out[:,:,0] = d_r
120 | out[:,:,1] = d_g
121 |
122 | if bitdepth > 16:
123 | d_b = (d * (2**14) % 256).astype('uint8')
124 | out[:,:,2] = d_b
125 |
126 | Image.fromarray(out,'RGB').save(filename,'PNG')
127 |
128 |
129 | def disparity_read(filename):
130 | """ Return disparity read from filename. """
131 | f_in = np.array(Image.open(filename))
132 | d_r = f_in[:,:,0].astype('float64')
133 | d_g = f_in[:,:,1].astype('float64')
134 | d_b = f_in[:,:,2].astype('float64')
135 |
136 | depth = d_r * 4 + d_g / (2**6) + d_b / (2**14)
137 | return depth
138 |
139 |
140 | def cam_read(filename, pose_only=False):
141 | """ Read camera data, return (M,N) tuple.
142 |
143 | M is the intrinsic matrix, N is the extrinsic matrix, so that
144 |
145 | x = M*N*X,
146 | with x being a point in homogeneous image pixel coordinates, X being a
147 | point in homogeneous world coordinates.
148 | """
149 | f = open(filename,'rb')
150 | check = np.fromfile(f,dtype=np.float32,count=1)[0]
151 |     assert check == TAG_FLOAT, ' cam_read:: Wrong tag in cam file (should be: {0}, is: {1}). Big-endian machine? '.format(TAG_FLOAT,check)
152 | M = np.fromfile(f,dtype='float64',count=9).reshape((3,3))
153 | N = np.fromfile(f,dtype='float64',count=12).reshape((3,4))
154 | if pose_only:
155 | return N
156 | else:
157 | return M,N
158 |
159 | def cam_write(filename, M, N):
160 | """ Write intrinsic matrix M and extrinsic matrix N to file. """
161 | f = open(filename,'wb')
162 | # write the header
163 | f.write(TAG_CHAR)
164 | M.astype('float64').tofile(f)
165 | N.astype('float64').tofile(f)
166 | f.close()
167 |
168 |
169 | def segmentation_write(filename,segmentation):
170 | """ Write segmentation to file. """
171 |
172 | segmentation_ = segmentation.astype('int32')
173 | seg_r = np.floor(segmentation_ / (256**2)).astype('uint8')
174 | seg_g = np.floor((segmentation_ % (256**2)) / 256).astype('uint8')
175 | seg_b = np.floor(segmentation_ % 256).astype('uint8')
176 |
177 | out = np.zeros((segmentation.shape[0],segmentation.shape[1],3),dtype='uint8')
178 | out[:,:,0] = seg_r
179 | out[:,:,1] = seg_g
180 | out[:,:,2] = seg_b
181 |
182 | Image.fromarray(out,'RGB').save(filename,'PNG')
183 |
184 |
185 | def segmentation_read(filename):
186 |     """ Return segmentation read from filename. """
187 | f_in = np.array(Image.open(filename))
188 | seg_r = f_in[:,:,0].astype('int32')
189 | seg_g = f_in[:,:,1].astype('int32')
190 | seg_b = f_in[:,:,2].astype('int32')
191 |
192 | segmentation = (seg_r * 256 + seg_g) * 256 + seg_b
193 | return segmentation
194 |
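Illustrative round-trip through the Sintel disparity encoding (not part of the file above; the output path is a placeholder): the red channel stores the disparity in steps of 4 px and the green channel the remainder in steps of 1/64 px, so values quantised to 1/64 px survive a 16-bit write/read exactly. Note the module targets Python 2 (see the shebang); flow_write and cam_write write a str header that would need encoding to bytes under Python 3, but the PNG-based disparity functions used below work under either.

import numpy as np
from sintel_eval.sintel_io import disparity_write, disparity_read

disp = np.round(np.random.uniform(0, 100, size=(436, 1024)) * 64) / 64  # quantise to 1/64 px
disparity_write('/tmp/example_disp.png', disp)    # placeholder path
recovered = disparity_read('/tmp/example_disp.png')
assert np.allclose(disp, recovered)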
--------------------------------------------------------------------------------
/ssim.py:
--------------------------------------------------------------------------------
1 | # Author: Jonas Wulff
2 |
3 | import torch
4 | import torch.nn.functional as F
5 | from torch.autograd import Variable
6 | import numpy as np
7 | from math import exp
8 |
9 | def gaussian(window_size, sigma):
10 | gauss = torch.Tensor([exp(-(x - window_size//2)**2/float(2*sigma**2)) for x in range(window_size)])
11 | return gauss/gauss.sum()
12 |
13 | def create_window(window_size, channel):
14 | _1D_window = gaussian(window_size, 1.5).unsqueeze(1)
15 | _2D_window = _1D_window.mm(_1D_window.t()).float().unsqueeze(0).unsqueeze(0)
16 | window = Variable(_2D_window.expand(channel, 1, window_size, window_size).contiguous(), requires_grad=False)
17 | return window
18 |
19 | def _ssim(img1, img2, window, window_size, channel, size_average = True):
20 | mu1 = F.conv2d(img1, window, padding = window_size//2, groups = channel)
21 | mu2 = F.conv2d(img2, window, padding = window_size//2, groups = channel)
22 |
23 | mu1_sq = mu1.pow(2)
24 | mu2_sq = mu2.pow(2)
25 | mu1_mu2 = mu1*mu2
26 |
27 | sigma1_sq = F.conv2d(img1*img1, window, padding = window_size//2, groups = channel) - mu1_sq
28 | sigma2_sq = F.conv2d(img2*img2, window, padding = window_size//2, groups = channel) - mu2_sq
29 | sigma12 = F.conv2d(img1*img2, window, padding = window_size//2, groups = channel) - mu1_mu2
30 |
31 | C1 = 0.01**2
32 | C2 = 0.03**2
33 |
34 | ssim_map = ((2*mu1_mu2 + C1)*(2*sigma12 + C2))/((mu1_sq + mu2_sq + C1)*(sigma1_sq + sigma2_sq + C2))
35 |
36 | return ssim_map
37 | #if size_average:
38 | # return ssim_map.mean()
39 | #else:
40 | # return ssim_map.mean(1).mean(1).mean(1)
41 |
42 | class SSIM(torch.nn.Module):
43 | def __init__(self, window_size = 11, size_average = True):
44 | super(SSIM, self).__init__()
45 | self.window_size = window_size
46 | self.size_average = size_average
47 | self.channel = 1
48 | self.window = create_window(window_size, self.channel)
49 |
50 | def forward(self, img1, img2):
51 | (_, channel, _, _) = img1.size()
52 |
53 | if channel == self.channel and self.window.data.type() == img1.data.type():
54 | window = self.window
55 | else:
56 | window = create_window(self.window_size, channel)
57 |
58 | if img1.is_cuda:
59 | window = window.cuda(img1.get_device())
60 | window = window.type_as(img1)
61 |
62 | self.window = window
63 | self.channel = channel
64 |
65 |
66 | return _ssim(img1, img2, window, self.window_size, channel, self.size_average)
67 |
68 | def ssim(img1, img2, window_size = 13, size_average = True):
69 | (_, channel, _, _) = img1.size()
70 | window = create_window(window_size, channel)
71 |
72 | if img1.is_cuda:
73 | window = window.cuda(img1.get_device())
74 | window = window.type_as(img1)
75 |
76 | return _ssim(img1, img2, window, window_size, channel, size_average)
77 |
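Illustrative sketch (not part of the file above; image sizes are assumptions): _ssim returns the per-pixel SSIM map (the mean reduction is commented out), so callers average it themselves, for example as a photometric loss term.

import torch
from ssim import ssim

img1 = torch.rand(1, 3, 128, 416)      # values in [0, 1]
img2 = 0.9 * img1 + 0.05               # a slightly perturbed copy
ssim_map = ssim(img1, img2, window_size=11)
print(ssim_map.shape)                  # torch.Size([1, 3, 128, 416]): per-pixel SSIM
photometric_loss = (1 - ssim_map).mean()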
--------------------------------------------------------------------------------
/stillbox_eval/depth_evaluation_utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import json
3 | from path import Path
4 | from scipy.misc import imread
5 | from tqdm import tqdm
6 |
7 |
8 | class test_framework_stillbox(object):
9 | def __init__(self, root, test_files, seq_length=3, min_depth=1e-3, max_depth=80, step=1):
10 | self.root = root
11 | self.min_depth, self.max_depth = min_depth, max_depth
12 | self.gt_files, self.img_files, self.displacements = read_scene_data(self.root, test_files, seq_length, step)
13 |
14 | def __getitem__(self, i):
15 | tgt = imread(self.img_files[i][0]).astype(np.float32)
16 | depth = np.load(self.gt_files[i])
17 | return {'tgt': tgt,
18 | 'ref': [imread(img).astype(np.float32) for img in self.img_files[i][1]],
19 | 'path':self.img_files[i][0],
20 | 'gt_depth': depth,
21 | 'displacements': np.array(self.displacements[i]),
22 | 'mask': generate_mask(depth, self.min_depth, self.max_depth)
23 | }
24 |
25 | def __len__(self):
26 | return len(self.img_files)
27 |
28 |
29 | def get_displacements(scene, index, ref_indices):
30 | speed = np.around(np.linalg.norm(scene['speed']), decimals=3)
31 | assert(all(i < scene['length'] and i >= 0 for i in ref_indices)), str(ref_indices)
32 | return [speed*scene['time_step']*abs(index - i) for i in ref_indices]
33 |
34 |
35 | def read_scene_data(data_root, test_list, seq_length=3, step=1):
36 | data_root = Path(data_root)
37 | metadata_files = {}
38 | for folder in data_root.dirs():
39 | with open(folder/'metadata.json', 'r') as f:
40 | metadata_files[str(folder.name)] = json.load(f)
41 | gt_files = []
42 | im_files = []
43 | displacements = []
44 | demi_length = (seq_length - 1) // 2
45 | shift_range = [step*i for i in list(range(-demi_length,0)) + list(range(1, demi_length + 1))]
46 |
47 | print('getting test metadata ... ')
48 | for sample in tqdm(test_list):
49 | folder, file = sample.split('/')
50 | _, scene_index, index = file[:-4].split('_') # filename is in the form 'RGB_XXXX_XX.jpg'
51 | index = int(index)
52 | scene = metadata_files[folder]['scenes'][int(scene_index)]
53 | tgt_img_path = data_root/sample
54 | folder_path = data_root/folder
55 | if tgt_img_path.isfile():
56 | capped_indices_range = list(map(lambda x: min(max(0, index + x), scene['length'] - 1), shift_range))
57 | ref_imgs_path = [folder_path/'{}'.format(scene['imgs'][ref_index]) for ref_index in capped_indices_range]
58 |
59 | gt_files.append(folder_path/'{}'.format(scene['depth'][index]))
60 | im_files.append([tgt_img_path,ref_imgs_path])
61 | displacements.append(get_displacements(scene, index, capped_indices_range))
62 | else:
63 | print('{} missing'.format(tgt_img_path))
64 |
65 | return gt_files, im_files, displacements
66 |
67 |
68 | def generate_mask(gt_depth, min_depth, max_depth):
69 | mask = np.logical_and(gt_depth > min_depth,
70 | gt_depth < max_depth)
71 | # crop gt to exclude border values
72 |     # if used on gt_size 100x100 produces a crop of [5, 95, 5, 95]
73 | gt_height, gt_width = gt_depth.shape
74 | crop = np.array([0.05 * gt_height, 0.95 * gt_height,
75 | 0.05 * gt_width, 0.95 * gt_width]).astype(np.int32)
76 |
77 | crop_mask = np.zeros(mask.shape)
78 | crop_mask[crop[0]:crop[1],crop[2]:crop[3]] = 1
79 | mask = np.logical_and(mask, crop_mask)
80 | return mask
81 |
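Illustrative sketch of generate_mask (not part of the file above; the depth map is synthetic): pixels are kept when the ground-truth depth lies strictly between min_depth and max_depth, and a 5% border on every side is always zeroed. The module imports scipy.misc.imread at the top, so this assumes a scipy version that still provides it.

import numpy as np
from stillbox_eval.depth_evaluation_utils import generate_mask

gt_depth = np.random.uniform(0.5, 120.0, size=(100, 100))
mask = generate_mask(gt_depth, min_depth=1e-3, max_depth=80)
print(mask.shape, mask.dtype)   # (100, 100) bool
print(mask[:5].any())           # False: the border rows are masked out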
--------------------------------------------------------------------------------
/stillbox_eval/test_files_90.txt:
--------------------------------------------------------------------------------
1 | 15/RGB_112_008.jpg
2 | 15/RGB_178_002.jpg
3 | 15/RGB_167_006.jpg
4 | 15/RGB_153_007.jpg
5 | 15/RGB_119_002.jpg
6 | 15/RGB_135_003.jpg
7 | 15/RGB_44_006.jpg
8 | 15/RGB_32_002.jpg
9 | 15/RGB_171_001.jpg
10 | 15/RGB_114_009.jpg
11 | 15/RGB_89_003.jpg
12 | 15/RGB_197_009.jpg
13 | 15/RGB_105_000.jpg
14 | 15/RGB_72_004.jpg
15 | 15/RGB_66_003.jpg
16 | 15/RGB_25_007.jpg
17 | 15/RGB_58_004.jpg
18 | 15/RGB_28_003.jpg
19 | 15/RGB_25_004.jpg
20 | 15/RGB_140_003.jpg
21 | 15/RGB_59_008.jpg
22 | 15/RGB_19_001.jpg
23 | 15/RGB_186_003.jpg
24 | 15/RGB_113_009.jpg
25 | 15/RGB_54_002.jpg
26 | 15/RGB_130_003.jpg
27 | 15/RGB_153_003.jpg
28 | 15/RGB_103_007.jpg
29 | 15/RGB_04_007.jpg
30 | 15/RGB_110_008.jpg
31 | 15/RGB_78_005.jpg
32 | 15/RGB_26_005.jpg
33 | 15/RGB_43_007.jpg
34 | 15/RGB_190_003.jpg
35 | 15/RGB_122_002.jpg
36 | 15/RGB_102_008.jpg
37 | 15/RGB_187_004.jpg
38 | 15/RGB_03_005.jpg
39 | 15/RGB_58_007.jpg
40 | 15/RGB_37_004.jpg
41 | 15/RGB_125_003.jpg
42 | 15/RGB_190_002.jpg
43 | 15/RGB_52_006.jpg
44 | 15/RGB_37_005.jpg
45 | 15/RGB_196_001.jpg
46 | 15/RGB_53_003.jpg
47 | 15/RGB_129_008.jpg
48 | 15/RGB_74_003.jpg
49 | 15/RGB_167_000.jpg
50 | 15/RGB_195_002.jpg
51 | 15/RGB_10_007.jpg
52 | 15/RGB_131_003.jpg
53 | 15/RGB_37_003.jpg
54 | 15/RGB_38_009.jpg
55 | 15/RGB_115_004.jpg
56 | 15/RGB_91_008.jpg
57 | 15/RGB_43_004.jpg
58 | 15/RGB_187_005.jpg
59 | 15/RGB_112_003.jpg
60 | 15/RGB_19_002.jpg
61 | 15/RGB_170_008.jpg
62 | 15/RGB_17_000.jpg
63 | 15/RGB_62_005.jpg
64 | 15/RGB_148_004.jpg
65 | 15/RGB_12_008.jpg
66 | 15/RGB_169_004.jpg
67 | 15/RGB_112_004.jpg
68 | 15/RGB_71_001.jpg
69 | 15/RGB_103_001.jpg
70 | 15/RGB_178_005.jpg
71 | 15/RGB_92_006.jpg
72 | 15/RGB_40_009.jpg
73 | 15/RGB_138_006.jpg
74 | 15/RGB_146_005.jpg
75 | 15/RGB_04_006.jpg
76 | 15/RGB_02_008.jpg
77 | 15/RGB_101_009.jpg
78 | 15/RGB_103_009.jpg
79 | 15/RGB_21_002.jpg
80 | 15/RGB_144_008.jpg
81 | 15/RGB_163_007.jpg
82 | 15/RGB_06_001.jpg
83 | 15/RGB_105_004.jpg
84 | 15/RGB_199_009.jpg
85 | 15/RGB_149_005.jpg
86 | 15/RGB_63_008.jpg
87 | 15/RGB_21_004.jpg
88 | 15/RGB_03_002.jpg
89 | 15/RGB_51_008.jpg
90 | 15/RGB_110_001.jpg
91 | 15/RGB_172_009.jpg
92 | 15/RGB_158_005.jpg
93 | 15/RGB_49_004.jpg
94 | 15/RGB_173_008.jpg
95 | 15/RGB_99_004.jpg
96 | 15/RGB_24_001.jpg
97 | 15/RGB_03_009.jpg
98 | 15/RGB_41_009.jpg
99 | 15/RGB_91_002.jpg
100 | 15/RGB_132_001.jpg
101 | 15/RGB_95_003.jpg
102 | 15/RGB_167_005.jpg
103 | 15/RGB_176_000.jpg
104 | 15/RGB_142_008.jpg
105 | 15/RGB_107_009.jpg
106 | 15/RGB_122_005.jpg
107 | 15/RGB_48_001.jpg
108 | 15/RGB_103_005.jpg
109 | 15/RGB_98_009.jpg
110 | 15/RGB_162_001.jpg
111 | 15/RGB_08_006.jpg
112 | 15/RGB_169_002.jpg
113 | 15/RGB_57_002.jpg
114 | 15/RGB_86_004.jpg
115 | 15/RGB_138_001.jpg
116 | 15/RGB_05_005.jpg
117 | 15/RGB_95_002.jpg
118 | 15/RGB_28_002.jpg
119 | 15/RGB_110_002.jpg
120 | 15/RGB_102_002.jpg
121 | 15/RGB_136_009.jpg
122 | 15/RGB_28_007.jpg
123 | 15/RGB_43_005.jpg
124 | 15/RGB_39_006.jpg
125 | 15/RGB_126_003.jpg
126 | 15/RGB_62_001.jpg
127 | 15/RGB_82_003.jpg
128 | 15/RGB_75_008.jpg
129 | 15/RGB_16_005.jpg
130 | 15/RGB_94_005.jpg
131 | 15/RGB_198_002.jpg
132 | 15/RGB_90_001.jpg
133 | 15/RGB_22_001.jpg
134 | 15/RGB_90_000.jpg
135 | 15/RGB_155_006.jpg
136 | 15/RGB_124_007.jpg
137 | 15/RGB_168_004.jpg
138 | 15/RGB_96_008.jpg
139 | 15/RGB_100_002.jpg
140 | 15/RGB_131_008.jpg
141 | 15/RGB_74_002.jpg
142 | 15/RGB_141_007.jpg
143 | 15/RGB_139_001.jpg
144 | 15/RGB_102_005.jpg
145 | 15/RGB_182_009.jpg
146 | 15/RGB_37_002.jpg
147 | 15/RGB_67_003.jpg
148 | 15/RGB_60_001.jpg
149 | 15/RGB_186_001.jpg
150 | 15/RGB_171_002.jpg
151 | 15/RGB_155_004.jpg
152 | 15/RGB_50_008.jpg
153 | 15/RGB_34_002.jpg
154 | 15/RGB_132_003.jpg
155 | 15/RGB_147_005.jpg
156 | 15/RGB_99_008.jpg
157 | 15/RGB_110_000.jpg
158 | 15/RGB_114_008.jpg
159 | 15/RGB_159_002.jpg
160 | 15/RGB_76_007.jpg
161 | 15/RGB_116_005.jpg
162 | 15/RGB_67_002.jpg
163 | 15/RGB_80_003.jpg
164 | 15/RGB_30_000.jpg
165 | 15/RGB_137_009.jpg
166 | 15/RGB_130_002.jpg
167 | 15/RGB_90_002.jpg
168 | 15/RGB_34_008.jpg
169 | 15/RGB_137_007.jpg
170 | 15/RGB_45_001.jpg
171 | 15/RGB_131_004.jpg
172 | 15/RGB_06_000.jpg
173 | 15/RGB_68_005.jpg
174 | 15/RGB_104_008.jpg
175 | 15/RGB_193_008.jpg
176 | 15/RGB_182_000.jpg
177 | 15/RGB_129_006.jpg
178 | 15/RGB_107_005.jpg
179 | 15/RGB_158_007.jpg
180 | 15/RGB_192_001.jpg
181 | 15/RGB_18_005.jpg
182 | 15/RGB_90_009.jpg
183 | 15/RGB_18_007.jpg
184 | 15/RGB_94_000.jpg
185 | 15/RGB_09_002.jpg
186 | 15/RGB_94_001.jpg
187 | 15/RGB_46_004.jpg
188 | 15/RGB_126_000.jpg
189 | 15/RGB_146_002.jpg
190 | 15/RGB_161_006.jpg
191 | 15/RGB_154_008.jpg
192 | 15/RGB_94_003.jpg
193 |
--------------------------------------------------------------------------------
/submit_flow.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import os
3 | from tqdm import tqdm
4 | import numpy as np
5 | from path import Path
6 | from tensorboardX import SummaryWriter
7 | import torch
8 | from torch.autograd import Variable
9 | import torch.nn as nn
10 |
11 | import custom_transforms
12 | from inverse_warp import pose2flow
13 | from datasets.validation_flow import KITTI2015Test
14 | import models
15 | from logger import AverageMeter
16 | from PIL import Image
17 | from torchvision.transforms import ToPILImage
18 | from flowutils.flowlib import flow_to_image
19 | from utils import tensor2array
20 | from loss_functions import compute_all_epes
21 | from flowutils import flow_io
22 |
23 |
24 | parser = argparse.ArgumentParser(description='Structure from Motion Learner training on KITTI and CityScapes Dataset',
25 | formatter_class=argparse.ArgumentDefaultsHelpFormatter)
26 | parser.add_argument('--kitti-dir', dest='kitti_dir', type=str, default='/ps/project/datasets/AllFlowData/kitti/kitti2015',
27 | help='Path to kitti2015 scene flow dataset for optical flow validation')
28 | parser.add_argument('--dispnet', dest='dispnet', type=str, default='DispResNet6', choices=['DispResNet6', 'DispNetS5', 'DispNetS6'],
29 | help='depth network architecture.')
30 | parser.add_argument('--posenet', dest='posenet', type=str, default='PoseNetB6', choices=['PoseNet6','PoseNetB6', 'PoseExpNet5', 'PoseExpNet6'],
31 |                     help='pose and explainability mask network architecture.')
32 | parser.add_argument('--masknet', dest='masknet', type=str, default='MaskNet6', choices=['MaskResNet6', 'MaskNet6', 'PoseExpNet5', 'PoseExpNet6'],
33 |                     help='explainability mask network architecture.')
34 | parser.add_argument('--flownet', dest='flownet', type=str, default='Back2Future', choices=['PWCNet','FlowNetS', 'Back2Future', 'FlowNetC5','FlowNetC6', 'SpyNet'],
35 |                     help='flow network architecture.')
36 |
37 | parser.add_argument('--DEBUG', action='store_true', help='DEBUG Mode')
38 | parser.add_argument('--THRESH', dest='THRESH', type=float, default=0.01, help='THRESH')
39 | parser.add_argument('--mu', dest='mu', type=float, default=1.0, help='mu')
40 | parser.add_argument('--pretrained-path', dest='pretrained_path', default=None, metavar='PATH', help='path to pre-trained dispnet model')
41 | parser.add_argument('--nlevels', dest='nlevels', type=int, default=6, help='number of levels in multiscale. Options: 4|5|6')
42 | parser.add_argument('--dataset', dest='dataset', default='kitti2015', help='dataset to generate the flow submission for')
43 | parser.add_argument('--output-dir', dest='output_dir', type=str, default=None, help='path to output directory')
44 |
45 |
46 | def main():
47 | global args
48 | args = parser.parse_args()
49 | args.pretrained_path = Path(args.pretrained_path)
50 |
51 | if args.output_dir is not None:
52 | args.output_dir = Path(args.output_dir)
53 | args.output_dir.makedirs_p()
54 |
55 | image_dir = args.output_dir/'images'
56 | mask_dir = args.output_dir/'mask'
57 | viz_dir = args.output_dir/'viz'
58 | testing_dir = args.output_dir/'testing'
59 | testing_dir_flo = args.output_dir/'testing_flo'
60 |
61 | image_dir.makedirs_p()
62 | mask_dir.makedirs_p()
63 | viz_dir.makedirs_p()
64 | testing_dir.makedirs_p()
65 | testing_dir_flo.makedirs_p()
66 |
67 | normalize = custom_transforms.Normalize(mean=[0.5, 0.5, 0.5],
68 | std=[0.5, 0.5, 0.5])
69 | flow_loader_h, flow_loader_w = 256, 832
70 | valid_flow_transform = custom_transforms.Compose([custom_transforms.Scale(h=flow_loader_h, w=flow_loader_w),
71 | custom_transforms.ArrayToTensor(), normalize])
72 |
73 | val_flow_set = KITTI2015Test(root=args.kitti_dir,
74 | sequence_length=5, transform=valid_flow_transform)
75 |
76 | if args.DEBUG:
77 | print("DEBUG MODE: Using Training Set")
78 | val_flow_set = KITTI2015Test(root=args.kitti_dir,
79 | sequence_length=5, transform=valid_flow_transform, phase='training')
80 |
81 | val_loader = torch.utils.data.DataLoader(val_flow_set, batch_size=1, shuffle=False,
82 | num_workers=2, pin_memory=True, drop_last=True)
83 |
84 | disp_net = getattr(models, args.dispnet)().cuda()
85 | pose_net = getattr(models, args.posenet)(nb_ref_imgs=4).cuda()
86 | mask_net = getattr(models, args.masknet)(nb_ref_imgs=4).cuda()
87 | flow_net = getattr(models, args.flownet)(nlevels=args.nlevels).cuda()
88 |
89 | dispnet_weights = torch.load(args.pretrained_path/'dispnet_model_best.pth.tar')
90 | posenet_weights = torch.load(args.pretrained_path/'posenet_model_best.pth.tar')
91 | masknet_weights = torch.load(args.pretrained_path/'masknet_model_best.pth.tar')
92 | flownet_weights = torch.load(args.pretrained_path/'flownet_model_best.pth.tar')
93 | disp_net.load_state_dict(dispnet_weights['state_dict'])
94 | pose_net.load_state_dict(posenet_weights['state_dict'])
95 | flow_net.load_state_dict(flownet_weights['state_dict'])
96 | mask_net.load_state_dict(masknet_weights['state_dict'])
97 |
98 | disp_net.eval()
99 | pose_net.eval()
100 | mask_net.eval()
101 | flow_net.eval()
102 |
103 | for i, (tgt_img, ref_imgs, intrinsics, intrinsics_inv, tgt_img_original) in enumerate(tqdm(val_loader)):
104 | tgt_img_var = Variable(tgt_img.cuda(), volatile=True)
105 | ref_imgs_var = [Variable(img.cuda(), volatile=True) for img in ref_imgs]
106 | intrinsics_var = Variable(intrinsics.cuda(), volatile=True)
107 | intrinsics_inv_var = Variable(intrinsics_inv.cuda(), volatile=True)
108 |
109 | disp = disp_net(tgt_img_var)
110 | depth = 1/disp
111 | pose = pose_net(tgt_img_var, ref_imgs_var)
112 | explainability_mask = mask_net(tgt_img_var, ref_imgs_var)
113 | if args.flownet=='Back2Future':
114 | flow_fwd, _, _ = flow_net(tgt_img_var, ref_imgs_var[1:3])
115 | else:
116 | flow_fwd = flow_net(tgt_img_var, ref_imgs_var[2])
117 | flow_cam = pose2flow(depth.squeeze(1), pose[:,2], intrinsics_var, intrinsics_inv_var)
118 |
119 | rigidity_mask = 1 - (1-explainability_mask[:,1])*(1-explainability_mask[:,2]).unsqueeze(1) > 0.5
120 |
121 | rigidity_mask_census_soft = (flow_cam - flow_fwd).abs()#.normalize()
122 | rigidity_mask_census_u = rigidity_mask_census_soft[:,0] < args.THRESH
123 | rigidity_mask_census_v = rigidity_mask_census_soft[:,1] < args.THRESH
124 | rigidity_mask_census = (rigidity_mask_census_u).type_as(flow_fwd) * (rigidity_mask_census_v).type_as(flow_fwd)
125 | rigidity_mask_combined = 1 - (1-rigidity_mask.type_as(explainability_mask))*(1-rigidity_mask_census.type_as(explainability_mask))
126 |
127 | _, _, h_pred, w_pred = flow_cam.size()
128 | _, _, h_gt, w_gt = tgt_img_original.size()
129 | rigidity_pred_mask = nn.functional.upsample(rigidity_mask_combined, size=(h_pred, w_pred), mode='bilinear')
130 |
131 | non_rigid_pred = (rigidity_pred_mask<=args.THRESH).type_as(flow_fwd).expand_as(flow_fwd) * flow_fwd
132 | rigid_pred = (rigidity_pred_mask>args.THRESH).type_as(flow_cam).expand_as(flow_cam) * flow_cam
133 | total_pred = non_rigid_pred + rigid_pred
134 |
135 | pred_fullres = nn.functional.upsample(total_pred, size=(h_gt, w_gt), mode='bilinear')
136 | pred_fullres[:,0,:,:] = pred_fullres[:,0,:,:] * (w_gt/w_pred)
137 | pred_fullres[:,1,:,:] = pred_fullres[:,1,:,:] * (h_gt/h_pred)
138 |
139 | flow_fwd_fullres = nn.functional.upsample(flow_fwd, size=(h_gt, w_gt), mode='bilinear')
140 | flow_fwd_fullres[:,0,:,:] = flow_fwd_fullres[:,0,:,:] * (w_gt/w_pred)
141 | flow_fwd_fullres[:,1,:,:] = flow_fwd_fullres[:,1,:,:] * (h_gt/h_pred)
142 |
143 | flow_cam_fullres = nn.functional.upsample(flow_cam, size=(h_gt, w_gt), mode='bilinear')
144 | flow_cam_fullres[:,0,:,:] = flow_cam_fullres[:,0,:,:] * (w_gt/w_pred)
145 | flow_cam_fullres[:,1,:,:] = flow_cam_fullres[:,1,:,:] * (h_gt/h_pred)
146 |
147 | tgt_img_np = tgt_img[0].numpy()
148 | rigidity_mask_combined_np = rigidity_mask_combined.cpu().data[0].numpy()
149 |
150 | if args.output_dir is not None:
151 | np.save(image_dir/str(i).zfill(3), tgt_img_np )
152 | np.save(mask_dir/str(i).zfill(3), rigidity_mask_combined_np)
153 | pred_u = pred_fullres[0][0].data.cpu().numpy()
154 | pred_v = pred_fullres[0][1].data.cpu().numpy()
155 | flow_io.flow_write_png(testing_dir/str(i).zfill(6)+'_10.png' ,u=pred_u, v=pred_v)
156 | flow_io.flow_write(testing_dir_flo/str(i).zfill(6)+'_10.flo' ,pred_u, pred_v)
157 |
158 |
159 |
160 | if (args.output_dir is not None):
161 | ind = int(i)
162 | tgt_img_viz = tensor2array(tgt_img[0].cpu())
163 | depth_viz = tensor2array(disp.data[0].cpu(), max_value=None, colormap='magma')
164 | mask_viz = tensor2array(rigidity_mask_combined.data[0].cpu(), max_value=1, colormap='magma')
165 | row2_viz = flow_to_image(np.hstack((tensor2array(flow_cam_fullres.data[0].cpu()),
166 | tensor2array(flow_fwd_fullres.data[0].cpu()),
167 | tensor2array(pred_fullres.data[0].cpu()) )) )
168 |
169 | row1_viz = np.hstack((tgt_img_viz, depth_viz, mask_viz))
170 |
171 | row1_viz_im = Image.fromarray((255*row1_viz.transpose(1,2,0)).astype('uint8'))
172 | row2_viz_im = Image.fromarray((255*row2_viz.transpose(1,2,0)).astype('uint8'))
173 |
174 | row1_viz_im.save(viz_dir/str(i).zfill(3)+'01.png')
175 | row2_viz_im.save(viz_dir/str(i).zfill(3)+'02.png')
176 |
177 | print("Done!")
178 | # print("\t {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10} ".format(*error_names))
179 | # print("Errors \t {:10.4f}, {:10.4f} {:10.4f}, {:10.4f} {:10.4f}, {:10.4f}".format(*errors.avg))
180 |
181 |
182 | if __name__ == '__main__':
183 | main()
184 |
--------------------------------------------------------------------------------
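Note: the script above fuses two rigidity cues (the explainability mask and the flow-consistency "census" mask) with a soft logical OR, then composites the camera-induced flow with the flow network's prediction. A minimal numpy sketch of that compositing rule, on toy shapes with hypothetical values (`THRESH` stands in for `args.THRESH`):

```python
import numpy as np

THRESH = 0.5                                  # plays the role of args.THRESH above
h, w = 4, 5
flow_cam = np.random.randn(2, h, w)           # rigid flow from depth + camera pose
flow_fwd = np.random.randn(2, h, w)           # flow network prediction
expl_mask = np.random.rand(1, h, w)           # explainability-based rigidity cue
census_mask = np.random.rand(1, h, w)         # flow-consistency cue

rigidity = 1 - (1 - expl_mask) * (1 - census_mask)    # soft OR of the two cues
total_pred = np.where(rigidity > THRESH, flow_cam, flow_fwd)
print(total_pred.shape)                       # (2, 4, 5)
```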
/test_back2future.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 |
5 | import argparse
6 | from loss_functions import compute_epe
7 | import custom_transforms
8 | from datasets.validation_flow import ValidationFlow, ValidationFlowKitti2012
9 | import torch
10 | from torch.autograd import Variable
11 | import models
12 | from logger import AverageMeter
13 | from loss_functions import compute_all_epes
14 | from tqdm import tqdm
15 |
16 | parser = argparse.ArgumentParser(description='Back2Future flow network evaluation on KITTI',
17 | formatter_class=argparse.ArgumentDefaultsHelpFormatter)
18 | parser.add_argument('--flownet', dest='flownet', type=str, default='Back2Future', choices=['Back2Future'],
19 |                     help='flow network architecture. Options: Back2Future')
20 | parser.add_argument('--nlevels', dest='nlevels', type=int, default=5,
21 | help='number of levels in multiscale. Options: 4|5|6')
22 | parser.add_argument('--pretrained-flow', dest='pretrained_flow', default=None, metavar='PATH',
23 | help='path to pre-trained Flow net model')
24 | parser.add_argument('--dataset', dest='dataset', default='kitti2015', choices=['kitti2015', 'kitti2012'],
25 |                     help='KITTI flow dataset to evaluate on')
26 |
27 |
28 | def main():
29 | global args
30 | args = parser.parse_args()
31 | normalize = custom_transforms.Normalize(mean=[0.5, 0.5, 0.5],
32 | std=[0.5, 0.5, 0.5])
33 | flow_loader_h, flow_loader_w = 256, 832
34 | valid_flow_transform = custom_transforms.Compose([custom_transforms.Scale(h=flow_loader_h, w=flow_loader_w),
35 | custom_transforms.ArrayToTensor(), normalize])
36 | if args.dataset == "kitti2015":
37 | val_flow_set = ValidationFlow(root='/home/anuragr/datasets/kitti/kitti2015',
38 | sequence_length=5, transform=valid_flow_transform)
39 | elif args.dataset == "kitti2012":
40 | val_flow_set = ValidationFlowKitti2012(root='/is/ps2/aranjan/AllFlowData/kitti/kitti2012',
41 | sequence_length=5, transform=valid_flow_transform)
42 |
43 | val_flow_loader = torch.utils.data.DataLoader(val_flow_set, batch_size=1, shuffle=False,
44 | num_workers=2, pin_memory=True, drop_last=True)
45 |
46 | flow_net = getattr(models, args.flownet)(nlevels=args.nlevels).cuda()
47 |
48 | if args.pretrained_flow:
49 | print("=> using pre-trained weights from {}".format(args.pretrained_flow))
50 | weights = torch.load(args.pretrained_flow)
51 | flow_net.load_state_dict(weights['state_dict'])#, strict=False)
52 |
53 | flow_net = flow_net.cuda()
54 | flow_net.eval()
55 | error_names = ['epe_total', 'epe_non_rigid', 'epe_rigid', 'outliers']
56 | errors = AverageMeter(i=len(error_names))
57 |
58 | for i, (tgt_img, ref_imgs, intrinsics, intrinsics_inv, flow_gt, obj_map) in enumerate(tqdm(val_flow_loader)):
59 | tgt_img_var = Variable(tgt_img.cuda(), volatile=True)
60 | if args.dataset=="kitti2015":
61 | ref_imgs_var = [Variable(img.cuda(), volatile=True) for img in ref_imgs]
62 | ref_img_var = ref_imgs_var[1:3]
63 | elif args.dataset=="kitti2012":
64 | ref_img_var = Variable(ref_imgs.cuda(), volatile=True)
65 |
66 | flow_gt_var = Variable(flow_gt.cuda(), volatile=True)
67 | # compute output
68 | flow_fwd, flow_bwd, occ = flow_net(tgt_img_var, ref_img_var)
69 | #epe = compute_epe(gt=flow_gt_var, pred=flow_fwd)
70 | obj_map_gt_var = Variable(obj_map.cuda(), volatile=True)
71 | obj_map_gt_var_expanded = obj_map_gt_var.unsqueeze(1).type_as(flow_fwd)
72 |
73 | epe = compute_all_epes(flow_gt_var, flow_fwd, flow_fwd, (1-obj_map_gt_var_expanded) )
74 | #print(i, epe)
75 | errors.update(epe)
76 |
77 | print("Averge EPE",errors.avg )
78 |
79 |
80 |
81 | if __name__ == '__main__':
82 | main()
83 |
--------------------------------------------------------------------------------
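Note: `compute_epe` / `compute_all_epes` report the end-point error (EPE), the per-pixel Euclidean distance between ground-truth and predicted flow, averaged over valid pixels. A self-contained sketch of that metric (the helper name `epe` and the shapes are illustrative, not the repo's API):

```python
import torch

def epe(flow_gt, flow_pred, valid=None):
    # flow_*: [B, 2, H, W]; valid: optional [B, 1, H, W] mask of annotated pixels
    dist = torch.norm(flow_gt - flow_pred, p=2, dim=1, keepdim=True)
    if valid is None:
        return dist.mean()
    return (dist * valid).sum() / valid.sum().clamp(min=1)

gt = torch.randn(1, 2, 8, 8)
pred = gt + 0.1 * torch.randn_like(gt)
print(epe(gt, pred).item())   # small value, on the order of 0.1
```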
/test_disp.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.autograd import Variable
3 | from PIL import Image
4 | from scipy import interpolate
5 | from scipy.misc import imresize
6 | from scipy.ndimage.interpolation import zoom
7 | import numpy as np
8 | from path import Path
9 | import argparse
10 | from tqdm import tqdm
11 | from utils import tensor2array
12 | import models
13 | from loss_functions import spatial_normalize
14 |
15 | parser = argparse.ArgumentParser(description='Script for DispNet testing with corresponding groundTruth',
16 | formatter_class=argparse.ArgumentDefaultsHelpFormatter)
17 | parser.add_argument("--dispnet", dest='dispnet', type=str, default='DispResNet6', help='dispnet architecture')
18 | parser.add_argument("--posenet", dest='posenet', type=str, default='PoseExpNet', help='posenet architecture')
19 | parser.add_argument("--pretrained-dispnet", required=True, type=str, help="pretrained DispNet path")
20 | parser.add_argument("--pretrained-posenet", default=None, type=str, help="pretrained PoseNet path (for scale factor)")
21 | parser.add_argument("--img-height", default=256, type=int, help="Image height")
22 | parser.add_argument("--img-width", default=832, type=int, help="Image width")
23 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done")
24 | parser.add_argument("--spatial-normalize", action='store_true', help="spatial normalization")
25 | parser.add_argument("--min-depth", default=1e-3)
26 | parser.add_argument("--max-depth", default=80, type=float)
27 |
28 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory")
29 | parser.add_argument("--dataset-list", default=None, type=str, help="Dataset list file")
30 | parser.add_argument("--output-dir", default=None, type=str, help="Output directory for saving predictions in a big 3D numpy file")
31 |
32 | parser.add_argument("--gt-type", default='KITTI', type=str, help="GroundTruth data type", choices=['npy', 'png', 'KITTI', 'stillbox'])
33 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob")
34 |
35 |
36 | def main():
37 | args = parser.parse_args()
38 | if args.gt_type == 'KITTI':
39 | from kitti_eval.depth_evaluation_utils import test_framework_KITTI as test_framework
40 | elif args.gt_type == 'stillbox':
41 | from stillbox_eval.depth_evaluation_utils import test_framework_stillbox as test_framework
42 |
43 | disp_net = getattr(models, args.dispnet)().cuda()
44 | weights = torch.load(args.pretrained_dispnet)
45 | disp_net.load_state_dict(weights['state_dict'])
46 | disp_net.eval()
47 |
48 | if args.pretrained_posenet is None:
49 | print('no PoseNet specified, scale_factor will be determined by median ratio, which is kiiinda cheating\
50 | (but consistent with original paper)')
51 | seq_length = 0
52 | else:
53 | weights = torch.load(args.pretrained_posenet)
54 | seq_length = int(weights['state_dict']['conv1.0.weight'].size(1)/3)
55 | pose_net = getattr(models, args.posenet)(nb_ref_imgs=seq_length - 1, output_exp=False).cuda()
56 | pose_net.load_state_dict(weights['state_dict'], strict=False)
57 |
58 | dataset_dir = Path(args.dataset_dir)
59 | if args.dataset_list is not None:
60 | with open(args.dataset_list, 'r') as f:
61 | test_files = list(f.read().splitlines())
62 | else:
63 | test_files = [file.relpathto(dataset_dir) for file in sum([dataset_dir.files('*.{}'.format(ext)) for ext in args.img_exts], [])]
64 |
65 | framework = test_framework(dataset_dir, test_files, seq_length, args.min_depth, args.max_depth)
66 |
67 | print('{} files to test'.format(len(test_files)))
68 | errors = np.zeros((2, 7, len(test_files)), np.float32)
69 | if args.output_dir is not None:
70 | output_dir = Path(args.output_dir)
71 | viz_dir = output_dir/'viz'
72 | output_dir.makedirs_p()
73 | viz_dir.makedirs_p()
74 |
75 | for j, sample in enumerate(tqdm(framework)):
76 | tgt_img = sample['tgt']
77 |
78 | ref_imgs = sample['ref']
79 |
80 | h,w,_ = tgt_img.shape
81 | if (not args.no_resize) and (h != args.img_height or w != args.img_width):
82 | tgt_img = imresize(tgt_img, (args.img_height, args.img_width)).astype(np.float32)
83 | ref_imgs = [imresize(img, (args.img_height, args.img_width)).astype(np.float32) for img in ref_imgs]
84 |
85 | tgt_img = np.transpose(tgt_img, (2, 0, 1))
86 | ref_imgs = [np.transpose(img, (2,0,1)) for img in ref_imgs]
87 |
88 | tgt_img = torch.from_numpy(tgt_img).unsqueeze(0)
89 | tgt_img = ((tgt_img/255 - 0.5)/0.5).cuda()
90 | tgt_img_var = Variable(tgt_img, volatile=True)
91 |
92 | ref_imgs_var = []
93 | for i, img in enumerate(ref_imgs):
94 | img = torch.from_numpy(img).unsqueeze(0)
95 | img = ((img/255 - 0.5)/0.5).cuda()
96 | ref_imgs_var.append(Variable(img, volatile=True))
97 |
98 | pred_disp = disp_net(tgt_img_var)
99 | if args.spatial_normalize:
100 | pred_disp = spatial_normalize(pred_disp)
101 | pred_disp = pred_disp.data.cpu().numpy()[0,0]
102 | gt_depth = sample['gt_depth']
103 |
104 | if args.output_dir is not None:
105 | if j == 0:
106 | predictions = np.zeros((len(test_files), *pred_disp.shape))
107 | predictions[j] = 1/pred_disp
108 | gt_viz = interp_gt_disp(gt_depth)
109 | gt_viz = torch.FloatTensor(gt_viz)
110 | gt_viz[gt_viz == 0] = 1000
111 | gt_viz = (1/gt_viz).clamp(0,10)
112 |
113 | tgt_img_viz = tensor2array(tgt_img[0].cpu())
114 | depth_viz = tensor2array(torch.FloatTensor(pred_disp), max_value=None, colormap='hot')
115 | gt_viz = tensor2array(gt_viz, max_value=None, colormap='hot')
116 | tgt_img_viz_im = Image.fromarray((255*tgt_img_viz).astype('uint8'))
117 | tgt_img_viz_im.save(viz_dir/str(j).zfill(4)+'img.png')
118 | depth_viz_im = Image.fromarray((255*depth_viz).astype('uint8'))
119 | depth_viz_im.save(viz_dir/str(j).zfill(4)+'depth.png')
120 | gt_viz_im = Image.fromarray((255*gt_viz).astype('uint8'))
121 | gt_viz_im.save(viz_dir/str(j).zfill(4)+'gt.png')
122 |
123 |
124 | pred_depth = 1/pred_disp
125 | pred_depth_zoomed = zoom(pred_depth, (gt_depth.shape[0]/pred_depth.shape[0],gt_depth.shape[1]/pred_depth.shape[1])).clip(args.min_depth, args.max_depth)
126 | if sample['mask'] is not None:
127 | pred_depth_zoomed = pred_depth_zoomed[sample['mask']]
128 | gt_depth = gt_depth[sample['mask']]
129 |
130 | if seq_length > 0:
131 | _, poses = pose_net(tgt_img_var, ref_imgs_var)
132 | displacements = poses[0,:,:3].norm(2,1).cpu().data.numpy() # shape [1 - seq_length]
133 |
134 | scale_factors = [s1/s2 for s1, s2 in zip(sample['displacements'], displacements) if s1 > 0]
135 | scale_factor = np.mean(scale_factors) if len(scale_factors) > 0 else 0
136 | if len(scale_factors) == 0:
137 | print('not good ! ', sample['path'], sample['displacements'])
138 | errors[0,:,j] = compute_errors(gt_depth, pred_depth_zoomed*scale_factor)
139 |
140 | scale_factor = np.median(gt_depth)/np.median(pred_depth_zoomed)
141 | errors[1,:,j] = compute_errors(gt_depth, pred_depth_zoomed*scale_factor)
142 |
143 | mean_errors = errors.mean(2)
144 | error_names = ['abs_rel','sq_rel','rms','log_rms','a1','a2','a3']
145 | if args.pretrained_posenet:
146 | print("Results with scale factor determined by PoseNet : ")
147 | print("{:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}".format(*error_names))
148 | print("{:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}".format(*mean_errors[0]))
149 |
150 | print("Results with scale factor determined by GT/prediction ratio (like the original paper) : ")
151 | print("{:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}".format(*error_names))
152 | print("{:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}".format(*mean_errors[1]))
153 |
154 | if args.output_dir is not None:
155 | np.save(output_dir/'predictions.npy', predictions)
156 |
157 | def interp_gt_disp(mat, mask_val=0):
158 | mat[mat==mask_val] = np.nan
159 | x = np.arange(0, mat.shape[1])
160 | y = np.arange(0, mat.shape[0])
161 | mat = np.ma.masked_invalid(mat)
162 | xx, yy = np.meshgrid(x, y)
163 | #get only the valid values
164 | x1 = xx[~mat.mask]
165 | y1 = yy[~mat.mask]
166 | newarr = mat[~mat.mask]
167 |
168 | GD1 = interpolate.griddata((x1, y1), newarr.ravel(), (xx, yy), method='linear', fill_value=mask_val)
169 | return GD1
170 |
171 | def compute_errors(gt, pred):
172 | thresh = np.maximum((gt / pred), (pred / gt))
173 | a1 = (thresh < 1.25 ).mean()
174 | a2 = (thresh < 1.25 ** 2).mean()
175 | a3 = (thresh < 1.25 ** 3).mean()
176 |
177 | rmse = (gt - pred) ** 2
178 | rmse = np.sqrt(rmse.mean())
179 |
180 | rmse_log = (np.log(gt) - np.log(pred)) ** 2
181 | rmse_log = np.sqrt(rmse_log.mean())
182 |
183 | abs_rel = np.mean(np.abs(gt - pred) / gt)
184 |
185 | sq_rel = np.mean(((gt - pred)**2) / gt)
186 |
187 | return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
188 |
189 |
190 | if __name__ == '__main__':
191 | main()
192 |
--------------------------------------------------------------------------------
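Note: a quick sanity check of `compute_errors` above, assuming the repo environment so the import works (these scripts rely on the old `scipy.misc` API and the local `models` package): a prediction equal to the ground truth scores 0 on the relative and RMSE metrics and 1.0 on the delta-threshold accuracies a1/a2/a3.

```python
import numpy as np
from test_disp import compute_errors

gt = np.random.uniform(1.0, 80.0, size=1000)
print(compute_errors(gt, gt.copy()))   # -> (0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
```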
/test_flownetc.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 |
5 | import argparse
6 | import custom_transforms
7 | from datasets.validation_flow import ValidationFlowFlowNetC, ValidationFlowKitti2012
8 | import torch
9 | from torch.autograd import Variable
10 | import models
11 | from logger import AverageMeter
12 | from torchvision.transforms import ToPILImage
13 | from tensorboardX import SummaryWriter
14 | import os
15 | from flowutils.flowlib import flow_to_image
16 | from utils import tensor2array
17 |
18 |
19 | parser = argparse.ArgumentParser(description='Test FlowNetC',
20 | formatter_class=argparse.ArgumentDefaultsHelpFormatter)
21 | parser.add_argument('--flownet', dest='flownet', type=str, default='FlowNetC5', choices=['FlowNetS', 'FlowNetS5', 'FlowNetS6', 'SpyNet', 'FlowNetC5'],
22 |                     help='flow network architecture. Options: FlowNetS | FlowNetS5 | FlowNetS6 | SpyNet | FlowNetC5')
23 | parser.add_argument('--nlevels', dest='nlevels', type=int, default=6,
24 | help='number of levels in multiscale. Options: 4|5|6')
25 | parser.add_argument('--pretrained-flow', dest='pretrained_flow', default=None, metavar='PATH',
26 | help='path to pre-trained Flow net model')
27 | parser.add_argument('--dataset', dest='dataset', default='kitti2015', choices=['kitti2015', 'kitti2012'],
28 |                     help='KITTI flow dataset to evaluate on')
29 |
30 |
31 | def compute_epe(gt, pred, op='sub'):
32 | _, _, h_pred, w_pred = pred.size()
33 | bs, nc, h_gt, w_gt = gt.size()
34 | u_gt, v_gt = gt[:,0,:,:], gt[:,1,:,:]
35 | pred = torch.nn.functional.upsample(pred, size=(h_gt, w_gt), mode='bilinear')
36 | u_pred = pred[:,0,:,:] * (w_gt/w_pred)
37 | v_pred = pred[:,1,:,:] * (h_gt/h_pred)
38 | if op=='sub':
39 | epe = torch.sqrt(torch.pow((u_gt - u_pred), 2) + torch.pow((v_gt - v_pred), 2))
40 | if op=='div':
41 | epe = ((u_gt / u_pred) + (v_gt / v_pred))
42 |
43 | return epe
44 |
45 | def main():
46 | global args
47 | args = parser.parse_args()
48 | save_path = 'checkpoints/test_flownetc'
49 |
50 | if not os.path.exists(save_path):
51 | os.makedirs(save_path)
52 | summary_writer = SummaryWriter(save_path)
53 | normalize = custom_transforms.Normalize(mean=[0.5, 0.5, 0.5],
54 | std=[1.0, 1.0, 1.0])
55 | flow_loader_h, flow_loader_w = 384, 1280
56 | valid_flow_transform = custom_transforms.Compose([custom_transforms.Scale(h=flow_loader_h, w=flow_loader_w),
57 | custom_transforms.ArrayToTensor(), normalize])
58 | if args.dataset == "kitti2015":
59 | val_flow_set = ValidationFlowFlowNetC(root='/is/ps2/aranjan/AllFlowData/kitti/kitti2015',
60 | sequence_length=5, transform=valid_flow_transform)
61 | elif args.dataset == "kitti2012":
62 | val_flow_set = ValidationFlowKitti2012(root='/is/ps2/aranjan/AllFlowData/kitti/kitti2012',
63 | sequence_length=5, transform=valid_flow_transform)
64 |
65 | val_flow_loader = torch.utils.data.DataLoader(val_flow_set, batch_size=1, shuffle=False,
66 | num_workers=2, pin_memory=True, drop_last=True)
67 |
68 | flow_net = getattr(models, args.flownet)(pretrained=True).cuda()
69 |
70 | flow_net.eval()
71 | error_names = ['epe']
72 | errors = AverageMeter(i=len(error_names))
73 |
74 | for i, (tgt_img, ref_imgs, intrinsics, intrinsics_inv, flow_gt, flownet_c_flow, obj_map) in enumerate(val_flow_loader):
75 | tgt_img_var = Variable(tgt_img.cuda(), volatile=True)
76 | if args.dataset=="kitti2015":
77 | ref_imgs_var = [Variable(img.cuda(), volatile=True) for img in ref_imgs]
78 | ref_img_var = ref_imgs_var[2]
79 | elif args.dataset=="kitti2012":
80 | ref_img_var = Variable(ref_imgs.cuda(), volatile=True)
81 |
82 | flow_gt_var = Variable(flow_gt.cuda(), volatile=True)
83 | flownet_c_flow = Variable(flownet_c_flow.cuda(), volatile=True)
84 |
85 | # compute output
86 | flow_fwd = flow_net(tgt_img_var, ref_img_var)
87 | epe = compute_epe(gt=flownet_c_flow, pred=flow_fwd)
88 | scale_factor = compute_epe(gt=flownet_c_flow, pred=flow_fwd, op='div')
89 | #import ipdb
90 | #ipdb.set_trace()
91 | summary_writer.add_image('Frame 1', tensor2array(tgt_img_var.data[0].cpu()) , i)
92 | summary_writer.add_image('Frame 2', tensor2array(ref_img_var.data[0].cpu()) , i)
93 | summary_writer.add_image('Flow Output', flow_to_image(tensor2array(flow_fwd.data[0].cpu())) , i)
94 | summary_writer.add_image('UnFlow Output', flow_to_image(tensor2array(flownet_c_flow.data[0][:2].cpu())) , i)
95 | summary_writer.add_image('gtFlow Output', flow_to_image(tensor2array(flow_gt_var.data[0][:2].cpu())) , i)
96 | summary_writer.add_image('EPE Image w UnFlow', tensor2array(epe.data.cpu()) , i)
97 | summary_writer.add_scalar('EPE mean w UnFlow', epe.mean().data.cpu(), i)
98 | summary_writer.add_scalar('EPE max w UnFlow', epe.max().data.cpu(), i)
99 | summary_writer.add_scalar('Scale Factor max w UnFlow', scale_factor.max().data.cpu(), i)
100 | summary_writer.add_scalar('Scale Factor mean w UnFlow', scale_factor.mean().data.cpu(), i)
101 | summary_writer.add_scalar('Flow value max', flow_fwd.max().data.cpu(), i)
102 | print(i, "EPE: ", epe.mean().item())
103 |
104 | #print(i, epe)
105 | #errors.update(epe)
106 |
107 | print('Done')
108 |     #print("Average EPE", errors.avg)
109 |
110 |
111 |
112 | if __name__ == '__main__':
113 | main()
114 |
--------------------------------------------------------------------------------
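Note: `compute_epe` above rescales the u and v channels after upsampling because flow vectors are measured in pixels; resizing a flow field from the prediction resolution to the ground-truth resolution must also scale the vector components by the same width/height ratios. A minimal sketch of that step (toy shapes, using the newer `F.interpolate` name for the same operation as `nn.functional.upsample`):

```python
import torch
import torch.nn.functional as F

flow = torch.randn(1, 2, 64, 208)           # low-resolution flow, in pixels
h_gt, w_gt = 256, 832
up = F.interpolate(flow, size=(h_gt, w_gt), mode='bilinear', align_corners=False)
up[:, 0] *= w_gt / flow.shape[3]            # u component scales with width
up[:, 1] *= h_gt / flow.shape[2]            # v component scales with height
```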
/test_make3d.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 | # based on github.com/ClementPinard/SfMLearner-Pytorch
5 |
6 | import glob
7 | import torch
8 | import cv2
9 | from torch.autograd import Variable
10 | from PIL import Image
11 | from scipy import interpolate, io
12 | from scipy.misc import imresize, imread
13 | from scipy.ndimage.interpolation import zoom
14 | import numpy as np
15 | from path import Path
16 | import argparse
17 | from tqdm import tqdm
18 | from utils import tensor2array
19 | import models
20 | from loss_functions import spatial_normalize
21 |
22 | parser = argparse.ArgumentParser(description='Script for DispNet testing with corresponding groundTruth',
23 | formatter_class=argparse.ArgumentDefaultsHelpFormatter)
24 | parser.add_argument("--dispnet", dest='dispnet', type=str, default='DispResNet6', help='dispnet architecture')
25 | parser.add_argument("--pretrained-dispnet", required=True, type=str, help="pretrained DispNet path")
26 | parser.add_argument("--img-height", default=256, type=int, help="Image height")
27 | parser.add_argument("--img-width", default=256, type=int, help="Image width")
28 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done")
29 | parser.add_argument("--min-depth", default=1e-3)
30 | parser.add_argument("--max-depth", default=70, type=float)
31 |
32 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory")
33 | parser.add_argument("--output-dir", default=None, type=str, help="Output directory for saving predictions in a big 3D numpy file")
34 |
35 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob")
36 |
37 | class test_framework(object):
38 | def __init__(self, root, min_depth=1e-3, max_depth=70):
39 | self.root = root
40 | self.min_depth, self.max_depth = min_depth, max_depth
41 | self.img_files = sorted(glob.glob(root/'Test134/*.jpg'))
42 | self.depth_files = sorted(glob.glob(root/'Gridlaserdata/*.mat'))
43 |
44 | # This test file is corrupted in the original dataset
45 | self.img_files.pop(61)
46 | self.depth_files.pop(61)
47 |
48 | self.ratio = 2
49 | self.h_ratio = 1 / (1.33333 * self.ratio)
50 | self.color_new_height = 1704 // 2
51 | self.depth_new_height = 21
52 |
53 | def __getitem__(self, i):
54 | img = Image.open(self.img_files[i])
55 |         try:
56 |             imgarr = np.array(img)
57 |         except:
58 |             # identical retry: PIL can fail on the first decode of some images
59 |             imgarr = np.array(img)
60 |         tgt_img = imgarr.astype(np.float32)
61 |
62 | tgt_img = tgt_img[ (2272 - self.color_new_height)//2:(2272 + self.color_new_height)//2,:]
63 |
64 | depth_map = io.loadmat(self.depth_files[i])
65 | depth_gt = depth_map["Position3DGrid"][:,:,3]
66 | depth_gt_cropped = depth_gt[(55 - 21)//2:(55 + 21)//2]
67 | return {'tgt': tgt_img,
68 | 'path':self.img_files[i],
69 | 'gt_depth': depth_gt_cropped,
70 | 'mask': np.logical_and(depth_gt_cropped > self.min_depth, depth_gt_cropped < self.max_depth)
71 | }
72 |
73 | def __len__(self):
74 | return len(self.img_files)
75 |
76 | def main():
77 | args = parser.parse_args()
78 |
79 | disp_net = getattr(models, args.dispnet)().cuda()
80 | weights = torch.load(args.pretrained_dispnet)
81 | disp_net.load_state_dict(weights['state_dict'])
82 | disp_net.eval()
83 |
84 | print('no PoseNet specified, scale_factor will be determined by median ratio, which is kiiinda cheating\
85 | (but consistent with original paper)')
86 | seq_length = 0
87 |
88 | dataset_dir = Path(args.dataset_dir)
89 | framework = test_framework(dataset_dir, args.min_depth, args.max_depth)
90 | errors = np.zeros((2, 7, len(framework)), np.float32)
91 | if args.output_dir is not None:
92 | output_dir = Path(args.output_dir)
93 | viz_dir = output_dir/'viz'
94 | output_dir.makedirs_p()
95 | viz_dir.makedirs_p()
96 |
97 | for j, sample in enumerate(tqdm(framework)):
98 | tgt_img = sample['tgt']
99 |
100 | h,w,_ = tgt_img.shape
101 | if (not args.no_resize) and (h != args.img_height or w != args.img_width):
102 | tgt_img = imresize(tgt_img, (args.img_height, args.img_width)).astype(np.float32)
103 |
104 | tgt_img = np.transpose(tgt_img, (2, 0, 1))
105 | tgt_img = torch.from_numpy(tgt_img).unsqueeze(0)
106 | tgt_img = ((tgt_img/255 - 0.5)/0.5).cuda()
107 | tgt_img_var = Variable(tgt_img, volatile=True)
108 |
109 | pred_disp = disp_net(tgt_img_var)
110 | pred_disp = pred_disp.data.cpu().numpy()[0,0]
111 | gt_depth = sample['gt_depth']
112 |
113 | if args.output_dir is not None:
114 | if j == 0:
115 | predictions = np.zeros((len(framework), *pred_disp.shape))
116 | predictions[j] = 1/pred_disp
117 | gt_viz = interp_gt_disp(gt_depth)
118 | gt_viz = torch.FloatTensor(gt_viz)
119 | gt_viz[gt_viz == 0] = 1000
120 | gt_viz = (1/gt_viz).clamp(0,10)
121 |
122 | tgt_img_viz = tensor2array(tgt_img[0].cpu())
123 | depth_viz = tensor2array(torch.FloatTensor(pred_disp), max_value=None, colormap='hot')
124 | gt_viz = tensor2array(gt_viz, max_value=None, colormap='hot')
125 | tgt_img_viz_im = Image.fromarray((255*tgt_img_viz).astype('uint8'))
126 | tgt_img_viz_im = tgt_img_viz_im.resize(size=(args.img_width, args.img_height), resample=3)
127 | tgt_img_viz_im.save(viz_dir/str(j).zfill(4)+'img.png')
128 | depth_viz_im = Image.fromarray((255*depth_viz).astype('uint8'))
129 | depth_viz_im = depth_viz_im.resize(size=(args.img_width, args.img_height), resample=3)
130 | depth_viz_im.save(viz_dir/str(j).zfill(4)+'depth.png')
131 | gt_viz_im = Image.fromarray((255*gt_viz).astype('uint8'))
132 | gt_viz_im = gt_viz_im.resize(size=(args.img_width, args.img_height), resample=3)
133 | gt_viz_im.save(viz_dir/str(j).zfill(4)+'gt.png')
134 |
135 | all_viz_im = Image.fromarray( np.hstack([np.array(tgt_img_viz_im), np.array(gt_viz_im), np.array(depth_viz_im)]) )
136 | all_viz_im.save(viz_dir/str(j).zfill(4)+'all.png')
137 |
138 |
139 | pred_depth = 1/pred_disp
140 | pred_depth_zoomed = zoom(pred_depth, (gt_depth.shape[0]/pred_depth.shape[0],gt_depth.shape[1]/pred_depth.shape[1])).clip(args.min_depth, args.max_depth)
141 | if sample['mask'] is not None:
142 | pred_depth_zoomed = pred_depth_zoomed[sample['mask']]
143 | gt_depth = gt_depth[sample['mask']]
144 |
145 | scale_factor = np.median(gt_depth)/np.median(pred_depth_zoomed)
146 | pred_depth_zoomed = scale_factor*pred_depth_zoomed
147 | pred_depth_zoomed[pred_depth_zoomed>args.max_depth] = args.max_depth
148 | errors[1,:,j] = compute_errors(gt_depth, pred_depth_zoomed)
149 |
150 | mean_errors = errors.mean(2)
151 | error_names = ['abs_rel','sq_rel','rms','log_rms','a1','a2','a3']
152 |
153 | print("Results with scale factor determined by GT/prediction ratio (like the original paper) : ")
154 | print("{:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}, {:>10}".format(*error_names))
155 | print("{:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}, {:10.4f}".format(*mean_errors[1]))
156 |
157 | if args.output_dir is not None:
158 | np.save(output_dir/'predictions.npy', predictions)
159 |
160 | def interp_gt_disp(mat, mask_val=0):
161 | mat[mat==mask_val] = np.nan
162 | x = np.arange(0, mat.shape[1])
163 | y = np.arange(0, mat.shape[0])
164 | mat = np.ma.masked_invalid(mat)
165 | xx, yy = np.meshgrid(x, y)
166 | #get only the valid values
167 | x1 = xx[~mat.mask]
168 | y1 = yy[~mat.mask]
169 | newarr = mat[~mat.mask]
170 |
171 | GD1 = interpolate.griddata((x1, y1), newarr.ravel(), (xx, yy), method='linear', fill_value=mask_val)
172 | return GD1
173 |
174 | def compute_errors(gt, pred):
175 | thresh = np.maximum((gt / pred), (pred / gt))
176 | a1 = (thresh < 1.25 ).mean()
177 | a2 = (thresh < 1.25 ** 2).mean()
178 | a3 = (thresh < 1.25 ** 3).mean()
179 |
180 | rmse = (gt - pred) ** 2
181 | rmse = np.sqrt(rmse.mean())
182 |
183 | rmse_log = (np.log10(gt) - np.log10(pred)) ** 2
184 | rmse_log = np.sqrt(rmse_log.mean())
185 |
186 | abs_rel = np.mean(np.abs(gt - pred) / gt)
187 |
188 | sq_rel = np.mean(((gt - pred)**2) / gt)
189 |
190 | return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
191 |
192 |
193 | if __name__ == '__main__':
194 | main()
195 |
--------------------------------------------------------------------------------
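Note: like `test_disp.py`, this script uses median-ratio scaling before computing errors, because monocular depth predictions are only defined up to scale. A self-contained numpy sketch with toy data:

```python
import numpy as np

gt = np.random.uniform(1.0, 70.0, size=(55, 305))
pred = 0.01 * gt                               # correct structure, wrong scale
scale = np.median(gt) / np.median(pred)        # -> ~100
pred_scaled = np.clip(scale * pred, 1e-3, 70)  # clip to [min_depth, max_depth]
```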
/test_pose.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.autograd import Variable
3 |
4 | from scipy.misc import imresize
5 | import numpy as np
6 | from path import Path
7 | import argparse
8 | from tqdm import tqdm
9 |
10 | import models
11 | from inverse_warp import pose_vec2mat
12 |
13 |
14 | parser = argparse.ArgumentParser(description='Script for PoseNet testing with corresponding groundTruth from KITTI Odometry',
15 | formatter_class=argparse.ArgumentDefaultsHelpFormatter)
16 | parser.add_argument("pretrained_posenet", type=str, help="pretrained PoseNet path")
17 | parser.add_argument("--posenet", type=str, default="PoseNetB6", help="PoseNet model path")
18 | parser.add_argument("--img-height", default=256, type=int, help="Image height")
19 | parser.add_argument("--img-width", default=832, type=int, help="Image width")
20 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done")
21 | parser.add_argument("--min-depth", default=1e-3)
22 | parser.add_argument("--max-depth", default=80)
23 |
24 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory")
25 | parser.add_argument("--sequences", default=['09'], type=str, nargs='*', help="sequences to test")
26 | parser.add_argument("--output-dir", default=None, type=str, help="Output directory for saving predictions in a big 3D numpy file")
27 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob")
28 | parser.add_argument("--rotation-mode", default='euler', choices=['euler', 'quat'], type=str)
29 |
30 |
31 | def main():
32 | args = parser.parse_args()
33 | from kitti_eval.pose_evaluation_utils import test_framework_KITTI as test_framework
34 |
35 | weights = torch.load(args.pretrained_posenet)
36 | seq_length = int(weights['state_dict']['conv1.0.weight'].size(1)/3)
37 | pose_net = getattr(models, args.posenet)(nb_ref_imgs=seq_length - 1).cuda()
38 | pose_net.load_state_dict(weights['state_dict'], strict=False)
39 |
40 | dataset_dir = Path(args.dataset_dir)
41 | framework = test_framework(dataset_dir, args.sequences, seq_length)
42 |
43 | print('{} snippets to test'.format(len(framework)))
44 | errors = np.zeros((len(framework), 2), np.float32)
45 | if args.output_dir is not None:
46 | output_dir = Path(args.output_dir)
47 | output_dir.makedirs_p()
48 | predictions_array = np.zeros((len(framework), seq_length, 3, 4))
49 |
50 | for j, sample in enumerate(tqdm(framework)):
51 | imgs = sample['imgs']
52 |
53 | h,w,_ = imgs[0].shape
54 | if (not args.no_resize) and (h != args.img_height or w != args.img_width):
55 | imgs = [imresize(img, (args.img_height, args.img_width)).astype(np.float32) for img in imgs]
56 |
57 | imgs = [np.transpose(img, (2,0,1)) for img in imgs]
58 |
59 | ref_imgs_var = []
60 | for i, img in enumerate(imgs):
61 | img = torch.from_numpy(img).unsqueeze(0)
62 | img = ((img/255 - 0.5)/0.5).cuda()
63 | img_var = Variable(img, volatile=True)
64 | if i == len(imgs)//2:
65 | tgt_img_var = img_var
66 | else:
67 | ref_imgs_var.append(Variable(img, volatile=True))
68 |
69 | if args.posenet in ["PoseNet6", "PoseNetB6"]:
70 | poses = pose_net(tgt_img_var, ref_imgs_var)
71 | else:
72 | _, poses = pose_net(tgt_img_var, ref_imgs_var)
73 |
74 | poses = poses.cpu().data[0]
75 | poses = torch.cat([poses[:len(imgs)//2], torch.zeros(1,6).float(), poses[len(imgs)//2:]])
76 |
77 | inv_transform_matrices = pose_vec2mat(Variable(poses), rotation_mode=args.rotation_mode).data.numpy().astype(np.float64)
78 |
79 | rot_matrices = np.linalg.inv(inv_transform_matrices[:,:,:3])
80 | tr_vectors = -rot_matrices @ inv_transform_matrices[:,:,-1:]
81 |
82 | transform_matrices = np.concatenate([rot_matrices, tr_vectors], axis=-1)
83 |
84 | first_inv_transform = inv_transform_matrices[0]
85 | final_poses = first_inv_transform[:,:3] @ transform_matrices
86 | final_poses[:,:,-1:] += first_inv_transform[:,-1:]
87 |
88 | if args.output_dir is not None:
89 | predictions_array[j] = final_poses
90 |
91 | ATE, RE = compute_pose_error(sample['poses'], final_poses)
92 | errors[j] = ATE, RE
93 |
94 | mean_errors = errors.mean(0)
95 | std_errors = errors.std(0)
96 | error_names = ['ATE','RE']
97 | print('')
98 | print("Results")
99 | print("\t {:>10}, {:>10}".format(*error_names))
100 | print("mean \t {:10.4f}, {:10.4f}".format(*mean_errors))
101 | print("std \t {:10.4f}, {:10.4f}".format(*std_errors))
102 |
103 | if args.output_dir is not None:
104 | np.save(output_dir/'predictions.npy', predictions_array)
105 |
106 |
107 | def compute_pose_error(gt, pred):
108 | RE = 0
109 | snippet_length = gt.shape[0]
110 | scale_factor = np.sum(gt[:,:,-1] * pred[:,:,-1])/np.sum(pred[:,:,-1] ** 2)
111 | ATE = np.linalg.norm((gt[:,:,-1] - scale_factor * pred[:,:,-1]).reshape(-1))
112 | for gt_pose, pred_pose in zip(gt, pred):
113 | # Residual matrix to which we compute angle's sin and cos
114 | R = gt_pose[:,:3] @ np.linalg.inv(pred_pose[:,:3])
115 | s = np.linalg.norm([R[0,1]-R[1,0],
116 | R[1,2]-R[2,1],
117 | R[0,2]-R[2,0]])
118 | c = np.trace(R) - 1
119 | # Note: we actually compute double of cos and sin, but arctan2 is invariant to scale
120 | RE += np.arctan2(s,c)
121 |
122 | return ATE/snippet_length, RE/snippet_length
123 |
124 |
125 | if __name__ == '__main__':
126 | main()
127 |
--------------------------------------------------------------------------------
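Note: a sanity check for `compute_pose_error` above, assuming the repo environment so the import works: identical ground-truth and predicted snippets (with a non-zero translation, so the scale factor is well defined) give ATE = 0 and RE = 0.

```python
import numpy as np
from test_pose import compute_pose_error

poses = np.tile(np.hstack([np.eye(3), np.zeros((3, 1))]), (5, 1, 1))   # 5 x 3 x 4
poses[:, :, 3] = np.arange(5)[:, None] * np.array([1.0, 0.0, 0.0])     # translate along x
ate, re = compute_pose_error(poses, poses.copy())
print(ate, re)   # -> 0.0, 0.0
```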
/test_sintel_pose.py:
--------------------------------------------------------------------------------
1 | # Author: Anurag Ranjan
2 | # Copyright (c) 2019, Anurag Ranjan
3 | # All rights reserved.
4 | # based on github.com/ClementPinard/SfMLearner-Pytorch
5 |
6 | import torch
7 | from torch.autograd import Variable
8 |
9 | from scipy.misc import imresize
10 | import numpy as np
11 | from path import Path
12 | import argparse
13 | from tqdm import tqdm
14 |
15 | import models
16 | from inverse_warp import pose_vec2mat
17 |
18 |
19 | parser = argparse.ArgumentParser(description='Script for PoseNet testing with corresponding groundTruth from Sintel Odometry',
20 | formatter_class=argparse.ArgumentDefaultsHelpFormatter)
21 | parser.add_argument("pretrained_posenet", type=str, help="pretrained PoseNet path")
22 | parser.add_argument("--posenet", type=str, default="PoseNetB6", help="PoseNet model path")
23 | parser.add_argument("--img-height", default=128, type=int, help="Image height")
24 | parser.add_argument("--img-width", default=416, type=int, help="Image width")
25 | parser.add_argument("--no-resize", action='store_true', help="no resizing is done")
26 | parser.add_argument("--min-depth", default=1e-3)
27 | parser.add_argument("--max-depth", default=80)
28 |
29 | parser.add_argument("--dataset-dir", default='.', type=str, help="Dataset directory")
30 | parser.add_argument("--sequences", default=['alley_1'], type=str, nargs='*', help="sequences to test")
31 | parser.add_argument("--output-dir", default=None, type=str, help="Output directory for saving predictions in a big 3D numpy file")
32 | parser.add_argument("--img-exts", default=['png', 'jpg', 'bmp'], nargs='*', type=str, help="images extensions to glob")
33 | parser.add_argument("--rotation-mode", default='euler', choices=['euler', 'quat'], type=str)
34 |
35 |
36 | def main():
37 | args = parser.parse_args()
38 | from sintel_eval.pose_evaluation_utils import test_framework_Sintel as test_framework
39 |
40 | weights = torch.load(args.pretrained_posenet)
41 | seq_length = int(weights['state_dict']['conv1.0.weight'].size(1)/3)
42 | pose_net = getattr(models, args.posenet)(nb_ref_imgs=seq_length - 1).cuda()
43 | pose_net.load_state_dict(weights['state_dict'], strict=False)
44 |
45 | dataset_dir = Path(args.dataset_dir)
46 | framework = test_framework(dataset_dir, args.sequences, seq_length)
47 |
48 | print('{} snippets to test'.format(len(framework)))
49 | RE = np.zeros((len(framework)), np.float32)
50 | if args.output_dir is not None:
51 | output_dir = Path(args.output_dir)
52 | output_dir.makedirs_p()
53 | predictions_array = np.zeros((len(framework), seq_length, 3, 4))
54 |
55 | for j, sample in enumerate(tqdm(framework)):
56 | imgs = sample['imgs']
57 |
58 | h,w,_ = imgs[0].shape
59 | if (not args.no_resize) and (h != args.img_height or w != args.img_width):
60 | imgs = [imresize(img, (args.img_height, args.img_width)).astype(np.float32) for img in imgs]
61 |
62 | imgs = [np.transpose(img, (2,0,1)) for img in imgs]
63 |
64 | ref_imgs_var = []
65 | for i, img in enumerate(imgs):
66 | img = torch.from_numpy(img).unsqueeze(0)
67 | img = ((img/255 - 0.5)/0.5).cuda()
68 | img_var = Variable(img, volatile=True)
69 | if i == len(imgs)//2:
70 | tgt_img_var = img_var
71 | else:
72 | ref_imgs_var.append(Variable(img, volatile=True))
73 |
74 | if args.posenet in ["PoseNet6", "PoseNetB6"]:
75 | poses = pose_net(tgt_img_var, ref_imgs_var)
76 | else:
77 | _, poses = pose_net(tgt_img_var, ref_imgs_var)
78 |
79 | poses = poses.cpu().data[0]
80 | poses = torch.cat([poses[:len(imgs)//2], torch.zeros(1,6).float(), poses[len(imgs)//2:]])
81 |
82 | inv_transform_matrices = pose_vec2mat(Variable(poses), rotation_mode=args.rotation_mode).data.numpy().astype(np.float64)
83 |
84 | rot_matrices = np.linalg.inv(inv_transform_matrices[:,:,:3])
85 | tr_vectors = -rot_matrices @ inv_transform_matrices[:,:,-1:]
86 |
87 | transform_matrices = np.concatenate([rot_matrices, tr_vectors], axis=-1)
88 |
89 | first_inv_transform = inv_transform_matrices[0]
90 | final_poses = first_inv_transform[:,:3] @ transform_matrices
91 | final_poses[:,:,-1:] += first_inv_transform[:,-1:]
92 |
93 | if args.output_dir is not None:
94 | predictions_array[j] = final_poses
95 |
96 | RE[j] = compute_pose_error(sample['poses'], final_poses)
97 |
98 | print('')
99 | print("Results")
100 | print("\t {:>10}".format('RE'))
101 | print("mean \t {:10.4f}".format(RE.mean()))
102 | print("std \t {:10.4f}".format(RE.std()))
103 |
104 | if args.output_dir is not None:
105 | np.save(output_dir/'predictions.npy', predictions_array)
106 |
107 |
108 | def compute_pose_error(gt, pred):
109 | RE = 0
110 | snippet_length = gt.shape[0]
111 | for gt_pose, pred_pose in zip(gt, pred):
112 | # Residual matrix to which we compute angle's sin and cos
113 | R = gt_pose[:,:3] @ np.linalg.inv(pred_pose[:,:3])
114 | s = np.linalg.norm([R[0,1]-R[1,0],
115 | R[1,2]-R[2,1],
116 | R[0,2]-R[2,0]])
117 | c = np.trace(R) - 1
118 | # Note: we actually compute double of cos and sin, but arctan2 is invariant to scale
119 | RE += np.arctan2(s,c)
120 |
121 | return RE/snippet_length
122 |
123 |
124 | if __name__ == '__main__':
125 | main()
126 |
--------------------------------------------------------------------------------
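Note: the rotation error in `compute_pose_error` recovers the angle of the residual rotation R from `arctan2(s, c)`, where s is the norm of the antisymmetric part of R and c = trace(R) - 1; both are twice the sine and cosine of the angle, and arctan2 ignores the common factor. A small self-contained check with a known rotation about the z-axis:

```python
import numpy as np

theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
s = np.linalg.norm([R[0, 1] - R[1, 0], R[1, 2] - R[2, 1], R[0, 2] - R[2, 0]])
c = np.trace(R) - 1
print(np.arctan2(s, c))   # -> 0.3
```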
/utils.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 | import shutil
3 | import numpy as np
4 | import torch
5 | from matplotlib import cm
6 | from matplotlib.colors import ListedColormap, LinearSegmentedColormap
7 |
8 | def high_res_colormap(low_res_cmap, resolution=1000, max_value=1):
9 | # Construct the ListedColormap, with interpolated values for higher resolution
10 | # For a linear segmented colormap, you can just specify the number of points in
11 | # cm.get_cmap(name, lutsize) with the parameter lutsize
12 | x = np.linspace(0,1,low_res_cmap.N)
13 | low_res = low_res_cmap(x)
14 | new_x = np.linspace(0,max_value,resolution)
15 | high_res = np.stack([np.interp(new_x, x, low_res[:,i]) for i in range(low_res.shape[1])], axis=1)
16 | return ListedColormap(high_res)
17 |
18 |
19 | def opencv_rainbow(resolution=1000):
20 | # Construct the opencv equivalent of Rainbow
21 | opencv_rainbow_data = (
22 | (0.000, (1.00, 0.00, 0.00)),
23 | (0.400, (1.00, 1.00, 0.00)),
24 | (0.600, (0.00, 1.00, 0.00)),
25 | (0.800, (0.00, 0.00, 1.00)),
26 | (1.000, (0.60, 0.00, 1.00))
27 | )
28 |
29 | return LinearSegmentedColormap.from_list('opencv_rainbow', opencv_rainbow_data, resolution)
30 |
31 |
32 | COLORMAPS = {'rainbow': opencv_rainbow(),
33 | 'magma': high_res_colormap(cm.get_cmap('magma')),
34 | 'bone': cm.get_cmap('bone', 10000)}
35 |
36 |
37 | def tensor2array(tensor, max_value=None, colormap='rainbow'):
38 | tensor = tensor.detach().cpu()
39 | if max_value is None:
40 | max_value = tensor.max().item()
41 | if tensor.ndimension() == 2 or tensor.size(0) == 1:
42 | norm_array = tensor.squeeze().numpy()/max_value
43 | array = COLORMAPS[colormap](norm_array).astype(np.float32)
44 | array = array[:,:,:3]
45 | array = array.transpose(2, 0, 1)
46 |
47 | elif tensor.ndimension() == 3:
48 | if (tensor.size(0) == 3):
49 | array = 0.5 + tensor.numpy()*0.5
50 | elif (tensor.size(0) == 2):
51 | array = tensor.numpy()
52 |
53 | return array
54 |
55 | def save_checkpoint(save_path, dispnet_state, posenet_state, masknet_state, flownet_state, optimizer_state, is_best, filename='checkpoint.pth.tar'):
56 | file_prefixes = ['dispnet', 'posenet', 'masknet', 'flownet', 'optimizer']
57 | states = [dispnet_state, posenet_state, masknet_state, flownet_state, optimizer_state]
58 | for (prefix, state) in zip(file_prefixes, states):
59 | torch.save(state, save_path/'{}_{}'.format(prefix,filename))
60 |
61 | if is_best:
62 | for prefix in file_prefixes:
63 | shutil.copyfile(save_path/'{}_{}'.format(prefix,filename), save_path/'{}_model_best.pth.tar'.format(prefix))
64 |
--------------------------------------------------------------------------------
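Note: a minimal usage sketch for `tensor2array` above (toy tensors, assuming the repo environment with matplotlib installed). A [H, W] or [1, H, W] tensor is coloured with the chosen colormap and returned as a 3xHxW float array, while a [3, H, W] tensor is denormalised from [-1, 1] back to [0, 1].

```python
import torch
from utils import tensor2array

disp = torch.rand(1, 128, 416)                                     # e.g. a predicted disparity map
disp_viz = tensor2array(disp, max_value=None, colormap='magma')    # (3, 128, 416)

img = torch.rand(3, 128, 416) * 2 - 1                              # normalised image in [-1, 1]
img_viz = tensor2array(img)                                        # back to [0, 1], (3, 128, 416)
```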