├── LICENSE ├── README.md ├── demo_im.jpg ├── demo_mask.jpg ├── figure └── teaser.png ├── model ├── SfMNet.py ├── __init__.py ├── consistency_layer.py ├── dataloader.py ├── hdr_illu_pca │ ├── mean.npy │ ├── pcaMean.npy │ ├── pcaVariance.npy │ └── pcaVector.npy ├── lambSH_layer.py ├── pred_illuDecomp_layer_new.py ├── reproj_layer.py └── vgg16.py ├── run_test_demo.sh ├── run_test_diode.sh ├── run_test_iiw.sh ├── test.py ├── train.py └── utils ├── __init__.py ├── diode_metrics.py ├── iiw_test_ids.npy ├── render_sphere_nm.py └── whdr.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Ye Yu 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Outdoor inverse rendering from a single image using multiview self-supervision 2 | 3 | Y. Yu and W. Smith, "Outdoor inverse rendering from a single image using multiview self-supervision" in IEEE Transactions on Pattern Analysis & Machine Intelligence 4 | 5 | ## Abstract 6 | 7 |

8 | 9 | In this paper we show how to perform scene-level inverse rendering to recover shape, reflectance and lighting from a single, uncontrolled image using a fully convolutional neural network. The network takes an RGB image as input, regresses albedo, shadow and normal maps from which we infer least squares optimal spherical harmonic lighting coefficients. Our network is trained using large uncontrolled multiview and timelapse image collections without ground truth. By incorporating a differentiable renderer, our network can learn from self-supervision. Since the problem is ill-posed we introduce additional supervision. Our key insight is to perform offline multiview stereo (MVS) on images containing rich illumination variation. From the MVS pose and depth maps, we can cross project between overlapping views such that Siamese training can be used to ensure consistent estimation of photometric invariants. MVS depth also provides direct coarse supervision for normal map estimation. We believe this is the first attempt to use MVS supervision for learning inverse rendering. In addition, we learn a statistical natural illumination prior. We evaluate performance on inverse rendering, normal map estimation and intrinsic image decomposition benchmarks. 10 | 11 | ## Evaluation 12 | 13 | ### Dependencies 14 | 15 | To run our evaluation code, please create your environment with the following dependencies: 16 | 17 | * tensorflow 1.12.0 18 | * python 3.6 19 | * skimage 20 | * cv2 21 | * numpy 22 | 23 | ### Pretrained model 24 | 25 | 1. Download our pretrained model from [here](https://drive.google.com/uc?export=download&id=1hGIoK3Pemtg3eYjFy_CBK-R37D3gA0VC). 26 | 2. Untar the downloaded file to get the models: **model_ckpt**, **iiw_model_ckpt** and **diode_model_ckpt**. 27 | 3. Place these three model folders at the root path. 28 | 29 | ```tree 30 | InverseRenderNet_v2 31 | │ README.md 32 | │ test.py 33 | │ ... 34 | | 35 | └─────model_ckpt 36 | │ model.ckpt.meta 37 | │ model.ckpt.index 38 | │ ... 39 | └─────iiw_model_ckpt 40 | │ model.ckpt.meta 41 | │ model.ckpt.index 42 | │ ... 43 | └─────diode_model_ckpt 44 | │ model.ckpt.meta 45 | │ model.ckpt.index 46 | │ ... 47 | ``` 48 | 49 | ### Test on demo image 50 | 51 | You can perform inverse rendering on an arbitrary RGB image with our pretrained model. To run the demo code, you need to specify the path to the pretrained model, the path to the RGB image and a corresponding mask that masks out the sky in the image. The mask can be generated by PSPNet, which you can find on . Finally, the inverse rendering results will be saved to the output folder named by your argument. 52 | 53 | Inference on the demonstration image can be performed by: 54 | 55 | ```bash 56 | bash run_test_demo.sh 57 | ``` 58 | 59 | Besides the provided demo image, you can test your own images by specifying `IMAGE_PATH` and `MASK_PATH` in [run_test_demo.sh](run_test_demo.sh). The default output folder, specified by `RESULTS_DIR`, is **test_results**. 60 | 61 | ### Test on IIW 62 | 63 | * The IIW dataset should first be downloaded from 64 | 65 | * Make sure `IMAGES_DIR` in [run_test_iiw.sh](run_test_iiw.sh) points to the path of the IIW data and run: 66 | 67 | ```bash 68 | bash run_test_iiw.sh 69 | ``` 70 | 71 | Results will be saved to **test_iiw**. 
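Scores on IIW are conventionally reported as the Weighted Human Disagreement Rate (WHDR), which [utils/whdr.py](utils/whdr.py) computes from the saved decompositions. The sketch below only illustrates the metric itself; the `whdr` function name, the comparison format and the threshold default are assumptions made for this example, not the actual interface of `utils/whdr.py`.

```python
import numpy as np

# Illustrative sketch of the WHDR metric used for IIW-style evaluation.
# Assumed comparison format: ((row1, col1), (row2, col2), darker label in {"1", "2", "E"}, weight).
def whdr(reflectance, comparisons, delta=0.10):
    error, total = 0.0, 0.0
    for (y1, x1), (y2, x2), darker, weight in comparisons:
        r1 = max(reflectance[y1, x1], 1e-10)
        r2 = max(reflectance[y2, x2], 1e-10)
        if r2 / r1 > 1.0 + delta:
            pred = "1"      # point 1 predicted darker
        elif r1 / r2 > 1.0 + delta:
            pred = "2"      # point 2 predicted darker
        else:
            pred = "E"      # predicted roughly equal
        if pred != darker:
            error += weight
        total += weight
    return error / max(total, 1e-10)

# Example: a uniform reflectance map only disagrees with the non-"E" judgement.
R = np.full((4, 4), 0.5)
pairs = [((0, 0), (3, 3), "E", 1.0), ((1, 1), (2, 2), "1", 0.8)]
print(whdr(R, pairs))  # 0.8 / 1.8 ≈ 0.44
```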
72 | 73 | ### Test on DIODE 74 | 75 | * Download the dataset from 76 | 77 | * Replace `${IMAGES_DIR}` defined in [run_test_diode.sh](run_test_diode.sh) with the path to DIODE data and run: 78 | 79 | ```bash 80 | bash run_test_diode.sh 81 | ``` 82 | 83 | Results will be saved to **test_diode**. 84 | 85 | ## Citation 86 | 87 | If you use the model or the code in your research, please cite the following paper: 88 | 89 | ```bibtex 90 | @article{yu2021outdoor, 91 | title={Outdoor inverse rendering from a single image using multiview self-supervision}, 92 | author={Yu, Ye and Smith, William A. P.}, 93 | journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 94 | note={to appear}, 95 | year={2021} 96 | } 97 | ``` 98 | -------------------------------------------------------------------------------- /demo_im.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/demo_im.jpg -------------------------------------------------------------------------------- /demo_mask.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/demo_mask.jpg -------------------------------------------------------------------------------- /figure/teaser.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/figure/teaser.png -------------------------------------------------------------------------------- /model/SfMNet.py: -------------------------------------------------------------------------------- 1 | import importlib 2 | import tensorflow as tf 3 | import numpy as np 4 | import tensorflow.contrib.layers as layers 5 | import os 6 | from model import pred_illuDecomp_layer_new as pred_illuDecomp_layer 7 | 8 | 9 | def SfMNet( 10 | inputs, 11 | height, 12 | width, 13 | masks, 14 | n_layers=12, 15 | n_pools=2, 16 | is_training=True, 17 | depth_base=64, 18 | ): 19 | conv_layers = np.int32(n_layers / 2) - 1 20 | deconv_layers = np.int32(n_layers / 2) 21 | # number of layers before perform pooling 22 | nlayers_befPool = np.int32(np.ceil((conv_layers - 1) / n_pools) - 1) 23 | 24 | max_depth = 512 25 | 26 | # dimensional arrangement 27 | # number of layer at tail where no pooling anymore 28 | # also exclude first layer who in charge of expanding dimension 29 | if depth_base * 2 ** n_pools < max_depth: 30 | tail = conv_layers - nlayers_befPool * n_pools 31 | tail_deconv = deconv_layers - nlayers_befPool * n_pools 32 | else: 33 | maxNum_pool = np.log2(max_depth / depth_base) 34 | tail = np.int32(conv_layers - nlayers_befPool * maxNum_pool) 35 | tail_deconv = np.int32(deconv_layers - nlayers_befPool * maxNum_pool) 36 | 37 | f_in_conv = ( 38 | [3] 39 | + [ 40 | np.int32(depth_base * 2 ** (np.ceil(i / nlayers_befPool) - 1)) 41 | for i in range(1, conv_layers - tail + 1) 42 | ] 43 | + [ 44 | np.int32(depth_base * 2 ** maxNum_pool) 45 | for i in range(conv_layers - tail + 1, conv_layers + 1) 46 | ] 47 | ) 48 | f_out_conv = ( 49 | [64] 50 | + [ 51 | np.int32(depth_base * 2 ** (np.floor(i / nlayers_befPool))) 52 | for i in range(1, conv_layers - tail + 1) 53 | ] 54 | + [ 55 | np.int32(depth_base * 2 ** maxNum_pool) 56 | for i in range(conv_layers - tail + 1, conv_layers + 1) 57 | ] 58 | ) 59 | 60 | f_in_deconv = f_out_conv[:0:-1] + [64] 
61 | f_out_amDeconv = f_in_conv[:0:-1] + [3] 62 | f_out_MaskDeconv = f_in_conv[:0:-1] + [1] 63 | f_out_nmDeconv = f_in_conv[:0:-1] + [2] 64 | 65 | group_norm_params = { 66 | "groups": 16, 67 | "channels_axis": -1, 68 | "reduction_axes": (-3, -2), 69 | "center": True, 70 | "scale": True, 71 | "epsilon": 1e-4, 72 | "param_initializers": { 73 | "beta_initializer": tf.zeros_initializer(), 74 | "gamma_initializer": tf.ones_initializer(), 75 | "moving_variance_initializer": tf.ones_initializer(), 76 | "moving_average_initializer": tf.zeros_initializer(), 77 | }, 78 | } 79 | 80 | # contractive conv_layer block 81 | conv_out = inputs 82 | conv_out_list = [] 83 | for i, f_in, f_out in zip(range(1, conv_layers + 2), f_in_conv, f_out_conv): 84 | scope = "inverserendernet/conv" + str(i) 85 | 86 | if ( 87 | np.mod(i - 1, nlayers_befPool) == 0 88 | and i <= n_pools * nlayers_befPool + 1 89 | and i != 1 90 | ): 91 | conv_out_list.append(conv_out) 92 | conv_out = conv2d(conv_out, scope, f_in, f_out) 93 | conv_out = tf.nn.max_pool( 94 | conv_out, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME" 95 | ) 96 | else: 97 | conv_out = conv2d(conv_out, scope, f_in, f_out) 98 | 99 | # expanding deconv_layer block succeeding conv_layer block 100 | am_deconv_out = conv_out 101 | for i, f_in, f_out in zip(range(1, deconv_layers + 1), f_in_deconv, f_out_amDeconv): 102 | scope = "inverserendernet/am_deconv" + str(i) 103 | 104 | # expand resolution every after nlayers_befPool deconv_layer 105 | if np.mod(i, nlayers_befPool) == 0 and i <= n_pools * nlayers_befPool: 106 | # attach previous convolutional output to upsampling/deconvolutional output 107 | tmp = conv_out_list[-np.int32(i / nlayers_befPool)] 108 | output_shape = tmp.shape[1:3] 109 | am_deconv_out = tf.image.resize_images(am_deconv_out, output_shape) 110 | am_deconv_out = conv2d(am_deconv_out, scope, f_in, f_out) 111 | am_deconv_out = tf.concat([am_deconv_out, tmp], axis=-1) 112 | elif i == deconv_layers: 113 | # no normalisation and activation, which is placed at the end 114 | am_deconv_out = layers.conv2d( 115 | am_deconv_out, 116 | num_outputs=f_out, 117 | kernel_size=[3, 3], 118 | stride=[1, 1], 119 | padding="SAME", 120 | normalizer_fn=None, 121 | activation_fn=None, 122 | weights_initializer=tf.random_normal_initializer( 123 | mean=0, stddev=np.sqrt(2 / 9 / f_in) 124 | ), 125 | weights_regularizer=layers.l2_regularizer(scale=1e-5), 126 | scope=scope, 127 | ) 128 | else: 129 | # layers that not expand spatial resolution 130 | am_deconv_out = conv2d(am_deconv_out, scope, f_in, f_out) 131 | 132 | # deconvolution net for nm estimates 133 | nm_deconv_out = conv_out 134 | for i, f_in, f_out in zip(range(1, deconv_layers + 1), f_in_deconv, f_out_nmDeconv): 135 | scope = "inverserendernet/nm" + str(i) 136 | 137 | # expand resolution every after nlayers_befPool deconv_layer 138 | if np.mod(i, nlayers_befPool) == 0 and i <= n_pools * nlayers_befPool: 139 | # attach previous convolutional output to upsampling/deconvolutional output 140 | tmp = conv_out_list[-np.int32(i / nlayers_befPool)] 141 | output_shape = tmp.shape[1:3] 142 | nm_deconv_out = tf.image.resize_images(nm_deconv_out, output_shape) 143 | nm_deconv_out = conv2d(nm_deconv_out, scope, f_in, f_out) 144 | nm_deconv_out = tf.concat([nm_deconv_out, tmp], axis=-1) 145 | elif i == deconv_layers: 146 | # no normalisation and activation, which is placed at the end 147 | nm_deconv_out = layers.conv2d( 148 | nm_deconv_out, 149 | num_outputs=f_out, 150 | kernel_size=[3, 3], 151 | stride=[1, 1], 152 | 
padding="SAME", 153 | normalizer_fn=None, 154 | activation_fn=None, 155 | weights_initializer=tf.random_normal_initializer( 156 | mean=0, stddev=np.sqrt(2 / 9 / f_in) 157 | ), 158 | weights_regularizer=layers.l2_regularizer(scale=1e-5), 159 | biases_initializer=None, 160 | scope=scope, 161 | ) 162 | else: 163 | # layers that not expand spatial resolution 164 | nm_deconv_out = conv2d(nm_deconv_out, scope, f_in, f_out) 165 | 166 | # deconv branch for predicting masks 167 | mask_deconv_out = conv_out 168 | for i, f_in, f_out in zip( 169 | range(1, deconv_layers + 1), f_in_deconv, f_out_MaskDeconv 170 | ): 171 | scope = "inverserendernet/mask_deconv" + str(i) 172 | 173 | # expand resolution every after nlayers_befPool deconv_layer 174 | if np.mod(i, nlayers_befPool) == 0 and i <= n_pools * nlayers_befPool: 175 | # with tf.variable_scope(scope): 176 | # attach previous convolutional output to upsampling/deconvolutional output 177 | tmp = conv_out_list[-np.int32(i / nlayers_befPool)] 178 | output_shape = tmp.shape[1:3] 179 | mask_deconv_out = tf.image.resize_images(mask_deconv_out, output_shape) 180 | mask_deconv_out = conv2d(mask_deconv_out, scope, f_in, f_out) 181 | mask_deconv_out = tf.concat([mask_deconv_out, tmp], axis=-1) 182 | elif i == deconv_layers: 183 | # no normalisation and activation, which is placed at the end 184 | mask_deconv_out = layers.conv2d( 185 | mask_deconv_out, 186 | num_outputs=f_out, 187 | kernel_size=[3, 3], 188 | stride=[1, 1], 189 | padding="SAME", 190 | normalizer_fn=None, 191 | activation_fn=None, 192 | weights_initializer=tf.random_normal_initializer( 193 | mean=0, stddev=np.sqrt(2 / 9 / f_in) 194 | ), 195 | weights_regularizer=layers.l2_regularizer(scale=1e-5), 196 | scope=scope, 197 | ) 198 | else: 199 | # layers that not expand spatial resolution 200 | mask_deconv_out = conv2d(mask_deconv_out, scope, f_in, f_out) 201 | 202 | albedos = am_deconv_out[:, :, :, :3] 203 | nm_pred = nm_deconv_out 204 | 205 | albedos = tf.clip_by_value(tf.nn.tanh(albedos) * masks, -0.9999, 0.9999) 206 | 207 | nm_pred_norm = tf.sqrt( 208 | tf.reduce_sum(nm_pred ** 2, axis=-1, keepdims=True) + tf.constant(1.0) 209 | ) 210 | nm_pred_xy = nm_pred / nm_pred_norm 211 | nm_pred_z = tf.constant(1.0) / nm_pred_norm 212 | nm_pred_xyz = tf.concat([nm_pred_xy, nm_pred_z], axis=-1) * masks 213 | 214 | shadow = mask_deconv_out[:, :, :, :1] 215 | shadow = tf.clip_by_value(tf.nn.tanh(shadow) * masks, -0.9999, 0.9999) 216 | 217 | return albedos, shadow, nm_pred_xyz 218 | 219 | 220 | def get_bilinear_filter(filter_shape, upscale_factor): 221 | ##filter_shape is [width, height, num_in_channels, num_out_channels] 222 | kernel_size = filter_shape[1] 223 | # Centre location of the filter for which value is calculated 224 | if kernel_size % 2 == 1: 225 | centre_location = upscale_factor - 1 226 | else: 227 | centre_location = upscale_factor - 0.5 228 | 229 | x, y = np.meshgrid(np.arange(kernel_size), np.arange(kernel_size)) 230 | bilinear = (1 - abs((x - centre_location) / upscale_factor)) * ( 231 | 1 - abs((y - centre_location) / upscale_factor) 232 | ) 233 | weights = np.tile( 234 | bilinear[:, :, None, None], (1, 1, filter_shape[2], filter_shape[3]) 235 | ) 236 | 237 | return tf.constant_initializer(weights) 238 | 239 | 240 | def group_norm(inputs, scope="group_norm"): 241 | input_shape = tf.shape(inputs) 242 | _, H, W, C = inputs.get_shape().as_list() 243 | group = 32 244 | with tf.variable_scope(scope): 245 | gamma = tf.get_variable( 246 | "scale", 247 | shape=[C], 248 | dtype=tf.float32, 249 | 
initializer=tf.ones_initializer(), 250 | trainable=True, 251 | regularizer=layers.l2_regularizer(scale=1e-5), 252 | ) 253 | 254 | beta = tf.get_variable( 255 | "bias", 256 | shape=[C], 257 | dtype=tf.float32, 258 | initializer=tf.zeros_initializer(), 259 | trainable=True, 260 | regularizer=layers.l2_regularizer(scale=1e-5), 261 | ) 262 | 263 | inputs = tf.reshape(inputs, [-1, H, W, group, C // group], name="unpack") 264 | mean, var = tf.nn.moments(inputs, [1, 2, 4], keep_dims=True) 265 | inputs = (inputs - mean) / tf.sqrt(var + 1e-5) 266 | inputs = tf.reshape(inputs, input_shape, name="pack") 267 | gamma = tf.reshape(gamma, [1, 1, 1, C], name="reshape_gamma") 268 | beta = tf.reshape(beta, [1, 1, 1, C], name="reshape_beta") 269 | return inputs * gamma + beta 270 | 271 | 272 | def conv2d(inputs, scope, f_in, f_out): 273 | conv_out = layers.conv2d( 274 | inputs, 275 | num_outputs=f_out, 276 | kernel_size=[3, 3], 277 | stride=[1, 1], 278 | padding="SAME", 279 | normalizer_fn=None, 280 | activation_fn=None, 281 | weights_initializer=tf.random_normal_initializer( 282 | mean=0, stddev=np.sqrt(2 / 9 / f_in) 283 | ), 284 | weights_regularizer=layers.l2_regularizer(scale=1e-5), 285 | biases_initializer=None, 286 | scope=scope, 287 | ) 288 | 289 | with tf.variable_scope(scope): 290 | gn_out = group_norm(conv_out) 291 | 292 | relu_out = tf.nn.relu(gn_out) 293 | 294 | return relu_out 295 | 296 | 297 | def comp_light(inputs, albedos, normals, shadows, gamma, masks): 298 | inputs = rescale_2_zero_one(inputs) 299 | albedos = rescale_2_zero_one(albedos) 300 | shadows = rescale_2_zero_one(shadows) 301 | 302 | lighting_model = "./model/hdr_illu_pca" 303 | lighting_vectors = tf.constant( 304 | np.load(os.path.join(lighting_model, "pcaVector.npy")), dtype=tf.float32 305 | ) 306 | lighting_means = tf.constant( 307 | np.load(os.path.join(lighting_model, "mean.npy")), dtype=tf.float32 308 | ) 309 | lightings_var = tf.constant( 310 | np.load(os.path.join(lighting_model, "pcaVariance.npy")), dtype=tf.float32 311 | ) 312 | 313 | lightings = pred_illuDecomp_layer.illuDecomp( 314 | inputs, albedos, normals, shadows, gamma, masks 315 | ) 316 | lightings_pca = tf.matmul((lightings - lighting_means), pinv(lighting_vectors)) 317 | 318 | # recompute lightings from lightins_pca which could add weak constraint on lighting reconstruction 319 | lightings = tf.matmul(lightings_pca, lighting_vectors) + lighting_means 320 | # reshape 27-D lightings to 9*3 lightings 321 | lightings = tf.reshape(lightings, [tf.shape(lightings)[0], 9, 3]) 322 | 323 | # lighting prior loss 324 | var = tf.reduce_mean(lightings_pca ** 2, axis=0) 325 | 326 | illu_prior_loss = tf.losses.absolute_difference(var, lightings_var) 327 | illu_prior_loss = tf.constant(0.0) 328 | 329 | return lightings, illu_prior_loss 330 | 331 | 332 | def pinv(A, reltol=1e-6): 333 | # compute SVD of input A 334 | s, u, v = tf.svd(A) 335 | 336 | # invert s and clear entries lower than reltol*s_max 337 | atol = tf.reduce_max(s) * reltol 338 | # s = tf.boolean_mask(s, s>atol) 339 | s = tf.where(s > atol, s, atol * tf.ones_like(s)) 340 | s_inv = tf.diag(1.0 / s) 341 | # s_inv = tf.diag(tf.concat([1./s, tf.zeros([tf.size(b) - tf.size(s)])], axis=0)) 342 | 343 | # compute v * s_inv * u_t as psuedo inverse 344 | return tf.matmul(v, tf.matmul(s_inv, tf.transpose(u))) 345 | 346 | 347 | def rescale_2_zero_one(imgs): 348 | return imgs / 2.0 + 0.5 349 | 350 | 351 | def rescale_2_minusOne_one(imgs): 352 | return imgs * 2.0 - 1.0 353 | 
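# --------------------------------------------------------------------------------
# Minimal NumPy sketch of the lighting-prior round-trip that comp_light() above
# performs in TensorFlow: project the least-squares SH lighting onto the PCA basis
# stored in model/hdr_illu_pca, then reconstruct it from the low-dimensional
# coefficients. The (27,) mean and (k, 27) vector shapes are assumptions inferred
# from how mean.npy and pcaVector.npy are used above; the demo lighting is random.
if __name__ == "__main__":
    pca_mean = np.load("model/hdr_illu_pca/mean.npy")          # mean 27-D SH lighting
    pca_vectors = np.load("model/hdr_illu_pca/pcaVector.npy")  # principal directions, assumed (k, 27)

    lighting = np.random.rand(1, 27)                               # stand-in for an LSQ lighting estimate
    coeffs = (lighting - pca_mean) @ np.linalg.pinv(pca_vectors)  # project onto the prior's basis
    lighting_rec = coeffs @ pca_vectors + pca_mean                 # reconstruct within the prior
    sh_rgb = lighting_rec.reshape(-1, 9, 3)                        # 9 SH coefficients per RGB channel
    print(sh_rgb.shape)                                            # (1, 9, 3)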
-------------------------------------------------------------------------------- /model/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/model/__init__.py -------------------------------------------------------------------------------- /model/consistency_layer.py: -------------------------------------------------------------------------------- 1 | # formulate loss function based on supplied ground truth and outputs from network 2 | 3 | import importlib 4 | import tensorflow as tf 5 | import numpy as np 6 | import os 7 | from model import ( 8 | reproj_layer, 9 | lambSH_layer, 10 | ) 11 | from model import vgg16 as VGG 12 | 13 | 14 | def loss_formulate( 15 | albedos, 16 | shadow, 17 | nm_pred, 18 | lightings, 19 | nm_gt, 20 | inputs, 21 | dms, 22 | cams, 23 | scale_xs, 24 | scale_ys, 25 | masks, 26 | reproj_inputs, 27 | reprojInput_mask, 28 | pair_label, 29 | sup_flag, 30 | illu_prior_loss, 31 | reg_loss_flag=True, 32 | ): 33 | 34 | # define perceptual loss by vgg16 35 | vgg_path = "../vgg16.npy" 36 | vgg16 = VGG.Vgg16(vgg_path) 37 | 38 | # pre-process inputs based on gamma 39 | gamma = tf.constant(1.0) # gamma is 4d constant 40 | 41 | albedos = rescale_2_zero_one(albedos) 42 | shadow = rescale_2_zero_one(shadow) 43 | inputs = rescale_2_zero_one(inputs) 44 | reproj_inputs = rescale_2_zero_one(reproj_inputs) 45 | 46 | sdFree_inputs = tf.pow(tf.nn.relu(tf.pow(inputs, gamma) / shadow) + 1e-4, 1 / gamma) 47 | 48 | # selete normal map used in rendering - gt or pred 49 | normals = tf.where(sup_flag, nm_gt, nm_pred) 50 | 51 | ###### cProj rendering loss 52 | # repeating elements from original batch for a number of times that is same with the number of paired images inside the sub-batch, such that realise the parallel albedos reproj computation 53 | reproj_tb = tf.to_float(tf.equal(pair_label, tf.transpose(pair_label))) 54 | reproj_tb = tf.cast( 55 | tf.matrix_set_diag(reproj_tb, tf.zeros([tf.shape(inputs)[0]])), tf.bool 56 | ) 57 | reproj_list = tf.where(reproj_tb) 58 | img1_inds = tf.expand_dims(reproj_list[:, 0], axis=-1) 59 | img2_inds = tf.expand_dims(reproj_list[:, 1], axis=-1) 60 | albedo1 = tf.gather_nd(albedos, img1_inds) 61 | dms1 = tf.gather_nd(dms, img1_inds) 62 | cams1 = tf.gather_nd(cams, img1_inds) 63 | albedo2 = tf.gather_nd(albedos, img2_inds) 64 | cams2 = tf.gather_nd(cams, img2_inds) 65 | scale_xs1 = tf.gather_nd(scale_xs, img1_inds) 66 | scale_xs2 = tf.gather_nd(scale_xs, img2_inds) 67 | scale_ys1 = tf.gather_nd(scale_ys, img1_inds) 68 | scale_ys2 = tf.gather_nd(scale_ys, img2_inds) 69 | 70 | lightings2 = tf.gather_nd(lightings, img2_inds) 71 | normals1 = tf.gather_nd(normals, img1_inds) 72 | shadow2 = tf.gather_nd(shadow, img2_inds) 73 | 74 | # rotate lighting predictions 75 | cams_rot = tf.matmul( 76 | tf.reshape(cams1[:, 4:13], (-1, 3, 3)), 77 | tf.transpose(tf.reshape(cams2[:, 4:13], (-1, 3, 3)), (0, 2, 1)), 78 | ) 79 | 80 | thetaX, thetaY, thetaZ = rotm2eul(cams_rot) 81 | rot = Rotation(thetaX, thetaY, thetaZ) 82 | 83 | # rotate SHL from source cam_coord to target cam_coord 84 | reproj_lightings = tf.reduce_sum( 85 | rot[:, :, :, tf.newaxis] * lightings2[:, tf.newaxis, :, :], axis=-2 86 | ) 87 | 88 | # scale albedo map based on max and min values such that albedo values are in range (0,1) 89 | reproj_shadow1, reproj_mask = reproj_layer.map_reproj( 90 | dms1, shadow2, cams1, cams2, scale_xs1, scale_xs2, scale_ys1, scale_ys2 91 | ) 92 
| reproj_shadow1 = tf.clip_by_value(reproj_shadow1, 1e-4, 0.9999) 93 | 94 | gamma2 = 1.0 95 | reproj_sdFree_inputs = tf.pow( 96 | tf.clip_by_value(tf.pow(reproj_inputs, gamma2) / reproj_shadow1, 1e-4, 0.9999), 97 | 1 / gamma2, 98 | ) 99 | 100 | sdFree_shadings, renderings_mask = lambSH_layer.lambSH_layer( 101 | tf.ones_like(albedos), normals, lightings, tf.ones_like(shadow), 1.0 102 | ) 103 | reproj_sdFree_shadings, _ = lambSH_layer.lambSH_layer( 104 | tf.ones_like(albedo1), 105 | normals1, 106 | reproj_lightings, 107 | tf.ones_like(reproj_shadow1), 108 | 1.0, 109 | ) 110 | 111 | reproj_albedo1, reproj_mask = reproj_layer.map_reproj( 112 | dms1, albedo2, cams1, cams2, scale_xs1, scale_xs2, scale_ys1, scale_ys2 113 | ) 114 | 115 | reproj_albedo1 = reproj_albedo1 + tf.constant(1e-4) # numerical stable constant 116 | 117 | ### scale intensities for each image 118 | albedo1_pixels = tf.boolean_mask(albedo1, reproj_mask) 119 | reproj_albedo1_pixels = tf.boolean_mask(reproj_albedo1, reproj_mask) 120 | reproj_err = 0.5 * tf.losses.mean_squared_error( 121 | cvtLab(albedo1_pixels), cvtLab(reproj_albedo1_pixels) 122 | ) 123 | reproj_err += 2.5 * perceptualLoss_formulate( 124 | vgg16, 125 | albedo1, 126 | reproj_albedo1, 127 | tf.to_float(reproj_mask[:, :, :, tf.newaxis]), 128 | ) 129 | 130 | sdFree_recons = tf.pow(tf.nn.relu(albedos * sdFree_shadings), 1 / gamma) 131 | sdFree_inputs_pixels = cvtLab(tf.boolean_mask(sdFree_inputs, renderings_mask)) 132 | sdFree_recons_pixels = cvtLab(tf.boolean_mask(sdFree_recons, renderings_mask)) 133 | render_err = 0.5 * tf.losses.mean_squared_error( 134 | sdFree_inputs_pixels, sdFree_recons_pixels 135 | ) 136 | render_err += 2.5 * perceptualLoss_formulate( 137 | vgg16, 138 | sdFree_inputs, 139 | sdFree_recons, 140 | tf.to_float(renderings_mask[:, :, :, tf.newaxis]), 141 | ) 142 | 143 | ## scale intensities for each image 144 | reproj_sdFree_renderings = tf.pow( 145 | tf.nn.relu(reproj_sdFree_shadings * albedo1), 1 / gamma2 146 | ) 147 | 148 | reprojInput_mask = tf.cast(reprojInput_mask[:, :, :, 0], tf.bool) 149 | sdFree_inputs_pixels = cvtLab( 150 | tf.boolean_mask(reproj_sdFree_inputs, reprojInput_mask) 151 | ) 152 | reproj_sdFree_renderings_pixels = cvtLab( 153 | tf.boolean_mask(reproj_sdFree_renderings, reprojInput_mask) 154 | ) 155 | cross_render_err = 0.5 * tf.losses.mean_squared_error( 156 | sdFree_inputs_pixels, reproj_sdFree_renderings_pixels 157 | ) 158 | cross_render_err += 2.5 * perceptualLoss_formulate( 159 | vgg16, 160 | reproj_sdFree_inputs, 161 | reproj_sdFree_renderings, 162 | tf.to_float(reprojInput_mask[:, :, :, tf.newaxis]), 163 | ) 164 | 165 | ### regualarisation loss 166 | reg_loss = sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)) 167 | 168 | ### scale-invariant loss 169 | 170 | ### compute nm_pred error 171 | nmSup_mask = tf.not_equal(tf.reduce_sum(nm_gt, axis=-1), 0) 172 | nm_gt_pixel = tf.boolean_mask(nm_gt, nmSup_mask) 173 | nm_pred_pixel = tf.boolean_mask(nm_pred, nmSup_mask) 174 | nm_prod = tf.reduce_sum(nm_pred_pixel * nm_gt_pixel, axis=-1, keepdims=True) 175 | nm_cosValue = tf.constant(0.9999) 176 | nm_prod = tf.clip_by_value(nm_prod, -nm_cosValue, nm_cosValue) 177 | nm_angle = tf.acos(nm_prod) + tf.constant(1e-4) 178 | nm_loss = tf.reduce_mean(nm_angle ** 2) 179 | 180 | ### compute gradient loss 181 | Gx = tf.constant(1 / 2) * tf.expand_dims( 182 | tf.expand_dims(tf.constant([[-1, 1]], dtype=tf.float32), axis=-1), axis=-1 183 | ) 184 | Gy = tf.constant(1 / 2) * tf.expand_dims( 185 | tf.expand_dims(tf.constant([[-1], [1]], 
dtype=tf.float32), axis=-1), axis=-1 186 | ) 187 | nm_pred_Gx = conv2d_nosum(nm_pred, Gx) 188 | nm_pred_Gy = conv2d_nosum(nm_pred, Gy) 189 | nm_pred_Gxy = tf.concat([nm_pred_Gx, nm_pred_Gy], axis=-1) 190 | normals_Gx = conv2d_nosum(nm_gt, Gx) 191 | normals_Gy = conv2d_nosum(nm_gt, Gy) 192 | normals_Gxy = tf.concat([normals_Gx, normals_Gy], axis=-1) 193 | nm_pred_smt_error = tf.losses.mean_squared_error(nm_pred_Gxy, normals_Gxy) 194 | 195 | ### total loss 196 | render_err *= tf.constant(0.1) 197 | reproj_err *= tf.constant(0.1) 198 | cross_render_err *= tf.constant(0.1) 199 | illu_prior_loss *= tf.constant(0.005) 200 | nm_pred_smt_error *= tf.constant(1.0) 201 | nm_loss *= tf.constant(1.0) 202 | 203 | if reg_loss_flag == True: 204 | loss = ( 205 | render_err 206 | + reproj_err 207 | + cross_render_err 208 | + reg_loss 209 | + illu_prior_loss 210 | + nm_pred_smt_error 211 | + nm_loss 212 | ) 213 | else: 214 | loss = ( 215 | render_err 216 | + reproj_err 217 | + cross_render_err 218 | + illu_prior_loss 219 | + nm_pred_smt_error 220 | + nm_loss 221 | ) 222 | 223 | return ( 224 | loss, 225 | render_err, 226 | reproj_err, 227 | cross_render_err, 228 | reg_loss, 229 | illu_prior_loss, 230 | nm_pred_smt_error, 231 | nm_loss, 232 | sdFree_inputs, 233 | sdFree_shadings, 234 | sdFree_recons, 235 | ) 236 | 237 | 238 | def perceptualLoss_formulate(vgg16, renderings, inputs, masks, w_act=0.1): 239 | vgg_layers = ["conv1_2"] # conv1 through conv5 240 | vgg_layer_weights = [1.0] 241 | 242 | renderings_acts = vgg16.get_vgg_activations(renderings, vgg_layers) 243 | refs_acts = vgg16.get_vgg_activations(inputs, vgg_layers) 244 | 245 | loss = 0 246 | masks_shape = [(200, 200), (100, 100)] 247 | 248 | tmp_reproj_mask = masks 249 | for mask_shape, w, act1, act2 in zip( 250 | masks_shape, vgg_layer_weights, renderings_acts, refs_acts 251 | ): 252 | act1 *= tmp_reproj_mask 253 | act2 *= tmp_reproj_mask 254 | tmp_reproj_mask_weights = tf.tile( 255 | tf.clip_by_value(tmp_reproj_mask, 1e-4, 0.9999), (1, 1, 1, act1.shape[-1]) 256 | ) 257 | loss += ( 258 | w 259 | * tf.reduce_sum(tmp_reproj_mask_weights * tf.square(w_act * (act1 - act2))) 260 | / tf.reduce_sum(tmp_reproj_mask_weights) 261 | ) 262 | 263 | tmp_reproj_mask = 1.0 - tf.nn.max_pool( 264 | 1.0 - tmp_reproj_mask, 265 | ksize=[1, 2, 2, 1], 266 | strides=[1, 2, 2, 1], 267 | padding="SAME", 268 | ) 269 | 270 | loss *= 0.0005 271 | 272 | return loss 273 | 274 | 275 | # input RGB is 2d tensor with shape (n_pix, 3) 276 | def cvtLab(RGB): 277 | 278 | # threshold definition 279 | T = tf.constant(0.008856) 280 | 281 | # matrix for converting RGB to LUV color space 282 | cvt_XYZ = tf.constant( 283 | [ 284 | [0.412453, 0.35758, 0.180423], 285 | [0.212671, 0.71516, 0.072169], 286 | [0.019334, 0.119193, 0.950227], 287 | ] 288 | ) 289 | 290 | # convert RGB to XYZ 291 | XYZ = tf.matmul(RGB, tf.transpose(cvt_XYZ)) 292 | 293 | # normalise for D65 white point 294 | XYZ /= tf.constant([[0.950456, 1.0, 1.088754]]) * 100 295 | 296 | mask = tf.to_float(tf.greater(XYZ, T)) 297 | 298 | fXYZ = XYZ ** (1 / 3) * mask + (1.0 - mask) * ( 299 | tf.constant(7.787) * XYZ + tf.constant(0.137931) 300 | ) 301 | 302 | M_cvtLab = tf.constant( 303 | [[0.0, 116.0, 0.0], [500.0, -500.0, 0.0], [0.0, 200.0, -200.0]] 304 | ) 305 | 306 | Lab = tf.matmul(fXYZ, tf.transpose(M_cvtLab)) + tf.constant([[-16.0, 0.0, 0.0]]) 307 | mask = tf.to_float(tf.equal(Lab, tf.constant(0.0))) 308 | 309 | Lab += mask * tf.constant(1e-4) 310 | 311 | return Lab 312 | 313 | 314 | # compute regular 2d convolution on 3d data 
315 | def conv2d_nosum(input, kernel): 316 | input_x = input[:, :, :, 0:1] 317 | input_y = input[:, :, :, 1:2] 318 | input_z = input[:, :, :, 2:3] 319 | 320 | output_x = tf.nn.conv2d(input_x, kernel, strides=(1, 1, 1, 1), padding="SAME") 321 | output_y = tf.nn.conv2d(input_y, kernel, strides=(1, 1, 1, 1), padding="SAME") 322 | output_z = tf.nn.conv2d(input_z, kernel, strides=(1, 1, 1, 1), padding="SAME") 323 | 324 | return tf.concat([output_x, output_y, output_z], axis=-1) 325 | 326 | 327 | def rescale_2_zero_one(imgs): 328 | return imgs / 2.0 + 0.5 329 | 330 | 331 | def Rotation(thetaX, thetaY, thetaZ): 332 | num_rots = tf.shape(thetaX)[0] 333 | 334 | # rows_x = [0, 1, 2, 3, 4, 5, 6, 6, 7, 8, 8] 335 | # cols_x = [0, 2, 1, 3, 7, 5, 6, 8, 4, 6, 8] 336 | idx_x = [ 337 | [0, 0], 338 | [1, 2], 339 | [2, 1], 340 | [3, 3], 341 | [4, 7], 342 | [5, 5], 343 | [6, 6], 344 | [6, 8], 345 | [7, 4], 346 | [8, 6], 347 | [8, 8], 348 | ] 349 | 350 | data_x_p90 = [ 351 | 1, 352 | -1, 353 | 1, 354 | 1, 355 | -1, 356 | -1, 357 | -1 / 2, 358 | -np.sqrt(3) / 2, 359 | 1, 360 | -np.sqrt(3) / 2, 361 | 1 / 2, 362 | ] 363 | data_x_n90 = [ 364 | 1, 365 | 1, 366 | -1, 367 | 1, 368 | 1, 369 | -1, 370 | -1 / 2, 371 | -np.sqrt(3) / 2, 372 | -1, 373 | -np.sqrt(3) / 2, 374 | 1 / 2, 375 | ] 376 | 377 | Rot_X_p90 = tf.sparse_to_dense( 378 | sparse_indices=idx_x, sparse_values=data_x_p90, output_shape=(9, 9) 379 | ) 380 | Rot_X_n90 = tf.sparse_to_dense( 381 | sparse_indices=idx_x, sparse_values=data_x_n90, output_shape=(9, 9) 382 | ) 383 | 384 | Rot_X_p90 = tf.tile(Rot_X_p90[tf.newaxis], (num_rots, 1, 1)) 385 | Rot_X_n90 = tf.tile(Rot_X_n90[tf.newaxis], (num_rots, 1, 1)) 386 | 387 | Rot_z = rot_z(thetaZ, num_rots) 388 | 389 | Rot_y = rot_y(thetaY, Rot_X_p90, Rot_X_n90, num_rots) 390 | 391 | Rot_x = rot_x(thetaX, Rot_X_p90, Rot_X_n90, num_rots) 392 | # return Rot_x 393 | 394 | Rot = tf.matmul(Rot_z, tf.matmul(Rot_y, Rot_x)) 395 | 396 | return Rot 397 | 398 | 399 | def rot_z(thetaZ, num_rots): 400 | # rows_z = [0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8] 401 | # cols_z = [0, 1, 3, 2, 1, 3, 4, 8, 5, 7, 6, 5, 7, 4, 8] 402 | idx_z = tf.constant( 403 | [ 404 | [0, 0], 405 | [1, 1], 406 | [1, 3], 407 | [2, 2], 408 | [3, 1], 409 | [3, 3], 410 | [4, 4], 411 | [4, 8], 412 | [5, 5], 413 | [5, 7], 414 | [6, 6], 415 | [7, 5], 416 | [7, 7], 417 | [8, 4], 418 | [8, 8], 419 | ] 420 | ) 421 | idx_id = tf.reshape( 422 | tf.tile(tf.range(num_rots)[:, tf.newaxis], (1, tf.shape(idx_z)[0])), (-1, 1) 423 | ) 424 | idx_z = tf.tile(idx_z, (num_rots, 1)) 425 | idx_z = tf.concat([idx_id, idx_z], axis=-1) 426 | 427 | data_Z = tf.stack( 428 | [ 429 | tf.ones_like(thetaZ), 430 | tf.cos(thetaZ), 431 | tf.sin(thetaZ), 432 | tf.ones_like(thetaZ), 433 | -tf.sin(thetaZ), 434 | tf.cos(thetaZ), 435 | tf.cos(2 * thetaZ), 436 | tf.sin(2 * thetaZ), 437 | tf.cos(thetaZ), 438 | tf.sin(thetaZ), 439 | tf.ones_like(thetaZ), 440 | -tf.sin(thetaZ), 441 | tf.cos(thetaZ), 442 | -tf.sin(2 * thetaZ), 443 | tf.cos(2 * thetaZ), 444 | ], 445 | axis=-1, 446 | ) 447 | data_Z = tf.reshape(data_Z, (-1,)) 448 | 449 | return tf.sparse_to_dense( 450 | sparse_indices=idx_z, sparse_values=data_Z, output_shape=(num_rots, 9, 9) 451 | ) 452 | 453 | 454 | def rot_y(thetaY, Rot_X_p90, Rot_X_n90, num_rots): 455 | # rows_z = [0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8] 456 | # cols_z = [0, 1, 3, 2, 1, 3, 4, 8, 5, 7, 6, 5, 7, 4, 8] 457 | idx_z = tf.constant( 458 | [ 459 | [0, 0], 460 | [1, 1], 461 | [1, 3], 462 | [2, 2], 463 | [3, 1], 464 | [3, 3], 465 | [4, 4], 466 | [4, 8], 467 | [5, 5], 468 
| [5, 7], 469 | [6, 6], 470 | [7, 5], 471 | [7, 7], 472 | [8, 4], 473 | [8, 8], 474 | ] 475 | ) 476 | idx_id = tf.reshape( 477 | tf.tile(tf.range(num_rots)[:, tf.newaxis], (1, tf.shape(idx_z)[0])), (-1, 1) 478 | ) 479 | idx_z = tf.tile(idx_z, (num_rots, 1)) 480 | idx_z = tf.concat([idx_id, idx_z], axis=-1) 481 | 482 | data_Y = tf.stack( 483 | [ 484 | tf.ones_like(thetaY), 485 | tf.cos(thetaY), 486 | tf.sin(thetaY), 487 | tf.ones_like(thetaY), 488 | -tf.sin(thetaY), 489 | tf.cos(thetaY), 490 | tf.cos(2 * thetaY), 491 | tf.sin(2 * thetaY), 492 | tf.cos(thetaY), 493 | tf.sin(thetaY), 494 | tf.ones_like(thetaY), 495 | -tf.sin(thetaY), 496 | tf.cos(thetaY), 497 | -tf.sin(2 * thetaY), 498 | tf.cos(2 * thetaY), 499 | ], 500 | axis=-1, 501 | ) 502 | data_Y = tf.reshape(data_Y, (-1,)) 503 | 504 | Rot_y = tf.sparse_to_dense( 505 | sparse_indices=idx_z, sparse_values=data_Y, output_shape=(num_rots, 9, 9) 506 | ) 507 | 508 | return tf.matmul(Rot_X_n90, tf.matmul(Rot_y, Rot_X_p90)) 509 | 510 | 511 | def rot_x(thetaX, Rot_X_p90, Rot_X_n90, num_rots): 512 | # rows_z = [0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8] 513 | # cols_z = [0, 1, 3, 2, 1, 3, 4, 8, 5, 7, 6, 5, 7, 4, 8] 514 | idx_z = tf.constant( 515 | [ 516 | [0, 0], 517 | [1, 1], 518 | [1, 3], 519 | [2, 2], 520 | [3, 1], 521 | [3, 3], 522 | [4, 4], 523 | [4, 8], 524 | [5, 5], 525 | [5, 7], 526 | [6, 6], 527 | [7, 5], 528 | [7, 7], 529 | [8, 4], 530 | [8, 8], 531 | ] 532 | ) 533 | idx_id = tf.reshape( 534 | tf.tile(tf.range(num_rots)[:, tf.newaxis], (1, tf.shape(idx_z)[0])), (-1, 1) 535 | ) 536 | idx_z = tf.tile(idx_z, (num_rots, 1)) 537 | idx_z = tf.concat([idx_id, idx_z], axis=-1) 538 | 539 | data_X = tf.stack( 540 | [ 541 | tf.ones_like(thetaX), 542 | tf.cos(thetaX), 543 | tf.sin(thetaX), 544 | tf.ones_like(thetaX), 545 | -tf.sin(thetaX), 546 | tf.cos(thetaX), 547 | tf.cos(2 * thetaX), 548 | tf.sin(2 * thetaX), 549 | tf.cos(thetaX), 550 | tf.sin(thetaX), 551 | tf.ones_like(thetaX), 552 | -tf.sin(thetaX), 553 | tf.cos(thetaX), 554 | -tf.sin(2 * thetaX), 555 | tf.cos(2 * thetaX), 556 | ], 557 | axis=-1, 558 | ) 559 | data_X = tf.reshape(data_X, (-1,)) 560 | 561 | Rot_x = tf.sparse_to_dense( 562 | sparse_indices=idx_z, sparse_values=data_X, output_shape=(num_rots, 9, 9) 563 | ) 564 | 565 | half_pi = tf.tile(tf.constant([np.pi / 2]), (num_rots,)) 566 | Rot_Y_n90 = rot_y(-half_pi, Rot_X_p90, Rot_X_n90, num_rots) 567 | Rot_Y_p90 = rot_y(half_pi, Rot_X_p90, Rot_X_n90, num_rots) 568 | 569 | return tf.matmul(Rot_Y_p90, tf.matmul(Rot_x, Rot_Y_n90)) 570 | 571 | 572 | def rotm2eul(rotm): 573 | sy = tf.sqrt(rotm[:, 0, 0] ** 2 + rotm[:, 1, 0] ** 2) 574 | singular = sy < 1e-6 575 | 576 | thetaX = tf.where( 577 | singular, 578 | tf.atan2(-rotm[:, 1, 2], rotm[:, 1, 1]), 579 | tf.atan2(rotm[:, 2, 1], rotm[:, 2, 2]), 580 | ) 581 | 582 | # thetaY = tf.where(singular, tf.atan2(-rotm[:,2,0], sy), tf.atan2(rotm[:,2,1], rotm[:,2,2])) 583 | thetaY = tf.atan2(rotm[:, 2, 0], sy) 584 | 585 | thetaZ = tf.where( 586 | singular, tf.zeros_like(rotm[:, 0, 0]), tf.atan2(rotm[:, 1, 0], rotm[:, 0, 0]) 587 | ) 588 | 589 | return thetaX, thetaY, thetaZ 590 | -------------------------------------------------------------------------------- /model/dataloader.py: -------------------------------------------------------------------------------- 1 | import pickle as pk 2 | import os 3 | import numpy as np 4 | import tensorflow as tf 5 | import skimage.transform as imgTform 6 | import glob 7 | from scipy import io 8 | 9 | 10 | def bigTime_dataPipeline(num_subbatch_input, dir): 11 | 
img_batch = pk.load(open(os.path.join(dir + "BigTime_v1", "img_batch.p"), "rb")) 12 | 13 | scene = "0161" 14 | img_batch = [sorted(sc_batch) for sc_batch in img_batch if scene in sc_batch[0][:4]] 15 | img_batch = np.asarray(img_batch[0]) 16 | img_batch = [np.delete(img_batch, [10, 20, 30, 40, 50])] 17 | 18 | sc_len = np.asarray([len(sc_l) for sc_l in img_batch], dtype=np.int32) 19 | size_dim2 = sc_len.max() 20 | 21 | bt_imgs_path = [] 22 | bt_masks_path = [] 23 | for sc_list, sc_size in zip(img_batch, sc_len): 24 | sc_imgs_path = [] 25 | sc_masks_path = [] 26 | for img_string in sc_list: 27 | tmp = img_string.split(os.path.sep) 28 | 29 | img_path = os.path.join(dir + "BigTime_v1", tmp[0], "data", tmp[1]) 30 | 31 | img_name = os.path.splitext(tmp[1])[0] 32 | mask_path = os.path.join( 33 | dir + "BigTime_v1", tmp[0], "data", img_name + "_mask.png" 34 | ) 35 | 36 | sc_imgs_path.append(img_path) 37 | sc_masks_path.append(mask_path) 38 | 39 | sc_imgs_path += ["" for i in range(size_dim2 - sc_size)] 40 | sc_masks_path += ["" for i in range(size_dim2 - sc_size)] 41 | bt_imgs_path.append(sc_imgs_path) 42 | bt_masks_path.append(sc_masks_path) 43 | 44 | bt_imgs_path = np.asarray(bt_imgs_path) 45 | bt_masks_path = np.asarray(bt_masks_path) 46 | 47 | train_data = bt_construct_inputPipeline( 48 | bt_imgs_path, 49 | bt_masks_path, 50 | sc_len, 51 | batch_size=num_subbatch_input, 52 | flag_shuffle=True, 53 | ) 54 | 55 | # define re-initialisable iterator 56 | iterator = tf.data.Iterator.from_structure( 57 | train_data.output_types, train_data.output_shapes 58 | ) 59 | next_element = iterator.get_next() 60 | 61 | # define initialisation for each iterator 62 | trainData_init_op = iterator.make_initializer(train_data) 63 | 64 | return next_element, trainData_init_op, len(bt_imgs_path) 65 | 66 | 67 | def megaDepth_dataPipeline(num_subbatch_input, dir, training_mode, num_test_sc): 68 | # locate all scenes 69 | data_scenes1 = np.array( 70 | sorted(glob.glob(os.path.join(dir + "new_outdoorMega_items", "*"))) 71 | ) 72 | data_scenes2 = np.array( 73 | sorted(glob.glob(os.path.join(dir + "new_indoorMega_items", "*"))) 74 | ) 75 | data_scenes3 = np.array( 76 | sorted(glob.glob(os.path.join(dir + "new_LSMega_items", "*"))) 77 | ) 78 | 79 | # scan scenes 80 | # sort scenes by number of training images in each 81 | scenes_size1 = np.array([len(os.listdir(i)) for i in data_scenes1]) 82 | scenes_size2 = np.array([len(os.listdir(i)) for i in data_scenes2]) 83 | scenes_size3 = np.array([len(os.listdir(i)) for i in data_scenes3]) 84 | scenes_sorted1 = np.argsort(scenes_size1) 85 | scenes_sorted2 = np.argsort(scenes_size2) 86 | scenes_sorted3 = np.argsort(scenes_size3) 87 | 88 | train_scenes = data_scenes1[scenes_sorted1[num_test_sc:]] 89 | test_scenes = data_scenes1[scenes_sorted1[:num_test_sc]] 90 | 91 | cProj_HiRes_scenes = np.array( 92 | [ 93 | os.path.join(dir + "HiRes_cProj_imgs", sc.split("/")[-1]) 94 | for sc in train_scenes 95 | ] 96 | ) 97 | cProj_HiRes_test_scenes = np.array( 98 | [ 99 | os.path.join(dir + "HiRes_cProj_imgs", sc.split("/")[-1]) 100 | for sc in test_scenes 101 | ] 102 | ) 103 | 104 | # load data from each scene 105 | # locate each data minibatch in each sorted sc 106 | train_scenes_items = [ 107 | sorted(glob.glob(os.path.join(sc, "*.pk"))) for sc in train_scenes 108 | ] 109 | train_scenes_items = np.concatenate(train_scenes_items, axis=0) 110 | test_scenes_items = [ 111 | sorted(glob.glob(os.path.join(sc, "*.pk"))) for sc in test_scenes 112 | ] 113 | test_scenes_items = 
np.concatenate(test_scenes_items, axis=0) 114 | 115 | HiRes_cProj_items = [ 116 | sorted(glob.glob(os.path.join(sc, "*.pk"))) for sc in cProj_HiRes_scenes 117 | ] 118 | HiRes_cProj_items = np.concatenate(HiRes_cProj_items, axis=0) 119 | HiRes_cProj_test_items = [ 120 | sorted(glob.glob(os.path.join(sc, "*.pk"))) for sc in cProj_HiRes_test_scenes 121 | ] 122 | HiRes_cProj_test_items = np.concatenate(HiRes_cProj_test_items, axis=0) 123 | 124 | # split data into train and test 125 | # separate out some data from training scenes as testing data 126 | train_items = train_scenes_items 127 | test_items = test_scenes_items 128 | 129 | ### contruct training data pipeline 130 | # remove residual data over number of data in one epoch 131 | res_train_items = len(train_items) - (len(train_items) % num_subbatch_input) 132 | train_items = train_items[:res_train_items] 133 | HiRes_cProj_items = HiRes_cProj_items[:res_train_items] 134 | train_data = md_construct_inputPipeline( 135 | train_items, HiRes_cProj_items, flag_shuffle=True, batch_size=num_subbatch_input 136 | ) 137 | 138 | res_test_items = len(test_items) - (len(test_items) % num_subbatch_input) 139 | test_items = test_items[:res_test_items] 140 | HiRes_cProj_test_items = HiRes_cProj_test_items[:res_test_items] 141 | test_data = md_construct_inputPipeline( 142 | test_items, 143 | HiRes_cProj_test_items, 144 | flag_shuffle=False, 145 | batch_size=num_subbatch_input, 146 | ) 147 | 148 | # define re-initialisable iterator 149 | iterator = tf.data.Iterator.from_structure( 150 | train_data.output_types, train_data.output_shapes 151 | ) 152 | next_element = iterator.get_next() 153 | 154 | # define initialisation for each iterator 155 | trainData_init_op = iterator.make_initializer(train_data) 156 | testData_init_op = iterator.make_initializer(test_data) 157 | 158 | return ( 159 | next_element, 160 | trainData_init_op, 161 | testData_init_op, 162 | len(train_items), 163 | len(test_items), 164 | ) 165 | 166 | 167 | def nyu_dataPipeline(num_subbatch_input, dir): 168 | nm_gts_path = np.array( 169 | glob.glob(os.path.join(dir, "normals_gt", "new_normals", "*")) 170 | ) 171 | nm_gts_path.sort() 172 | masks_path = np.array(glob.glob(os.path.join(dir, "normals_gt", "masks", "*"))) 173 | masks_path.sort() 174 | splits_path = os.path.join(dir, "splits.mat") 175 | imgs_path = os.path.join(dir, "NYU_imgs.mat") 176 | train_split = io.loadmat(splits_path)["trainNdxs"] 177 | train_split -= 1 178 | train_split = train_split.squeeze() 179 | test_split = io.loadmat(splits_path)["testNdxs"] 180 | test_split -= 1 181 | test_split = test_split.squeeze() 182 | train_split = np.squeeze(train_split) 183 | imgs = io.loadmat(imgs_path)["imgs"] 184 | imgs = imgs.transpose(-1, 0, 1, 2) 185 | 186 | train_nm_gts_path = nm_gts_path[train_split] 187 | train_masks_path = masks_path[train_split] 188 | train_imgs = imgs[train_split] 189 | 190 | train_data = nyu_construct_inputPipeline( 191 | train_imgs, 192 | train_nm_gts_path, 193 | train_masks_path, 194 | batch_size=num_subbatch_input, 195 | flag_shuffle=True, 196 | ) 197 | 198 | # define re-initialisable iterator 199 | iterator = tf.data.Iterator.from_structure( 200 | train_data.output_types, train_data.output_shapes 201 | ) 202 | next_element = iterator.get_next() 203 | 204 | # define initialisation for each iterator 205 | trainData_init_op = iterator.make_initializer(train_data) 206 | 207 | return next_element, trainData_init_op 208 | 209 | 210 | def _read_pk_function(filename): 211 | with open(filename, "rb") as f: 212 | batch_data 
= pk.load(f) 213 | input = np.float32(batch_data["input"]) 214 | dm = batch_data["dm"] 215 | nm = np.float32(batch_data["nm"]) 216 | cam = np.float32(batch_data["cam"]) 217 | scaleX = batch_data["scaleX"] 218 | scaleY = batch_data["scaleY"] 219 | mask = np.float32(batch_data["mask"]) 220 | 221 | return input, dm, nm, cam, scaleX, scaleY, mask 222 | 223 | 224 | def _read_pk_function_cProj(filename): 225 | with open(filename, "rb") as f: 226 | batch_data = pk.load(f) 227 | input = np.float32(batch_data["reproj_im1"]) 228 | mask = np.float32(batch_data["reproj_mask"]) 229 | 230 | return input, mask 231 | 232 | 233 | def md_read_func(filename, cProj_filename): 234 | 235 | input, dm, nm, cam, scaleX, scaleY, mask = tf.py_func( 236 | _read_pk_function, 237 | [filename], 238 | [ 239 | tf.float32, 240 | tf.float32, 241 | tf.float32, 242 | tf.float32, 243 | tf.float32, 244 | tf.float32, 245 | tf.float32, 246 | ], 247 | ) 248 | reproj_inputs, reproj_mask = tf.py_func( 249 | _read_pk_function_cProj, [cProj_filename], [tf.float32, tf.float32] 250 | ) 251 | 252 | input = tf.data.Dataset.from_tensor_slices(input[None]) 253 | dm = tf.data.Dataset.from_tensor_slices(dm[None]) 254 | nm = tf.data.Dataset.from_tensor_slices(nm[None]) 255 | cam = tf.data.Dataset.from_tensor_slices(cam[None]) 256 | scaleX = tf.data.Dataset.from_tensor_slices(scaleX[None]) 257 | scaleY = tf.data.Dataset.from_tensor_slices(scaleY[None]) 258 | mask = tf.data.Dataset.from_tensor_slices(mask[None]) 259 | reproj_inputs = tf.data.Dataset.from_tensor_slices(reproj_inputs[None]) 260 | reproj_mask = tf.data.Dataset.from_tensor_slices(reproj_mask[None]) 261 | 262 | return tf.data.Dataset.zip( 263 | (input, dm, nm, cam, scaleX, scaleY, mask, reproj_inputs, reproj_mask) 264 | ) 265 | 266 | 267 | def md_preprocess_func( 268 | input, dm, nm, cam, scaleX, scaleY, mask, reproj_inputs, reproj_mask 269 | ): 270 | 271 | input = input / 255 272 | input = input * 2 - 1 273 | 274 | nm = nm / 127 275 | 276 | reproj_inputs = reproj_inputs / 255 277 | 278 | reproj_inputs = reproj_inputs * 2 - 1 279 | 280 | return input, dm, nm, cam, scaleX, scaleY, mask, reproj_inputs, reproj_mask 281 | 282 | 283 | def bt_preprocess_func(bt_imgs, bt_masks): 284 | ori_bt_imgs = tf.identity(bt_imgs) 285 | ori_bt_masks = tf.identity(bt_masks) 286 | 287 | input_height = 200 288 | input_width = 200 289 | 290 | ori_height = tf.shape(bt_imgs)[1] 291 | ori_width = tf.shape(bt_imgs)[2] 292 | ratio = tf.to_float(ori_width) / tf.to_float(ori_height) 293 | 294 | bt_imgs = tf.image.resize_nearest_neighbor(bt_imgs, (input_height, input_width)) 295 | bt_masks = tf.image.resize_nearest_neighbor(bt_masks, (input_height, input_width)) 296 | 297 | bt_imgs = tf.to_float(bt_imgs) / 255.0 298 | bt_masks = tf.to_float(tf.not_equal(bt_masks, 0)) 299 | 300 | return bt_imgs, bt_masks 301 | 302 | 303 | def bt_read_func(bt_imgs_path, bt_masks_path, sc_len): 304 | batch_size = tf.constant(5) 305 | res_len = batch_size - sc_len 306 | res_idx = tf.range(sc_len, batch_size) 307 | 308 | sfl_idx = tf.random_shuffle(tf.range(sc_len)) 309 | 310 | sc_idx = sfl_idx[:batch_size] 311 | bt_imgs_path = tf.gather(bt_imgs_path, sc_idx) 312 | bt_masks_path = tf.gather(bt_masks_path, sc_idx) 313 | 314 | i_ = tf.constant(0) 315 | num_loops = tf.shape(bt_imgs_path)[0] 316 | im_output = tf.TensorArray(dtype=tf.uint8, size=num_loops) 317 | mask_output = tf.TensorArray(dtype=tf.uint8, size=num_loops) 318 | 319 | def condition(i_, im_output, mask_output): 320 | return tf.less(i_, num_loops) 321 | 322 | def body(i_, 
im_output, mask_output): 323 | bt_img = tf.read_file(bt_imgs_path[i_]) 324 | bt_img = tf.image.decode_image(bt_img) 325 | 326 | bt_mask = tf.read_file(bt_masks_path[i_]) 327 | bt_mask = tf.image.decode_image(bt_mask) 328 | 329 | im_output = im_output.write(i_, bt_img) 330 | mask_output = mask_output.write(i_, bt_mask) 331 | i_ += 1 332 | 333 | return i_, im_output, mask_output 334 | 335 | _, im_output, mask_output = tf.while_loop( 336 | condition, body, loop_vars=[i_, im_output, mask_output] 337 | ) 338 | 339 | bt_imgs = im_output.stack()[tf.newaxis] 340 | bt_masks = mask_output.stack()[tf.newaxis] 341 | 342 | return tf.data.Dataset.from_tensor_slices((bt_imgs, bt_masks)) 343 | 344 | 345 | def nyu_read_func(img, nm_gt_path, mask_path): 346 | 347 | nm_gt = tf.image.decode_image(tf.read_file(nm_gt_path), channels=3) 348 | mask = tf.image.decode_image(tf.read_file(mask_path)) 349 | 350 | return tf.data.Dataset.from_tensor_slices( 351 | (img[tf.newaxis, :, :, :], tf.expand_dims(nm_gt, axis=0), mask[tf.newaxis]) 352 | ) 353 | 354 | 355 | def nyu_preprocess_func(img, nm_gt, mask): 356 | 357 | # masking 358 | bdL = tf.reduce_min(tf.where(tf.not_equal(mask, 0))[:, 0]) 359 | bdR = tf.reduce_max(tf.where(tf.not_equal(mask, 0))[:, 0]) 360 | bdT = tf.reduce_min(tf.where(tf.not_equal(mask, 0))[:, 1]) 361 | bdB = tf.reduce_max(tf.where(tf.not_equal(mask, 0))[:, 1]) 362 | 363 | img = img[bdT:bdB, bdL:bdR] 364 | nm_gt = nm_gt[bdT:bdB, bdL:bdR] 365 | 366 | img = tf.to_float(img) / 255.0 367 | nm_gt = tf.to_float(nm_gt) / 127.0 - 1.0 368 | 369 | nm_gt = tf.stack([nm_gt[:, :, 2], -nm_gt[:, :, 1], -nm_gt[:, :, 0]], axis=-1) 370 | 371 | img = img[tf.newaxis] 372 | nm_gt = nm_gt[tf.newaxis] 373 | 374 | input_height = 200 375 | input_width = 200 376 | 377 | ori_height = tf.shape(img)[1] 378 | ori_width = tf.shape(img)[2] 379 | ratio = tf.to_float(ori_width) / tf.to_float(ori_height) 380 | 381 | rand_pos = tf.cond( 382 | ratio > 1.0, 383 | lambda: f1(ori_height, ori_width), 384 | lambda: f2(ori_height, ori_width), 385 | ) 386 | 387 | rand_flip = tf.random_uniform((), 0, 1, dtype=tf.float32) 388 | rand_angle = tf.random_uniform((), -1, 1, dtype=tf.float32) * (5.0 / 180.0) * np.pi 389 | 390 | img = img[:, rand_pos[0] : rand_pos[1], rand_pos[2] : rand_pos[3], :] 391 | nm_gt = nm_gt[:, rand_pos[0] : rand_pos[1], rand_pos[2] : rand_pos[3], :] 392 | 393 | img = tf.where(rand_flip > 0.5, img[:, :, ::-1], img) 394 | nm_gt = tf.where( 395 | rand_flip > 0.5, 396 | nm_gt[:, :, ::-1] * tf.constant([[[[-1, 1, 1]]]], dtype=tf.float32), 397 | nm_gt, 398 | ) 399 | 400 | img = tf.image.resize_nearest_neighbor(img, (input_height, input_width)) 401 | nm_gt = tf.image.resize_nearest_neighbor(nm_gt, (input_height, input_width)) 402 | 403 | img = tf.contrib.image.rotate(img, rand_angle) 404 | nm_gt = tf.contrib.image.rotate(nm_gt, rand_angle) 405 | 406 | sinR = tf.sin(rand_angle) 407 | cosR = tf.cos(rand_angle) 408 | R = tf.stack( 409 | [ 410 | tf.stack([cosR, sinR, 0.0], axis=-1), 411 | tf.stack([-sinR, cosR, 0.0], axis=-1), 412 | tf.constant([0, 0, 1], dtype=tf.float32), 413 | ], 414 | axis=0, 415 | ) 416 | nm_gt = tf.reshape(tf.matmul(tf.reshape(nm_gt, (-1, 3)), R), (1, 200, 200, 3)) 417 | 418 | return tf.squeeze(img), tf.squeeze(nm_gt) 419 | 420 | 421 | def bt_construct_inputPipeline( 422 | bt_imgs_path, bt_masks_path, sc_len, batch_size, flag_shuffle=True 423 | ): 424 | data = tf.data.Dataset.from_tensor_slices((bt_imgs_path, bt_masks_path, sc_len)) 425 | if flag_shuffle: 426 | data = 
data.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=100000)) 427 | else: 428 | data = data.repeat() 429 | 430 | data = data.apply( 431 | tf.contrib.data.parallel_interleave( 432 | bt_read_func, cycle_length=batch_size, block_length=1, sloppy=False 433 | ) 434 | ) 435 | 436 | data = data.map(bt_preprocess_func, num_parallel_calls=8) 437 | data = data.batch(batch_size).prefetch(4) 438 | 439 | return data 440 | 441 | 442 | def nyu_construct_inputPipeline( 443 | nyu_imgs, nyu_nm_gts_path, nyu_masks_path, batch_size, flag_shuffle=True 444 | ): 445 | imgs_data = tf.data.Dataset.from_tensor_slices(nyu_imgs) 446 | nm_gts_data = tf.data.Dataset.from_tensor_slices(nyu_nm_gts_path) 447 | masks_data = tf.data.Dataset.from_tensor_slices(nyu_masks_path) 448 | 449 | data = tf.data.Dataset.zip((imgs_data, nm_gts_data, masks_data)) 450 | 451 | if flag_shuffle: 452 | data = data.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=1000)) 453 | else: 454 | data = data.repeat() 455 | data = data.apply( 456 | tf.contrib.data.parallel_interleave( 457 | nyu_read_func, cycle_length=batch_size, block_length=1, sloppy=False 458 | ) 459 | ) 460 | 461 | data = data.map(nyu_preprocess_func, num_parallel_calls=8) 462 | data = data.batch(batch_size).prefetch(4) 463 | 464 | return data 465 | 466 | 467 | def md_construct_inputPipeline(items, cProj_items, batch_size, flag_shuffle=True): 468 | data = tf.data.Dataset.from_tensor_slices((items, cProj_items)) 469 | if flag_shuffle: 470 | data = data.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=100000)) 471 | else: 472 | data = data.repeat() 473 | data = data.apply( 474 | tf.contrib.data.parallel_interleave( 475 | md_read_func, cycle_length=batch_size, block_length=1, sloppy=False 476 | ) 477 | ) 478 | data = data.map(md_preprocess_func, num_parallel_calls=8) 479 | data = data.batch(batch_size).prefetch(4) 480 | 481 | return data 482 | 483 | 484 | def f1(ori_h, ori_w): 485 | h_upMost = 25 486 | w_leftMost = ori_w - ori_h + h_upMost 487 | 488 | random_start_y = tf.random_uniform((), 0, h_upMost, dtype=tf.int32) 489 | random_start_x = tf.random_uniform((), 0, w_leftMost, dtype=tf.int32) 490 | random_pos = [ 491 | random_start_y, 492 | random_start_y + ori_h - h_upMost, 493 | random_start_x, 494 | random_start_x + ori_w - w_leftMost, 495 | ] 496 | 497 | return random_pos 498 | 499 | 500 | def f2(ori_h, ori_w): 501 | w_leftMost = 25 502 | h_upMost = ori_h - ori_w + w_leftMost 503 | 504 | random_start_x = tf.random_uniform((), 0, w_leftMost, dtype=tf.int32) 505 | random_start_y = tf.random_uniform((), 0, h_upMost, dtype=tf.int32) 506 | random_pos = [ 507 | random_start_y, 508 | random_start_y + ori_h - h_upMost, 509 | random_start_x, 510 | random_start_x + ori_w - w_leftMost, 511 | ] 512 | 513 | return random_pos 514 | -------------------------------------------------------------------------------- /model/hdr_illu_pca/mean.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/model/hdr_illu_pca/mean.npy -------------------------------------------------------------------------------- /model/hdr_illu_pca/pcaMean.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/model/hdr_illu_pca/pcaMean.npy -------------------------------------------------------------------------------- /model/hdr_illu_pca/pcaVariance.npy: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/model/hdr_illu_pca/pcaVariance.npy -------------------------------------------------------------------------------- /model/hdr_illu_pca/pcaVector.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/model/hdr_illu_pca/pcaVector.npy -------------------------------------------------------------------------------- /model/lambSH_layer.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | 4 | def lambSH_layer(am, nm, L_SHcoeffs, shadow, gamma): 5 | """ 6 | i = albedo * irradiance 7 | the multiplication is elementwise 8 | albedo is given 9 | irraidance = n.T * M * n, where n is (x,y,z,1) 10 | M is contructed from some precomputed constants and L_SHcoeff, where M contains information about illuminations, clamped cosine and SH basis 11 | """ 12 | 13 | # M is only related with lighting 14 | c1 = tf.constant(0.429043, dtype=tf.float32) 15 | c2 = tf.constant(0.511664, dtype=tf.float32) 16 | c3 = tf.constant(0.743125, dtype=tf.float32) 17 | c4 = tf.constant(0.886227, dtype=tf.float32) 18 | c5 = tf.constant(0.247708, dtype=tf.float32) 19 | 20 | # each row have shape (batch, 4, 3) 21 | M_row1 = tf.stack( 22 | [ 23 | c1 * L_SHcoeffs[:, 8, :], 24 | c1 * L_SHcoeffs[:, 4, :], 25 | c1 * L_SHcoeffs[:, 7, :], 26 | c2 * L_SHcoeffs[:, 3, :], 27 | ], 28 | axis=1, 29 | ) 30 | M_row2 = tf.stack( 31 | [ 32 | c1 * L_SHcoeffs[:, 4, :], 33 | -c1 * L_SHcoeffs[:, 8, :], 34 | c1 * L_SHcoeffs[:, 5, :], 35 | c2 * L_SHcoeffs[:, 1, :], 36 | ], 37 | axis=1, 38 | ) 39 | M_row3 = tf.stack( 40 | [ 41 | c1 * L_SHcoeffs[:, 7, :], 42 | c1 * L_SHcoeffs[:, 5, :], 43 | c3 * L_SHcoeffs[:, 6, :], 44 | c2 * L_SHcoeffs[:, 2, :], 45 | ], 46 | axis=1, 47 | ) 48 | M_row4 = tf.stack( 49 | [ 50 | c2 * L_SHcoeffs[:, 3, :], 51 | c2 * L_SHcoeffs[:, 1, :], 52 | c2 * L_SHcoeffs[:, 2, :], 53 | c4 * L_SHcoeffs[:, 0, :] - c5 * L_SHcoeffs[:, 6, :], 54 | ], 55 | axis=1, 56 | ) 57 | 58 | # M is a 5d tensot with shape (batch,4,4,3[rgb]), the axis 1 and 2 are transposely equivalent 59 | M = tf.stack([M_row1, M_row2, M_row3, M_row4], axis=1) 60 | 61 | # find batch-spatial three dimensional mask of defined normals over nm 62 | mask = tf.not_equal(tf.reduce_sum(nm, axis=-1), 0) 63 | 64 | # flatten nm and am, resulting a dense nm and am 65 | total_npix = tf.shape(nm)[:3] 66 | ones = tf.ones(total_npix) 67 | nm_homo = tf.concat([nm, tf.expand_dims(ones, axis=-1)], axis=-1) 68 | 69 | ### manually perform batch-wise dot product 70 | # contruct batch-wise flatten M corresponding with nm_homo, such that multiplication between them is batch-wise 71 | # batch_indices = tf.expand_dims(tf.where(mask)[:,0],axis=-1) 72 | M = tf.expand_dims(tf.expand_dims(M, axis=1), axis=1) 73 | 74 | # expand M for broadcasting, such that M has shape (npix,4,4,3) 75 | # expand nm_homo, such that nm_homo has shape (npix,4,1,1) 76 | nm_homo = tf.expand_dims(tf.expand_dims(nm_homo, axis=-1), axis=-1) 77 | ## realise dot product by sum over element-wise product on axis=1 and axis=2 78 | # tmp have shape (npix, 4, 3[rgb]) 79 | tmp = tf.reduce_sum(nm_homo * M, axis=-3) 80 | # E has shape (npix, 3[rbg]) 81 | E = tf.reduce_sum(tmp * nm_homo[:, :, :, :, 0, :], axis=-2) 82 | 83 | # compute intensity by product between 
irradiance and albedo 84 | i = E * am * shadow * tf.to_float(tf.expand_dims(mask, -1)) 85 | 86 | # gamma correction 87 | i = tf.clip_by_value(i, 0.0, 1.0) + tf.constant(1e-4) 88 | i = tf.pow(i, 1.0 / gamma) 89 | 90 | return i, mask 91 | -------------------------------------------------------------------------------- /model/pred_illuDecomp_layer_new.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | 4 | def illuDecomp(inputs, am, nm, shadow, gamma, masks): 5 | 6 | """ 7 | i = albedo * irradiance 8 | the multiplication is elementwise 9 | albedo is given 10 | irraidance = n.T * M * n, where n is (x,y,z,1) 11 | M is contructed from some precomputed constants and L_SHcoeff, where M contains information about illuminations, clamped cosine and SH basis 12 | """ 13 | 14 | inputs = tf.pow(inputs, gamma) * masks 15 | 16 | D = am * shadow * masks 17 | 18 | # compute shading by linear equation regarding nm and L_SHcoeffs 19 | c1 = tf.constant(0.429043, dtype=tf.float32) 20 | c2 = tf.constant(0.511664, dtype=tf.float32) 21 | c3 = tf.constant(0.743125, dtype=tf.float32) 22 | c4 = tf.constant(0.886227, dtype=tf.float32) 23 | c5 = tf.constant(0.247708, dtype=tf.float32) 24 | 25 | # find defined pixels 26 | num_iter = tf.shape(nm)[0] 27 | output = tf.TensorArray(dtype=tf.float32, size=num_iter) 28 | i = tf.constant(0) 29 | 30 | def condition(i, output): 31 | return i < num_iter 32 | 33 | def body(i, output): 34 | inputs_ = inputs[i] 35 | nm_ = nm[i] 36 | D_ = D[i] 37 | nm_ = tf.reshape(nm_, (-1, 3)) 38 | inputs_pixels = tf.reshape(inputs_, (-1, 3)) 39 | D_ = tf.reshape(D_, (-1, 3)) 40 | 41 | total_npix = tf.shape(nm_)[0:1] 42 | ones = tf.ones(total_npix) 43 | A = tf.stack( 44 | [ 45 | c4 * ones, 46 | 2 * c2 * nm_[:, 1], 47 | 2 * c2 * nm_[:, 2], 48 | 2 * c2 * nm_[:, 0], 49 | 2 * c1 * nm_[:, 0] * nm_[:, 1], 50 | 2 * c1 * nm_[:, 1] * nm_[:, 2], 51 | c3 * nm_[:, 2] ** 2 - c5, 52 | 2 * c1 * nm_[:, 2] * nm_[:, 0], 53 | c1 * (nm_[:, 0] ** 2 - nm_[:, 1] ** 2), 54 | ], 55 | axis=-1, 56 | ) 57 | 58 | output_r = tf.matmul(pinv(A * D_[:, 0:1]), inputs_pixels[:, 0:1]) 59 | output_g = tf.matmul(pinv(A * D_[:, 1:2]), inputs_pixels[:, 1:2]) 60 | output_b = tf.matmul(pinv(A * D_[:, 2:3]), inputs_pixels[:, 2:3]) 61 | 62 | output = output.write(i, tf.concat([output_r, output_g, output_b], axis=-1)) 63 | i += tf.constant(1) 64 | 65 | return i, output 66 | 67 | _, output = tf.while_loop(condition, body, loop_vars=[i, output]) 68 | L_SHcoeffs = output.stack() 69 | 70 | return tf.reshape(L_SHcoeffs, [-1, 27]) 71 | 72 | 73 | def pinv(A, reltol=1e-6): 74 | # compute SVD of input A 75 | s, u, v = tf.svd(A) 76 | 77 | # invert s and clear entries lower than reltol*s_max 78 | atol = tf.reduce_max(s) * reltol 79 | s = tf.where(s > atol, s, atol * tf.ones_like(s)) 80 | s_inv = tf.diag(1.0 / s) 81 | 82 | # compute v * s_inv * u_t as psuedo inverse 83 | return tf.matmul(v, tf.matmul(s_inv, tf.transpose(u))) 84 | -------------------------------------------------------------------------------- /model/reproj_layer.py: -------------------------------------------------------------------------------- 1 | # apply error mask in albedo reprojection 2 | 3 | 4 | # no rotation involved 5 | 6 | 7 | #### directly output flatten reprojected pixels and the reconstruction mask 8 | 9 | # the differentiable layer performing reprojection 10 | 11 | import tensorflow as tf 12 | import numpy as np 13 | 14 | # pc is n-by-3 matrix containing point could three locations 15 | # cam is the new 
camera parameters, whose f and p_a have shape (batch) and c has shape (batch, 2) 16 | # dm1 is the depth map associated with cam1 that is camera for output image, which has shape (batch, height, width) 17 | # img2 is the input image that acts as source image for reprojection, which has shape (batch, height, width, 3) 18 | def map_reproj(dm1, map2, cam1, cam2, scale_x1, scale_x2, scale_y1, scale_y2): 19 | batch_size = tf.shape(dm1)[0] 20 | 21 | # read camera parameters 22 | c1 = cam1[:, 2:4] 23 | f1 = cam1[:, 0] 24 | p_a1 = cam1[:, 1] # ratio is width divided by height 25 | R1 = tf.reshape(cam1[:, 4:13], [-1, 3, 3]) 26 | t1 = cam1[:, 13:] 27 | 28 | c2 = cam2[:, 2:4] 29 | f2 = cam2[:, 0] 30 | p_a2 = cam2[:, 1] 31 | R2 = tf.reshape(cam2[:, 4:13], [-1, 3, 3]) 32 | t2 = cam2[:, 13:] 33 | 34 | # project pixel points back to camera coords 35 | # u is the height and v is the width 36 | # u and v are scalars 37 | u1 = tf.shape(dm1)[1] 38 | v1 = tf.shape(dm1)[2] 39 | 40 | # convert u1 and v1 to float, convenient for computation 41 | u1 = tf.to_float(u1) 42 | v1 = tf.to_float(v1) 43 | 44 | ### regular grid in output image 45 | # x increase towards right, y increase toward down 46 | vm, um = tf.meshgrid(tf.range(1.0, v1 + 1.0), tf.range(1.0, u1 + 1.0)) 47 | 48 | # apply scaling factors on f 49 | # f1 = f1/(scale_x1+scale_y1)*2 50 | # f1 = tf.stack([f1, f1*p_a1],axis=-1) 51 | f1 = tf.stack([f1 / scale_x1, f1 / scale_y1 * p_a1], axis=-1) 52 | 53 | # expand f1 (batch,2,1,1), to be consistant with dm 54 | f1 = tf.expand_dims(tf.expand_dims(f1, axis=-1), axis=-1) 55 | # expand c1 dimension (batch,2,1,1) 56 | c1 = tf.expand_dims(tf.expand_dims(c1, axis=-1), axis=-1) 57 | # expand vm and um to have shape (1,height,width) 58 | vm = tf.expand_dims(vm, axis=0) 59 | um = tf.expand_dims(um, axis=0) 60 | 61 | # compute 3D point x and y coordinates 62 | # Xm and Ym have shape (batch, height, width) 63 | Xm = (vm - c1[:, 0]) / f1[:, 0] * dm1 64 | Ym = (um - c1[:, 1]) / f1[:, 1] * dm1 65 | 66 | # the point cloud is (batch, 3, npix) matrix, each row is XYZ cam coords for one point 67 | pc = tf.stack( 68 | [ 69 | tf.contrib.layers.flatten(Xm), 70 | tf.contrib.layers.flatten(Ym), 71 | tf.contrib.layers.flatten(dm1), 72 | ], 73 | axis=1, 74 | ) 75 | 76 | ### transfer pc from coords of cam1 to cam2 77 | # construct homogeneous point cloud with shape batch-4-by-num_pix 78 | num_pix = tf.shape(pc)[-1] 79 | homo_pc_c1 = tf.concat( 80 | [pc, tf.ones((batch_size, 1, num_pix), dtype=tf.float32)], axis=1 81 | ) 82 | 83 | # both transformation matrix have shape batch-by-4-by-4, valid for multiplication with defined homogeneous point cloud 84 | last_row = tf.tile( 85 | tf.constant([[[0, 0, 0, 1]]], dtype=tf.float32), multiples=[batch_size, 1, 1] 86 | ) 87 | W_C_R_t1 = tf.concat([R1, tf.expand_dims(t1, axis=2)], axis=2) 88 | W_C_trans1 = tf.concat([W_C_R_t1, last_row], axis=1) 89 | W_C_R_t2 = tf.concat([R2, tf.expand_dims(t2, axis=2)], axis=2) 90 | W_C_trans2 = tf.concat([W_C_R_t2, last_row], axis=1) 91 | 92 | # batch dot product, output has shape (batch, 4, npix) 93 | homo_pc_c2 = tf.matmul( 94 | W_C_trans2, tf.matmul(tf.matrix_inverse(W_C_trans1), homo_pc_c1) 95 | ) 96 | 97 | ### project point cloud to cam2 pixel coordinates 98 | # u in vertical and v in horizontal 99 | u2 = tf.shape(map2)[1] 100 | v2 = tf.shape(map2)[2] 101 | 102 | # convert u2 and v2 to float 103 | u2 = tf.to_float(u2) 104 | v2 = tf.to_float(v2) 105 | 106 | # f2 = f2/(scale_x2+scale_y2)*2 107 | # f2 = tf.stack([f2, f2*p_a2],axis=-1) 108 | f2 = tf.stack([f2 / 
scale_x2, f2 / scale_y2 * p_a2], axis=-1) 109 | 110 | # construct intrics matrics, which has shape (batch, 3, 4) 111 | zeros = tf.zeros_like(f2[:, 0], dtype=tf.float32) 112 | ones = tf.ones_like(f2[:, 0], tf.float32) 113 | k2 = tf.stack( 114 | [ 115 | tf.stack([f2[:, 0], zeros, c2[:, 0], zeros], axis=1), 116 | tf.stack([zeros, f2[:, 1], c2[:, 1], zeros], axis=1), 117 | tf.stack([zeros, zeros, ones, zeros], axis=1), 118 | ], 119 | axis=1, 120 | ) 121 | 122 | ## manual batch dot product 123 | k2 = tf.expand_dims(k2, axis=-1) 124 | homo_pc_c2 = tf.expand_dims(homo_pc_c2, axis=1) 125 | # homo_uv2 has shape (batch, 3, npix) 126 | homo_uv2 = tf.reduce_sum(k2 * homo_pc_c2, axis=2) 127 | 128 | # the reprojected locations of regular grid in output image 129 | # both have shape (batch, npix) 130 | v_reproj = homo_uv2[:, 0, :] / homo_uv2[:, 2, :] 131 | u_reproj = homo_uv2[:, 1, :] / homo_uv2[:, 2, :] 132 | 133 | # u and v are flatten vector containing reprojected pixel locations 134 | # the u and v on same index compose one pixel 135 | u_valid = tf.logical_and( 136 | tf.logical_and(tf.logical_not(tf.is_nan(u_reproj)), u_reproj > 0), 137 | u_reproj < u2 - 1, 138 | ) 139 | v_valid = tf.logical_and( 140 | tf.logical_and(tf.logical_not(tf.is_nan(v_reproj)), v_reproj > 0), 141 | v_reproj < v2 - 1, 142 | ) 143 | # pixels has shape (batch, npix), indicating available reprojected pixels 144 | pixels = tf.logical_and(u_valid, v_valid) 145 | 146 | # pixels is bool indicator over original regular grid 147 | # v_reproj and u_reproj is x and y coordinates in source image 148 | # pixels, v_reproj and u_reproj are corresponded with each other by their indices 149 | 150 | ### interpolation function based on source image img2 151 | # it has shape (total_npix, 3), the second dimension contains [img_inds, x, y]; we need to use img_inds to distinguish each pixel's request image 152 | # img_inds is 2d matrix with shape (batch, npix), containing img_ind for each (x,y) location 153 | img_inds = tf.tile( 154 | tf.expand_dims(tf.to_float(tf.range(batch_size)), axis=1), 155 | multiples=[1, num_pix], 156 | ) 157 | request_points1 = tf.stack( 158 | [ 159 | tf.boolean_mask(img_inds, pixels), 160 | tf.boolean_mask(v_reproj, pixels), 161 | tf.boolean_mask(u_reproj, pixels), 162 | ], 163 | axis=1, 164 | ) 165 | 166 | # the output is stacked flatten pixel values for channels 167 | re_proj_pixs = interpImg(request_points1, map2) 168 | 169 | # reconstruct original shaped re-projection map 170 | ndims = tf.shape(map2)[3] 171 | shape = [batch_size, tf.to_int32(u1), tf.to_int32(v1), ndims] 172 | 173 | pixels = tf.reshape( 174 | pixels, shape=tf.stack([batch_size, tf.to_int32(u1), tf.to_int32(v1)], axis=0) 175 | ) 176 | indices = tf.to_int32(tf.where(tf.equal(pixels, True))) 177 | 178 | re_proj_pixs = tf.scatter_nd(updates=re_proj_pixs, indices=indices, shape=shape) 179 | 180 | # re_proj_pix is flatten reprojection results with shape (total_npix, 3) 181 | # indices contains first three indices in original image shape for each pixel in re_proj_pixs 182 | return re_proj_pixs, pixels 183 | 184 | 185 | def interpImg(unknown, data): 186 | # interpolate unknown data on pixel locations defined in unknown from known data with location defined in on regular grid 187 | 188 | # find neighbour pixels on regular grid 189 | # x is horizontal, y is vertical 190 | img_inds = tf.to_int32(unknown[:, 0]) 191 | x = unknown[:, 1] 192 | y = unknown[:, 2] 193 | # rgb_inds = tf.to_int32(unknown[:,3]) 194 | 195 | low_x = tf.to_int32(tf.floor(x)) 196 | high_x = 
tf.to_int32(tf.ceil(x)) 197 | low_y = tf.to_int32(tf.floor(y)) 198 | high_y = tf.to_int32(tf.ceil(y)) 199 | 200 | # measure the weights for neighbourhood average based on distance 201 | dist_low_x = tf.expand_dims(x - tf.to_float(low_x), axis=-1) 202 | dist_high_x = tf.expand_dims(tf.to_float(high_x) - x, axis=-1) 203 | dist_low_y = tf.expand_dims(y - tf.to_float(low_y), axis=-1) 204 | dist_high_y = tf.expand_dims(tf.to_float(high_y) - y, axis=-1) 205 | 206 | # compute horizontal avarage 207 | avg_low_y = dist_low_x * tf.gather_nd( 208 | data, indices=tf.stack([img_inds, low_y, low_x], axis=1) 209 | ) + dist_high_x * tf.gather_nd( 210 | data, indices=tf.stack([img_inds, low_y, high_x], axis=1) 211 | ) 212 | avg_high_y = dist_low_x * tf.gather_nd( 213 | data, indices=tf.stack([img_inds, high_y, low_x], axis=1) 214 | ) + dist_high_x * tf.gather_nd( 215 | data, indices=tf.stack([img_inds, high_y, high_x], axis=1) 216 | ) 217 | 218 | # compute vertical average 219 | avg = dist_low_y * avg_low_y + dist_high_y * avg_high_y 220 | 221 | return avg 222 | -------------------------------------------------------------------------------- /model/vgg16.py: -------------------------------------------------------------------------------- 1 | """ 2 | code cloned from 3 | https://github.com/machrisaa/tensorflow-vgg/blob/master/vgg16.py 4 | """ 5 | 6 | 7 | import numpy as np 8 | import os 9 | import tensorflow as tf 10 | import time 11 | 12 | 13 | VGG_MEAN = [103.939, 116.779, 123.68] 14 | 15 | 16 | class Vgg16: 17 | def __init__(self, vgg16_npy_path): 18 | self.initialized = False 19 | self.vgg16_npy_path = vgg16_npy_path 20 | print("npy file loaded") 21 | 22 | def build(self, rgb): 23 | """ 24 | load variable from npy to build the VGG 25 | :param rgb: rgb image [batch, height, width, 3] values scaled [-1, 1] 26 | """ 27 | 28 | start_time = time.time() 29 | print("build model started") 30 | 31 | # rgb_scaled = (rgb + 1) * 255.0 / 2. 
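# NOTE: the active line below assumes `rgb` is already scaled to [0, 1] and maps it straight
# to [0, 255] before the BGR conversion and VGG mean subtraction; the commented-out line above
# instead handled inputs in [-1, 1], which is the range quoted in the docstring, so callers
# should make sure the tensor they feed matches the active convention.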
32 | rgb_scaled = rgb * 255.0 33 | # Convert RGB to BGR 34 | red, green, blue = tf.split(axis=3, num_or_size_splits=3, value=rgb_scaled) 35 | bgr = tf.concat( 36 | axis=3, 37 | values=[ 38 | blue - VGG_MEAN[0], 39 | green - VGG_MEAN[1], 40 | red - VGG_MEAN[2], 41 | ], 42 | ) 43 | 44 | self.data_dict = np.load( 45 | self.vgg16_npy_path, encoding="latin1", allow_pickle=True 46 | ).item() 47 | layer_dict = dict() 48 | with tf.variable_scope("vgg16", reuse=self.initialized): 49 | layer_dict["conv1_1"] = self.conv_layer(bgr, "conv1_1") 50 | layer_dict["conv1_2"] = self.conv_layer(layer_dict["conv1_1"], "conv1_2") 51 | layer_dict["pool1"] = self.max_pool(layer_dict["conv1_2"], "pool1") 52 | 53 | # layer_dict['conv2_1'] = self.conv_layer( 54 | # layer_dict['pool1'], 'conv2_1') 55 | # layer_dict['conv2_2'] = self.conv_layer( 56 | # layer_dict['conv2_1'], 'conv2_2') 57 | # layer_dict['pool2'] = self.max_pool( 58 | # layer_dict['conv2_2'], 'pool2') 59 | 60 | # layer_dict['conv3_1'] = self.conv_layer( 61 | # layer_dict['pool2'], 'conv3_1') 62 | # layer_dict['conv3_2'] = self.conv_layer( 63 | # layer_dict['conv3_1'], 'conv3_2') 64 | # layer_dict['conv3_3'] = self.conv_layer( 65 | # layer_dict['conv3_2'], 'conv3_3') 66 | # layer_dict['pool3'] = self.max_pool( 67 | # layer_dict['conv3_3'], 'pool3') 68 | 69 | # layer_dict['conv4_1'] = self.conv_layer( 70 | # layer_dict['pool3'], 'conv4_1') 71 | # layer_dict['conv4_2'] = self.conv_layer( 72 | # layer_dict['conv4_1'], 'conv4_2') 73 | # layer_dict['conv4_3'] = self.conv_layer( 74 | # layer_dict['conv4_2'], 'conv4_3') 75 | # layer_dict['pool4'] = self.max_pool(layer_dict['conv4_3'], 'pool4') 76 | 77 | # layer_dict['conv5_1'] = self.conv_layer( 78 | # layer_dict['pool4'], 'conv5_1') 79 | # layer_dict['conv5_2'] = self.conv_layer( 80 | # layer_dict['conv5_1'], 'conv5_2') 81 | # layer_dict['conv5_3'] = self.conv_layer( 82 | # layer_dict['conv5_2'], 'conv5_3') 83 | # layer_dict['pool5'] = self.max_pool(layer_dict['conv5_3'], 'pool5') 84 | 85 | self.data_dict = None 86 | self.initialized = True 87 | return layer_dict 88 | 89 | def get_vgg_activations(self, rgb, layer_names): 90 | layer_dict = self.build(rgb) 91 | # validate_names = reduce(lambda f1, f2: f1 & f2, 92 | # [layer_dict.has_key(x) for x in layer_names]) 93 | # assert validate_names, 'invalid vgg16 layer name(s): %s' % str(layer_names) 94 | activations = [layer_dict[k] for k in layer_names] 95 | return activations 96 | 97 | def avg_pool(self, bottom, name): 98 | return tf.nn.avg_pool( 99 | bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME", name=name 100 | ) 101 | 102 | def max_pool(self, bottom, name): 103 | return tf.nn.max_pool( 104 | bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME", name=name 105 | ) 106 | 107 | def conv_layer(self, bottom, name): 108 | with tf.variable_scope(name): 109 | filt = self.get_conv_filter(name) 110 | 111 | conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding="SAME") 112 | 113 | conv_biases = self.get_bias(name) 114 | bias = tf.nn.bias_add(conv, conv_biases) 115 | 116 | relu = tf.nn.relu(bias) 117 | return relu 118 | 119 | def fc_layer(self, bottom, name): 120 | with tf.variable_scope(name): 121 | shape = bottom.get_shape().as_list() 122 | dim = 1 123 | for d in shape[1:]: 124 | dim *= d 125 | x = tf.reshape(bottom, [-1, dim]) 126 | 127 | weights = self.get_fc_weight(name) 128 | biases = self.get_bias(name) 129 | 130 | # Fully connected layer. Note that the '+' operation automatically 131 | # broadcasts the biases. 
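# NOTE: fc_layer (and get_fc_weight below) is kept from the cloned tensorflow-vgg implementation
# but is not called by build() above, which only instantiates conv1_1, conv1_2 and pool1.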
132 | fc = tf.nn.bias_add(tf.matmul(x, weights), biases) 133 | 134 | return fc 135 | 136 | def get_conv_filter(self, name): 137 | return tf.constant(self.data_dict[name][0], name="filter") 138 | 139 | def get_bias(self, name): 140 | return tf.constant(self.data_dict[name][1], name="biases") 141 | 142 | def get_fc_weight(self, name): 143 | return tf.constant(self.data_dict[name][0], name="weights") 144 | 145 | -------------------------------------------------------------------------------- /run_test_demo.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | set -x 3 | 4 | ### input and output options #### 5 | TESTING_MODE="demo_im" 6 | MODEL_PATH="model_ckpt" 7 | IMAGE_PATH="demo_im.jpg" 8 | MASK_PATH="demo_mask.jpg" 9 | RESULTS_DIR="test_demo" 10 | 11 | python test.py \ 12 | --mode "${TESTING_MODE}" \ 13 | --model "${MODEL_PATH}" \ 14 | --image "${IMAGE_PATH}" \ 15 | --mask "${MASK_PATH}" \ 16 | --output "${RESULTS_DIR}" 17 | -------------------------------------------------------------------------------- /run_test_diode.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | set -x 3 | ROOT="/shared/storage/cs/staffstore/yy1571" 4 | 5 | ### input and output options #### 6 | TESTING_MODE="diode" 7 | MODEL_PATH="diode_model_ckpt" 8 | IMAGES_DIR="${ROOT}/Data/DIODE" 9 | RESULTS_DIR="test_diode" 10 | 11 | python test.py \ 12 | --mode "${TESTING_MODE}" \ 13 | --model "${MODEL_PATH}" \ 14 | --diode "${IMAGES_DIR}" \ 15 | --output "${RESULTS_DIR}" 16 | -------------------------------------------------------------------------------- /run_test_iiw.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash 2 | set -x 3 | ROOT="/shared/storage/cs/staffstore/yy1571" 4 | 5 | ### input and output options #### 6 | TESTING_MODE="iiw" 7 | MODEL_PATH="iiw_model_ckpt" 8 | IMAGES_DIR="${ROOT}/Data/testData/iiw-dataset/data" 9 | RESULTS_DIR="test_iiw" 10 | 11 | python test.py \ 12 | --mode "${TESTING_MODE}" \ 13 | --model "${MODEL_PATH}" \ 14 | --iiw "${IMAGES_DIR}" \ 15 | --output "${RESULTS_DIR}" 16 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import ipdb 2 | from tqdm import tqdm 3 | import json 4 | import os 5 | import numpy as np 6 | import tensorflow as tf 7 | import shutil 8 | import cv2 9 | from skimage import io 10 | from model import lambSH_layer, SfMNet 11 | from utils.render_sphere_nm import render_sphere_nm 12 | from utils.whdr import compute_whdr 13 | from utils.diode_metrics import angular_error 14 | import argparse 15 | 16 | parser = argparse.ArgumentParser(description="InverseRenderNet++") 17 | parser.add_argument( 18 | "--mode", 19 | type=str, 20 | default="demo_im", 21 | choices=["demo_im", "iiw", "diode"], 22 | help="testing mode", 23 | ) 24 | 25 | # test demo image 26 | parser.add_argument("--image", type=str, default=None, help="Path to test image") 27 | parser.add_argument("--mask", type=str, default=None, help="Path to image mask") 28 | # test iiw 29 | parser.add_argument( 30 | "--iiw", type=str, default=None, help="Root directory for iiw-dataset" 31 | ) 32 | # test diode 33 | parser.add_argument( 34 | "--diode", type=str, default=None, help="Root directory for DIODE dataset" 35 | ) 36 | # model and output path 37 | parser.add_argument("--model", type=str, required=True, help="Path to trained model") 38 |
parser.add_argument("--output", type=str, required=True, help="Folder saving outputs") 39 | args = parser.parse_args() 40 | 41 | 42 | def rescale_2_zero_one(imgs): 43 | return imgs / 2.0 + 0.5 44 | 45 | 46 | def srgb_to_rgb(srgb): 47 | """Taken from bell2014: sRGB -> RGB.""" 48 | ret = np.zeros_like(srgb) 49 | idx0 = srgb <= 0.04045 50 | idx1 = srgb > 0.04045 51 | ret[idx0] = srgb[idx0] / 12.92 52 | ret[idx1] = np.power((srgb[idx1] + 0.055) / 1.055, 2.4) 53 | return ret 54 | 55 | 56 | def irn_func(input_height, input_width): 57 | # define inputs 58 | inputs_var = tf.placeholder(tf.float32, (None, input_height, input_width, 3)) 59 | masks_var = tf.placeholder(tf.float32, (None, input_height, input_width, 1)) 60 | train_flag = tf.placeholder(tf.bool, ()) 61 | 62 | albedos, shadow, nm_pred = SfMNet.SfMNet( 63 | inputs=inputs_var, 64 | is_training=train_flag, 65 | height=input_height, 66 | width=input_width, 67 | masks=masks_var, 68 | n_layers=30, 69 | n_pools=4, 70 | depth_base=32, 71 | ) 72 | 73 | gamma = tf.constant(2.2) 74 | lightings, _ = SfMNet.comp_light( 75 | inputs_var, albedos, nm_pred, shadow, gamma, masks_var 76 | ) 77 | 78 | # rescale 79 | albedos = rescale_2_zero_one(albedos) * masks_var 80 | shadow = rescale_2_zero_one(shadow) 81 | inputs = rescale_2_zero_one(inputs_var) * masks_var 82 | 83 | # visualise lighting on a sphere 84 | num_rendering = tf.shape(lightings)[0] 85 | nm_sphere = tf.constant(render_sphere_nm(100, 1), dtype=tf.float32) 86 | nm_sphere = tf.tile(nm_sphere, (num_rendering, 1, 1, 1)) 87 | lighting_recon, _ = lambSH_layer.lambSH_layer( 88 | tf.ones_like(nm_sphere), nm_sphere, lightings, tf.ones_like(nm_sphere), 1.0 89 | ) 90 | 91 | # recon shading map 92 | shading, _ = lambSH_layer.lambSH_layer( 93 | tf.ones_like(albedos), nm_pred, lightings, tf.ones_like(albedos), 1.0 94 | ) 95 | 96 | return ( 97 | albedos, 98 | shadow, 99 | nm_pred, 100 | lighting_recon, 101 | shading, 102 | inputs, 103 | inputs_var, 104 | masks_var, 105 | train_flag, 106 | ) 107 | 108 | 109 | def post_process( 110 | albedos_val, 111 | shading_val, 112 | shadow_val, 113 | lighting_recon_val, 114 | nm_pred_val, 115 | ori_width, 116 | ori_height, 117 | resize=True, 118 | ): 119 | # post-process results 120 | results = {} 121 | 122 | if resize: 123 | results.update( 124 | dict(albedos=cv2.resize(albedos_val[0], (ori_width, ori_height))) 125 | ) 126 | 127 | results.update( 128 | dict(shading=cv2.resize(shading_val[0], (ori_width, ori_height))) 129 | ) 130 | 131 | results.update( 132 | dict(shadow=cv2.resize(shadow_val[0, :, :, 0], (ori_width, ori_height))) 133 | ) 134 | 135 | results.update(dict(lighting_recon=lighting_recon_val[0])) 136 | 137 | results.update( 138 | dict(nm_pred=cv2.resize(nm_pred_val[0], (ori_width, ori_height))) 139 | ) 140 | else: 141 | results.update(dict(albedos=albedos_val[0])) 142 | 143 | results.update(dict(shading=shading_val[0])) 144 | 145 | results.update(dict(shadow=shadow_val[0, ..., 0])) 146 | 147 | results.update(dict(lighting_recon=lighting_recon_val[0])) 148 | 149 | results.update(dict(nm_pred=nm_pred_val[0])) 150 | 151 | return results 152 | 153 | 154 | def saving_result(results, dst_dir, prefix=""): 155 | img = np.uint8(results["img"]) 156 | albedos = np.uint8(results["albedos"] * 255.0) 157 | shading = np.uint8(results["shading"] * 255.0) 158 | shadow = np.uint8(results["shadow"] * 255.0) 159 | lighting_recon = np.uint8(results["lighting_recon"] * 255.0) 160 | nm_pred = np.uint8(results["nm_pred"] * 255.0) 161 | 162 | # save images 163 | input_path = 
os.path.join(dst_dir, prefix + "img.png") 164 | io.imsave(input_path, img) 165 | nm_pred_path = os.path.join(dst_dir, prefix + "nm_pred.png") 166 | io.imsave(nm_pred_path, nm_pred) 167 | albedo_path = os.path.join(dst_dir, prefix + "albedo.png") 168 | io.imsave(albedo_path, albedos) 169 | shading_path = os.path.join(dst_dir, prefix + "shading.png") 170 | io.imsave(shading_path, shading) 171 | shadow_path = os.path.join(dst_dir, prefix + "shadow.png") 172 | io.imsave(shadow_path, shadow) 173 | lighting_path = os.path.join(dst_dir, prefix + "lighting.png") 174 | io.imsave(lighting_path, lighting_recon) 175 | pass 176 | 177 | 178 | def rescale_img(img): 179 | img_h, img_w = img.shape[:2] 180 | if img_h > img_w: 181 | scale = img_w / 200 182 | new_img_h = np.int32(img_h / scale) 183 | new_img_w = 200 184 | 185 | img = cv2.resize(img, (new_img_w, new_img_h)) 186 | else: 187 | scale = img_h / 200 188 | new_img_w = np.int32(img_w / scale) 189 | new_img_h = 200 190 | 191 | img = cv2.resize(img, (new_img_w, new_img_h)) 192 | 193 | return img, (img_h, img_w), (new_img_h, new_img_w) 194 | 195 | 196 | if args.mode == "demo_im": 197 | assert args.image is not None and args.mask is not None 198 | 199 | # read in images 200 | img_path = args.image 201 | mask_path = args.mask 202 | 203 | img = io.imread(img_path) 204 | mask = io.imread(mask_path) 205 | 206 | input_height = 200 207 | input_width = 200 208 | 209 | ori_img, (ori_height, ori_width), (input_height, input_width) = rescale_img(img) 210 | 211 | # run inverse rendering 212 | ( 213 | albedos, 214 | shadow, 215 | nm_pred, 216 | lighting_recon, 217 | shading, 218 | inputs, 219 | inputs_var, 220 | masks_var, 221 | train_flag, 222 | ) = irn_func(input_height, input_width) 223 | 224 | # load model and run session 225 | model_path = tf.train.get_checkpoint_state(args.model).model_checkpoint_path 226 | sess = tf.InteractiveSession() 227 | saver = tf.train.Saver() 228 | saver.restore(sess, model_path) 229 | 230 | # evaluation 231 | dst_dir = args.output 232 | if os.path.isdir(dst_dir): 233 | shutil.rmtree(dst_dir, ignore_errors=True) 234 | os.makedirs(dst_dir) 235 | 236 | imgs = np.float32(ori_img) / 255.0 237 | imgs = srgb_to_rgb(imgs) 238 | imgs = imgs * 2.0 - 1.0 239 | imgs = imgs[None] 240 | mask = cv2.resize(mask, (input_width, input_height), cv2.INTER_NEAREST) 241 | img_masks = np.float32(mask == 255)[None, ..., None] 242 | imgs *= img_masks 243 | [ 244 | albedos_val, 245 | nm_pred_val, 246 | shadow_val, 247 | lighting_recon_val, 248 | shading_val, 249 | inputs_val, 250 | ] = sess.run( 251 | [albedos, nm_pred, shadow, lighting_recon, shading, inputs], 252 | feed_dict={inputs_var: imgs, masks_var: img_masks, train_flag: False}, 253 | ) 254 | 255 | # post-process results 256 | results = post_process( 257 | albedos_val, 258 | shading_val, 259 | shadow_val, 260 | lighting_recon_val, 261 | nm_pred_val, 262 | ori_width, 263 | ori_height, 264 | ) 265 | 266 | # rescale albedo and normal 267 | results["albedos"] = (results["albedos"] - results["albedos"].min()) / ( 268 | results["albedos"].max() - results["albedos"].min() 269 | ) 270 | 271 | results["nm_pred"] = (results["nm_pred"] + 1.0) / 2.0 272 | results["img"] = img 273 | 274 | saving_result(results, dst_dir) 275 | 276 | 277 | elif args.mode == "iiw": 278 | assert args.iiw is not None 279 | 280 | input_height = 200 281 | input_width = 200 282 | 283 | ( 284 | albedos, 285 | shadow, 286 | nm_pred, 287 | lighting_recon, 288 | shading, 289 | inputs, 290 | inputs_var, 291 | masks_var, 292 | train_flag, 293 | 
) = irn_func(input_height, input_width) 294 | 295 | # load model and run session 296 | model_path = tf.train.get_checkpoint_state(args.model).model_checkpoint_path 297 | sess = tf.InteractiveSession() 298 | saver = tf.train.Saver() 299 | saver.restore(sess, model_path) 300 | 301 | # evaluation 302 | dst_dir = args.output 303 | if os.path.isdir(dst_dir): 304 | shutil.rmtree(dst_dir, ignore_errors=True) 305 | os.makedirs(dst_dir) 306 | 307 | iiw = args.iiw 308 | 309 | test_ids = np.load("utils/iiw_test_ids.npy") 310 | 311 | total_loss = 0 312 | for counter, test_id in enumerate(tqdm(test_ids)): 313 | img_file = str(test_id) + ".png" 314 | judgement_file = str(test_id) + ".json" 315 | 316 | img_path = os.path.join(iiw, "imgs", img_file) 317 | judgement_path = os.path.join(iiw, "jsons", judgement_file) 318 | 319 | img = io.imread(img_path) 320 | judgement = json.load(open(judgement_path)) 321 | 322 | ori_img = img 323 | ori_height, ori_width = ori_img.shape[:2] 324 | img = cv2.resize(img, (input_width, input_height)) 325 | img = np.float32(img) / 255.0 326 | img = img * 2.0 - 1.0 327 | img = img[None, :, :, :] 328 | mask = np.ones((1, input_height, input_width, 1), np.bool) 329 | 330 | [ 331 | albedos_val, 332 | nm_pred_val, 333 | shadow_val, 334 | lighting_recon_val, 335 | shading_val, 336 | inputs_val, 337 | ] = sess.run( 338 | [albedos, nm_pred, shadow, lighting_recon, shading, inputs], 339 | feed_dict={inputs_var: img, masks_var: mask, train_flag: False}, 340 | ) 341 | 342 | # results folder for current scn 343 | result_dir = os.path.join(dst_dir, img_file[:-4]) 344 | os.makedirs(result_dir, exist_ok=True) 345 | 346 | # post-process results 347 | results = post_process( 348 | albedos_val, 349 | shading_val, 350 | shadow_val, 351 | lighting_recon_val, 352 | nm_pred_val, 353 | ori_width, 354 | ori_height, 355 | ) 356 | 357 | results["img"] = ori_img 358 | results["shading"] *= results["shadow"][..., None] 359 | results["nm_pred"] = (results["nm_pred"] + 1.0) / 2.0 360 | 361 | results["albedos"] = results["albedos"] ** (1 / 2.2) 362 | 363 | loss = compute_whdr(results["albedos"], judgement) 364 | total_loss += loss 365 | # print(f"{result_dir:s}\t\t{loss:f} {total_loss:f}") 366 | 367 | saving_result(results, result_dir) 368 | 369 | print("IIW TEST WHDR %f" % (total_loss / len(test_ids))) 370 | 371 | 372 | elif args.mode == "diode": 373 | assert args.diode is not None 374 | 375 | last_height = None 376 | last_width = None 377 | 378 | diode = args.diode 379 | test_root_dir = os.path.join(diode, "depth", "val") 380 | test_nm_root_dir = os.path.join(diode, "normal", "val") 381 | 382 | from glob import glob 383 | 384 | test_scenes_nm_dir = sorted( 385 | glob(os.path.join(test_nm_root_dir, "outdoor", "scene*", "scan*")) 386 | + glob(os.path.join(test_nm_root_dir, "indoor", "scene*", "scan*")) 387 | ) 388 | 389 | test_normals_path = np.concatenate( 390 | [ 391 | sorted(glob(os.path.join(t_sc_dir, "*_normal.npy"))) 392 | for t_sc_dir in test_scenes_nm_dir 393 | ], 394 | axis=0, 395 | ) 396 | test_imgs_path = np.stack( 397 | [ 398 | tmp.replace("/normal/", "/depth/").replace("_normal.npy", ".png") 399 | for tmp in test_normals_path 400 | ], 401 | axis=0, 402 | ) 403 | test_masks_path = np.stack( 404 | [ 405 | tmp.replace("/normal/", "/depth/").replace("_normal.npy", "_depth_mask.npy") 406 | for tmp in test_normals_path 407 | ], 408 | axis=0, 409 | ) 410 | test_depths_path = np.stack( 411 | [ 412 | tmp.replace("/normal/", "/depth/").replace("_normal.npy", "_depth.npy") 413 | for tmp in 
test_normals_path 414 | ], 415 | axis=0, 416 | ) 417 | 418 | total_angErr_list = [] 419 | for i, (img_path, mask_path, nm_gt_path) in enumerate( 420 | zip(tqdm(test_imgs_path), test_masks_path, test_normals_path) 421 | ): 422 | 423 | im_id = os.path.split(img_path)[1].split(".")[0] 424 | cur_dir = os.path.split(img_path)[0] 425 | cur_dir = cur_dir.split("/val/")[1] 426 | 427 | # results folder 428 | dst_dir = args.output 429 | if os.path.isdir(dst_dir): 430 | shutil.rmtree(dst_dir, ignore_errors=True) 431 | os.makedirs(dst_dir) 432 | 433 | # load im and gts 434 | img = io.imread(img_path) 435 | ori_img = img 436 | img = np.float32(ori_img) / 255.0 437 | img = img * 2.0 - 1.0 438 | 439 | mask = np.load(mask_path) 440 | nm_gt = np.load(nm_gt_path) 441 | 442 | img, (ori_height, ori_width), (input_height, input_width) = rescale_img(img) 443 | img_mask, (_, _), (_, _) = rescale_img(mask) 444 | nm_gt, (_, _), (_, _) = rescale_img(nm_gt) 445 | 446 | img = img[None, :, :, :] 447 | img_mask = img_mask[None, :, :, None] != 0.0 448 | 449 | if input_height != last_height or input_width != last_width: 450 | ( 451 | albedos, 452 | shadow, 453 | nm_pred, 454 | lighting_recon, 455 | shading, 456 | inputs, 457 | inputs_var, 458 | masks_var, 459 | train_flag, 460 | ) = irn_func(input_height, input_width) 461 | 462 | if last_height is None and last_width is None: 463 | model_path = tf.train.get_checkpoint_state( 464 | args.model 465 | ).model_checkpoint_path 466 | sess = tf.InteractiveSession() 467 | saver = tf.train.Saver() 468 | saver.restore(sess, model_path) 469 | 470 | last_height = input_height 471 | last_width = input_width 472 | 473 | [ 474 | albedos_val, 475 | nm_pred_val, 476 | shadow_val, 477 | lighting_recon_val, 478 | shading_val, 479 | inputs_val, 480 | ] = sess.run( 481 | [albedos, nm_pred, shadow, lighting_recon, shading, inputs], 482 | feed_dict={inputs_var: img, masks_var: img_mask, train_flag: False}, 483 | ) 484 | 485 | # results folder for current scn 486 | cur_dst_dir = os.path.join(dst_dir, cur_dir) 487 | os.makedirs(cur_dst_dir, exist_ok=True) 488 | 489 | # post-process results 490 | results = post_process( 491 | albedos_val, 492 | shading_val, 493 | shadow_val, 494 | lighting_recon_val, 495 | nm_pred_val, 496 | ori_width, 497 | ori_height, 498 | resize=False, 499 | ) 500 | 501 | angErr_list = angular_error(nm_gt, results["nm_pred"]) 502 | # print(f"{i:d} {angErr_list.mean():f}") 503 | 504 | total_angErr_list.append(angErr_list) 505 | total_angErr_list = [np.concatenate(total_angErr_list, -1)] 506 | 507 | results["img"] = cv2.resize(ori_img, (input_width, input_height)) 508 | results["nm_pred"] = (results["nm_pred"] + 1.0) / 2.0 509 | 510 | saving_result(results, cur_dst_dir, prefix=im_id) 511 | 512 | print( 513 | f"DIODE TEST: mean={np.mean(total_angErr_list):f} median={np.median(total_angErr_list):.4f}" 514 | ) 515 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | # also predict shadow mask and error mask 2 | 3 | # no rotation 4 | 5 | 6 | #### compute albedo reproj loss only on reprojection available area; compute reconstruction and its loss only based on defined area 7 | 8 | 9 | import os 10 | import shutil 11 | import time 12 | 13 | import numpy as np 14 | import tensorflow as tf 15 | 16 | from model import dataloader 17 | import argparse 18 | 19 | parser = argparse.ArgumentParser(description="InverseRenderNet++ training") 20 | parser.add_argument( 21 | 
"--mode", 22 | type=str, 23 | default="scratch", 24 | choices=["scratch", "trained"], 25 | help="training mode", 26 | ) 27 | 28 | parser.add_argument("--root-dir", type=str, default=None, help="Path to image data") 29 | parser.add_argument("--batch-size", type=int, default=None, help="Training batchsize") 30 | parser.add_argument( 31 | "--num-test-sc", type=int, default=1, help="Split for esting scenes" 32 | ) 33 | parser.add_argument("--num-gpus", type=int, default=1, help="Number of available gpus") 34 | parser.add_argument( 35 | "--use-GT-nm", action="store_true", help="Train with true normal map" 36 | ) 37 | args = parser.parse_args() 38 | 39 | 40 | def main(): 41 | 42 | # training batches are list of numpy arrays each of which is paired data 43 | num_subbatch_input = args.batch_size 44 | dir = args.root_dir 45 | training_mode = args.mode 46 | num_test_sc = args.num_test_sc 47 | num_gpus = args.num_gpus 48 | supTrain = args.use_GT_nm 49 | 50 | inputs_shape = (5, 200, 200, 3) 51 | 52 | ( 53 | md_next_element, 54 | md_trainData_init_op, 55 | md_testData_init_op, 56 | num_train_batches, 57 | num_test_batches, 58 | ) = dataloader.megaDepth_dataPipeline( 59 | num_subbatch_input, dir, training_mode, num_test_sc 60 | ) 61 | 62 | # use image batch shape to create placeholder 63 | md_inputs_var = tf.reshape( 64 | md_next_element[0], (-1, inputs_shape[1], inputs_shape[2], inputs_shape[3]) 65 | ) 66 | md_dms_var = tf.reshape(md_next_element[1], (-1, inputs_shape[1], inputs_shape[2])) 67 | md_nms_var = tf.reshape( 68 | md_next_element[2], (-1, inputs_shape[1], inputs_shape[2], 3) 69 | ) 70 | md_cams_var = tf.reshape(md_next_element[3], (-1, 16)) 71 | md_scaleXs_var = tf.reshape(md_next_element[4], (-1,)) 72 | md_scaleYs_var = tf.reshape(md_next_element[5], (-1,)) 73 | md_masks_var = tf.reshape( 74 | md_next_element[6], (-1, inputs_shape[1], inputs_shape[2]) 75 | ) 76 | md_reproj_inputs_var = tf.reshape( 77 | md_next_element[7], (-1, inputs_shape[1], inputs_shape[2], inputs_shape[3]) 78 | ) 79 | md_reproj_mask_var = tf.reshape( 80 | md_next_element[8], (-1, inputs_shape[1], inputs_shape[2]) 81 | ) 82 | 83 | train_flag = tf.placeholder(tf.bool, ()) 84 | supTrain_flag = tf.placeholder(tf.bool, ()) 85 | 86 | pair_label_var = tf.constant( 87 | np.repeat(np.arange(num_subbatch_input), inputs_shape[0])[:, None], 88 | dtype=tf.float32, 89 | ) 90 | 91 | # feed-foward neural network from input images to lighting and albedo 92 | ( 93 | loss, 94 | render_err, 95 | reproj_err, 96 | cross_render_err, 97 | reg_loss, 98 | illu_prior_loss, 99 | nm_smt_loss, 100 | nm_loss, 101 | albedos, 102 | nm_pred, 103 | shadow, 104 | sdFree_inputs, 105 | sdFree_shadings, 106 | sdFree_recons, 107 | ) = make_parallel( 108 | num_gpus, 109 | md_inputs_var, 110 | md_dms_var, 111 | md_nms_var, 112 | md_cams_var, 113 | md_scaleXs_var, 114 | md_scaleYs_var, 115 | md_masks_var, 116 | md_reproj_inputs_var, 117 | md_reproj_mask_var, 118 | pair_label_var, 119 | train_flag, 120 | supTrain_flag, 121 | inputs_shape, 122 | ) 123 | 124 | ### regualarisation loss 125 | reg_loss = sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)) 126 | 127 | # defined traning loop 128 | iters = 500 129 | num_subbatch = num_subbatch_input 130 | num_iters = np.int32(np.ceil(num_train_batches / num_subbatch)) 131 | num_test_iters = np.int32(np.ceil(num_test_batches / num_subbatch)) 132 | 133 | # define variable list for each of training 134 | g_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="inverserendernet") 135 | 136 | # training op 137 | 
global_step = tf.Variable(1, name="global_step", trainable=False) 138 | 139 | g_optimizer = tf.train.AdamOptimizer(0.0005) 140 | 141 | update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 142 | with tf.control_dependencies(update_ops): 143 | # Ensures that we execute the update_ops before performing the train_step 144 | g_train_step = g_optimizer.minimize( 145 | loss + reg_loss, 146 | global_step=global_step, 147 | var_list=g_vars, 148 | colocate_gradients_with_ops=True, 149 | ) 150 | 151 | # define saver for saving and restoring 152 | saver = tf.train.Saver(g_vars + [global_step]) 153 | 154 | config = tf.ConfigProto(allow_soft_placement=True) 155 | config.gpu_options.allow_growth = True 156 | sess = tf.InteractiveSession(config=config) 157 | tf.local_variables_initializer().run() 158 | tf.global_variables_initializer().run() 159 | 160 | if training_mode == "scratch": 161 | pass 162 | 163 | elif training_mode == "trained": 164 | saver.restore(sess, "model/model.ckpt") 165 | 166 | elif training_mode == "debug": 167 | saver.restore(sess, "model/model.ckpt") 168 | 169 | # save summeries 170 | render_err_summary = tf.summary.scalar("self_sup/render_err", render_err) 171 | reproj_err_summary = tf.summary.scalar("self_sup/reproj_err", reproj_err) 172 | cross_render_err_summary = tf.summary.scalar( 173 | "self_sup/cross_render_err", cross_render_err 174 | ) 175 | illu_prior_loss_summary = tf.summary.scalar( 176 | "self_sup/illu_prior_loss", illu_prior_loss 177 | ) 178 | nm_loss_summary = tf.summary.scalar("self_sup/nm_loss", nm_loss) 179 | nm_smt_loss_summary = tf.summary.scalar("self_sup/nm_smt_loss", nm_smt_loss) 180 | 181 | ori_summary = tf.summary.image("ori_img", md_inputs_var, max_outputs=15) 182 | am_summary = tf.summary.image("am", albedos, max_outputs=15) 183 | nm_summary = tf.summary.image("nm", nm_pred, max_outputs=15) 184 | shadow_summary = tf.summary.image("shadow", shadow, max_outputs=15) 185 | sdFree_shadings_summary = tf.summary.image( 186 | "sdFree_shadings", sdFree_shadings, max_outputs=15 187 | ) 188 | sdFree_inputs_summary = tf.summary.image( 189 | "sdFree_inputs", sdFree_inputs, max_outputs=15 190 | ) 191 | sdFree_recons_summary = tf.summary.image( 192 | "sdFree_recons", sdFree_recons, max_outputs=15 193 | ) 194 | 195 | performance_summary = tf.summary.merge( 196 | [ 197 | render_err_summary, 198 | reproj_err_summary, 199 | cross_render_err_summary, 200 | illu_prior_loss_summary, 201 | nm_loss_summary, 202 | nm_smt_loss_summary, 203 | ] 204 | ) 205 | imgs_summary = tf.summary.merge( 206 | [ 207 | ori_summary, 208 | am_summary, 209 | nm_summary, 210 | shadow_summary, 211 | sdFree_shadings_summary, 212 | sdFree_inputs_summary, 213 | sdFree_recons_summary, 214 | ] 215 | ) 216 | 217 | if not (os.path.exists("summaries")): 218 | os.mkdir("summaries") 219 | summ_first = os.path.join("summaries", "first") 220 | if not (os.path.exists(summ_first)): 221 | os.mkdir(summ_first) 222 | else: 223 | shutil.rmtree(summ_first, ignore_errors=True) 224 | summ_writer = tf.summary.FileWriter(summ_first, sess.graph) 225 | 226 | # supTrain = True -> train albedo net by given nm_gt 227 | # supTrain = False -> train albedo net using nm_preds 228 | md_trainData_init_op.run() 229 | 230 | best_score = 100 231 | best_result = 0 232 | for i in range(1, iters + 1): 233 | g_loss_avg = 0 234 | f = open("cost.txt", "a") 235 | if training_mode == "trained" or training_mode == "scratch": 236 | for j in range(1, num_iters + 1): 237 | 238 | print("iter %d/%d loop %d/%d" % (i, iters, j, num_iters)) 239 | 
f.write("iter %d/%d loop %d/%d" % (i, iters, j, num_iters)) 240 | start_time_g = time.time() 241 | if j % 50 == 1: 242 | [ 243 | global_step_val, 244 | imgs_summary_val, 245 | performance_summary_val, 246 | _, 247 | loss_val, 248 | reg_loss_val, 249 | render_err_val, 250 | reproj_err_val, 251 | cross_render_err_val, 252 | illu_prior_val, 253 | nm_smt_loss_val, 254 | nm_loss_val, 255 | ] = sess.run( 256 | [ 257 | global_step, 258 | imgs_summary, 259 | performance_summary, 260 | g_train_step, 261 | loss, 262 | reg_loss, 263 | render_err, 264 | reproj_err, 265 | cross_render_err, 266 | illu_prior_loss, 267 | nm_smt_loss, 268 | nm_loss, 269 | ], 270 | feed_dict={train_flag: True, supTrain_flag: supTrain}, 271 | ) 272 | summ_writer.add_summary(performance_summary_val, global_step_val) 273 | summ_writer.add_summary(imgs_summary_val, global_step_val) 274 | 275 | else: 276 | [ 277 | _, 278 | loss_val, 279 | reg_loss_val, 280 | render_err_val, 281 | reproj_err_val, 282 | cross_render_err_val, 283 | illu_prior_val, 284 | nm_smt_loss_val, 285 | nm_loss_val, 286 | ] = sess.run( 287 | [ 288 | g_train_step, 289 | loss, 290 | reg_loss, 291 | render_err, 292 | reproj_err, 293 | cross_render_err, 294 | illu_prior_loss, 295 | nm_smt_loss, 296 | nm_loss, 297 | ], 298 | feed_dict={train_flag: True, supTrain_flag: supTrain}, 299 | ) 300 | 301 | g_loss_avg += loss_val 302 | 303 | if j % 1 == 0: 304 | print( 305 | "\tg_loss_avg = %f, loss = %f, took %.3fs" 306 | % (g_loss_avg / j, loss_val, time.time() - start_time_g) 307 | ) 308 | print( 309 | "\t\treg_loss = %f, render_err = %f, reproj_err = %f, cross_render_err = %f, illu_prior = %f, nm_smt_loss = %f, nm_loss = %f" 310 | % ( 311 | reg_loss_val, 312 | render_err_val, 313 | reproj_err_val, 314 | cross_render_err_val, 315 | illu_prior_val, 316 | nm_smt_loss_val, 317 | nm_loss_val, 318 | ) 319 | ) 320 | 321 | f.write( 322 | "\tg_loss_avg = %f, loss = %f, took %.3fs\n\t\treg_loss = %f, render_err = %f, reproj_err = %f, cross_render_err = %f, illu_prior = %f, nm_smt_loss = %f, nm_loss = %f\n" 323 | % ( 324 | g_loss_avg / j, 325 | loss_val, 326 | time.time() - start_time_g, 327 | reg_loss_val, 328 | render_err_val, 329 | reproj_err_val, 330 | cross_render_err_val, 331 | illu_prior_val, 332 | nm_smt_loss_val, 333 | nm_loss_val, 334 | ) 335 | ) 336 | 337 | f.close() 338 | 339 | md_testData_init_op.run() 340 | test_loss = 0 341 | test_render_err = 0 342 | test_reproj_err = 0 343 | test_cross_render_err = 0 344 | test_illu_prior = 0 345 | test_nm_loss = 0 346 | for j in range(1, num_test_iters + 1): 347 | [ 348 | loss_val, 349 | reg_loss_val, 350 | render_err_val, 351 | reproj_err_val, 352 | cross_render_err_val, 353 | illu_prior_val, 354 | nm_smt_loss_val, 355 | nm_loss_val, 356 | ] = sess.run( 357 | [ 358 | loss, 359 | reg_loss, 360 | render_err, 361 | reproj_err, 362 | cross_render_err, 363 | illu_prior_loss, 364 | nm_smt_loss, 365 | nm_loss, 366 | ], 367 | feed_dict={train_flag: False, supTrain_flag: supTrain}, 368 | ) 369 | 370 | test_loss += loss_val 371 | test_render_err += render_err_val 372 | test_reproj_err += reproj_err_val 373 | test_cross_render_err += cross_render_err_val 374 | test_illu_prior += illu_prior_val 375 | test_nm_loss += nm_loss_val 376 | 377 | test_loss /= num_test_iters 378 | test_render_err /= num_test_iters 379 | test_reproj_err /= num_test_iters 380 | test_cross_render_err /= num_test_iters 381 | test_illu_prior /= num_test_iters 382 | test_nm_loss /= num_test_iters 383 | 384 | score = test_loss 385 | 386 | if best_score > score: 387 | 
best_result = i 388 | best_score = score 389 | saver.save(sess, "model_best/model.ckpt") 390 | 391 | f = open("test.txt", "a") 392 | f.write( 393 | "iter {:d}, score {:f}: render_err={:f}, reproj_err={:f}, cross_render_err={:f}, illu_prior={:f}, nm_loss={:f}\n".format( 394 | i, 395 | score, 396 | test_render_err, 397 | test_reproj_err, 398 | test_cross_render_err, 399 | test_illu_prior, 400 | test_nm_loss, 401 | ) 402 | ) 403 | f.write( 404 | "\tbest_result {:d}, best_score {:f}\n".format(best_result, best_score) 405 | ) 406 | f.close() 407 | 408 | md_trainData_init_op.run() 409 | 410 | # save model every 10 iterations 411 | if i % 1 == 0: 412 | saver.save(sess, "model/model.ckpt") 413 | 414 | 415 | def make_parallel( 416 | num_gpus, 417 | inputs_var, 418 | dms_var, 419 | nms_var, 420 | cams_var, 421 | scaleXs_var, 422 | scaleYs_var, 423 | masks_var, 424 | reproj_inputs_var, 425 | reproj_mask_var, 426 | pair_label_var, 427 | train_flag, 428 | supTrain_flag, 429 | inputs_shape, 430 | ): 431 | from model import SfMNet, consistency_layer 432 | 433 | inputs_var = tf.split(inputs_var, num_gpus) 434 | dms_var = tf.split(dms_var, num_gpus) 435 | nms_var = tf.split(nms_var, num_gpus) 436 | cams_var = tf.split(cams_var, num_gpus) 437 | scaleXs_var = tf.split(scaleXs_var, num_gpus) 438 | scaleYs_var = tf.split(scaleYs_var, num_gpus) 439 | masks_var = tf.split(masks_var, num_gpus) 440 | reproj_inputs_var = tf.split(reproj_inputs_var, num_gpus) 441 | reproj_mask_var = tf.split(reproj_mask_var, num_gpus) 442 | pair_label_var = tf.split(pair_label_var, num_gpus) 443 | 444 | loss_split = [] 445 | render_err_split = [] 446 | reproj_err_split = [] 447 | cross_render_err_split = [] 448 | reg_loss_split = [] 449 | illu_prior_loss_split = [] 450 | nm_smt_loss_split = [] 451 | nm_loss_split = [] 452 | for i in range(num_gpus): 453 | with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)): 454 | with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE): 455 | # mask out sky in inputs and nms 456 | # dms_var *= masks_var 457 | masks_var_4d = tf.expand_dims(masks_var[i], axis=-1) 458 | reproj_mask_var_4d = tf.expand_dims(reproj_mask_var[i], axis=-1) 459 | 460 | inputs_var[i] *= masks_var_4d 461 | nms_var[i] *= masks_var_4d 462 | 463 | albedos, shadow, nm_pred = SfMNet.SfMNet( 464 | inputs=inputs_var[i], 465 | is_training=train_flag, 466 | height=inputs_shape[1], 467 | width=inputs_shape[2], 468 | masks=masks_var_4d, 469 | n_layers=30, 470 | n_pools=4, 471 | depth_base=32, 472 | ) 473 | 474 | normals = tf.where(supTrain_flag, nms_var[i], nm_pred) 475 | 476 | # linearise srgb input to rgb 477 | rbg_inputs_var = inputs_srbg_2_rbg(inputs_var[i]) 478 | rbg_reproj_inputs_var = inputs_srbg_2_rbg(reproj_inputs_var[i]) 479 | 480 | # infer lighting from rgb input and compute lighting loss 481 | lightings, illu_prior_loss = SfMNet.comp_light( 482 | rbg_inputs_var, albedos, normals, shadow, 1.0, masks_var_4d 483 | ) 484 | 485 | ( 486 | loss, 487 | render_err, 488 | reproj_err, 489 | cross_render_err, 490 | reg_loss, 491 | illu_prior_loss, 492 | nm_smt_loss, 493 | nm_loss, 494 | sdFree_inputs, 495 | sdFree_shadings, 496 | sdFree_recons, 497 | ) = consistency_layer.loss_formulate( 498 | albedos, 499 | shadow, 500 | nm_pred, 501 | lightings, 502 | nms_var[i], 503 | rbg_inputs_var, 504 | dms_var[i], 505 | cams_var[i], 506 | scaleXs_var[i], 507 | scaleYs_var[i], 508 | masks_var_4d, 509 | rbg_reproj_inputs_var, 510 | reproj_mask_var_4d, 511 | pair_label_var[i], 512 | supTrain_flag, 513 | illu_prior_loss, 514 | 
reg_loss_flag=False, 515 | ) 516 | 517 | loss_split += [loss] 518 | render_err_split += [render_err] 519 | reproj_err_split += [reproj_err] 520 | cross_render_err_split += [cross_render_err] 521 | reg_loss_split += [reg_loss] 522 | illu_prior_loss_split += [illu_prior_loss] 523 | nm_smt_loss_split += [nm_smt_loss] 524 | nm_loss_split += [nm_loss] 525 | 526 | loss = tf.reduce_mean(loss_split) 527 | render_err = tf.reduce_mean(render_err_split) 528 | reproj_err = tf.reduce_mean(reproj_err_split) 529 | cross_render_err = tf.reduce_mean(cross_render_err_split) 530 | reg_loss = tf.reduce_mean(reg_loss_split) 531 | illu_prior_loss = tf.reduce_mean(illu_prior_loss_split) 532 | nm_smt_loss = tf.reduce_mean(nm_smt_loss_split) 533 | nm_loss = tf.reduce_mean(nm_loss_split) 534 | 535 | return ( 536 | loss, 537 | render_err, 538 | reproj_err, 539 | cross_render_err, 540 | reg_loss, 541 | illu_prior_loss, 542 | nm_smt_loss, 543 | nm_loss, 544 | albedos, 545 | nm_pred, 546 | shadow, 547 | sdFree_inputs, 548 | sdFree_shadings, 549 | sdFree_recons, 550 | ) 551 | 552 | 553 | def inputs_srbg_2_rbg(imgs): 554 | imgs = imgs / 2.0 + 0.5 555 | 556 | ret = tf.zeros_like(imgs) 557 | dp_mask = tf.to_float(imgs <= 0.04045) 558 | ret += dp_mask * imgs / 12.92 559 | ret += tf.pow((imgs + 0.055) / 1.055, 2.2) * (1 - dp_mask) 560 | imgs = tf.identity(ret) 561 | 562 | imgs = imgs * 2.0 - 1.0 563 | 564 | return imgs 565 | 566 | 567 | if __name__ == "__main__": 568 | main() 569 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/utils/__init__.py -------------------------------------------------------------------------------- /utils/diode_metrics.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def angular_error(gt, pred): 5 | 6 | # compute the vector product between gt and prediction for each pixel 7 | angularDist = (gt * pred).sum(axis=-1) 8 | 9 | # compute the angle from vector product 10 | angularDist = np.arccos(np.clip(angularDist, -1.0, 1.0)) 11 | 12 | # convert radius to degrees 13 | angularDist = angularDist / np.pi * 180 14 | 15 | # find mask 16 | mask = np.float32(np.sum(gt ** 2, axis=-1) > 0.9) 17 | mask = mask != 0.0 18 | 19 | # only compute pixels under mask 20 | angularDist_masked = angularDist[mask] 21 | 22 | return angularDist_masked 23 | -------------------------------------------------------------------------------- /utils/iiw_test_ids.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/utils/iiw_test_ids.npy -------------------------------------------------------------------------------- /utils/render_sphere_nm.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | def render_sphere_nm(radius, num): 4 | # nm is a batch of normal maps 5 | nm = [] 6 | 7 | for i in range(num): 8 | ### hemisphere 9 | height = 2*radius 10 | width = 2*radius 11 | centre = radius 12 | x_grid, y_grid = np.meshgrid(np.arange(1.,2*radius+1), np.arange(1.,2*radius+1)) 13 | # grids are (-radius, radius) 14 | x_grid -= centre 15 | # y_grid -= centre 16 | y_grid = centre - y_grid 17 | # scale range of h and w grid in (-1,1) 18 | x_grid /= radius 
19 | y_grid /= radius 20 | dist = 1 - (x_grid**2+y_grid**2) 21 | mask = dist > 0 22 | z_grid = np.ones_like(mask) * np.nan 23 | z_grid[mask] = np.sqrt(dist[mask]) 24 | 25 | # remove xs and ys by masking out nans in zs 26 | x_grid[~(mask)] = np.nan 27 | y_grid[~(mask)] = np.nan 28 | 29 | # concatenate normal map 30 | nm.append(np.stack([x_grid,y_grid,z_grid],axis=2)) 31 | 32 | 33 | 34 | ### sphere 35 | # span the regular grid for computing azimuth and zenith angular map 36 | # height = 2*radius 37 | # width = 2*radius 38 | # centre = radius 39 | # h_grid, v_grid = np.meshgrid(np.arange(1.,2*radius+1), np.arange(1.,2*radius+1)) 40 | # # grids are (-radius, radius) 41 | # h_grid -= centre 42 | # # v_grid -= centre 43 | # v_grid = centre - v_grid 44 | # # scale range of h and v grid in (-1,1) 45 | # h_grid /= radius 46 | # v_grid /= radius 47 | 48 | # # z_grid is linearly spread along theta/zenith in range (0,pi) 49 | # dist_grid = np.sqrt(h_grid**2+v_grid**2) 50 | # dist_grid[dist_grid>1] = np.nan 51 | # theta_grid = dist_grid * np.pi 52 | # z_grid = np.cos(theta_grid) 53 | 54 | # rho_grid = np.arctan2(v_grid,h_grid) 55 | # x_grid = np.sin(theta_grid)*np.cos(rho_grid) 56 | # y_grid = np.sin(theta_grid)*np.sin(rho_grid) 57 | 58 | # # concatenate normal map 59 | # nm.append(np.stack([x_grid,y_grid,z_grid],axis=2)) 60 | 61 | 62 | # construct batch 63 | nm = np.stack(nm,axis=0) 64 | 65 | 66 | 67 | return nm 68 | -------------------------------------------------------------------------------- /utils/whdr.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2.7 2 | # 3 | # This is an implementation of the WHDR metric proposed in this paper: 4 | # 5 | # Sean Bell, Kavita Bala, Noah Snavely. "Intrinsic Images in the Wild". ACM 6 | # Transactions on Graphics (SIGGRAPH 2014). http://intrinsic.cs.cornell.edu. 7 | # 8 | # Please cite the above paper if you find this code useful. This code is 9 | # released under the MIT license (http://opensource.org/licenses/MIT). 10 | # 11 | 12 | 13 | import sys 14 | import json 15 | import argparse 16 | import numpy as np 17 | from PIL import Image 18 | 19 | 20 | def compute_whdr(reflectance, judgements, delta=0.10): 21 | """ Return the WHDR score for a reflectance image, evaluated against human 22 | judgements. The return value is in the range 0.0 to 1.0, or None if there 23 | are no judgements for the image. See section 3.5 of our paper for more 24 | details. 25 | 26 | :param reflectance: a numpy array containing the linear RGB 27 | reflectance image. 28 | 29 | :param judgements: a JSON object loaded from the Intrinsic Images in 30 | the Wild dataset. 31 | 32 | :param delta: the threshold where humans switch from saying "about the 33 | same" to "one point is darker." 
34 | """ 35 | 36 | points = judgements['intrinsic_points'] 37 | comparisons = judgements['intrinsic_comparisons'] 38 | id_to_points = {p['id']: p for p in points} 39 | rows, cols = reflectance.shape[0:2] 40 | 41 | error_sum = 0.0 42 | weight_sum = 0.0 43 | 44 | for c in comparisons: 45 | # "darker" is "J_i" in our paper 46 | darker = c['darker'] 47 | if darker not in ('1', '2', 'E'): 48 | continue 49 | 50 | # "darker_score" is "w_i" in our paper 51 | weight = c['darker_score'] 52 | if weight <= 0 or weight is None: 53 | continue 54 | 55 | point1 = id_to_points[c['point1']] 56 | point2 = id_to_points[c['point2']] 57 | if not point1['opaque'] or not point2['opaque']: 58 | continue 59 | 60 | # convert to grayscale and threshold 61 | l1 = max(1e-10, np.mean(reflectance[ 62 | int(point1['y'] * rows), int(point1['x'] * cols), ...])) 63 | l2 = max(1e-10, np.mean(reflectance[ 64 | int(point2['y'] * rows), int(point2['x'] * cols), ...])) 65 | 66 | # convert algorithm value to the same units as human judgements 67 | if l2 / l1 > 1.0 + delta: 68 | alg_darker = '1' 69 | elif l1 / l2 > 1.0 + delta: 70 | alg_darker = '2' 71 | else: 72 | alg_darker = 'E' 73 | 74 | if darker != alg_darker: 75 | error_sum += weight 76 | weight_sum += weight 77 | 78 | if weight_sum: 79 | return error_sum / weight_sum 80 | else: 81 | return None 82 | 83 | 84 | def load_image(filename, is_srgb=True): 85 | """ Load an image that is either linear or sRGB-encoded. """ 86 | 87 | if not filename: 88 | raise ValueError("Empty filename") 89 | image = np.asarray(Image.open(filename)).astype(np.float) / 255.0 90 | if is_srgb: 91 | return srgb_to_rgb(image) 92 | else: 93 | return image 94 | 95 | 96 | def srgb_to_rgb(srgb): 97 | """ Convert an sRGB image to a linear RGB image """ 98 | 99 | ret = np.zeros_like(srgb) 100 | idx0 = srgb <= 0.04045 101 | idx1 = srgb > 0.04045 102 | ret[idx0] = srgb[idx0] / 12.92 103 | ret[idx1] = np.power((srgb[idx1] + 0.055) / 1.055, 2.4) 104 | return ret 105 | 106 | 107 | if __name__ == "__main__": 108 | parser = argparse.ArgumentParser( 109 | description=( 110 | 'Evaluate an intrinsic image decomposition using the WHDR metric presented in:\n' 111 | ' Sean Bell, Kavita Bala, Noah Snavely. "Intrinsic Images in the Wild".\n' 112 | ' ACM Transactions on Graphics (SIGGRAPH 2014).\n' 113 | ' http://intrinsic.cs.cornell.edu.\n' 114 | '\n' 115 | 'The output is in the range 0.0 to 1.0.' 116 | ) 117 | ) 118 | 119 | parser.add_argument( 120 | 'reflectance', metavar='', 121 | help='reflectance image to be evaluated') 122 | 123 | parser.add_argument( 124 | 'judgements', metavar='', 125 | help='human judgements JSON file') 126 | 127 | parser.add_argument( 128 | '-l', '--linear', action='store_true', required=False, 129 | help='assume the reflectance image is linear, otherwise assume sRGB') 130 | 131 | parser.add_argument( 132 | '-d', '--delta', metavar='', type=float, required=False, default=0.10, 133 | help='delta threshold (default 0.10)') 134 | 135 | if len(sys.argv) < 2: 136 | parser.print_help() 137 | sys.exit(1) 138 | 139 | args = parser.parse_args() 140 | reflectance = load_image(filename=args.reflectance, is_srgb=(not args.linear)) 141 | judgements = json.load(open(args.judgements)) 142 | 143 | whdr = compute_whdr(reflectance, judgements, args.delta) 144 | print(whdr) 145 | --------------------------------------------------------------------------------
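For quick reference, here is a minimal sketch (not part of the repository) of driving `utils/whdr.py` directly from Python to score a single reflectance prediction, mirroring what `test.py` does per image in IIW mode; the photo id `102351` and both file paths are hypothetical placeholders.

```python
import json

# repository helpers: load_image linearises an sRGB PNG, compute_whdr implements the WHDR metric
from utils.whdr import compute_whdr, load_image

albedo = load_image("test_iiw/102351/albedo.png", is_srgb=True)  # linear RGB reflectance in [0, 1]
judgements = json.load(open("iiw-dataset/data/102351.json"))  # IIW human judgements for that photo
score = compute_whdr(albedo, judgements, delta=0.10)  # weighted human disagreement rate, lower is better
print(score)
```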