├── LICENSE
├── README.md
├── demo_im.jpg
├── demo_mask.jpg
├── figure
└── teaser.png
├── model
├── SfMNet.py
├── __init__.py
├── consistency_layer.py
├── dataloader.py
├── hdr_illu_pca
│ ├── mean.npy
│ ├── pcaMean.npy
│ ├── pcaVariance.npy
│ └── pcaVector.npy
├── lambSH_layer.py
├── pred_illuDecomp_layer_new.py
├── reproj_layer.py
└── vgg16.py
├── run_test_demo.sh
├── run_test_diode.sh
├── run_test_iiw.sh
├── test.py
├── train.py
└── utils
├── __init__.py
├── diode_metrics.py
├── iiw_test_ids.npy
├── render_sphere_nm.py
└── whdr.py
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Ye Yu
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Outdoor inverse rendering from a single image using multiview self-supervision
2 |
3 | Y. Yu and W. Smith, "Outdoor inverse rendering from a single image using multiview self-supervision" in IEEE Transactions on Pattern Analysis & Machine Intelligence
4 |
5 | ## Abstract
6 |
7 |

8 |
9 | In this paper we show how to perform scene-level inverse rendering to recover shape, reflectance and lighting from a single, uncontrolled image using a fully convolutional neural network. The network takes an RGB image as input, regresses albedo, shadow and normal maps from which we infer least squares optimal spherical harmonic lighting coefficients. Our network is trained using large uncontrolled multiview and timelapse image collections without ground truth. By incorporating a differentiable renderer, our network can learn from self-supervision. Since the problem is ill-posed we introduce additional supervision. Our key insight is to perform offline multiview stereo (MVS) on images containing rich illumination variation. From the MVS pose and depth maps, we can cross project between overlapping views such that Siamese training can be used to ensure consistent estimation of photometric invariants. MVS depth also provides direct coarse supervision for normal map estimation. We believe this is the first attempt to use MVS supervision for learning inverse rendering. In addition, we learn a statistical natural illumination prior. We evaluate performance on inverse rendering, normal map estimation and intrinsic image decomposition benchmarks.
10 |
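For reference, here is a compact (and slightly simplified, e.g. gamma correction is omitted) statement of the image formation model that `model/lambSH_layer.py` and `model/pred_illuDecomp_layer_new.py` implement, assuming a Lambertian surface lit by order-2 spherical harmonics (SH):

```latex
% Per-pixel rendering with albedo a, shadow s, homogeneous normal \tilde{n} = (n_x, n_y, n_z, 1)^T,
% and a 4x4 matrix M(L) built from the 9 SH lighting coefficients L (per colour channel):
i = a \odot s \odot E(n, L), \qquad E(n, L) = \tilde{n}^{T} M(L)\, \tilde{n}
% Given a, s and n, the rendering is linear in L, so the lighting is recovered by least squares:
L^{*} = \arg\min_{L} \sum_{\text{pixels}} \bigl\| i - a \odot s \odot E(n, L) \bigr\|^{2}
```
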
11 | ## Evaluation
12 |
13 | ### Dependencies
14 |
15 | To run our evaluation code, please create your environment with the following dependencies (a sketch of one possible setup follows the list):
16 |
17 | * tensorflow 1.12.0
18 | * python 3.6
19 | * skimage
20 | * cv2
21 | * numpy
22 |
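A minimal sketch of one possible environment setup (assuming conda and pip; only the TensorFlow and Python versions above are pinned by this repo, the remaining package versions are not):

```bash
# create and activate a Python 3.6 environment (conda is an assumption; any virtualenv works)
conda create -n inverserendernet python=3.6
conda activate inverserendernet
# tensorflow 1.12.0 as required above; use tensorflow-gpu==1.12.0 on a CUDA 9 machine
pip install tensorflow==1.12.0 scikit-image opencv-python numpy
```
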
23 | ### Pretrained model
24 |
25 | 1. Download our pretrained model from [here](https://drive.google.com/uc?export=download&id=1hGIoK3Pemtg3eYjFy_CBK-R37D3gA0VC).
26 | 2. Untar the downloaded file to get models: **model_ckpt**, **iiw_model_ckpt** and **diode_model_ckpt**.
27 | 3. Place these three model folders at the root path.
28 |
29 | ```tree
30 | InverseRenderNet_v2
31 | │ README.md
32 | │ test.py
33 | │ ...
34 | |
35 | └─────model_ckpt
36 | │ model.ckpt.meta
37 | │ model.ckpt.index
38 | │ ...
39 | └─────iiw_model_ckpt
40 | │ model.ckpt.meta
41 | │ model.ckpt.index
42 | │ ...
43 | └─────diode_model_ckpt
44 | │ model.ckpt.meta
45 | │ model.ckpt.index
46 | │ ...
47 | ```
48 |
49 | ### Test on demo image
50 |
51 | You can perform inverse rendering on an arbitrary RGB image using our pretrained model. To run the demo code, you need to specify the path to the pretrained model, the path to the RGB image and the path to the corresponding mask, which masks out the sky in the image. The mask can be generated by PSPNet, which you can find on . Finally, the inverse rendering results are saved to the output folder named by your argument.
52 |
53 | Inference on the demonstration image can be run with:
54 |
55 | ```bash
56 | bash run_test_demo.sh
57 | ```
58 |
59 | Besides the provided demo image, you can test your own images by specifying `IMAGE_PATH` and `MASK_PATH` in [run_test_demo.sh](run_test_demo.sh). The default output folder, specified by `RESULTS_DIR`, is **test_results**. A sketch of the variables you would edit is shown below.
60 |
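A hypothetical sketch of the three variables you would edit (the names are those referenced above; the actual contents of `run_test_demo.sh` may define further arguments, e.g. the path to the pretrained model):

```bash
# run_test_demo.sh (excerpt, hypothetical values)
IMAGE_PATH=./demo_im.jpg      # RGB input image
MASK_PATH=./demo_mask.jpg     # sky mask for the input image (e.g. produced by PSPNet)
RESULTS_DIR=./test_results    # output folder for the inverse rendering results
```
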
61 | ### Test on IIW
62 |
63 | * The IIW dataset should first be downloaded from
64 |
65 | * Make sure `IMAGES_DIR` in [run_test_iiw.sh](run_test_iiw.sh) points to the path of the IIW data, then run:
66 |
67 | ```bash
68 | bash run_test_iiw.sh
69 | ```
70 |
71 | Results will be saved to **test_iiw**.
72 |
73 | ### Test on DIODE
74 |
75 | * Download the dataset from
76 |
77 | * Replace `${IMAGES_DIR}` defined in [run_test_diode.sh](run_test_diode.sh) with the path to the DIODE data, then run:
78 |
79 | ```bash
80 | bash run_test_diode.sh
81 | ```
82 |
83 | Results will be saved to **test_diode**.
84 |
85 | ## Citation
86 |
87 | If you use the model or the code in your research, please cite the following paper:
88 |
89 | ```bibtex
90 | @article{yu2021outdoor,
91 | title={Outdoor inverse rendering from a single image using multiview self-supervision},
92 | author={Yu, Ye and Smith, William A. P.},
93 | journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
94 | note={to appear},
95 | year={2021}
96 | }
97 | ```
98 |
--------------------------------------------------------------------------------
/demo_im.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/demo_im.jpg
--------------------------------------------------------------------------------
/demo_mask.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/demo_mask.jpg
--------------------------------------------------------------------------------
/figure/teaser.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/figure/teaser.png
--------------------------------------------------------------------------------
/model/SfMNet.py:
--------------------------------------------------------------------------------
1 | import importlib
2 | import tensorflow as tf
3 | import numpy as np
4 | import tensorflow.contrib.layers as layers
5 | import os
6 | from model import pred_illuDecomp_layer_new as pred_illuDecomp_layer
7 |
8 |
9 | def SfMNet(
10 | inputs,
11 | height,
12 | width,
13 | masks,
14 | n_layers=12,
15 | n_pools=2,
16 | is_training=True,
17 | depth_base=64,
18 | ):
19 | conv_layers = np.int32(n_layers / 2) - 1
20 | deconv_layers = np.int32(n_layers / 2)
21 | # number of layers before performing pooling
22 | nlayers_befPool = np.int32(np.ceil((conv_layers - 1) / n_pools) - 1)
23 |
24 | max_depth = 512
25 |
26 | # dimensional arrangement
27 | # number of layers at the tail where pooling is no longer applied
28 | # also exclude the first layer, which is in charge of expanding the channel dimension
29 | if depth_base * 2 ** n_pools < max_depth:
30 | tail = conv_layers - nlayers_befPool * n_pools
31 | tail_deconv = deconv_layers - nlayers_befPool * n_pools
32 | else:
33 | maxNum_pool = np.log2(max_depth / depth_base)
34 | tail = np.int32(conv_layers - nlayers_befPool * maxNum_pool)
35 | tail_deconv = np.int32(deconv_layers - nlayers_befPool * maxNum_pool)
36 |
37 | f_in_conv = (
38 | [3]
39 | + [
40 | np.int32(depth_base * 2 ** (np.ceil(i / nlayers_befPool) - 1))
41 | for i in range(1, conv_layers - tail + 1)
42 | ]
43 | + [
44 | np.int32(depth_base * 2 ** maxNum_pool)
45 | for i in range(conv_layers - tail + 1, conv_layers + 1)
46 | ]
47 | )
48 | f_out_conv = (
49 | [64]
50 | + [
51 | np.int32(depth_base * 2 ** (np.floor(i / nlayers_befPool)))
52 | for i in range(1, conv_layers - tail + 1)
53 | ]
54 | + [
55 | np.int32(depth_base * 2 ** maxNum_pool)
56 | for i in range(conv_layers - tail + 1, conv_layers + 1)
57 | ]
58 | )
59 |
60 | f_in_deconv = f_out_conv[:0:-1] + [64]
61 | f_out_amDeconv = f_in_conv[:0:-1] + [3]
62 | f_out_MaskDeconv = f_in_conv[:0:-1] + [1]
63 | f_out_nmDeconv = f_in_conv[:0:-1] + [2]
64 |
65 | group_norm_params = {
66 | "groups": 16,
67 | "channels_axis": -1,
68 | "reduction_axes": (-3, -2),
69 | "center": True,
70 | "scale": True,
71 | "epsilon": 1e-4,
72 | "param_initializers": {
73 | "beta_initializer": tf.zeros_initializer(),
74 | "gamma_initializer": tf.ones_initializer(),
75 | "moving_variance_initializer": tf.ones_initializer(),
76 | "moving_average_initializer": tf.zeros_initializer(),
77 | },
78 | }
79 |
80 | # contractive conv_layer block
81 | conv_out = inputs
82 | conv_out_list = []
83 | for i, f_in, f_out in zip(range(1, conv_layers + 2), f_in_conv, f_out_conv):
84 | scope = "inverserendernet/conv" + str(i)
85 |
86 | if (
87 | np.mod(i - 1, nlayers_befPool) == 0
88 | and i <= n_pools * nlayers_befPool + 1
89 | and i != 1
90 | ):
91 | conv_out_list.append(conv_out)
92 | conv_out = conv2d(conv_out, scope, f_in, f_out)
93 | conv_out = tf.nn.max_pool(
94 | conv_out, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME"
95 | )
96 | else:
97 | conv_out = conv2d(conv_out, scope, f_in, f_out)
98 |
99 | # expanding deconv_layer block succeeding conv_layer block
100 | am_deconv_out = conv_out
101 | for i, f_in, f_out in zip(range(1, deconv_layers + 1), f_in_deconv, f_out_amDeconv):
102 | scope = "inverserendernet/am_deconv" + str(i)
103 |
104 | # expand the resolution after every nlayers_befPool deconv layers
105 | if np.mod(i, nlayers_befPool) == 0 and i <= n_pools * nlayers_befPool:
106 | # attach previous convolutional output to upsampling/deconvolutional output
107 | tmp = conv_out_list[-np.int32(i / nlayers_befPool)]
108 | output_shape = tmp.shape[1:3]
109 | am_deconv_out = tf.image.resize_images(am_deconv_out, output_shape)
110 | am_deconv_out = conv2d(am_deconv_out, scope, f_in, f_out)
111 | am_deconv_out = tf.concat([am_deconv_out, tmp], axis=-1)
112 | elif i == deconv_layers:
113 | # no normalisation or activation here; the output non-linearity is applied at the end
114 | am_deconv_out = layers.conv2d(
115 | am_deconv_out,
116 | num_outputs=f_out,
117 | kernel_size=[3, 3],
118 | stride=[1, 1],
119 | padding="SAME",
120 | normalizer_fn=None,
121 | activation_fn=None,
122 | weights_initializer=tf.random_normal_initializer(
123 | mean=0, stddev=np.sqrt(2 / 9 / f_in)
124 | ),
125 | weights_regularizer=layers.l2_regularizer(scale=1e-5),
126 | scope=scope,
127 | )
128 | else:
129 | # layers that do not expand the spatial resolution
130 | am_deconv_out = conv2d(am_deconv_out, scope, f_in, f_out)
131 |
132 | # deconvolution net for nm estimates
133 | nm_deconv_out = conv_out
134 | for i, f_in, f_out in zip(range(1, deconv_layers + 1), f_in_deconv, f_out_nmDeconv):
135 | scope = "inverserendernet/nm" + str(i)
136 |
137 | # expand the resolution after every nlayers_befPool deconv layers
138 | if np.mod(i, nlayers_befPool) == 0 and i <= n_pools * nlayers_befPool:
139 | # attach previous convolutional output to upsampling/deconvolutional output
140 | tmp = conv_out_list[-np.int32(i / nlayers_befPool)]
141 | output_shape = tmp.shape[1:3]
142 | nm_deconv_out = tf.image.resize_images(nm_deconv_out, output_shape)
143 | nm_deconv_out = conv2d(nm_deconv_out, scope, f_in, f_out)
144 | nm_deconv_out = tf.concat([nm_deconv_out, tmp], axis=-1)
145 | elif i == deconv_layers:
146 | # no normalisation or activation here; the output non-linearity is applied at the end
147 | nm_deconv_out = layers.conv2d(
148 | nm_deconv_out,
149 | num_outputs=f_out,
150 | kernel_size=[3, 3],
151 | stride=[1, 1],
152 | padding="SAME",
153 | normalizer_fn=None,
154 | activation_fn=None,
155 | weights_initializer=tf.random_normal_initializer(
156 | mean=0, stddev=np.sqrt(2 / 9 / f_in)
157 | ),
158 | weights_regularizer=layers.l2_regularizer(scale=1e-5),
159 | biases_initializer=None,
160 | scope=scope,
161 | )
162 | else:
163 | # layers that do not expand the spatial resolution
164 | nm_deconv_out = conv2d(nm_deconv_out, scope, f_in, f_out)
165 |
166 | # deconv branch for predicting masks
167 | mask_deconv_out = conv_out
168 | for i, f_in, f_out in zip(
169 | range(1, deconv_layers + 1), f_in_deconv, f_out_MaskDeconv
170 | ):
171 | scope = "inverserendernet/mask_deconv" + str(i)
172 |
173 | # expand the resolution after every nlayers_befPool deconv layers
174 | if np.mod(i, nlayers_befPool) == 0 and i <= n_pools * nlayers_befPool:
175 | # with tf.variable_scope(scope):
176 | # attach previous convolutional output to upsampling/deconvolutional output
177 | tmp = conv_out_list[-np.int32(i / nlayers_befPool)]
178 | output_shape = tmp.shape[1:3]
179 | mask_deconv_out = tf.image.resize_images(mask_deconv_out, output_shape)
180 | mask_deconv_out = conv2d(mask_deconv_out, scope, f_in, f_out)
181 | mask_deconv_out = tf.concat([mask_deconv_out, tmp], axis=-1)
182 | elif i == deconv_layers:
183 | # no normalisation or activation here; the output non-linearity is applied at the end
184 | mask_deconv_out = layers.conv2d(
185 | mask_deconv_out,
186 | num_outputs=f_out,
187 | kernel_size=[3, 3],
188 | stride=[1, 1],
189 | padding="SAME",
190 | normalizer_fn=None,
191 | activation_fn=None,
192 | weights_initializer=tf.random_normal_initializer(
193 | mean=0, stddev=np.sqrt(2 / 9 / f_in)
194 | ),
195 | weights_regularizer=layers.l2_regularizer(scale=1e-5),
196 | scope=scope,
197 | )
198 | else:
199 | # layers that do not expand the spatial resolution
200 | mask_deconv_out = conv2d(mask_deconv_out, scope, f_in, f_out)
201 |
202 | albedos = am_deconv_out[:, :, :, :3]
203 | nm_pred = nm_deconv_out
204 |
205 | albedos = tf.clip_by_value(tf.nn.tanh(albedos) * masks, -0.9999, 0.9999)
206 |
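# The network predicts only the (x, y) components of the normal; the block below
# recovers a unit normal as (x, y, 1) / sqrt(x^2 + y^2 + 1), which by construction
# has a positive z component, i.e. normals always face the camera.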
207 | nm_pred_norm = tf.sqrt(
208 | tf.reduce_sum(nm_pred ** 2, axis=-1, keepdims=True) + tf.constant(1.0)
209 | )
210 | nm_pred_xy = nm_pred / nm_pred_norm
211 | nm_pred_z = tf.constant(1.0) / nm_pred_norm
212 | nm_pred_xyz = tf.concat([nm_pred_xy, nm_pred_z], axis=-1) * masks
213 |
214 | shadow = mask_deconv_out[:, :, :, :1]
215 | shadow = tf.clip_by_value(tf.nn.tanh(shadow) * masks, -0.9999, 0.9999)
216 |
217 | return albedos, shadow, nm_pred_xyz
218 |
219 |
220 | def get_bilinear_filter(filter_shape, upscale_factor):
221 | ##filter_shape is [width, height, num_in_channels, num_out_channels]
222 | kernel_size = filter_shape[1]
223 | # Centre location of the filter for which value is calculated
224 | if kernel_size % 2 == 1:
225 | centre_location = upscale_factor - 1
226 | else:
227 | centre_location = upscale_factor - 0.5
228 |
229 | x, y = np.meshgrid(np.arange(kernel_size), np.arange(kernel_size))
230 | bilinear = (1 - abs((x - centre_location) / upscale_factor)) * (
231 | 1 - abs((y - centre_location) / upscale_factor)
232 | )
233 | weights = np.tile(
234 | bilinear[:, :, None, None], (1, 1, filter_shape[2], filter_shape[3])
235 | )
236 |
237 | return tf.constant_initializer(weights)
238 |
239 |
240 | def group_norm(inputs, scope="group_norm"):
241 | input_shape = tf.shape(inputs)
242 | _, H, W, C = inputs.get_shape().as_list()
243 | group = 32
244 | with tf.variable_scope(scope):
245 | gamma = tf.get_variable(
246 | "scale",
247 | shape=[C],
248 | dtype=tf.float32,
249 | initializer=tf.ones_initializer(),
250 | trainable=True,
251 | regularizer=layers.l2_regularizer(scale=1e-5),
252 | )
253 |
254 | beta = tf.get_variable(
255 | "bias",
256 | shape=[C],
257 | dtype=tf.float32,
258 | initializer=tf.zeros_initializer(),
259 | trainable=True,
260 | regularizer=layers.l2_regularizer(scale=1e-5),
261 | )
262 |
263 | inputs = tf.reshape(inputs, [-1, H, W, group, C // group], name="unpack")
264 | mean, var = tf.nn.moments(inputs, [1, 2, 4], keep_dims=True)
265 | inputs = (inputs - mean) / tf.sqrt(var + 1e-5)
266 | inputs = tf.reshape(inputs, input_shape, name="pack")
267 | gamma = tf.reshape(gamma, [1, 1, 1, C], name="reshape_gamma")
268 | beta = tf.reshape(beta, [1, 1, 1, C], name="reshape_beta")
269 | return inputs * gamma + beta
270 |
271 |
272 | def conv2d(inputs, scope, f_in, f_out):
273 | conv_out = layers.conv2d(
274 | inputs,
275 | num_outputs=f_out,
276 | kernel_size=[3, 3],
277 | stride=[1, 1],
278 | padding="SAME",
279 | normalizer_fn=None,
280 | activation_fn=None,
281 | weights_initializer=tf.random_normal_initializer(
282 | mean=0, stddev=np.sqrt(2 / 9 / f_in)
283 | ),
284 | weights_regularizer=layers.l2_regularizer(scale=1e-5),
285 | biases_initializer=None,
286 | scope=scope,
287 | )
288 |
289 | with tf.variable_scope(scope):
290 | gn_out = group_norm(conv_out)
291 |
292 | relu_out = tf.nn.relu(gn_out)
293 |
294 | return relu_out
295 |
296 |
297 | def comp_light(inputs, albedos, normals, shadows, gamma, masks):
298 | inputs = rescale_2_zero_one(inputs)
299 | albedos = rescale_2_zero_one(albedos)
300 | shadows = rescale_2_zero_one(shadows)
301 |
302 | lighting_model = "./model/hdr_illu_pca"
303 | lighting_vectors = tf.constant(
304 | np.load(os.path.join(lighting_model, "pcaVector.npy")), dtype=tf.float32
305 | )
306 | lighting_means = tf.constant(
307 | np.load(os.path.join(lighting_model, "mean.npy")), dtype=tf.float32
308 | )
309 | lightings_var = tf.constant(
310 | np.load(os.path.join(lighting_model, "pcaVariance.npy")), dtype=tf.float32
311 | )
312 |
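# illuDecomp recovers a 27-D lighting vector per image (9 SH coefficients x RGB)
# from the input, albedo, normal and shadow estimates (the least-squares optimal
# lighting described in the paper). Projecting it onto the learned PCA basis and
# reconstructing it below acts as the statistical natural-illumination prior.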
313 | lightings = pred_illuDecomp_layer.illuDecomp(
314 | inputs, albedos, normals, shadows, gamma, masks
315 | )
316 | lightings_pca = tf.matmul((lightings - lighting_means), pinv(lighting_vectors))
317 |
318 | # recompute lightings from lightings_pca, which adds a weak constraint on the lighting reconstruction
319 | lightings = tf.matmul(lightings_pca, lighting_vectors) + lighting_means
320 | # reshape 27-D lightings to 9*3 lightings
321 | lightings = tf.reshape(lightings, [tf.shape(lightings)[0], 9, 3])
322 |
323 | # lighting prior loss
324 | var = tf.reduce_mean(lightings_pca ** 2, axis=0)
325 |
326 | illu_prior_loss = tf.losses.absolute_difference(var, lightings_var)
327 | illu_prior_loss = tf.constant(0.0)
328 |
329 | return lightings, illu_prior_loss
330 |
331 |
332 | def pinv(A, reltol=1e-6):
333 | # compute SVD of input A
334 | s, u, v = tf.svd(A)
335 |
336 | # invert s and clear entries lower than reltol*s_max
337 | atol = tf.reduce_max(s) * reltol
338 | # s = tf.boolean_mask(s, s>atol)
339 | s = tf.where(s > atol, s, atol * tf.ones_like(s))
340 | s_inv = tf.diag(1.0 / s)
341 | # s_inv = tf.diag(tf.concat([1./s, tf.zeros([tf.size(b) - tf.size(s)])], axis=0))
342 |
343 | # compute v * s_inv * u_t as the pseudo-inverse
344 | return tf.matmul(v, tf.matmul(s_inv, tf.transpose(u)))
345 |
346 |
347 | def rescale_2_zero_one(imgs):
348 | return imgs / 2.0 + 0.5
349 |
350 |
351 | def rescale_2_minusOne_one(imgs):
352 | return imgs * 2.0 - 1.0
353 |
--------------------------------------------------------------------------------
/model/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/model/__init__.py
--------------------------------------------------------------------------------
/model/consistency_layer.py:
--------------------------------------------------------------------------------
1 | # formulate loss function based on supplied ground truth and outputs from network
2 |
3 | import importlib
4 | import tensorflow as tf
5 | import numpy as np
6 | import os
7 | from model import (
8 | reproj_layer,
9 | lambSH_layer,
10 | )
11 | from model import vgg16 as VGG
12 |
13 |
14 | def loss_formulate(
15 | albedos,
16 | shadow,
17 | nm_pred,
18 | lightings,
19 | nm_gt,
20 | inputs,
21 | dms,
22 | cams,
23 | scale_xs,
24 | scale_ys,
25 | masks,
26 | reproj_inputs,
27 | reprojInput_mask,
28 | pair_label,
29 | sup_flag,
30 | illu_prior_loss,
31 | reg_loss_flag=True,
32 | ):
33 |
34 | # define perceptual loss by vgg16
35 | vgg_path = "../vgg16.npy"
36 | vgg16 = VGG.Vgg16(vgg_path)
37 |
38 | # pre-process inputs based on gamma
39 | gamma = tf.constant(1.0) # gamma is 4d constant
40 |
41 | albedos = rescale_2_zero_one(albedos)
42 | shadow = rescale_2_zero_one(shadow)
43 | inputs = rescale_2_zero_one(inputs)
44 | reproj_inputs = rescale_2_zero_one(reproj_inputs)
45 |
46 | sdFree_inputs = tf.pow(tf.nn.relu(tf.pow(inputs, gamma) / shadow) + 1e-4, 1 / gamma)
47 |
48 | # select the normal map used in rendering - GT or predicted
49 | normals = tf.where(sup_flag, nm_gt, nm_pred)
50 |
51 | ###### cProj rendering loss
52 | # repeat elements of the original batch as many times as there are paired images inside the sub-batch, so that the albedo reprojection can be computed for all pairs in parallel
53 | reproj_tb = tf.to_float(tf.equal(pair_label, tf.transpose(pair_label)))
54 | reproj_tb = tf.cast(
55 | tf.matrix_set_diag(reproj_tb, tf.zeros([tf.shape(inputs)[0]])), tf.bool
56 | )
57 | reproj_list = tf.where(reproj_tb)
58 | img1_inds = tf.expand_dims(reproj_list[:, 0], axis=-1)
59 | img2_inds = tf.expand_dims(reproj_list[:, 1], axis=-1)
60 | albedo1 = tf.gather_nd(albedos, img1_inds)
61 | dms1 = tf.gather_nd(dms, img1_inds)
62 | cams1 = tf.gather_nd(cams, img1_inds)
63 | albedo2 = tf.gather_nd(albedos, img2_inds)
64 | cams2 = tf.gather_nd(cams, img2_inds)
65 | scale_xs1 = tf.gather_nd(scale_xs, img1_inds)
66 | scale_xs2 = tf.gather_nd(scale_xs, img2_inds)
67 | scale_ys1 = tf.gather_nd(scale_ys, img1_inds)
68 | scale_ys2 = tf.gather_nd(scale_ys, img2_inds)
69 |
70 | lightings2 = tf.gather_nd(lightings, img2_inds)
71 | normals1 = tf.gather_nd(normals, img1_inds)
72 | shadow2 = tf.gather_nd(shadow, img2_inds)
73 |
74 | # rotate lighting predictions
75 | cams_rot = tf.matmul(
76 | tf.reshape(cams1[:, 4:13], (-1, 3, 3)),
77 | tf.transpose(tf.reshape(cams2[:, 4:13], (-1, 3, 3)), (0, 2, 1)),
78 | )
79 |
80 | thetaX, thetaY, thetaZ = rotm2eul(cams_rot)
81 | rot = Rotation(thetaX, thetaY, thetaZ)
82 |
83 | # rotate SHL from source cam_coord to target cam_coord
84 | reproj_lightings = tf.reduce_sum(
85 | rot[:, :, :, tf.newaxis] * lightings2[:, tf.newaxis, :, :], axis=-2
86 | )
87 |
88 | # scale albedo map based on max and min values such that albedo values are in range (0,1)
89 | reproj_shadow1, reproj_mask = reproj_layer.map_reproj(
90 | dms1, shadow2, cams1, cams2, scale_xs1, scale_xs2, scale_ys1, scale_ys2
91 | )
92 | reproj_shadow1 = tf.clip_by_value(reproj_shadow1, 1e-4, 0.9999)
93 |
94 | gamma2 = 1.0
95 | reproj_sdFree_inputs = tf.pow(
96 | tf.clip_by_value(tf.pow(reproj_inputs, gamma2) / reproj_shadow1, 1e-4, 0.9999),
97 | 1 / gamma2,
98 | )
99 |
100 | sdFree_shadings, renderings_mask = lambSH_layer.lambSH_layer(
101 | tf.ones_like(albedos), normals, lightings, tf.ones_like(shadow), 1.0
102 | )
103 | reproj_sdFree_shadings, _ = lambSH_layer.lambSH_layer(
104 | tf.ones_like(albedo1),
105 | normals1,
106 | reproj_lightings,
107 | tf.ones_like(reproj_shadow1),
108 | 1.0,
109 | )
110 |
111 | reproj_albedo1, reproj_mask = reproj_layer.map_reproj(
112 | dms1, albedo2, cams1, cams2, scale_xs1, scale_xs2, scale_ys1, scale_ys2
113 | )
114 |
115 | reproj_albedo1 = reproj_albedo1 + tf.constant(1e-4) # small constant for numerical stability
116 |
117 | ### scale intensities for each image
118 | albedo1_pixels = tf.boolean_mask(albedo1, reproj_mask)
119 | reproj_albedo1_pixels = tf.boolean_mask(reproj_albedo1, reproj_mask)
120 | reproj_err = 0.5 * tf.losses.mean_squared_error(
121 | cvtLab(albedo1_pixels), cvtLab(reproj_albedo1_pixels)
122 | )
123 | reproj_err += 2.5 * perceptualLoss_formulate(
124 | vgg16,
125 | albedo1,
126 | reproj_albedo1,
127 | tf.to_float(reproj_mask[:, :, :, tf.newaxis]),
128 | )
129 |
130 | sdFree_recons = tf.pow(tf.nn.relu(albedos * sdFree_shadings), 1 / gamma)
131 | sdFree_inputs_pixels = cvtLab(tf.boolean_mask(sdFree_inputs, renderings_mask))
132 | sdFree_recons_pixels = cvtLab(tf.boolean_mask(sdFree_recons, renderings_mask))
133 | render_err = 0.5 * tf.losses.mean_squared_error(
134 | sdFree_inputs_pixels, sdFree_recons_pixels
135 | )
136 | render_err += 2.5 * perceptualLoss_formulate(
137 | vgg16,
138 | sdFree_inputs,
139 | sdFree_recons,
140 | tf.to_float(renderings_mask[:, :, :, tf.newaxis]),
141 | )
142 |
143 | ## scale intensities for each image
144 | reproj_sdFree_renderings = tf.pow(
145 | tf.nn.relu(reproj_sdFree_shadings * albedo1), 1 / gamma2
146 | )
147 |
148 | reprojInput_mask = tf.cast(reprojInput_mask[:, :, :, 0], tf.bool)
149 | sdFree_inputs_pixels = cvtLab(
150 | tf.boolean_mask(reproj_sdFree_inputs, reprojInput_mask)
151 | )
152 | reproj_sdFree_renderings_pixels = cvtLab(
153 | tf.boolean_mask(reproj_sdFree_renderings, reprojInput_mask)
154 | )
155 | cross_render_err = 0.5 * tf.losses.mean_squared_error(
156 | sdFree_inputs_pixels, reproj_sdFree_renderings_pixels
157 | )
158 | cross_render_err += 2.5 * perceptualLoss_formulate(
159 | vgg16,
160 | reproj_sdFree_inputs,
161 | reproj_sdFree_renderings,
162 | tf.to_float(reprojInput_mask[:, :, :, tf.newaxis]),
163 | )
164 |
165 | ### regularisation loss
166 | reg_loss = sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
167 |
168 | ### scale-invariant loss
169 |
170 | ### compute nm_pred error
171 | nmSup_mask = tf.not_equal(tf.reduce_sum(nm_gt, axis=-1), 0)
172 | nm_gt_pixel = tf.boolean_mask(nm_gt, nmSup_mask)
173 | nm_pred_pixel = tf.boolean_mask(nm_pred, nmSup_mask)
174 | nm_prod = tf.reduce_sum(nm_pred_pixel * nm_gt_pixel, axis=-1, keepdims=True)
175 | nm_cosValue = tf.constant(0.9999)
176 | nm_prod = tf.clip_by_value(nm_prod, -nm_cosValue, nm_cosValue)
177 | nm_angle = tf.acos(nm_prod) + tf.constant(1e-4)
178 | nm_loss = tf.reduce_mean(nm_angle ** 2)
179 |
180 | ### compute gradient loss
181 | Gx = tf.constant(1 / 2) * tf.expand_dims(
182 | tf.expand_dims(tf.constant([[-1, 1]], dtype=tf.float32), axis=-1), axis=-1
183 | )
184 | Gy = tf.constant(1 / 2) * tf.expand_dims(
185 | tf.expand_dims(tf.constant([[-1], [1]], dtype=tf.float32), axis=-1), axis=-1
186 | )
187 | nm_pred_Gx = conv2d_nosum(nm_pred, Gx)
188 | nm_pred_Gy = conv2d_nosum(nm_pred, Gy)
189 | nm_pred_Gxy = tf.concat([nm_pred_Gx, nm_pred_Gy], axis=-1)
190 | normals_Gx = conv2d_nosum(nm_gt, Gx)
191 | normals_Gy = conv2d_nosum(nm_gt, Gy)
192 | normals_Gxy = tf.concat([normals_Gx, normals_Gy], axis=-1)
193 | nm_pred_smt_error = tf.losses.mean_squared_error(nm_pred_Gxy, normals_Gxy)
194 |
195 | ### total loss
196 | render_err *= tf.constant(0.1)
197 | reproj_err *= tf.constant(0.1)
198 | cross_render_err *= tf.constant(0.1)
199 | illu_prior_loss *= tf.constant(0.005)
200 | nm_pred_smt_error *= tf.constant(1.0)
201 | nm_loss *= tf.constant(1.0)
202 |
203 | if reg_loss_flag == True:
204 | loss = (
205 | render_err
206 | + reproj_err
207 | + cross_render_err
208 | + reg_loss
209 | + illu_prior_loss
210 | + nm_pred_smt_error
211 | + nm_loss
212 | )
213 | else:
214 | loss = (
215 | render_err
216 | + reproj_err
217 | + cross_render_err
218 | + illu_prior_loss
219 | + nm_pred_smt_error
220 | + nm_loss
221 | )
222 |
223 | return (
224 | loss,
225 | render_err,
226 | reproj_err,
227 | cross_render_err,
228 | reg_loss,
229 | illu_prior_loss,
230 | nm_pred_smt_error,
231 | nm_loss,
232 | sdFree_inputs,
233 | sdFree_shadings,
234 | sdFree_recons,
235 | )
236 |
237 |
238 | def perceptualLoss_formulate(vgg16, renderings, inputs, masks, w_act=0.1):
239 | vgg_layers = ["conv1_2"] # conv1 through conv5
240 | vgg_layer_weights = [1.0]
241 |
242 | renderings_acts = vgg16.get_vgg_activations(renderings, vgg_layers)
243 | refs_acts = vgg16.get_vgg_activations(inputs, vgg_layers)
244 |
245 | loss = 0
246 | masks_shape = [(200, 200), (100, 100)]
247 |
248 | tmp_reproj_mask = masks
249 | for mask_shape, w, act1, act2 in zip(
250 | masks_shape, vgg_layer_weights, renderings_acts, refs_acts
251 | ):
252 | act1 *= tmp_reproj_mask
253 | act2 *= tmp_reproj_mask
254 | tmp_reproj_mask_weights = tf.tile(
255 | tf.clip_by_value(tmp_reproj_mask, 1e-4, 0.9999), (1, 1, 1, act1.shape[-1])
256 | )
257 | loss += (
258 | w
259 | * tf.reduce_sum(tmp_reproj_mask_weights * tf.square(w_act * (act1 - act2)))
260 | / tf.reduce_sum(tmp_reproj_mask_weights)
261 | )
262 |
263 | tmp_reproj_mask = 1.0 - tf.nn.max_pool(
264 | 1.0 - tmp_reproj_mask,
265 | ksize=[1, 2, 2, 1],
266 | strides=[1, 2, 2, 1],
267 | padding="SAME",
268 | )
269 |
270 | loss *= 0.0005
271 |
272 | return loss
273 |
274 |
275 | # input RGB is 2d tensor with shape (n_pix, 3)
276 | def cvtLab(RGB):
277 |
278 | # threshold definition
279 | T = tf.constant(0.008856)
280 |
281 | # matrix for converting RGB to XYZ color space
282 | cvt_XYZ = tf.constant(
283 | [
284 | [0.412453, 0.35758, 0.180423],
285 | [0.212671, 0.71516, 0.072169],
286 | [0.019334, 0.119193, 0.950227],
287 | ]
288 | )
289 |
290 | # convert RGB to XYZ
291 | XYZ = tf.matmul(RGB, tf.transpose(cvt_XYZ))
292 |
293 | # normalise for D65 white point
294 | XYZ /= tf.constant([[0.950456, 1.0, 1.088754]]) * 100
295 |
296 | mask = tf.to_float(tf.greater(XYZ, T))
297 |
298 | fXYZ = XYZ ** (1 / 3) * mask + (1.0 - mask) * (
299 | tf.constant(7.787) * XYZ + tf.constant(0.137931)
300 | )
301 |
302 | M_cvtLab = tf.constant(
303 | [[0.0, 116.0, 0.0], [500.0, -500.0, 0.0], [0.0, 200.0, -200.0]]
304 | )
305 |
306 | Lab = tf.matmul(fXYZ, tf.transpose(M_cvtLab)) + tf.constant([[-16.0, 0.0, 0.0]])
307 | mask = tf.to_float(tf.equal(Lab, tf.constant(0.0)))
308 |
309 | Lab += mask * tf.constant(1e-4)
310 |
311 | return Lab
312 |
313 |
314 | # compute regular 2d convolution on 3d data
315 | def conv2d_nosum(input, kernel):
316 | input_x = input[:, :, :, 0:1]
317 | input_y = input[:, :, :, 1:2]
318 | input_z = input[:, :, :, 2:3]
319 |
320 | output_x = tf.nn.conv2d(input_x, kernel, strides=(1, 1, 1, 1), padding="SAME")
321 | output_y = tf.nn.conv2d(input_y, kernel, strides=(1, 1, 1, 1), padding="SAME")
322 | output_z = tf.nn.conv2d(input_z, kernel, strides=(1, 1, 1, 1), padding="SAME")
323 |
324 | return tf.concat([output_x, output_y, output_z], axis=-1)
325 |
326 |
327 | def rescale_2_zero_one(imgs):
328 | return imgs / 2.0 + 0.5
329 |
330 |
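# Rotation() builds a 9x9 rotation matrix acting on order-2 SH coefficient vectors.
# Rotations about z have a simple analytic form (rot_z); rotations about y and x are
# obtained by conjugating a z-rotation with fixed +/-90 degree rotations (Rot_X_p90 /
# Rot_X_n90 and the +/-90 degree y-rotations derived from them).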
331 | def Rotation(thetaX, thetaY, thetaZ):
332 | num_rots = tf.shape(thetaX)[0]
333 |
334 | # rows_x = [0, 1, 2, 3, 4, 5, 6, 6, 7, 8, 8]
335 | # cols_x = [0, 2, 1, 3, 7, 5, 6, 8, 4, 6, 8]
336 | idx_x = [
337 | [0, 0],
338 | [1, 2],
339 | [2, 1],
340 | [3, 3],
341 | [4, 7],
342 | [5, 5],
343 | [6, 6],
344 | [6, 8],
345 | [7, 4],
346 | [8, 6],
347 | [8, 8],
348 | ]
349 |
350 | data_x_p90 = [
351 | 1,
352 | -1,
353 | 1,
354 | 1,
355 | -1,
356 | -1,
357 | -1 / 2,
358 | -np.sqrt(3) / 2,
359 | 1,
360 | -np.sqrt(3) / 2,
361 | 1 / 2,
362 | ]
363 | data_x_n90 = [
364 | 1,
365 | 1,
366 | -1,
367 | 1,
368 | 1,
369 | -1,
370 | -1 / 2,
371 | -np.sqrt(3) / 2,
372 | -1,
373 | -np.sqrt(3) / 2,
374 | 1 / 2,
375 | ]
376 |
377 | Rot_X_p90 = tf.sparse_to_dense(
378 | sparse_indices=idx_x, sparse_values=data_x_p90, output_shape=(9, 9)
379 | )
380 | Rot_X_n90 = tf.sparse_to_dense(
381 | sparse_indices=idx_x, sparse_values=data_x_n90, output_shape=(9, 9)
382 | )
383 |
384 | Rot_X_p90 = tf.tile(Rot_X_p90[tf.newaxis], (num_rots, 1, 1))
385 | Rot_X_n90 = tf.tile(Rot_X_n90[tf.newaxis], (num_rots, 1, 1))
386 |
387 | Rot_z = rot_z(thetaZ, num_rots)
388 |
389 | Rot_y = rot_y(thetaY, Rot_X_p90, Rot_X_n90, num_rots)
390 |
391 | Rot_x = rot_x(thetaX, Rot_X_p90, Rot_X_n90, num_rots)
392 | # return Rot_x
393 |
394 | Rot = tf.matmul(Rot_z, tf.matmul(Rot_y, Rot_x))
395 |
396 | return Rot
397 |
398 |
399 | def rot_z(thetaZ, num_rots):
400 | # rows_z = [0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8]
401 | # cols_z = [0, 1, 3, 2, 1, 3, 4, 8, 5, 7, 6, 5, 7, 4, 8]
402 | idx_z = tf.constant(
403 | [
404 | [0, 0],
405 | [1, 1],
406 | [1, 3],
407 | [2, 2],
408 | [3, 1],
409 | [3, 3],
410 | [4, 4],
411 | [4, 8],
412 | [5, 5],
413 | [5, 7],
414 | [6, 6],
415 | [7, 5],
416 | [7, 7],
417 | [8, 4],
418 | [8, 8],
419 | ]
420 | )
421 | idx_id = tf.reshape(
422 | tf.tile(tf.range(num_rots)[:, tf.newaxis], (1, tf.shape(idx_z)[0])), (-1, 1)
423 | )
424 | idx_z = tf.tile(idx_z, (num_rots, 1))
425 | idx_z = tf.concat([idx_id, idx_z], axis=-1)
426 |
427 | data_Z = tf.stack(
428 | [
429 | tf.ones_like(thetaZ),
430 | tf.cos(thetaZ),
431 | tf.sin(thetaZ),
432 | tf.ones_like(thetaZ),
433 | -tf.sin(thetaZ),
434 | tf.cos(thetaZ),
435 | tf.cos(2 * thetaZ),
436 | tf.sin(2 * thetaZ),
437 | tf.cos(thetaZ),
438 | tf.sin(thetaZ),
439 | tf.ones_like(thetaZ),
440 | -tf.sin(thetaZ),
441 | tf.cos(thetaZ),
442 | -tf.sin(2 * thetaZ),
443 | tf.cos(2 * thetaZ),
444 | ],
445 | axis=-1,
446 | )
447 | data_Z = tf.reshape(data_Z, (-1,))
448 |
449 | return tf.sparse_to_dense(
450 | sparse_indices=idx_z, sparse_values=data_Z, output_shape=(num_rots, 9, 9)
451 | )
452 |
453 |
454 | def rot_y(thetaY, Rot_X_p90, Rot_X_n90, num_rots):
455 | # rows_z = [0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8]
456 | # cols_z = [0, 1, 3, 2, 1, 3, 4, 8, 5, 7, 6, 5, 7, 4, 8]
457 | idx_z = tf.constant(
458 | [
459 | [0, 0],
460 | [1, 1],
461 | [1, 3],
462 | [2, 2],
463 | [3, 1],
464 | [3, 3],
465 | [4, 4],
466 | [4, 8],
467 | [5, 5],
468 | [5, 7],
469 | [6, 6],
470 | [7, 5],
471 | [7, 7],
472 | [8, 4],
473 | [8, 8],
474 | ]
475 | )
476 | idx_id = tf.reshape(
477 | tf.tile(tf.range(num_rots)[:, tf.newaxis], (1, tf.shape(idx_z)[0])), (-1, 1)
478 | )
479 | idx_z = tf.tile(idx_z, (num_rots, 1))
480 | idx_z = tf.concat([idx_id, idx_z], axis=-1)
481 |
482 | data_Y = tf.stack(
483 | [
484 | tf.ones_like(thetaY),
485 | tf.cos(thetaY),
486 | tf.sin(thetaY),
487 | tf.ones_like(thetaY),
488 | -tf.sin(thetaY),
489 | tf.cos(thetaY),
490 | tf.cos(2 * thetaY),
491 | tf.sin(2 * thetaY),
492 | tf.cos(thetaY),
493 | tf.sin(thetaY),
494 | tf.ones_like(thetaY),
495 | -tf.sin(thetaY),
496 | tf.cos(thetaY),
497 | -tf.sin(2 * thetaY),
498 | tf.cos(2 * thetaY),
499 | ],
500 | axis=-1,
501 | )
502 | data_Y = tf.reshape(data_Y, (-1,))
503 |
504 | Rot_y = tf.sparse_to_dense(
505 | sparse_indices=idx_z, sparse_values=data_Y, output_shape=(num_rots, 9, 9)
506 | )
507 |
508 | return tf.matmul(Rot_X_n90, tf.matmul(Rot_y, Rot_X_p90))
509 |
510 |
511 | def rot_x(thetaX, Rot_X_p90, Rot_X_n90, num_rots):
512 | # rows_z = [0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8]
513 | # cols_z = [0, 1, 3, 2, 1, 3, 4, 8, 5, 7, 6, 5, 7, 4, 8]
514 | idx_z = tf.constant(
515 | [
516 | [0, 0],
517 | [1, 1],
518 | [1, 3],
519 | [2, 2],
520 | [3, 1],
521 | [3, 3],
522 | [4, 4],
523 | [4, 8],
524 | [5, 5],
525 | [5, 7],
526 | [6, 6],
527 | [7, 5],
528 | [7, 7],
529 | [8, 4],
530 | [8, 8],
531 | ]
532 | )
533 | idx_id = tf.reshape(
534 | tf.tile(tf.range(num_rots)[:, tf.newaxis], (1, tf.shape(idx_z)[0])), (-1, 1)
535 | )
536 | idx_z = tf.tile(idx_z, (num_rots, 1))
537 | idx_z = tf.concat([idx_id, idx_z], axis=-1)
538 |
539 | data_X = tf.stack(
540 | [
541 | tf.ones_like(thetaX),
542 | tf.cos(thetaX),
543 | tf.sin(thetaX),
544 | tf.ones_like(thetaX),
545 | -tf.sin(thetaX),
546 | tf.cos(thetaX),
547 | tf.cos(2 * thetaX),
548 | tf.sin(2 * thetaX),
549 | tf.cos(thetaX),
550 | tf.sin(thetaX),
551 | tf.ones_like(thetaX),
552 | -tf.sin(thetaX),
553 | tf.cos(thetaX),
554 | -tf.sin(2 * thetaX),
555 | tf.cos(2 * thetaX),
556 | ],
557 | axis=-1,
558 | )
559 | data_X = tf.reshape(data_X, (-1,))
560 |
561 | Rot_x = tf.sparse_to_dense(
562 | sparse_indices=idx_z, sparse_values=data_X, output_shape=(num_rots, 9, 9)
563 | )
564 |
565 | half_pi = tf.tile(tf.constant([np.pi / 2]), (num_rots,))
566 | Rot_Y_n90 = rot_y(-half_pi, Rot_X_p90, Rot_X_n90, num_rots)
567 | Rot_Y_p90 = rot_y(half_pi, Rot_X_p90, Rot_X_n90, num_rots)
568 |
569 | return tf.matmul(Rot_Y_p90, tf.matmul(Rot_x, Rot_Y_n90))
570 |
571 |
572 | def rotm2eul(rotm):
573 | sy = tf.sqrt(rotm[:, 0, 0] ** 2 + rotm[:, 1, 0] ** 2)
574 | singular = sy < 1e-6
575 |
576 | thetaX = tf.where(
577 | singular,
578 | tf.atan2(-rotm[:, 1, 2], rotm[:, 1, 1]),
579 | tf.atan2(rotm[:, 2, 1], rotm[:, 2, 2]),
580 | )
581 |
582 | # thetaY = tf.where(singular, tf.atan2(-rotm[:,2,0], sy), tf.atan2(rotm[:,2,1], rotm[:,2,2]))
583 | thetaY = tf.atan2(rotm[:, 2, 0], sy)
584 |
585 | thetaZ = tf.where(
586 | singular, tf.zeros_like(rotm[:, 0, 0]), tf.atan2(rotm[:, 1, 0], rotm[:, 0, 0])
587 | )
588 |
589 | return thetaX, thetaY, thetaZ
590 |
--------------------------------------------------------------------------------
/model/dataloader.py:
--------------------------------------------------------------------------------
1 | import pickle as pk
2 | import os
3 | import numpy as np
4 | import tensorflow as tf
5 | import skimage.transform as imgTform
6 | import glob
7 | from scipy import io
8 |
9 |
10 | def bigTime_dataPipeline(num_subbatch_input, dir):
11 | img_batch = pk.load(open(os.path.join(dir + "BigTime_v1", "img_batch.p"), "rb"))
12 |
13 | scene = "0161"
14 | img_batch = [sorted(sc_batch) for sc_batch in img_batch if scene in sc_batch[0][:4]]
15 | img_batch = np.asarray(img_batch[0])
16 | img_batch = [np.delete(img_batch, [10, 20, 30, 40, 50])]
17 |
18 | sc_len = np.asarray([len(sc_l) for sc_l in img_batch], dtype=np.int32)
19 | size_dim2 = sc_len.max()
20 |
21 | bt_imgs_path = []
22 | bt_masks_path = []
23 | for sc_list, sc_size in zip(img_batch, sc_len):
24 | sc_imgs_path = []
25 | sc_masks_path = []
26 | for img_string in sc_list:
27 | tmp = img_string.split(os.path.sep)
28 |
29 | img_path = os.path.join(dir + "BigTime_v1", tmp[0], "data", tmp[1])
30 |
31 | img_name = os.path.splitext(tmp[1])[0]
32 | mask_path = os.path.join(
33 | dir + "BigTime_v1", tmp[0], "data", img_name + "_mask.png"
34 | )
35 |
36 | sc_imgs_path.append(img_path)
37 | sc_masks_path.append(mask_path)
38 |
39 | sc_imgs_path += ["" for i in range(size_dim2 - sc_size)]
40 | sc_masks_path += ["" for i in range(size_dim2 - sc_size)]
41 | bt_imgs_path.append(sc_imgs_path)
42 | bt_masks_path.append(sc_masks_path)
43 |
44 | bt_imgs_path = np.asarray(bt_imgs_path)
45 | bt_masks_path = np.asarray(bt_masks_path)
46 |
47 | train_data = bt_construct_inputPipeline(
48 | bt_imgs_path,
49 | bt_masks_path,
50 | sc_len,
51 | batch_size=num_subbatch_input,
52 | flag_shuffle=True,
53 | )
54 |
55 | # define re-initialisable iterator
56 | iterator = tf.data.Iterator.from_structure(
57 | train_data.output_types, train_data.output_shapes
58 | )
59 | next_element = iterator.get_next()
60 |
61 | # define initialisation for each iterator
62 | trainData_init_op = iterator.make_initializer(train_data)
63 |
64 | return next_element, trainData_init_op, len(bt_imgs_path)
65 |
66 |
67 | def megaDepth_dataPipeline(num_subbatch_input, dir, training_mode, num_test_sc):
68 | # locate all scenes
69 | data_scenes1 = np.array(
70 | sorted(glob.glob(os.path.join(dir + "new_outdoorMega_items", "*")))
71 | )
72 | data_scenes2 = np.array(
73 | sorted(glob.glob(os.path.join(dir + "new_indoorMega_items", "*")))
74 | )
75 | data_scenes3 = np.array(
76 | sorted(glob.glob(os.path.join(dir + "new_LSMega_items", "*")))
77 | )
78 |
79 | # scan scenes
80 | # sort scenes by number of training images in each
81 | scenes_size1 = np.array([len(os.listdir(i)) for i in data_scenes1])
82 | scenes_size2 = np.array([len(os.listdir(i)) for i in data_scenes2])
83 | scenes_size3 = np.array([len(os.listdir(i)) for i in data_scenes3])
84 | scenes_sorted1 = np.argsort(scenes_size1)
85 | scenes_sorted2 = np.argsort(scenes_size2)
86 | scenes_sorted3 = np.argsort(scenes_size3)
87 |
88 | train_scenes = data_scenes1[scenes_sorted1[num_test_sc:]]
89 | test_scenes = data_scenes1[scenes_sorted1[:num_test_sc]]
90 |
91 | cProj_HiRes_scenes = np.array(
92 | [
93 | os.path.join(dir + "HiRes_cProj_imgs", sc.split("/")[-1])
94 | for sc in train_scenes
95 | ]
96 | )
97 | cProj_HiRes_test_scenes = np.array(
98 | [
99 | os.path.join(dir + "HiRes_cProj_imgs", sc.split("/")[-1])
100 | for sc in test_scenes
101 | ]
102 | )
103 |
104 | # load data from each scene
105 | # locate each data minibatch in each sorted sc
106 | train_scenes_items = [
107 | sorted(glob.glob(os.path.join(sc, "*.pk"))) for sc in train_scenes
108 | ]
109 | train_scenes_items = np.concatenate(train_scenes_items, axis=0)
110 | test_scenes_items = [
111 | sorted(glob.glob(os.path.join(sc, "*.pk"))) for sc in test_scenes
112 | ]
113 | test_scenes_items = np.concatenate(test_scenes_items, axis=0)
114 |
115 | HiRes_cProj_items = [
116 | sorted(glob.glob(os.path.join(sc, "*.pk"))) for sc in cProj_HiRes_scenes
117 | ]
118 | HiRes_cProj_items = np.concatenate(HiRes_cProj_items, axis=0)
119 | HiRes_cProj_test_items = [
120 | sorted(glob.glob(os.path.join(sc, "*.pk"))) for sc in cProj_HiRes_test_scenes
121 | ]
122 | HiRes_cProj_test_items = np.concatenate(HiRes_cProj_test_items, axis=0)
123 |
124 | # split data into train and test
125 | # separate out some data from training scenes as testing data
126 | train_items = train_scenes_items
127 | test_items = test_scenes_items
128 |
129 | ### construct training data pipeline
130 | # remove residual data over number of data in one epoch
131 | res_train_items = len(train_items) - (len(train_items) % num_subbatch_input)
132 | train_items = train_items[:res_train_items]
133 | HiRes_cProj_items = HiRes_cProj_items[:res_train_items]
134 | train_data = md_construct_inputPipeline(
135 | train_items, HiRes_cProj_items, flag_shuffle=True, batch_size=num_subbatch_input
136 | )
137 |
138 | res_test_items = len(test_items) - (len(test_items) % num_subbatch_input)
139 | test_items = test_items[:res_test_items]
140 | HiRes_cProj_test_items = HiRes_cProj_test_items[:res_test_items]
141 | test_data = md_construct_inputPipeline(
142 | test_items,
143 | HiRes_cProj_test_items,
144 | flag_shuffle=False,
145 | batch_size=num_subbatch_input,
146 | )
147 |
148 | # define re-initialisable iterator
149 | iterator = tf.data.Iterator.from_structure(
150 | train_data.output_types, train_data.output_shapes
151 | )
152 | next_element = iterator.get_next()
153 |
154 | # define initialisation for each iterator
155 | trainData_init_op = iterator.make_initializer(train_data)
156 | testData_init_op = iterator.make_initializer(test_data)
157 |
158 | return (
159 | next_element,
160 | trainData_init_op,
161 | testData_init_op,
162 | len(train_items),
163 | len(test_items),
164 | )
165 |
166 |
167 | def nyu_dataPipeline(num_subbatch_input, dir):
168 | nm_gts_path = np.array(
169 | glob.glob(os.path.join(dir, "normals_gt", "new_normals", "*"))
170 | )
171 | nm_gts_path.sort()
172 | masks_path = np.array(glob.glob(os.path.join(dir, "normals_gt", "masks", "*")))
173 | masks_path.sort()
174 | splits_path = os.path.join(dir, "splits.mat")
175 | imgs_path = os.path.join(dir, "NYU_imgs.mat")
176 | train_split = io.loadmat(splits_path)["trainNdxs"]
177 | train_split -= 1
178 | train_split = train_split.squeeze()
179 | test_split = io.loadmat(splits_path)["testNdxs"]
180 | test_split -= 1
181 | test_split = test_split.squeeze()
182 | train_split = np.squeeze(train_split)
183 | imgs = io.loadmat(imgs_path)["imgs"]
184 | imgs = imgs.transpose(-1, 0, 1, 2)
185 |
186 | train_nm_gts_path = nm_gts_path[train_split]
187 | train_masks_path = masks_path[train_split]
188 | train_imgs = imgs[train_split]
189 |
190 | train_data = nyu_construct_inputPipeline(
191 | train_imgs,
192 | train_nm_gts_path,
193 | train_masks_path,
194 | batch_size=num_subbatch_input,
195 | flag_shuffle=True,
196 | )
197 |
198 | # define re-initialisable iterator
199 | iterator = tf.data.Iterator.from_structure(
200 | train_data.output_types, train_data.output_shapes
201 | )
202 | next_element = iterator.get_next()
203 |
204 | # define initialisation for each iterator
205 | trainData_init_op = iterator.make_initializer(train_data)
206 |
207 | return next_element, trainData_init_op
208 |
209 |
210 | def _read_pk_function(filename):
211 | with open(filename, "rb") as f:
212 | batch_data = pk.load(f)
213 | input = np.float32(batch_data["input"])
214 | dm = batch_data["dm"]
215 | nm = np.float32(batch_data["nm"])
216 | cam = np.float32(batch_data["cam"])
217 | scaleX = batch_data["scaleX"]
218 | scaleY = batch_data["scaleY"]
219 | mask = np.float32(batch_data["mask"])
220 |
221 | return input, dm, nm, cam, scaleX, scaleY, mask
222 |
223 |
224 | def _read_pk_function_cProj(filename):
225 | with open(filename, "rb") as f:
226 | batch_data = pk.load(f)
227 | input = np.float32(batch_data["reproj_im1"])
228 | mask = np.float32(batch_data["reproj_mask"])
229 |
230 | return input, mask
231 |
232 |
233 | def md_read_func(filename, cProj_filename):
234 |
235 | input, dm, nm, cam, scaleX, scaleY, mask = tf.py_func(
236 | _read_pk_function,
237 | [filename],
238 | [
239 | tf.float32,
240 | tf.float32,
241 | tf.float32,
242 | tf.float32,
243 | tf.float32,
244 | tf.float32,
245 | tf.float32,
246 | ],
247 | )
248 | reproj_inputs, reproj_mask = tf.py_func(
249 | _read_pk_function_cProj, [cProj_filename], [tf.float32, tf.float32]
250 | )
251 |
252 | input = tf.data.Dataset.from_tensor_slices(input[None])
253 | dm = tf.data.Dataset.from_tensor_slices(dm[None])
254 | nm = tf.data.Dataset.from_tensor_slices(nm[None])
255 | cam = tf.data.Dataset.from_tensor_slices(cam[None])
256 | scaleX = tf.data.Dataset.from_tensor_slices(scaleX[None])
257 | scaleY = tf.data.Dataset.from_tensor_slices(scaleY[None])
258 | mask = tf.data.Dataset.from_tensor_slices(mask[None])
259 | reproj_inputs = tf.data.Dataset.from_tensor_slices(reproj_inputs[None])
260 | reproj_mask = tf.data.Dataset.from_tensor_slices(reproj_mask[None])
261 |
262 | return tf.data.Dataset.zip(
263 | (input, dm, nm, cam, scaleX, scaleY, mask, reproj_inputs, reproj_mask)
264 | )
265 |
266 |
267 | def md_preprocess_func(
268 | input, dm, nm, cam, scaleX, scaleY, mask, reproj_inputs, reproj_mask
269 | ):
270 |
271 | input = input / 255
272 | input = input * 2 - 1
273 |
274 | nm = nm / 127
275 |
276 | reproj_inputs = reproj_inputs / 255
277 |
278 | reproj_inputs = reproj_inputs * 2 - 1
279 |
280 | return input, dm, nm, cam, scaleX, scaleY, mask, reproj_inputs, reproj_mask
281 |
282 |
283 | def bt_preprocess_func(bt_imgs, bt_masks):
284 | ori_bt_imgs = tf.identity(bt_imgs)
285 | ori_bt_masks = tf.identity(bt_masks)
286 |
287 | input_height = 200
288 | input_width = 200
289 |
290 | ori_height = tf.shape(bt_imgs)[1]
291 | ori_width = tf.shape(bt_imgs)[2]
292 | ratio = tf.to_float(ori_width) / tf.to_float(ori_height)
293 |
294 | bt_imgs = tf.image.resize_nearest_neighbor(bt_imgs, (input_height, input_width))
295 | bt_masks = tf.image.resize_nearest_neighbor(bt_masks, (input_height, input_width))
296 |
297 | bt_imgs = tf.to_float(bt_imgs) / 255.0
298 | bt_masks = tf.to_float(tf.not_equal(bt_masks, 0))
299 |
300 | return bt_imgs, bt_masks
301 |
302 |
303 | def bt_read_func(bt_imgs_path, bt_masks_path, sc_len):
304 | batch_size = tf.constant(5)
305 | res_len = batch_size - sc_len
306 | res_idx = tf.range(sc_len, batch_size)
307 |
308 | sfl_idx = tf.random_shuffle(tf.range(sc_len))
309 |
310 | sc_idx = sfl_idx[:batch_size]
311 | bt_imgs_path = tf.gather(bt_imgs_path, sc_idx)
312 | bt_masks_path = tf.gather(bt_masks_path, sc_idx)
313 |
314 | i_ = tf.constant(0)
315 | num_loops = tf.shape(bt_imgs_path)[0]
316 | im_output = tf.TensorArray(dtype=tf.uint8, size=num_loops)
317 | mask_output = tf.TensorArray(dtype=tf.uint8, size=num_loops)
318 |
319 | def condition(i_, im_output, mask_output):
320 | return tf.less(i_, num_loops)
321 |
322 | def body(i_, im_output, mask_output):
323 | bt_img = tf.read_file(bt_imgs_path[i_])
324 | bt_img = tf.image.decode_image(bt_img)
325 |
326 | bt_mask = tf.read_file(bt_masks_path[i_])
327 | bt_mask = tf.image.decode_image(bt_mask)
328 |
329 | im_output = im_output.write(i_, bt_img)
330 | mask_output = mask_output.write(i_, bt_mask)
331 | i_ += 1
332 |
333 | return i_, im_output, mask_output
334 |
335 | _, im_output, mask_output = tf.while_loop(
336 | condition, body, loop_vars=[i_, im_output, mask_output]
337 | )
338 |
339 | bt_imgs = im_output.stack()[tf.newaxis]
340 | bt_masks = mask_output.stack()[tf.newaxis]
341 |
342 | return tf.data.Dataset.from_tensor_slices((bt_imgs, bt_masks))
343 |
344 |
345 | def nyu_read_func(img, nm_gt_path, mask_path):
346 |
347 | nm_gt = tf.image.decode_image(tf.read_file(nm_gt_path), channels=3)
348 | mask = tf.image.decode_image(tf.read_file(mask_path))
349 |
350 | return tf.data.Dataset.from_tensor_slices(
351 | (img[tf.newaxis, :, :, :], tf.expand_dims(nm_gt, axis=0), mask[tf.newaxis])
352 | )
353 |
354 |
355 | def nyu_preprocess_func(img, nm_gt, mask):
356 |
357 | # masking
358 | bdL = tf.reduce_min(tf.where(tf.not_equal(mask, 0))[:, 0])
359 | bdR = tf.reduce_max(tf.where(tf.not_equal(mask, 0))[:, 0])
360 | bdT = tf.reduce_min(tf.where(tf.not_equal(mask, 0))[:, 1])
361 | bdB = tf.reduce_max(tf.where(tf.not_equal(mask, 0))[:, 1])
362 |
363 | img = img[bdT:bdB, bdL:bdR]
364 | nm_gt = nm_gt[bdT:bdB, bdL:bdR]
365 |
366 | img = tf.to_float(img) / 255.0
367 | nm_gt = tf.to_float(nm_gt) / 127.0 - 1.0
368 |
369 | nm_gt = tf.stack([nm_gt[:, :, 2], -nm_gt[:, :, 1], -nm_gt[:, :, 0]], axis=-1)
370 |
371 | img = img[tf.newaxis]
372 | nm_gt = nm_gt[tf.newaxis]
373 |
374 | input_height = 200
375 | input_width = 200
376 |
377 | ori_height = tf.shape(img)[1]
378 | ori_width = tf.shape(img)[2]
379 | ratio = tf.to_float(ori_width) / tf.to_float(ori_height)
380 |
381 | rand_pos = tf.cond(
382 | ratio > 1.0,
383 | lambda: f1(ori_height, ori_width),
384 | lambda: f2(ori_height, ori_width),
385 | )
386 |
387 | rand_flip = tf.random_uniform((), 0, 1, dtype=tf.float32)
388 | rand_angle = tf.random_uniform((), -1, 1, dtype=tf.float32) * (5.0 / 180.0) * np.pi
389 |
390 | img = img[:, rand_pos[0] : rand_pos[1], rand_pos[2] : rand_pos[3], :]
391 | nm_gt = nm_gt[:, rand_pos[0] : rand_pos[1], rand_pos[2] : rand_pos[3], :]
392 |
393 | img = tf.where(rand_flip > 0.5, img[:, :, ::-1], img)
394 | nm_gt = tf.where(
395 | rand_flip > 0.5,
396 | nm_gt[:, :, ::-1] * tf.constant([[[[-1, 1, 1]]]], dtype=tf.float32),
397 | nm_gt,
398 | )
399 |
400 | img = tf.image.resize_nearest_neighbor(img, (input_height, input_width))
401 | nm_gt = tf.image.resize_nearest_neighbor(nm_gt, (input_height, input_width))
402 |
403 | img = tf.contrib.image.rotate(img, rand_angle)
404 | nm_gt = tf.contrib.image.rotate(nm_gt, rand_angle)
405 |
406 | sinR = tf.sin(rand_angle)
407 | cosR = tf.cos(rand_angle)
408 | R = tf.stack(
409 | [
410 | tf.stack([cosR, sinR, 0.0], axis=-1),
411 | tf.stack([-sinR, cosR, 0.0], axis=-1),
412 | tf.constant([0, 0, 1], dtype=tf.float32),
413 | ],
414 | axis=0,
415 | )
416 | nm_gt = tf.reshape(tf.matmul(tf.reshape(nm_gt, (-1, 3)), R), (1, 200, 200, 3))
417 |
418 | return tf.squeeze(img), tf.squeeze(nm_gt)
419 |
420 |
421 | def bt_construct_inputPipeline(
422 | bt_imgs_path, bt_masks_path, sc_len, batch_size, flag_shuffle=True
423 | ):
424 | data = tf.data.Dataset.from_tensor_slices((bt_imgs_path, bt_masks_path, sc_len))
425 | if flag_shuffle:
426 | data = data.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=100000))
427 | else:
428 | data = data.repeat()
429 |
430 | data = data.apply(
431 | tf.contrib.data.parallel_interleave(
432 | bt_read_func, cycle_length=batch_size, block_length=1, sloppy=False
433 | )
434 | )
435 |
436 | data = data.map(bt_preprocess_func, num_parallel_calls=8)
437 | data = data.batch(batch_size).prefetch(4)
438 |
439 | return data
440 |
441 |
442 | def nyu_construct_inputPipeline(
443 | nyu_imgs, nyu_nm_gts_path, nyu_masks_path, batch_size, flag_shuffle=True
444 | ):
445 | imgs_data = tf.data.Dataset.from_tensor_slices(nyu_imgs)
446 | nm_gts_data = tf.data.Dataset.from_tensor_slices(nyu_nm_gts_path)
447 | masks_data = tf.data.Dataset.from_tensor_slices(nyu_masks_path)
448 |
449 | data = tf.data.Dataset.zip((imgs_data, nm_gts_data, masks_data))
450 |
451 | if flag_shuffle:
452 | data = data.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=1000))
453 | else:
454 | data = data.repeat()
455 | data = data.apply(
456 | tf.contrib.data.parallel_interleave(
457 | nyu_read_func, cycle_length=batch_size, block_length=1, sloppy=False
458 | )
459 | )
460 |
461 | data = data.map(nyu_preprocess_func, num_parallel_calls=8)
462 | data = data.batch(batch_size).prefetch(4)
463 |
464 | return data
465 |
466 |
467 | def md_construct_inputPipeline(items, cProj_items, batch_size, flag_shuffle=True):
468 | data = tf.data.Dataset.from_tensor_slices((items, cProj_items))
469 | if flag_shuffle:
470 | data = data.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=100000))
471 | else:
472 | data = data.repeat()
473 | data = data.apply(
474 | tf.contrib.data.parallel_interleave(
475 | md_read_func, cycle_length=batch_size, block_length=1, sloppy=False
476 | )
477 | )
478 | data = data.map(md_preprocess_func, num_parallel_calls=8)
479 | data = data.batch(batch_size).prefetch(4)
480 |
481 | return data
482 |
483 |
484 | def f1(ori_h, ori_w):
485 | h_upMost = 25
486 | w_leftMost = ori_w - ori_h + h_upMost
487 |
488 | random_start_y = tf.random_uniform((), 0, h_upMost, dtype=tf.int32)
489 | random_start_x = tf.random_uniform((), 0, w_leftMost, dtype=tf.int32)
490 | random_pos = [
491 | random_start_y,
492 | random_start_y + ori_h - h_upMost,
493 | random_start_x,
494 | random_start_x + ori_w - w_leftMost,
495 | ]
496 |
497 | return random_pos
498 |
499 |
500 | def f2(ori_h, ori_w):
501 | w_leftMost = 25
502 | h_upMost = ori_h - ori_w + w_leftMost
503 |
504 | random_start_x = tf.random_uniform((), 0, w_leftMost, dtype=tf.int32)
505 | random_start_y = tf.random_uniform((), 0, h_upMost, dtype=tf.int32)
506 | random_pos = [
507 | random_start_y,
508 | random_start_y + ori_h - h_upMost,
509 | random_start_x,
510 | random_start_x + ori_w - w_leftMost,
511 | ]
512 |
513 | return random_pos
514 |
--------------------------------------------------------------------------------
/model/hdr_illu_pca/mean.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/model/hdr_illu_pca/mean.npy
--------------------------------------------------------------------------------
/model/hdr_illu_pca/pcaMean.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/model/hdr_illu_pca/pcaMean.npy
--------------------------------------------------------------------------------
/model/hdr_illu_pca/pcaVariance.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/model/hdr_illu_pca/pcaVariance.npy
--------------------------------------------------------------------------------
/model/hdr_illu_pca/pcaVector.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/model/hdr_illu_pca/pcaVector.npy
--------------------------------------------------------------------------------
/model/lambSH_layer.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 |
3 |
4 | def lambSH_layer(am, nm, L_SHcoeffs, shadow, gamma):
5 | """
6 | i = albedo * irradiance
7 | the multiplication is elementwise
8 | albedo is given
9 | irradiance = n.T * M * n, where n is the homogeneous normal (x, y, z, 1)
10 | M is constructed from precomputed constants and L_SHcoeffs; it encodes the illumination, the clamped-cosine kernel and the SH basis
11 | """
12 |
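# The constants c1..c5 and the quadratic form E(n) = n_h^T M n_h (with homogeneous
# normal n_h = (x, y, z, 1)) follow Ramamoorthi & Hanrahan, "An Efficient
# Representation for Irradiance Environment Maps" (SIGGRAPH 2001), evaluated
# independently per colour channel.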
13 | # M is only related with lighting
14 | c1 = tf.constant(0.429043, dtype=tf.float32)
15 | c2 = tf.constant(0.511664, dtype=tf.float32)
16 | c3 = tf.constant(0.743125, dtype=tf.float32)
17 | c4 = tf.constant(0.886227, dtype=tf.float32)
18 | c5 = tf.constant(0.247708, dtype=tf.float32)
19 |
20 | # each row have shape (batch, 4, 3)
21 | M_row1 = tf.stack(
22 | [
23 | c1 * L_SHcoeffs[:, 8, :],
24 | c1 * L_SHcoeffs[:, 4, :],
25 | c1 * L_SHcoeffs[:, 7, :],
26 | c2 * L_SHcoeffs[:, 3, :],
27 | ],
28 | axis=1,
29 | )
30 | M_row2 = tf.stack(
31 | [
32 | c1 * L_SHcoeffs[:, 4, :],
33 | -c1 * L_SHcoeffs[:, 8, :],
34 | c1 * L_SHcoeffs[:, 5, :],
35 | c2 * L_SHcoeffs[:, 1, :],
36 | ],
37 | axis=1,
38 | )
39 | M_row3 = tf.stack(
40 | [
41 | c1 * L_SHcoeffs[:, 7, :],
42 | c1 * L_SHcoeffs[:, 5, :],
43 | c3 * L_SHcoeffs[:, 6, :],
44 | c2 * L_SHcoeffs[:, 2, :],
45 | ],
46 | axis=1,
47 | )
48 | M_row4 = tf.stack(
49 | [
50 | c2 * L_SHcoeffs[:, 3, :],
51 | c2 * L_SHcoeffs[:, 1, :],
52 | c2 * L_SHcoeffs[:, 2, :],
53 | c4 * L_SHcoeffs[:, 0, :] - c5 * L_SHcoeffs[:, 6, :],
54 | ],
55 | axis=1,
56 | )
57 |
58 |     # M is a 4-D tensor with shape (batch, 4, 4, 3[rgb]); it is symmetric in axes 1 and 2
59 | M = tf.stack([M_row1, M_row2, M_row3, M_row4], axis=1)
60 |
61 |     # find the (batch, height, width) boolean mask of pixels with defined normals in nm
62 | mask = tf.not_equal(tf.reduce_sum(nm, axis=-1), 0)
63 |
64 |     # append a constant 1 channel to nm to form the homogeneous normal map nm_homo
65 | total_npix = tf.shape(nm)[:3]
66 | ones = tf.ones(total_npix)
67 | nm_homo = tf.concat([nm, tf.expand_dims(ones, axis=-1)], axis=-1)
68 |
69 | ### manually perform batch-wise dot product
70 |     # construct a batch-wise flattened M corresponding to nm_homo, such that multiplication between them is batch-wise
71 | # batch_indices = tf.expand_dims(tf.where(mask)[:,0],axis=-1)
72 | M = tf.expand_dims(tf.expand_dims(M, axis=1), axis=1)
73 |
74 |     # expand M for broadcasting, so that M has shape (batch, 1, 1, 4, 4, 3)
75 |     # expand nm_homo, so that nm_homo has shape (batch, height, width, 4, 1, 1)
76 |     nm_homo = tf.expand_dims(tf.expand_dims(nm_homo, axis=-1), axis=-1)
77 |     ## realise the dot product by summing the element-wise product over the normal axes
78 |     # tmp has shape (batch, height, width, 4, 3[rgb])
79 |     tmp = tf.reduce_sum(nm_homo * M, axis=-3)
80 |     # E has shape (batch, height, width, 3[rgb])
81 | E = tf.reduce_sum(tmp * nm_homo[:, :, :, :, 0, :], axis=-2)
82 |
83 | # compute intensity by product between irradiance and albedo
84 | i = E * am * shadow * tf.to_float(tf.expand_dims(mask, -1))
85 |
86 | # gamma correction
87 | i = tf.clip_by_value(i, 0.0, 1.0) + tf.constant(1e-4)
88 | i = tf.pow(i, 1.0 / gamma)
89 |
90 | return i, mask
91 |
--------------------------------------------------------------------------------
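
lambSH_layer.py evaluates Lambertian shading with the quadratic-form spherical-harmonics formulation of Ramamoorthi and Hanrahan: E(n) = n_h.T * M * n_h with homogeneous normal n_h = (x, y, z, 1). A minimal single-pixel, single-channel NumPy sketch of the same computation (illustrative only, not part of the repository):

    import numpy as np

    def sh_irradiance(n, L):
        """n: unit normal, shape (3,); L: 9 SH lighting coefficients for one channel."""
        c1, c2, c3, c4, c5 = 0.429043, 0.511664, 0.743125, 0.886227, 0.247708
        M = np.array([
            [c1 * L[8],  c1 * L[4], c1 * L[7], c2 * L[3]],
            [c1 * L[4], -c1 * L[8], c1 * L[5], c2 * L[1]],
            [c1 * L[7],  c1 * L[5], c3 * L[6], c2 * L[2]],
            [c2 * L[3],  c2 * L[1], c2 * L[2], c4 * L[0] - c5 * L[6]],
        ])
        n_h = np.append(n, 1.0)   # homogeneous normal (x, y, z, 1)
        return n_h @ M @ n_h      # irradiance E(n)

The layer above then multiplies this irradiance by the albedo and shadow maps and applies gamma correction.
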
/model/pred_illuDecomp_layer_new.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 |
3 |
4 | def illuDecomp(inputs, am, nm, shadow, gamma, masks):
5 |
6 | """
7 | i = albedo * irradiance
8 | the multiplication is elementwise
9 | albedo is given
10 |     irradiance = n.T * M * n, where n is (x,y,z,1)
11 |     M is constructed from some precomputed constants and L_SHcoeffs, where M encodes the illumination, the clamped-cosine kernel and the SH basis
12 | """
13 |
14 | inputs = tf.pow(inputs, gamma) * masks
15 |
16 | D = am * shadow * masks
17 |
18 | # compute shading by linear equation regarding nm and L_SHcoeffs
19 | c1 = tf.constant(0.429043, dtype=tf.float32)
20 | c2 = tf.constant(0.511664, dtype=tf.float32)
21 | c3 = tf.constant(0.743125, dtype=tf.float32)
22 | c4 = tf.constant(0.886227, dtype=tf.float32)
23 | c5 = tf.constant(0.247708, dtype=tf.float32)
24 |
25 | # find defined pixels
26 | num_iter = tf.shape(nm)[0]
27 | output = tf.TensorArray(dtype=tf.float32, size=num_iter)
28 | i = tf.constant(0)
29 |
30 | def condition(i, output):
31 | return i < num_iter
32 |
33 | def body(i, output):
34 | inputs_ = inputs[i]
35 | nm_ = nm[i]
36 | D_ = D[i]
37 | nm_ = tf.reshape(nm_, (-1, 3))
38 | inputs_pixels = tf.reshape(inputs_, (-1, 3))
39 | D_ = tf.reshape(D_, (-1, 3))
40 |
41 | total_npix = tf.shape(nm_)[0:1]
42 | ones = tf.ones(total_npix)
43 | A = tf.stack(
44 | [
45 | c4 * ones,
46 | 2 * c2 * nm_[:, 1],
47 | 2 * c2 * nm_[:, 2],
48 | 2 * c2 * nm_[:, 0],
49 | 2 * c1 * nm_[:, 0] * nm_[:, 1],
50 | 2 * c1 * nm_[:, 1] * nm_[:, 2],
51 | c3 * nm_[:, 2] ** 2 - c5,
52 | 2 * c1 * nm_[:, 2] * nm_[:, 0],
53 | c1 * (nm_[:, 0] ** 2 - nm_[:, 1] ** 2),
54 | ],
55 | axis=-1,
56 | )
57 |
58 | output_r = tf.matmul(pinv(A * D_[:, 0:1]), inputs_pixels[:, 0:1])
59 | output_g = tf.matmul(pinv(A * D_[:, 1:2]), inputs_pixels[:, 1:2])
60 | output_b = tf.matmul(pinv(A * D_[:, 2:3]), inputs_pixels[:, 2:3])
61 |
62 | output = output.write(i, tf.concat([output_r, output_g, output_b], axis=-1))
63 | i += tf.constant(1)
64 |
65 | return i, output
66 |
67 | _, output = tf.while_loop(condition, body, loop_vars=[i, output])
68 | L_SHcoeffs = output.stack()
69 |
70 | return tf.reshape(L_SHcoeffs, [-1, 27])
71 |
72 |
73 | def pinv(A, reltol=1e-6):
74 | # compute SVD of input A
75 | s, u, v = tf.svd(A)
76 |
77 | # invert s and clear entries lower than reltol*s_max
78 | atol = tf.reduce_max(s) * reltol
79 | s = tf.where(s > atol, s, atol * tf.ones_like(s))
80 | s_inv = tf.diag(1.0 / s)
81 |
82 |     # compute v * s_inv * u_t as the pseudo-inverse
83 | return tf.matmul(v, tf.matmul(s_inv, tf.transpose(u)))
84 |
--------------------------------------------------------------------------------
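
pred_illuDecomp_layer_new.py inverts the shading model: for each colour channel it stacks the SH basis A evaluated at every normal, scales it by albedo*shadow, and solves for the 9 lighting coefficients with a pseudo-inverse, i.e. in the least-squares sense. A small NumPy sketch of that per-channel fit (illustrative names, not part of the repository):

    import numpy as np

    def fit_sh_channel(A, D_c, I_c):
        """A: (npix, 9) SH basis, D_c: (npix,) albedo*shadow, I_c: (npix,) linearised image."""
        # least-squares optimal SH coefficients for one channel
        return np.linalg.pinv(A * D_c[:, None]) @ I_c
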
/model/reproj_layer.py:
--------------------------------------------------------------------------------
1 | # apply error mask in albedo reprojection
2 |
3 |
4 | # no rotation involved
5 |
6 |
7 | #### directly output flatten reprojected pixels and the reconstruction mask
8 |
9 | # the differentiable layer performing reprojection
10 |
11 | import tensorflow as tf
12 | import numpy as np
13 |
14 | # pc is an n-by-3 matrix containing point cloud 3D locations
15 | # cam is the new camera parameters, whose f and p_a have shape (batch) and c has shape (batch, 2)
16 | # dm1 is the depth map associated with cam1, the camera for the output image; it has shape (batch, height, width)
17 | # img2 is the input image that acts as the source image for reprojection; it has shape (batch, height, width, 3)
18 | def map_reproj(dm1, map2, cam1, cam2, scale_x1, scale_x2, scale_y1, scale_y2):
19 | batch_size = tf.shape(dm1)[0]
20 |
21 | # read camera parameters
22 | c1 = cam1[:, 2:4]
23 | f1 = cam1[:, 0]
24 | p_a1 = cam1[:, 1] # ratio is width divided by height
25 | R1 = tf.reshape(cam1[:, 4:13], [-1, 3, 3])
26 | t1 = cam1[:, 13:]
27 |
28 | c2 = cam2[:, 2:4]
29 | f2 = cam2[:, 0]
30 | p_a2 = cam2[:, 1]
31 | R2 = tf.reshape(cam2[:, 4:13], [-1, 3, 3])
32 | t2 = cam2[:, 13:]
33 |
34 | # project pixel points back to camera coords
35 | # u is the height and v is the width
36 | # u and v are scalars
37 | u1 = tf.shape(dm1)[1]
38 | v1 = tf.shape(dm1)[2]
39 |
40 | # convert u1 and v1 to float, convenient for computation
41 | u1 = tf.to_float(u1)
42 | v1 = tf.to_float(v1)
43 |
44 | ### regular grid in output image
45 | # x increase towards right, y increase toward down
46 | vm, um = tf.meshgrid(tf.range(1.0, v1 + 1.0), tf.range(1.0, u1 + 1.0))
47 |
48 | # apply scaling factors on f
49 | # f1 = f1/(scale_x1+scale_y1)*2
50 | # f1 = tf.stack([f1, f1*p_a1],axis=-1)
51 | f1 = tf.stack([f1 / scale_x1, f1 / scale_y1 * p_a1], axis=-1)
52 |
53 |     # expand f1 to (batch, 2, 1, 1), to be consistent with dm
54 | f1 = tf.expand_dims(tf.expand_dims(f1, axis=-1), axis=-1)
55 | # expand c1 dimension (batch,2,1,1)
56 | c1 = tf.expand_dims(tf.expand_dims(c1, axis=-1), axis=-1)
57 | # expand vm and um to have shape (1,height,width)
58 | vm = tf.expand_dims(vm, axis=0)
59 | um = tf.expand_dims(um, axis=0)
60 |
61 | # compute 3D point x and y coordinates
62 | # Xm and Ym have shape (batch, height, width)
63 | Xm = (vm - c1[:, 0]) / f1[:, 0] * dm1
64 | Ym = (um - c1[:, 1]) / f1[:, 1] * dm1
65 |
66 | # the point cloud is (batch, 3, npix) matrix, each row is XYZ cam coords for one point
67 | pc = tf.stack(
68 | [
69 | tf.contrib.layers.flatten(Xm),
70 | tf.contrib.layers.flatten(Ym),
71 | tf.contrib.layers.flatten(dm1),
72 | ],
73 | axis=1,
74 | )
75 |
76 | ### transfer pc from coords of cam1 to cam2
77 | # construct homogeneous point cloud with shape batch-4-by-num_pix
78 | num_pix = tf.shape(pc)[-1]
79 | homo_pc_c1 = tf.concat(
80 | [pc, tf.ones((batch_size, 1, num_pix), dtype=tf.float32)], axis=1
81 | )
82 |
83 |     # both transformation matrices have shape (batch, 4, 4), valid for multiplication with the homogeneous point cloud defined above
84 | last_row = tf.tile(
85 | tf.constant([[[0, 0, 0, 1]]], dtype=tf.float32), multiples=[batch_size, 1, 1]
86 | )
87 | W_C_R_t1 = tf.concat([R1, tf.expand_dims(t1, axis=2)], axis=2)
88 | W_C_trans1 = tf.concat([W_C_R_t1, last_row], axis=1)
89 | W_C_R_t2 = tf.concat([R2, tf.expand_dims(t2, axis=2)], axis=2)
90 | W_C_trans2 = tf.concat([W_C_R_t2, last_row], axis=1)
91 |
92 | # batch dot product, output has shape (batch, 4, npix)
93 | homo_pc_c2 = tf.matmul(
94 | W_C_trans2, tf.matmul(tf.matrix_inverse(W_C_trans1), homo_pc_c1)
95 | )
96 |
97 | ### project point cloud to cam2 pixel coordinates
98 | # u in vertical and v in horizontal
99 | u2 = tf.shape(map2)[1]
100 | v2 = tf.shape(map2)[2]
101 |
102 | # convert u2 and v2 to float
103 | u2 = tf.to_float(u2)
104 | v2 = tf.to_float(v2)
105 |
106 | # f2 = f2/(scale_x2+scale_y2)*2
107 | # f2 = tf.stack([f2, f2*p_a2],axis=-1)
108 | f2 = tf.stack([f2 / scale_x2, f2 / scale_y2 * p_a2], axis=-1)
109 |
110 |     # construct the intrinsics matrix, which has shape (batch, 3, 4)
111 | zeros = tf.zeros_like(f2[:, 0], dtype=tf.float32)
112 | ones = tf.ones_like(f2[:, 0], tf.float32)
113 | k2 = tf.stack(
114 | [
115 | tf.stack([f2[:, 0], zeros, c2[:, 0], zeros], axis=1),
116 | tf.stack([zeros, f2[:, 1], c2[:, 1], zeros], axis=1),
117 | tf.stack([zeros, zeros, ones, zeros], axis=1),
118 | ],
119 | axis=1,
120 | )
121 |
122 | ## manual batch dot product
123 | k2 = tf.expand_dims(k2, axis=-1)
124 | homo_pc_c2 = tf.expand_dims(homo_pc_c2, axis=1)
125 | # homo_uv2 has shape (batch, 3, npix)
126 | homo_uv2 = tf.reduce_sum(k2 * homo_pc_c2, axis=2)
127 |
128 | # the reprojected locations of regular grid in output image
129 | # both have shape (batch, npix)
130 | v_reproj = homo_uv2[:, 0, :] / homo_uv2[:, 2, :]
131 | u_reproj = homo_uv2[:, 1, :] / homo_uv2[:, 2, :]
132 |
133 |     # u and v are flattened vectors containing reprojected pixel locations
134 |     # the u and v at the same index make up one pixel
135 | u_valid = tf.logical_and(
136 | tf.logical_and(tf.logical_not(tf.is_nan(u_reproj)), u_reproj > 0),
137 | u_reproj < u2 - 1,
138 | )
139 | v_valid = tf.logical_and(
140 | tf.logical_and(tf.logical_not(tf.is_nan(v_reproj)), v_reproj > 0),
141 | v_reproj < v2 - 1,
142 | )
143 | # pixels has shape (batch, npix), indicating available reprojected pixels
144 | pixels = tf.logical_and(u_valid, v_valid)
145 |
146 | # pixels is bool indicator over original regular grid
147 | # v_reproj and u_reproj is x and y coordinates in source image
148 | # pixels, v_reproj and u_reproj are corresponded with each other by their indices
149 |
150 | ### interpolation function based on source image img2
151 |     # request_points1 has shape (total_npix, 3); each row contains [img_ind, x, y], where img_ind identifies that pixel's source image in the batch
152 | # img_inds is 2d matrix with shape (batch, npix), containing img_ind for each (x,y) location
153 | img_inds = tf.tile(
154 | tf.expand_dims(tf.to_float(tf.range(batch_size)), axis=1),
155 | multiples=[1, num_pix],
156 | )
157 | request_points1 = tf.stack(
158 | [
159 | tf.boolean_mask(img_inds, pixels),
160 | tf.boolean_mask(v_reproj, pixels),
161 | tf.boolean_mask(u_reproj, pixels),
162 | ],
163 | axis=1,
164 | )
165 |
166 | # the output is stacked flatten pixel values for channels
167 | re_proj_pixs = interpImg(request_points1, map2)
168 |
169 | # reconstruct original shaped re-projection map
170 | ndims = tf.shape(map2)[3]
171 | shape = [batch_size, tf.to_int32(u1), tf.to_int32(v1), ndims]
172 |
173 | pixels = tf.reshape(
174 | pixels, shape=tf.stack([batch_size, tf.to_int32(u1), tf.to_int32(v1)], axis=0)
175 | )
176 | indices = tf.to_int32(tf.where(tf.equal(pixels, True)))
177 |
178 | re_proj_pixs = tf.scatter_nd(updates=re_proj_pixs, indices=indices, shape=shape)
179 |
180 |     # re_proj_pixs holds the reprojected pixel values scattered back into the output image shape (batch, height, width, ndims)
181 |     # pixels is the boolean mask over the output grid marking where a valid reprojection exists
182 | return re_proj_pixs, pixels
183 |
184 |
185 | def interpImg(unknown, data):
186 |     # interpolate values at the pixel locations given in unknown, from known data sampled on a regular grid
187 |
188 | # find neighbour pixels on regular grid
189 | # x is horizontal, y is vertical
190 | img_inds = tf.to_int32(unknown[:, 0])
191 | x = unknown[:, 1]
192 | y = unknown[:, 2]
193 | # rgb_inds = tf.to_int32(unknown[:,3])
194 |
195 | low_x = tf.to_int32(tf.floor(x))
196 | high_x = tf.to_int32(tf.ceil(x))
197 | low_y = tf.to_int32(tf.floor(y))
198 | high_y = tf.to_int32(tf.ceil(y))
199 |
200 | # measure the weights for neighbourhood average based on distance
201 | dist_low_x = tf.expand_dims(x - tf.to_float(low_x), axis=-1)
202 | dist_high_x = tf.expand_dims(tf.to_float(high_x) - x, axis=-1)
203 | dist_low_y = tf.expand_dims(y - tf.to_float(low_y), axis=-1)
204 | dist_high_y = tf.expand_dims(tf.to_float(high_y) - y, axis=-1)
205 |
206 |     # compute horizontal average
207 | avg_low_y = dist_low_x * tf.gather_nd(
208 | data, indices=tf.stack([img_inds, low_y, low_x], axis=1)
209 | ) + dist_high_x * tf.gather_nd(
210 | data, indices=tf.stack([img_inds, low_y, high_x], axis=1)
211 | )
212 | avg_high_y = dist_low_x * tf.gather_nd(
213 | data, indices=tf.stack([img_inds, high_y, low_x], axis=1)
214 | ) + dist_high_x * tf.gather_nd(
215 | data, indices=tf.stack([img_inds, high_y, high_x], axis=1)
216 | )
217 |
218 | # compute vertical average
219 | avg = dist_low_y * avg_low_y + dist_high_y * avg_high_y
220 |
221 | return avg
222 |
--------------------------------------------------------------------------------
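
map_reproj back-projects the target-view depth map into a camera-space point cloud, transforms it into the source camera frame and re-projects it with the source intrinsics; bilinear sampling of the source map then gives the cross-projected pixels. The back-projection step alone, for a single view, reduces to the following NumPy sketch (names are illustrative, not part of the repository):

    import numpy as np

    def backproject(depth, fx, fy, cx, cy):
        """depth: (H, W) depth map; returns a (3, H*W) camera-space point cloud."""
        H, W = depth.shape
        # regular pixel grid, 1-based as in map_reproj above
        v, u = np.meshgrid(np.arange(1.0, W + 1.0), np.arange(1.0, H + 1.0))
        X = (v - cx) / fx * depth
        Y = (u - cy) / fy * depth
        return np.stack([X.ravel(), Y.ravel(), depth.ravel()], axis=0)
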
/model/vgg16.py:
--------------------------------------------------------------------------------
1 | """
2 | code cloned from
3 | https://github.com/machrisaa/tensorflow-vgg/blob/master/vgg16.py
4 | """
5 |
6 |
7 | import numpy as np
8 | import os
9 | import tensorflow as tf
10 | import time
11 |
12 |
13 | VGG_MEAN = [103.939, 116.779, 123.68]
14 |
15 |
16 | class Vgg16:
17 | def __init__(self, vgg16_npy_path):
18 | self.initialized = False
19 | self.vgg16_npy_path = vgg16_npy_path
20 |         print("npy file path stored")
21 |
22 | def build(self, rgb):
23 | """
24 | load variable from npy to build the VGG
25 |         :param rgb: rgb image [batch, height, width, 3] values scaled [0, 1]
26 | """
27 |
28 | start_time = time.time()
29 | print("build model started")
30 |
31 | # rgb_scaled = (rgb + 1) * 255.0 / 2.
32 | rgb_scaled = rgb * 255.0
33 | # Convert RGB to BGR
34 | red, green, blue = tf.split(axis=3, num_or_size_splits=3, value=rgb_scaled)
35 | bgr = tf.concat(
36 | axis=3,
37 | values=[
38 | blue - VGG_MEAN[0],
39 | green - VGG_MEAN[1],
40 | red - VGG_MEAN[2],
41 | ],
42 | )
43 |
44 | self.data_dict = np.load(
45 | self.vgg16_npy_path, encoding="latin1", allow_pickle=True
46 | ).item()
47 | layer_dict = dict()
48 | with tf.variable_scope("vgg16", reuse=self.initialized):
49 | layer_dict["conv1_1"] = self.conv_layer(bgr, "conv1_1")
50 | layer_dict["conv1_2"] = self.conv_layer(layer_dict["conv1_1"], "conv1_2")
51 | layer_dict["pool1"] = self.max_pool(layer_dict["conv1_2"], "pool1")
52 |
53 | # layer_dict['conv2_1'] = self.conv_layer(
54 | # layer_dict['pool1'], 'conv2_1')
55 | # layer_dict['conv2_2'] = self.conv_layer(
56 | # layer_dict['conv2_1'], 'conv2_2')
57 | # layer_dict['pool2'] = self.max_pool(
58 | # layer_dict['conv2_2'], 'pool2')
59 |
60 | # layer_dict['conv3_1'] = self.conv_layer(
61 | # layer_dict['pool2'], 'conv3_1')
62 | # layer_dict['conv3_2'] = self.conv_layer(
63 | # layer_dict['conv3_1'], 'conv3_2')
64 | # layer_dict['conv3_3'] = self.conv_layer(
65 | # layer_dict['conv3_2'], 'conv3_3')
66 | # layer_dict['pool3'] = self.max_pool(
67 | # layer_dict['conv3_3'], 'pool3')
68 |
69 | # layer_dict['conv4_1'] = self.conv_layer(
70 | # layer_dict['pool3'], 'conv4_1')
71 | # layer_dict['conv4_2'] = self.conv_layer(
72 | # layer_dict['conv4_1'], 'conv4_2')
73 | # layer_dict['conv4_3'] = self.conv_layer(
74 | # layer_dict['conv4_2'], 'conv4_3')
75 | # layer_dict['pool4'] = self.max_pool(layer_dict['conv4_3'], 'pool4')
76 |
77 | # layer_dict['conv5_1'] = self.conv_layer(
78 | # layer_dict['pool4'], 'conv5_1')
79 | # layer_dict['conv5_2'] = self.conv_layer(
80 | # layer_dict['conv5_1'], 'conv5_2')
81 | # layer_dict['conv5_3'] = self.conv_layer(
82 | # layer_dict['conv5_2'], 'conv5_3')
83 | # layer_dict['pool5'] = self.max_pool(layer_dict['conv5_3'], 'pool5')
84 |
85 | self.data_dict = None
86 | self.initialized = True
87 | return layer_dict
88 |
89 | def get_vgg_activations(self, rgb, layer_names):
90 | layer_dict = self.build(rgb)
91 | # validate_names = reduce(lambda f1, f2: f1 & f2,
92 | # [layer_dict.has_key(x) for x in layer_names])
93 | # assert validate_names, 'invalid vgg16 layer name(s): %s' % str(layer_names)
94 | activations = [layer_dict[k] for k in layer_names]
95 | return activations
96 |
97 | def avg_pool(self, bottom, name):
98 | return tf.nn.avg_pool(
99 | bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME", name=name
100 | )
101 |
102 | def max_pool(self, bottom, name):
103 | return tf.nn.max_pool(
104 | bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME", name=name
105 | )
106 |
107 | def conv_layer(self, bottom, name):
108 | with tf.variable_scope(name):
109 | filt = self.get_conv_filter(name)
110 |
111 | conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding="SAME")
112 |
113 | conv_biases = self.get_bias(name)
114 | bias = tf.nn.bias_add(conv, conv_biases)
115 |
116 | relu = tf.nn.relu(bias)
117 | return relu
118 |
119 | def fc_layer(self, bottom, name):
120 | with tf.variable_scope(name):
121 | shape = bottom.get_shape().as_list()
122 | dim = 1
123 | for d in shape[1:]:
124 | dim *= d
125 | x = tf.reshape(bottom, [-1, dim])
126 |
127 | weights = self.get_fc_weight(name)
128 | biases = self.get_bias(name)
129 |
130 | # Fully connected layer. Note that the '+' operation automatically
131 | # broadcasts the biases.
132 | fc = tf.nn.bias_add(tf.matmul(x, weights), biases)
133 |
134 | return fc
135 |
136 | def get_conv_filter(self, name):
137 | return tf.constant(self.data_dict[name][0], name="filter")
138 |
139 | def get_bias(self, name):
140 | return tf.constant(self.data_dict[name][1], name="biases")
141 |
142 | def get_fc_weight(self, name):
143 | return tf.constant(self.data_dict[name][0], name="weights")
144 |
145 |
--------------------------------------------------------------------------------
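
vgg16.py only builds the first convolutional block (the deeper layers are commented out), so it serves purely as a fixed feature extractor. A hedged TF1-style usage sketch (the weight-file path and placeholder sizes are assumptions, not part of the repository):

    import tensorflow as tf
    from model.vgg16 import Vgg16

    vgg = Vgg16("vgg16.npy")                                # pretrained weights (path is a placeholder)
    rgb = tf.placeholder(tf.float32, (None, 200, 200, 3))   # values in [0, 1], as assumed by build()
    feats = vgg.get_vgg_activations(rgb, ["conv1_1", "conv1_2"])
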
/run_test_demo.sh:
--------------------------------------------------------------------------------
1 | #! /bin/bash
2 | set -x
3 |
4 | ### input and output options ####
5 | TESTING_MODE="demo_im"
6 | MODEL_PATH="model_ckpt"
7 | IMAGE_PATH="demo_im.jpg"
8 | MASK_PATH="demo_mask.jpg"
9 | RESULTS_DIR="test_demo"
10 |
11 | python test.py \
12 | --mode "${TESTING_MODE}" \
13 | --model "${MODEL_PATH}" \
14 | --image "${IMAGE_PATH}" \
15 | --mask "${MASK_PATH}" \
16 | --output "${RESULTS_DIR}"
17 |
--------------------------------------------------------------------------------
/run_test_diode.sh:
--------------------------------------------------------------------------------
1 | #! /bin/bash
2 | set -x
3 | ROOT="/shared/storage/cs/staffstore/yy1571"
4 |
5 | ### input and output options ####
6 | TESTING_MODE="diode"
7 | MODEL_PATH="diode_model_ckpt"
8 | IMAGES_DIR="${ROOT}/Data/DIODE"
9 | RESULTS_DIR="test_diode"
10 |
11 | python test.py \
12 | --mode "${TESTING_MODE}" \
13 | --model "${MODEL_PATH}" \
14 | --diode "${IMAGES_DIR}" \
15 | --output "${RESULTS_DIR}"
16 |
--------------------------------------------------------------------------------
/run_test_iiw.sh:
--------------------------------------------------------------------------------
1 | #! /bin/bash
2 | set -x
3 | ROOT="/shared/storage/cs/staffstore/yy1571"
4 |
5 | ### input and output options ####
6 | TESTING_MODE="iiw"
7 | MODEL_PATH="iiw_model_ckpt"
8 | IMAGES_DIR="${ROOT}/Data/testData/iiw-dataset/data"
9 | RESULTS_DIR="test_iiw"
10 |
11 | python test.py \
12 | --mode "${TESTING_MODE}" \
13 | --model "${MODEL_PATH}" \
14 | --iiw "${IMAGES_DIR}" \
15 | --output "${RESULTS_DIR}"
16 |
--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
1 | import ipdb
2 | from tqdm import tqdm
3 | import json
4 | import os
5 | import numpy as np
6 | import tensorflow as tf
7 | import shutil
8 | import cv2
9 | from skimage import io
10 | from model import lambSH_layer, SfMNet
11 | from utils.render_sphere_nm import render_sphere_nm
12 | from utils.whdr import compute_whdr
13 | from utils.diode_metrics import angular_error
14 | import argparse
15 |
16 | parser = argparse.ArgumentParser(description="InverseRenderNet++")
17 | parser.add_argument(
18 | "--mode",
19 | type=str,
20 | default="demo_im",
21 | choices=["demo_im", "iiw", "diode"],
22 | help="testing mode",
23 | )
24 |
25 | # test demo image
26 | parser.add_argument("--image", type=str, default=None, help="Path to test image")
27 | parser.add_argument("--mask", type=str, default=None, help="Path to image mask")
28 | # test iiw
29 | parser.add_argument(
30 | "--iiw", type=str, default=None, help="Root directory for iiw-dataset"
31 | )
32 | # test diode
33 | parser.add_argument(
34 |     "--diode", type=str, default=None, help="Root directory for DIODE dataset"
35 | )
36 | # model and output path
37 | parser.add_argument("--model", type=str, required=True, help="Path to trained model")
38 | parser.add_argument("--output", type=str, required=True, help="Folder saving outputs")
39 | args = parser.parse_args()
40 |
41 |
42 | def rescale_2_zero_one(imgs):
43 | return imgs / 2.0 + 0.5
44 |
45 |
46 | def srgb_to_rgb(srgb):
47 | """Taken from bell2014: sRGB -> RGB."""
48 | ret = np.zeros_like(srgb)
49 | idx0 = srgb <= 0.04045
50 | idx1 = srgb > 0.04045
51 | ret[idx0] = srgb[idx0] / 12.92
52 | ret[idx1] = np.power((srgb[idx1] + 0.055) / 1.055, 2.4)
53 | return ret
54 |
55 |
56 | def irn_func(input_height, input_width):
57 | # define inputs
58 | inputs_var = tf.placeholder(tf.float32, (None, input_height, input_width, 3))
59 | masks_var = tf.placeholder(tf.float32, (None, input_height, input_width, 1))
60 | train_flag = tf.placeholder(tf.bool, ())
61 |
62 | albedos, shadow, nm_pred = SfMNet.SfMNet(
63 | inputs=inputs_var,
64 | is_training=train_flag,
65 | height=input_height,
66 | width=input_width,
67 | masks=masks_var,
68 | n_layers=30,
69 | n_pools=4,
70 | depth_base=32,
71 | )
72 |
73 | gamma = tf.constant(2.2)
74 | lightings, _ = SfMNet.comp_light(
75 | inputs_var, albedos, nm_pred, shadow, gamma, masks_var
76 | )
77 |
78 | # rescale
79 | albedos = rescale_2_zero_one(albedos) * masks_var
80 | shadow = rescale_2_zero_one(shadow)
81 | inputs = rescale_2_zero_one(inputs_var) * masks_var
82 |
83 | # visualise lighting on a sphere
84 | num_rendering = tf.shape(lightings)[0]
85 | nm_sphere = tf.constant(render_sphere_nm(100, 1), dtype=tf.float32)
86 | nm_sphere = tf.tile(nm_sphere, (num_rendering, 1, 1, 1))
87 | lighting_recon, _ = lambSH_layer.lambSH_layer(
88 | tf.ones_like(nm_sphere), nm_sphere, lightings, tf.ones_like(nm_sphere), 1.0
89 | )
90 |
91 | # recon shading map
92 | shading, _ = lambSH_layer.lambSH_layer(
93 | tf.ones_like(albedos), nm_pred, lightings, tf.ones_like(albedos), 1.0
94 | )
95 |
96 | return (
97 | albedos,
98 | shadow,
99 | nm_pred,
100 | lighting_recon,
101 | shading,
102 | inputs,
103 | inputs_var,
104 | masks_var,
105 | train_flag,
106 | )
107 |
108 |
109 | def post_process(
110 | albedos_val,
111 | shading_val,
112 | shadow_val,
113 | lighting_recon_val,
114 | nm_pred_val,
115 | ori_width,
116 | ori_height,
117 | resize=True,
118 | ):
119 | # post-process results
120 | results = {}
121 |
122 | if resize:
123 | results.update(
124 | dict(albedos=cv2.resize(albedos_val[0], (ori_width, ori_height)))
125 | )
126 |
127 | results.update(
128 | dict(shading=cv2.resize(shading_val[0], (ori_width, ori_height)))
129 | )
130 |
131 | results.update(
132 | dict(shadow=cv2.resize(shadow_val[0, :, :, 0], (ori_width, ori_height)))
133 | )
134 |
135 | results.update(dict(lighting_recon=lighting_recon_val[0]))
136 |
137 | results.update(
138 | dict(nm_pred=cv2.resize(nm_pred_val[0], (ori_width, ori_height)))
139 | )
140 | else:
141 | results.update(dict(albedos=albedos_val[0]))
142 |
143 | results.update(dict(shading=shading_val[0]))
144 |
145 | results.update(dict(shadow=shadow_val[0, ..., 0]))
146 |
147 | results.update(dict(lighting_recon=lighting_recon_val[0]))
148 |
149 | results.update(dict(nm_pred=nm_pred_val[0]))
150 |
151 | return results
152 |
153 |
154 | def saving_result(results, dst_dir, prefix=""):
155 | img = np.uint8(results["img"])
156 | albedos = np.uint8(results["albedos"] * 255.0)
157 | shading = np.uint8(results["shading"] * 255.0)
158 | shadow = np.uint8(results["shadow"] * 255.0)
159 | lighting_recon = np.uint8(results["lighting_recon"] * 255.0)
160 | nm_pred = np.uint8(results["nm_pred"] * 255.0)
161 |
162 | # save images
163 | input_path = os.path.join(dst_dir, prefix + "img.png")
164 | io.imsave(input_path, img)
165 | nm_pred_path = os.path.join(dst_dir, prefix + "nm_pred.png")
166 | io.imsave(nm_pred_path, nm_pred)
167 | albedo_path = os.path.join(dst_dir, prefix + "albedo.png")
168 | io.imsave(albedo_path, albedos)
169 | shading_path = os.path.join(dst_dir, prefix + "shading.png")
170 | io.imsave(shading_path, shading)
171 | shadow_path = os.path.join(dst_dir, prefix + "shadow.png")
172 | io.imsave(shadow_path, shadow)
173 | lighting_path = os.path.join(dst_dir, prefix + "lighting.png")
174 | io.imsave(lighting_path, lighting_recon)
175 | pass
176 |
177 |
178 | def rescale_img(img):
179 | img_h, img_w = img.shape[:2]
180 | if img_h > img_w:
181 | scale = img_w / 200
182 | new_img_h = np.int32(img_h / scale)
183 | new_img_w = 200
184 |
185 | img = cv2.resize(img, (new_img_w, new_img_h))
186 | else:
187 | scale = img_h / 200
188 | new_img_w = np.int32(img_w / scale)
189 | new_img_h = 200
190 |
191 | img = cv2.resize(img, (new_img_w, new_img_h))
192 |
193 | return img, (img_h, img_w), (new_img_h, new_img_w)
194 |
195 |
196 | if args.mode == "demo_im":
197 | assert args.image is not None and args.mask is not None
198 |
199 | # read in images
200 | img_path = args.image
201 | mask_path = args.mask
202 |
203 | img = io.imread(img_path)
204 | mask = io.imread(mask_path)
205 |
206 | input_height = 200
207 | input_width = 200
208 |
209 | ori_img, (ori_height, ori_width), (input_height, input_width) = rescale_img(img)
210 |
211 | # run inverse rendering
212 | (
213 | albedos,
214 | shadow,
215 | nm_pred,
216 | lighting_recon,
217 | shading,
218 | inputs,
219 | inputs_var,
220 | masks_var,
221 | train_flag,
222 | ) = irn_func(input_height, input_width)
223 |
224 | # load model and run session
225 | model_path = tf.train.get_checkpoint_state(args.model).model_checkpoint_path
226 | sess = tf.InteractiveSession()
227 | saver = tf.train.Saver()
228 | saver.restore(sess, model_path)
229 |
230 | # evaluation
231 | dst_dir = args.output
232 | if os.path.isdir(dst_dir):
233 | shutil.rmtree(dst_dir, ignore_errors=True)
234 | os.makedirs(dst_dir)
235 |
236 | imgs = np.float32(ori_img) / 255.0
237 | imgs = srgb_to_rgb(imgs)
238 | imgs = imgs * 2.0 - 1.0
239 | imgs = imgs[None]
240 | mask = cv2.resize(mask, (input_width, input_height), cv2.INTER_NEAREST)
241 | img_masks = np.float32(mask == 255)[None, ..., None]
242 | imgs *= img_masks
243 | [
244 | albedos_val,
245 | nm_pred_val,
246 | shadow_val,
247 | lighting_recon_val,
248 | shading_val,
249 | inputs_val,
250 | ] = sess.run(
251 | [albedos, nm_pred, shadow, lighting_recon, shading, inputs],
252 | feed_dict={inputs_var: imgs, masks_var: img_masks, train_flag: False},
253 | )
254 |
255 | # post-process results
256 | results = post_process(
257 | albedos_val,
258 | shading_val,
259 | shadow_val,
260 | lighting_recon_val,
261 | nm_pred_val,
262 | ori_width,
263 | ori_height,
264 | )
265 |
266 | # rescale albedo and normal
267 | results["albedos"] = (results["albedos"] - results["albedos"].min()) / (
268 | results["albedos"].max() - results["albedos"].min()
269 | )
270 |
271 | results["nm_pred"] = (results["nm_pred"] + 1.0) / 2.0
272 | results["img"] = img
273 |
274 | saving_result(results, dst_dir)
275 |
276 |
277 | elif args.mode == "iiw":
278 | assert args.iiw is not None
279 |
280 | input_height = 200
281 | input_width = 200
282 |
283 | (
284 | albedos,
285 | shadow,
286 | nm_pred,
287 | lighting_recon,
288 | shading,
289 | inputs,
290 | inputs_var,
291 | masks_var,
292 | train_flag,
293 | ) = irn_func(input_height, input_width)
294 |
295 | # load model and run session
296 | model_path = tf.train.get_checkpoint_state(args.model).model_checkpoint_path
297 | sess = tf.InteractiveSession()
298 | saver = tf.train.Saver()
299 | saver.restore(sess, model_path)
300 |
301 | # evaluation
302 | dst_dir = args.output
303 | if os.path.isdir(dst_dir):
304 | shutil.rmtree(dst_dir, ignore_errors=True)
305 | os.makedirs(dst_dir)
306 |
307 | iiw = args.iiw
308 |
309 | test_ids = np.load("utils/iiw_test_ids.npy")
310 |
311 | total_loss = 0
312 | for counter, test_id in enumerate(tqdm(test_ids)):
313 | img_file = str(test_id) + ".png"
314 | judgement_file = str(test_id) + ".json"
315 |
316 | img_path = os.path.join(iiw, "imgs", img_file)
317 | judgement_path = os.path.join(iiw, "jsons", judgement_file)
318 |
319 | img = io.imread(img_path)
320 | judgement = json.load(open(judgement_path))
321 |
322 | ori_img = img
323 | ori_height, ori_width = ori_img.shape[:2]
324 | img = cv2.resize(img, (input_width, input_height))
325 | img = np.float32(img) / 255.0
326 | img = img * 2.0 - 1.0
327 | img = img[None, :, :, :]
328 | mask = np.ones((1, input_height, input_width, 1), np.bool)
329 |
330 | [
331 | albedos_val,
332 | nm_pred_val,
333 | shadow_val,
334 | lighting_recon_val,
335 | shading_val,
336 | inputs_val,
337 | ] = sess.run(
338 | [albedos, nm_pred, shadow, lighting_recon, shading, inputs],
339 | feed_dict={inputs_var: img, masks_var: mask, train_flag: False},
340 | )
341 |
342 |         # results folder for the current scene
343 | result_dir = os.path.join(dst_dir, img_file[:-4])
344 | os.makedirs(result_dir, exist_ok=True)
345 |
346 | # post-process results
347 | results = post_process(
348 | albedos_val,
349 | shading_val,
350 | shadow_val,
351 | lighting_recon_val,
352 | nm_pred_val,
353 | ori_width,
354 | ori_height,
355 | )
356 |
357 | results["img"] = ori_img
358 | results["shading"] *= results["shadow"][..., None]
359 | results["nm_pred"] = (results["nm_pred"] + 1.0) / 2.0
360 |
361 | results["albedos"] = results["albedos"] ** (1 / 2.2)
362 |
363 | loss = compute_whdr(results["albedos"], judgement)
364 | total_loss += loss
365 | # print(f"{result_dir:s}\t\t{loss:f} {total_loss:f}")
366 |
367 | saving_result(results, result_dir)
368 |
369 | print("IIW TEST WHDR %f" % (total_loss / len(test_ids)))
370 |
371 |
372 | elif args.mode == "diode":
373 | assert args.diode is not None
374 |
375 | last_height = None
376 | last_width = None
377 |
378 | diode = args.diode
379 | test_root_dir = os.path.join(diode, "depth", "val")
380 | test_nm_root_dir = os.path.join(diode, "normal", "val")
381 |
382 | from glob import glob
383 |
384 | test_scenes_nm_dir = sorted(
385 | glob(os.path.join(test_nm_root_dir, "outdoor", "scene*", "scan*"))
386 | + glob(os.path.join(test_nm_root_dir, "indoor", "scene*", "scan*"))
387 | )
388 |
389 | test_normals_path = np.concatenate(
390 | [
391 | sorted(glob(os.path.join(t_sc_dir, "*_normal.npy")))
392 | for t_sc_dir in test_scenes_nm_dir
393 | ],
394 | axis=0,
395 | )
396 | test_imgs_path = np.stack(
397 | [
398 | tmp.replace("/normal/", "/depth/").replace("_normal.npy", ".png")
399 | for tmp in test_normals_path
400 | ],
401 | axis=0,
402 | )
403 | test_masks_path = np.stack(
404 | [
405 | tmp.replace("/normal/", "/depth/").replace("_normal.npy", "_depth_mask.npy")
406 | for tmp in test_normals_path
407 | ],
408 | axis=0,
409 | )
410 | test_depths_path = np.stack(
411 | [
412 | tmp.replace("/normal/", "/depth/").replace("_normal.npy", "_depth.npy")
413 | for tmp in test_normals_path
414 | ],
415 | axis=0,
416 | )
417 |
418 | total_angErr_list = []
419 | for i, (img_path, mask_path, nm_gt_path) in enumerate(
420 | zip(tqdm(test_imgs_path), test_masks_path, test_normals_path)
421 | ):
422 |
423 | im_id = os.path.split(img_path)[1].split(".")[0]
424 | cur_dir = os.path.split(img_path)[0]
425 | cur_dir = cur_dir.split("/val/")[1]
426 |
427 | # results folder
428 | dst_dir = args.output
429 | if os.path.isdir(dst_dir):
430 | shutil.rmtree(dst_dir, ignore_errors=True)
431 | os.makedirs(dst_dir)
432 |
433 | # load im and gts
434 | img = io.imread(img_path)
435 | ori_img = img
436 | img = np.float32(ori_img) / 255.0
437 | img = img * 2.0 - 1.0
438 |
439 | mask = np.load(mask_path)
440 | nm_gt = np.load(nm_gt_path)
441 |
442 | img, (ori_height, ori_width), (input_height, input_width) = rescale_img(img)
443 | img_mask, (_, _), (_, _) = rescale_img(mask)
444 | nm_gt, (_, _), (_, _) = rescale_img(nm_gt)
445 |
446 | img = img[None, :, :, :]
447 | img_mask = img_mask[None, :, :, None] != 0.0
448 |
449 | if input_height != last_height or input_width != last_width:
450 | (
451 | albedos,
452 | shadow,
453 | nm_pred,
454 | lighting_recon,
455 | shading,
456 | inputs,
457 | inputs_var,
458 | masks_var,
459 | train_flag,
460 | ) = irn_func(input_height, input_width)
461 |
462 | if last_height is None and last_width is None:
463 | model_path = tf.train.get_checkpoint_state(
464 | args.model
465 | ).model_checkpoint_path
466 | sess = tf.InteractiveSession()
467 | saver = tf.train.Saver()
468 | saver.restore(sess, model_path)
469 |
470 | last_height = input_height
471 | last_width = input_width
472 |
473 | [
474 | albedos_val,
475 | nm_pred_val,
476 | shadow_val,
477 | lighting_recon_val,
478 | shading_val,
479 | inputs_val,
480 | ] = sess.run(
481 | [albedos, nm_pred, shadow, lighting_recon, shading, inputs],
482 | feed_dict={inputs_var: img, masks_var: img_mask, train_flag: False},
483 | )
484 |
485 |     # results folder for the current scene
486 | cur_dst_dir = os.path.join(dst_dir, cur_dir)
487 | os.makedirs(cur_dst_dir, exist_ok=True)
488 |
489 | # post-process results
490 | results = post_process(
491 | albedos_val,
492 | shading_val,
493 | shadow_val,
494 | lighting_recon_val,
495 | nm_pred_val,
496 | ori_width,
497 | ori_height,
498 | resize=False,
499 | )
500 |
501 | angErr_list = angular_error(nm_gt, results["nm_pred"])
502 | # print(f"{i:d} {angErr_list.mean():f}")
503 |
504 | total_angErr_list.append(angErr_list)
505 | total_angErr_list = [np.concatenate(total_angErr_list, -1)]
506 |
507 | results["img"] = cv2.resize(ori_img, (input_width, input_height))
508 | results["nm_pred"] = (results["nm_pred"] + 1.0) / 2.0
509 |
510 | saving_result(results, cur_dst_dir, prefix=im_id)
511 |
512 | print(
513 | f"DIODE TEST: mean={np.mean(total_angErr_list):f} median={np.median(total_angErr_list):.4f}"
514 | )
515 |
--------------------------------------------------------------------------------
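
In the demo mode, test.py linearises the sRGB input and maps it to [-1, 1] before masking the sky; the per-image preprocessing reduces to the following sketch (the helper name is illustrative, not part of the repository):

    import numpy as np
    from skimage import io

    def preprocess(img_path):
        srgb = np.float32(io.imread(img_path)) / 255.0    # sRGB in [0, 1]
        lin = np.where(srgb <= 0.04045,
                       srgb / 12.92,
                       ((srgb + 0.055) / 1.055) ** 2.4)   # linearise, as in srgb_to_rgb above
        return lin * 2.0 - 1.0                            # network input range [-1, 1]
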
/train.py:
--------------------------------------------------------------------------------
1 | # also predict shadow mask and error mask
2 |
3 | # no rotation
4 |
5 |
6 | #### compute the albedo reprojection loss only where reprojection is available; compute the reconstruction and its loss only on the defined area
7 |
8 |
9 | import os
10 | import shutil
11 | import time
12 |
13 | import numpy as np
14 | import tensorflow as tf
15 |
16 | from model import dataloader
17 | import argparse
18 |
19 | parser = argparse.ArgumentParser(description="InverseRenderNet++ training")
20 | parser.add_argument(
21 | "--mode",
22 | type=str,
23 | default="scratch",
24 | choices=["scratch", "trained"],
25 | help="training mode",
26 | )
27 |
28 | parser.add_argument("--root-dir", type=str, default=None, help="Path to image data")
29 | parser.add_argument("--batch-size", type=int, default=None, help="Training batchsize")
30 | parser.add_argument(
31 |     "--num-test-sc", type=int, default=1, help="Number of scenes held out for testing"
32 | )
33 | parser.add_argument("--num-gpus", type=int, default=1, help="Number of available gpus")
34 | parser.add_argument(
35 | "--use-GT-nm", action="store_true", help="Train with true normal map"
36 | )
37 | args = parser.parse_args()
38 |
39 |
40 | def main():
41 |
42 |     # training batches are a list of numpy arrays, each of which holds one set of paired data
43 | num_subbatch_input = args.batch_size
44 | dir = args.root_dir
45 | training_mode = args.mode
46 | num_test_sc = args.num_test_sc
47 | num_gpus = args.num_gpus
48 | supTrain = args.use_GT_nm
49 |
50 | inputs_shape = (5, 200, 200, 3)
51 |
52 | (
53 | md_next_element,
54 | md_trainData_init_op,
55 | md_testData_init_op,
56 | num_train_batches,
57 | num_test_batches,
58 | ) = dataloader.megaDepth_dataPipeline(
59 | num_subbatch_input, dir, training_mode, num_test_sc
60 | )
61 |
62 | # use image batch shape to create placeholder
63 | md_inputs_var = tf.reshape(
64 | md_next_element[0], (-1, inputs_shape[1], inputs_shape[2], inputs_shape[3])
65 | )
66 | md_dms_var = tf.reshape(md_next_element[1], (-1, inputs_shape[1], inputs_shape[2]))
67 | md_nms_var = tf.reshape(
68 | md_next_element[2], (-1, inputs_shape[1], inputs_shape[2], 3)
69 | )
70 | md_cams_var = tf.reshape(md_next_element[3], (-1, 16))
71 | md_scaleXs_var = tf.reshape(md_next_element[4], (-1,))
72 | md_scaleYs_var = tf.reshape(md_next_element[5], (-1,))
73 | md_masks_var = tf.reshape(
74 | md_next_element[6], (-1, inputs_shape[1], inputs_shape[2])
75 | )
76 | md_reproj_inputs_var = tf.reshape(
77 | md_next_element[7], (-1, inputs_shape[1], inputs_shape[2], inputs_shape[3])
78 | )
79 | md_reproj_mask_var = tf.reshape(
80 | md_next_element[8], (-1, inputs_shape[1], inputs_shape[2])
81 | )
82 |
83 | train_flag = tf.placeholder(tf.bool, ())
84 | supTrain_flag = tf.placeholder(tf.bool, ())
85 |
86 | pair_label_var = tf.constant(
87 | np.repeat(np.arange(num_subbatch_input), inputs_shape[0])[:, None],
88 | dtype=tf.float32,
89 | )
90 |
91 | # feed-foward neural network from input images to lighting and albedo
92 | (
93 | loss,
94 | render_err,
95 | reproj_err,
96 | cross_render_err,
97 | reg_loss,
98 | illu_prior_loss,
99 | nm_smt_loss,
100 | nm_loss,
101 | albedos,
102 | nm_pred,
103 | shadow,
104 | sdFree_inputs,
105 | sdFree_shadings,
106 | sdFree_recons,
107 | ) = make_parallel(
108 | num_gpus,
109 | md_inputs_var,
110 | md_dms_var,
111 | md_nms_var,
112 | md_cams_var,
113 | md_scaleXs_var,
114 | md_scaleYs_var,
115 | md_masks_var,
116 | md_reproj_inputs_var,
117 | md_reproj_mask_var,
118 | pair_label_var,
119 | train_flag,
120 | supTrain_flag,
121 | inputs_shape,
122 | )
123 |
124 |     ### regularisation loss
125 | reg_loss = sum(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))
126 |
127 |     # define the training loop
128 | iters = 500
129 | num_subbatch = num_subbatch_input
130 | num_iters = np.int32(np.ceil(num_train_batches / num_subbatch))
131 | num_test_iters = np.int32(np.ceil(num_test_batches / num_subbatch))
132 |
133 |     # define the list of variables to be trained
134 | g_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="inverserendernet")
135 |
136 | # training op
137 | global_step = tf.Variable(1, name="global_step", trainable=False)
138 |
139 | g_optimizer = tf.train.AdamOptimizer(0.0005)
140 |
141 | update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
142 | with tf.control_dependencies(update_ops):
143 | # Ensures that we execute the update_ops before performing the train_step
144 | g_train_step = g_optimizer.minimize(
145 | loss + reg_loss,
146 | global_step=global_step,
147 | var_list=g_vars,
148 | colocate_gradients_with_ops=True,
149 | )
150 |
151 | # define saver for saving and restoring
152 | saver = tf.train.Saver(g_vars + [global_step])
153 |
154 | config = tf.ConfigProto(allow_soft_placement=True)
155 | config.gpu_options.allow_growth = True
156 | sess = tf.InteractiveSession(config=config)
157 | tf.local_variables_initializer().run()
158 | tf.global_variables_initializer().run()
159 |
160 | if training_mode == "scratch":
161 | pass
162 |
163 | elif training_mode == "trained":
164 | saver.restore(sess, "model/model.ckpt")
165 |
166 | elif training_mode == "debug":
167 | saver.restore(sess, "model/model.ckpt")
168 |
169 |     # save summaries
170 | render_err_summary = tf.summary.scalar("self_sup/render_err", render_err)
171 | reproj_err_summary = tf.summary.scalar("self_sup/reproj_err", reproj_err)
172 | cross_render_err_summary = tf.summary.scalar(
173 | "self_sup/cross_render_err", cross_render_err
174 | )
175 | illu_prior_loss_summary = tf.summary.scalar(
176 | "self_sup/illu_prior_loss", illu_prior_loss
177 | )
178 | nm_loss_summary = tf.summary.scalar("self_sup/nm_loss", nm_loss)
179 | nm_smt_loss_summary = tf.summary.scalar("self_sup/nm_smt_loss", nm_smt_loss)
180 |
181 | ori_summary = tf.summary.image("ori_img", md_inputs_var, max_outputs=15)
182 | am_summary = tf.summary.image("am", albedos, max_outputs=15)
183 | nm_summary = tf.summary.image("nm", nm_pred, max_outputs=15)
184 | shadow_summary = tf.summary.image("shadow", shadow, max_outputs=15)
185 | sdFree_shadings_summary = tf.summary.image(
186 | "sdFree_shadings", sdFree_shadings, max_outputs=15
187 | )
188 | sdFree_inputs_summary = tf.summary.image(
189 | "sdFree_inputs", sdFree_inputs, max_outputs=15
190 | )
191 | sdFree_recons_summary = tf.summary.image(
192 | "sdFree_recons", sdFree_recons, max_outputs=15
193 | )
194 |
195 | performance_summary = tf.summary.merge(
196 | [
197 | render_err_summary,
198 | reproj_err_summary,
199 | cross_render_err_summary,
200 | illu_prior_loss_summary,
201 | nm_loss_summary,
202 | nm_smt_loss_summary,
203 | ]
204 | )
205 | imgs_summary = tf.summary.merge(
206 | [
207 | ori_summary,
208 | am_summary,
209 | nm_summary,
210 | shadow_summary,
211 | sdFree_shadings_summary,
212 | sdFree_inputs_summary,
213 | sdFree_recons_summary,
214 | ]
215 | )
216 |
217 | if not (os.path.exists("summaries")):
218 | os.mkdir("summaries")
219 | summ_first = os.path.join("summaries", "first")
220 | if not (os.path.exists(summ_first)):
221 | os.mkdir(summ_first)
222 | else:
223 | shutil.rmtree(summ_first, ignore_errors=True)
224 | summ_writer = tf.summary.FileWriter(summ_first, sess.graph)
225 |
226 | # supTrain = True -> train albedo net by given nm_gt
227 | # supTrain = False -> train albedo net using nm_preds
228 | md_trainData_init_op.run()
229 |
230 | best_score = 100
231 | best_result = 0
232 | for i in range(1, iters + 1):
233 | g_loss_avg = 0
234 | f = open("cost.txt", "a")
235 | if training_mode == "trained" or training_mode == "scratch":
236 | for j in range(1, num_iters + 1):
237 |
238 | print("iter %d/%d loop %d/%d" % (i, iters, j, num_iters))
239 | f.write("iter %d/%d loop %d/%d" % (i, iters, j, num_iters))
240 | start_time_g = time.time()
241 | if j % 50 == 1:
242 | [
243 | global_step_val,
244 | imgs_summary_val,
245 | performance_summary_val,
246 | _,
247 | loss_val,
248 | reg_loss_val,
249 | render_err_val,
250 | reproj_err_val,
251 | cross_render_err_val,
252 | illu_prior_val,
253 | nm_smt_loss_val,
254 | nm_loss_val,
255 | ] = sess.run(
256 | [
257 | global_step,
258 | imgs_summary,
259 | performance_summary,
260 | g_train_step,
261 | loss,
262 | reg_loss,
263 | render_err,
264 | reproj_err,
265 | cross_render_err,
266 | illu_prior_loss,
267 | nm_smt_loss,
268 | nm_loss,
269 | ],
270 | feed_dict={train_flag: True, supTrain_flag: supTrain},
271 | )
272 | summ_writer.add_summary(performance_summary_val, global_step_val)
273 | summ_writer.add_summary(imgs_summary_val, global_step_val)
274 |
275 | else:
276 | [
277 | _,
278 | loss_val,
279 | reg_loss_val,
280 | render_err_val,
281 | reproj_err_val,
282 | cross_render_err_val,
283 | illu_prior_val,
284 | nm_smt_loss_val,
285 | nm_loss_val,
286 | ] = sess.run(
287 | [
288 | g_train_step,
289 | loss,
290 | reg_loss,
291 | render_err,
292 | reproj_err,
293 | cross_render_err,
294 | illu_prior_loss,
295 | nm_smt_loss,
296 | nm_loss,
297 | ],
298 | feed_dict={train_flag: True, supTrain_flag: supTrain},
299 | )
300 |
301 | g_loss_avg += loss_val
302 |
303 | if j % 1 == 0:
304 | print(
305 | "\tg_loss_avg = %f, loss = %f, took %.3fs"
306 | % (g_loss_avg / j, loss_val, time.time() - start_time_g)
307 | )
308 | print(
309 | "\t\treg_loss = %f, render_err = %f, reproj_err = %f, cross_render_err = %f, illu_prior = %f, nm_smt_loss = %f, nm_loss = %f"
310 | % (
311 | reg_loss_val,
312 | render_err_val,
313 | reproj_err_val,
314 | cross_render_err_val,
315 | illu_prior_val,
316 | nm_smt_loss_val,
317 | nm_loss_val,
318 | )
319 | )
320 |
321 | f.write(
322 | "\tg_loss_avg = %f, loss = %f, took %.3fs\n\t\treg_loss = %f, render_err = %f, reproj_err = %f, cross_render_err = %f, illu_prior = %f, nm_smt_loss = %f, nm_loss = %f\n"
323 | % (
324 | g_loss_avg / j,
325 | loss_val,
326 | time.time() - start_time_g,
327 | reg_loss_val,
328 | render_err_val,
329 | reproj_err_val,
330 | cross_render_err_val,
331 | illu_prior_val,
332 | nm_smt_loss_val,
333 | nm_loss_val,
334 | )
335 | )
336 |
337 | f.close()
338 |
339 | md_testData_init_op.run()
340 | test_loss = 0
341 | test_render_err = 0
342 | test_reproj_err = 0
343 | test_cross_render_err = 0
344 | test_illu_prior = 0
345 | test_nm_loss = 0
346 | for j in range(1, num_test_iters + 1):
347 | [
348 | loss_val,
349 | reg_loss_val,
350 | render_err_val,
351 | reproj_err_val,
352 | cross_render_err_val,
353 | illu_prior_val,
354 | nm_smt_loss_val,
355 | nm_loss_val,
356 | ] = sess.run(
357 | [
358 | loss,
359 | reg_loss,
360 | render_err,
361 | reproj_err,
362 | cross_render_err,
363 | illu_prior_loss,
364 | nm_smt_loss,
365 | nm_loss,
366 | ],
367 | feed_dict={train_flag: False, supTrain_flag: supTrain},
368 | )
369 |
370 | test_loss += loss_val
371 | test_render_err += render_err_val
372 | test_reproj_err += reproj_err_val
373 | test_cross_render_err += cross_render_err_val
374 | test_illu_prior += illu_prior_val
375 | test_nm_loss += nm_loss_val
376 |
377 | test_loss /= num_test_iters
378 | test_render_err /= num_test_iters
379 | test_reproj_err /= num_test_iters
380 | test_cross_render_err /= num_test_iters
381 | test_illu_prior /= num_test_iters
382 | test_nm_loss /= num_test_iters
383 |
384 | score = test_loss
385 |
386 | if best_score > score:
387 | best_result = i
388 | best_score = score
389 | saver.save(sess, "model_best/model.ckpt")
390 |
391 | f = open("test.txt", "a")
392 | f.write(
393 | "iter {:d}, score {:f}: render_err={:f}, reproj_err={:f}, cross_render_err={:f}, illu_prior={:f}, nm_loss={:f}\n".format(
394 | i,
395 | score,
396 | test_render_err,
397 | test_reproj_err,
398 | test_cross_render_err,
399 | test_illu_prior,
400 | test_nm_loss,
401 | )
402 | )
403 | f.write(
404 | "\tbest_result {:d}, best_score {:f}\n".format(best_result, best_score)
405 | )
406 | f.close()
407 |
408 | md_trainData_init_op.run()
409 |
410 |         # save model after every iteration
411 | if i % 1 == 0:
412 | saver.save(sess, "model/model.ckpt")
413 |
414 |
415 | def make_parallel(
416 | num_gpus,
417 | inputs_var,
418 | dms_var,
419 | nms_var,
420 | cams_var,
421 | scaleXs_var,
422 | scaleYs_var,
423 | masks_var,
424 | reproj_inputs_var,
425 | reproj_mask_var,
426 | pair_label_var,
427 | train_flag,
428 | supTrain_flag,
429 | inputs_shape,
430 | ):
431 | from model import SfMNet, consistency_layer
432 |
433 | inputs_var = tf.split(inputs_var, num_gpus)
434 | dms_var = tf.split(dms_var, num_gpus)
435 | nms_var = tf.split(nms_var, num_gpus)
436 | cams_var = tf.split(cams_var, num_gpus)
437 | scaleXs_var = tf.split(scaleXs_var, num_gpus)
438 | scaleYs_var = tf.split(scaleYs_var, num_gpus)
439 | masks_var = tf.split(masks_var, num_gpus)
440 | reproj_inputs_var = tf.split(reproj_inputs_var, num_gpus)
441 | reproj_mask_var = tf.split(reproj_mask_var, num_gpus)
442 | pair_label_var = tf.split(pair_label_var, num_gpus)
443 |
444 | loss_split = []
445 | render_err_split = []
446 | reproj_err_split = []
447 | cross_render_err_split = []
448 | reg_loss_split = []
449 | illu_prior_loss_split = []
450 | nm_smt_loss_split = []
451 | nm_loss_split = []
452 | for i in range(num_gpus):
453 | with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
454 | with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
455 | # mask out sky in inputs and nms
456 | # dms_var *= masks_var
457 | masks_var_4d = tf.expand_dims(masks_var[i], axis=-1)
458 | reproj_mask_var_4d = tf.expand_dims(reproj_mask_var[i], axis=-1)
459 |
460 | inputs_var[i] *= masks_var_4d
461 | nms_var[i] *= masks_var_4d
462 |
463 | albedos, shadow, nm_pred = SfMNet.SfMNet(
464 | inputs=inputs_var[i],
465 | is_training=train_flag,
466 | height=inputs_shape[1],
467 | width=inputs_shape[2],
468 | masks=masks_var_4d,
469 | n_layers=30,
470 | n_pools=4,
471 | depth_base=32,
472 | )
473 |
474 | normals = tf.where(supTrain_flag, nms_var[i], nm_pred)
475 |
476 | # linearise srgb input to rgb
477 | rbg_inputs_var = inputs_srbg_2_rbg(inputs_var[i])
478 | rbg_reproj_inputs_var = inputs_srbg_2_rbg(reproj_inputs_var[i])
479 |
480 | # infer lighting from rgb input and compute lighting loss
481 | lightings, illu_prior_loss = SfMNet.comp_light(
482 | rbg_inputs_var, albedos, normals, shadow, 1.0, masks_var_4d
483 | )
484 |
485 | (
486 | loss,
487 | render_err,
488 | reproj_err,
489 | cross_render_err,
490 | reg_loss,
491 | illu_prior_loss,
492 | nm_smt_loss,
493 | nm_loss,
494 | sdFree_inputs,
495 | sdFree_shadings,
496 | sdFree_recons,
497 | ) = consistency_layer.loss_formulate(
498 | albedos,
499 | shadow,
500 | nm_pred,
501 | lightings,
502 | nms_var[i],
503 | rbg_inputs_var,
504 | dms_var[i],
505 | cams_var[i],
506 | scaleXs_var[i],
507 | scaleYs_var[i],
508 | masks_var_4d,
509 | rbg_reproj_inputs_var,
510 | reproj_mask_var_4d,
511 | pair_label_var[i],
512 | supTrain_flag,
513 | illu_prior_loss,
514 | reg_loss_flag=False,
515 | )
516 |
517 | loss_split += [loss]
518 | render_err_split += [render_err]
519 | reproj_err_split += [reproj_err]
520 | cross_render_err_split += [cross_render_err]
521 | reg_loss_split += [reg_loss]
522 | illu_prior_loss_split += [illu_prior_loss]
523 | nm_smt_loss_split += [nm_smt_loss]
524 | nm_loss_split += [nm_loss]
525 |
526 | loss = tf.reduce_mean(loss_split)
527 | render_err = tf.reduce_mean(render_err_split)
528 | reproj_err = tf.reduce_mean(reproj_err_split)
529 | cross_render_err = tf.reduce_mean(cross_render_err_split)
530 | reg_loss = tf.reduce_mean(reg_loss_split)
531 | illu_prior_loss = tf.reduce_mean(illu_prior_loss_split)
532 | nm_smt_loss = tf.reduce_mean(nm_smt_loss_split)
533 | nm_loss = tf.reduce_mean(nm_loss_split)
534 |
535 | return (
536 | loss,
537 | render_err,
538 | reproj_err,
539 | cross_render_err,
540 | reg_loss,
541 | illu_prior_loss,
542 | nm_smt_loss,
543 | nm_loss,
544 | albedos,
545 | nm_pred,
546 | shadow,
547 | sdFree_inputs,
548 | sdFree_shadings,
549 | sdFree_recons,
550 | )
551 |
552 |
553 | def inputs_srbg_2_rbg(imgs):
554 | imgs = imgs / 2.0 + 0.5
555 |
556 | ret = tf.zeros_like(imgs)
557 | dp_mask = tf.to_float(imgs <= 0.04045)
558 | ret += dp_mask * imgs / 12.92
559 | ret += tf.pow((imgs + 0.055) / 1.055, 2.2) * (1 - dp_mask)
560 | imgs = tf.identity(ret)
561 |
562 | imgs = imgs * 2.0 - 1.0
563 |
564 | return imgs
565 |
566 |
567 | if __name__ == "__main__":
568 | main()
569 |
--------------------------------------------------------------------------------
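
make_parallel implements the usual TF1 in-graph data parallelism: every input tensor is split along the batch axis, the network and losses are built once per GPU under a reused variable scope, and the per-device losses are averaged. A stripped-down sketch of the pattern (illustrative only, not part of the repository):

    import tensorflow as tf

    def data_parallel(loss_fn, inputs, num_gpus):
        """Build loss_fn on each GPU over one batch shard and average the scalar losses."""
        shards = tf.split(inputs, num_gpus)
        losses = []
        for i in range(num_gpus):
            with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
                with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
                    losses.append(loss_fn(shards[i]))
        return tf.reduce_mean(losses)
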
/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/utils/__init__.py
--------------------------------------------------------------------------------
/utils/diode_metrics.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | def angular_error(gt, pred):
5 |
6 |     # compute the dot product between gt and predicted normals for each pixel
7 | angularDist = (gt * pred).sum(axis=-1)
8 |
9 |     # compute the angle from the dot product
10 | angularDist = np.arccos(np.clip(angularDist, -1.0, 1.0))
11 |
12 |     # convert radians to degrees
13 | angularDist = angularDist / np.pi * 180
14 |
15 | # find mask
16 | mask = np.float32(np.sum(gt ** 2, axis=-1) > 0.9)
17 | mask = mask != 0.0
18 |
19 | # only compute pixels under mask
20 | angularDist_masked = angularDist[mask]
21 |
22 | return angularDist_masked
23 |
--------------------------------------------------------------------------------
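
angular_error expects unit-length normal maps of shape (..., 3) and returns the per-pixel angle in degrees, restricted to pixels where the ground-truth normal is defined. A quick sanity check (illustrative only):

    import numpy as np
    from utils.diode_metrics import angular_error

    gt = np.zeros((2, 2, 3))
    gt[..., 2] = 1.0                    # all ground-truth normals along +z
    pred = np.zeros((2, 2, 3))
    pred[..., 0] = 1.0                  # all predictions along +x
    print(angular_error(gt, pred))      # -> [90. 90. 90. 90.]
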
/utils/iiw_test_ids.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/YeeU/InverseRenderNet_v2/69922fa494f8541dc53048c41fe9f95a5d5053d2/utils/iiw_test_ids.npy
--------------------------------------------------------------------------------
/utils/render_sphere_nm.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | def render_sphere_nm(radius, num):
4 | # nm is a batch of normal maps
5 | nm = []
6 |
7 | for i in range(num):
8 | ### hemisphere
9 | height = 2*radius
10 | width = 2*radius
11 | centre = radius
12 | x_grid, y_grid = np.meshgrid(np.arange(1.,2*radius+1), np.arange(1.,2*radius+1))
13 | # grids are (-radius, radius)
14 | x_grid -= centre
15 | # y_grid -= centre
16 | y_grid = centre - y_grid
17 | # scale range of h and w grid in (-1,1)
18 | x_grid /= radius
19 | y_grid /= radius
20 | dist = 1 - (x_grid**2+y_grid**2)
21 | mask = dist > 0
22 | z_grid = np.ones_like(mask) * np.nan
23 | z_grid[mask] = np.sqrt(dist[mask])
24 |
25 | # remove xs and ys by masking out nans in zs
26 | x_grid[~(mask)] = np.nan
27 | y_grid[~(mask)] = np.nan
28 |
29 | # concatenate normal map
30 | nm.append(np.stack([x_grid,y_grid,z_grid],axis=2))
31 |
32 |
33 |
34 | ### sphere
35 | # span the regular grid for computing azimuth and zenith angular map
36 | # height = 2*radius
37 | # width = 2*radius
38 | # centre = radius
39 | # h_grid, v_grid = np.meshgrid(np.arange(1.,2*radius+1), np.arange(1.,2*radius+1))
40 | # # grids are (-radius, radius)
41 | # h_grid -= centre
42 | # # v_grid -= centre
43 | # v_grid = centre - v_grid
44 | # # scale range of h and v grid in (-1,1)
45 | # h_grid /= radius
46 | # v_grid /= radius
47 |
48 | # # z_grid is linearly spread along theta/zenith in range (0,pi)
49 | # dist_grid = np.sqrt(h_grid**2+v_grid**2)
50 | # dist_grid[dist_grid>1] = np.nan
51 | # theta_grid = dist_grid * np.pi
52 | # z_grid = np.cos(theta_grid)
53 |
54 | # rho_grid = np.arctan2(v_grid,h_grid)
55 | # x_grid = np.sin(theta_grid)*np.cos(rho_grid)
56 | # y_grid = np.sin(theta_grid)*np.sin(rho_grid)
57 |
58 | # # concatenate normal map
59 | # nm.append(np.stack([x_grid,y_grid,z_grid],axis=2))
60 |
61 |
62 | # construct batch
63 | nm = np.stack(nm,axis=0)
64 |
65 |
66 |
67 | return nm
68 |
--------------------------------------------------------------------------------
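
render_sphere_nm builds a batch of hemisphere normal maps, which test.py uses to visualise the recovered lighting on a sphere; pixels outside the disc are NaN. A quick usage check (illustrative only):

    import numpy as np
    from utils.render_sphere_nm import render_sphere_nm

    nm = render_sphere_nm(radius=100, num=1)
    print(nm.shape)                       # (1, 200, 200, 3)
    print(np.isnan(nm[0, 0, 0]).all())    # True: the corner pixel lies outside the hemisphere
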
/utils/whdr.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python2.7
2 | #
3 | # This is an implementation of the WHDR metric proposed in this paper:
4 | #
5 | # Sean Bell, Kavita Bala, Noah Snavely. "Intrinsic Images in the Wild". ACM
6 | # Transactions on Graphics (SIGGRAPH 2014). http://intrinsic.cs.cornell.edu.
7 | #
8 | # Please cite the above paper if you find this code useful. This code is
9 | # released under the MIT license (http://opensource.org/licenses/MIT).
10 | #
11 |
12 |
13 | import sys
14 | import json
15 | import argparse
16 | import numpy as np
17 | from PIL import Image
18 |
19 |
20 | def compute_whdr(reflectance, judgements, delta=0.10):
21 | """ Return the WHDR score for a reflectance image, evaluated against human
22 | judgements. The return value is in the range 0.0 to 1.0, or None if there
23 | are no judgements for the image. See section 3.5 of our paper for more
24 | details.
25 |
26 | :param reflectance: a numpy array containing the linear RGB
27 | reflectance image.
28 |
29 | :param judgements: a JSON object loaded from the Intrinsic Images in
30 | the Wild dataset.
31 |
32 | :param delta: the threshold where humans switch from saying "about the
33 | same" to "one point is darker."
34 | """
35 |
36 | points = judgements['intrinsic_points']
37 | comparisons = judgements['intrinsic_comparisons']
38 | id_to_points = {p['id']: p for p in points}
39 | rows, cols = reflectance.shape[0:2]
40 |
41 | error_sum = 0.0
42 | weight_sum = 0.0
43 |
44 | for c in comparisons:
45 | # "darker" is "J_i" in our paper
46 | darker = c['darker']
47 | if darker not in ('1', '2', 'E'):
48 | continue
49 |
50 | # "darker_score" is "w_i" in our paper
51 | weight = c['darker_score']
52 | if weight <= 0 or weight is None:
53 | continue
54 |
55 | point1 = id_to_points[c['point1']]
56 | point2 = id_to_points[c['point2']]
57 | if not point1['opaque'] or not point2['opaque']:
58 | continue
59 |
60 | # convert to grayscale and threshold
61 | l1 = max(1e-10, np.mean(reflectance[
62 | int(point1['y'] * rows), int(point1['x'] * cols), ...]))
63 | l2 = max(1e-10, np.mean(reflectance[
64 | int(point2['y'] * rows), int(point2['x'] * cols), ...]))
65 |
66 | # convert algorithm value to the same units as human judgements
67 | if l2 / l1 > 1.0 + delta:
68 | alg_darker = '1'
69 | elif l1 / l2 > 1.0 + delta:
70 | alg_darker = '2'
71 | else:
72 | alg_darker = 'E'
73 |
74 | if darker != alg_darker:
75 | error_sum += weight
76 | weight_sum += weight
77 |
78 | if weight_sum:
79 | return error_sum / weight_sum
80 | else:
81 | return None
82 |
83 |
84 | def load_image(filename, is_srgb=True):
85 | """ Load an image that is either linear or sRGB-encoded. """
86 |
87 | if not filename:
88 | raise ValueError("Empty filename")
89 | image = np.asarray(Image.open(filename)).astype(np.float) / 255.0
90 | if is_srgb:
91 | return srgb_to_rgb(image)
92 | else:
93 | return image
94 |
95 |
96 | def srgb_to_rgb(srgb):
97 | """ Convert an sRGB image to a linear RGB image """
98 |
99 | ret = np.zeros_like(srgb)
100 | idx0 = srgb <= 0.04045
101 | idx1 = srgb > 0.04045
102 | ret[idx0] = srgb[idx0] / 12.92
103 | ret[idx1] = np.power((srgb[idx1] + 0.055) / 1.055, 2.4)
104 | return ret
105 |
106 |
107 | if __name__ == "__main__":
108 | parser = argparse.ArgumentParser(
109 | description=(
110 | 'Evaluate an intrinsic image decomposition using the WHDR metric presented in:\n'
111 | ' Sean Bell, Kavita Bala, Noah Snavely. "Intrinsic Images in the Wild".\n'
112 | ' ACM Transactions on Graphics (SIGGRAPH 2014).\n'
113 | ' http://intrinsic.cs.cornell.edu.\n'
114 | '\n'
115 | 'The output is in the range 0.0 to 1.0.'
116 | )
117 | )
118 |
119 | parser.add_argument(
120 | 'reflectance', metavar='',
121 | help='reflectance image to be evaluated')
122 |
123 | parser.add_argument(
124 | 'judgements', metavar='',
125 | help='human judgements JSON file')
126 |
127 | parser.add_argument(
128 | '-l', '--linear', action='store_true', required=False,
129 | help='assume the reflectance image is linear, otherwise assume sRGB')
130 |
131 | parser.add_argument(
132 | '-d', '--delta', metavar='', type=float, required=False, default=0.10,
133 | help='delta threshold (default 0.10)')
134 |
135 | if len(sys.argv) < 2:
136 | parser.print_help()
137 | sys.exit(1)
138 |
139 | args = parser.parse_args()
140 | reflectance = load_image(filename=args.reflectance, is_srgb=(not args.linear))
141 | judgements = json.load(open(args.judgements))
142 |
143 | whdr = compute_whdr(reflectance, judgements, args.delta)
144 | print(whdr)
145 |
--------------------------------------------------------------------------------