├── .gitignore
├── README.md
├── detect_landmarks_in_image.py
├── examples
│   ├── cropped.jpg
│   ├── origin.jpg
│   └── rendered.png
├── load_data.py
├── models
│   ├── __pycache__
│   │   └── resnet_50.cpython-37.pyc
│   └── resnet_50.py
├── preprocess
│   ├── __pycache__
│   │   └── mtcnn.cpython-37.pyc
│   ├── mtcnn.py
│   └── utils
│       ├── __pycache__
│       │   └── detect_face.cpython-37.pyc
│       └── detect_face.py
├── recon_demo.py
├── reconstruction_mesh.py
├── requirements.txt
└── utils.py
/.gitignore:
--------------------------------------------------------------------------------
1 | BFM/*
2 | dataset/*
3 | facebank/*
4 | *.pt
5 | output/*
6 | __pycache__/*
7 | !examples/*
8 | params/*
9 | .vscode/*
10 | *.pyc
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set
2 | 
3 | PyTorch version of the repo [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction).
4 | 
5 | This repo contains only the **reconstruction** part, so you can use the [Deep3DFaceReconstruction-pytorch](https://github.com/changhongjian/Deep3DFaceReconstruction-pytorch) repo to train the network. The pretrained model also comes from that [repo](https://github.com/changhongjian/Deep3DFaceReconstruction-pytorch/tree/master/network).
6 | 
7 | ## Features
8 | 
9 | ### MTCNN
10 | 
11 | I use MTCNN to crop the raw images and detect 5 landmarks. Most of the MTCNN code comes from [facenet-pytorch](https://github.com/timesler/facenet-pytorch).
12 | 
13 | ### PyTorch3D
14 | 
15 | In this repo, I use [PyTorch3D 0.3.0](https://github.com/facebookresearch/pytorch3d) to render the reconstructed images.
16 | 
17 | ### Estimating Intrinsic Parameters
18 | 
19 | In the original repo ([Deep3DFaceReconstruction-pytorch](https://github.com/changhongjian/Deep3DFaceReconstruction-pytorch)), the rendered image does not line up with the input image because of `preprocess`. So, I add `estimate_intrinsic` to recover the intrinsic camera parameters.
20 | 
21 | ## Examples
22 | 
23 | Here are some examples:
24 | 
25 | |Origin Images|Cropped Images|Rendered Images|
26 | |-------------|--------------|---------------|
27 | |![Putin](examples/origin.jpg)|![Putin](examples/cropped.jpg)|![Putin](examples/rendered.png)|
28 | 
29 | 
30 | ## File Architecture
31 | 
32 | ```
33 | ├─BFM          same as Deep3DFaceReconstruction
34 | ├─dataset      stores the cropped images
35 | │ └─Vladimir_Putin
36 | ├─examples     shows examples
37 | ├─facebank     stores the raw/original images
38 | │ └─Vladimir_Putin
39 | ├─models       stores the pretrained models
40 | ├─output       stores the output files (.mat, .png)
41 | │ └─Vladimir_Putin
42 | └─preprocess   crops images and detects landmarks
43 |   ├─data       stores the MTCNN model weights
44 |   └─utils
45 | ```
46 | 
47 | This repo can also generate the UV map. To do so, you need to download the UV coordinates first:
48 |   Download the UV coordinates from the 3DMMasSTN repo: https://github.com/anilbas/3DMMasSTN/blob/master/util/BFM_UV.mat
49 |   Copy BFM_UV.mat into the BFM folder.
50 | 
51 | The pretrained models can be downloaded from [Google Drive](https://drive.google.com/file/d/1JjLl8-7Qurwlq5q61hSJEbCKFrhPh0t2/view?usp=sharing).
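
For reference, here is a minimal sketch of the intended workflow: run `detect_landmarks_in_image.py` to crop the images in `facebank/` into `dataset/` and write the 5-point landmark `.txt` files, then reconstruct a cropped image roughly as in `recon_demo.py`. The snippet below is condensed from `recon_demo.py`; the example image path (`dataset/Vladimir_Putin/0.jpg`) and the forward-slash weight path are illustrative assumptions, not files shipped with the repo.

```python
# Minimal reconstruction sketch (condensed from recon_demo.py); paths are assumptions.
import torch
import numpy as np
from models.resnet_50 import resnet50_use
from load_data import BFM, load_img, Preprocess
from reconstruction_mesh import reconstruction, transform_face_shape, estimate_intrinsic, render_img

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
bfm = BFM('BFM/BFM_model_front.mat', device)   # Basel Face Model (front-cropped)
lm3D = bfm.load_lm3d()                         # standard 5 landmarks used for alignment

model = resnet50_use().to(device)
model.load_state_dict(torch.load('models/params.pt', map_location=device))
model.eval()

# cropped image plus its 5-landmark .txt produced by detect_landmarks_in_image.py
img, lm = load_img('dataset/Vladimir_Putin/0.jpg', 'dataset/Vladimir_Putin/0.txt')
input_img, _, trans_params = Preprocess(img, lm, lm3D)       # align/crop to 224x224 (BGR)
x = torch.from_numpy(input_img.astype(np.float32)).permute(0, 3, 1, 2).to(device)

with torch.no_grad():
    coef = torch.cat(model(x), dim=1)                        # 257-dim coefficient vector

face_shape, face_texture, face_color, lm2d, z_buf, angles, trans, gamma = reconstruction(coef, bfm)
fx, px, fy, py = estimate_intrinsic(lm2d, trans_params, z_buf, face_shape, bfm, angles, trans)

shape_t = transform_face_shape(face_shape, angles, trans)
shape_t[:, :, 2] = 10.0 - shape_t[:, :, 2]                   # flip depth for the renderer
rendered = render_img(shape_t, face_color / 255.0, bfm, 300, fx, fy, px, py)
```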
52 | -------------------------------------------------------------------------------- /detect_landmarks_in_image.py: -------------------------------------------------------------------------------- 1 | from preprocess.mtcnn import MTCNN 2 | from torch.utils.data import DataLoader 3 | from torchvision import datasets, transforms 4 | import torch 5 | import os 6 | 7 | 8 | def collate_pil(x): 9 | out_x, out_y = [], [] 10 | for xx, yy in x: 11 | out_x.append(xx) 12 | out_y.append(yy) 13 | return out_x, out_y 14 | 15 | 16 | batch_size = 1 17 | workers = 0 if os.name == 'nt' else 8 18 | dataset_dir = r'facebank' 19 | cropped_dataset = r'dataset' 20 | device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') 21 | 22 | mtcnn = MTCNN( 23 | image_size=(300, 300), margin=20, min_face_size=20, 24 | thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True, 25 | device=device 26 | ) 27 | 28 | dataset = datasets.ImageFolder( 29 | dataset_dir, transform=transforms.Resize((512, 512))) 30 | dataset.samples = [ 31 | (p, p.replace(dataset_dir, cropped_dataset)) 32 | for p, _ in dataset.samples 33 | ] 34 | loader = DataLoader( 35 | dataset, 36 | num_workers=workers, 37 | batch_size=batch_size, 38 | collate_fn=collate_pil 39 | ) 40 | 41 | for i, (x, y) in enumerate(loader): 42 | x = mtcnn(x, save_path=y, save_landmarks=True) 43 | -------------------------------------------------------------------------------- /examples/cropped.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/csyuhao/Deep3DFaceReconstruction-Pytorch/7c506b55adee55bb269f73354dd16d5327a7fb04/examples/cropped.jpg -------------------------------------------------------------------------------- /examples/origin.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/csyuhao/Deep3DFaceReconstruction-Pytorch/7c506b55adee55bb269f73354dd16d5327a7fb04/examples/origin.jpg -------------------------------------------------------------------------------- /examples/rendered.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/csyuhao/Deep3DFaceReconstruction-Pytorch/7c506b55adee55bb269f73354dd16d5327a7fb04/examples/rendered.png -------------------------------------------------------------------------------- /load_data.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from scipy.io import loadmat, savemat 3 | from array import array 4 | import numpy as np 5 | from PIL import Image 6 | 7 | 8 | class BFM(object): 9 | # BFM 3D face model 10 | def __init__(self, model_path='BFM/BFM_model_front.mat', device='cpu'): 11 | model = loadmat(model_path) 12 | # mean face shape. [3*N,1] 13 | self.meanshape = torch.from_numpy(model['meanshape']) 14 | # identity basis. [3*N,80] 15 | self.idBase = torch.from_numpy(model['idBase']) 16 | self.exBase = torch.from_numpy(model['exBase'].astype( 17 | np.float32)) # expression basis. [3*N,64] 18 | # mean face texture. [3*N,1] (0-255) 19 | self.meantex = torch.from_numpy(model['meantex']) 20 | # texture basis. [3*N,80] 21 | self.texBase = torch.from_numpy(model['texBase']) 22 | # triangle indices for each vertex that lies in. starts from 1. [N,8] 23 | self.point_buf = model['point_buf'].astype(np.int32) 24 | # vertex indices in each triangle. starts from 1. [F,3] 25 | self.tri = model['tri'].astype(np.int32) 26 | # vertex indices of 68 facial landmarks. 
starts from 1. [68,1] 27 | self.keypoints = model['keypoints'].astype(np.int32)[0] 28 | self.to_device(device) 29 | 30 | def to_device(self, device): 31 | self.meanshape = self.meanshape.to(device) 32 | self.idBase = self.idBase.to(device) 33 | self.exBase = self.exBase.to(device) 34 | self.meantex = self.meantex.to(device) 35 | self.texBase = self.texBase.to(device) 36 | 37 | def load_lm3d(self, fsimilarity_Lm3D_all_mat='BFM/similarity_Lm3D_all.mat'): 38 | # load landmarks for standard face, which is used for image preprocessing 39 | Lm3D = loadmat(fsimilarity_Lm3D_all_mat) 40 | Lm3D = Lm3D['lm'] 41 | 42 | # calculate 5 facial landmarks using 68 landmarks 43 | lm_idx = np.array([31, 37, 40, 43, 46, 49, 55]) - 1 44 | Lm3D = np.stack([Lm3D[lm_idx[0], :], np.mean(Lm3D[lm_idx[[1, 2]], :], 0), np.mean( 45 | Lm3D[lm_idx[[3, 4]], :], 0), Lm3D[lm_idx[5], :], Lm3D[lm_idx[6], :]], axis=0) 46 | Lm3D = Lm3D[[1, 2, 0, 3, 4], :] 47 | self.Lm3D = Lm3D 48 | return Lm3D 49 | 50 | 51 | def load_expbasis(): 52 | # load expression basis 53 | n_vertex = 53215 54 | exp_bin = open(r'BFM\Exp_Pca.bin', 'rb') 55 | exp_dim = array('i') 56 | exp_dim.fromfile(exp_bin, 1) 57 | expMU = array('f') 58 | expPC = array('f') 59 | expMU.fromfile(exp_bin, 3*n_vertex) 60 | expPC.fromfile(exp_bin, 3*exp_dim[0]*n_vertex) 61 | 62 | expPC = np.array(expPC) 63 | expPC = np.reshape(expPC, [exp_dim[0], -1]) 64 | expPC = np.transpose(expPC) 65 | 66 | expEV = np.loadtxt(r'BFM\std_exp.txt') 67 | 68 | return expPC, expEV 69 | 70 | 71 | def transfer_BFM09(): 72 | # tranfer original BFM2009 to target face model 73 | original_BFM = loadmat(r'BFM\01_MorphableModel.mat') 74 | shapePC = original_BFM['shapePC'] # shape basis 75 | shapeEV = original_BFM['shapeEV'] # corresponding eigen values 76 | shapeMU = original_BFM['shapeMU'] # mean face 77 | texPC = original_BFM['texPC'] # texture basis 78 | texEV = original_BFM['texEV'] # corresponding eigen values 79 | texMU = original_BFM['texMU'] # mean texture 80 | 81 | expPC, expEV = load_expbasis() 82 | 83 | idBase = shapePC * np.reshape(shapeEV, [-1, 199]) 84 | idBase = idBase / 1e5 # unify the scale to decimeter 85 | idBase = idBase[:, :80] # use only first 80 basis 86 | 87 | exBase = expPC * np.reshape(expEV, [-1, 79]) 88 | exBase = exBase / 1e5 # unify the scale to decimeter 89 | exBase = exBase[:, :64] # use only first 64 basis 90 | 91 | texBase = texPC*np.reshape(texEV, [-1, 199]) 92 | texBase = texBase[:, :80] # use only first 80 basis 93 | 94 | # our face model is cropped align face landmarks which contains only 35709 vertex. 95 | # original BFM09 contains 53490 vertex, and expression basis provided by JuYong contains 53215 vertex. 96 | # thus we select corresponding vertex to get our face model. 
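# Index mapping used below (inferred from the two index files):
#   - BFM_front_idx.mat ('idx') selects the front-face vertices out of the 53215-vertex expression mesh
#   - BFM_exp_idx.mat ('trimIndex') maps each of those 53215 vertices back to the full 53490-vertex BFM09 mesh
# Composing them (index_shape[index_exp]) indexes the BFM09 shape/texture bases for the cropped model,
# while index_exp alone indexes the 53215-vertex expression basis.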
97 | index_exp = loadmat('BFM/BFM_front_idx.mat') 98 | index_exp = index_exp['idx'].astype( 99 | np.int32) - 1 # starts from 0 (to 53215) 100 | 101 | index_shape = loadmat('BFM/BFM_exp_idx.mat') 102 | index_shape = index_shape['trimIndex'].astype( 103 | np.int32) - 1 # starts from 0 (to 53490) 104 | index_shape = index_shape[index_exp] 105 | 106 | idBase = np.reshape(idBase, [-1, 3, 80]) 107 | idBase = idBase[index_shape, :, :] 108 | idBase = np.reshape(idBase, [-1, 80]) 109 | 110 | texBase = np.reshape(texBase, [-1, 3, 80]) 111 | texBase = texBase[index_shape, :, :] 112 | texBase = np.reshape(texBase, [-1, 80]) 113 | 114 | exBase = np.reshape(exBase, [-1, 3, 64]) 115 | exBase = exBase[index_exp, :, :] 116 | exBase = np.reshape(exBase, [-1, 64]) 117 | 118 | meanshape = np.reshape(shapeMU, [-1, 3]) / 1e5 119 | meanshape = meanshape[index_shape, :] 120 | meanshape = np.reshape(meanshape, [1, -1]) 121 | 122 | meantex = np.reshape(texMU, [-1, 3]) 123 | meantex = meantex[index_shape, :] 124 | meantex = np.reshape(meantex, [1, -1]) 125 | 126 | # other info contains triangles, region used for computing photometric loss, 127 | # region used for skin texture regularization, and 68 landmarks index etc. 128 | other_info = loadmat('BFM/facemodel_info.mat') 129 | frontmask2_idx = other_info['frontmask2_idx'] 130 | skinmask = other_info['skinmask'] 131 | keypoints = other_info['keypoints'] 132 | point_buf = other_info['point_buf'] 133 | tri = other_info['tri'] 134 | tri_mask2 = other_info['tri_mask2'] 135 | 136 | # save our face model 137 | savemat('BFM/BFM_model_front.mat', {'meanshape': meanshape, 'meantex': meantex, 'idBase': idBase, 'exBase': exBase, 'texBase': texBase, 138 | 'tri': tri, 'point_buf': point_buf, 'tri_mask2': tri_mask2, 'keypoints': keypoints, 'frontmask2_idx': frontmask2_idx, 'skinmask': skinmask}) 139 | 140 | 141 | # calculating least sqaures problem 142 | def POS(xp, x): 143 | npts = xp.shape[1] 144 | 145 | A = np.zeros([2*npts, 8]) 146 | 147 | A[0:2*npts-1:2, 0:3] = x.transpose() 148 | A[0:2*npts-1:2, 3] = 1 149 | 150 | A[1:2*npts:2, 4:7] = x.transpose() 151 | A[1:2*npts:2, 7] = 1 152 | 153 | b = np.reshape(xp.transpose(), [2*npts, 1]) 154 | 155 | k, _, _, _ = np.linalg.lstsq(A, b, rcond=None) 156 | 157 | R1 = k[0:3] 158 | R2 = k[4:7] 159 | sTx = k[3] 160 | sTy = k[7] 161 | s = (np.linalg.norm(R1) + np.linalg.norm(R2))/2 162 | t = np.stack([sTx, sTy], axis=0) 163 | 164 | return t, s 165 | 166 | 167 | def process_img(img, lm, t, s, target_size=224.): 168 | w0, h0 = img.size 169 | w = (w0/s*102).astype(np.int32) 170 | h = (h0/s*102).astype(np.int32) 171 | img = img.resize((w, h), resample=Image.BICUBIC) 172 | 173 | left = (w/2 - target_size/2 + float((t[0] - w0/2)*102/s)).astype(np.int32) 174 | right = left + target_size 175 | up = (h/2 - target_size/2 + float((h0/2 - t[1])*102/s)).astype(np.int32) 176 | below = up + target_size 177 | 178 | img = img.crop((left, up, right, below)) 179 | img = np.array(img) 180 | img = img[:, :, ::-1] # RGBtoBGR 181 | img = np.expand_dims(img, 0) 182 | lm = np.stack([lm[:, 0] - t[0] + w0/2, lm[:, 1] - 183 | t[1] + h0/2], axis=1)/s*102 184 | lm = lm - \ 185 | np.reshape( 186 | np.array([(w/2 - target_size/2), (h/2-target_size/2)]), [1, 2]) 187 | 188 | return img, lm 189 | 190 | 191 | def Preprocess(img, lm, lm3D): 192 | # resize and crop input images before sending to the R-Net 193 | w0, h0 = img.size 194 | 195 | # change from image plane coordinates to 3D sapce coordinates(X-Y plane) 196 | lm = np.stack([lm[:, 0], h0 - 1 - lm[:, 1]], axis=1) 197 | 
198 | # calculate translation and scale factors using 5 facial landmarks and standard landmarks 199 | # lm3D -> lm 200 | t, s = POS(lm.transpose(), lm3D.transpose()) 201 | 202 | # processing the image 203 | img_new, lm_new = process_img(img, lm, t, s) 204 | 205 | lm_new = np.stack([lm_new[:, 0], 223 - lm_new[:, 1]], axis=1) 206 | trans_params = np.array([w0, h0, 102.0/s, t[0, 0], t[1, 0]]) 207 | 208 | return img_new, lm_new, trans_params 209 | 210 | 211 | def load_img(img_path, lm_path): 212 | # load input images and corresponding 5 landmarks 213 | image = Image.open(img_path) 214 | lm = np.loadtxt(lm_path) 215 | return image, lm 216 | 217 | 218 | def save_obj(path, v, f, c): 219 | # save 3D face to obj file 220 | with open(path, 'w') as file: 221 | for i in range(len(v)): 222 | file.write('v %f %f %f %f %f %f\n' % 223 | (v[i, 0], v[i, 1], v[i, 2], c[i, 0], c[i, 1], c[i, 2])) 224 | 225 | file.write('\n') 226 | 227 | for i in range(len(f)): 228 | file.write('f %d %d %d\n' % (f[i, 0], f[i, 1], f[i, 2])) 229 | 230 | file.close() 231 | 232 | 233 | def transfer_UV(): 234 | uv_model = loadmat('BFM/BFM_UV.mat') 235 | 236 | index_exp = loadmat('BFM/BFM_front_idx.mat') 237 | index_exp = index_exp['idx'].astype( 238 | np.int32) - 1 # starts from 0 (to 53215) 239 | 240 | uv_pos = uv_model['UV'] 241 | uv_pos = uv_pos[index_exp, :] 242 | uv_pos = np.reshape(uv_pos, (-1, 2)) 243 | 244 | return uv_pos 245 | -------------------------------------------------------------------------------- /models/__pycache__/resnet_50.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/csyuhao/Deep3DFaceReconstruction-Pytorch/7c506b55adee55bb269f73354dd16d5327a7fb04/models/__pycache__/resnet_50.cpython-37.pyc -------------------------------------------------------------------------------- /models/resnet_50.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import math 4 | import pickle 5 | 6 | 7 | def load_state_dict(model, fname): 8 | """ 9 | Set parameters converted from Caffe models authors of VGGFace2 provide. 10 | See https://www.robots.ox.ac.uk/~vgg/data/vgg_face2/. 11 | 12 | Arguments: 13 | model: model 14 | fname: file name of parameters converted from a Caffe model, assuming the file format is Pickle. 
15 | """ 16 | with open(fname, 'rb') as f: 17 | weights = pickle.load(f, encoding='latin1') 18 | 19 | own_state = model.state_dict() 20 | for name, param in weights.items(): 21 | if name in own_state: 22 | try: 23 | own_state[name].copy_(torch.from_numpy(param)) 24 | except Exception: 25 | raise RuntimeError('While copying the parameter named {}, whose dimensions in the model are {} and whose ' 26 | 'dimensions in the checkpoint are {}.'.format(name, own_state[name].size(), param.size())) 27 | else: 28 | # raise KeyError('unexpected key "{}" in state_dict'.format(name)) 29 | print('unexpected key "{}" in state_dict'.format(name)) 30 | 31 | 32 | def conv3x3(in_planes, out_planes, stride=1): 33 | """3x3 convolution with padding""" 34 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 35 | padding=1, bias=False) 36 | 37 | 38 | def conv1x1(in_planes, out_planes, bias=True): 39 | """3x3 convolution with padding""" 40 | return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1, bias=bias) 41 | 42 | 43 | class Bottleneck(nn.Module): 44 | expansion = 4 45 | 46 | def __init__(self, inplanes, planes, stride=1, downsample=None): 47 | super(Bottleneck, self).__init__() 48 | self.conv1 = nn.Conv2d( 49 | inplanes, planes, kernel_size=1, stride=stride, bias=False) 50 | self.bn1 = nn.BatchNorm2d(planes) 51 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, 52 | stride=1, padding=1, bias=False) 53 | self.bn2 = nn.BatchNorm2d(planes) 54 | self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False) 55 | self.bn3 = nn.BatchNorm2d(planes * 4) 56 | self.relu = nn.ReLU(inplace=True) 57 | self.downsample = downsample 58 | self.stride = stride 59 | 60 | def forward(self, x): 61 | residual = x 62 | 63 | out = self.conv1(x) 64 | out = self.bn1(out) 65 | out = self.relu(out) 66 | 67 | out = self.conv2(out) 68 | out = self.bn2(out) 69 | out = self.relu(out) 70 | 71 | out = self.conv3(out) 72 | out = self.bn3(out) 73 | 74 | if self.downsample is not None: 75 | residual = self.downsample(x) 76 | 77 | out += residual 78 | out = self.relu(out) 79 | 80 | return out 81 | 82 | 83 | class ResNet(nn.Module): 84 | 85 | def __init__(self, block, layers, num_classes=-1, include_top=True): 86 | self.inplanes = 64 87 | super(ResNet, self).__init__() 88 | self.include_top = include_top 89 | 90 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, 91 | stride=2, padding=3, bias=False) 92 | self.bn1 = nn.BatchNorm2d(64) 93 | self.relu = nn.ReLU(inplace=True) 94 | self.maxpool = nn.MaxPool2d( 95 | kernel_size=3, stride=2, padding=0, ceil_mode=True) 96 | 97 | self.layer1 = self._make_layer(block, 64, layers[0]) 98 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 99 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 100 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 101 | self.avgpool = nn.AvgPool2d(7, stride=1) 102 | 103 | # self.fc = nn.Linear(512 * block.expansion, num_classes) 104 | 105 | # CHJ_ADD task use 106 | self.fc_dims = { 107 | "id": 80, 108 | "ex": 64, 109 | "tex": 80, 110 | "angles": 3, 111 | "gamma": 27, 112 | "XY": 2, 113 | "Z": 1} 114 | 115 | # self.fc_dims_arr=[0] * (1+len(self.fc_dims)) 116 | # for i, (k, v) in enumerate(self.fc_dims.items()): 117 | # self.fc_dims_arr[i+1] = v + self.fc_dims_arr[i] 118 | 119 | _outdim = 512 * block.expansion 120 | ''' 121 | self.fcid = nn.Linear(_outdim, 80) 122 | self.fcex = nn.Linear(_outdim, 64) 123 | self.fctex = nn.Linear(_outdim, 80) 124 | self.fcangles = nn.Linear(_outdim, 3) 125 | self.fcgamma 
= nn.Linear(_outdim, 27) 126 | self.fcXY = nn.Linear(_outdim, 2) 127 | self.fcZ = nn.Linear(_outdim, 1) 128 | ''' 129 | self.fcid = conv1x1(_outdim, 80) 130 | self.fcex = conv1x1(_outdim, 64) 131 | self.fctex = conv1x1(_outdim, 80) 132 | self.fcangles = conv1x1(_outdim, 3) 133 | self.fcgamma = conv1x1(_outdim, 27) 134 | self.fcXY = conv1x1(_outdim, 2) 135 | self.fcZ = conv1x1(_outdim, 1) 136 | 137 | self.arr_fc = [self.fcid, self.fcex, self.fctex, 138 | self.fcangles, self.fcgamma, self.fcXY, self.fcZ] 139 | 140 | for m in self.modules(): 141 | if isinstance(m, nn.Conv2d): 142 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 143 | m.weight.data.normal_(0, math.sqrt(2. / n)) 144 | elif isinstance(m, nn.BatchNorm2d): 145 | m.weight.data.fill_(1) 146 | m.bias.data.zero_() 147 | 148 | def _make_layer(self, block, planes, blocks, stride=1): 149 | downsample = None 150 | if stride != 1 or self.inplanes != planes * block.expansion: 151 | downsample = nn.Sequential( 152 | nn.Conv2d(self.inplanes, planes * block.expansion, 153 | kernel_size=1, stride=stride, bias=False), 154 | nn.BatchNorm2d(planes * block.expansion), 155 | ) 156 | 157 | layers = [] 158 | layers.append(block(self.inplanes, planes, stride, downsample)) 159 | self.inplanes = planes * block.expansion 160 | for i in range(1, blocks): 161 | layers.append(block(self.inplanes, planes)) 162 | 163 | return nn.Sequential(*layers) 164 | 165 | def forward(self, x): 166 | x = self.conv1(x) 167 | x = self.bn1(x) 168 | x = self.relu(x) 169 | x = self.maxpool(x) 170 | x = self.layer1(x) 171 | x = self.layer2(x) 172 | x = self.layer3(x) 173 | x = self.layer4(x) 174 | x = self.avgpool(x) 175 | 176 | # 这里不需要view 177 | n_b = x.size(0) 178 | outs = [] 179 | for fc in self.arr_fc: 180 | outs.append(fc(x).view(n_b, -1)) 181 | 182 | return outs 183 | 184 | 185 | def resnet50_use(): 186 | """Constructs a ResNet-50 model. 187 | """ 188 | model = ResNet(Bottleneck, [3, 4, 6, 3]) 189 | return model 190 | -------------------------------------------------------------------------------- /preprocess/__pycache__/mtcnn.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/csyuhao/Deep3DFaceReconstruction-Pytorch/7c506b55adee55bb269f73354dd16d5327a7fb04/preprocess/__pycache__/mtcnn.cpython-37.pyc -------------------------------------------------------------------------------- /preprocess/mtcnn.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | import numpy as np 4 | import os 5 | 6 | from .utils.detect_face import detect_face, extract_face, save_landmark 7 | 8 | 9 | class PNet(nn.Module): 10 | """MTCNN PNet. 
11 | 12 | Keyword Arguments: 13 | pretrained {bool} -- Whether or not to load saved pretrained weights (default: {True}) 14 | """ 15 | 16 | def __init__(self, pretrained=True): 17 | super().__init__() 18 | 19 | self.conv1 = nn.Conv2d(3, 10, kernel_size=3) 20 | self.prelu1 = nn.PReLU(10) 21 | self.pool1 = nn.MaxPool2d(2, 2, ceil_mode=True) 22 | self.conv2 = nn.Conv2d(10, 16, kernel_size=3) 23 | self.prelu2 = nn.PReLU(16) 24 | self.conv3 = nn.Conv2d(16, 32, kernel_size=3) 25 | self.prelu3 = nn.PReLU(32) 26 | self.conv4_1 = nn.Conv2d(32, 2, kernel_size=1) 27 | self.softmax4_1 = nn.Softmax(dim=1) 28 | self.conv4_2 = nn.Conv2d(32, 4, kernel_size=1) 29 | 30 | self.training = False 31 | 32 | if pretrained: 33 | state_dict_path = os.path.join( 34 | os.path.dirname(__file__), 'data/pnet.pt') 35 | state_dict = torch.load(state_dict_path) 36 | self.load_state_dict(state_dict) 37 | 38 | def forward(self, x): 39 | x = self.conv1(x) 40 | x = self.prelu1(x) 41 | x = self.pool1(x) 42 | x = self.conv2(x) 43 | x = self.prelu2(x) 44 | x = self.conv3(x) 45 | x = self.prelu3(x) 46 | a = self.conv4_1(x) 47 | a = self.softmax4_1(a) 48 | b = self.conv4_2(x) 49 | return b, a 50 | 51 | 52 | class RNet(nn.Module): 53 | """MTCNN RNet. 54 | 55 | Keyword Arguments: 56 | pretrained {bool} -- Whether or not to load saved pretrained weights (default: {True}) 57 | """ 58 | 59 | def __init__(self, pretrained=True): 60 | super().__init__() 61 | 62 | self.conv1 = nn.Conv2d(3, 28, kernel_size=3) 63 | self.prelu1 = nn.PReLU(28) 64 | self.pool1 = nn.MaxPool2d(3, 2, ceil_mode=True) 65 | self.conv2 = nn.Conv2d(28, 48, kernel_size=3) 66 | self.prelu2 = nn.PReLU(48) 67 | self.pool2 = nn.MaxPool2d(3, 2, ceil_mode=True) 68 | self.conv3 = nn.Conv2d(48, 64, kernel_size=2) 69 | self.prelu3 = nn.PReLU(64) 70 | self.dense4 = nn.Linear(576, 128) 71 | self.prelu4 = nn.PReLU(128) 72 | self.dense5_1 = nn.Linear(128, 2) 73 | self.softmax5_1 = nn.Softmax(dim=1) 74 | self.dense5_2 = nn.Linear(128, 4) 75 | 76 | self.training = False 77 | 78 | if pretrained: 79 | state_dict_path = os.path.join( 80 | os.path.dirname(__file__), 'data/rnet.pt') 81 | state_dict = torch.load(state_dict_path) 82 | self.load_state_dict(state_dict) 83 | 84 | def forward(self, x): 85 | x = self.conv1(x) 86 | x = self.prelu1(x) 87 | x = self.pool1(x) 88 | x = self.conv2(x) 89 | x = self.prelu2(x) 90 | x = self.pool2(x) 91 | x = self.conv3(x) 92 | x = self.prelu3(x) 93 | x = x.permute(0, 3, 2, 1).contiguous() 94 | x = self.dense4(x.view(x.shape[0], -1)) 95 | x = self.prelu4(x) 96 | a = self.dense5_1(x) 97 | a = self.softmax5_1(a) 98 | b = self.dense5_2(x) 99 | return b, a 100 | 101 | 102 | class ONet(nn.Module): 103 | """MTCNN ONet. 
104 | 105 | Keyword Arguments: 106 | pretrained {bool} -- Whether or not to load saved pretrained weights (default: {True}) 107 | """ 108 | 109 | def __init__(self, pretrained=True): 110 | super().__init__() 111 | 112 | self.conv1 = nn.Conv2d(3, 32, kernel_size=3) 113 | self.prelu1 = nn.PReLU(32) 114 | self.pool1 = nn.MaxPool2d(3, 2, ceil_mode=True) 115 | self.conv2 = nn.Conv2d(32, 64, kernel_size=3) 116 | self.prelu2 = nn.PReLU(64) 117 | self.pool2 = nn.MaxPool2d(3, 2, ceil_mode=True) 118 | self.conv3 = nn.Conv2d(64, 64, kernel_size=3) 119 | self.prelu3 = nn.PReLU(64) 120 | self.pool3 = nn.MaxPool2d(2, 2, ceil_mode=True) 121 | self.conv4 = nn.Conv2d(64, 128, kernel_size=2) 122 | self.prelu4 = nn.PReLU(128) 123 | self.dense5 = nn.Linear(1152, 256) 124 | self.prelu5 = nn.PReLU(256) 125 | self.dense6_1 = nn.Linear(256, 2) 126 | self.softmax6_1 = nn.Softmax(dim=1) 127 | self.dense6_2 = nn.Linear(256, 4) 128 | self.dense6_3 = nn.Linear(256, 10) 129 | 130 | self.training = False 131 | 132 | if pretrained: 133 | state_dict_path = os.path.join( 134 | os.path.dirname(__file__), 'data/onet.pt') 135 | state_dict = torch.load(state_dict_path) 136 | self.load_state_dict(state_dict) 137 | 138 | def forward(self, x): 139 | x = self.conv1(x) 140 | x = self.prelu1(x) 141 | x = self.pool1(x) 142 | x = self.conv2(x) 143 | x = self.prelu2(x) 144 | x = self.pool2(x) 145 | x = self.conv3(x) 146 | x = self.prelu3(x) 147 | x = self.pool3(x) 148 | x = self.conv4(x) 149 | x = self.prelu4(x) 150 | x = x.permute(0, 3, 2, 1).contiguous() 151 | x = self.dense5(x.view(x.shape[0], -1)) 152 | x = self.prelu5(x) 153 | a = self.dense6_1(x) 154 | a = self.softmax6_1(a) 155 | b = self.dense6_2(x) 156 | c = self.dense6_3(x) 157 | return b, c, a 158 | 159 | 160 | class MTCNN(nn.Module): 161 | """MTCNN face detection module. 162 | 163 | This class loads pretrained P-, R-, and O-nets and returns images cropped to include the face 164 | only, given raw input images of one of the following types: 165 | - PIL image or list of PIL images 166 | - numpy.ndarray (uint8) representing either a single image (3D) or a batch of images (4D). 167 | Cropped faces can optionally be saved to file 168 | also. 169 | 170 | Keyword Arguments: 171 | image_size {int} -- Output image size in pixels. The image will be square. (default: {160}) 172 | margin {int} -- Margin to add to bounding box, in terms of pixels in the final image. 173 | Note that the application of the margin differs slightly from the davidsandberg/facenet 174 | repo, which applies the margin to the original image before resizing, making the margin 175 | dependent on the original image size (this is a bug in davidsandberg/facenet). 176 | (default: {0}) 177 | min_face_size {int} -- Minimum face size to search for. (default: {20}) 178 | thresholds {list} -- MTCNN face detection thresholds (default: {[0.6, 0.7, 0.7]}) 179 | factor {float} -- Factor used to create a scaling pyramid of face sizes. (default: {0.709}) 180 | post_process {bool} -- Whether or not to post process images tensors before returning. 181 | (default: {True}) 182 | select_largest {bool} -- If True, if multiple faces are detected, the largest is returned. 183 | If False, the face with the highest detection probability is returned. 184 | (default: {True}) 185 | keep_all {bool} -- If True, all detected faces are returned, in the order dictated by the 186 | select_largest parameter. If a save_path is specified, the first face is saved to that 187 | path and the remaining faces are saved to 1, 2 etc. 
188 | device {torch.device} -- The device on which to run neural net passes. Image tensors and 189 | models are copied to this device before running forward passes. (default: {None}) 190 | """ 191 | 192 | def __init__( 193 | self, image_size=160, margin=0, min_face_size=20, 194 | thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True, 195 | select_largest=True, keep_all=False, device=None 196 | ): 197 | super().__init__() 198 | 199 | self.image_size = image_size 200 | self.margin = margin 201 | self.min_face_size = min_face_size 202 | self.thresholds = thresholds 203 | self.factor = factor 204 | self.post_process = post_process 205 | self.select_largest = select_largest 206 | self.keep_all = keep_all 207 | 208 | self.pnet = PNet() 209 | self.rnet = RNet() 210 | self.onet = ONet() 211 | 212 | self.device = torch.device('cpu') 213 | if device is not None: 214 | self.device = device 215 | self.to(device) 216 | 217 | def forward(self, img, save_path=None, return_prob=False, save_landmarks=False): 218 | """Run MTCNN face detection on a PIL image or numpy array. This method performs both 219 | detection and extraction of faces, returning tensors representing detected faces rather 220 | than the bounding boxes. To access bounding boxes, see the MTCNN.detect() method below. 221 | 222 | Arguments: 223 | img {PIL.Image, np.ndarray, or list} -- A PIL image, np.ndarray, or list. 224 | 225 | Keyword Arguments: 226 | save_path {str} -- An optional save path for the cropped image. Note that when 227 | self.post_process=True, although the returned tensor is post processed, the saved 228 | face image is not, so it is a true representation of the face in the input image. 229 | If `img` is a list of images, `save_path` should be a list of equal length. 230 | (default: {None}) 231 | return_prob {bool} -- Whether or not to return the detection probability. 232 | (default: {False}) 233 | 234 | Returns: 235 | Union[torch.Tensor, tuple(torch.tensor, float)] -- If detected, cropped image of a face 236 | with dimensions 3 x image_size x image_size. Optionally, the probability that a 237 | face was detected. If self.keep_all is True, n detected faces are returned in an 238 | n x 3 x image_size x image_size tensor with an optional list of detection 239 | probabilities. If `img` is a list of images, the item(s) returned have an extra 240 | dimension (batch) as the first dimension. 
241 | 242 | Example: 243 | >>> from facenet_pytorch import MTCNN 244 | >>> mtcnn = MTCNN() 245 | >>> face_tensor, prob = mtcnn(img, save_path='face.png', return_prob=True) 246 | """ 247 | 248 | # Detect faces 249 | with torch.no_grad(): 250 | res = self.detect(img, save_landmarks) 251 | if save_landmarks: 252 | batch_boxes, batch_probs, batch_landmarks = res[0], res[1], res[2] 253 | else: 254 | batch_boxes, batch_probs = res[0], res[1] 255 | 256 | # Determine if a batch or single image was passed 257 | batch_mode = True 258 | if not isinstance(img, (list, tuple)) and not (isinstance(img, np.ndarray) and len(img.shape) == 4): 259 | img = [img] 260 | batch_boxes = [batch_boxes] 261 | batch_probs = [batch_probs] 262 | batch_mode = False 263 | 264 | # Parse save path(s) 265 | if save_path is not None: 266 | if isinstance(save_path, str): 267 | save_path = [save_path] 268 | else: 269 | save_path = [None for _ in range(len(img))] 270 | 271 | # Process all bounding boxes and probabilities 272 | faces, probs = [], [] 273 | for idx, (im, box_im, prob_im, path_im) in enumerate(zip(img, batch_boxes, batch_probs, save_path)): 274 | if box_im is None: 275 | faces.append(None) 276 | probs.append([None] if self.keep_all else None) 277 | continue 278 | 279 | if not self.keep_all: 280 | box_im = box_im[[0]] 281 | 282 | land_im = batch_landmarks[idx] 283 | 284 | faces_im = [] 285 | for i, box in enumerate(box_im): 286 | face_path = path_im 287 | save_name, ext = os.path.splitext(path_im) 288 | landmark_path = save_name + '.txt' 289 | 290 | land = land_im[i] 291 | 292 | if path_im is not None and i > 0: 293 | save_name, ext = os.path.splitext(path_im) 294 | face_path = save_name + '_' + str(i + 1) + ext 295 | landmark_path = save_name + '_' + str(i + 1) + '.txt' 296 | 297 | face = extract_face(im, box, self.image_size, 298 | self.margin, face_path) 299 | if save_landmarks: 300 | save_landmark(im, box, self.image_size, self.margin, land, landmark_path) 301 | if self.post_process: 302 | face = fixed_image_standardization(face) 303 | faces_im.append(face) 304 | 305 | if self.keep_all: 306 | faces_im = torch.stack(faces_im) 307 | else: 308 | faces_im = faces_im[0] 309 | prob_im = prob_im[0] 310 | 311 | faces.append(faces_im) 312 | probs.append(prob_im) 313 | 314 | if not batch_mode: 315 | faces = faces[0] 316 | probs = probs[0] 317 | 318 | if return_prob: 319 | return faces, probs 320 | else: 321 | return faces 322 | 323 | def detect(self, img, landmarks=False): 324 | """Detect all faces in PIL image and return bounding boxes and optional facial landmarks. 325 | 326 | This method is used by the forward method and is also useful for face detection tasks 327 | that require lower-level handling of bounding boxes and facial landmarks (e.g., face 328 | tracking). The functionality of the forward function can be emulated by using this method 329 | followed by the extract_face() function. 330 | 331 | Arguments: 332 | img {PIL.Image, np.ndarray, or list} -- A PIL image or a list of PIL images. 333 | 334 | Keyword Arguments: 335 | landmarks {bool} -- Whether to return facial landmarks in addition to bounding boxes. 336 | (default: {False}) 337 | 338 | Returns: 339 | tuple(numpy.ndarray, list) -- For N detected faces, a tuple containing an 340 | Nx4 array of bounding boxes and a length N list of detection probabilities. 341 | Returned boxes will be sorted in descending order by detection probability if 342 | self.select_largest=False, otherwise the largest face will be returned first. 
343 | If `img` is a list of images, the items returned have an extra dimension 344 | (batch) as the first dimension. Optionally, a third item, the facial landmarks, 345 | are returned if `landmarks=True`. 346 | 347 | Example: 348 | >>> from PIL import Image, ImageDraw 349 | >>> from facenet_pytorch import MTCNN, extract_face 350 | >>> mtcnn = MTCNN(keep_all=True) 351 | >>> boxes, probs, points = mtcnn.detect(img, landmarks=True) 352 | >>> # Draw boxes and save faces 353 | >>> img_draw = img.copy() 354 | >>> draw = ImageDraw.Draw(img_draw) 355 | >>> for i, (box, point) in enumerate(zip(boxes, points)): 356 | ... draw.rectangle(box.tolist(), width=5) 357 | ... for p in point: 358 | ... draw.rectangle((p - 10).tolist() + (p + 10).tolist(), width=10) 359 | ... extract_face(img, box, save_path='detected_face_{}.png'.format(i)) 360 | >>> img_draw.save('annotated_faces.png') 361 | """ 362 | 363 | with torch.no_grad(): 364 | batch_boxes, batch_points = detect_face( 365 | img, self.min_face_size, 366 | self.pnet, self.rnet, self.onet, 367 | self.thresholds, self.factor, 368 | self.device 369 | ) 370 | 371 | boxes, probs, points = [], [], [] 372 | for box, point in zip(batch_boxes, batch_points): 373 | box = np.array(box) 374 | point = np.array(point) 375 | if len(box) == 0: 376 | boxes.append(None) 377 | probs.append([None]) 378 | points.append(None) 379 | elif self.select_largest: 380 | box_order = np.argsort( 381 | (box[:, 2] - box[:, 0]) * (box[:, 3] - box[:, 1]))[::-1] 382 | box = box[box_order] 383 | point = point[box_order] 384 | boxes.append(box[:, :4]) 385 | probs.append(box[:, 4]) 386 | points.append(point) 387 | else: 388 | boxes.append(box[:, :4]) 389 | probs.append(box[:, 4]) 390 | points.append(point) 391 | boxes = np.array(boxes) 392 | probs = np.array(probs) 393 | points = np.array(points) 394 | 395 | if not isinstance(img, (list, tuple)) and not (isinstance(img, np.ndarray) and len(img.shape) == 4): 396 | boxes = boxes[0] 397 | probs = probs[0] 398 | points = points[0] 399 | 400 | if landmarks: 401 | return boxes, probs, points 402 | 403 | return boxes, probs 404 | 405 | 406 | def fixed_image_standardization(image_tensor): 407 | processed_tensor = (image_tensor - 127.5) / 128.0 408 | return processed_tensor 409 | 410 | 411 | def prewhiten(x): 412 | mean = x.mean() 413 | std = x.std() 414 | std_adj = std.clamp(min=1.0/(float(x.numel())**0.5)) 415 | y = (x - mean) / std_adj 416 | return y 417 | -------------------------------------------------------------------------------- /preprocess/utils/__pycache__/detect_face.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/csyuhao/Deep3DFaceReconstruction-Pytorch/7c506b55adee55bb269f73354dd16d5327a7fb04/preprocess/utils/__pycache__/detect_face.cpython-37.pyc -------------------------------------------------------------------------------- /preprocess/utils/detect_face.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.nn.functional import interpolate 3 | from torchvision.transforms import functional as F 4 | from torchvision.ops.boxes import batched_nms 5 | import cv2 6 | from PIL import Image, ImageDraw 7 | import numpy as np 8 | import os 9 | 10 | 11 | def detect_face(imgs, minsize, pnet, rnet, onet, threshold, factor, device): 12 | if isinstance(imgs, (np.ndarray, torch.Tensor)): 13 | imgs = torch.as_tensor(imgs, device=device) 14 | if len(imgs.shape) == 3: 15 | imgs = imgs.unsqueeze(0) 16 | else: 
17 | if not isinstance(imgs, (list, tuple)): 18 | imgs = [imgs] 19 | if any(img.size != imgs[0].size for img in imgs): 20 | raise Exception("MTCNN batch processing only compatible with equal-dimension images.") 21 | imgs = np.stack([np.uint8(img) for img in imgs]) 22 | 23 | imgs = torch.as_tensor(imgs, device=device) 24 | 25 | imgs = imgs.permute(0, 3, 1, 2).float() 26 | 27 | batch_size = len(imgs) 28 | h, w = imgs.shape[2:4] 29 | m = 12.0 / minsize 30 | minl = min(h, w) 31 | minl = minl * m 32 | 33 | # Create scale pyramid 34 | scale_i = m 35 | scales = [] 36 | while minl >= 12: 37 | scales.append(scale_i) 38 | scale_i = scale_i * factor 39 | minl = minl * factor 40 | 41 | # First stage 42 | boxes = [] 43 | image_inds = [] 44 | all_inds = [] 45 | all_i = 0 46 | for scale in scales: 47 | im_data = imresample(imgs, (int(h * scale + 1), int(w * scale + 1))) 48 | im_data = (im_data - 127.5) * 0.0078125 49 | reg, probs = pnet(im_data) 50 | 51 | boxes_scale, image_inds_scale = generateBoundingBox(reg, probs[:, 1], scale, threshold[0]) 52 | boxes.append(boxes_scale) 53 | image_inds.append(image_inds_scale) 54 | all_inds.append(all_i + image_inds_scale) 55 | all_i += batch_size 56 | 57 | boxes = torch.cat(boxes, dim=0) 58 | image_inds = torch.cat(image_inds, dim=0).cpu() 59 | all_inds = torch.cat(all_inds, dim=0) 60 | 61 | # NMS within each scale + image 62 | pick = batched_nms(boxes[:, :4], boxes[:, 4], all_inds, 0.5) 63 | boxes, image_inds = boxes[pick], image_inds[pick] 64 | 65 | # NMS within each image 66 | pick = batched_nms(boxes[:, :4], boxes[:, 4], image_inds, 0.7) 67 | boxes, image_inds = boxes[pick], image_inds[pick] 68 | 69 | regw = boxes[:, 2] - boxes[:, 0] 70 | regh = boxes[:, 3] - boxes[:, 1] 71 | qq1 = boxes[:, 0] + boxes[:, 5] * regw 72 | qq2 = boxes[:, 1] + boxes[:, 6] * regh 73 | qq3 = boxes[:, 2] + boxes[:, 7] * regw 74 | qq4 = boxes[:, 3] + boxes[:, 8] * regh 75 | boxes = torch.stack([qq1, qq2, qq3, qq4, boxes[:, 4]]).permute(1, 0) 76 | boxes = rerec(boxes) 77 | y, ey, x, ex = pad(boxes, w, h) 78 | 79 | # Second stage 80 | if len(boxes) > 0: 81 | im_data = [] 82 | for k in range(len(y)): 83 | if ey[k] > (y[k] - 1) and ex[k] > (x[k] - 1): 84 | img_k = imgs[image_inds[k], :, (y[k] - 1):ey[k], (x[k] - 1):ex[k]].unsqueeze(0) 85 | im_data.append(imresample(img_k, (24, 24))) 86 | im_data = torch.cat(im_data, dim=0) 87 | im_data = (im_data - 127.5) * 0.0078125 88 | out = rnet(im_data) 89 | 90 | out0 = out[0].permute(1, 0) 91 | out1 = out[1].permute(1, 0) 92 | score = out1[1, :] 93 | ipass = score > threshold[1] 94 | boxes = torch.cat((boxes[ipass, :4], score[ipass].unsqueeze(1)), dim=1) 95 | image_inds = image_inds[ipass] 96 | mv = out0[:, ipass].permute(1, 0) 97 | 98 | # NMS within each image 99 | pick = batched_nms(boxes[:, :4], boxes[:, 4], image_inds, 0.7) 100 | boxes, image_inds, mv = boxes[pick], image_inds[pick], mv[pick] 101 | boxes = bbreg(boxes, mv) 102 | boxes = rerec(boxes) 103 | 104 | # Third stage 105 | points = torch.zeros(0, 5, 2, device=device) 106 | if len(boxes) > 0: 107 | y, ey, x, ex = pad(boxes, w, h) 108 | im_data = [] 109 | for k in range(len(y)): 110 | if ey[k] > (y[k] - 1) and ex[k] > (x[k] - 1): 111 | img_k = imgs[image_inds[k], :, (y[k] - 1):ey[k], (x[k] - 1):ex[k]].unsqueeze(0) 112 | im_data.append(imresample(img_k, (48, 48))) 113 | im_data = torch.cat(im_data, dim=0) 114 | im_data = (im_data - 127.5) * 0.0078125 115 | out = onet(im_data) 116 | 117 | out0 = out[0].permute(1, 0) 118 | out1 = out[1].permute(1, 0) 119 | out2 = out[2].permute(1, 0) 120 | 
score = out2[1, :] 121 | points = out1 122 | ipass = score > threshold[2] 123 | points = points[:, ipass] 124 | boxes = torch.cat((boxes[ipass, :4], score[ipass].unsqueeze(1)), dim=1) 125 | image_inds = image_inds[ipass] 126 | mv = out0[:, ipass].permute(1, 0) 127 | 128 | w_i = boxes[:, 2] - boxes[:, 0] + 1 129 | h_i = boxes[:, 3] - boxes[:, 1] + 1 130 | points_x = w_i.repeat(5, 1) * points[:5, :] + boxes[:, 0].repeat(5, 1) - 1 131 | points_y = h_i.repeat(5, 1) * points[5:10, :] + boxes[:, 1].repeat(5, 1) - 1 132 | points = torch.stack((points_x, points_y)).permute(2, 1, 0) 133 | boxes = bbreg(boxes, mv) 134 | 135 | # NMS within each image using "Min" strategy 136 | # pick = batched_nms(boxes[:, :4], boxes[:, 4], image_inds, 0.7) 137 | pick = batched_nms_numpy(boxes[:, :4], boxes[:, 4], image_inds, 0.7, 'Min') 138 | boxes, image_inds, points = boxes[pick], image_inds[pick], points[pick] 139 | 140 | boxes = boxes.cpu().numpy() 141 | points = points.cpu().numpy() 142 | 143 | batch_boxes = [] 144 | batch_points = [] 145 | for b_i in range(batch_size): 146 | b_i_inds = np.where(image_inds == b_i) 147 | batch_boxes.append(boxes[b_i_inds].copy()) 148 | batch_points.append(points[b_i_inds].copy()) 149 | 150 | batch_boxes, batch_points = np.array(batch_boxes), np.array(batch_points) 151 | 152 | return batch_boxes, batch_points 153 | 154 | 155 | def bbreg(boundingbox, reg): 156 | if reg.shape[1] == 1: 157 | reg = torch.reshape(reg, (reg.shape[2], reg.shape[3])) 158 | 159 | w = boundingbox[:, 2] - boundingbox[:, 0] + 1 160 | h = boundingbox[:, 3] - boundingbox[:, 1] + 1 161 | b1 = boundingbox[:, 0] + reg[:, 0] * w 162 | b2 = boundingbox[:, 1] + reg[:, 1] * h 163 | b3 = boundingbox[:, 2] + reg[:, 2] * w 164 | b4 = boundingbox[:, 3] + reg[:, 3] * h 165 | boundingbox[:, :4] = torch.stack([b1, b2, b3, b4]).permute(1, 0) 166 | 167 | return boundingbox 168 | 169 | 170 | def generateBoundingBox(reg, probs, scale, thresh): 171 | stride = 2 172 | cellsize = 12 173 | 174 | reg = reg.permute(1, 0, 2, 3) 175 | 176 | mask = probs >= thresh 177 | mask_inds = mask.nonzero() 178 | image_inds = mask_inds[:, 0] 179 | score = probs[mask] 180 | reg = reg[:, mask].permute(1, 0) 181 | bb = mask_inds[:, 1:].float().flip(1) 182 | q1 = ((stride * bb + 1) / scale).floor() 183 | q2 = ((stride * bb + cellsize - 1 + 1) / scale).floor() 184 | boundingbox = torch.cat([q1, q2, score.unsqueeze(1), reg], dim=1) 185 | return boundingbox, image_inds 186 | 187 | 188 | def nms_numpy(boxes, scores, threshold, method): 189 | if boxes.size == 0: 190 | return np.empty((0, 3)) 191 | 192 | x1 = boxes[:, 0].copy() 193 | y1 = boxes[:, 1].copy() 194 | x2 = boxes[:, 2].copy() 195 | y2 = boxes[:, 3].copy() 196 | s = scores 197 | area = (x2 - x1 + 1) * (y2 - y1 + 1) 198 | 199 | I = np.argsort(s) 200 | pick = np.zeros_like(s, dtype=np.int16) 201 | counter = 0 202 | while I.size > 0: 203 | i = I[-1] 204 | pick[counter] = i 205 | counter += 1 206 | idx = I[0:-1] 207 | 208 | xx1 = np.maximum(x1[i], x1[idx]).copy() 209 | yy1 = np.maximum(y1[i], y1[idx]).copy() 210 | xx2 = np.minimum(x2[i], x2[idx]).copy() 211 | yy2 = np.minimum(y2[i], y2[idx]).copy() 212 | 213 | w = np.maximum(0.0, xx2 - xx1 + 1).copy() 214 | h = np.maximum(0.0, yy2 - yy1 + 1).copy() 215 | 216 | inter = w * h 217 | if method is "Min": 218 | o = inter / np.minimum(area[i], area[idx]) 219 | else: 220 | o = inter / (area[i] + area[idx] - inter) 221 | I = I[np.where(o <= threshold)] 222 | 223 | pick = pick[:counter].copy() 224 | return pick 225 | 226 | 227 | def batched_nms_numpy(boxes, 
scores, idxs, threshold, method): 228 | device = boxes.device 229 | if boxes.numel() == 0: 230 | return torch.empty((0,), dtype=torch.int64, device=device) 231 | # strategy: in order to perform NMS independently per class. 232 | # we add an offset to all the boxes. The offset is dependent 233 | # only on the class idx, and is large enough so that boxes 234 | # from different classes do not overlap 235 | max_coordinate = boxes.max() 236 | offsets = idxs.to(boxes) * (max_coordinate + 1) 237 | boxes_for_nms = boxes + offsets[:, None] 238 | boxes_for_nms = boxes_for_nms.cpu().numpy() 239 | scores = scores.cpu().numpy() 240 | keep = nms_numpy(boxes_for_nms, scores, threshold, method) 241 | return torch.as_tensor(keep, dtype=torch.long, device=device) 242 | 243 | 244 | def pad(boxes, w, h): 245 | boxes = boxes.trunc().int().cpu().numpy() 246 | x = boxes[:, 0] 247 | y = boxes[:, 1] 248 | ex = boxes[:, 2] 249 | ey = boxes[:, 3] 250 | 251 | x[x < 1] = 1 252 | y[y < 1] = 1 253 | ex[ex > w] = w 254 | ey[ey > h] = h 255 | 256 | return y, ey, x, ex 257 | 258 | 259 | def rerec(bboxA): 260 | h = bboxA[:, 3] - bboxA[:, 1] 261 | w = bboxA[:, 2] - bboxA[:, 0] 262 | 263 | l = torch.max(w, h) 264 | bboxA[:, 0] = bboxA[:, 0] + w * 0.5 - l * 0.5 265 | bboxA[:, 1] = bboxA[:, 1] + h * 0.5 - l * 0.5 266 | bboxA[:, 2:4] = bboxA[:, :2] + l.repeat(2, 1).permute(1, 0) 267 | 268 | return bboxA 269 | 270 | 271 | def imresample(img, sz): 272 | im_data = interpolate(img, size=sz, mode="area") 273 | return im_data 274 | 275 | 276 | def crop_resize(img, box, image_size): 277 | if isinstance(image_size, tuple): 278 | if isinstance(img, np.ndarray): 279 | out = cv2.resize( 280 | img[box[1]:box[3], box[0]:box[2]], 281 | (image_size[1], image_size[0]), 282 | interpolation=cv2.INTER_AREA 283 | ).copy() 284 | else: 285 | out = img.crop(box).copy().resize((image_size[1], image_size[0]), Image.BILINEAR) 286 | else: 287 | if isinstance(img, np.ndarray): 288 | out = cv2.resize( 289 | img[box[1]:box[3], box[0]:box[2]], 290 | (image_size, image_size), 291 | interpolation=cv2.INTER_AREA 292 | ).copy() 293 | else: 294 | out = img.crop(box).copy().resize((image_size, image_size), Image.BILINEAR) 295 | return out 296 | 297 | 298 | def save_img(img, path): 299 | if isinstance(img, np.ndarray): 300 | cv2.imwrite(path, cv2.cvtColor(img, cv2.COLOR_RGB2BGR)) 301 | else: 302 | img.save(path) 303 | 304 | 305 | def get_size(img): 306 | if isinstance(img, np.ndarray): 307 | return img.shape[1::-1] 308 | else: 309 | return img.size 310 | 311 | 312 | def extract_face(img, box, image_size=160, margin=0, save_path=None): 313 | """Extract face + margin from PIL Image given bounding box. 314 | 315 | Arguments: 316 | img {PIL.Image} -- A PIL Image. 317 | box {numpy.ndarray} -- Four-element bounding box. 318 | image_size {int} -- Output image size in pixels. The image will be square. 319 | margin {int} -- Margin to add to bounding box, in terms of pixels in the final image. 320 | Note that the application of the margin differs slightly from the davidsandberg/facenet 321 | repo, which applies the margin to the original image before resizing, making the margin 322 | dependent on the original image size. 323 | save_path {str} -- Save path for extracted face image. (default: {None}) 324 | 325 | Returns: 326 | torch.tensor -- tensor representing the extracted face. 
327 | """ 328 | if isinstance(image_size, tuple): 329 | margin = [ 330 | margin * (box[2] - box[0]) / (image_size[1] - margin), 331 | margin * (box[3] - box[1]) / (image_size[0] - margin), 332 | ] 333 | else: 334 | margin = [ 335 | margin * (box[2] - box[0]) / (image_size - margin), 336 | margin * (box[3] - box[1]) / (image_size - margin), 337 | ] 338 | raw_image_size = get_size(img) 339 | box = [ 340 | int(max(box[0] - margin[0] / 2, 0)), 341 | int(max(box[1] - margin[1] / 2, 0)), 342 | int(min(box[2] + margin[0] / 2, raw_image_size[0])), 343 | int(min(box[3] + margin[1] / 2, raw_image_size[1])), 344 | ] 345 | 346 | # img_draw = img.copy() 347 | # draw = ImageDraw.Draw(img_draw) 348 | # draw.rectangle(box, outline=(255, 0, 0), width=6) 349 | # img_draw.show() 350 | 351 | face = crop_resize(img, box, image_size) 352 | 353 | if save_path is not None: 354 | os.makedirs(os.path.dirname(save_path) + "/", exist_ok=True) 355 | save_img(face, save_path) 356 | 357 | face = F.to_tensor(np.float32(face)) 358 | 359 | return face 360 | 361 | 362 | def save_landmark(img, box, image_size, margin, landmark, save_path): 363 | """Save Landmark to path 364 | 365 | Arguments: 366 | img {PIL.Image} -- A PIL Image. 367 | box {numpy.ndarray} -- Four-element bounding box. 368 | image_size {int} -- Output image size in pixels. The image will be square. 369 | margin {int} -- Margin to add to bounding box, in terms of pixels in the final image. 370 | Note that the application of the margin differs slightly from the davidsandberg/facenet 371 | repo, which applies the margin to the original image before resizing, making the margin 372 | dependent on the original image size. 373 | save_path {str} -- Save path for extracted face image. (default: {None}) 374 | 375 | Returns: 376 | None 377 | """ 378 | if isinstance(image_size, tuple): 379 | margin = [ 380 | margin * (box[2] - box[0]) / (image_size[1] - margin), 381 | margin * (box[3] - box[1]) / (image_size[0] - margin), 382 | ] 383 | else: 384 | margin = [ 385 | margin * (box[2] - box[0]) / (image_size - margin), 386 | margin * (box[3] - box[1]) / (image_size - margin), 387 | ] 388 | raw_image_size = get_size(img) 389 | box = [ 390 | int(max(box[0] - margin[0] / 2, 0)), 391 | int(max(box[1] - margin[1] / 2, 0)), 392 | int(min(box[2] + margin[0] / 2, raw_image_size[0])), 393 | int(min(box[3] + margin[1] / 2, raw_image_size[1])), 394 | ] 395 | 396 | landmark[:, 0] = (landmark[:, 0] - box[0]) / (box[2] - box[0]) 397 | landmark[:, 1] = (landmark[:, 1] - box[1]) / (box[3] - box[1]) 398 | 399 | if isinstance(image_size, tuple): 400 | landmark[:, 0] *= image_size[1] 401 | landmark[:, 1] *= image_size[0] 402 | else: 403 | landmark *= image_size 404 | 405 | with open(save_path, 'w+') as f: 406 | for (x, y) in landmark: 407 | f.write('{}\t{}\n'.format(x, y)) 408 | -------------------------------------------------------------------------------- /recon_demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | import glob 3 | import torch 4 | import numpy as np 5 | from models.resnet_50 import resnet50_use 6 | from load_data import transfer_BFM09, BFM, load_img, Preprocess, save_obj 7 | from reconstruction_mesh import reconstruction, render_img, transform_face_shape, estimate_intrinsic 8 | 9 | 10 | def recon(): 11 | # input and output folder 12 | image_path = r'dataset' 13 | save_path = 'output' 14 | if not os.path.exists(save_path): 15 | os.makedirs(save_path) 16 | img_list = glob.glob(image_path + '/**/' + '*.png', recursive=True) 
17 | img_list += glob.glob(image_path + '/**/' + '*.jpg', recursive=True) 18 | 19 | # read BFM face model 20 | # transfer original BFM model to our model 21 | if not os.path.isfile('BFM/BFM_model_front.mat'): 22 | transfer_BFM09() 23 | 24 | device = 'cuda:0' if torch.cuda.is_available() else 'cpu:0' 25 | bfm = BFM(r'BFM/BFM_model_front.mat', device) 26 | 27 | # read standard landmarks for preprocessing images 28 | lm3D = bfm.load_lm3d() 29 | 30 | model = resnet50_use().to(device) 31 | model.load_state_dict(torch.load(r'models\params.pt')) 32 | model.eval() 33 | 34 | for param in model.parameters(): 35 | param.requires_grad = False 36 | 37 | for file in img_list: 38 | # load images and corresponding 5 facial landmarks 39 | img, lm = load_img(file, file.replace('jpg', 'txt')) 40 | 41 | # preprocess input image 42 | input_img_org, lm_new, transform_params = Preprocess(img, lm, lm3D) 43 | 44 | input_img = input_img_org.astype(np.float32) 45 | input_img = torch.from_numpy(input_img).permute(0, 3, 1, 2) 46 | # the input_img is BGR 47 | input_img = input_img.to(device) 48 | 49 | arr_coef = model(input_img) 50 | 51 | coef = torch.cat(arr_coef, 1) 52 | 53 | # reconstruct 3D face with output coefficients and face model 54 | face_shape, face_texture, face_color, landmarks_2d, z_buffer, angles, translation, gamma = reconstruction(coef, bfm) 55 | 56 | fx, px, fy, py = estimate_intrinsic(landmarks_2d, transform_params, z_buffer, face_shape, bfm, angles, translation) 57 | 58 | face_shape_t = transform_face_shape(face_shape, angles, translation) 59 | face_color = face_color / 255.0 60 | face_shape_t[:, :, 2] = 10.0 - face_shape_t[:, :, 2] 61 | 62 | images = render_img(face_shape_t, face_color, bfm, 300, fx, fy, px, py) 63 | images = images.detach().cpu().numpy() 64 | images = np.squeeze(images) 65 | 66 | path_str = file.replace(image_path, save_path) 67 | path = os.path.split(path_str)[0] 68 | if os.path.exists(path) is False: 69 | os.makedirs(path) 70 | 71 | from PIL import Image 72 | images = np.uint8(images[:, :, :3] * 255.0) 73 | # init_img = np.array(img) 74 | # init_img[images != 0] = 0 75 | # images += init_img 76 | img = Image.fromarray(images) 77 | img.save(file.replace(image_path, save_path).replace('jpg', 'png')) 78 | 79 | face_shape = face_shape.detach().cpu().numpy() 80 | face_color = face_color.detach().cpu().numpy() 81 | 82 | face_shape = np.squeeze(face_shape) 83 | face_color = np.squeeze(face_color) 84 | save_obj(file.replace(image_path, save_path).replace('.jpg', '_mesh.obj'), face_shape, bfm.tri, np.clip(face_color, 0, 1.0)) # 3D reconstruction face (in canonical view) 85 | 86 | from load_data import transfer_UV 87 | from utils import process_uv 88 | # loading UV coordinates 89 | uv_pos = transfer_UV() 90 | tex_coords = process_uv(uv_pos.copy()) 91 | tex_coords = torch.tensor(tex_coords, dtype=torch.float32).unsqueeze(0).to(device) 92 | 93 | face_texture = face_texture / 255.0 94 | images = render_img(tex_coords, face_texture, bfm, 600, 600.0 - 1.0, 600.0 - 1.0, 0.0, 0.0) 95 | images = images.detach().cpu().numpy() 96 | images = np.squeeze(images) 97 | 98 | # from PIL import Image 99 | images = np.uint8(images[:, :, :3] * 255.0) 100 | img = Image.fromarray(images) 101 | img.save(file.replace(image_path, save_path).replace('.jpg', '_texture.png')) 102 | 103 | 104 | if __name__ == '__main__': 105 | recon() 106 | -------------------------------------------------------------------------------- /reconstruction_mesh.py: 
-------------------------------------------------------------------------------- 1 | import torch 2 | import math 3 | import numpy as np 4 | from utils import LeastSquares 5 | 6 | 7 | def split_coeff(coeff): 8 | # input: coeff with shape [1,257] 9 | id_coeff = coeff[:, :80] # identity(shape) coeff of dim 80 10 | ex_coeff = coeff[:, 80:144] # expression coeff of dim 64 11 | tex_coeff = coeff[:, 144:224] # texture(albedo) coeff of dim 80 12 | angles = coeff[:, 224:227] # ruler angles(x,y,z) for rotation of dim 3 13 | # lighting coeff for 3 channel SH function of dim 27 14 | gamma = coeff[:, 227:254] 15 | translation = coeff[:, 254:] # translation coeff of dim 3 16 | 17 | return id_coeff, ex_coeff, tex_coeff, angles, gamma, translation 18 | 19 | 20 | class _need_const: 21 | a0 = np.pi 22 | a1 = 2 * np.pi / np.sqrt(3.0) 23 | a2 = 2 * np.pi / np.sqrt(8.0) 24 | c0 = 1 / np.sqrt(4 * np.pi) 25 | c1 = np.sqrt(3.0) / np.sqrt(4 * np.pi) 26 | c2 = 3 * np.sqrt(5.0) / np.sqrt(12 * np.pi) 27 | d0 = 0.5 / np.sqrt(3.0) 28 | 29 | illu_consts = [a0, a1, a2, c0, c1, c2, d0] 30 | 31 | origin_size = 300 32 | target_size = 224 33 | camera_pos = 10.0 34 | 35 | 36 | def shape_formation(id_coeff, ex_coeff, facemodel): 37 | # compute face shape with identity and expression coeff, based on BFM model 38 | # input: id_coeff with shape [1,80] 39 | # ex_coeff with shape [1,64] 40 | # output: face_shape with shape [1,N,3], N is number of vertices 41 | 42 | ''' 43 | S = mean_shape + \alpha * B_id + \beta * B_exp 44 | ''' 45 | n_b = id_coeff.size(0) 46 | face_shape = torch.einsum('ij,aj->ai', facemodel.idBase, id_coeff) + \ 47 | torch.einsum('ij,aj->ai', facemodel.exBase, ex_coeff) + \ 48 | facemodel.meanshape 49 | 50 | face_shape = face_shape.view(n_b, -1, 3) 51 | # re-center face shape 52 | face_shape = face_shape - \ 53 | facemodel.meanshape.view(1, -1, 3).mean(dim=1, keepdim=True) 54 | 55 | return face_shape 56 | 57 | 58 | def texture_formation(tex_coeff, facemodel): 59 | # compute vertex texture(albedo) with tex_coeff 60 | # input: tex_coeff with shape [1,N,3] 61 | # output: face_texture with shape [1,N,3], RGB order, range from 0-255 62 | 63 | ''' 64 | T = mean_texture + \gamma * B_texture 65 | ''' 66 | 67 | n_b = tex_coeff.size(0) 68 | face_texture = torch.einsum( 69 | 'ij,aj->ai', facemodel.texBase, tex_coeff) + facemodel.meantex 70 | 71 | face_texture = face_texture.view(n_b, -1, 3) 72 | return face_texture 73 | 74 | 75 | def compute_norm(face_shape, facemodel): 76 | # compute vertex normal using one-ring neighborhood (8 points) 77 | # input: face_shape with shape [1,N,3] 78 | # output: v_norm with shape [1,N,3] 79 | # https://fredriksalomonsson.files.wordpress.com/2010/10/mesh-data-structuresv2.pdf 80 | 81 | # vertex index for each triangle face, with shape [F,3], F is number of faces 82 | face_id = facemodel.tri - 1 83 | # adjacent face index for each vertex, with shape [N,8], N is number of vertex 84 | point_id = facemodel.point_buf - 1 85 | shape = face_shape 86 | v1 = shape[:, face_id[:, 0], :] 87 | v2 = shape[:, face_id[:, 1], :] 88 | v3 = shape[:, face_id[:, 2], :] 89 | e1 = v1 - v2 90 | e2 = v2 - v3 91 | face_norm = e1.cross(e2) # compute normal for each face 92 | 93 | # normalized face_norm first 94 | face_norm = torch.nn.functional.normalize(face_norm, p=2, dim=2) 95 | empty = torch.zeros((face_norm.size(0), 1, 3), 96 | dtype=face_norm.dtype, device=face_norm.device) 97 | 98 | # concat face_normal with a zero vector at the end 99 | face_norm = torch.cat((face_norm, empty), 1) 100 | 101 | # compute vertex 
normal using one-ring neighborhood 102 | v_norm = face_norm[:, point_id, :].sum(dim=2) 103 | v_norm = torch.nn.functional.normalize(v_norm, p=2, dim=2) # normalize normal vectors 104 | return v_norm 105 | 106 | 107 | def compute_rotation_matrix(angles): 108 | # compute rotation matrix based on 3 ruler angles 109 | # input: angles with shape [1,3] 110 | # output: rotation matrix with shape [1,3,3] 111 | n_b = angles.size(0) 112 | 113 | # https://www.cnblogs.com/larry-xia/p/11926121.html 114 | device = angles.device 115 | # compute rotation matrix for X-axis, Y-axis, Z-axis respectively 116 | rotation_X = torch.cat( 117 | [ 118 | torch.ones([n_b, 1]).to(device), 119 | torch.zeros([n_b, 3]).to(device), 120 | torch.reshape(torch.cos(angles[:, 0]), [n_b, 1]), 121 | - torch.reshape(torch.sin(angles[:, 0]), [n_b, 1]), 122 | torch.zeros([n_b, 1]).to(device), 123 | torch.reshape(torch.sin(angles[:, 0]), [n_b, 1]), 124 | torch.reshape(torch.cos(angles[:, 0]), [n_b, 1]) 125 | ], 126 | axis=1 127 | ) 128 | rotation_Y = torch.cat( 129 | [ 130 | torch.reshape(torch.cos(angles[:, 1]), [n_b, 1]), 131 | torch.zeros([n_b, 1]).to(device), 132 | torch.reshape(torch.sin(angles[:, 1]), [n_b, 1]), 133 | torch.zeros([n_b, 1]).to(device), 134 | torch.ones([n_b, 1]).to(device), 135 | torch.zeros([n_b, 1]).to(device), 136 | - torch.reshape(torch.sin(angles[:, 1]), [n_b, 1]), 137 | torch.zeros([n_b, 1]).to(device), 138 | torch.reshape(torch.cos(angles[:, 1]), [n_b, 1]), 139 | ], 140 | axis=1 141 | ) 142 | rotation_Z = torch.cat( 143 | [ 144 | torch.reshape(torch.cos(angles[:, 2]), [n_b, 1]), 145 | - torch.reshape(torch.sin(angles[:, 2]), [n_b, 1]), 146 | torch.zeros([n_b, 1]).to(device), 147 | torch.reshape(torch.sin(angles[:, 2]), [n_b, 1]), 148 | torch.reshape(torch.cos(angles[:, 2]), [n_b, 1]), 149 | torch.zeros([n_b, 3]).to(device), 150 | torch.ones([n_b, 1]).to(device), 151 | ], 152 | axis=1 153 | ) 154 | 155 | rotation_X = rotation_X.reshape([n_b, 3, 3]) 156 | rotation_Y = rotation_Y.reshape([n_b, 3, 3]) 157 | rotation_Z = rotation_Z.reshape([n_b, 3, 3]) 158 | 159 | # R = Rz*Ry*Rx 160 | rotation = rotation_Z.bmm(rotation_Y).bmm(rotation_X) 161 | 162 | # because our face shape is N*3, so compute the transpose of R, so that rotation shapes can be calculated as face_shape*R 163 | rotation = rotation.permute(0, 2, 1) 164 | 165 | return rotation 166 | 167 | 168 | def projection_layer(face_shape, fx=1015.0, fy=1015.0, px=112.0, py=112.0): 169 | # we choose the focal length and camera position empirically 170 | # project 3D face onto image plane 171 | # input: face_shape with shape [1,N,3] 172 | # rotation with shape [1,3,3] 173 | # translation with shape [1,3] 174 | # output: face_projection with shape [1,N,2] 175 | # z_buffer with shape [1,N,1] 176 | 177 | cam_pos = 10 178 | p_matrix = np.concatenate([[fx], [0.0], [px], [0.0], [fy], [py], [0.0], [0.0], [1.0]], 179 | axis=0).astype(np.float32) # projection matrix 180 | p_matrix = np.reshape(p_matrix, [1, 3, 3]) 181 | p_matrix = torch.from_numpy(p_matrix) 182 | gpu_p_matrix = None 183 | 184 | n_b, nV, _ = face_shape.size() 185 | if face_shape.is_cuda: 186 | gpu_p_matrix = p_matrix.cuda() 187 | p_matrix = gpu_p_matrix.expand(n_b, 3, 3) 188 | else: 189 | p_matrix = p_matrix.expand(n_b, 3, 3) 190 | 191 | face_shape[:, :, 2] = cam_pos - face_shape[:, :, 2] 192 | aug_projection = face_shape.bmm(p_matrix.permute(0, 2, 1)) 193 | face_projection = aug_projection[:, :, 0:2] / aug_projection[:, :, 2:] 194 | 195 | z_buffer = cam_pos - aug_projection[:, :, 2:] 196 | 197 | return 
face_projection, z_buffer 198 | 199 | 200 | def illumination_layer(face_texture, norm, gamma): 201 | # CHJ: It's different from what I knew. 202 | # compute vertex color using face_texture and SH function lighting approximation 203 | # input: face_texture with shape [1,N,3] 204 | # norm with shape [1,N,3] 205 | # gamma with shape [1,27] 206 | # output: face_color with shape [1,N,3], RGB order, range from 0-255 207 | # lighting with shape [1,N,3], color under uniform texture 208 | 209 | n_b, num_vertex, _ = face_texture.size() 210 | n_v_full = n_b * num_vertex 211 | gamma = gamma.view(-1, 3, 9).clone() 212 | gamma[:, :, 0] += 0.8 213 | 214 | gamma = gamma.permute(0, 2, 1) 215 | 216 | a0, a1, a2, c0, c1, c2, d0 = _need_const.illu_consts 217 | 218 | Y0 = torch.ones(n_v_full).float() * a0*c0 219 | if gamma.is_cuda: 220 | Y0 = Y0.cuda() 221 | norm = norm.view(-1, 3) 222 | nx, ny, nz = norm[:, 0], norm[:, 1], norm[:, 2] 223 | arrH = [] 224 | 225 | arrH.append(Y0) 226 | arrH.append(-a1*c1*ny) 227 | arrH.append(a1*c1*nz) 228 | arrH.append(-a1*c1*nx) 229 | arrH.append(a2*c2*nx*ny) 230 | arrH.append(-a2*c2*ny*nz) 231 | arrH.append(a2*c2*d0*(3*nz.pow(2)-1)) 232 | arrH.append(-a2*c2*nx*nz) 233 | arrH.append(a2*c2*0.5*(nx.pow(2)-ny.pow(2))) 234 | 235 | H = torch.stack(arrH, 1) 236 | Y = H.view(n_b, num_vertex, 9) 237 | 238 | # Y shape:[batch,N,9]. 239 | 240 | # shape:[batch,N,3] 241 | lighting = Y.bmm(gamma) 242 | 243 | face_color = face_texture * lighting 244 | 245 | return face_color, lighting 246 | 247 | 248 | def rigid_transform(face_shape, rotation, translation): 249 | n_b = face_shape.shape[0] 250 | face_shape_r = face_shape.bmm(rotation) # R has been transposed 251 | face_shape_t = face_shape_r + translation.view(n_b, 1, 3) 252 | return face_shape_t 253 | 254 | 255 | def compute_landmarks(face_shape, facemodel): 256 | # compute 3D landmark postitions with pre-computed 3D face shape 257 | keypoints_idx = facemodel.keypoints - 1 258 | face_landmarks = face_shape[:, keypoints_idx, :] 259 | return face_landmarks 260 | 261 | 262 | def compute_3d_landmarks(face_shape, facemodel, angles, translation): 263 | rotation = compute_rotation_matrix(angles) 264 | face_shape_t = rigid_transform(face_shape, rotation, translation) 265 | landmarks_3d = compute_landmarks(face_shape_t, facemodel) 266 | return landmarks_3d 267 | 268 | 269 | def transform_face_shape(face_shape, angles, translation): 270 | rotation = compute_rotation_matrix(angles) 271 | face_shape_t = rigid_transform(face_shape, rotation, translation) 272 | return face_shape_t 273 | 274 | 275 | def render_img(face_shape, face_color, facemodel, image_size=224, fx=1015.0, fy=1015.0, px=112.0, py=112.0, device='cuda:0'): 276 | ''' 277 | ref: https://github.com/facebookresearch/pytorch3d/issues/184 278 | The rendering function (just for test) 279 | Input: 280 | face_shape: Tensor[1, 35709, 3] 281 | face_color: Tensor[1, 35709, 3] in [0, 1] 282 | facemodel: contains `tri` (triangles[70789, 3], index start from 1) 283 | ''' 284 | from pytorch3d.structures import Meshes 285 | from pytorch3d.renderer.mesh.textures import TexturesVertex 286 | from pytorch3d.renderer import ( 287 | PerspectiveCameras, 288 | PointLights, 289 | RasterizationSettings, 290 | MeshRenderer, 291 | MeshRasterizer, 292 | SoftPhongShader, 293 | BlendParams 294 | ) 295 | 296 | face_color = TexturesVertex(verts_features=face_color.to(device)) 297 | face_buf = torch.from_numpy(facemodel.tri - 1) # index start from 1 298 | face_idx = face_buf.unsqueeze(0) 299 | 300 | mesh = 
Meshes(face_shape.to(device), face_idx.to(device), face_color) 301 | 302 | R = torch.eye(3).view(1, 3, 3).to(device) 303 | R[0, 0, 0] *= -1.0 304 | T = torch.zeros([1, 3]).to(device) 305 | 306 | half_size = (image_size - 1.0) / 2 307 | focal_length = torch.tensor([fx / half_size, fy / half_size], dtype=torch.float32).reshape(1, 2).to(device) 308 | principal_point = torch.tensor([(half_size - px) / half_size, (py - half_size) / half_size], dtype=torch.float32).reshape(1, 2).to(device) 309 | 310 | cameras = PerspectiveCameras( 311 | device=device, 312 | R=R, 313 | T=T, 314 | focal_length=focal_length, 315 | principal_point=principal_point 316 | ) 317 | 318 | raster_settings = RasterizationSettings( 319 | image_size=image_size, 320 | blur_radius=0.0, 321 | faces_per_pixel=1 322 | ) 323 | 324 | lights = PointLights( 325 | device=device, 326 | ambient_color=((1.0, 1.0, 1.0),), 327 | diffuse_color=((0.0, 0.0, 0.0),), 328 | specular_color=((0.0, 0.0, 0.0),), 329 | location=((0.0, 0.0, 1e5),) 330 | ) 331 | 332 | blend_params = BlendParams(background_color=(0.0, 0.0, 0.0)) 333 | 334 | renderer = MeshRenderer( 335 | rasterizer=MeshRasterizer( 336 | cameras=cameras, 337 | raster_settings=raster_settings 338 | ), 339 | shader=SoftPhongShader( 340 | device=device, 341 | cameras=cameras, 342 | lights=lights, 343 | blend_params=blend_params 344 | ) 345 | ) 346 | images = renderer(mesh) 347 | images = torch.clamp(images, 0.0, 1.0) 348 | return images 349 | 350 | 351 | def estimate_intrinsic(landmarks_2d, transform_params, z_buffer, face_shape, facemodel, angles, translation): 352 | # estimate intrinsic parameters 353 | 354 | def re_convert(landmarks_2d, trans_params, origin_size=_need_const.origin_size, target_size=_need_const.target_size): 355 | # convert landmarks to un_cropped images 356 | w = (origin_size * trans_params[2]).astype(np.int32) 357 | h = (origin_size * trans_params[2]).astype(np.int32) 358 | landmarks_2d[:, :, 1] = target_size - 1 - landmarks_2d[:, :, 1] 359 | 360 | landmarks_2d[:, :, 0] = landmarks_2d[:, :, 0] + w / 2 - target_size / 2 361 | landmarks_2d[:, :, 1] = landmarks_2d[:, :, 1] + h / 2 - target_size / 2 362 | 363 | landmarks_2d = landmarks_2d / trans_params[2] 364 | 365 | landmarks_2d[:, :, 0] = landmarks_2d[:, :, 0] + trans_params[3] - origin_size / 2 366 | landmarks_2d[:, :, 1] = landmarks_2d[:, :, 1] + trans_params[4] - origin_size / 2 367 | 368 | landmarks_2d[:, :, 1] = origin_size - 1 - landmarks_2d[:, :, 1] 369 | return landmarks_2d 370 | 371 | def POS(xp, x): 372 | # calculating least sqaures problem 373 | # ref https://github.com/pytorch/pytorch/issues/27036 374 | ls = LeastSquares() 375 | npts = xp.shape[1] 376 | 377 | A = torch.zeros([2*npts, 4]).to(x.device) 378 | A[0:2*npts-1:2, 0:2] = x[0, :, [0, 2]] 379 | A[1:2*npts:2, 2:4] = x[0, :, [1, 2]] 380 | 381 | b = torch.reshape(xp[0], [2*npts, 1]) 382 | 383 | k = ls.lstq(A, b, 0.010) 384 | 385 | fx = k[0, 0] 386 | px = k[1, 0] 387 | fy = k[2, 0] 388 | py = k[3, 0] 389 | return fx, px, fy, py 390 | 391 | # convert landmarks to un_cropped images 392 | landmarks_2d = re_convert(landmarks_2d, transform_params) 393 | landmarks_2d[:, :, 1] = _need_const.origin_size - 1.0 - landmarks_2d[:, :, 1] 394 | landmarks_2d[:, :, :2] = landmarks_2d[:, :, :2] * (_need_const.camera_pos - z_buffer[:, :, :]) 395 | 396 | # compute 3d landmarks 397 | landmarks_3d = compute_3d_landmarks(face_shape, facemodel, angles, translation) 398 | 399 | # compute fx, fy, px, py 400 | landmarks_3d_ = landmarks_3d.clone() 401 | landmarks_3d_[:, :, 2] = 
_need_const.camera_pos - landmarks_3d_[:, :, 2] 402 | fx, px, fy, py = POS(landmarks_2d, landmarks_3d_) 403 | return fx, px, fy, py 404 | 405 | 406 | def reconstruction(coeff, facemodel): 407 | # The image size is 224 * 224 408 | # face reconstruction with coeff and BFM model 409 | id_coeff, ex_coeff, tex_coeff, angles, gamma, translation = split_coeff(coeff) 410 | 411 | # compute face shape 412 | face_shape = shape_formation(id_coeff, ex_coeff, facemodel) 413 | # compute vertex texture(albedo) 414 | face_texture = texture_formation(tex_coeff, facemodel) 415 | 416 | # vertex normal 417 | face_norm = compute_norm(face_shape, facemodel) 418 | # rotation matrix 419 | rotation = compute_rotation_matrix(angles) 420 | face_norm_r = face_norm.bmm(rotation) 421 | # print(face_norm_r[:, :3, :]) 422 | 423 | # do rigid transformation for face shape using predicted rotation and translation 424 | face_shape_t = rigid_transform(face_shape, rotation, translation) 425 | 426 | # compute 2d landmark projection 427 | face_landmark_t = compute_landmarks(face_shape_t, facemodel) 428 | 429 | # compute 68 landmark on image plane (with image sized 224*224) 430 | landmarks_2d, z_buffer = projection_layer(face_landmark_t) 431 | landmarks_2d[:, :, 1] = _need_const.target_size - 1.0 - landmarks_2d[:, :, 1] 432 | 433 | # compute vertex color using SH function lighting approximation 434 | face_color, lighting = illumination_layer(face_texture, face_norm_r, gamma) 435 | 436 | return face_shape, face_texture, face_color, landmarks_2d, z_buffer, angles, translation, gamma 437 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # This file may be used to create an environment using: 2 | # $ conda create --name --file 3 | # platform: win-64 4 | blas=1.0=mkl 5 | ca-certificates=2020.11.8=h5b45459_0 6 | certifi=2020.11.8=py38haa244fe_0 7 | chardet=3.0.4=pypi_0 8 | cudatoolkit=10.1.243=h74a9793_0 9 | docopt=0.6.2=pypi_0 10 | freetype=2.10.4=hd328e21_0 11 | future=0.18.2=pypi_0 12 | fvcore=0.1.2.post20201111=pypi_0 13 | icc_rt=2020.2=intel_254 14 | idna=2.10=pypi_0 15 | intel-openmp=2020.2=254 16 | jpeg=9b=hb83a4c4_2 17 | libpng=1.6.37=h2a8f88b_0 18 | libtiff=4.1.0=h56a325e_1 19 | lz4-c=1.9.2=hf4a77e7_3 20 | mkl=2020.2=256 21 | mkl-service=2.3.0=py38hb782905_0 22 | mkl_fft=1.2.0=py38h45dec08_0 23 | mkl_random=1.1.1=py38h47e9c7a_0 24 | msys2-conda-epoch=20160418=1 25 | ninja=1.7.2=0 26 | numpy=1.19.2=py38hadc3359_0 27 | numpy-base=1.19.2=py38ha3acd2a_0 28 | olefile=0.46=py_0 29 | openssl=1.1.1h=he774522_0 30 | pillow>=8.1.1 31 | pip=20.2.4=py38haa95532_0 32 | pipreqs=0.4.10=pypi_0 33 | portalocker=2.0.0=pypi_0 34 | python=3.8.5=h5fd99cc_1 35 | python_abi=3.8=1_cp38 36 | pytorch=1.6.0=py3.8_cuda101_cudnn7_0 37 | pytorch3d=0.3.0=pypi_0 38 | pywin32=300=pypi_0 39 | PyYAML>=5.4 40 | requests=2.25.0=pypi_0 41 | scipy=1.5.2=py38h14eb087_0 42 | setuptools=50.3.1=py38haa95532_1 43 | six=1.15.0=py38haa95532_0 44 | sqlite=3.33.0=h2a8f88b_0 45 | tabulate=0.8.7=pyh9f0ad1d_0 46 | termcolor=1.1.0=pypi_0 47 | tk=8.6.10=he774522_0 48 | torchvision=0.7.0=py38_cu101 49 | tqdm=4.52.0=pyhd3deb0d_0 50 | urllib3>=1.26.4 51 | vc=14.1=h0510ff6_4 52 | vs2015_runtime=14.16.27012=hf0eaf9b_3 53 | wheel=0.35.1=pyhd3eb1b0_0 54 | wincertstore=0.2=py38_0 55 | xz=5.2.5=h62dcd97_0 56 | yacs=0.1.8=pypi_0 57 | yaml=0.2.5=he774522_0 58 | yarg=0.1.9=pypi_0 59 | zlib=1.2.11=h62dcd97_4 60 | zstd=1.4.5=h04227a9_0 61 | 
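# Example usage of this file (the environment name "deep3d" is only a placeholder;
# the pypi_0 entries above, e.g. pytorch3d and fvcore, are pip packages that a plain
# conda solve may not find, so they may need a separate pip install afterwards):
# $ conda create --name deep3d --file requirements.txt
# $ conda activate deep3d
# $ pip install "git+https://github.com/facebookresearch/pytorch3d.git@v0.3.0"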
-------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | 4 | 5 | class LeastSquares: 6 | # https://github.com/pytorch/pytorch/issues/27036 7 | def __init__(self): 8 | pass 9 | 10 | def lstq(self, A, Y, lamb=0.0): 11 | """ 12 | Differentiable least squares 13 | :param A: m x n 14 | :param Y: m x 1 15 | """ 16 | # Assuming A to be full column rank 17 | cols = A.shape[1] 18 | if cols == torch.matrix_rank(A): 19 | q, r = torch.qr(A) 20 | x = torch.inverse(r) @ q.T @ Y 21 | else: 22 | A_dash = A.permute(1, 0) @ A + lamb * torch.eye(cols, dtype=A.dtype, device=A.device) # ridge term keeps A_dash full rank (and on A's device) 23 | Y_dash = A.permute(1, 0) @ Y 24 | x = self.lstq(A_dash, Y_dash) 25 | return x 26 | 27 | 28 | def process_uv(uv_coords): 29 | uv_coords[:, 0] = uv_coords[:, 0] 30 | uv_coords[:, 1] = uv_coords[:, 1] 31 | # uv_coords[:, 1] = uv_h - uv_coords[:, 1] - 1 32 | uv_coords = np.hstack((uv_coords, np.ones((uv_coords.shape[0], 1)))) # add z 33 | return uv_coords 34 | --------------------------------------------------------------------------------
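As a sanity check for the solver above, the following standalone snippet (not part of the repo; the variable names are made up for illustration) fits a line y = 2x + 1 with `LeastSquares.lstq` and should recover coefficients close to `[2.0, 1.0]`:

```python
# Illustrative self-check for utils.LeastSquares; assumes utils.py is on the path.
import torch
from utils import LeastSquares

torch.manual_seed(0)
x = torch.linspace(0.0, 1.0, steps=50).unsqueeze(1)  # inputs, shape [50, 1]
A = torch.cat([x, torch.ones_like(x)], dim=1)        # design matrix [slope, intercept], shape [50, 2]
y = 2.0 * x + 1.0 + 0.01 * torch.randn_like(x)       # noisy samples of y = 2x + 1

k = LeastSquares().lstq(A, y, lamb=0.01)             # QR-based differentiable solve, shape [2, 1]
print(k.squeeze())                                   # approximately tensor([2.0, 1.0])
```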