├── .gitignore
├── LICENSE
├── README.md
├── data
│   └── 0502.mp4
├── demo.py
├── requirements.txt
└── src
    ├── __init__.py
    ├── config.py
    ├── evaluation
    │   ├── __init__.py
    │   ├── run_video.py
    │   └── tester_pred.py
    ├── external
    │   ├── install_alphapose.sh
    │   └── install_nmr.sh
    ├── extract_tracks.py
    ├── models.py
    ├── omega.py
    ├── ops.py
    ├── renderer.py
    ├── tf_smpl
    │   ├── __init__.py
    │   ├── batch_lbs.py
    │   ├── batch_smpl.py
    │   └── projection.py
    └── util
        ├── __init__.py
        ├── common.py
        ├── render_utils.py
        ├── smooth_bbox.py
        ├── torch_utils.py
        └── video.py

/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
2 | __pycache__
3 | 
4 | data
5 | demo_output
6 | models
7 | src/external/AlphaPose
8 | src/external/neural_renderer
9 | venv_*
10 | 
11 | *.pyc
12 | *.swp
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 2-Clause License
2 | 
3 | Copyright (c) 2019, Jason Zhang
4 | All rights reserved.
5 | 
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 | 
9 | 1. Redistributions of source code must retain the above copyright notice, this
10 |    list of conditions and the following disclaimer.
11 | 
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 |    this list of conditions and the following disclaimer in the documentation
14 |    and/or other materials provided with the distribution.
15 | 
16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
17 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Predicting 3D Human Dynamics from Video
2 | 
3 | Jason Y. Zhang, Panna Felsen, Angjoo Kanazawa, Jitendra Malik
4 | 
5 | University of California, Berkeley
6 | 
7 | [Project Page](https://jasonyzhang.com/phd/)
8 | 
9 | ![Teaser Image](https://jasonyzhang.com/phd/assets/img/overview.jpg)
10 | 
11 | Requirements:
12 | * Python 3 (tested on 3.6.8)
13 | * Tensorflow (tested on 1.15)
14 | * Pytorch for NMR (tested on 1.3.0)
15 | * CUDA (tested on 10.0)
16 | * ffmpeg (tested on 3.4.6)
17 | 
18 | 
19 | ### License:
20 | 
21 | Our code is licensed under BSD. Note that the SMPL model and any datasets still
22 | fall under their respective licenses.
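A quick, optional way to confirm that the tested versions above are what your environment actually picks up is a minimal check like the following (a sketch only; it assumes the `tensorflow-gpu` and `torch` packages from the Installation section below are already installed):

```python
# Optional sanity check against the versions listed under Requirements.
import tensorflow as tf
import torch

print('TensorFlow:', tf.__version__)   # tested with 1.15
print('PyTorch:', torch.__version__)   # tested with 1.3.0
print('CUDA available:', torch.cuda.is_available())
```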
23 | 
24 | ### Installation:
25 | ```bash
26 | virtualenv venv_phd -p python3
27 | source venv_phd/bin/activate
28 | pip install -U pip
29 | pip install numpy tensorflow-gpu==1.15.0
30 | pip install torch==1.3.0 # Make sure the wheel corresponds to your CUDA Version
31 | pip install -r requirements.txt
32 | cd src/external
33 | sh install_nmr.sh
34 | ```
35 | 
36 | Download the model weights from [this Google Drive link](https://drive.google.com/file/d/1_sipXE-FNs_08YCPFxFlLauHJcqzny7x/view?usp=sharing).
37 | You should place them in `phd/models`.
38 | 
39 | 
40 | ## Running Demo
41 | 
42 | ### Penn Action
43 | 
44 | Download the [Penn Action dataset](http://dreamdragon.github.io/PennAction/).
45 | You should place or symlink the dataset to `phd/data/Penn_Action`.
46 | 
47 | #### Running on one subsequence
48 | `--vid_id 0104` runs the model on video 0104 in Penn Action. The public model is
49 | conditioned on 15 images, so `--start_frame 60` starts the conditioning window
50 | at 60, and future prediction will start on frame 76. `--ar_length 25` sets the
51 | number of future predictions at 25, which is the prediction length the model
52 | was trained on. You can also try increasing `ar_length`, which usually looks
53 | reasonable up to 35.
54 | 
55 | ```
56 | python demo.py --load_path models/phd.ckpt-199269 --vid_id 0104 --ar_length 25 --start_frame 60
57 | ```
58 | 
59 | For reference, [this](https://jasonyzhang.com/phd/assets/vid/penn_action-0104_AR25_60-100_fps5.mp4)
60 | should be your output.
61 | 
62 | #### Running on multiple subsequences
63 | 
64 | You can also run at multiple starting points in the same sequence.
65 | `--start_frame 0 --skip_rate 5` will run starting at frame 0, frame 5, frame 10,
66 | etc.
67 | 
68 | ```
69 | python demo.py --load_path models/phd.ckpt-199269 --vid_id 0104 --ar_length 25 --start_frame 0 --skip_rate 5
70 | ```
71 | For reference, [this](https://jasonyzhang.com/phd/assets/vid/0104.zip) should be your output.
72 | 
73 | 
74 | ### Running on Any Video
75 | 
76 | To run on a generic video, you will need a tracklet around the person. We extract the tracklet using PoseFlow.
77 | 
78 | Follow the directions to download AlphaPose and its model weights from https://github.com/MVIG-SJTU/AlphaPose.
79 | 
80 | Roughly, that should entail:
81 | 1. Clone the repo to `src/external`
82 | 2. Build AlphaPose using `python setup.py build develop --user`
83 | 3. Download pre-trained weights to the specified directories. Use the ResNet50 Fast Pose from the Model Zoo.
84 | 
85 | Steps 1 and 2 can be done by running `sh install_alphapose.sh` in `src/external`.
86 | 
87 | Now you should be able to run the model on any video, e.g.:
88 | ```
89 | python demo.py --load_path models/phd.ckpt-199269 --vid_path data/0502.mp4 --start_frame 0 --ar_length 25
90 | ```
91 | 
92 | ## Training Code
93 | 
94 | Coming soon
95 | 
--------------------------------------------------------------------------------
/data/0502.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jasonyzhang/phd/7b7f526d45913902ed93cdc49fdd59272698bd71/data/0502.mp4
--------------------------------------------------------------------------------
/demo.py:
--------------------------------------------------------------------------------
1 | """
2 | Runs PHD on Penn Action video.
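
Sample usage (from the README):
    python demo.py --load_path models/phd.ckpt-199269 --vid_id 0104 --ar_length 25 --start_frame 60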
3 | """ 4 | 5 | from glob import glob 6 | import os 7 | import os.path as osp 8 | import warnings 9 | 10 | from absl import flags 11 | import numpy as np 12 | from scipy.io import loadmat 13 | import tensorflow as tf 14 | 15 | from src.config import get_config 16 | from src.evaluation.run_video import ( 17 | process_image, 18 | process_videos, 19 | run_predictions, 20 | ) 21 | from src.evaluation.tester_pred import TesterPred 22 | from src.extract_tracks import ( 23 | compute_tracks, 24 | get_labels_poseflow, 25 | ) 26 | from src.renderer import VisRenderer 27 | from src.util.smooth_bbox import get_smooth_bbox_params 28 | 29 | flags.DEFINE_string('dataset', '', 30 | 'Dataset to use. Leave blank if using PoseFlow to extract' 31 | ' tracks. Otherwise can set to "penn_action".') 32 | flags.DEFINE_string('vid_id', '0001', 'Video id number if using Penn Action.') 33 | flags.DEFINE_string('vid_path', 'data/0504.mp4', 34 | 'Path to filename if using PoseFlow for tracks') 35 | 36 | flags.DEFINE_integer('ar_length', 25, 'Number of steps into future to predict.') 37 | flags.DEFINE_integer('start_frame', 0, 'First frame of conditioning.') 38 | flags.DEFINE_integer('skip_rate', None, 39 | 'If set, will be used for choosing subsequences.') 40 | flags.DEFINE_integer('fps', 5, 'Frames per second in rendered video.') 41 | flags.DEFINE_integer('degrees', '60', 'Angle for rotated viewpoint.') 42 | flags.DEFINE_string('mesh_color', 'blue', 'Color of mesh.') 43 | 44 | flags.DEFINE_string('out_dir', 'demo_output', 'Where to save final PHD videos.') 45 | flags.DEFINE_string('penn_dir', 'data/Penn_Action', 46 | 'Directory where Penn Action is saved.') 47 | 48 | NUM_CONDITION = 15 49 | 50 | # Hides some TF warnings. 51 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 52 | tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR) 53 | 54 | 55 | def load_penn_video(penn_dir, vid_id): 56 | im_paths = sorted(glob(osp.join(penn_dir, 'frames', vid_id, '*.jpg'))) 57 | labels = loadmat(osp.join(penn_dir, 'labels', '{}.mat'.format(vid_id))) 58 | if np.all(labels['train'] == 1): 59 | print('Warning: {} is a training sequence!'.format(vid_id)) 60 | kps = np.dstack((labels['x'], labels['y'], labels['visibility'])) 61 | return im_paths, kps 62 | 63 | 64 | def load_poseflow_video(vid_path, out_dir): 65 | track_json, im_dir = compute_tracks(vid_path=vid_path, out_dir=out_dir) 66 | im_paths = sorted(glob(osp.join(im_dir, '*.png'))) 67 | kps = get_labels_poseflow( 68 | json_path=track_json, 69 | num_frames=len(im_paths), 70 | min_kp_count=NUM_CONDITION, 71 | ) 72 | return im_paths, kps 73 | 74 | 75 | def main(model): 76 | # Keypoints are only used to compute the bounding box around human tracks. 77 | # They are not fed into the model. Keypoint format is [x, y, vis]. Keypoint 78 | # order doesn't matter. 
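    # Illustrative example (hypothetical coordinates): for Penn Action, `kps`
    # comes out of np.dstack as a (num_frames, 13, 3) array, e.g.
    #     kps[0] = [[210.0, 85.0, 1.0], [198.0, 132.0, 0.0], ...]
    # The third column is the visibility/confidence that is compared against
    # `vis_thresh` when fitting the smoothed bounding boxes below.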
79 | if config.dataset == '': 80 | im_paths, kps = load_poseflow_video(config.vid_path, config.out_dir) 81 | vis_thresh = 0.1 82 | elif config.dataset == 'penn_action': 83 | im_paths, kps = load_penn_video(config.penn_dir, config.vid_id) 84 | vis_thresh = 0.5 85 | else: 86 | raise Exception('Dataset {} not recognized'.format(config.dataset)) 87 | bbox_params_smooth, s, e = get_smooth_bbox_params(kps, vis_thresh) 88 | images = [] 89 | min_f = max(s, 0) 90 | max_f = min(e, len(kps)) 91 | for i in range(min_f, max_f): 92 | images.append(process_image( 93 | im_path=im_paths[i], 94 | bbox_param=bbox_params_smooth[i] 95 | )) 96 | all_images, vid_paths = process_videos( 97 | config=config, 98 | images=images, 99 | T=(NUM_CONDITION + config.ar_length), 100 | suffix='AR{}'.format(config.ar_length), 101 | ) 102 | if not osp.exists(config.out_dir): 103 | os.mkdir(config.out_dir) 104 | renderer = VisRenderer(img_size=224) 105 | for i in range(0, len(all_images), config.batch_size): 106 | run_predictions( 107 | config=config, 108 | renderer=renderer, 109 | model=model, 110 | images=all_images[i : i + config.batch_size], 111 | vid_paths=vid_paths[i : i + config.batch_size], 112 | num_condition=NUM_CONDITION, 113 | ) 114 | 115 | 116 | if __name__ == '__main__': 117 | config = get_config() 118 | if config.skip_rate is None: 119 | setattr(config, 'batch_size', 1) 120 | model = TesterPred( 121 | config, 122 | sequence_length=(NUM_CONDITION + config.ar_length), 123 | resnet_path='models/hmr_noS5.ckpt-642561', 124 | ) 125 | main(model) 126 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # python requirements 2 | pip>=9.0 3 | absl-py==0.7.1 4 | chumpy==0.68 5 | deepdish==0.3.6 6 | ipdb==0.12 7 | matplotlib==3.0.3 8 | # neural_renderer_pytorch==1.1.3 # Works better if you build from source 9 | numpy 10 | opencv-python==4.2.0.32 11 | pillow<7 12 | scikit-image==0.15.0 13 | scipy==1.2.1 14 | # tensorflow-gpu==1.14 15 | torch==1.3.0 16 | torchvision==0.4.1 17 | tqdm==4.19.9 18 | 19 | # For AlphaPose/Poseflow 20 | cython 21 | munkres==1.0.12 22 | -------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonyzhang/phd/7b7f526d45913902ed93cdc49fdd59272698bd71/src/__init__.py -------------------------------------------------------------------------------- /src/config.py: -------------------------------------------------------------------------------- 1 | import os.path as osp 2 | import sys 3 | 4 | from absl import flags 5 | 6 | 7 | curr_path = osp.dirname(osp.abspath(__file__)) 8 | model_dir = osp.join(curr_path, '..', 'models') 9 | if not osp.exists(model_dir): 10 | print('Fix path to models/') 11 | import ipdb 12 | ipdb.set_trace() 13 | SMPL_MODEL_PATH = osp.join(model_dir, 14 | 'neutral_smpl_with_cocoplustoesankles_reg.pkl') 15 | SMPL_FACE_PATH = osp.join(curr_path, '../src/tf_smpl', 'smpl_faces.npy') 16 | 17 | # Default pred-trained model path for the demo. 
18 | PRETRAINED_MODEL = osp.join(model_dir, 'model.ckpt-667589') 19 | 20 | flags.DEFINE_string('smpl_model_path', SMPL_MODEL_PATH, 21 | 'path to the neutral smpl model') 22 | flags.DEFINE_string('smpl_face_path', SMPL_FACE_PATH, 23 | 'path to smpl mesh faces (for easy rendering)') 24 | 25 | # Model details 26 | 27 | flags.DEFINE_string('load_path', None, 'path to trained model dir') 28 | flags.DEFINE_integer('batch_size', 8, 'Size of mini-batch.') 29 | flags.DEFINE_integer('num_conv_layers', 3, '# of layers for convolutional') 30 | flags.DEFINE_boolean('use_delta_from_pred', True, 31 | 'If True, initialize delta regressor from pred.') 32 | flags.DEFINE_boolean('pad_edges', False, 'If True, edge pad, else zero pad.') 33 | flags.DEFINE_bool('use_optcam', True, 34 | 'If True, kp reprojection uses optimal camera.') 35 | flags.DEFINE_integer('num_kps', 25, 'Number of keypoints.') 36 | 37 | 38 | # For training. 39 | flags.DEFINE_string('data_dir', None, 'Where tfrecords are saved') 40 | flags.DEFINE_string('model_dir', None, 41 | 'Where model will be saved -- filled automatically') 42 | flags.DEFINE_list('datasets', ['h36m', 'penn_action', 'insta_variety'], 43 | 'datasets to use for training') 44 | flags.DEFINE_list('mocap_datasets', ['CMU', 'H3.6', 'jointLim'], 45 | 'datasets to use for adversarial prior training') 46 | flags.DEFINE_list('pretrained_model_path', [PRETRAINED_MODEL], 47 | 'if not None, fine-tunes from this ckpt') 48 | flags.DEFINE_string('image_encoder_model_type', 'resnet', 49 | 'Specifies which image encoder to use') 50 | flags.DEFINE_string('temporal_encoder_type', 'AZ_FC2GN', 51 | 'Specifies which network to use for temporal encoding') 52 | flags.DEFINE_integer('img_size', 224, 53 | 'Input image size to the network after preprocessing') 54 | flags.DEFINE_integer('num_stage', 3, '# of times to iterate IEF regressor') 55 | flags.DEFINE_integer('max_iteration', 5000000, '# of max iteration to train') 56 | flags.DEFINE_integer('log_img_count', 10, 57 | 'Number of images in sequence to visualize') 58 | flags.DEFINE_integer('log_img_step', 5000, 59 | 'How often to visualize img during training') 60 | 61 | # Random seed 62 | flags.DEFINE_integer('seed', 1, 'Graph-level random seed') 63 | 64 | 65 | def get_config(): 66 | config = flags.FLAGS 67 | config(sys.argv) 68 | return config 69 | -------------------------------------------------------------------------------- /src/evaluation/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonyzhang/phd/7b7f526d45913902ed93cdc49fdd59272698bd71/src/evaluation/__init__.py -------------------------------------------------------------------------------- /src/evaluation/run_video.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import cv2 4 | import ipdb 5 | import matplotlib.pyplot as plt 6 | import numpy as np 7 | from skimage.io import imread 8 | from tqdm import tqdm 9 | 10 | import src.util.render_utils as vis_util 11 | from src.util.common import resize_img 12 | from src.util.video import VideoWriter 13 | 14 | IMG_SIZE = 224 15 | 16 | 17 | def get_output_path_name(config, vid_id, suffix='', inds=()): 18 | """ 19 | Returns the output video's path. 20 | 21 | Args: 22 | config: Configuration. 23 | vid_id (str): Id of video. 24 | suffix (str). 25 | inds (tuple): Indices of start and end of conditioning. 26 | 27 | Returns: 28 | str: Output video path name. 
29 | """ 30 | if inds: 31 | suffix += '_' + '-'.join(map(str, inds)) 32 | suffix += '_fps{}'.format(config.fps) 33 | if config.dataset: 34 | output_name = '{dataset}-{vid}_{suf}.mp4'.format( 35 | dataset=config.dataset, 36 | vid=vid_id, 37 | suf=suffix, 38 | ) 39 | else: 40 | output_name = '{vid}_{suf}.mp4'.format( 41 | vid=os.path.basename(config.vid_path).split('.')[0], 42 | suf=suffix, 43 | ) 44 | 45 | return output_name 46 | 47 | 48 | def process_videos(config, images, T, suffix=''): 49 | """ 50 | 51 | Args: 52 | config: Configuration. 53 | images (list): list of images. 54 | T (int): Sequence length (conditioning length + AR length). 55 | 56 | Returns: 57 | all_images: List of list of images, each corresponding to a subseq 58 | all_vid_paths: Video names corresponding to each subseq. 59 | """ 60 | start_frame = config.start_frame 61 | skip_rate = config.skip_rate 62 | n = len(images) 63 | if skip_rate is None: 64 | if start_frame + T > n: 65 | # In case we want to run past edge of video. 66 | images.extend([np.zeros((IMG_SIZE, IMG_SIZE, 3))] * T) 67 | n += T 68 | starts = [start_frame] 69 | else: 70 | starts = np.arange(start_frame, n - T, skip_rate, dtype=int) 71 | all_images, all_vid_paths = [], [] 72 | for start in starts: 73 | end = start + T 74 | if n < end: 75 | print('Too short!') 76 | ims = images[start:end] 77 | vid_path = get_output_path_name( 78 | config=config, 79 | vid_id=config.vid_id.replace('mp4', ''), 80 | suffix=suffix, 81 | inds=(start, end), 82 | ) 83 | all_images.append(ims) 84 | all_vid_paths.append(os.path.join(config.out_dir, vid_path)) 85 | while len(all_images) % config.batch_size != 0: 86 | # Pad with garbage so that we fill up the batch. 87 | all_images.append(np.zeros((T, 224, 224, 3))) 88 | all_vid_paths.append('') 89 | return all_images, all_vid_paths 90 | 91 | 92 | def process_image(im_path, bbox_param): 93 | """ 94 | Processes an image, producing 224x224 crop. 95 | Args: 96 | im_path (str). 97 | bbox_param (3,): [cx, cy, scale]. 98 | 99 | Returns: 100 | image 101 | """ 102 | image = imread(im_path) 103 | center = bbox_param[:2] 104 | scale = bbox_param[2] 105 | 106 | # Pre-process image to [-1, 1] 107 | image = ((image / 255.) - 0.5) * 2 108 | image_scaled, scale_factors = resize_img(image, scale) 109 | center_scaled = np.round(center * scale_factors).astype(np.int) 110 | 111 | # Make sure there is enough space to crop 224x224. 112 | image_padded = np.pad( 113 | array=image_scaled, 114 | pad_width=((IMG_SIZE,), (IMG_SIZE,), (0,)), 115 | mode='edge' 116 | ) 117 | height, width = image_padded.shape[:2] 118 | center_scaled += IMG_SIZE 119 | 120 | # Crop 224x224 around the center. 121 | margin = IMG_SIZE // 2 122 | 123 | start_pt = (center_scaled - margin).astype(int) 124 | end_pt = (center_scaled + margin).astype(int) 125 | end_pt[0] = min(end_pt[0], width) 126 | end_pt[1] = min(end_pt[1], height) 127 | image_scaled = image_padded[start_pt[1]:end_pt[1], 128 | start_pt[0]:end_pt[0], :] 129 | return image_scaled 130 | 131 | 132 | def run_predictions(config, renderer, model, images, vid_paths, num_condition): 133 | """ 134 | 135 | Args: 136 | config: Configuration. 137 | renderer (VisRenderer). 138 | model (TesterPred). 139 | images (ndarray): B x T x H x W x 3. 140 | vid_paths (B). 141 | ar_length (int): Number of times to run auto-regressive prediction. 142 | num_condition (int): Condition length. 
143 | """ 144 | fov = model.fov 145 | ar_length = config.ar_length 146 | images = np.array(images) 147 | preds_gt = model.predict_movie_strips(images, get_smpl=True) 148 | movie_strips = preds_gt['movie_strips_cond'] 149 | movie_strips = movie_strips[:, num_condition - fov: num_condition] 150 | verts_gt = preds_gt['verts'][:, -ar_length - fov:] # B x (ar+f) x 6980 x 3 151 | verts = [] 152 | for _ in tqdm(range(ar_length)): 153 | preds = model.predict_auto_regressive(movie_strips[:, -fov:]) 154 | movie_strips = np.concatenate(( 155 | movie_strips, 156 | preds['movie_strip'], # B x 1 x 2048 157 | ), axis=1) 158 | verts.append(np.squeeze(preds['verts'], axis=1)) 159 | 160 | verts = np.array(verts) # ar x B x 6980 x 3! 161 | 162 | for i in range(len(images)): 163 | if vid_paths[i] == '': 164 | continue 165 | render_results( 166 | config=config, 167 | renderer=renderer, 168 | fov=fov, 169 | vid_path=vid_paths[i], 170 | images=images[i][-model.fov - ar_length:], 171 | verts=verts[:, i], 172 | verts_gt=verts_gt[i], 173 | ) 174 | 175 | 176 | def render_results(config, renderer, fov, vid_path, images, verts, verts_gt): 177 | """ 178 | 179 | Args: 180 | config 181 | renderer (VisRenderer). 182 | fov (int). 183 | images ((f+ar) x H x W x 3). 184 | verts (ar x 6980 x 3): Predicted vertices from auto-regressive model. 185 | verts_gt ((f+ar) x 6980 x 3): Predicted vertices from real movie strips. 186 | """ 187 | print('Rendering', vid_path) 188 | writer = VideoWriter(output_path=vid_path, fps=config.fps) 189 | images = (images + 1) * 0.5 190 | for i, im in tqdm(enumerate(images)): 191 | if im.shape[0] != IMG_SIZE: 192 | im = cv2.resize(im, (IMG_SIZE, IMG_SIZE)) 193 | im = vis_util.draw_text(im, {'T': i - fov + 1}) 194 | if i < fov: 195 | vert = verts_gt[i] 196 | color = 'yellow' 197 | im = vis_util.add_alpha(im) 198 | else: 199 | vert = verts[i - fov] 200 | color = config.mesh_color 201 | im = vis_util.add_alpha(im, 0.7) 202 | mesh = renderer( 203 | verts=vert, 204 | color_name=color, 205 | alpha=True, 206 | cam=np.array([0.7, 0, 0]), 207 | ) / 255. 208 | rot = renderer.rotated( 209 | verts=vert, 210 | deg=config.degrees, 211 | color_name=color, 212 | alpha=True, 213 | cam=np.array([0.7, 0, 0]), 214 | ) / 255. 
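        # Final frame layout, left to right: cropped input frame, predicted
        # mesh rendered from the camera viewpoint, and the same mesh rotated
        # by --degrees.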
215 | combined = np.hstack((im, mesh, rot)) 216 | writer.add_image(combined) 217 | writer.make_video() 218 | writer.close() 219 | -------------------------------------------------------------------------------- /src/evaluation/tester_pred.py: -------------------------------------------------------------------------------- 1 | import os 2 | import os.path as osp 3 | 4 | import deepdish as dd 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from src.models import ( 9 | batch_pred_omega, 10 | get_image_encoder, 11 | get_prediction_model, 12 | get_temporal_encoder, 13 | ) 14 | from src.omega import ( 15 | OmegasPred, 16 | ) 17 | from src.tf_smpl.batch_smpl import SMPL 18 | 19 | 20 | class TesterPred(object): 21 | 22 | def __init__(self, config, sequence_length, resnet_path='', sess=None, 23 | precomputed_phi=False): 24 | self.config = config 25 | self.load_path = config.load_path 26 | tf.set_random_seed(config.seed) 27 | 28 | self.num_conv_layers = 3 29 | self.fov = self.num_conv_layers * 4 + 1 30 | self.sequence_length = sequence_length 31 | self.use_delta_from_pred = config.use_delta_from_pred 32 | self.use_optcam = config.use_optcam 33 | self.precomputed_phi = precomputed_phi 34 | 35 | # Config + path 36 | if not config.load_path: 37 | raise Exception( 38 | 'You need to specify `load_path` to load a pretrained model' 39 | ) 40 | if not osp.exists(config.load_path + '.index'): 41 | print('{} doesnt exist'.format(config.load_path)) 42 | import ipdb 43 | ipdb.set_trace() 44 | 45 | # Data 46 | self.batch_size = config.batch_size 47 | self.img_size = config.img_size 48 | self.E_var = [] 49 | self.pad_edges = config.pad_edges 50 | 51 | self.smpl_model_path = config.smpl_model_path 52 | self.use_hmr_ief = False 53 | 54 | self.num_output = 85 55 | 56 | if precomputed_phi: 57 | input_size = (self.batch_size, self.sequence_length, 2048) 58 | else: 59 | input_size = (self.batch_size, self.sequence_length, 60 | self.img_size, self.img_size, 3) 61 | self.images_pl = tf.placeholder(tf.float32, shape=input_size) 62 | 63 | strip_size = (self.batch_size, self.fov, 2048) 64 | self.movie_strips_pl = tf.placeholder(tf.float32, shape=strip_size) 65 | 66 | # Model Spec 67 | self.f_image_enc = get_image_encoder() 68 | self.f_temporal_enc = get_temporal_encoder() 69 | self.f_prediction_ar = get_prediction_model() 70 | 71 | self.smpl = SMPL(self.smpl_model_path) 72 | self.omegas_movie_strip = self.make_omega_pred() 73 | self.omegas_pred = self.make_omega_pred(use_optcam=True) 74 | 75 | # HMR Model Params 76 | self.num_stage = 3 77 | self.total_params = 85 78 | 79 | self.load_mean_omega() 80 | self.build_temporal_encoder_model() 81 | self.build_auto_regressive_model() 82 | self.update_E_vars() 83 | 84 | if sess is None: 85 | options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7) 86 | self.sess = tf.Session(config=tf.ConfigProto(gpu_options=options)) 87 | else: 88 | self.sess = sess 89 | 90 | # Load data. 
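        # prepare() restores the ResNet image-encoder weights (when raw images
        # rather than precomputed phis are used) and the temporal/AR model
        # weights from config.load_path.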
91 | self.prepare(resnet_path) 92 | 93 | def make_omega_pred(self, use_optcam=False): 94 | return OmegasPred( 95 | config=self.config, 96 | smpl=self.smpl, 97 | use_optcam=use_optcam, 98 | vis_max_batch=self.batch_size, 99 | is_training=False, 100 | ) 101 | 102 | def update_E_vars(self): 103 | trainable_vars = tf.contrib.framework.get_variables() 104 | trainable_vars_e = [var for var in trainable_vars 105 | if var.name[:2] != 'D_'] 106 | self.E_var.extend(trainable_vars_e) 107 | 108 | def load_mean_omega(self): 109 | # Initialize scale at 0.9 110 | mean_path = os.path.join(os.path.dirname(self.smpl_model_path), 111 | 'neutral_smpl_meanwjoints.h5') 112 | mean_vals = dd.io.load(mean_path) 113 | 114 | mean_cams = [0.9, 0, 0] 115 | # 72D 116 | mean_pose = mean_vals['pose'] 117 | mean_pose[:3] = 0. 118 | mean_pose[0] = np.pi 119 | # 10D 120 | mean_shape = mean_vals['shape'] 121 | 122 | mean_vals = np.hstack((mean_cams, mean_pose, mean_shape)) 123 | # Needs to be 1 x 85 124 | mean_vals = np.expand_dims(mean_vals, 0) 125 | self.mean_var = tf.Variable( 126 | mean_vals, 127 | name='mean_param', 128 | dtype=tf.float32, 129 | trainable=True 130 | ) 131 | mean_cams = self.mean_var[0, :3] 132 | mean_pose = self.mean_var[0, 3:3+72] 133 | mean_shape = self.mean_var[0, 3+72:] 134 | 135 | self.mean_vars = [mean_cams, mean_pose, mean_shape] 136 | mean_cams = tf.tile(tf.reshape(mean_cams, (1,-1)), 137 | (self.batch_size, 1)) 138 | mean_shape = tf.tile(tf.reshape(mean_shape, (1,-1)), 139 | (self.batch_size, 1)) 140 | mean_pose = tf.tile(tf.reshape(mean_pose, (1, 24, 3)), 141 | (self.batch_size, 1, 1)) 142 | _, mean_joints3d, mean_poses_rot = self.smpl( 143 | mean_shape, mean_pose, get_skin=True) 144 | 145 | # Starting point for IEF. 146 | self.theta_mean = tf.concat(( 147 | mean_cams, 148 | tf.reshape(mean_pose, (-1, 72)), 149 | mean_shape 150 | ), axis=1) 151 | 152 | def prepare(self, resnet_path=''): 153 | """ 154 | Restores variables from checkpoint. 155 | 156 | Args: 157 | resnet_path (str): Optional path to load resnet weights. 158 | """ 159 | if resnet_path and not self.precomputed_phi: 160 | print('Restoring resnet vars from', resnet_path) 161 | resnet_vars = [] 162 | e_vars = [] 163 | for var in self.E_var: 164 | if 'resnet' in var.name: 165 | resnet_vars.append(var) 166 | else: 167 | e_vars.append(var) 168 | resnet_saver = tf.train.Saver(resnet_vars) 169 | resnet_saver.restore(self.sess, resnet_path) 170 | else: 171 | e_vars = self.E_var 172 | print('Restoring checkpoint ', self.load_path) 173 | 174 | saver = tf.train.Saver(e_vars) 175 | saver.restore(self.sess, self.load_path) 176 | self.sess.run(self.mean_vars) 177 | 178 | def build_temporal_encoder_model(self): 179 | B, T = self.batch_size, self.sequence_length 180 | if self.precomputed_phi: 181 | print('loading pre-computed phi!') 182 | self.img_feat_full = self.images_pl 183 | else: 184 | print('Getting all image features...') 185 | I_t = tf.reshape( 186 | self.images_pl, 187 | (B * T, self.img_size, self.img_size, 3) 188 | ) 189 | img_feat, phi_var_scope = self.f_image_enc( 190 | I_t, 191 | is_training=False, 192 | reuse=False, 193 | ) 194 | self.img_feat_full = tf.reshape(img_feat, (B, T, -1)) 195 | 196 | omega_mean = tf.tile(self.theta_mean, (self.sequence_length, 1)) 197 | 198 | # At training time, we only use first 40. Want to make sure GN 199 | # statistics are right. 
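        # movie_strips_cond encodes only the first 40 frames, matching the
        # training-time sequence length so the group-norm statistics line up;
        # it provides the conditioning window for auto-regressive prediction.
        # movie_strips (below) encodes the full sequence and feeds the
        # per-frame omega/SMPL predictions.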
200 | self.movie_strips_cond = self.f_temporal_enc( 201 | net=self.img_feat_full[:, :40], 202 | num_conv_layers=self.num_conv_layers, 203 | prefix='', 204 | reuse=None, 205 | ) 206 | 207 | self.movie_strips = self.f_temporal_enc( 208 | net=self.img_feat_full, 209 | num_conv_layers=self.num_conv_layers, 210 | prefix='', 211 | reuse=True, 212 | ) 213 | 214 | omega_movie_strip, _ = batch_pred_omega( 215 | input_features=self.movie_strips, 216 | batch_size=B, 217 | sequence_length=T, 218 | num_output=self.num_output, 219 | is_training=False, 220 | omega_mean=omega_mean, 221 | scope='single_view_ief', 222 | use_delta_from_pred=self.use_delta_from_pred, 223 | use_optcam=self.use_optcam, 224 | ) 225 | 226 | self.omegas_movie_strip.append_batched(omega_movie_strip) 227 | self.omegas_movie_strip.compute_smpl() 228 | 229 | def build_auto_regressive_model(self): 230 | omega_mean = tf.tile(self.theta_mean, (self.fov, 1)) 231 | input_movie_strips = self.movie_strips_pl # B x 13 x 2048 232 | movie_strip_pred = self.f_prediction_ar( 233 | net=input_movie_strips, 234 | num_conv_layers=self.num_conv_layers, 235 | prefix='pred_', 236 | reuse=None, 237 | ) 238 | omega_pred, _ = batch_pred_omega( 239 | input_features=movie_strip_pred, 240 | batch_size=self.batch_size, 241 | is_training=False, 242 | num_output=self.num_output, 243 | omega_mean=omega_mean, 244 | sequence_length=self.fov, 245 | scope='single_view_ief', 246 | predict_delta_keys=(), 247 | use_delta_from_pred=self.use_delta_from_pred, 248 | use_optcam=self.use_optcam, 249 | ) 250 | # Only want the last entry. 251 | self.movie_strip_pred = movie_strip_pred[:, -1:] 252 | self.omegas_pred.append_batched(omega_pred[:, -1:]) 253 | self.omegas_pred.compute_smpl() 254 | 255 | def make_fetch_dict(self, omegas, suffix=''): 256 | return { 257 | # Predictions. 258 | 'cams' + suffix: omegas.get_cams(), 259 | 'joints' + suffix: omegas.get_joints(), 260 | 'kps' + suffix: omegas.get_kps(), 261 | 'poses' + suffix: omegas.get_poses_rot(), 262 | 'shapes' + suffix: omegas.get_shapes(), 263 | 'verts' + suffix: omegas.get_verts(), 264 | 'omegas' + suffix: omegas.get_raw(), 265 | } 266 | 267 | def predict_movie_strips(self, images, get_smpl=False): 268 | """ 269 | Converts images to movie strip representation. Number of images should 270 | be equal to sequence_length. If precomputed_phi, images should be phis. 271 | 272 | Args: 273 | images (B x (2*fov-1) x H x W x 3) or (B x (2*fov-1) x 2048). 274 | get_smpl (bool): If True, returns all the smpl stuff. 275 | 276 | Returns: 277 | Movie strips (B x (2*fov-1) x 2048). 278 | """ 279 | feed_dict = { 280 | self.images_pl: images, 281 | } 282 | fetch_dict = { 283 | 'movie_strips': self.movie_strips, 284 | 'movie_strips_cond': self.movie_strips_cond, 285 | } 286 | if get_smpl: 287 | fetch_dict.update(self.make_fetch_dict(self.omegas_movie_strip)) 288 | return self.sess.run(fetch_dict, feed_dict) 289 | 290 | def predict_auto_regressive(self, movie_strips): 291 | """ 292 | Predicts the next time step in an auto-regressive manner. 293 | 294 | Args: 295 | movie_strips (B x fov x 2048). 
296 | 297 | Returns: 298 | 299 | """ 300 | feed_dict = { 301 | self.movie_strips_pl: movie_strips, 302 | } 303 | 304 | fetch_dict = { 305 | 'movie_strip': self.movie_strip_pred, 306 | } 307 | fetch_dict.update(self.make_fetch_dict(self.omegas_pred)) 308 | result = self.sess.run(fetch_dict, feed_dict) 309 | return result 310 | -------------------------------------------------------------------------------- /src/external/install_alphapose.sh: -------------------------------------------------------------------------------- 1 | git clone git@github.com:MVIG-SJTU/AlphaPose.git 2 | cd AlphaPose 3 | git fetch origin 38e00c688023282304462b5b6da98248e798842e # API tested with this commit. 4 | python setup.py build develop --user 5 | echo "" 6 | echo "Don't forget to download the pre-trained model and configs!" 7 | 8 | -------------------------------------------------------------------------------- /src/external/install_nmr.sh: -------------------------------------------------------------------------------- 1 | git clone git@github.com:daniilidis-group/neural_renderer.git 2 | cd neural_renderer 3 | python setup.py build develop 4 | -------------------------------------------------------------------------------- /src/extract_tracks.py: -------------------------------------------------------------------------------- 1 | """ 2 | Given a directory of videos, extracts 2D pose tracklet using AlphaPose/PoseFlow 3 | for each video. 4 | Make sure you have installed AlphaPose in src/external. 5 | This script is basically a wrapper around AlphaPose/PoseFlow: 6 | 1. Split the video into individual frames since PoseFlow requires that format 7 | 2. Run AlphaPose on the produced directory with frames 8 | 3. Run PoseFlow on the AlphaPose output. 9 | Therefore, if at any point this script fails, please look into each system cmd 10 | that's printed prior to running them. Make sure you can run those commands on 11 | their own. 12 | """ 13 | import json 14 | import os 15 | import os.path as osp 16 | import re 17 | import subprocess 18 | from glob import glob 19 | 20 | import numpy as np 21 | 22 | 23 | def dump_frames(vid_path, out_dir): 24 | """ 25 | Extracts all frames from the video at vid_path and saves them inside of 26 | out_dir. 27 | """ 28 | if len(glob(osp.join(out_dir, '*.png'))) > 0: 29 | print('Image frames already exist!') 30 | return 31 | 32 | print('{} Writing frames to file'.format(vid_path)) 33 | 34 | cmd = [ 35 | 'ffmpeg', 36 | '-i', vid_path, 37 | '-start_number', '0', 38 | '{temp_dir}/frame%08d.png'.format(temp_dir=out_dir), 39 | ] 40 | print(' '.join(cmd)) 41 | subprocess.call(cmd) 42 | 43 | 44 | def run_alphapose(img_dir, out_dir): 45 | if osp.exists(osp.join(out_dir, 'alphapose-results.json')): 46 | print('Alpha Pose already run!') 47 | return 48 | 49 | print('----------') 50 | print('Computing per-frame results with AlphaPose') 51 | 52 | # Ex: 53 | # python3 demo.py --indir data/0502/ --outdir data/0502/alphapose --sp \ 54 | # --cfg pretrained_models/256x192_res50_lr1e-3_1x.yaml \ 55 | # --checkpoint pretrained_models/fast_res50_256x192.pth 56 | cmd = [ 57 | 'python', 'scripts/demo_inference.py', 58 | '--indir', img_dir, 59 | '--outdir', out_dir, 60 | '--sp', # Needed to avoid multi-processing issues. 61 | # Update thees if you used a different model from the Model Zoo. 62 | '--cfg', 'pretrained_models/256x192_res50_lr1e-3_1x.yaml', 63 | '--checkpoint', 'pretrained_models/fast_res50_256x192.pth', 64 | # '--save_img', # Uncomment this if you want to visualize poses. 
65 | ] 66 | 67 | print('Running: {}'.format(' '.join(cmd))) 68 | curr_dir = os.getcwd() 69 | os.chdir('src/external/AlphaPose') 70 | ret = subprocess.call(cmd) 71 | if ret != 0: 72 | print('Issue running alphapose. Please make sure you can run the above ' 73 | 'command from the commandline.') 74 | exit(ret) 75 | os.chdir(curr_dir) 76 | print('AlphaPose successfully ran!') 77 | print('----------') 78 | 79 | 80 | def run_poseflow(img_dir, out_dir): 81 | alphapose_json = osp.join(out_dir, 'alphapose', 'alphapose-results.json') 82 | out_json = osp.join(out_dir, 'poseflow', 'poseflow-results-tracked.json') 83 | if osp.exists(out_json): 84 | print('PoseFlow already run!') 85 | return out_json 86 | 87 | print('Computing tracking with PoseFlow') 88 | 89 | # Ex: 90 | # python PoseFlow/tracker-general.py --imgdir data/0502/ \ 91 | # --in_json demo_output/0502/alphapose/alphapose-results.json \ 92 | # --out_json demo_output/0502/poseflow/poseflow-results-tracked.json \ 93 | # --visdir demo_output/0502/poseflow/ 94 | cmd = [ 95 | 'python', 'PoseFlow/tracker-general.py', 96 | '--imgdir', img_dir, 97 | '--in_json', alphapose_json, 98 | '--out_json', out_json, 99 | # '--visdir', out_dir, # Uncomment this to visualize PoseFlow tracks. 100 | ] 101 | 102 | print('Running: {}'.format(' '.join(cmd))) 103 | curr_dir = os.getcwd() 104 | os.chdir('src/external/AlphaPose') 105 | ret = subprocess.call(cmd) 106 | if ret != 0: 107 | print('Issue running PoseFlow. Please make sure you can run the above ' 108 | 'command from the commandline.') 109 | exit(ret) 110 | os.chdir(curr_dir) 111 | print('PoseFlow successfully ran!') 112 | print('----------') 113 | return out_json 114 | 115 | 116 | def compute_tracks(vid_path, out_dir): 117 | """ 118 | This script basically: 119 | 1. Extracts individual frames from mp4 since PoseFlow requires per frame 120 | images to be written. 121 | 2. Call AlphaPose on these frames. 122 | 3. Call PoseFlow on the output of 2. 123 | """ 124 | vid_name = osp.basename(vid_path).split('.')[0] 125 | 126 | # Where to save all intermediate outputs in. 127 | vid_dir = osp.abspath(osp.join(out_dir, vid_name)) 128 | img_dir = osp.abspath(osp.join(vid_dir, 'video_frames')) 129 | alphapose_dir = osp.abspath(osp.join(vid_dir, 'alphapose')) 130 | poseflow_dir = osp.abspath(osp.join(vid_dir, 'poseflow')) 131 | 132 | os.makedirs(img_dir, exist_ok=True) 133 | os.makedirs(alphapose_dir, exist_ok=True) 134 | os.makedirs(poseflow_dir, exist_ok=True) 135 | 136 | dump_frames(vid_path, img_dir) 137 | run_alphapose(img_dir, alphapose_dir) 138 | track_json = run_poseflow(img_dir, vid_dir) 139 | 140 | return track_json, img_dir 141 | 142 | 143 | def get_labels_poseflow(json_path, num_frames, min_kp_count=15): 144 | """ 145 | Returns the poses for each person tracklet. 146 | Each pose has dimension num_kp x 3 (x,y,vis) if the person is visible in the 147 | current frame. Otherwise, the pose will be None. 148 | Args: 149 | json_path (str): Path to the json output from AlphaPose/PoseTrack. 150 | num_frames (int): Number of frames. 151 | min_kp_count (int): Minimum threshold length for a tracklet. 152 | Returns: 153 | List of length num_people. Each element in the list is another list of 154 | length num_frames containing the poses for each person. 
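        Note: in practice only the longest tracklet is kept (the function
        returns all_kps_list_sorted[0]), so the returned value is a single
        list of length num_frames.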
155 | """ 156 | with open(json_path, 'r') as f: 157 | data = json.load(f) 158 | if len(data.keys()) != num_frames: 159 | print('Not all frames have people detected in it.') 160 | frame_ids = [int(re.findall(r'\d+', img_name)[0]) 161 | for img_name in sorted(data.keys())] 162 | if frame_ids[0] != 0: 163 | print('PoseFlow did not find people in the first frame.') 164 | exit(1) 165 | 166 | all_kps_dict = {} 167 | all_kps_count = {} 168 | for i, key in enumerate(sorted(data.keys())): 169 | # People who are visible in this frame. 170 | track_ids = [] 171 | for person in data[key]: 172 | kps = np.array(person['keypoints']).reshape(-1, 3) 173 | idx = int(person['idx']) 174 | if idx not in all_kps_dict.keys(): 175 | # If this is the first time, fill up until now with None 176 | all_kps_dict[idx] = [None] * i 177 | all_kps_count[idx] = 0 178 | # Save these kps. 179 | all_kps_dict[idx].append(kps) 180 | track_ids.append(idx) 181 | all_kps_count[idx] += 1 182 | # If any person seen in the past is missing in this frame, add None. 183 | for idx in set(all_kps_dict.keys()).difference(track_ids): 184 | all_kps_dict[idx].append(None) 185 | 186 | all_kps_list = [] 187 | all_counts_list = [] 188 | for k in all_kps_dict: 189 | if all_kps_count[k] >= min_kp_count: 190 | all_kps_list.append(all_kps_dict[k]) 191 | all_counts_list.append(all_kps_count[k]) 192 | 193 | # Sort it by the length so longest is first: 194 | sort_idx = np.argsort(all_counts_list)[::-1] 195 | all_kps_list_sorted = [] 196 | for sort_id in sort_idx: 197 | all_kps_list_sorted.append(all_kps_list[sort_id]) 198 | print("Number of detected tracks:", len(all_kps_list_sorted)) 199 | return all_kps_list_sorted[0] # Just take the first track. 200 | -------------------------------------------------------------------------------- /src/models.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib.layers.python.layers.initializers import ( 3 | variance_scaling_initializer, 4 | ) 5 | import tensorflow.contrib.slim as slim 6 | 7 | from src.ops import tf_pad 8 | 9 | 10 | def get_image_encoder(model_type='resnet'): 11 | """ 12 | Retrieves encoder fn for image and 3D 13 | """ 14 | models = { 15 | 'resnet': encoder_resnet 16 | } 17 | if model_type in models.keys(): 18 | return models[model_type] 19 | else: 20 | print('Unknown image encoder:', model_type) 21 | exit(1) 22 | 23 | 24 | def get_temporal_encoder(model_type='AZ_FC2GN'): 25 | models = { 26 | 'AZ_FC2GN': az_fc2_groupnorm, 27 | } 28 | if model_type in models.keys(): 29 | return models[model_type] 30 | else: 31 | print('Unknown temporal encoder:', model_type) 32 | exit(1) 33 | 34 | 35 | def get_prediction_model(model_type='AZ_FC2GN'): 36 | models = { 37 | 'AZ_FC2GN': az_fc2_groupnorm, 38 | } 39 | if model_type in models.keys(): 40 | return models[model_type] 41 | else: 42 | print('Unknown prediction model:', model_type) 43 | exit(1) 44 | 45 | 46 | # Functions for image encoder. 47 | 48 | def encoder_resnet(x, is_training=True, weight_decay=0.001, reuse=False): 49 | """ 50 | Resnet v2-50 51 | Assumes input is [batch, height_in, width_in, channels]!! 
52 | Input: 53 | - x: N x H x W x 3 54 | - weight_decay: float 55 | - reuse: bool->True if test 56 | 57 | Outputs: 58 | - cam: N x 3 59 | - Pose vector: N x 72 60 | - Shape vector: N x 10 61 | - variables: tf variables 62 | """ 63 | from tensorflow.contrib.slim.python.slim.nets import resnet_v2 64 | with tf.name_scope('Encoder_resnet', values=[x]): 65 | with slim.arg_scope( 66 | resnet_v2.resnet_arg_scope(weight_decay=weight_decay)): 67 | net, end_points = resnet_v2.resnet_v2_50( 68 | x, 69 | num_classes=None, 70 | is_training=is_training, 71 | reuse=reuse, 72 | scope='resnet_v2_50') 73 | net = tf.squeeze(net, axis=[1, 2]) 74 | variables_scope = 'resnet_v2_50' 75 | return net, variables_scope 76 | 77 | 78 | def encoder_fc3_dropout(x, 79 | num_output=85, 80 | is_training=True, 81 | reuse=False, 82 | name='3D_module'): 83 | """ 84 | 3D inference module. 3 MLP layers (last is the output) 85 | With dropout on first 2. 86 | Input: 87 | - x: N x [|img_feat|, |3D_param|] 88 | - reuse: bool 89 | 90 | Outputs: 91 | - 3D params: N x num_output 92 | if orthogonal: 93 | either 85: (3 + 24*3 + 10) or 109 (3 + 24*4 + 10) for factored 94 | axis-angle representation 95 | if perspective: 96 | 86: (f, tx, ty, tz) + 24*3 + 10, or 110 for factored axis-angle. 97 | - variables: tf variables 98 | """ 99 | with tf.variable_scope(name, reuse=reuse) as scope: 100 | net = slim.fully_connected(x, 1024, scope='fc1') 101 | net = slim.dropout(net, 0.5, is_training=is_training, scope='dropout1') 102 | net = slim.fully_connected(net, 1024, scope='fc2') 103 | net = slim.dropout(net, 0.5, is_training=is_training, scope='dropout2') 104 | small_xavier = variance_scaling_initializer( 105 | factor=.01, mode='FAN_AVG', uniform=True) 106 | net = slim.fully_connected( 107 | net, 108 | num_output, 109 | activation_fn=None, 110 | weights_initializer=small_xavier, 111 | scope='fc3') 112 | 113 | variables = tf.contrib.framework.get_variables(scope) 114 | return net, variables 115 | 116 | 117 | # Functions for f_{movie strip} and f_{AR}. 118 | 119 | def az_fc2_groupnorm(net, num_conv_layers, prefix, reuse=None): 120 | """ 121 | Causal architecture for movie strip encoder and autoregressive prediction. 122 | 123 | Each block has 2 convs: 124 | norm --> relu --> conv --> norm --> relu --> conv --> add. 125 | Uses full convolution. 126 | """ 127 | for i in range(num_conv_layers): 128 | net = az_fc_causal_block2( 129 | kernel_width=3, 130 | name='block_{}'.format(i), 131 | net_input=net, 132 | num_filter=2048, 133 | pad_edges=True, 134 | prefix=prefix, 135 | reuse=reuse, 136 | ) 137 | return net 138 | 139 | 140 | def az_fc_causal_block2(net_input, num_filter, kernel_width, prefix='', name='', 141 | pad_edges=False, reuse=None): 142 | """ 143 | Causal res block: 144 | BN -> Relu -> Weight -> BN -> Relu -> Weight -> Add 145 | 146 | Causal is implemented by padding the left size by receptive field - 1 and 147 | using 'VALID' mode. The output indices thus only look at the input indices 148 | that are less than or equal. 
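
    For example: with kernel_width=3, each block stacks two causal convs, so
    the num_conv_layers=3 blocks used by az_fc2_groupnorm cover a receptive
    field of 3 * 2 * (3 - 1) + 1 = 13 past frames, matching
    fov = num_conv_layers * 4 + 1 in TesterPred.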
149 | """ 150 | pad_mode = 'EDGE' if pad_edges else 'ZERO' 151 | num_groups = 32 152 | 153 | # NTC -> NT1C 154 | net_input_expand = tf.expand_dims(net_input, axis=2) 155 | # group-norm 156 | net_norm = tf.contrib.layers.group_norm( 157 | net_input_expand, 158 | channels_axis=-1, 159 | reduction_axes=(-3, -2), 160 | scope=(prefix + 'AZ_FC_causal_block_preact_gn1' + name), 161 | reuse=reuse, 162 | groups=num_groups, 163 | ) 164 | # relu 165 | net_relu = tf.nn.relu(net_norm) 166 | # weight 167 | net_relu_padded = tf_pad( 168 | tensor=net_relu, 169 | paddings=[[0, 0], [kernel_width - 1, 0], [0, 0], [0, 0]], 170 | mode=pad_mode 171 | ) 172 | net_conv1 = tf.contrib.layers.conv2d( 173 | inputs=net_relu_padded, 174 | num_outputs=num_filter, 175 | kernel_size=[kernel_width, 1], 176 | stride=1, 177 | padding='VALID', 178 | data_format='NHWC', # was previously 'NCHW', 179 | rate=1, 180 | activation_fn=None, 181 | scope=prefix + 'AZ_FC_causal_block2_conv1' + name, 182 | reuse=reuse, 183 | ) 184 | # group-norm 185 | net_norm2 = tf.contrib.layers.group_norm( 186 | net_conv1, 187 | channels_axis=-1, 188 | reduction_axes=(-3, -2), 189 | reuse=reuse, 190 | groups=num_groups, 191 | scope=(prefix + 'AZ_FC_causal_block_preact_gn2' + name), 192 | ) 193 | # relu 194 | net_relu2 = tf.nn.relu(net_norm2) 195 | net_relu_padded2 = tf_pad( 196 | tensor=net_relu2, 197 | paddings=[[0, 0], [kernel_width - 1, 0], [0, 0], [0, 0]], 198 | mode=pad_mode 199 | ) 200 | # Initalization 201 | small_xavier = variance_scaling_initializer( 202 | factor=.001, 203 | mode='FAN_AVG', 204 | uniform=True 205 | ) 206 | net_final = tf.contrib.layers.conv2d( 207 | inputs=net_relu_padded2, 208 | num_outputs=num_filter, 209 | kernel_size=[kernel_width, 1], 210 | stride=1, 211 | padding='VALID', 212 | data_format='NHWC', 213 | rate=1, 214 | activation_fn=None, 215 | weights_initializer=small_xavier, 216 | scope=prefix + 'AZ_FC_causal_block2_conv2' + name, 217 | reuse=reuse, 218 | ) 219 | # NT1C -> NTC 220 | net_final = tf.squeeze(net_final, axis=2) 221 | # skip connection 222 | residual = tf.add(net_final, net_input) 223 | return residual 224 | 225 | 226 | # Functions for f_{3D}. 227 | 228 | def batch_pred_omega(input_features, batch_size, is_training, num_output, 229 | omega_mean, sequence_length, scope, predict_delta_keys=(), 230 | use_delta_from_pred=False, use_optcam=False): 231 | """ 232 | Given B x T x * inputs, computes IEF on them by batching them 233 | as BT x *. 234 | 235 | if use_optcam is True, only outputs 72 or 82 dims. 
236 | and appends fixed camera [1,0,0] 237 | """ 238 | # run in batch 239 | # omega_mean comes in as shape: BT x 85 240 | input_features_reshape = tf.reshape(input_features, 241 | (batch_size * sequence_length, -1)) 242 | omega_pred, delta_predictions = call_hmr_ief( 243 | phi=input_features_reshape, 244 | omega_start=omega_mean, 245 | scope=scope, 246 | num_output=num_output, 247 | is_training=is_training, 248 | predict_delta_keys=predict_delta_keys, 249 | use_delta_from_pred=use_delta_from_pred, 250 | use_optcam=use_optcam, 251 | ) 252 | omega_pred = tf.reshape( 253 | omega_pred, 254 | (batch_size, sequence_length, num_output) 255 | ) 256 | new_delta_predictions = {} 257 | for delta_t, prediction in delta_predictions.items(): 258 | new_delta_predictions[delta_t] = tf.reshape( 259 | prediction, 260 | (batch_size, sequence_length, num_output) 261 | ) 262 | return omega_pred, new_delta_predictions 263 | 264 | 265 | def call_hmr_ief(phi, omega_start, scope, num_output=85, num_stage=3, 266 | is_training=True, predict_delta_keys=(), 267 | use_delta_from_pred=False, use_optcam=True): 268 | """ 269 | Wrapper for doing HMR-style IEF. 270 | 271 | If predict_delta, then also makes num_delta_t predictions forward and 272 | backward in time, with each step of delta_t. 273 | 274 | Args: 275 | phi (Bx2048): Image features. 276 | omega_start (Bx85): Starting Omega as input to first IEF. 277 | scope (str): Name of scope for reuse. 278 | num_output (int): Size of output. 279 | num_stage (int): Number of iterations for IEF. 280 | is_training (bool): If False, don't apply dropout. 281 | predict_delta_keys (iterable): List of keys for delta_t. 282 | use_delta_from_pred (bool): If True, initializes delta prediction from 283 | current frame prediction. 284 | use_optcam (bool): If True, uses [1, 0, 0] for cam. 285 | 286 | Returns: 287 | Final theta (Bx{num_output}) 288 | Deltas predictions (List of outputs) 289 | """ 290 | theta_here = hmr_ief( 291 | phi=phi, 292 | omega_start=omega_start, 293 | scope=scope, 294 | num_output=num_output, 295 | num_stage=num_stage, 296 | is_training=is_training 297 | ) 298 | 299 | # Delta only needs to do cam/pose, no shape! 300 | if use_optcam: 301 | num_output_delta = 72 302 | else: 303 | num_output_delta = 3 + 72 304 | 305 | deltas_predictions = {} 306 | for delta_t in predict_delta_keys: 307 | if delta_t == 0: 308 | # This should just be the normal IEF. 309 | continue 310 | elif delta_t > 0: 311 | scope_delta = scope + '_future{}'.format(delta_t) 312 | elif delta_t < 0: 313 | scope_delta = scope + '_past{}'.format(abs(delta_t)) 314 | 315 | omega_start_delta = theta_here if use_delta_from_pred else omega_start 316 | # append this later. 
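        # The 10-D shape (beta) is not re-predicted by the delta branches; it
        # is sliced off here and concatenated back onto each delta prediction
        # below.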
317 | beta = omega_start_delta[:, -10:] 318 | 319 | if use_optcam: 320 | # trim the first 3D camera + last shpae 321 | omega_start_delta = omega_start_delta[:, 3:3 + num_output_delta] 322 | else: 323 | omega_start_delta = omega_start_delta[:, :num_output_delta] 324 | 325 | delta_pred = hmr_ief( 326 | phi=phi, 327 | omega_start=omega_start_delta, 328 | scope=scope_delta, 329 | num_output=num_output_delta, 330 | num_stage=num_stage, 331 | is_training=is_training 332 | ) 333 | if use_optcam: 334 | # Add camera + shape 335 | scale = tf.ones([delta_pred.shape[0], 1]) 336 | trans = tf.zeros([delta_pred.shape[0], 2]) 337 | delta_pred = tf.concat([scale, trans, delta_pred, beta], 1) 338 | else: 339 | delta_pred = tf.concat([delta_pred[:, :75], beta], 1) 340 | 341 | deltas_predictions[delta_t] = delta_pred 342 | 343 | return theta_here, deltas_predictions 344 | 345 | 346 | def hmr_ief(phi, omega_start, scope, num_output=85, num_stage=3, 347 | is_training=True): 348 | """ 349 | Runs HMR-style IEF. 350 | 351 | Args: 352 | phi (Bx2048): Image features. 353 | omega_start (Bx85): Starting Omega as input to first IEF. 354 | scope (str): Name of scope for reuse. 355 | num_output (int): Size of output. 356 | num_stage (int): Number of iterations for IEF. 357 | is_training (bool): If False, don't apply dropout. 358 | 359 | Returns: 360 | Final theta (Bx{num_output}) 361 | """ 362 | with tf.variable_scope(scope): 363 | theta_prev = omega_start 364 | theta_here = None 365 | 366 | for _ in range(num_stage): 367 | # ---- Compute outputs 368 | state = tf.concat([phi, theta_prev], 1) 369 | delta_theta, _ = encoder_fc3_dropout( 370 | state, 371 | is_training=is_training, 372 | num_output=num_output, 373 | reuse=tf.AUTO_REUSE 374 | ) 375 | # Compute new theta 376 | theta_here = theta_prev + delta_theta 377 | 378 | # Finally update to end iteration. 379 | theta_prev = theta_here 380 | 381 | return theta_here 382 | -------------------------------------------------------------------------------- /src/omega.py: -------------------------------------------------------------------------------- 1 | """ 2 | Wrapper classes for saving all predicted variables. Makes it easier to compute 3 | SMPL from the 85-dimension output of the model all at once. 4 | """ 5 | import tensorflow as tf 6 | 7 | from src.tf_smpl.batch_lbs import batch_rodrigues 8 | from src.tf_smpl.projection import batch_orth_proj_idrot 9 | 10 | 11 | class Omegas(object): 12 | """ 13 | Superclass container for batches of sequences of poses, shapes, joints, etc. 14 | 15 | Args: 16 | config. 17 | """ 18 | def __init__(self, config, batch_size=None): 19 | self.config = config 20 | if batch_size: 21 | self.batch_size = batch_size 22 | else: 23 | self.batch_size = config.batch_size 24 | self.length = 0 25 | 26 | self.joints = tf.constant( 27 | (), 28 | shape=(self.batch_size, 0, self.config.num_kps, 3) 29 | ) 30 | self.kps = tf.constant( 31 | (), 32 | shape=(self.batch_size, 0, self.config.num_kps, 2) 33 | ) 34 | self.poses_aa = tf.constant((), shape=(self.batch_size, 0, 24, 3)) 35 | self.poses_rot = tf.constant((), shape=(self.batch_size, 0, 24, 3, 3)) 36 | self.shapes = tf.constant((), shape=(self.batch_size, 0, 10)) 37 | 38 | def __len__(self): 39 | """ 40 | Returns the current sequence length. 41 | 42 | Returns: 43 | length (int). 44 | """ 45 | return self.length 46 | 47 | def get_joints(self, t=None): 48 | """ 49 | Returns the joints at time t. 50 | 51 | Args: 52 | t (int). 53 | 54 | Returns: 55 | Joints (Bx25x3). 
56 | """ 57 | return self.joints if t is None else self.joints[:, t] 58 | 59 | def get_kps(self, t=None): 60 | """ 61 | Returns the keypoints at time t. 62 | 63 | Note that the shape is different for ground truth omegas and predicted 64 | omegas. 65 | 66 | Args: 67 | t (int). 68 | 69 | Returns: 70 | Kps (Bx25x3) if gt, 71 | or Kps (Bx25x2) if pred. 72 | """ 73 | return self.kps if t is None else self.kps[:, t] 74 | 75 | def get_poses_aa(self, t=None): 76 | """ 77 | Returns axis-aligned poses at time t. 78 | 79 | Args: 80 | t (int). 81 | 82 | Returns: 83 | Poses (Bx24x3). 84 | """ 85 | return self.poses_aa if t is None else self.poses_aa[:, t] 86 | 87 | def get_poses_rot(self, t=None): 88 | """ 89 | Returns poses as rotation matrices at time t. 90 | 91 | Args: 92 | t (int). 93 | 94 | Returns: 95 | Poses (Bx24x3x3). 96 | """ 97 | return self.poses_rot if t is None else self.poses_rot[:, t] 98 | 99 | def get_shapes(self, t=None): 100 | """ 101 | Returns shapes at time t. 102 | 103 | Args: 104 | t (int). 105 | 106 | Returns: 107 | Shapes (Bx10). 108 | """ 109 | return self.shapes if t is None else self.shapes[:, t] 110 | 111 | @staticmethod 112 | def gather(values, indices): 113 | """ 114 | Gathers a subset over time. 115 | 116 | Args: 117 | values (BxTx...): Tensor that we only need a subset of. 118 | indices (iterable): 1D tensor of times. 119 | 120 | Returns: 121 | tensor. 122 | """ 123 | if not isinstance(indices, tf.Tensor): 124 | indices = tf.constant(indices) 125 | spliced = tf.gather(params=values, indices=indices, axis=1) 126 | return spliced 127 | 128 | 129 | class OmegasGt(Omegas): 130 | """ 131 | Stores fields for ground truth omegas. 132 | 133 | Args: 134 | config. 135 | poses_aa (BxTx24x3). 136 | shapes (Bx10). 137 | joints (BxTx14x3). 138 | """ 139 | def __init__(self, config, poses_aa, shapes, joints, kps, batch_size=None): 140 | super(OmegasGt, self).__init__(config, batch_size=batch_size) 141 | self.length = poses_aa.shape[1] 142 | 143 | self.poses_aa = poses_aa 144 | poses_rot = batch_rodrigues(tf.reshape(poses_aa, (-1, 3))) 145 | self.poses_rot = tf.reshape(poses_rot, (self.batch_size, -1, 24, 3, 3)) 146 | self.shapes = shapes 147 | self.joints = joints 148 | self.kps = kps 149 | 150 | def get_shapes(self, t=None): 151 | if t is None: 152 | # When t is None, expect BxTx10. 153 | return tf.tile(tf.expand_dims(self.shapes, 1), (1, self.length, 1)) 154 | else: 155 | return self.shapes 156 | 157 | 158 | class OmegasPred(Omegas): 159 | """ 160 | Stores fields for predicted Omegas. 161 | 162 | Args: 163 | config. 164 | smpl (func). 165 | optcam (bool): If true, uses optcam when computing kp proj. 166 | vis_max_batch (int): Number of batches to visualize. 167 | vis_t_indices (ndarray): Times to visualize. If None, keeps all. 
168 | """ 169 | omega_instances = [] 170 | 171 | def __init__(self, 172 | config, 173 | smpl, 174 | use_optcam=False, 175 | vis_max_batch=2, 176 | vis_t_indices=None, 177 | batch_size=None, 178 | is_training=True): 179 | super(OmegasPred, self).__init__(config, batch_size) 180 | self.smpl = smpl 181 | self.cams = tf.constant((), shape=(self.batch_size, 0, 3)) 182 | self.all_verts = tf.constant((), shape=(0, 6890, 3)) 183 | self.verts = tf.constant((), shape=(0, 6890, 3)) 184 | self.smpl_computed = False 185 | self.vis_max_batch = vis_max_batch 186 | self.vis_t_indices = vis_t_indices 187 | self.raw = tf.constant((), shape=(self.batch_size, 0, 85)) 188 | self.use_optcam = use_optcam 189 | self.is_training = is_training 190 | OmegasPred.omega_instances.append(self) 191 | 192 | def update_instance_vars(self): 193 | self.cams = self.raw[:, :, :3] 194 | self.poses_aa = self.raw[:, :, 3: 3 + 24 * 3] 195 | self.shapes = self.raw[:, :, 3 + 24 * 3: 85] 196 | self.length = self.raw.shape[1] 197 | 198 | def append_batched(self, omegas): 199 | """ 200 | Appends multiple omegas. 201 | 202 | Args: 203 | omegas (BxTx85): [cams, poses, shapes]. 204 | """ 205 | B = self.batch_size 206 | omegas = tf.reshape(omegas, (B, -1, 85)) 207 | self.raw = tf.concat((self.raw, omegas), axis=1) 208 | self.update_instance_vars() 209 | self.smpl_computed = False 210 | 211 | def append(self, omega): 212 | """ 213 | Appends an omega. 214 | 215 | Args: 216 | omega (Bx85): [cams, poses, shapes]. 217 | """ 218 | B = self.batch_size 219 | omega = tf.reshape(omega, (B, 1, 85)) 220 | self.raw = tf.concat((self.raw, omega), axis=1) 221 | self.update_instance_vars() 222 | self.smpl_computed = False 223 | 224 | def compute_smpl(self): 225 | """ 226 | Batch computation of vertices, joints, rotation matrices, and keypoints. 227 | Due to the overhead added to computation graph, call this once. 228 | """ 229 | if self.smpl_computed: 230 | print('SMPL should only be computed once!') 231 | B = self.batch_size 232 | T = self.length 233 | 234 | verts, joints, poses_rot = self.smpl( 235 | beta=tf.reshape(self.shapes, (B * T, 10)), 236 | theta=tf.reshape(self.poses_aa, (B * T, 24, 3)), 237 | get_skin=True 238 | ) 239 | self.joints = tf.reshape(joints, (B, T, self.config.num_kps, 3)) 240 | self.poses_rot = tf.reshape(poses_rot, (B, T, 24, 3, 3)) 241 | 242 | # Make sure joints are B*T x num_kps x 3. 243 | if self.use_optcam and self.is_training: 244 | print('Using optimal camera!!') 245 | # Just drop the z here ([1, 0, 0]) 246 | kps = joints[:, :, :2] 247 | else: 248 | kps = batch_orth_proj_idrot(joints, 249 | tf.reshape(self.cams, (B * T, 3))) 250 | self.kps = tf.reshape(kps, (B, T, self.config.num_kps, 2)) 251 | 252 | self.all_verts = tf.reshape(verts, (B, T, 6890, 3))[:self.vis_max_batch] 253 | if self.vis_t_indices is None: 254 | self.verts = self.all_verts 255 | else: 256 | self.verts = Omegas.gather( 257 | values=self.all_verts, 258 | indices=self.vis_t_indices 259 | ) 260 | self.smpl_computed = True 261 | 262 | def get_cams(self, t=None): 263 | """ 264 | Gets cams at time t. 265 | 266 | Args: 267 | t (int). 268 | 269 | Returns: 270 | Cams (Bx3). 
271 | """ 272 | return self.cams if t is None else self.cams[:, t] 273 | 274 | def set_cams(self, cams): 275 | """ 276 | Only used for opt_cam 277 | """ 278 | assert self.use_optcam 279 | self.cams = cams 280 | 281 | def get_all_verts(self): 282 | return self.all_verts 283 | 284 | def get_verts(self): 285 | return self.verts 286 | 287 | def get_raw(self): 288 | """ 289 | Returns: 290 | Raw Omega (BxTx85). 291 | """ 292 | return self.raw 293 | 294 | @classmethod 295 | def compute_all_smpl(cls): 296 | omegas = cls.omega_instances 297 | for omega in omegas: 298 | omega.compute_smpl() 299 | -------------------------------------------------------------------------------- /src/ops.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | 4 | 5 | def tf_pad(tensor, paddings, mode): 6 | """ 7 | Pads a tensor according to paddings. 8 | 9 | mode can be 'ZERO' or 'EDGE' (Just use tf.pad for other modes). 10 | 11 | 'EDGE' padding is equivalent to repeatedly doing symmetric padding with all 12 | pads at most 1. 13 | 14 | Args: 15 | tensor (Tensor). 16 | paddings (list of list of non-negative ints). 17 | mode (str). 18 | 19 | Returns: 20 | Padded tensor. 21 | """ 22 | paddings = np.array(paddings, dtype=int) 23 | assert np.all(paddings >= 0) 24 | while not np.all(paddings == 0): 25 | new_paddings = np.array(paddings > 0, dtype=int) 26 | paddings -= new_paddings 27 | new_paddings = tf.constant(new_paddings) 28 | if mode == 'ZERO': 29 | tensor = tf.pad(tensor, new_paddings, 'CONSTANT', constant_values=0) 30 | elif mode == 'EDGE': 31 | tensor = tf.pad(tensor, new_paddings, 'SYMMETRIC') 32 | else: 33 | raise Exception('pad type {} not recognized'.format(mode)) 34 | 35 | return tensor -------------------------------------------------------------------------------- /src/renderer.py: -------------------------------------------------------------------------------- 1 | import neural_renderer as nr 2 | import numpy as np 3 | from skimage.io import imread 4 | import torch 5 | from torch.autograd import Variable 6 | 7 | from src.util.common import resize_img 8 | from src.util.torch_utils import orthographic_proj_withz_idrot 9 | from src.util.render_utils import ( 10 | draw_skeleton, 11 | draw_text, 12 | ) 13 | 14 | 15 | COLORS = { 16 | # colorblind/print/copy safe: 17 | 'blue': [0.65098039, 0.74117647, 0.85882353], 18 | 'pink': [.9, .7, .7], 19 | 'mint': [ 166/255., 229/255., 204/255.], 20 | 'mint2': [ 202/255., 229/255., 223/255.], 21 | 'green': [ 153/255., 216/255., 201/255.], 22 | 'green2': [ 171/255., 221/255., 164/255.], 23 | 'red': [ 251/255., 128/255., 114/255.], 24 | 'orange': [ 253/255., 174/255., 97/255.], 25 | 'yellow': [ 250/255., 230/255., 154/255.] 26 | } 27 | 28 | 29 | def get_dims(x): 30 | return x.dim() if isinstance(x, torch.Tensor) else x.ndim 31 | 32 | 33 | class VisRenderer(object): 34 | """ 35 | Utility to render meshes using pytorch NMR 36 | faces are F x 3 or 1 x F x 3 numpy 37 | this is for visualization only -- does not allow backprop. 38 | This class assumes all inputs are Torch/numpy variables. 
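    A minimal usage sketch (illustrative, not from the original docstring):
        renderer = VisRenderer(img_size=256, face_path='models/smpl_faces.npy')
        rend = renderer(verts, cam=cam, img=input_img, color_name='blue')
    with verts of shape V x 3, cam = [s, tx, ty], and input_img in [0, 255].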
39 |     The camera is weak-perspective [s, tx, ty] with identity rotation (no quaternion).
40 |     """
41 | 
42 |     def __init__(self,
43 |                  img_size=256,
44 |                  face_path='models/smpl_faces.npy',
45 |                  t_size=1):
46 | 
47 |         self.renderer = nr.Renderer(
48 |             img_size, camera_mode='look_at', perspective=False)
49 |         self.set_light_dir([1, .5, -1], int_dir=0.3, int_amb=0.7)
50 |         self.set_bgcolor([1, 1, 1.])
51 |         self.img_size = img_size
52 | 
53 |         self.faces_np = np.load(face_path).astype(np.int)
54 |         self.faces = to_variable(torch.IntTensor(self.faces_np).cuda())
55 |         if self.faces.dim() == 2:
56 |             self.faces = torch.unsqueeze(self.faces, 0)
57 | 
58 |         # Default color:
59 |         default_tex = np.ones((1, self.faces.shape[1], t_size, t_size, t_size,
60 |                                3))
61 |         self.default_tex = to_variable(torch.FloatTensor(default_tex).cuda())
62 | 
63 |         # Default camera:
64 |         cam = np.hstack([0.9, 0, 0])
65 |         default_cam = to_variable(torch.FloatTensor(cam).cuda())
66 |         self.default_cam = torch.unsqueeze(default_cam, 0)
67 | 
68 |         # Setup proj fn:
69 |         self.proj_fn = orthographic_proj_withz_idrot
70 | 
71 |     def __call__(self,
72 |                  verts,
73 |                  cam=None,
74 |                  texture=None,
75 |                  rend_mask=False,
76 |                  alpha=False,
77 |                  img=None,
78 |                  color_name='blue'):
79 |         """
80 |         verts is |V| x 3 numpy/cuda torch Variable or B x V x 3
81 |         cams is 3D [s, tx, ty], numpy/cuda torch Variable or B x 3
82 |         cams is NOT the same as OpenDR renderer.
83 |         Directly use the cams of HMR output
84 |         Returns N x N x 3 numpy, where N is the image size.
85 |         Or B x N x N x 3 when input was batched
86 |         If you're using this in batch mode, make sure you send in B x 3
87 |         cameras as well as B x * x * x 3 images.
88 |         """
89 |         num_batch = 1
90 | 
91 |         if get_dims(verts) == 3 and verts.shape[0] != 1:
92 |             print('batch mode')
93 |             num_batch = verts.shape[0]
94 |             # Make sure everything else is also batch mode.
95 |             if cam is not None:
96 |                 assert get_dims(cam) == 2 and cam.shape[0] == num_batch
97 |             if img is not None:
98 |                 assert img.ndim == 4 and img.shape[0] == num_batch
99 | 
100 |         if texture is None:
101 |             # single color.
102 |             color = torch.FloatTensor(COLORS[color_name]).cuda()
103 |             texture = color * self.default_tex
104 |             texture = texture.repeat(num_batch, 1, 1, 1, 1, 1)
105 |         else:
106 |             texture = to_float_tensor(texture)
107 |             if texture.dim() == 5:
108 |                 # Here the input is F x T x T x T x 3 (no batch dim),
109 |                 # so add the batch dim.
110 |                 texture = torch.unsqueeze(texture, 0)
111 |         if cam is None:
112 |             cam = self.default_cam
113 |             if num_batch > 1:
114 |                 cam = cam.repeat(num_batch, 1)
115 |         else:
116 |             cam = to_float_tensor(cam)
117 |             if cam.dim() == 1:
118 |                 cam = torch.unsqueeze(cam, 0)
119 | 
120 |         verts = to_float_tensor(verts)
121 |         if verts.dim() == 2:
122 |             verts = torch.unsqueeze(verts, 0)
123 | 
124 |         verts = to_variable(verts)
125 |         cam = to_variable(cam)
126 |         texture = to_variable(texture)
127 | 
128 |         # set offset_z for persp proj
129 |         proj_verts = self.proj_fn(verts, cam, offset_z=0)
130 |         # Flipping the y-axis here to make it align with
131 |         # the image coordinate system!
132 |         proj_verts[:, :, 1] *= -1
133 | 
134 |         # Adjust for batch.
135 | faces = self.faces.repeat(num_batch, 1, 1) 136 | if rend_mask: 137 | rend = self.renderer.render_silhouettes(proj_verts, faces) 138 | rend = torch.unsqueeze(rend, 0) 139 | rend = rend.repeat(1, 3, 1, 1) 140 | else: 141 | rend = self.renderer.render(proj_verts, faces, texture) 142 | 143 | rend = rend[0].data.cpu().numpy().transpose((0, 2, 3, 1)) 144 | rend = np.clip(rend, 0, 1) * 255.0 145 | 146 | if num_batch == 1: 147 | rend = rend[0] 148 | 149 | if not rend_mask and (alpha or img is not None): 150 | mask = self.renderer.render_silhouettes(proj_verts, faces) 151 | mask = mask.data.cpu().numpy() 152 | if img is not None: 153 | mask = np.repeat(np.expand_dims(mask, 3), 3, axis=3) 154 | if num_batch == 1: 155 | mask = mask[0] 156 | # TODO: Make sure img is [0, 255]!!! 157 | return (img * (1 - mask) + rend * mask).astype(np.uint8) 158 | else: 159 | # TODO: Temporary hack 160 | mask = mask.reshape((rend.shape[:2]) + (1,)) 161 | return self.make_alpha(rend, mask) 162 | else: 163 | return rend.astype(np.uint8) 164 | 165 | def rotated(self, 166 | verts, 167 | deg, 168 | axis='y', 169 | cam=None, 170 | texture=None, 171 | rend_mask=False, 172 | alpha=False, 173 | color_name='blue'): 174 | """ 175 | vert is N x 3, torch FloatTensor (or Variable) 176 | """ 177 | import cv2 178 | if axis == 'y': 179 | axis = [0, 1., 0] 180 | elif axis == 'x': 181 | axis = [1., 0, 0] 182 | else: 183 | axis = [0, 0, 1.] 184 | 185 | new_rot = cv2.Rodrigues(np.deg2rad(deg) * np.array(axis))[0] 186 | new_rot = to_float_tensor(new_rot) 187 | 188 | verts = to_float_tensor(verts) 189 | 190 | if get_dims(verts) == 2: 191 | # Make it in to 1 x N x 3 192 | verts = verts.unsqueeze(0) 193 | num_batch = verts.shape[0] 194 | 195 | new_rot = new_rot.unsqueeze(0) 196 | new_rot = new_rot.repeat(num_batch, 1, 1) 197 | 198 | center = verts.mean(1, keepdim=True) 199 | centered_v = (verts - center) 200 | new_verts = torch.matmul(new_rot, centered_v.permute(0, 2, 1)) 201 | new_verts = new_verts.permute(0, 2, 1) + center 202 | 203 | return self.__call__( 204 | new_verts, 205 | cam=cam, 206 | texture=texture, 207 | rend_mask=rend_mask, 208 | alpha=alpha, 209 | color_name=color_name 210 | ) 211 | 212 | def make_alpha(self, rend, mask): 213 | rend = rend.astype(np.uint8) 214 | alpha = (mask * 255).astype(np.uint8) 215 | 216 | imgA = np.dstack((rend, alpha)) 217 | return imgA 218 | 219 | def set_light_dir(self, direction, int_dir=0.8, int_amb=0.8): 220 | self.renderer.light_direction = direction 221 | self.renderer.light_intensity_directional = int_dir 222 | self.renderer.light_intensity_ambient = int_amb 223 | 224 | def set_bgcolor(self, color): 225 | self.renderer.background_color = color 226 | 227 | 228 | def to_variable(x): 229 | if type(x) is not torch.autograd.Variable: 230 | x = Variable(x, requires_grad=False) 231 | return x 232 | 233 | 234 | def to_float_tensor(x): 235 | if isinstance(x, np.ndarray): 236 | x = torch.FloatTensor(x).cuda() 237 | # ow assumed it's already a Tensor.. 
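    # (Clarifying note: 'ow' = otherwise; non-ndarray inputs are assumed to
    # already be torch Tensors and are returned unchanged.)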
238 | return x 239 | 240 | 241 | def convert_as(src, trg): 242 | src = src.type_as(trg) 243 | if src.is_cuda: 244 | src = src.cuda(device=trg.get_device()) 245 | if type(trg) is torch.autograd.Variable: 246 | src = Variable(src, requires_grad=False) 247 | return src 248 | 249 | 250 | def visualize_img(img, 251 | cam, 252 | kp_pred, 253 | vert, 254 | renderer, 255 | kp_gt=None, 256 | text={}, 257 | rotated_view=False, 258 | mesh_color='blue', 259 | pad_vals=None, 260 | no_text=False): 261 | """ 262 | Visualizes the image with the ground truth keypoints and 263 | predicted keypoints on left and image with mesh on right. 264 | Keypoints should be in normalized coordinates, not image coordinates. 265 | Args: 266 | img: Image. 267 | cam (3x1): Camera parameters. 268 | kp_gt: Ground truth keypoints. 269 | kp_pred: Predicted keypoints. 270 | vert: Vertices. 271 | renderer: SMPL renderer. 272 | text (dict): Optional information to include in the image. 273 | rotated_view (bool): If True, also visualizes mesh from another angle. 274 | if pad_vals (2,) is not None, removes those values from the image 275 | (undo img pad to make square) 276 | Returns: 277 | Combined image. 278 | """ 279 | img_size = img.shape[0] 280 | text.update({'sc': cam[0], 'tx': cam[1], 'ty': cam[2]}) 281 | if kp_gt is not None: 282 | gt_vis = kp_gt[:, 2].astype(bool) 283 | loss = np.sum((kp_gt[gt_vis, :2] - kp_pred[gt_vis])**2) 284 | text['kpl'] = loss 285 | 286 | # Undo pre-processing. 287 | # Make sure img is [0-255] 288 | input_img = ((img + 1) * 0.5) * 255. 289 | rend_img = renderer(vert, cam=cam, img=input_img, color_name=mesh_color) 290 | if not no_text: 291 | rend_img = draw_text(rend_img, text) 292 | 293 | # Draw skeletons 294 | pred_joint = ((kp_pred + 1) * 0.5) * img_size 295 | skel_img = draw_skeleton(input_img, pred_joint) 296 | if kp_gt is not None: 297 | gt_joint = ((kp_gt[:, :2] + 1) * 0.5) * img_size 298 | skel_img = draw_skeleton( 299 | skel_img, gt_joint, draw_edges=False, vis=gt_vis) 300 | 301 | if pad_vals is not None: 302 | skel_img = remove_pads(skel_img, pad_vals) 303 | rend_img = remove_pads(rend_img, pad_vals) 304 | if rotated_view: 305 | rot_img = renderer.rotated( 306 | vert, 90, cam=cam, alpha=False, color_name=mesh_color) 307 | if pad_vals is not None: 308 | rot_img = remove_pads(rot_img, pad_vals) 309 | 310 | return skel_img / 255, rend_img / 255, rot_img / 255 311 | 312 | else: 313 | return skel_img / 255, rend_img / 255 314 | 315 | 316 | def visualize_img_orig(cam, kp_pred, vert, renderer, start_pt, scale, 317 | proc_img_shape, im_path=None, img=None, 318 | rotated_view=False, mesh_color='blue', max_img_size=300, 319 | no_text=False, bbox=None, crop_cam=None): 320 | """ 321 | Visualizes the image with the ground truth keypoints and predicted keypoints 322 | in the original image space (squared). 323 | If you get out of memory error, make max_img_size smaller. 324 | Args: 325 | must supply either the im_path or img 326 | start_pt, scale, proc_img_shape are parameters used to preprocess the 327 | image. 328 | scale_result is how much to scale the current image 329 | Returns: 330 | Combined image. 331 | """ 332 | if img is None: 333 | img = imread(im_path) 334 | # Pre-process image to [-1, 1] bc it expects this. 335 | img = ((img / 255.) - 0.5) * 2 336 | if np.max(img.shape[:2]) > max_img_size: 337 | # if the image is too big it wont fit in gpu and nmr poops out. 338 | scale_orig = max_img_size / float(np.max(img.shape[:2])) 339 | img, _ = resize_img(img, scale_orig) 340 | undo_scale = (1. 
/ np.array(scale)) * scale_orig 341 | else: 342 | undo_scale = 1. / np.array(scale) 343 | 344 | if bbox is not None: 345 | assert(crop_cam is not None) 346 | img = img[bbox[0]:bbox[1], bbox[2]:bbox[3]] 347 | # For these, the cameras are already adjusted. 348 | start_pt = np.array([0, 0]) 349 | 350 | # NMR needs images to be square.. 351 | img, pad_vals = make_square(img) 352 | img_size = np.max(img.shape[:2]) 353 | renderer.renderer.image_size = img_size 354 | 355 | # Adjust kp_pred. 356 | # This is in 224x224 cropped space. 357 | pred_joint = ((kp_pred + 1) * 0.5) * proc_img_shape[0] 358 | # This is in the original image. 359 | pred_joint_orig = (pred_joint + start_pt - proc_img_shape[0]) * undo_scale 360 | 361 | # in normalize coord of the original image: 362 | kp_orig = 2 * (pred_joint_orig / img_size) - 1 363 | if bbox is not None: 364 | use_cam = crop_cam 365 | else: 366 | 367 | # This is camera in crop image coord. 368 | cam_crop = np.hstack([proc_img_shape[0] * cam[0] * 0.5, 369 | cam[1:] + (2./cam[0]) * 0.5]) 370 | 371 | # This is camera in orig image coord 372 | cam_orig = np.hstack([ 373 | cam_crop[0] * undo_scale, 374 | cam_crop[1:] + (start_pt - proc_img_shape[0]) / cam_crop[0] 375 | ]) 376 | 377 | # This is the camera in normalized orig_image coord 378 | new_cam = np.hstack([ 379 | cam_orig[0] * (2. / img_size), 380 | cam_orig[1:] - (1 / ((2./img_size) * cam_orig[0])) 381 | ]) 382 | new_cam = new_cam.astype(np.float32) 383 | use_cam = new_cam 384 | 385 | # Call visualize_img with this camera: 386 | rendered_orig = visualize_img( 387 | img=img, 388 | cam=use_cam, 389 | kp_pred=kp_orig, 390 | vert=vert, 391 | renderer=renderer, 392 | rotated_view=rotated_view, 393 | mesh_color=mesh_color, 394 | pad_vals=pad_vals, 395 | no_text=no_text, 396 | ) 397 | 398 | return rendered_orig 399 | 400 | 401 | def visualize_mesh_og(cam, vert, renderer, start_pt, scale, proc_img_shape, 402 | im_path=None, img=None, deg=0, mesh_color='blue', 403 | max_img_size=300, pad=50, crop_cam=None, bbox=None): 404 | """ 405 | Visualize mesh in original image space. 406 | If you get out of memory error, make max_img_size smaller. 407 | If crop_cam and bbox is not None, 408 | crops the image and uses the crop_cam to render. 409 | (See compute_video_bbox.py) 410 | """ 411 | if img is None: 412 | img = imread(im_path) 413 | # Pre-process image to [-1, 1] bc it expects this. 414 | img = ((img / 255.) - 0.5) * 2 415 | 416 | if bbox is not None: 417 | assert(crop_cam is not None) 418 | img = img[bbox[0]:bbox[1], bbox[2]:bbox[3]] 419 | # For these, the cameras are already adjusted. 420 | scale = 1. 421 | start_pt = np.array([0, 0]) 422 | if np.max(img.shape[:2]) > max_img_size: 423 | # if the image is too big it wont fit in gpu and nmr poops out. 424 | scale_orig = max_img_size / float(np.max(img.shape[:2])) 425 | img, _ = resize_img(img, scale_orig) 426 | undo_scale = (1. / np.array(scale)) * scale_orig 427 | else: 428 | undo_scale = 1. / np.array(scale) 429 | # NMR needs images to be square.. 430 | img, pad_vals = make_square(img) 431 | img_size = np.max(img.shape[:2]) 432 | renderer.renderer.image_size = img_size 433 | 434 | if bbox is not None: 435 | return renderer.rotated( 436 | verts=vert, 437 | deg=deg, 438 | cam=crop_cam, 439 | color_name=mesh_color, 440 | ) 441 | else: 442 | # This is camera in crop image coord. 
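        # (Added explanation, a sketch of the reasoning rather than original
        # documentation.) The predicted camera acts on normalized crop coords,
        # x -> cam[0] * (x + cam[1:]). Going to crop pixel coords of size
        # proc_img_shape[0] multiplies the scale by proc_img_shape[0] / 2 and
        # folds the +1 offset into the translation, hence the 0.5 terms below.
        # The next two blocks repeat the same re-parameterization for
        # original-image pixels and then for normalized original-image coords.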
443 | cam_crop = np.hstack([proc_img_shape[0] * cam[0] * 0.5, 444 | cam[1:] + (2./cam[0]) * 0.5]) 445 | 446 | # This is camera in orig image coord 447 | cam_orig = np.hstack([ 448 | cam_crop[0] * undo_scale, 449 | cam_crop[1:] + (start_pt - proc_img_shape[0]) / cam_crop[0] 450 | ]) 451 | 452 | # This is the camera in normalized orig_image coord 453 | new_cam = np.hstack([ 454 | cam_orig[0] * (2. / img_size), 455 | cam_orig[1:] - (1 / ((2./img_size) * cam_orig[0])) 456 | ]) 457 | new_cam = new_cam.astype(np.float32) 458 | 459 | return renderer.rotated( 460 | verts=vert, 461 | deg=deg, 462 | cam=new_cam, 463 | color_name=mesh_color, 464 | ) 465 | 466 | 467 | def make_square(img): 468 | """ 469 | Bc nmr only deals with square image, adds pad to the shorter side. 470 | """ 471 | img_size = np.max(img.shape[:2]) 472 | pad_vals = img_size - img.shape[:2] 473 | 474 | img = np.pad( 475 | array=img, 476 | pad_width=((0, pad_vals[0]), (0, pad_vals[1]), (0, 0)), 477 | mode='constant' 478 | ) 479 | 480 | return img, pad_vals 481 | 482 | 483 | def remove_pads(img, pad_vals): 484 | """ 485 | Undos padding done by make_square. 486 | """ 487 | 488 | if pad_vals[0] != 0: 489 | img = img[:-pad_vals[0], :] 490 | if pad_vals[1] != 0: 491 | img = img[:, :-pad_vals[1]] 492 | return img 493 | 494 | 495 | def compute_video_bbox(cams, kps, proc_infos, margin=10): 496 | """ 497 | Given the prediction and original image info, 498 | figures out the min/max extent (bbox) 499 | of the person in the entire video. 500 | Adjust the cameras so now ppl project in this new bbox. 501 | Needed to crop the video around the person and also to 502 | rotate the mesh. 503 | cams: N x 3, predicted camera 504 | joints: N x K x 3, predicted 3D joints for debug 505 | kp: N x K x 3, predicted 2D joints to figure out extent 506 | proc_infos: dict holding: 507 | start_pt, scale: N x 2, N x 1 508 | preprocessing done on this image. 509 | im_shape: image shape after preprocessing 510 | im_path: to the first image to figure out size of orig video 511 | """ 512 | im_path = proc_infos[0]['im_path'] 513 | img = imread(im_path) 514 | img_h, img_w = img.shape[:2] 515 | img_size = np.max([img_h, img_w]) 516 | 517 | im_shape = proc_infos[0]['im_shape'][0] 518 | 519 | new_cams = [] 520 | bboxes = [] 521 | # For each image, get the joints in the original coord frame: 522 | for i, (proc_info, kp, cam) in enumerate(zip(proc_infos, kps, cams)): 523 | scale = proc_info['scale'] 524 | start_pt = proc_info['start_pt'] 525 | 526 | undo_scale = 1. / np.array(scale) 527 | # Adjust kp_pred. 528 | # This is in 224x224 cropped space. 529 | pred_joint = ((kp + 1) * 0.5) * im_shape 530 | # This is in the original image. 531 | pred_joint_orig = (pred_joint + start_pt - im_shape) * undo_scale 532 | # in normalize coord of the original image: 533 | # kp_orig = 2 * (pred_joint_orig / img_size) - 1 534 | # This is camera in crop image coord (224x224). 535 | cam_crop = np.hstack([im_shape * cam[0] * 0.5, 536 | cam[1:] + (2./cam[0]) * 0.5]) 537 | # This is camera in orig image coord 538 | cam_orig = np.hstack([ 539 | cam_crop[0] * undo_scale, 540 | cam_crop[1:] + (start_pt - im_shape) / cam_crop[0] 541 | ]) 542 | # This is the camera in normalized orig_image coord 543 | new_cam = np.hstack([ 544 | cam_orig[0] * (2. 
/ img_size), 545 | cam_orig[1:] - (1 / ((2./img_size) * cam_orig[0])) 546 | ]) 547 | new_cams.append(new_cam.astype(np.float32)) 548 | x = pred_joint_orig[:, 0] 549 | y = pred_joint_orig[:, 1] 550 | ymin = max(0, min(y) - margin) 551 | ymax = min(img_h - 1, max(y) + margin) 552 | xmin = max(0, min(x) - margin) 553 | xmax = min(img_w - 1, max(x) + margin) 554 | bbox = np.array([ymin, ymax, xmin, xmax]) 555 | 556 | bboxes.append(bbox) 557 | 558 | # Figure out the video level bbox. 559 | # bbox is in format [ymin, ymax, xmin, xmax] 560 | bboxes = np.stack(bboxes) 561 | bbox = np.array([ 562 | np.min(bboxes[:, 0]), 563 | np.max(bboxes[:, 1]), 564 | np.min(bboxes[:, 2]), 565 | np.max(bboxes[:, 3]) 566 | ]) 567 | bbox = bbox.astype(np.int) 568 | # Now adjust the cams by this bbox offset. 569 | ymin, xmin = bbox[0], bbox[2] 570 | new_offset = np.array([xmin, ymin]) 571 | new_offset_norm = np.linalg.norm(new_offset) 572 | img_size_crop = np.max([bbox[1] - bbox[0], bbox[3] - bbox[2]]) 573 | 574 | # Rotated images: save delta translation 575 | new_cams_cropped = [] 576 | 577 | for i, (proc_info, kp, cam) in enumerate(zip(proc_infos, kps, cams)): 578 | scale = proc_info['scale'] 579 | 580 | undo_scale = 1. / np.array(scale) 581 | start_pt0 = proc_info['start_pt'] 582 | 583 | start_pt = start_pt0 - (new_offset * scale) 584 | 585 | if np.linalg.norm(proc_info['start_pt']) < new_offset_norm: 586 | print('crop is more than start pt..?') 587 | import ipdb; ipdb.set_trace() 588 | 589 | # This is camera in crop image coord (224x224). 590 | cam_crop = np.hstack([im_shape * cam[0] * 0.5, 591 | cam[1:] + (2./cam[0]) * 0.5]) 592 | 593 | # This is camera in orig image coord 594 | cam_orig = np.hstack([ 595 | cam_crop[0] * undo_scale, 596 | cam_crop[1:] + (start_pt - im_shape) / cam_crop[0] 597 | ]) 598 | 599 | # This is the camera in normalized orig_image coord 600 | new_cam = np.hstack([ 601 | cam_orig[0] * (2. / img_size_crop), 602 | cam_orig[1:] - (1 / ((2./img_size_crop) * cam_orig[0])) 603 | ]) 604 | new_cams_cropped.append(new_cam.astype(np.float32)) 605 | 606 | return bbox, new_cams_cropped 607 | 608 | 609 | def get_params_from_omega(smpl_model, regressor, omega, cam=None): 610 | cam = omega[:3] if cam is None else cam 611 | pose = omega[3:3 + 72] 612 | shape = omega[75:] 613 | smpl_model.pose[:] = pose 614 | smpl_model.betas[:] = shape 615 | verts = np.copy(smpl_model.r) 616 | joints = regressor.dot(verts) 617 | kps = cam[0] * (joints[:, :2] + cam[1:]) 618 | return { 619 | 'cam': cam, 620 | 'joints': joints, 621 | 'kps': kps, 622 | 'pose': pose, 623 | 'shape': shape, 624 | 'verts': verts, 625 | } 626 | -------------------------------------------------------------------------------- /src/tf_smpl/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonyzhang/phd/7b7f526d45913902ed93cdc49fdd59272698bd71/src/tf_smpl/__init__.py -------------------------------------------------------------------------------- /src/tf_smpl/batch_lbs.py: -------------------------------------------------------------------------------- 1 | """ Util functions for SMPL 2 | @@batch_skew 3 | @@batch_rodrigues 4 | @@batch_lrotmin 5 | @@batch_global_rigid_transformation 6 | """ 7 | import tensorflow as tf 8 | 9 | 10 | def batch_skew(vec, batch_size=None): 11 | """ 12 | vec is N x 3, batch_size is int 13 | 14 | returns N x 3 x 3. Skew_sym version of each matrix. 
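
    For example (an illustrative case, not from the original docstring), a
    single row vec = [x, y, z] maps to
        [[ 0, -z,  y],
         [ z,  0, -x],
         [-y,  x,  0]].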
15 | """ 16 | with tf.name_scope("batch_skew", values=[vec]): 17 | if batch_size is None: 18 | batch_size = vec.shape.as_list()[0] 19 | col_inds = tf.constant([1, 2, 3, 5, 6, 7]) 20 | indices = tf.reshape( 21 | tf.reshape(tf.range(0, batch_size) * 9, [-1, 1]) + col_inds, 22 | [-1, 1]) 23 | updates = tf.reshape( 24 | tf.stack( 25 | [ 26 | -vec[:, 2], vec[:, 1], vec[:, 2], -vec[:, 0], -vec[:, 1], 27 | vec[:, 0] 28 | ], 29 | axis=1), [-1]) 30 | out_shape = [batch_size * 9] 31 | res = tf.scatter_nd(indices, updates, out_shape) 32 | res = tf.reshape(res, [batch_size, 3, 3]) 33 | 34 | return res 35 | 36 | 37 | def batch_rodrigues(theta, name=None): 38 | """ 39 | Theta is N x 3 40 | """ 41 | with tf.name_scope(name, "batch_rodrigues", values=[theta]): 42 | batch_size = theta.shape.as_list()[0] 43 | angle = tf.expand_dims(tf.norm(theta + 1e-8, axis=1), -1) 44 | r = tf.expand_dims(tf.div(theta, angle), -1) 45 | 46 | angle = tf.expand_dims(angle, -1) 47 | cos = tf.cos(angle) 48 | sin = tf.sin(angle) 49 | 50 | outer = tf.matmul(r, r, transpose_b=True, name="outer") 51 | 52 | eyes = tf.tile(tf.expand_dims(tf.eye(3), 0), [batch_size, 1, 1]) 53 | R = cos * eyes + (1 - cos) * outer + sin * batch_skew( 54 | r, batch_size=batch_size) 55 | return R 56 | 57 | 58 | def batch_rot2aa(Rs): 59 | """ 60 | Rs is B x 3 x 3 61 | void cMathUtil::RotMatToAxisAngle(const tMatrix& mat, tVector& out_axis, 62 | double& out_theta) 63 | { 64 | double c = 0.5 * (mat(0, 0) + mat(1, 1) + mat(2, 2) - 1); 65 | c = cMathUtil::Clamp(c, -1.0, 1.0); 66 | 67 | out_theta = std::acos(c); 68 | 69 | if (std::abs(out_theta) < 0.00001) 70 | { 71 | out_axis = tVector(0, 0, 1, 0); 72 | } 73 | else 74 | { 75 | double m21 = mat(2, 1) - mat(1, 2); 76 | double m02 = mat(0, 2) - mat(2, 0); 77 | double m10 = mat(1, 0) - mat(0, 1); 78 | double denom = std::sqrt(m21 * m21 + m02 * m02 + m10 * m10); 79 | out_axis[0] = m21 / denom; 80 | out_axis[1] = m02 / denom; 81 | out_axis[2] = m10 / denom; 82 | out_axis[3] = 0; 83 | } 84 | } 85 | """ 86 | cos = 0.5 * (tf.trace(Rs) - 1) 87 | cos = tf.clip_by_value(cos, -1, 1) 88 | 89 | theta = tf.acos(cos) 90 | 91 | m21 = Rs[:, 2, 1] - Rs[:, 1, 2] 92 | m02 = Rs[:, 0, 2] - Rs[:, 2, 0] 93 | m10 = Rs[:, 1, 0] - Rs[:, 0, 1] 94 | denom = tf.sqrt(m21 * m21 + m02 * m02 + m10 * m10) 95 | 96 | axis0 = tf.where(tf.abs(theta) < 0.00001, m21, m21 / denom) 97 | axis1 = tf.where(tf.abs(theta) < 0.00001, m02, m02 / denom) 98 | axis2 = tf.where(tf.abs(theta) < 0.00001, m10, m10 / denom) 99 | 100 | return tf.expand_dims(theta, 1) * tf.stack([axis0, axis1, axis2], 1) 101 | 102 | 103 | def batch_lrotmin(theta, name=None): 104 | """ NOTE: not used bc I want to reuse R and this is simple. 105 | Output of this is used to compute joint-to-pose blend shape mapping. 106 | Equation 9 in SMPL paper. 107 | 108 | 109 | Args: 110 | pose: `Tensor`, N x 72 vector holding the axis-angle rep of K joints. 111 | This includes the global rotation so K=24 112 | 113 | Returns 114 | diff_vec : `Tensor`: N x 207 rotation matrix of 23=(K-1) joints with 115 | identity subtracted., 116 | """ 117 | with tf.name_scope(name, "batch_lrotmin", [theta]): 118 | with tf.name_scope("ignore_global"): 119 | theta = theta[:, 3:] 120 | 121 | # N*23 x 3 x 3 122 | Rs = batch_rodrigues(tf.reshape(theta, [-1, 3])) 123 | lrotmin = tf.reshape(Rs - tf.eye(3), [-1, 207]) 124 | 125 | return lrotmin 126 | 127 | 128 | def batch_global_rigid_transformation(Rs, Js, parent, rotate_base=False): 129 | """ 130 | Computes absolute joint locations given pose. 
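    (Clarifying note, not part of the original docstring: each joint's world
    transform is its parent's transform composed with a local transform built
    from the joint's rotation and its offset from the parent, i.e. standard
    forward kinematics along the kinematic tree.)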
131 | 
132 |     rotate_base: if True, rotates the global rotation by 90 deg in x axis.
133 |     if False, this is the original SMPL coordinate.
134 | 
135 |     Args:
136 |       Rs: N x 24 x 3 x 3 rotation matrices of K joints
137 |       Js: N x 24 x 3, joint locations before posing
138 |       parent: 24 holding the parent id for each index
139 | 
140 |     Returns
141 |       new_J : `Tensor`: N x 24 x 3 location of absolute joints
142 |       A : `Tensor`: N x 24 x 4 x 4 relative joint transformations for LBS.
143 |     """
144 |     with tf.name_scope("batch_forward_kinematics", values=[Rs, Js]):
145 |         N = Rs.shape[0].value
146 |         if rotate_base:
147 |             print('Flipping the SMPL coordinate frame!!!!')
148 |             rot_x = tf.constant(
149 |                 [[1, 0, 0], [0, -1, 0], [0, 0, -1]], dtype=Rs.dtype)
150 |             rot_x = tf.reshape(tf.tile(rot_x, [N, 1]), [N, 3, 3])
151 |             root_rotation = tf.matmul(Rs[:, 0, :, :], rot_x)
152 |         else:
153 |             root_rotation = Rs[:, 0, :, :]
154 | 
155 |         # Now Js is N x 24 x 3 x 1
156 |         Js = tf.expand_dims(Js, -1)
157 | 
158 |         def make_A(R, t, name=None):
159 |             # Rs is N x 3 x 3, ts is N x 3 x 1
160 |             with tf.name_scope(name, "Make_A", [R, t]):
161 |                 R_homo = tf.pad(R, [[0, 0], [0, 1], [0, 0]])
162 |                 t_homo = tf.concat([t, tf.ones([N, 1, 1])], 1)
163 |                 return tf.concat([R_homo, t_homo], 2)
164 | 
165 |         A0 = make_A(root_rotation, Js[:, 0])
166 |         results = [A0]
167 |         for i in range(1, parent.shape[0]):
168 |             j_here = Js[:, i] - Js[:, parent[i]]
169 |             A_here = make_A(Rs[:, i], j_here)
170 |             res_here = tf.matmul(
171 |                 results[parent[i]], A_here, name="propA%d" % i)
172 |             results.append(res_here)
173 | 
174 |         # N x 24 x 4 x 4
175 |         results = tf.stack(results, axis=1)
176 | 
177 |         new_J = results[:, :, :3, 3]
178 | 
179 |         # --- Compute relative A: Skinning is based on
180 |         # how much the bone moved (not the final location of the bone)
181 |         # but (final_bone - init_bone)
182 |         # ---
183 |         Js_w0 = tf.concat([Js, tf.zeros([N, 24, 1, 1])], 2)
184 |         init_bone = tf.matmul(results, Js_w0)
185 |         # Append empty 4 x 3:
186 |         init_bone = tf.pad(init_bone, [[0, 0], [0, 0], [0, 0], [3, 0]])
187 |         A = results - init_bone
188 | 
189 |         return new_J, A
190 | 
--------------------------------------------------------------------------------
/src/tf_smpl/batch_smpl.py:
--------------------------------------------------------------------------------
1 | """
2 | Tensorflow SMPL implementation as batch.
3 | Specify joint types:
4 |   'cocoplus': Returns COCO+ 19 joints
5 |   'lsp': Returns H3.6M-LSP 14 joints
6 | Note: To get original smpl joints, use self.J_transformed
7 | """
8 | import ipdb
9 | import numpy as np
10 | import pickle
11 | 
12 | import tensorflow as tf
13 | from src.tf_smpl.batch_lbs import (
14 |     batch_rodrigues,
15 |     batch_global_rigid_transformation
16 | )
17 | 
18 | 
19 | # There are chumpy variables so convert them to numpy.
20 | def undo_chumpy(x):
21 |     return x if isinstance(x, np.ndarray) else x.r
22 | 
23 | 
24 | class SMPL(object):
25 |     def __init__(self, pkl_path, joint_type='cocoplus', dtype=tf.float32):
26 |         """
27 |         pkl_path is the path to a SMPL model
28 |         """
29 |         # -- Load SMPL params --
30 |         with open(pkl_path, 'rb') as f:
31 |             dd = pickle.load(f, encoding='latin1')
32 |         # Mean template vertices
33 |         self.v_template = tf.Variable(
34 |             undo_chumpy(dd['v_template']),
35 |             name='v_template',
36 |             dtype=dtype,
37 |             trainable=False)
38 |         # Size of mesh [Number of vertices, 3]
39 |         self.size = [self.v_template.shape[0].value, 3]
40 |         self.num_betas = dd['shapedirs'].shape[-1]
41 |         # Shape blend shape basis: 6890 x 3 x 10
42 |         # reshaped to 6890*3 x 10, transposed to 10 x 6890*3
43 |         shapedir = np.reshape(
44 |             undo_chumpy(dd['shapedirs']), [-1, self.num_betas]).T
45 |         self.shapedirs = tf.Variable(
46 |             shapedir, name='shapedirs', dtype=dtype, trainable=False)
47 | 
48 |         # Regressor for joint locations given shape - 6890 x 24
49 |         self.J_regressor = tf.Variable(
50 |             dd['J_regressor'].T.todense(),
51 |             name="J_regressor",
52 |             dtype=dtype,
53 |             trainable=False)
54 | 
55 |         # Pose blend shape basis: 6890 x 3 x 207, reshaped to 6890*3 x 207
56 |         num_pose_basis = dd['posedirs'].shape[-1]
57 |         # 207 x 20670
58 |         posedirs = np.reshape(
59 |             undo_chumpy(dd['posedirs']), [-1, num_pose_basis]).T
60 |         self.posedirs = tf.Variable(
61 |             posedirs, name='posedirs', dtype=dtype, trainable=False)
62 | 
63 |         # indices of parents for each joint
64 |         self.parents = dd['kintree_table'][0].astype(np.int32)
65 | 
66 |         # LBS weights
67 |         self.weights = tf.Variable(
68 |             undo_chumpy(dd['weights']),
69 |             name='lbs_weights',
70 |             dtype=dtype,
71 |             trainable=False)
72 | 
73 |         # This returns 19 keypoints: 6890 x 19
74 |         self.joint_regressor = tf.Variable(
75 |             dd['cocoplus_regressor'].T.todense(),
76 |             name="cocoplus_regressor",
77 |             dtype=dtype,
78 |             trainable=False)
79 |         if joint_type == 'lsp':  # 14 LSP joints!
80 |             self.joint_regressor = self.joint_regressor[:, :14]
81 | 
82 |         if joint_type not in ['cocoplus', 'lsp']:
83 |             print('Unknown joint type: {}, it must be either "cocoplus" '
84 |                   'or "lsp"'.format(joint_type))
85 |             ipdb.set_trace()
86 | 
87 |     def __call__(self, beta, theta, get_skin=False, name=None):
88 |         """
89 |         Obtain SMPL with shape (beta) & pose (theta) inputs.
90 |         Theta includes the global rotation.
91 |         Args:
92 |           beta: N x 10
93 |           theta: N x 72 (with 3-D axis-angle rep)
94 | 
95 |         Updates:
96 |           self.J_transformed: N x 24 x 3 joint location after shaping
97 |             & posing with beta and theta
98 |         Returns:
99 |           - joints: N x 19 x 3 or N x 14 x 3 joint locations, depending on joint_type
100 |           If get_skin is True, also returns
101 |           - Verts: N x 6890 x 3
102 |         """
103 |         with tf.name_scope(name, "smpl_main", [beta, theta]):
104 |             num_batch = beta.shape[0].value
105 | 
106 |             # 1. Add shape blend shapes
107 |             # (N x 10) x (10 x 6890*3) = N x 6890 x 3
108 |             v_shaped = tf.reshape(
109 |                 tf.matmul(beta, self.shapedirs, name='shape_bs'),
110 |                 [-1, self.size[0], self.size[1]]) + self.v_template
111 | 
112 |             # 2. Infer shape-dependent joint locations.
113 |             Jx = tf.matmul(v_shaped[:, :, 0], self.J_regressor)
114 |             Jy = tf.matmul(v_shaped[:, :, 1], self.J_regressor)
115 |             Jz = tf.matmul(v_shaped[:, :, 2], self.J_regressor)
116 |             J = tf.stack([Jx, Jy, Jz], axis=2)
117 | 
118 |             # 3. Add pose blend shapes
119 |             # N x 24 x 3 x 3
120 |             # only do rodrigues if theta has axis angle representation
121 |             Rs = tf.reshape(
122 |                 batch_rodrigues(tf.reshape(theta, [-1, 3])), [-1, 24, 3, 3])
123 |             with tf.name_scope("lrotmin"):
124 |                 # Ignore global rotation.
125 |                 pose_feature = tf.reshape(Rs[:, 1:, :, :] - tf.eye(3),
126 |                                           [-1, 207])
127 | 
128 |             # (N x 207) x (207, 20670) -> N x 6890 x 3
129 |             v_posed = tf.reshape(
130 |                 tf.matmul(pose_feature, self.posedirs),
131 |                 [-1, self.size[0], self.size[1]]) + v_shaped
132 | 
133 |             # 4. Get the global joint location
134 |             self.J_transformed, A = batch_global_rigid_transformation(
135 |                 Rs, J, self.parents)
136 | 
137 |             # 5. Do skinning:
138 |             # W is N x 6890 x 24
139 |             W = tf.reshape(
140 |                 tf.tile(self.weights, [num_batch, 1]), [num_batch, -1, 24])
141 |             # (N x 6890 x 24) x (N x 24 x 16)
142 |             T = tf.reshape(
143 |                 tf.matmul(W, tf.reshape(A, [num_batch, 24, 16])),
144 |                 [num_batch, -1, 4, 4])
145 |             v_posed_homo = tf.concat(
146 |                 [v_posed, tf.ones([num_batch, v_posed.shape[1], 1])], 2)
147 |             v_homo = tf.matmul(T, tf.expand_dims(v_posed_homo, -1))
148 | 
149 |             verts = v_homo[:, :, :3, 0]
150 | 
151 |             # Get cocoplus or lsp joints:
152 |             joint_x = tf.matmul(verts[:, :, 0], self.joint_regressor)
153 |             joint_y = tf.matmul(verts[:, :, 1], self.joint_regressor)
154 |             joint_z = tf.matmul(verts[:, :, 2], self.joint_regressor)
155 |             joints = tf.stack([joint_x, joint_y, joint_z], axis=2)
156 | 
157 |             if get_skin:
158 |                 return verts, joints, Rs
159 |             else:
160 |                 return joints
161 | 
--------------------------------------------------------------------------------
/src/tf_smpl/projection.py:
--------------------------------------------------------------------------------
1 | """
2 | Util functions implementing the camera
3 | 
4 | @@batch_orth_proj_idrot
5 | @@batch_orth_proj_optcam
6 | @@procrustes2d_vis
7 | """
8 | 
9 | import tensorflow as tf
10 | 
11 | 
12 | def batch_orth_proj_idrot(X, camera, name=None):
13 |     """
14 |     X is N x num_points x 3
15 |     camera is N x 3
16 |     same as applying orth_proj_idrot to each N
17 |     """
18 |     with tf.name_scope(name, "batch_orth_proj_idrot", [X, camera]):
19 |         # TODO check X dim size.
20 |         # tf.Assert(X.shape[2] == 3, [X])
21 | 
22 |         camera = tf.reshape(camera, [-1, 1, 3], name="cam_adj_shape")
23 | 
24 |         X_trans = X[:, :, :2] + camera[:, :, 1:]
25 | 
26 |         shape = tf.shape(X_trans)
27 |         return tf.reshape(
28 |             camera[:, :, 0] * tf.reshape(X_trans, [shape[0], -1]), shape)
29 | 
30 | 
31 | def batch_orth_proj_optcam(X, X_gt, unbounded=False, name=None):
32 |     """
33 |     Solves for the best scale and translation in 2D, i.e.
34 |     gives (s, t) that minimize ||s(x + t) - x_gt||^2.
35 |     X is N x K x 2, for [x, y] pred (via identity).
36 |     X_gt is N x K x 3, the 3rd dim is visibility
37 | 
38 |     returns proj_x: N x K x 2 and best_cam: [scale, trans]
39 |     """
40 |     with tf.name_scope(name, "batch_orth_proj_optcam", [X, X_gt]):
41 |         best_cam = procrustes2d_vis(X, X_gt, unbounded=unbounded)
42 |         best_cam = tf.stop_gradient(best_cam)
43 |         proj_x = batch_orth_proj_idrot(X, best_cam)
44 |         return proj_x, best_cam
45 | 
46 | 
47 | def procrustes2d_vis(X, X_target, unbounded=True):
48 |     """
49 |     Solves for the optimal scale and translation in 2D, i.e.
50 |     gives (s, t) that minimize ||s(x + t) - x_gt||^2
51 |     on *visible* points.
52 | 
53 |     Gradient is stopped on the computed camera.
54 | 
55 |     if unbounded is False (i.e. bounded), it lower bounds the scale
56 |     so it can't be so small.
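    (Clarifying note, not part of the original docstring: when bounded, the
    scale is clipped to [0.7, 10] below; the upper bound exists only because
    tf.clip_by_value requires one.)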
57 | 58 | X: N x K x 2 or N x K x 3 (last dim is dropped) 59 | X_target: N x K x 3, 3rd dim is visibility. 60 | 61 | returns best_cam: N x 3 62 | """ 63 | assert len(X_target.shape) == 3 64 | with tf.name_scope('procrustes2d_vis', values=[X, X_target]): 65 | # Turn vis into [0, 1] 66 | vis = tf.cast(X_target[:, :, 2] > 0, tf.float32) 67 | vis_vec = tf.expand_dims(vis, 2) 68 | # Prepare data. 69 | x_target = X_target[:, :, :2] 70 | x = X[:, :, :2] 71 | # Start: 72 | # Make sure invisible points dont contribute 73 | # (They could be not always 0...) 74 | x_vis = vis_vec * x 75 | x_target_vis = vis_vec * x_target 76 | 77 | num_vis = tf.expand_dims(tf.reduce_sum(vis, 1, keepdims=True), 2) 78 | 79 | # need to compute mean ignoring the non-vis 80 | mu1 = tf.reduce_sum(x_vis, 1, keepdims=True) / num_vis 81 | mu2 = tf.reduce_sum(x_target_vis, 1, keepdims=True) / num_vis 82 | # Need to 0 out the ignore region again 83 | xmu = vis_vec * (x - mu1) 84 | y = vis_vec * (x_target - mu2) 85 | 86 | # A_inv = inv(x'x) 87 | # scale = trace(A_inv * (x'x_target) ) / 2. 88 | # trans = mu_target / scale - mu 89 | # Add noise on the diagonal to avoid numerical instability 90 | # for taking inv. 91 | eps = 1e-6 * tf.eye(2) 92 | Ainv = tf.matrix_inverse(tf.matmul(xmu, xmu, transpose_a=True) + eps) 93 | B = tf.matmul(xmu, y, transpose_a=True) 94 | 95 | scale = tf.expand_dims(tf.trace(tf.matmul(Ainv, B)) / 2., 1) 96 | 97 | if not unbounded: 98 | print('Optcam: lowerbound scale') 99 | # only need the lower bound, but setting max to 10 bc tf doesn't 100 | # take None. 101 | scale = tf.clip_by_value(scale, 0.7, 10) 102 | 103 | trans = tf.squeeze(mu2) / scale - tf.squeeze(mu1) 104 | 105 | best_cam = tf.concat([scale, trans], 1) 106 | return best_cam 107 | -------------------------------------------------------------------------------- /src/util/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonyzhang/phd/7b7f526d45913902ed93cdc49fdd59272698bd71/src/util/__init__.py -------------------------------------------------------------------------------- /src/util/common.py: -------------------------------------------------------------------------------- 1 | import os 2 | import os.path as osp 3 | 4 | import cv2 5 | import numpy as np 6 | 7 | 8 | def resize_img(img, scale_factor): 9 | new_size = (np.floor(np.array(img.shape[0:2]) * scale_factor)).astype(int) 10 | new_img = cv2.resize(img, (new_size[1], new_size[0])) 11 | # This is scale factor of [height, width] i.e. [y, x] 12 | actual_factor = [ 13 | new_size[0] / float(img.shape[0]), new_size[1] / float(img.shape[1]) 14 | ] 15 | return new_img, actual_factor 16 | 17 | 18 | def mkdir(dir_path): 19 | if not osp.exists(dir_path): 20 | os.makedirs(dir_path) 21 | -------------------------------------------------------------------------------- /src/util/render_utils.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | 4 | 5 | def add_alpha(img, alpha=1): 6 | shape = img.shape[:2] + (1,) 7 | alpha_channel = alpha * np.ones(shape) 8 | return np.dstack((img, alpha_channel)) 9 | 10 | 11 | def draw_text(input_image, content): 12 | """ 13 | content is a dict. 
draws key: val on image 14 | Assumes key is str, val is float 15 | """ 16 | image = input_image.copy() 17 | input_is_float = False 18 | if np.issubdtype(image.dtype, np.float): 19 | input_is_float = True 20 | image = (image * 255).astype(np.uint8) 21 | 22 | green = [57, 255, 20] 23 | margin = 15 24 | start_x = 5 25 | start_y = margin 26 | for key in sorted(content.keys()): 27 | value = content[key] 28 | if isinstance(value, str): 29 | text = '{}: {}'.format(key, value) 30 | else: 31 | text = "%s: %.2g" % (key, value) 32 | cv2.putText(image, text, (start_x, start_y), 0, 0.5, green, thickness=2) 33 | start_y += margin 34 | 35 | if input_is_float: 36 | image = image.astype(np.float32) / 255. 37 | return image 38 | 39 | 40 | def draw_skeleton(input_image, joints, draw_edges=True, vis=None, radius=None): 41 | """ 42 | joints is 3 x 19. but if not will transpose it. 43 | 0: Right heel 44 | 1: Right knee 45 | 2: Right hip 46 | 3: Left hip 47 | 4: Left knee 48 | 5: Left heel 49 | 6: Right wrist 50 | 7: Right elbow 51 | 8: Right shoulder 52 | 9: Left shoulder 53 | 10: Left elbow 54 | 11: Left wrist 55 | 12: Neck 56 | 13: Head top 57 | 14: nose 58 | 15: left_eye 59 | 16: right_eye 60 | 17: left_ear 61 | 18: right_ear 62 | 19: left big toe 63 | 20: right big toe 64 | 21: Left small toe 65 | 22: Right small toe 66 | 23: L ankle 67 | 24: R ankle 68 | """ 69 | if radius is None: 70 | radius = max(4, (np.mean(input_image.shape[:2]) * 0.01).astype(int)) 71 | 72 | colors = { 73 | 'pink': [197, 27, 125], # L lower leg 74 | 'light_pink': [233, 163, 201], # L upper leg 75 | 'light_green': [161, 215, 106], # L lower arm 76 | 'green': [77, 146, 33], # L upper arm 77 | 'red': [215, 48, 39], # head 78 | 'light_red': [252, 146, 114], # head 79 | 'light_orange': [252, 141, 89], # chest 80 | 'orange': [200,90,39], 81 | 'purple': [118, 42, 131], # R lower leg 82 | 'light_purple': [175, 141, 195], # R upper 83 | 'light_blue': [145, 191, 219], # R lower arm 84 | 'blue': [69, 117, 180], # R upper arm 85 | 'gray': [130, 130, 130], # 86 | 'white': [255, 255, 255], # 87 | } 88 | 89 | image = input_image.copy() 90 | input_is_float = False 91 | 92 | if (np.issubdtype(image.dtype, np.float32) or 93 | np.issubdtype(image.dtype, np.float64)): 94 | input_is_float = True 95 | max_val = image.max() 96 | if max_val <= 2.: # should be 1 but sometimes it's slightly above 1 97 | image = (image * 255).astype(np.uint8) 98 | else: 99 | image = (image).astype(np.uint8) 100 | 101 | if joints.shape[0] != 2: 102 | joints = joints.T 103 | joints = np.round(joints).astype(int) 104 | 105 | jcolors = [ 106 | 'light_pink', 'light_pink', 'light_pink', 'pink', 'pink', 'pink', 107 | 'light_blue', 'light_blue', 'light_blue', 'blue', 'blue', 'blue', 108 | 'purple', 'purple', 'red', 'green', 'green', 'white', 'white', 109 | 'orange','light_orange','orange','light_orange','pink','light_pink' 110 | ] 111 | 112 | if joints.shape[1] == 19: 113 | # parent indices -1 means no parents 114 | parents = np.array([ 115 | 1, 2, 8, 9, 3, 4, 7, 8, 12, 12, 9, 10, 14, -1, 13, -1, -1, 15, 16 116 | ]) 117 | # Left is dark and right is light. 
118 |         ecolors = {
119 |             0: 'light_pink',
120 |             1: 'light_pink',
121 |             2: 'light_pink',
122 |             3: 'pink',
123 |             4: 'pink',
124 |             5: 'pink',
125 |             6: 'light_blue',
126 |             7: 'light_blue',
127 |             8: 'light_blue',
128 |             9: 'blue',
129 |             10: 'blue',
130 |             11: 'blue',
131 |             12: 'purple',
132 |             17: 'light_green',
133 |             18: 'light_green',
134 |             14: 'purple'
135 |         }
136 |     elif joints.shape[1] == 14:
137 |         parents = np.array([
138 |             1,
139 |             2,
140 |             8,
141 |             9,
142 |             3,
143 |             4,
144 |             7,
145 |             8,
146 |             -1,
147 |             -1,
148 |             9,
149 |             10,
150 |             13,
151 |             -1,
152 |         ])
153 |         ecolors = {
154 |             0: 'light_pink',
155 |             1: 'light_pink',
156 |             2: 'light_pink',
157 |             3: 'pink',
158 |             4: 'pink',
159 |             5: 'pink',
160 |             6: 'light_blue',
161 |             7: 'light_blue',
162 |             10: 'light_blue',
163 |             11: 'blue',
164 |             12: 'purple'
165 |         }
166 |     elif joints.shape[1] == 25:
167 |         # parent indices -1 means no parents
168 |         parents = np.array([
169 |             24, 2, 8, 9, 3, 23, 7, 8, 12, 12, 9, 10, 14, -1, 13, -1, -1, 15,
170 |             16, 23, 24, 19, 20, 4, 1
171 |         ])
172 |         # Left is dark and right is light.
173 |         ecolors = {
174 |             0: 'light_pink',
175 |             1: 'light_pink',
176 |             2: 'light_pink',
177 |             3: 'pink',
178 |             4: 'pink',
179 |             5: 'pink',
180 |             6: 'light_blue',
181 |             7: 'light_blue',
182 |             8: 'light_blue',  # Right shoulder
183 |             9: 'blue',
184 |             10: 'blue',
185 |             11: 'blue',
186 |             12: 'purple',
187 |             17: 'light_green',
188 |             18: 'light_green',
189 |             14: 'purple',
190 |             19: 'orange',  # Left Big Toe
191 |             20: 'light_orange',  # Right Big Toe
192 |             21: 'orange',  # Left Small Toe
193 |             22: 'light_orange',  # Right Small Toe
194 |             # Ankles!
195 |             23: 'green',  # Left
196 |             24: 'gray'  # Right
197 |         }
198 |     else:
199 |         print('Unknown skeleton!!')
200 |         import ipdb
201 |         ipdb.set_trace()
202 | 
203 |     for child in range(len(parents)):
204 |         point = joints[:, child]
205 |         # If invisible skip
206 |         if vis is not None and vis[child] == 0:
207 |             continue
208 |         if draw_edges:
209 |             cv2.circle(image, (point[0], point[1]), radius, colors['white'],
210 |                        -1)
211 |             cv2.circle(image, (point[0], point[1]), radius - 1,
212 |                        colors[jcolors[child]], -1)
213 |         else:
214 |             cv2.circle(image, (point[0], point[1]), radius - 1,
215 |                        colors[jcolors[child]], 1)
216 |         pa_id = parents[child]
217 |         if draw_edges and pa_id >= 0:
218 |             if vis is not None and vis[pa_id] == 0:
219 |                 continue
220 |             point_pa = joints[:, pa_id]
221 |             cv2.circle(image, (point_pa[0], point_pa[1]), radius - 1,
222 |                        colors[jcolors[pa_id]], -1)
223 |             if child not in ecolors.keys():
224 |                 print('bad')
225 |                 import ipdb
226 |                 ipdb.set_trace()
227 |             cv2.line(image, (point[0], point[1]), (point_pa[0], point_pa[1]),
228 |                      colors[ecolors[child]], radius - 2)
229 | 
230 |     # Convert back to original dtype
231 |     if input_is_float:
232 |         if max_val <= 1.:
233 |             image = image.astype(np.float32) / 255.
234 |         else:
235 |             image = image.astype(np.float32)
236 |     return image
237 | 
--------------------------------------------------------------------------------
/src/util/smooth_bbox.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.ndimage.filters import gaussian_filter1d
3 | import scipy.signal as signal
4 | 
5 | 
6 | def get_smooth_bbox_params(kps, vis_thresh=2, kernel_size=11, sigma=3):
7 |     """
8 |     Computes smooth bounding box parameters from keypoints:
9 |         1. Computes bbox by rescaling the person to be around 150 px.
10 |         2. Linearly interpolates bbox params for missing annotations.
11 |         3. Median filtering.
12 |         4. Gaussian filtering.
13 | Recommended thresholds: 14 | * detect-and-track: 0 15 | * 3DPW: 0.1 16 | Args: 17 | kps (list or ndarray): List of kps (Nx3) or None. 18 | vis_thresh (float): Threshold for visibility. 19 | kernel_size (int): Kernel size for median filtering (must be odd). 20 | sigma (float): Sigma for gaussian smoothing. 21 | Returns: 22 | Smooth bbox params [cx, cy, scale], start index, end index 23 | """ 24 | bbox_params, start, end = get_all_bbox_params(kps, vis_thresh) 25 | smoothed = smooth_bbox_params(bbox_params, kernel_size, sigma) 26 | smoothed = np.vstack((np.zeros((start, 3)), smoothed)) 27 | return smoothed, start, end 28 | 29 | 30 | def kp_to_bbox_param(kp, vis_thresh): 31 | """ 32 | Finds the bounding box parameters from the 2D keypoints. 33 | Args: 34 | kp (Kx3): 2D Keypoints. 35 | vis_thresh (float): Threshold for visibility. 36 | Returns: 37 | [center_x, center_y, scale] 38 | """ 39 | if kp is None: 40 | return 41 | vis = kp[:, 2] > vis_thresh 42 | if not np.any(vis): 43 | return 44 | min_pt = np.min(kp[vis, :2], axis=0) 45 | max_pt = np.max(kp[vis, :2], axis=0) 46 | person_height = np.linalg.norm(max_pt - min_pt) 47 | if person_height < 0.5: 48 | return 49 | center = (min_pt + max_pt) / 2. 50 | scale = 150. / person_height 51 | 52 | return np.append(center, scale) 53 | 54 | 55 | def get_all_bbox_params(kps, vis_thresh=2): 56 | """ 57 | Finds bounding box parameters for all keypoints. 58 | Look for sequences in the middle with no predictions and linearly 59 | interpolate the bbox params for those 60 | Args: 61 | kps (list): List of kps (Kx3) or None. 62 | vis_thresh (float): Threshold for visibility. 63 | Returns: 64 | bbox_params, start_index (incl), end_index (excl) 65 | """ 66 | # keeps track of how many indices in a row with no prediction 67 | num_to_interpolate = 0 68 | start_index = -1 69 | bbox_params = np.empty(shape=(0, 3), dtype=np.float32) 70 | 71 | for i, kp in enumerate(kps): 72 | bbox_param = kp_to_bbox_param(kp, vis_thresh=vis_thresh) 73 | if bbox_param is None: 74 | num_to_interpolate += 1 75 | continue 76 | 77 | if start_index == -1: 78 | # Found the first index with a prediction! 79 | start_index = i 80 | num_to_interpolate = 0 81 | 82 | if num_to_interpolate > 0: 83 | # Linearly interpolate each param. 84 | previous = bbox_params[-1] 85 | # This will be 3x(n+2) 86 | interpolated = np.array( 87 | [np.linspace(prev, curr, num_to_interpolate + 2) 88 | for prev, curr in zip(previous, bbox_param)]) 89 | bbox_params = np.vstack((bbox_params, interpolated.T[1:-1])) 90 | num_to_interpolate = 0 91 | bbox_params = np.vstack((bbox_params, bbox_param)) 92 | 93 | return bbox_params, start_index, i - num_to_interpolate + 1 94 | 95 | 96 | def smooth_bbox_params(bbox_params, kernel_size=11, sigma=8): 97 | """ 98 | Applies median filtering and then gaussian filtering to bounding box 99 | parameters. 100 | Args: 101 | bbox_params (Nx3): [cx, cy, scale]. 102 | kernel_size (int): Kernel size for median filtering (must be odd). 103 | sigma (float): Sigma for gaussian smoothing. 104 | Returns: 105 | Smoothed bounding box parameters (Nx3). 
106 | """ 107 | smoothed = np.array([signal.medfilt(param, kernel_size) 108 | for param in bbox_params.T]).T 109 | return np.array([gaussian_filter1d(traj, sigma) for traj in smoothed.T]).T 110 | -------------------------------------------------------------------------------- /src/util/torch_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | def orthographic_proj_withz_idrot(X, cam, offset_z=0.): 5 | """ 6 | X: B x N x 3 7 | cam: B x 3: [sc, tx, ty] 8 | No rotation! 9 | Orth preserving the z. 10 | sc * ( x + [tx; ty]) 11 | as in HMR.. 12 | """ 13 | scale = cam[:, 0].contiguous().view(-1, 1, 1) 14 | trans = cam[:, 1:3].contiguous().view(cam.size(0), 1, -1) 15 | 16 | # proj = scale * X 17 | proj = X 18 | 19 | proj_xy = scale * (proj[:, :, :2] + trans) 20 | proj_z = proj[:, :, 2, None] + offset_z 21 | 22 | return torch.cat((proj_xy, proj_z), 2) 23 | 24 | 25 | def orthographic_proj_withz(X, cam, offset_z=0.): 26 | """ 27 | X: B x N x 3 28 | cam: B x 7: [sc, tx, ty, quaternions] 29 | Orth preserving the z. 30 | sc * ( x + [tx; ty]) 31 | as in HMR.. 32 | """ 33 | quat = cam[:, -4:] 34 | X_rot = quat_rotate(X, quat) 35 | 36 | scale = cam[:, 0].contiguous().view(-1, 1, 1) 37 | trans = cam[:, 1:3].contiguous().view(cam.size(0), 1, -1) 38 | 39 | # proj = scale * X_rot 40 | proj = X_rot 41 | 42 | proj_xy = scale * (proj[:, :, :2] + trans) 43 | proj_z = proj[:, :, 2, None] + offset_z 44 | 45 | return torch.cat((proj_xy, proj_z), 2) 46 | 47 | 48 | def quat_rotate(X, q): 49 | """Rotate points by quaternions. 50 | Args: 51 | X: B X N X 3 points 52 | q: B X 4 quaternions 53 | Returns: 54 | X_rot: B X N X 3 (rotated points) 55 | """ 56 | # repeat q along 2nd dim 57 | ones_x = X[[0], :, :][:, :, [0]] * 0 + 1 58 | q = torch.unsqueeze(q, 1) * ones_x 59 | 60 | q_conj = torch.cat([q[:, :, [0]], -1 * q[:, :, 1:4]], dim=-1) 61 | X = torch.cat([X[:, :, [0]] * 0, X], dim=-1) 62 | 63 | X_rot = hamilton_product(q, hamilton_product(X, q_conj)) 64 | return X_rot[:, :, 1:4] 65 | 66 | 67 | def hamilton_product(qa, qb): 68 | """Multiply qa by qb. 69 | Args: 70 | qa: B X N X 4 quaternions 71 | qb: B X N X 4 quaternions 72 | Returns: 73 | q_mult: B X N X 4 74 | """ 75 | qa_0 = qa[:, :, 0] 76 | qa_1 = qa[:, :, 1] 77 | qa_2 = qa[:, :, 2] 78 | qa_3 = qa[:, :, 3] 79 | 80 | qb_0 = qb[:, :, 0] 81 | qb_1 = qb[:, :, 1] 82 | qb_2 = qb[:, :, 2] 83 | qb_3 = qb[:, :, 3] 84 | 85 | # See https://en.wikipedia.org/wiki/Quaternion#Hamilton_product 86 | q_mult_0 = qa_0 * qb_0 - qa_1 * qb_1 - qa_2 * qb_2 - qa_3 * qb_3 87 | q_mult_1 = qa_0 * qb_1 + qa_1 * qb_0 + qa_2 * qb_3 - qa_3 * qb_2 88 | q_mult_2 = qa_0 * qb_2 - qa_1 * qb_3 + qa_2 * qb_0 + qa_3 * qb_1 89 | q_mult_3 = qa_0 * qb_3 + qa_1 * qb_2 - qa_2 * qb_1 + qa_3 * qb_0 90 | 91 | return torch.stack([q_mult_0, q_mult_1, q_mult_2, q_mult_3], dim=-1) -------------------------------------------------------------------------------- /src/util/video.py: -------------------------------------------------------------------------------- 1 | import os 2 | import shutil 3 | import subprocess 4 | import tempfile 5 | 6 | import matplotlib.pyplot as plt 7 | from tqdm import tqdm 8 | 9 | 10 | def images_to_video(output_path, images, fps): 11 | writer = VideoWriter(output_path, fps) 12 | writer.add_images(images) 13 | writer.make_video() 14 | writer.close() 15 | 16 | 17 | def sizeof_fmt(num, suffix='B'): 18 | """ 19 | Returns the filesize as human readable string. 
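    For example, sizeof_fmt(1536) returns '1.5KiB' (an illustrative value, not
    from the original docstring).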
20 | 21 | https://stackoverflow.com/questions/1094841/reusable-library-to-get-human- 22 | readable-version-of-file-size 23 | """ 24 | for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']: 25 | if abs(num) < 1024.0: 26 | return '%3.1f%s%s' % (num, unit, suffix) 27 | num /= 1024.0 28 | return '%.1f%s%s' % (num, 'Yi', suffix) 29 | 30 | 31 | def get_dir_size(dirname): 32 | """ 33 | Returns the size of the contents of a directory. (Doesn't include subdirs.) 34 | """ 35 | size = 0 36 | for fname in os.listdir(dirname): 37 | fname = os.path.join(dirname, fname) 38 | if os.path.isfile(fname): 39 | size += os.path.getsize(fname) 40 | return size 41 | 42 | 43 | class VideoWriter(object): 44 | 45 | def __init__(self, output_path, fps, temp_dir=None): 46 | self.output_path = output_path 47 | self.fps = fps 48 | self.temp_dir = temp_dir 49 | self.current_index = 0 50 | self.img_shape = None 51 | self.frame_string = 'frame{:08}.jpg' 52 | 53 | def add_images(self, images_list, show_pbar=False): 54 | """ 55 | Adds a list of images to temporary directory. 56 | 57 | Args: 58 | images_list (iterable): List of images (HxWx3). 59 | show_pbar (bool): If True, displays a progress bar. 60 | 61 | Returns: 62 | list: filenames of saved images. 63 | """ 64 | filenames = [] 65 | if show_pbar: 66 | images_list = tqdm(images_list) 67 | for image in images_list: 68 | filenames.append(self.add_image(image)) 69 | return filenames 70 | 71 | def add_image(self, image): 72 | """ 73 | Saves image to file. 74 | 75 | Args: 76 | image (HxWx3). 77 | 78 | Returns: 79 | str: filename. 80 | """ 81 | if self.temp_dir is None: 82 | self.temp_dir = tempfile.mkdtemp() 83 | if self.img_shape is None: 84 | self.img_shape = image.shape 85 | assert self.img_shape == image.shape 86 | filename = self.get_filename(self.current_index) 87 | plt.imsave(fname=filename, arr=image) 88 | self.current_index += 1 89 | return filename 90 | 91 | def get_frame(self, index): 92 | """ 93 | Read image from file. 94 | 95 | Args: 96 | index (int). 97 | 98 | Returns: 99 | Array (HxWx3). 100 | """ 101 | filename = self.get_filename(index) 102 | return plt.imread(fname=filename) 103 | 104 | def get_filename(self, index): 105 | if self.temp_dir is None: 106 | self.temp_dir = tempfile.mkdtemp() 107 | return os.path.join(self.temp_dir, self.frame_string.format(index)) 108 | 109 | def make_video(self): 110 | cmd = ('ffmpeg -y -threads 16 -r {fps} ' 111 | '-i {temp_dir}/frame%08d.jpg -profile:v baseline -level 3.0 ' 112 | '-c:v libx264 -pix_fmt yuv420p -an -vf ' 113 | '"scale=trunc(iw/2)*2:trunc(ih/2)*2" {output_path}'.format( 114 | fps=self.fps, temp_dir=self.temp_dir, output_path=self.output_path 115 | )) 116 | print(cmd) 117 | try: 118 | subprocess.call(cmd, shell=True) 119 | except OSError as e: 120 | import ipdb; ipdb.set_trace() 121 | print('OSError') 122 | 123 | def close(self): 124 | """ 125 | Clears the temp_dir. 126 | """ 127 | print('Removing {} which contains {}.'.format( 128 | self.temp_dir, 129 | self.get_temp_dir_size()) 130 | ) 131 | shutil.rmtree(self.temp_dir) 132 | self.temp_dir = None 133 | 134 | def get_temp_dir_size(self): 135 | """ 136 | Returns the size of the temp dir. 
137 | """ 138 | return sizeof_fmt(get_dir_size(self.temp_dir)) 139 | 140 | 141 | class VideoReader(object): 142 | 143 | def __init__(self, video_path, temp_dir=None): 144 | self.video_path = video_path 145 | self.temp_dir = temp_dir 146 | self.frame_string = 'frame{:08}.jpg' 147 | 148 | def read(self): 149 | if self.temp_dir is None: 150 | self.temp_dir = tempfile.mkdtemp() 151 | cmd = ('ffmpeg -i {video_path} -start_number 0 ' 152 | '{temp_dir}/frame%08d.jpg'.format( 153 | temp_dir=self.temp_dir, 154 | video_path=self.video_path 155 | )) 156 | print(cmd) 157 | subprocess.call(cmd, shell=True) 158 | self.num_frames = len(os.listdir(self.temp_dir)) 159 | 160 | def get_filename(self, index): 161 | if self.temp_dir is None: 162 | self.temp_dir = tempfile.mkdtemp() 163 | return os.path.join(self.temp_dir, self.frame_string.format(index)) 164 | 165 | def get_image(self, index): 166 | return plt.imread(self.get_filename(index)) 167 | 168 | def get_images(self): 169 | i = 0 170 | fname = self.get_filename(i) 171 | while os.path.exists(fname): 172 | yield plt.imread(self.get_filename(i)) 173 | i += 1 174 | fname = self.get_filename(i) 175 | 176 | def close(self): 177 | """ 178 | Clears the temp_dir. 179 | """ 180 | print('Removing {} which contains {}.'.format( 181 | self.temp_dir, 182 | self.get_temp_dir_size()) 183 | ) 184 | shutil.rmtree(self.temp_dir) 185 | self.temp_dir = None 186 | 187 | def get_temp_dir_size(self): 188 | """ 189 | Returns the size of the temp dir. 190 | """ 191 | return sizeof_fmt(get_dir_size(self.temp_dir)) 192 | --------------------------------------------------------------------------------