├── .gitignore
├── LICENSE
├── README.md
├── data
│   └── 0502.mp4
├── demo.py
├── requirements.txt
└── src
    ├── __init__.py
    ├── config.py
    ├── evaluation
    │   ├── __init__.py
    │   ├── run_video.py
    │   └── tester_pred.py
    ├── external
    │   ├── install_alphapose.sh
    │   └── install_nmr.sh
    ├── extract_tracks.py
    ├── models.py
    ├── omega.py
    ├── ops.py
    ├── renderer.py
    ├── tf_smpl
    │   ├── __init__.py
    │   ├── batch_lbs.py
    │   ├── batch_smpl.py
    │   └── projection.py
    └── util
        ├── __init__.py
        ├── common.py
        ├── render_utils.py
        ├── smooth_bbox.py
        ├── torch_utils.py
        └── video.py

/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
2 | __pycache__
3 | 
4 | data
5 | demo_output
6 | models
7 | src/external/AlphaPose
8 | src/external/neural_renderer
9 | venv_*
10 | 
11 | *.pyc
12 | *.swp
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | BSD 2-Clause License
2 | 
3 | Copyright (c) 2019, Jason Zhang
4 | All rights reserved.
5 | 
6 | Redistribution and use in source and binary forms, with or without
7 | modification, are permitted provided that the following conditions are met:
8 | 
9 | 1. Redistributions of source code must retain the above copyright notice, this
10 |    list of conditions and the following disclaimer.
11 | 
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 |    this list of conditions and the following disclaimer in the documentation
14 |    and/or other materials provided with the distribution.
15 | 
16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
17 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
19 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
20 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
21 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
22 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
23 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
24 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
25 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
26 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Predicting 3D Human Dynamics from Video
2 | 
3 | Jason Y. Zhang, Panna Felsen, Angjoo Kanazawa, Jitendra Malik
4 | 
5 | University of California, Berkeley
6 | 
7 | [Project Page](https://jasonyzhang.com/phd/)
8 | 
9 | ![Teaser Image](https://jasonyzhang.com/phd/assets/img/overview.jpg)
10 | 
11 | Requirements:
12 | * Python 3 (tested on 3.6.8)
13 | * Tensorflow (tested on 1.15)
14 | * Pytorch for NMR (tested on 1.3.0)
15 | * CUDA (tested on 10.0)
16 | * ffmpeg (tested on 3.4.6)
17 | 
18 | 
19 | ### License:
20 | 
21 | Our code is licensed under BSD. Note that the SMPL model and any datasets still
22 | fall under their respective licenses.
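A quick, optional way to confirm that the tested versions above are what your environment actually picks up is a minimal check like the following (a sketch only; it assumes the `tensorflow-gpu` and `torch` packages from the Installation section below are already installed):

```python
# Optional sanity check against the versions listed under Requirements.
import tensorflow as tf
import torch

print('TensorFlow:', tf.__version__)   # tested with 1.15
print('PyTorch:', torch.__version__)   # tested with 1.3.0
print('CUDA available:', torch.cuda.is_available())
```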
23 | 
24 | ### Installation:
25 | ```bash
26 | virtualenv venv_phd -p python3
27 | source venv_phd/bin/activate
28 | pip install -U pip
29 | pip install numpy tensorflow-gpu==1.15.0
30 | pip install torch==1.3.0 # Make sure the wheel corresponds to your CUDA Version
31 | pip install -r requirements.txt
32 | cd src/external
33 | sh install_nmr.sh
34 | ```
35 | 
36 | Download the model weights from [this Google Drive link](https://drive.google.com/file/d/1_sipXE-FNs_08YCPFxFlLauHJcqzny7x/view?usp=sharing).
37 | You should place them in `phd/models`.
38 | 
39 | 
40 | ## Running Demo
41 | 
42 | ### Penn Action
43 | 
44 | Download the [Penn Action dataset](http://dreamdragon.github.io/PennAction/).
45 | You should place or symlink the dataset to `phd/data/Penn_Action`.
46 | 
47 | #### Running on one subsequence
48 | `--vid_id 0104` runs the model on video 0104 in Penn Action. The public model is
49 | conditioned on 15 images, so `--start_frame 60` starts the conditioning window
50 | at 60, and future prediction will start on frame 76. `--ar_length 25` sets the
51 | number of future predictions at 25, which is the prediction length the model
52 | was trained on. You can also try increasing `ar_length`, which usually looks
53 | reasonable up to 35.
54 | 
55 | ```
56 | python demo.py --load_path models/phd.ckpt-199269 --vid_id 0104 --ar_length 25 --start_frame 60
57 | ```
58 | 
59 | For reference, [this](https://jasonyzhang.com/phd/assets/vid/penn_action-0104_AR25_60-100_fps5.mp4)
60 | should be your output.
61 | 
62 | #### Running on multiple subsequences
63 | 
64 | You can also run at multiple starting points in the same sequence.
65 | `--start_frame 0 --skip_rate 5` will run starting at frame 0, frame 5, frame 10,
66 | etc.
67 | 
68 | ```
69 | python demo.py --load_path models/phd.ckpt-199269 --vid_id 0104 --ar_length 25 --start_frame 0 --skip_rate 5
70 | ```
71 | For reference, [this](https://jasonyzhang.com/phd/assets/vid/0104.zip) should be your output.
72 | 
73 | 
74 | ### Running on Any Video
75 | 
76 | To run on a generic video, you will need a tracklet around the person. We extract the tracklet using PoseFlow.
77 | 
78 | Follow the directions to download AlphaPose and its model weights from https://github.com/MVIG-SJTU/AlphaPose.
79 | 
80 | Roughly, that should entail:
81 | 1. Clone the repo to `src/external`
82 | 2. Build AlphaPose using `python setup.py build develop --user`
83 | 3. Download pre-trained weights to the specified directories. Use the ResNet50 Fast Pose from the Model Zoo.
84 | 
85 | Steps 1 and 2 can be done by running `sh install_alphapose.sh` in `src/external`.
86 | 
87 | Now you should be able to run the model on any video, e.g.:
88 | ```
89 | python demo.py --load_path models/phd.ckpt-199269 --vid_path data/0502.mp4 --start_frame 0 --ar_length 25
90 | ```
91 | 
92 | ## Training Code
93 | 
94 | Coming soon
95 | 
--------------------------------------------------------------------------------
/data/0502.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jasonyzhang/phd/7b7f526d45913902ed93cdc49fdd59272698bd71/data/0502.mp4
--------------------------------------------------------------------------------
/demo.py:
--------------------------------------------------------------------------------
1 | """
2 | Runs PHD on Penn Action video.
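
Sample usage (from the README):
    python demo.py --load_path models/phd.ckpt-199269 --vid_id 0104 --ar_length 25 --start_frame 60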
3 | """ 4 | 5 | from glob import glob 6 | import os 7 | import os.path as osp 8 | import warnings 9 | 10 | from absl import flags 11 | import numpy as np 12 | from scipy.io import loadmat 13 | import tensorflow as tf 14 | 15 | from src.config import get_config 16 | from src.evaluation.run_video import ( 17 | process_image, 18 | process_videos, 19 | run_predictions, 20 | ) 21 | from src.evaluation.tester_pred import TesterPred 22 | from src.extract_tracks import ( 23 | compute_tracks, 24 | get_labels_poseflow, 25 | ) 26 | from src.renderer import VisRenderer 27 | from src.util.smooth_bbox import get_smooth_bbox_params 28 | 29 | flags.DEFINE_string('dataset', '', 30 | 'Dataset to use. Leave blank if using PoseFlow to extract' 31 | ' tracks. Otherwise can set to "penn_action".') 32 | flags.DEFINE_string('vid_id', '0001', 'Video id number if using Penn Action.') 33 | flags.DEFINE_string('vid_path', 'data/0504.mp4', 34 | 'Path to filename if using PoseFlow for tracks') 35 | 36 | flags.DEFINE_integer('ar_length', 25, 'Number of steps into future to predict.') 37 | flags.DEFINE_integer('start_frame', 0, 'First frame of conditioning.') 38 | flags.DEFINE_integer('skip_rate', None, 39 | 'If set, will be used for choosing subsequences.') 40 | flags.DEFINE_integer('fps', 5, 'Frames per second in rendered video.') 41 | flags.DEFINE_integer('degrees', '60', 'Angle for rotated viewpoint.') 42 | flags.DEFINE_string('mesh_color', 'blue', 'Color of mesh.') 43 | 44 | flags.DEFINE_string('out_dir', 'demo_output', 'Where to save final PHD videos.') 45 | flags.DEFINE_string('penn_dir', 'data/Penn_Action', 46 | 'Directory where Penn Action is saved.') 47 | 48 | NUM_CONDITION = 15 49 | 50 | # Hides some TF warnings. 51 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 52 | tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR) 53 | 54 | 55 | def load_penn_video(penn_dir, vid_id): 56 | im_paths = sorted(glob(osp.join(penn_dir, 'frames', vid_id, '*.jpg'))) 57 | labels = loadmat(osp.join(penn_dir, 'labels', '{}.mat'.format(vid_id))) 58 | if np.all(labels['train'] == 1): 59 | print('Warning: {} is a training sequence!'.format(vid_id)) 60 | kps = np.dstack((labels['x'], labels['y'], labels['visibility'])) 61 | return im_paths, kps 62 | 63 | 64 | def load_poseflow_video(vid_path, out_dir): 65 | track_json, im_dir = compute_tracks(vid_path=vid_path, out_dir=out_dir) 66 | im_paths = sorted(glob(osp.join(im_dir, '*.png'))) 67 | kps = get_labels_poseflow( 68 | json_path=track_json, 69 | num_frames=len(im_paths), 70 | min_kp_count=NUM_CONDITION, 71 | ) 72 | return im_paths, kps 73 | 74 | 75 | def main(model): 76 | # Keypoints are only used to compute the bounding box around human tracks. 77 | # They are not fed into the model. Keypoint format is [x, y, vis]. Keypoint 78 | # order doesn't matter. 
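    # Illustrative example (hypothetical coordinates): for Penn Action, `kps`
    # comes out of np.dstack as a (num_frames, 13, 3) array, e.g.
    #     kps[0] = [[210.0, 85.0, 1.0], [198.0, 132.0, 0.0], ...]
    # The third column is the visibility/confidence that is compared against
    # `vis_thresh` when fitting the smoothed bounding boxes below.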
79 | if config.dataset == '': 80 | im_paths, kps = load_poseflow_video(config.vid_path, config.out_dir) 81 | vis_thresh = 0.1 82 | elif config.dataset == 'penn_action': 83 | im_paths, kps = load_penn_video(config.penn_dir, config.vid_id) 84 | vis_thresh = 0.5 85 | else: 86 | raise Exception('Dataset {} not recognized'.format(config.dataset)) 87 | bbox_params_smooth, s, e = get_smooth_bbox_params(kps, vis_thresh) 88 | images = [] 89 | min_f = max(s, 0) 90 | max_f = min(e, len(kps)) 91 | for i in range(min_f, max_f): 92 | images.append(process_image( 93 | im_path=im_paths[i], 94 | bbox_param=bbox_params_smooth[i] 95 | )) 96 | all_images, vid_paths = process_videos( 97 | config=config, 98 | images=images, 99 | T=(NUM_CONDITION + config.ar_length), 100 | suffix='AR{}'.format(config.ar_length), 101 | ) 102 | if not osp.exists(config.out_dir): 103 | os.mkdir(config.out_dir) 104 | renderer = VisRenderer(img_size=224) 105 | for i in range(0, len(all_images), config.batch_size): 106 | run_predictions( 107 | config=config, 108 | renderer=renderer, 109 | model=model, 110 | images=all_images[i : i + config.batch_size], 111 | vid_paths=vid_paths[i : i + config.batch_size], 112 | num_condition=NUM_CONDITION, 113 | ) 114 | 115 | 116 | if __name__ == '__main__': 117 | config = get_config() 118 | if config.skip_rate is None: 119 | setattr(config, 'batch_size', 1) 120 | model = TesterPred( 121 | config, 122 | sequence_length=(NUM_CONDITION + config.ar_length), 123 | resnet_path='models/hmr_noS5.ckpt-642561', 124 | ) 125 | main(model) 126 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # python requirements 2 | pip>=9.0 3 | absl-py==0.7.1 4 | chumpy==0.68 5 | deepdish==0.3.6 6 | ipdb==0.12 7 | matplotlib==3.0.3 8 | # neural_renderer_pytorch==1.1.3 # Works better if you build from source 9 | numpy 10 | opencv-python==4.2.0.32 11 | pillow<7 12 | scikit-image==0.15.0 13 | scipy==1.2.1 14 | # tensorflow-gpu==1.14 15 | torch==1.3.0 16 | torchvision==0.4.1 17 | tqdm==4.19.9 18 | 19 | # For AlphaPose/Poseflow 20 | cython 21 | munkres==1.0.12 22 | -------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonyzhang/phd/7b7f526d45913902ed93cdc49fdd59272698bd71/src/__init__.py -------------------------------------------------------------------------------- /src/config.py: -------------------------------------------------------------------------------- 1 | import os.path as osp 2 | import sys 3 | 4 | from absl import flags 5 | 6 | 7 | curr_path = osp.dirname(osp.abspath(__file__)) 8 | model_dir = osp.join(curr_path, '..', 'models') 9 | if not osp.exists(model_dir): 10 | print('Fix path to models/') 11 | import ipdb 12 | ipdb.set_trace() 13 | SMPL_MODEL_PATH = osp.join(model_dir, 14 | 'neutral_smpl_with_cocoplustoesankles_reg.pkl') 15 | SMPL_FACE_PATH = osp.join(curr_path, '../src/tf_smpl', 'smpl_faces.npy') 16 | 17 | # Default pred-trained model path for the demo. 
18 | PRETRAINED_MODEL = osp.join(model_dir, 'model.ckpt-667589') 19 | 20 | flags.DEFINE_string('smpl_model_path', SMPL_MODEL_PATH, 21 | 'path to the neutral smpl model') 22 | flags.DEFINE_string('smpl_face_path', SMPL_FACE_PATH, 23 | 'path to smpl mesh faces (for easy rendering)') 24 | 25 | # Model details 26 | 27 | flags.DEFINE_string('load_path', None, 'path to trained model dir') 28 | flags.DEFINE_integer('batch_size', 8, 'Size of mini-batch.') 29 | flags.DEFINE_integer('num_conv_layers', 3, '# of layers for convolutional') 30 | flags.DEFINE_boolean('use_delta_from_pred', True, 31 | 'If True, initialize delta regressor from pred.') 32 | flags.DEFINE_boolean('pad_edges', False, 'If True, edge pad, else zero pad.') 33 | flags.DEFINE_bool('use_optcam', True, 34 | 'If True, kp reprojection uses optimal camera.') 35 | flags.DEFINE_integer('num_kps', 25, 'Number of keypoints.') 36 | 37 | 38 | # For training. 39 | flags.DEFINE_string('data_dir', None, 'Where tfrecords are saved') 40 | flags.DEFINE_string('model_dir', None, 41 | 'Where model will be saved -- filled automatically') 42 | flags.DEFINE_list('datasets', ['h36m', 'penn_action', 'insta_variety'], 43 | 'datasets to use for training') 44 | flags.DEFINE_list('mocap_datasets', ['CMU', 'H3.6', 'jointLim'], 45 | 'datasets to use for adversarial prior training') 46 | flags.DEFINE_list('pretrained_model_path', [PRETRAINED_MODEL], 47 | 'if not None, fine-tunes from this ckpt') 48 | flags.DEFINE_string('image_encoder_model_type', 'resnet', 49 | 'Specifies which image encoder to use') 50 | flags.DEFINE_string('temporal_encoder_type', 'AZ_FC2GN', 51 | 'Specifies which network to use for temporal encoding') 52 | flags.DEFINE_integer('img_size', 224, 53 | 'Input image size to the network after preprocessing') 54 | flags.DEFINE_integer('num_stage', 3, '# of times to iterate IEF regressor') 55 | flags.DEFINE_integer('max_iteration', 5000000, '# of max iteration to train') 56 | flags.DEFINE_integer('log_img_count', 10, 57 | 'Number of images in sequence to visualize') 58 | flags.DEFINE_integer('log_img_step', 5000, 59 | 'How often to visualize img during training') 60 | 61 | # Random seed 62 | flags.DEFINE_integer('seed', 1, 'Graph-level random seed') 63 | 64 | 65 | def get_config(): 66 | config = flags.FLAGS 67 | config(sys.argv) 68 | return config 69 | -------------------------------------------------------------------------------- /src/evaluation/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonyzhang/phd/7b7f526d45913902ed93cdc49fdd59272698bd71/src/evaluation/__init__.py -------------------------------------------------------------------------------- /src/evaluation/run_video.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import cv2 4 | import ipdb 5 | import matplotlib.pyplot as plt 6 | import numpy as np 7 | from skimage.io import imread 8 | from tqdm import tqdm 9 | 10 | import src.util.render_utils as vis_util 11 | from src.util.common import resize_img 12 | from src.util.video import VideoWriter 13 | 14 | IMG_SIZE = 224 15 | 16 | 17 | def get_output_path_name(config, vid_id, suffix='', inds=()): 18 | """ 19 | Returns the output video's path. 20 | 21 | Args: 22 | config: Configuration. 23 | vid_id (str): Id of video. 24 | suffix (str). 25 | inds (tuple): Indices of start and end of conditioning. 26 | 27 | Returns: 28 | str: Output video path name. 
29 | """ 30 | if inds: 31 | suffix += '_' + '-'.join(map(str, inds)) 32 | suffix += '_fps{}'.format(config.fps) 33 | if config.dataset: 34 | output_name = '{dataset}-{vid}_{suf}.mp4'.format( 35 | dataset=config.dataset, 36 | vid=vid_id, 37 | suf=suffix, 38 | ) 39 | else: 40 | output_name = '{vid}_{suf}.mp4'.format( 41 | vid=os.path.basename(config.vid_path).split('.')[0], 42 | suf=suffix, 43 | ) 44 | 45 | return output_name 46 | 47 | 48 | def process_videos(config, images, T, suffix=''): 49 | """ 50 | 51 | Args: 52 | config: Configuration. 53 | images (list): list of images. 54 | T (int): Sequence length (conditioning length + AR length). 55 | 56 | Returns: 57 | all_images: List of list of images, each corresponding to a subseq 58 | all_vid_paths: Video names corresponding to each subseq. 59 | """ 60 | start_frame = config.start_frame 61 | skip_rate = config.skip_rate 62 | n = len(images) 63 | if skip_rate is None: 64 | if start_frame + T > n: 65 | # In case we want to run past edge of video. 66 | images.extend([np.zeros((IMG_SIZE, IMG_SIZE, 3))] * T) 67 | n += T 68 | starts = [start_frame] 69 | else: 70 | starts = np.arange(start_frame, n - T, skip_rate, dtype=int) 71 | all_images, all_vid_paths = [], [] 72 | for start in starts: 73 | end = start + T 74 | if n < end: 75 | print('Too short!') 76 | ims = images[start:end] 77 | vid_path = get_output_path_name( 78 | config=config, 79 | vid_id=config.vid_id.replace('mp4', ''), 80 | suffix=suffix, 81 | inds=(start, end), 82 | ) 83 | all_images.append(ims) 84 | all_vid_paths.append(os.path.join(config.out_dir, vid_path)) 85 | while len(all_images) % config.batch_size != 0: 86 | # Pad with garbage so that we fill up the batch. 87 | all_images.append(np.zeros((T, 224, 224, 3))) 88 | all_vid_paths.append('') 89 | return all_images, all_vid_paths 90 | 91 | 92 | def process_image(im_path, bbox_param): 93 | """ 94 | Processes an image, producing 224x224 crop. 95 | Args: 96 | im_path (str). 97 | bbox_param (3,): [cx, cy, scale]. 98 | 99 | Returns: 100 | image 101 | """ 102 | image = imread(im_path) 103 | center = bbox_param[:2] 104 | scale = bbox_param[2] 105 | 106 | # Pre-process image to [-1, 1] 107 | image = ((image / 255.) - 0.5) * 2 108 | image_scaled, scale_factors = resize_img(image, scale) 109 | center_scaled = np.round(center * scale_factors).astype(np.int) 110 | 111 | # Make sure there is enough space to crop 224x224. 112 | image_padded = np.pad( 113 | array=image_scaled, 114 | pad_width=((IMG_SIZE,), (IMG_SIZE,), (0,)), 115 | mode='edge' 116 | ) 117 | height, width = image_padded.shape[:2] 118 | center_scaled += IMG_SIZE 119 | 120 | # Crop 224x224 around the center. 121 | margin = IMG_SIZE // 2 122 | 123 | start_pt = (center_scaled - margin).astype(int) 124 | end_pt = (center_scaled + margin).astype(int) 125 | end_pt[0] = min(end_pt[0], width) 126 | end_pt[1] = min(end_pt[1], height) 127 | image_scaled = image_padded[start_pt[1]:end_pt[1], 128 | start_pt[0]:end_pt[0], :] 129 | return image_scaled 130 | 131 | 132 | def run_predictions(config, renderer, model, images, vid_paths, num_condition): 133 | """ 134 | 135 | Args: 136 | config: Configuration. 137 | renderer (VisRenderer). 138 | model (TesterPred). 139 | images (ndarray): B x T x H x W x 3. 140 | vid_paths (B). 141 | ar_length (int): Number of times to run auto-regressive prediction. 142 | num_condition (int): Condition length. 
143 | """ 144 | fov = model.fov 145 | ar_length = config.ar_length 146 | images = np.array(images) 147 | preds_gt = model.predict_movie_strips(images, get_smpl=True) 148 | movie_strips = preds_gt['movie_strips_cond'] 149 | movie_strips = movie_strips[:, num_condition - fov: num_condition] 150 | verts_gt = preds_gt['verts'][:, -ar_length - fov:] # B x (ar+f) x 6980 x 3 151 | verts = [] 152 | for _ in tqdm(range(ar_length)): 153 | preds = model.predict_auto_regressive(movie_strips[:, -fov:]) 154 | movie_strips = np.concatenate(( 155 | movie_strips, 156 | preds['movie_strip'], # B x 1 x 2048 157 | ), axis=1) 158 | verts.append(np.squeeze(preds['verts'], axis=1)) 159 | 160 | verts = np.array(verts) # ar x B x 6980 x 3! 161 | 162 | for i in range(len(images)): 163 | if vid_paths[i] == '': 164 | continue 165 | render_results( 166 | config=config, 167 | renderer=renderer, 168 | fov=fov, 169 | vid_path=vid_paths[i], 170 | images=images[i][-model.fov - ar_length:], 171 | verts=verts[:, i], 172 | verts_gt=verts_gt[i], 173 | ) 174 | 175 | 176 | def render_results(config, renderer, fov, vid_path, images, verts, verts_gt): 177 | """ 178 | 179 | Args: 180 | config 181 | renderer (VisRenderer). 182 | fov (int). 183 | images ((f+ar) x H x W x 3). 184 | verts (ar x 6980 x 3): Predicted vertices from auto-regressive model. 185 | verts_gt ((f+ar) x 6980 x 3): Predicted vertices from real movie strips. 186 | """ 187 | print('Rendering', vid_path) 188 | writer = VideoWriter(output_path=vid_path, fps=config.fps) 189 | images = (images + 1) * 0.5 190 | for i, im in tqdm(enumerate(images)): 191 | if im.shape[0] != IMG_SIZE: 192 | im = cv2.resize(im, (IMG_SIZE, IMG_SIZE)) 193 | im = vis_util.draw_text(im, {'T': i - fov + 1}) 194 | if i < fov: 195 | vert = verts_gt[i] 196 | color = 'yellow' 197 | im = vis_util.add_alpha(im) 198 | else: 199 | vert = verts[i - fov] 200 | color = config.mesh_color 201 | im = vis_util.add_alpha(im, 0.7) 202 | mesh = renderer( 203 | verts=vert, 204 | color_name=color, 205 | alpha=True, 206 | cam=np.array([0.7, 0, 0]), 207 | ) / 255. 208 | rot = renderer.rotated( 209 | verts=vert, 210 | deg=config.degrees, 211 | color_name=color, 212 | alpha=True, 213 | cam=np.array([0.7, 0, 0]), 214 | ) / 255. 
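        # Final frame layout, left to right: cropped input frame, predicted
        # mesh rendered from the camera viewpoint, and the same mesh rotated
        # by --degrees.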
215 | combined = np.hstack((im, mesh, rot)) 216 | writer.add_image(combined) 217 | writer.make_video() 218 | writer.close() 219 | -------------------------------------------------------------------------------- /src/evaluation/tester_pred.py: -------------------------------------------------------------------------------- 1 | import os 2 | import os.path as osp 3 | 4 | import deepdish as dd 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from src.models import ( 9 | batch_pred_omega, 10 | get_image_encoder, 11 | get_prediction_model, 12 | get_temporal_encoder, 13 | ) 14 | from src.omega import ( 15 | OmegasPred, 16 | ) 17 | from src.tf_smpl.batch_smpl import SMPL 18 | 19 | 20 | class TesterPred(object): 21 | 22 | def __init__(self, config, sequence_length, resnet_path='', sess=None, 23 | precomputed_phi=False): 24 | self.config = config 25 | self.load_path = config.load_path 26 | tf.set_random_seed(config.seed) 27 | 28 | self.num_conv_layers = 3 29 | self.fov = self.num_conv_layers * 4 + 1 30 | self.sequence_length = sequence_length 31 | self.use_delta_from_pred = config.use_delta_from_pred 32 | self.use_optcam = config.use_optcam 33 | self.precomputed_phi = precomputed_phi 34 | 35 | # Config + path 36 | if not config.load_path: 37 | raise Exception( 38 | 'You need to specify `load_path` to load a pretrained model' 39 | ) 40 | if not osp.exists(config.load_path + '.index'): 41 | print('{} doesnt exist'.format(config.load_path)) 42 | import ipdb 43 | ipdb.set_trace() 44 | 45 | # Data 46 | self.batch_size = config.batch_size 47 | self.img_size = config.img_size 48 | self.E_var = [] 49 | self.pad_edges = config.pad_edges 50 | 51 | self.smpl_model_path = config.smpl_model_path 52 | self.use_hmr_ief = False 53 | 54 | self.num_output = 85 55 | 56 | if precomputed_phi: 57 | input_size = (self.batch_size, self.sequence_length, 2048) 58 | else: 59 | input_size = (self.batch_size, self.sequence_length, 60 | self.img_size, self.img_size, 3) 61 | self.images_pl = tf.placeholder(tf.float32, shape=input_size) 62 | 63 | strip_size = (self.batch_size, self.fov, 2048) 64 | self.movie_strips_pl = tf.placeholder(tf.float32, shape=strip_size) 65 | 66 | # Model Spec 67 | self.f_image_enc = get_image_encoder() 68 | self.f_temporal_enc = get_temporal_encoder() 69 | self.f_prediction_ar = get_prediction_model() 70 | 71 | self.smpl = SMPL(self.smpl_model_path) 72 | self.omegas_movie_strip = self.make_omega_pred() 73 | self.omegas_pred = self.make_omega_pred(use_optcam=True) 74 | 75 | # HMR Model Params 76 | self.num_stage = 3 77 | self.total_params = 85 78 | 79 | self.load_mean_omega() 80 | self.build_temporal_encoder_model() 81 | self.build_auto_regressive_model() 82 | self.update_E_vars() 83 | 84 | if sess is None: 85 | options = tf.GPUOptions(per_process_gpu_memory_fraction=0.7) 86 | self.sess = tf.Session(config=tf.ConfigProto(gpu_options=options)) 87 | else: 88 | self.sess = sess 89 | 90 | # Load data. 
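        # prepare() restores the ResNet image-encoder weights (when raw images
        # rather than precomputed phis are used) and the temporal/AR model
        # weights from config.load_path.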
91 | self.prepare(resnet_path) 92 | 93 | def make_omega_pred(self, use_optcam=False): 94 | return OmegasPred( 95 | config=self.config, 96 | smpl=self.smpl, 97 | use_optcam=use_optcam, 98 | vis_max_batch=self.batch_size, 99 | is_training=False, 100 | ) 101 | 102 | def update_E_vars(self): 103 | trainable_vars = tf.contrib.framework.get_variables() 104 | trainable_vars_e = [var for var in trainable_vars 105 | if var.name[:2] != 'D_'] 106 | self.E_var.extend(trainable_vars_e) 107 | 108 | def load_mean_omega(self): 109 | # Initialize scale at 0.9 110 | mean_path = os.path.join(os.path.dirname(self.smpl_model_path), 111 | 'neutral_smpl_meanwjoints.h5') 112 | mean_vals = dd.io.load(mean_path) 113 | 114 | mean_cams = [0.9, 0, 0] 115 | # 72D 116 | mean_pose = mean_vals['pose'] 117 | mean_pose[:3] = 0. 118 | mean_pose[0] = np.pi 119 | # 10D 120 | mean_shape = mean_vals['shape'] 121 | 122 | mean_vals = np.hstack((mean_cams, mean_pose, mean_shape)) 123 | # Needs to be 1 x 85 124 | mean_vals = np.expand_dims(mean_vals, 0) 125 | self.mean_var = tf.Variable( 126 | mean_vals, 127 | name='mean_param', 128 | dtype=tf.float32, 129 | trainable=True 130 | ) 131 | mean_cams = self.mean_var[0, :3] 132 | mean_pose = self.mean_var[0, 3:3+72] 133 | mean_shape = self.mean_var[0, 3+72:] 134 | 135 | self.mean_vars = [mean_cams, mean_pose, mean_shape] 136 | mean_cams = tf.tile(tf.reshape(mean_cams, (1,-1)), 137 | (self.batch_size, 1)) 138 | mean_shape = tf.tile(tf.reshape(mean_shape, (1,-1)), 139 | (self.batch_size, 1)) 140 | mean_pose = tf.tile(tf.reshape(mean_pose, (1, 24, 3)), 141 | (self.batch_size, 1, 1)) 142 | _, mean_joints3d, mean_poses_rot = self.smpl( 143 | mean_shape, mean_pose, get_skin=True) 144 | 145 | # Starting point for IEF. 146 | self.theta_mean = tf.concat(( 147 | mean_cams, 148 | tf.reshape(mean_pose, (-1, 72)), 149 | mean_shape 150 | ), axis=1) 151 | 152 | def prepare(self, resnet_path=''): 153 | """ 154 | Restores variables from checkpoint. 155 | 156 | Args: 157 | resnet_path (str): Optional path to load resnet weights. 158 | """ 159 | if resnet_path and not self.precomputed_phi: 160 | print('Restoring resnet vars from', resnet_path) 161 | resnet_vars = [] 162 | e_vars = [] 163 | for var in self.E_var: 164 | if 'resnet' in var.name: 165 | resnet_vars.append(var) 166 | else: 167 | e_vars.append(var) 168 | resnet_saver = tf.train.Saver(resnet_vars) 169 | resnet_saver.restore(self.sess, resnet_path) 170 | else: 171 | e_vars = self.E_var 172 | print('Restoring checkpoint ', self.load_path) 173 | 174 | saver = tf.train.Saver(e_vars) 175 | saver.restore(self.sess, self.load_path) 176 | self.sess.run(self.mean_vars) 177 | 178 | def build_temporal_encoder_model(self): 179 | B, T = self.batch_size, self.sequence_length 180 | if self.precomputed_phi: 181 | print('loading pre-computed phi!') 182 | self.img_feat_full = self.images_pl 183 | else: 184 | print('Getting all image features...') 185 | I_t = tf.reshape( 186 | self.images_pl, 187 | (B * T, self.img_size, self.img_size, 3) 188 | ) 189 | img_feat, phi_var_scope = self.f_image_enc( 190 | I_t, 191 | is_training=False, 192 | reuse=False, 193 | ) 194 | self.img_feat_full = tf.reshape(img_feat, (B, T, -1)) 195 | 196 | omega_mean = tf.tile(self.theta_mean, (self.sequence_length, 1)) 197 | 198 | # At training time, we only use first 40. Want to make sure GN 199 | # statistics are right. 
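        # movie_strips_cond encodes only the first 40 frames, matching the
        # training-time sequence length so the group-norm statistics line up;
        # it provides the conditioning window for auto-regressive prediction.
        # movie_strips (below) encodes the full sequence and feeds the
        # per-frame omega/SMPL predictions.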
200 | self.movie_strips_cond = self.f_temporal_enc( 201 | net=self.img_feat_full[:, :40], 202 | num_conv_layers=self.num_conv_layers, 203 | prefix='', 204 | reuse=None, 205 | ) 206 | 207 | self.movie_strips = self.f_temporal_enc( 208 | net=self.img_feat_full, 209 | num_conv_layers=self.num_conv_layers, 210 | prefix='', 211 | reuse=True, 212 | ) 213 | 214 | omega_movie_strip, _ = batch_pred_omega( 215 | input_features=self.movie_strips, 216 | batch_size=B, 217 | sequence_length=T, 218 | num_output=self.num_output, 219 | is_training=False, 220 | omega_mean=omega_mean, 221 | scope='single_view_ief', 222 | use_delta_from_pred=self.use_delta_from_pred, 223 | use_optcam=self.use_optcam, 224 | ) 225 | 226 | self.omegas_movie_strip.append_batched(omega_movie_strip) 227 | self.omegas_movie_strip.compute_smpl() 228 | 229 | def build_auto_regressive_model(self): 230 | omega_mean = tf.tile(self.theta_mean, (self.fov, 1)) 231 | input_movie_strips = self.movie_strips_pl # B x 13 x 2048 232 | movie_strip_pred = self.f_prediction_ar( 233 | net=input_movie_strips, 234 | num_conv_layers=self.num_conv_layers, 235 | prefix='pred_', 236 | reuse=None, 237 | ) 238 | omega_pred, _ = batch_pred_omega( 239 | input_features=movie_strip_pred, 240 | batch_size=self.batch_size, 241 | is_training=False, 242 | num_output=self.num_output, 243 | omega_mean=omega_mean, 244 | sequence_length=self.fov, 245 | scope='single_view_ief', 246 | predict_delta_keys=(), 247 | use_delta_from_pred=self.use_delta_from_pred, 248 | use_optcam=self.use_optcam, 249 | ) 250 | # Only want the last entry. 251 | self.movie_strip_pred = movie_strip_pred[:, -1:] 252 | self.omegas_pred.append_batched(omega_pred[:, -1:]) 253 | self.omegas_pred.compute_smpl() 254 | 255 | def make_fetch_dict(self, omegas, suffix=''): 256 | return { 257 | # Predictions. 258 | 'cams' + suffix: omegas.get_cams(), 259 | 'joints' + suffix: omegas.get_joints(), 260 | 'kps' + suffix: omegas.get_kps(), 261 | 'poses' + suffix: omegas.get_poses_rot(), 262 | 'shapes' + suffix: omegas.get_shapes(), 263 | 'verts' + suffix: omegas.get_verts(), 264 | 'omegas' + suffix: omegas.get_raw(), 265 | } 266 | 267 | def predict_movie_strips(self, images, get_smpl=False): 268 | """ 269 | Converts images to movie strip representation. Number of images should 270 | be equal to sequence_length. If precomputed_phi, images should be phis. 271 | 272 | Args: 273 | images (B x (2*fov-1) x H x W x 3) or (B x (2*fov-1) x 2048). 274 | get_smpl (bool): If True, returns all the smpl stuff. 275 | 276 | Returns: 277 | Movie strips (B x (2*fov-1) x 2048). 278 | """ 279 | feed_dict = { 280 | self.images_pl: images, 281 | } 282 | fetch_dict = { 283 | 'movie_strips': self.movie_strips, 284 | 'movie_strips_cond': self.movie_strips_cond, 285 | } 286 | if get_smpl: 287 | fetch_dict.update(self.make_fetch_dict(self.omegas_movie_strip)) 288 | return self.sess.run(fetch_dict, feed_dict) 289 | 290 | def predict_auto_regressive(self, movie_strips): 291 | """ 292 | Predicts the next time step in an auto-regressive manner. 293 | 294 | Args: 295 | movie_strips (B x fov x 2048). 
296 | 297 | Returns: 298 | 299 | """ 300 | feed_dict = { 301 | self.movie_strips_pl: movie_strips, 302 | } 303 | 304 | fetch_dict = { 305 | 'movie_strip': self.movie_strip_pred, 306 | } 307 | fetch_dict.update(self.make_fetch_dict(self.omegas_pred)) 308 | result = self.sess.run(fetch_dict, feed_dict) 309 | return result 310 | -------------------------------------------------------------------------------- /src/external/install_alphapose.sh: -------------------------------------------------------------------------------- 1 | git clone git@github.com:MVIG-SJTU/AlphaPose.git 2 | cd AlphaPose 3 | git fetch origin 38e00c688023282304462b5b6da98248e798842e # API tested with this commit. 4 | python setup.py build develop --user 5 | echo "" 6 | echo "Don't forget to download the pre-trained model and configs!" 7 | 8 | -------------------------------------------------------------------------------- /src/external/install_nmr.sh: -------------------------------------------------------------------------------- 1 | git clone git@github.com:daniilidis-group/neural_renderer.git 2 | cd neural_renderer 3 | python setup.py build develop 4 | -------------------------------------------------------------------------------- /src/extract_tracks.py: -------------------------------------------------------------------------------- 1 | """ 2 | Given a directory of videos, extracts 2D pose tracklet using AlphaPose/PoseFlow 3 | for each video. 4 | Make sure you have installed AlphaPose in src/external. 5 | This script is basically a wrapper around AlphaPose/PoseFlow: 6 | 1. Split the video into individual frames since PoseFlow requires that format 7 | 2. Run AlphaPose on the produced directory with frames 8 | 3. Run PoseFlow on the AlphaPose output. 9 | Therefore, if at any point this script fails, please look into each system cmd 10 | that's printed prior to running them. Make sure you can run those commands on 11 | their own. 12 | """ 13 | import json 14 | import os 15 | import os.path as osp 16 | import re 17 | import subprocess 18 | from glob import glob 19 | 20 | import numpy as np 21 | 22 | 23 | def dump_frames(vid_path, out_dir): 24 | """ 25 | Extracts all frames from the video at vid_path and saves them inside of 26 | out_dir. 27 | """ 28 | if len(glob(osp.join(out_dir, '*.png'))) > 0: 29 | print('Image frames already exist!') 30 | return 31 | 32 | print('{} Writing frames to file'.format(vid_path)) 33 | 34 | cmd = [ 35 | 'ffmpeg', 36 | '-i', vid_path, 37 | '-start_number', '0', 38 | '{temp_dir}/frame%08d.png'.format(temp_dir=out_dir), 39 | ] 40 | print(' '.join(cmd)) 41 | subprocess.call(cmd) 42 | 43 | 44 | def run_alphapose(img_dir, out_dir): 45 | if osp.exists(osp.join(out_dir, 'alphapose-results.json')): 46 | print('Alpha Pose already run!') 47 | return 48 | 49 | print('----------') 50 | print('Computing per-frame results with AlphaPose') 51 | 52 | # Ex: 53 | # python3 demo.py --indir data/0502/ --outdir data/0502/alphapose --sp \ 54 | # --cfg pretrained_models/256x192_res50_lr1e-3_1x.yaml \ 55 | # --checkpoint pretrained_models/fast_res50_256x192.pth 56 | cmd = [ 57 | 'python', 'scripts/demo_inference.py', 58 | '--indir', img_dir, 59 | '--outdir', out_dir, 60 | '--sp', # Needed to avoid multi-processing issues. 61 | # Update thees if you used a different model from the Model Zoo. 62 | '--cfg', 'pretrained_models/256x192_res50_lr1e-3_1x.yaml', 63 | '--checkpoint', 'pretrained_models/fast_res50_256x192.pth', 64 | # '--save_img', # Uncomment this if you want to visualize poses. 
65 | ] 66 | 67 | print('Running: {}'.format(' '.join(cmd))) 68 | curr_dir = os.getcwd() 69 | os.chdir('src/external/AlphaPose') 70 | ret = subprocess.call(cmd) 71 | if ret != 0: 72 | print('Issue running alphapose. Please make sure you can run the above ' 73 | 'command from the commandline.') 74 | exit(ret) 75 | os.chdir(curr_dir) 76 | print('AlphaPose successfully ran!') 77 | print('----------') 78 | 79 | 80 | def run_poseflow(img_dir, out_dir): 81 | alphapose_json = osp.join(out_dir, 'alphapose', 'alphapose-results.json') 82 | out_json = osp.join(out_dir, 'poseflow', 'poseflow-results-tracked.json') 83 | if osp.exists(out_json): 84 | print('PoseFlow already run!') 85 | return out_json 86 | 87 | print('Computing tracking with PoseFlow') 88 | 89 | # Ex: 90 | # python PoseFlow/tracker-general.py --imgdir data/0502/ \ 91 | # --in_json demo_output/0502/alphapose/alphapose-results.json \ 92 | # --out_json demo_output/0502/poseflow/poseflow-results-tracked.json \ 93 | # --visdir demo_output/0502/poseflow/ 94 | cmd = [ 95 | 'python', 'PoseFlow/tracker-general.py', 96 | '--imgdir', img_dir, 97 | '--in_json', alphapose_json, 98 | '--out_json', out_json, 99 | # '--visdir', out_dir, # Uncomment this to visualize PoseFlow tracks. 100 | ] 101 | 102 | print('Running: {}'.format(' '.join(cmd))) 103 | curr_dir = os.getcwd() 104 | os.chdir('src/external/AlphaPose') 105 | ret = subprocess.call(cmd) 106 | if ret != 0: 107 | print('Issue running PoseFlow. Please make sure you can run the above ' 108 | 'command from the commandline.') 109 | exit(ret) 110 | os.chdir(curr_dir) 111 | print('PoseFlow successfully ran!') 112 | print('----------') 113 | return out_json 114 | 115 | 116 | def compute_tracks(vid_path, out_dir): 117 | """ 118 | This script basically: 119 | 1. Extracts individual frames from mp4 since PoseFlow requires per frame 120 | images to be written. 121 | 2. Call AlphaPose on these frames. 122 | 3. Call PoseFlow on the output of 2. 123 | """ 124 | vid_name = osp.basename(vid_path).split('.')[0] 125 | 126 | # Where to save all intermediate outputs in. 127 | vid_dir = osp.abspath(osp.join(out_dir, vid_name)) 128 | img_dir = osp.abspath(osp.join(vid_dir, 'video_frames')) 129 | alphapose_dir = osp.abspath(osp.join(vid_dir, 'alphapose')) 130 | poseflow_dir = osp.abspath(osp.join(vid_dir, 'poseflow')) 131 | 132 | os.makedirs(img_dir, exist_ok=True) 133 | os.makedirs(alphapose_dir, exist_ok=True) 134 | os.makedirs(poseflow_dir, exist_ok=True) 135 | 136 | dump_frames(vid_path, img_dir) 137 | run_alphapose(img_dir, alphapose_dir) 138 | track_json = run_poseflow(img_dir, vid_dir) 139 | 140 | return track_json, img_dir 141 | 142 | 143 | def get_labels_poseflow(json_path, num_frames, min_kp_count=15): 144 | """ 145 | Returns the poses for each person tracklet. 146 | Each pose has dimension num_kp x 3 (x,y,vis) if the person is visible in the 147 | current frame. Otherwise, the pose will be None. 148 | Args: 149 | json_path (str): Path to the json output from AlphaPose/PoseTrack. 150 | num_frames (int): Number of frames. 151 | min_kp_count (int): Minimum threshold length for a tracklet. 152 | Returns: 153 | List of length num_people. Each element in the list is another list of 154 | length num_frames containing the poses for each person. 
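        Note: in practice only the longest tracklet is kept (the function
        returns all_kps_list_sorted[0]), so the returned value is a single
        list of length num_frames.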
155 | """ 156 | with open(json_path, 'r') as f: 157 | data = json.load(f) 158 | if len(data.keys()) != num_frames: 159 | print('Not all frames have people detected in it.') 160 | frame_ids = [int(re.findall(r'\d+', img_name)[0]) 161 | for img_name in sorted(data.keys())] 162 | if frame_ids[0] != 0: 163 | print('PoseFlow did not find people in the first frame.') 164 | exit(1) 165 | 166 | all_kps_dict = {} 167 | all_kps_count = {} 168 | for i, key in enumerate(sorted(data.keys())): 169 | # People who are visible in this frame. 170 | track_ids = [] 171 | for person in data[key]: 172 | kps = np.array(person['keypoints']).reshape(-1, 3) 173 | idx = int(person['idx']) 174 | if idx not in all_kps_dict.keys(): 175 | # If this is the first time, fill up until now with None 176 | all_kps_dict[idx] = [None] * i 177 | all_kps_count[idx] = 0 178 | # Save these kps. 179 | all_kps_dict[idx].append(kps) 180 | track_ids.append(idx) 181 | all_kps_count[idx] += 1 182 | # If any person seen in the past is missing in this frame, add None. 183 | for idx in set(all_kps_dict.keys()).difference(track_ids): 184 | all_kps_dict[idx].append(None) 185 | 186 | all_kps_list = [] 187 | all_counts_list = [] 188 | for k in all_kps_dict: 189 | if all_kps_count[k] >= min_kp_count: 190 | all_kps_list.append(all_kps_dict[k]) 191 | all_counts_list.append(all_kps_count[k]) 192 | 193 | # Sort it by the length so longest is first: 194 | sort_idx = np.argsort(all_counts_list)[::-1] 195 | all_kps_list_sorted = [] 196 | for sort_id in sort_idx: 197 | all_kps_list_sorted.append(all_kps_list[sort_id]) 198 | print("Number of detected tracks:", len(all_kps_list_sorted)) 199 | return all_kps_list_sorted[0] # Just take the first track. 200 | -------------------------------------------------------------------------------- /src/models.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib.layers.python.layers.initializers import ( 3 | variance_scaling_initializer, 4 | ) 5 | import tensorflow.contrib.slim as slim 6 | 7 | from src.ops import tf_pad 8 | 9 | 10 | def get_image_encoder(model_type='resnet'): 11 | """ 12 | Retrieves encoder fn for image and 3D 13 | """ 14 | models = { 15 | 'resnet': encoder_resnet 16 | } 17 | if model_type in models.keys(): 18 | return models[model_type] 19 | else: 20 | print('Unknown image encoder:', model_type) 21 | exit(1) 22 | 23 | 24 | def get_temporal_encoder(model_type='AZ_FC2GN'): 25 | models = { 26 | 'AZ_FC2GN': az_fc2_groupnorm, 27 | } 28 | if model_type in models.keys(): 29 | return models[model_type] 30 | else: 31 | print('Unknown temporal encoder:', model_type) 32 | exit(1) 33 | 34 | 35 | def get_prediction_model(model_type='AZ_FC2GN'): 36 | models = { 37 | 'AZ_FC2GN': az_fc2_groupnorm, 38 | } 39 | if model_type in models.keys(): 40 | return models[model_type] 41 | else: 42 | print('Unknown prediction model:', model_type) 43 | exit(1) 44 | 45 | 46 | # Functions for image encoder. 47 | 48 | def encoder_resnet(x, is_training=True, weight_decay=0.001, reuse=False): 49 | """ 50 | Resnet v2-50 51 | Assumes input is [batch, height_in, width_in, channels]!! 
52 | Input: 53 | - x: N x H x W x 3 54 | - weight_decay: float 55 | - reuse: bool->True if test 56 | 57 | Outputs: 58 | - cam: N x 3 59 | - Pose vector: N x 72 60 | - Shape vector: N x 10 61 | - variables: tf variables 62 | """ 63 | from tensorflow.contrib.slim.python.slim.nets import resnet_v2 64 | with tf.name_scope('Encoder_resnet', values=[x]): 65 | with slim.arg_scope( 66 | resnet_v2.resnet_arg_scope(weight_decay=weight_decay)): 67 | net, end_points = resnet_v2.resnet_v2_50( 68 | x, 69 | num_classes=None, 70 | is_training=is_training, 71 | reuse=reuse, 72 | scope='resnet_v2_50') 73 | net = tf.squeeze(net, axis=[1, 2]) 74 | variables_scope = 'resnet_v2_50' 75 | return net, variables_scope 76 | 77 | 78 | def encoder_fc3_dropout(x, 79 | num_output=85, 80 | is_training=True, 81 | reuse=False, 82 | name='3D_module'): 83 | """ 84 | 3D inference module. 3 MLP layers (last is the output) 85 | With dropout on first 2. 86 | Input: 87 | - x: N x [|img_feat|, |3D_param|] 88 | - reuse: bool 89 | 90 | Outputs: 91 | - 3D params: N x num_output 92 | if orthogonal: 93 | either 85: (3 + 24*3 + 10) or 109 (3 + 24*4 + 10) for factored 94 | axis-angle representation 95 | if perspective: 96 | 86: (f, tx, ty, tz) + 24*3 + 10, or 110 for factored axis-angle. 97 | - variables: tf variables 98 | """ 99 | with tf.variable_scope(name, reuse=reuse) as scope: 100 | net = slim.fully_connected(x, 1024, scope='fc1') 101 | net = slim.dropout(net, 0.5, is_training=is_training, scope='dropout1') 102 | net = slim.fully_connected(net, 1024, scope='fc2') 103 | net = slim.dropout(net, 0.5, is_training=is_training, scope='dropout2') 104 | small_xavier = variance_scaling_initializer( 105 | factor=.01, mode='FAN_AVG', uniform=True) 106 | net = slim.fully_connected( 107 | net, 108 | num_output, 109 | activation_fn=None, 110 | weights_initializer=small_xavier, 111 | scope='fc3') 112 | 113 | variables = tf.contrib.framework.get_variables(scope) 114 | return net, variables 115 | 116 | 117 | # Functions for f_{movie strip} and f_{AR}. 118 | 119 | def az_fc2_groupnorm(net, num_conv_layers, prefix, reuse=None): 120 | """ 121 | Causal architecture for movie strip encoder and autoregressive prediction. 122 | 123 | Each block has 2 convs: 124 | norm --> relu --> conv --> norm --> relu --> conv --> add. 125 | Uses full convolution. 126 | """ 127 | for i in range(num_conv_layers): 128 | net = az_fc_causal_block2( 129 | kernel_width=3, 130 | name='block_{}'.format(i), 131 | net_input=net, 132 | num_filter=2048, 133 | pad_edges=True, 134 | prefix=prefix, 135 | reuse=reuse, 136 | ) 137 | return net 138 | 139 | 140 | def az_fc_causal_block2(net_input, num_filter, kernel_width, prefix='', name='', 141 | pad_edges=False, reuse=None): 142 | """ 143 | Causal res block: 144 | BN -> Relu -> Weight -> BN -> Relu -> Weight -> Add 145 | 146 | Causal is implemented by padding the left size by receptive field - 1 and 147 | using 'VALID' mode. The output indices thus only look at the input indices 148 | that are less than or equal. 
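
    For example: with kernel_width=3, each block stacks two causal convs, so
    the num_conv_layers=3 blocks used by az_fc2_groupnorm cover a receptive
    field of 3 * 2 * (3 - 1) + 1 = 13 past frames, matching
    fov = num_conv_layers * 4 + 1 in TesterPred.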
149 | """ 150 | pad_mode = 'EDGE' if pad_edges else 'ZERO' 151 | num_groups = 32 152 | 153 | # NTC -> NT1C 154 | net_input_expand = tf.expand_dims(net_input, axis=2) 155 | # group-norm 156 | net_norm = tf.contrib.layers.group_norm( 157 | net_input_expand, 158 | channels_axis=-1, 159 | reduction_axes=(-3, -2), 160 | scope=(prefix + 'AZ_FC_causal_block_preact_gn1' + name), 161 | reuse=reuse, 162 | groups=num_groups, 163 | ) 164 | # relu 165 | net_relu = tf.nn.relu(net_norm) 166 | # weight 167 | net_relu_padded = tf_pad( 168 | tensor=net_relu, 169 | paddings=[[0, 0], [kernel_width - 1, 0], [0, 0], [0, 0]], 170 | mode=pad_mode 171 | ) 172 | net_conv1 = tf.contrib.layers.conv2d( 173 | inputs=net_relu_padded, 174 | num_outputs=num_filter, 175 | kernel_size=[kernel_width, 1], 176 | stride=1, 177 | padding='VALID', 178 | data_format='NHWC', # was previously 'NCHW', 179 | rate=1, 180 | activation_fn=None, 181 | scope=prefix + 'AZ_FC_causal_block2_conv1' + name, 182 | reuse=reuse, 183 | ) 184 | # group-norm 185 | net_norm2 = tf.contrib.layers.group_norm( 186 | net_conv1, 187 | channels_axis=-1, 188 | reduction_axes=(-3, -2), 189 | reuse=reuse, 190 | groups=num_groups, 191 | scope=(prefix + 'AZ_FC_causal_block_preact_gn2' + name), 192 | ) 193 | # relu 194 | net_relu2 = tf.nn.relu(net_norm2) 195 | net_relu_padded2 = tf_pad( 196 | tensor=net_relu2, 197 | paddings=[[0, 0], [kernel_width - 1, 0], [0, 0], [0, 0]], 198 | mode=pad_mode 199 | ) 200 | # Initalization 201 | small_xavier = variance_scaling_initializer( 202 | factor=.001, 203 | mode='FAN_AVG', 204 | uniform=True 205 | ) 206 | net_final = tf.contrib.layers.conv2d( 207 | inputs=net_relu_padded2, 208 | num_outputs=num_filter, 209 | kernel_size=[kernel_width, 1], 210 | stride=1, 211 | padding='VALID', 212 | data_format='NHWC', 213 | rate=1, 214 | activation_fn=None, 215 | weights_initializer=small_xavier, 216 | scope=prefix + 'AZ_FC_causal_block2_conv2' + name, 217 | reuse=reuse, 218 | ) 219 | # NT1C -> NTC 220 | net_final = tf.squeeze(net_final, axis=2) 221 | # skip connection 222 | residual = tf.add(net_final, net_input) 223 | return residual 224 | 225 | 226 | # Functions for f_{3D}. 227 | 228 | def batch_pred_omega(input_features, batch_size, is_training, num_output, 229 | omega_mean, sequence_length, scope, predict_delta_keys=(), 230 | use_delta_from_pred=False, use_optcam=False): 231 | """ 232 | Given B x T x * inputs, computes IEF on them by batching them 233 | as BT x *. 234 | 235 | if use_optcam is True, only outputs 72 or 82 dims. 
236 | and appends fixed camera [1,0,0] 237 | """ 238 | # run in batch 239 | # omega_mean comes in as shape: BT x 85 240 | input_features_reshape = tf.reshape(input_features, 241 | (batch_size * sequence_length, -1)) 242 | omega_pred, delta_predictions = call_hmr_ief( 243 | phi=input_features_reshape, 244 | omega_start=omega_mean, 245 | scope=scope, 246 | num_output=num_output, 247 | is_training=is_training, 248 | predict_delta_keys=predict_delta_keys, 249 | use_delta_from_pred=use_delta_from_pred, 250 | use_optcam=use_optcam, 251 | ) 252 | omega_pred = tf.reshape( 253 | omega_pred, 254 | (batch_size, sequence_length, num_output) 255 | ) 256 | new_delta_predictions = {} 257 | for delta_t, prediction in delta_predictions.items(): 258 | new_delta_predictions[delta_t] = tf.reshape( 259 | prediction, 260 | (batch_size, sequence_length, num_output) 261 | ) 262 | return omega_pred, new_delta_predictions 263 | 264 | 265 | def call_hmr_ief(phi, omega_start, scope, num_output=85, num_stage=3, 266 | is_training=True, predict_delta_keys=(), 267 | use_delta_from_pred=False, use_optcam=True): 268 | """ 269 | Wrapper for doing HMR-style IEF. 270 | 271 | If predict_delta, then also makes num_delta_t predictions forward and 272 | backward in time, with each step of delta_t. 273 | 274 | Args: 275 | phi (Bx2048): Image features. 276 | omega_start (Bx85): Starting Omega as input to first IEF. 277 | scope (str): Name of scope for reuse. 278 | num_output (int): Size of output. 279 | num_stage (int): Number of iterations for IEF. 280 | is_training (bool): If False, don't apply dropout. 281 | predict_delta_keys (iterable): List of keys for delta_t. 282 | use_delta_from_pred (bool): If True, initializes delta prediction from 283 | current frame prediction. 284 | use_optcam (bool): If True, uses [1, 0, 0] for cam. 285 | 286 | Returns: 287 | Final theta (Bx{num_output}) 288 | Deltas predictions (List of outputs) 289 | """ 290 | theta_here = hmr_ief( 291 | phi=phi, 292 | omega_start=omega_start, 293 | scope=scope, 294 | num_output=num_output, 295 | num_stage=num_stage, 296 | is_training=is_training 297 | ) 298 | 299 | # Delta only needs to do cam/pose, no shape! 300 | if use_optcam: 301 | num_output_delta = 72 302 | else: 303 | num_output_delta = 3 + 72 304 | 305 | deltas_predictions = {} 306 | for delta_t in predict_delta_keys: 307 | if delta_t == 0: 308 | # This should just be the normal IEF. 309 | continue 310 | elif delta_t > 0: 311 | scope_delta = scope + '_future{}'.format(delta_t) 312 | elif delta_t < 0: 313 | scope_delta = scope + '_past{}'.format(abs(delta_t)) 314 | 315 | omega_start_delta = theta_here if use_delta_from_pred else omega_start 316 | # append this later. 
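        # The 10-D shape (beta) is not re-predicted by the delta branches; it
        # is sliced off here and concatenated back onto each delta prediction
        # below.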
317 | beta = omega_start_delta[:, -10:] 318 | 319 | if use_optcam: 320 | # trim the first 3D camera + last shpae 321 | omega_start_delta = omega_start_delta[:, 3:3 + num_output_delta] 322 | else: 323 | omega_start_delta = omega_start_delta[:, :num_output_delta] 324 | 325 | delta_pred = hmr_ief( 326 | phi=phi, 327 | omega_start=omega_start_delta, 328 | scope=scope_delta, 329 | num_output=num_output_delta, 330 | num_stage=num_stage, 331 | is_training=is_training 332 | ) 333 | if use_optcam: 334 | # Add camera + shape 335 | scale = tf.ones([delta_pred.shape[0], 1]) 336 | trans = tf.zeros([delta_pred.shape[0], 2]) 337 | delta_pred = tf.concat([scale, trans, delta_pred, beta], 1) 338 | else: 339 | delta_pred = tf.concat([delta_pred[:, :75], beta], 1) 340 | 341 | deltas_predictions[delta_t] = delta_pred 342 | 343 | return theta_here, deltas_predictions 344 | 345 | 346 | def hmr_ief(phi, omega_start, scope, num_output=85, num_stage=3, 347 | is_training=True): 348 | """ 349 | Runs HMR-style IEF. 350 | 351 | Args: 352 | phi (Bx2048): Image features. 353 | omega_start (Bx85): Starting Omega as input to first IEF. 354 | scope (str): Name of scope for reuse. 355 | num_output (int): Size of output. 356 | num_stage (int): Number of iterations for IEF. 357 | is_training (bool): If False, don't apply dropout. 358 | 359 | Returns: 360 | Final theta (Bx{num_output}) 361 | """ 362 | with tf.variable_scope(scope): 363 | theta_prev = omega_start 364 | theta_here = None 365 | 366 | for _ in range(num_stage): 367 | # ---- Compute outputs 368 | state = tf.concat([phi, theta_prev], 1) 369 | delta_theta, _ = encoder_fc3_dropout( 370 | state, 371 | is_training=is_training, 372 | num_output=num_output, 373 | reuse=tf.AUTO_REUSE 374 | ) 375 | # Compute new theta 376 | theta_here = theta_prev + delta_theta 377 | 378 | # Finally update to end iteration. 379 | theta_prev = theta_here 380 | 381 | return theta_here 382 | -------------------------------------------------------------------------------- /src/omega.py: -------------------------------------------------------------------------------- 1 | """ 2 | Wrapper classes for saving all predicted variables. Makes it easier to compute 3 | SMPL from the 85-dimension output of the model all at once. 4 | """ 5 | import tensorflow as tf 6 | 7 | from src.tf_smpl.batch_lbs import batch_rodrigues 8 | from src.tf_smpl.projection import batch_orth_proj_idrot 9 | 10 | 11 | class Omegas(object): 12 | """ 13 | Superclass container for batches of sequences of poses, shapes, joints, etc. 14 | 15 | Args: 16 | config. 17 | """ 18 | def __init__(self, config, batch_size=None): 19 | self.config = config 20 | if batch_size: 21 | self.batch_size = batch_size 22 | else: 23 | self.batch_size = config.batch_size 24 | self.length = 0 25 | 26 | self.joints = tf.constant( 27 | (), 28 | shape=(self.batch_size, 0, self.config.num_kps, 3) 29 | ) 30 | self.kps = tf.constant( 31 | (), 32 | shape=(self.batch_size, 0, self.config.num_kps, 2) 33 | ) 34 | self.poses_aa = tf.constant((), shape=(self.batch_size, 0, 24, 3)) 35 | self.poses_rot = tf.constant((), shape=(self.batch_size, 0, 24, 3, 3)) 36 | self.shapes = tf.constant((), shape=(self.batch_size, 0, 10)) 37 | 38 | def __len__(self): 39 | """ 40 | Returns the current sequence length. 41 | 42 | Returns: 43 | length (int). 44 | """ 45 | return self.length 46 | 47 | def get_joints(self, t=None): 48 | """ 49 | Returns the joints at time t. 50 | 51 | Args: 52 | t (int). 53 | 54 | Returns: 55 | Joints (Bx25x3). 
56 | """ 57 | return self.joints if t is None else self.joints[:, t] 58 | 59 | def get_kps(self, t=None): 60 | """ 61 | Returns the keypoints at time t. 62 | 63 | Note that the shape is different for ground truth omegas and predicted 64 | omegas. 65 | 66 | Args: 67 | t (int). 68 | 69 | Returns: 70 | Kps (Bx25x3) if gt, 71 | or Kps (Bx25x2) if pred. 72 | """ 73 | return self.kps if t is None else self.kps[:, t] 74 | 75 | def get_poses_aa(self, t=None): 76 | """ 77 | Returns axis-aligned poses at time t. 78 | 79 | Args: 80 | t (int). 81 | 82 | Returns: 83 | Poses (Bx24x3). 84 | """ 85 | return self.poses_aa if t is None else self.poses_aa[:, t] 86 | 87 | def get_poses_rot(self, t=None): 88 | """ 89 | Returns poses as rotation matrices at time t. 90 | 91 | Args: 92 | t (int). 93 | 94 | Returns: 95 | Poses (Bx24x3x3). 96 | """ 97 | return self.poses_rot if t is None else self.poses_rot[:, t] 98 | 99 | def get_shapes(self, t=None): 100 | """ 101 | Returns shapes at time t. 102 | 103 | Args: 104 | t (int). 105 | 106 | Returns: 107 | Shapes (Bx10). 108 | """ 109 | return self.shapes if t is None else self.shapes[:, t] 110 | 111 | @staticmethod 112 | def gather(values, indices): 113 | """ 114 | Gathers a subset over time. 115 | 116 | Args: 117 | values (BxTx...): Tensor that we only need a subset of. 118 | indices (iterable): 1D tensor of times. 119 | 120 | Returns: 121 | tensor. 122 | """ 123 | if not isinstance(indices, tf.Tensor): 124 | indices = tf.constant(indices) 125 | spliced = tf.gather(params=values, indices=indices, axis=1) 126 | return spliced 127 | 128 | 129 | class OmegasGt(Omegas): 130 | """ 131 | Stores fields for ground truth omegas. 132 | 133 | Args: 134 | config. 135 | poses_aa (BxTx24x3). 136 | shapes (Bx10). 137 | joints (BxTx14x3). 138 | """ 139 | def __init__(self, config, poses_aa, shapes, joints, kps, batch_size=None): 140 | super(OmegasGt, self).__init__(config, batch_size=batch_size) 141 | self.length = poses_aa.shape[1] 142 | 143 | self.poses_aa = poses_aa 144 | poses_rot = batch_rodrigues(tf.reshape(poses_aa, (-1, 3))) 145 | self.poses_rot = tf.reshape(poses_rot, (self.batch_size, -1, 24, 3, 3)) 146 | self.shapes = shapes 147 | self.joints = joints 148 | self.kps = kps 149 | 150 | def get_shapes(self, t=None): 151 | if t is None: 152 | # When t is None, expect BxTx10. 153 | return tf.tile(tf.expand_dims(self.shapes, 1), (1, self.length, 1)) 154 | else: 155 | return self.shapes 156 | 157 | 158 | class OmegasPred(Omegas): 159 | """ 160 | Stores fields for predicted Omegas. 161 | 162 | Args: 163 | config. 164 | smpl (func). 165 | optcam (bool): If true, uses optcam when computing kp proj. 166 | vis_max_batch (int): Number of batches to visualize. 167 | vis_t_indices (ndarray): Times to visualize. If None, keeps all. 
168 | """ 169 | omega_instances = [] 170 | 171 | def __init__(self, 172 | config, 173 | smpl, 174 | use_optcam=False, 175 | vis_max_batch=2, 176 | vis_t_indices=None, 177 | batch_size=None, 178 | is_training=True): 179 | super(OmegasPred, self).__init__(config, batch_size) 180 | self.smpl = smpl 181 | self.cams = tf.constant((), shape=(self.batch_size, 0, 3)) 182 | self.all_verts = tf.constant((), shape=(0, 6890, 3)) 183 | self.verts = tf.constant((), shape=(0, 6890, 3)) 184 | self.smpl_computed = False 185 | self.vis_max_batch = vis_max_batch 186 | self.vis_t_indices = vis_t_indices 187 | self.raw = tf.constant((), shape=(self.batch_size, 0, 85)) 188 | self.use_optcam = use_optcam 189 | self.is_training = is_training 190 | OmegasPred.omega_instances.append(self) 191 | 192 | def update_instance_vars(self): 193 | self.cams = self.raw[:, :, :3] 194 | self.poses_aa = self.raw[:, :, 3: 3 + 24 * 3] 195 | self.shapes = self.raw[:, :, 3 + 24 * 3: 85] 196 | self.length = self.raw.shape[1] 197 | 198 | def append_batched(self, omegas): 199 | """ 200 | Appends multiple omegas. 201 | 202 | Args: 203 | omegas (BxTx85): [cams, poses, shapes]. 204 | """ 205 | B = self.batch_size 206 | omegas = tf.reshape(omegas, (B, -1, 85)) 207 | self.raw = tf.concat((self.raw, omegas), axis=1) 208 | self.update_instance_vars() 209 | self.smpl_computed = False 210 | 211 | def append(self, omega): 212 | """ 213 | Appends an omega. 214 | 215 | Args: 216 | omega (Bx85): [cams, poses, shapes]. 217 | """ 218 | B = self.batch_size 219 | omega = tf.reshape(omega, (B, 1, 85)) 220 | self.raw = tf.concat((self.raw, omega), axis=1) 221 | self.update_instance_vars() 222 | self.smpl_computed = False 223 | 224 | def compute_smpl(self): 225 | """ 226 | Batch computation of vertices, joints, rotation matrices, and keypoints. 227 | Due to the overhead added to computation graph, call this once. 228 | """ 229 | if self.smpl_computed: 230 | print('SMPL should only be computed once!') 231 | B = self.batch_size 232 | T = self.length 233 | 234 | verts, joints, poses_rot = self.smpl( 235 | beta=tf.reshape(self.shapes, (B * T, 10)), 236 | theta=tf.reshape(self.poses_aa, (B * T, 24, 3)), 237 | get_skin=True 238 | ) 239 | self.joints = tf.reshape(joints, (B, T, self.config.num_kps, 3)) 240 | self.poses_rot = tf.reshape(poses_rot, (B, T, 24, 3, 3)) 241 | 242 | # Make sure joints are B*T x num_kps x 3. 243 | if self.use_optcam and self.is_training: 244 | print('Using optimal camera!!') 245 | # Just drop the z here ([1, 0, 0]) 246 | kps = joints[:, :, :2] 247 | else: 248 | kps = batch_orth_proj_idrot(joints, 249 | tf.reshape(self.cams, (B * T, 3))) 250 | self.kps = tf.reshape(kps, (B, T, self.config.num_kps, 2)) 251 | 252 | self.all_verts = tf.reshape(verts, (B, T, 6890, 3))[:self.vis_max_batch] 253 | if self.vis_t_indices is None: 254 | self.verts = self.all_verts 255 | else: 256 | self.verts = Omegas.gather( 257 | values=self.all_verts, 258 | indices=self.vis_t_indices 259 | ) 260 | self.smpl_computed = True 261 | 262 | def get_cams(self, t=None): 263 | """ 264 | Gets cams at time t. 265 | 266 | Args: 267 | t (int). 268 | 269 | Returns: 270 | Cams (Bx3). 
271 | """ 272 | return self.cams if t is None else self.cams[:, t] 273 | 274 | def set_cams(self, cams): 275 | """ 276 | Only used for opt_cam 277 | """ 278 | assert self.use_optcam 279 | self.cams = cams 280 | 281 | def get_all_verts(self): 282 | return self.all_verts 283 | 284 | def get_verts(self): 285 | return self.verts 286 | 287 | def get_raw(self): 288 | """ 289 | Returns: 290 | Raw Omega (BxTx85). 291 | """ 292 | return self.raw 293 | 294 | @classmethod 295 | def compute_all_smpl(cls): 296 | omegas = cls.omega_instances 297 | for omega in omegas: 298 | omega.compute_smpl() 299 | -------------------------------------------------------------------------------- /src/ops.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | 4 | 5 | def tf_pad(tensor, paddings, mode): 6 | """ 7 | Pads a tensor according to paddings. 8 | 9 | mode can be 'ZERO' or 'EDGE' (Just use tf.pad for other modes). 10 | 11 | 'EDGE' padding is equivalent to repeatedly doing symmetric padding with all 12 | pads at most 1. 13 | 14 | Args: 15 | tensor (Tensor). 16 | paddings (list of list of non-negative ints). 17 | mode (str). 18 | 19 | Returns: 20 | Padded tensor. 21 | """ 22 | paddings = np.array(paddings, dtype=int) 23 | assert np.all(paddings >= 0) 24 | while not np.all(paddings == 0): 25 | new_paddings = np.array(paddings > 0, dtype=int) 26 | paddings -= new_paddings 27 | new_paddings = tf.constant(new_paddings) 28 | if mode == 'ZERO': 29 | tensor = tf.pad(tensor, new_paddings, 'CONSTANT', constant_values=0) 30 | elif mode == 'EDGE': 31 | tensor = tf.pad(tensor, new_paddings, 'SYMMETRIC') 32 | else: 33 | raise Exception('pad type {} not recognized'.format(mode)) 34 | 35 | return tensor -------------------------------------------------------------------------------- /src/renderer.py: -------------------------------------------------------------------------------- 1 | import neural_renderer as nr 2 | import numpy as np 3 | from skimage.io import imread 4 | import torch 5 | from torch.autograd import Variable 6 | 7 | from src.util.common import resize_img 8 | from src.util.torch_utils import orthographic_proj_withz_idrot 9 | from src.util.render_utils import ( 10 | draw_skeleton, 11 | draw_text, 12 | ) 13 | 14 | 15 | COLORS = { 16 | # colorblind/print/copy safe: 17 | 'blue': [0.65098039, 0.74117647, 0.85882353], 18 | 'pink': [.9, .7, .7], 19 | 'mint': [ 166/255., 229/255., 204/255.], 20 | 'mint2': [ 202/255., 229/255., 223/255.], 21 | 'green': [ 153/255., 216/255., 201/255.], 22 | 'green2': [ 171/255., 221/255., 164/255.], 23 | 'red': [ 251/255., 128/255., 114/255.], 24 | 'orange': [ 253/255., 174/255., 97/255.], 25 | 'yellow': [ 250/255., 230/255., 154/255.] 26 | } 27 | 28 | 29 | def get_dims(x): 30 | return x.dim() if isinstance(x, torch.Tensor) else x.ndim 31 | 32 | 33 | class VisRenderer(object): 34 | """ 35 | Utility to render meshes using pytorch NMR 36 | faces are F x 3 or 1 x F x 3 numpy 37 | this is for visualization only -- does not allow backprop. 38 | This class assumes all inputs are Torch/numpy variables. 
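    A minimal usage sketch (illustrative, not from the original docstring):
        renderer = VisRenderer(img_size=256, face_path='models/smpl_faces.npy')
        rend = renderer(verts, cam=cam, img=input_img, color_name='blue')
    with verts of shape V x 3, cam = [s, tx, ty], and input_img in [0, 255].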
39 |     The camera is weak-perspective [s, tx, ty] with identity rotation (no quaternion).
40 |     """
41 | 
42 |     def __init__(self,
43 |                  img_size=256,
44 |                  face_path='models/smpl_faces.npy',
45 |                  t_size=1):
46 | 
47 |         self.renderer = nr.Renderer(
48 |             img_size, camera_mode='look_at', perspective=False)
49 |         self.set_light_dir([1, .5, -1], int_dir=0.3, int_amb=0.7)
50 |         self.set_bgcolor([1, 1, 1.])
51 |         self.img_size = img_size
52 | 
53 |         self.faces_np = np.load(face_path).astype(np.int)
54 |         self.faces = to_variable(torch.IntTensor(self.faces_np).cuda())
55 |         if self.faces.dim() == 2:
56 |             self.faces = torch.unsqueeze(self.faces, 0)
57 | 
58 |         # Default color:
59 |         default_tex = np.ones((1, self.faces.shape[1], t_size, t_size, t_size,
60 |                                3))
61 |         self.default_tex = to_variable(torch.FloatTensor(default_tex).cuda())
62 | 
63 |         # Default camera:
64 |         cam = np.hstack([0.9, 0, 0])
65 |         default_cam = to_variable(torch.FloatTensor(cam).cuda())
66 |         self.default_cam = torch.unsqueeze(default_cam, 0)
67 | 
68 |         # Setup proj fn:
69 |         self.proj_fn = orthographic_proj_withz_idrot
70 | 
71 |     def __call__(self,
72 |                  verts,
73 |                  cam=None,
74 |                  texture=None,
75 |                  rend_mask=False,
76 |                  alpha=False,
77 |                  img=None,
78 |                  color_name='blue'):
79 |         """
80 |         verts is |V| x 3 numpy/cuda torch Variable or B x V x 3
81 |         cams is 3D [s, tx, ty], numpy/cuda torch Variable or B x 3
82 |         cams is NOT the same as OpenDR renderer.
83 |         Directly use the cams of HMR output
84 |         Returns N x N x 3 numpy, where N is the image size.
85 |         Or B x N x N x 3 when input was batched
86 |         If you're using this in batch mode, make sure you send in B x 3
87 |         cameras as well as B x * x * x 3 images.
88 |         """
89 |         num_batch = 1
90 | 
91 |         if get_dims(verts) == 3 and verts.shape[0] != 1:
92 |             print('batch mode')
93 |             num_batch = verts.shape[0]
94 |             # Make sure everything else is also batch mode.
95 |             if cam is not None:
96 |                 assert get_dims(cam) == 2 and cam.shape[0] == num_batch
97 |             if img is not None:
98 |                 assert img.ndim == 4 and img.shape[0] == num_batch
99 | 
100 |         if texture is None:
101 |             # single color.
102 |             color = torch.FloatTensor(COLORS[color_name]).cuda()
103 |             texture = color * self.default_tex
104 |             texture = texture.repeat(num_batch, 1, 1, 1, 1, 1)
105 |         else:
106 |             texture = to_float_tensor(texture)
107 |             if texture.dim() == 5:
108 |                 # Here the input is F x T x T x T x 3 (no batch dim),
109 |                 # so add the batch dim.
110 |                 texture = torch.unsqueeze(texture, 0)
111 |         if cam is None:
112 |             cam = self.default_cam
113 |             if num_batch > 1:
114 |                 cam = cam.repeat(num_batch, 1)
115 |         else:
116 |             cam = to_float_tensor(cam)
117 |             if cam.dim() == 1:
118 |                 cam = torch.unsqueeze(cam, 0)
119 | 
120 |         verts = to_float_tensor(verts)
121 |         if verts.dim() == 2:
122 |             verts = torch.unsqueeze(verts, 0)
123 | 
124 |         verts = to_variable(verts)
125 |         cam = to_variable(cam)
126 |         texture = to_variable(texture)
127 | 
128 |         # set offset_z for persp proj
129 |         proj_verts = self.proj_fn(verts, cam, offset_z=0)
130 |         # Flipping the y-axis here to make it align with
131 |         # the image coordinate system!
132 |         proj_verts[:, :, 1] *= -1
133 | 
134 |         # Adjust for batch.
135 | faces = self.faces.repeat(num_batch, 1, 1) 136 | if rend_mask: 137 | rend = self.renderer.render_silhouettes(proj_verts, faces) 138 | rend = torch.unsqueeze(rend, 0) 139 | rend = rend.repeat(1, 3, 1, 1) 140 | else: 141 | rend = self.renderer.render(proj_verts, faces, texture) 142 | 143 | rend = rend[0].data.cpu().numpy().transpose((0, 2, 3, 1)) 144 | rend = np.clip(rend, 0, 1) * 255.0 145 | 146 | if num_batch == 1: 147 | rend = rend[0] 148 | 149 | if not rend_mask and (alpha or img is not None): 150 | mask = self.renderer.render_silhouettes(proj_verts, faces) 151 | mask = mask.data.cpu().numpy() 152 | if img is not None: 153 | mask = np.repeat(np.expand_dims(mask, 3), 3, axis=3) 154 | if num_batch == 1: 155 | mask = mask[0] 156 | # TODO: Make sure img is [0, 255]!!! 157 | return (img * (1 - mask) + rend * mask).astype(np.uint8) 158 | else: 159 | # TODO: Temporary hack 160 | mask = mask.reshape((rend.shape[:2]) + (1,)) 161 | return self.make_alpha(rend, mask) 162 | else: 163 | return rend.astype(np.uint8) 164 | 165 | def rotated(self, 166 | verts, 167 | deg, 168 | axis='y', 169 | cam=None, 170 | texture=None, 171 | rend_mask=False, 172 | alpha=False, 173 | color_name='blue'): 174 | """ 175 | vert is N x 3, torch FloatTensor (or Variable) 176 | """ 177 | import cv2 178 | if axis == 'y': 179 | axis = [0, 1., 0] 180 | elif axis == 'x': 181 | axis = [1., 0, 0] 182 | else: 183 | axis = [0, 0, 1.] 184 | 185 | new_rot = cv2.Rodrigues(np.deg2rad(deg) * np.array(axis))[0] 186 | new_rot = to_float_tensor(new_rot) 187 | 188 | verts = to_float_tensor(verts) 189 | 190 | if get_dims(verts) == 2: 191 | # Make it in to 1 x N x 3 192 | verts = verts.unsqueeze(0) 193 | num_batch = verts.shape[0] 194 | 195 | new_rot = new_rot.unsqueeze(0) 196 | new_rot = new_rot.repeat(num_batch, 1, 1) 197 | 198 | center = verts.mean(1, keepdim=True) 199 | centered_v = (verts - center) 200 | new_verts = torch.matmul(new_rot, centered_v.permute(0, 2, 1)) 201 | new_verts = new_verts.permute(0, 2, 1) + center 202 | 203 | return self.__call__( 204 | new_verts, 205 | cam=cam, 206 | texture=texture, 207 | rend_mask=rend_mask, 208 | alpha=alpha, 209 | color_name=color_name 210 | ) 211 | 212 | def make_alpha(self, rend, mask): 213 | rend = rend.astype(np.uint8) 214 | alpha = (mask * 255).astype(np.uint8) 215 | 216 | imgA = np.dstack((rend, alpha)) 217 | return imgA 218 | 219 | def set_light_dir(self, direction, int_dir=0.8, int_amb=0.8): 220 | self.renderer.light_direction = direction 221 | self.renderer.light_intensity_directional = int_dir 222 | self.renderer.light_intensity_ambient = int_amb 223 | 224 | def set_bgcolor(self, color): 225 | self.renderer.background_color = color 226 | 227 | 228 | def to_variable(x): 229 | if type(x) is not torch.autograd.Variable: 230 | x = Variable(x, requires_grad=False) 231 | return x 232 | 233 | 234 | def to_float_tensor(x): 235 | if isinstance(x, np.ndarray): 236 | x = torch.FloatTensor(x).cuda() 237 | # ow assumed it's already a Tensor.. 
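    # (Clarifying note: 'ow' = otherwise; non-ndarray inputs are assumed to
    # already be torch Tensors and are returned unchanged.)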
238 | return x 239 | 240 | 241 | def convert_as(src, trg): 242 | src = src.type_as(trg) 243 | if src.is_cuda: 244 | src = src.cuda(device=trg.get_device()) 245 | if type(trg) is torch.autograd.Variable: 246 | src = Variable(src, requires_grad=False) 247 | return src 248 | 249 | 250 | def visualize_img(img, 251 | cam, 252 | kp_pred, 253 | vert, 254 | renderer, 255 | kp_gt=None, 256 | text={}, 257 | rotated_view=False, 258 | mesh_color='blue', 259 | pad_vals=None, 260 | no_text=False): 261 | """ 262 | Visualizes the image with the ground truth keypoints and 263 | predicted keypoints on left and image with mesh on right. 264 | Keypoints should be in normalized coordinates, not image coordinates. 265 | Args: 266 | img: Image. 267 | cam (3x1): Camera parameters. 268 | kp_gt: Ground truth keypoints. 269 | kp_pred: Predicted keypoints. 270 | vert: Vertices. 271 | renderer: SMPL renderer. 272 | text (dict): Optional information to include in the image. 273 | rotated_view (bool): If True, also visualizes mesh from another angle. 274 | if pad_vals (2,) is not None, removes those values from the image 275 | (undo img pad to make square) 276 | Returns: 277 | Combined image. 278 | """ 279 | img_size = img.shape[0] 280 | text.update({'sc': cam[0], 'tx': cam[1], 'ty': cam[2]}) 281 | if kp_gt is not None: 282 | gt_vis = kp_gt[:, 2].astype(bool) 283 | loss = np.sum((kp_gt[gt_vis, :2] - kp_pred[gt_vis])**2) 284 | text['kpl'] = loss 285 | 286 | # Undo pre-processing. 287 | # Make sure img is [0-255] 288 | input_img = ((img + 1) * 0.5) * 255. 289 | rend_img = renderer(vert, cam=cam, img=input_img, color_name=mesh_color) 290 | if not no_text: 291 | rend_img = draw_text(rend_img, text) 292 | 293 | # Draw skeletons 294 | pred_joint = ((kp_pred + 1) * 0.5) * img_size 295 | skel_img = draw_skeleton(input_img, pred_joint) 296 | if kp_gt is not None: 297 | gt_joint = ((kp_gt[:, :2] + 1) * 0.5) * img_size 298 | skel_img = draw_skeleton( 299 | skel_img, gt_joint, draw_edges=False, vis=gt_vis) 300 | 301 | if pad_vals is not None: 302 | skel_img = remove_pads(skel_img, pad_vals) 303 | rend_img = remove_pads(rend_img, pad_vals) 304 | if rotated_view: 305 | rot_img = renderer.rotated( 306 | vert, 90, cam=cam, alpha=False, color_name=mesh_color) 307 | if pad_vals is not None: 308 | rot_img = remove_pads(rot_img, pad_vals) 309 | 310 | return skel_img / 255, rend_img / 255, rot_img / 255 311 | 312 | else: 313 | return skel_img / 255, rend_img / 255 314 | 315 | 316 | def visualize_img_orig(cam, kp_pred, vert, renderer, start_pt, scale, 317 | proc_img_shape, im_path=None, img=None, 318 | rotated_view=False, mesh_color='blue', max_img_size=300, 319 | no_text=False, bbox=None, crop_cam=None): 320 | """ 321 | Visualizes the image with the ground truth keypoints and predicted keypoints 322 | in the original image space (squared). 323 | If you get out of memory error, make max_img_size smaller. 324 | Args: 325 | must supply either the im_path or img 326 | start_pt, scale, proc_img_shape are parameters used to preprocess the 327 | image. 328 | scale_result is how much to scale the current image 329 | Returns: 330 | Combined image. 331 | """ 332 | if img is None: 333 | img = imread(im_path) 334 | # Pre-process image to [-1, 1] bc it expects this. 335 | img = ((img / 255.) - 0.5) * 2 336 | if np.max(img.shape[:2]) > max_img_size: 337 | # if the image is too big it wont fit in gpu and nmr poops out. 338 | scale_orig = max_img_size / float(np.max(img.shape[:2])) 339 | img, _ = resize_img(img, scale_orig) 340 | undo_scale = (1. 
/ np.array(scale)) * scale_orig 341 | else: 342 | undo_scale = 1. / np.array(scale) 343 | 344 | if bbox is not None: 345 | assert(crop_cam is not None) 346 | img = img[bbox[0]:bbox[1], bbox[2]:bbox[3]] 347 | # For these, the cameras are already adjusted. 348 | start_pt = np.array([0, 0]) 349 | 350 | # NMR needs images to be square.. 351 | img, pad_vals = make_square(img) 352 | img_size = np.max(img.shape[:2]) 353 | renderer.renderer.image_size = img_size 354 | 355 | # Adjust kp_pred. 356 | # This is in 224x224 cropped space. 357 | pred_joint = ((kp_pred + 1) * 0.5) * proc_img_shape[0] 358 | # This is in the original image. 359 | pred_joint_orig = (pred_joint + start_pt - proc_img_shape[0]) * undo_scale 360 | 361 | # in normalize coord of the original image: 362 | kp_orig = 2 * (pred_joint_orig / img_size) - 1 363 | if bbox is not None: 364 | use_cam = crop_cam 365 | else: 366 | 367 | # This is camera in crop image coord. 368 | cam_crop = np.hstack([proc_img_shape[0] * cam[0] * 0.5, 369 | cam[1:] + (2./cam[0]) * 0.5]) 370 | 371 | # This is camera in orig image coord 372 | cam_orig = np.hstack([ 373 | cam_crop[0] * undo_scale, 374 | cam_crop[1:] + (start_pt - proc_img_shape[0]) / cam_crop[0] 375 | ]) 376 | 377 | # This is the camera in normalized orig_image coord 378 | new_cam = np.hstack([ 379 | cam_orig[0] * (2. / img_size), 380 | cam_orig[1:] - (1 / ((2./img_size) * cam_orig[0])) 381 | ]) 382 | new_cam = new_cam.astype(np.float32) 383 | use_cam = new_cam 384 | 385 | # Call visualize_img with this camera: 386 | rendered_orig = visualize_img( 387 | img=img, 388 | cam=use_cam, 389 | kp_pred=kp_orig, 390 | vert=vert, 391 | renderer=renderer, 392 | rotated_view=rotated_view, 393 | mesh_color=mesh_color, 394 | pad_vals=pad_vals, 395 | no_text=no_text, 396 | ) 397 | 398 | return rendered_orig 399 | 400 | 401 | def visualize_mesh_og(cam, vert, renderer, start_pt, scale, proc_img_shape, 402 | im_path=None, img=None, deg=0, mesh_color='blue', 403 | max_img_size=300, pad=50, crop_cam=None, bbox=None): 404 | """ 405 | Visualize mesh in original image space. 406 | If you get out of memory error, make max_img_size smaller. 407 | If crop_cam and bbox is not None, 408 | crops the image and uses the crop_cam to render. 409 | (See compute_video_bbox.py) 410 | """ 411 | if img is None: 412 | img = imread(im_path) 413 | # Pre-process image to [-1, 1] bc it expects this. 414 | img = ((img / 255.) - 0.5) * 2 415 | 416 | if bbox is not None: 417 | assert(crop_cam is not None) 418 | img = img[bbox[0]:bbox[1], bbox[2]:bbox[3]] 419 | # For these, the cameras are already adjusted. 420 | scale = 1. 421 | start_pt = np.array([0, 0]) 422 | if np.max(img.shape[:2]) > max_img_size: 423 | # if the image is too big it wont fit in gpu and nmr poops out. 424 | scale_orig = max_img_size / float(np.max(img.shape[:2])) 425 | img, _ = resize_img(img, scale_orig) 426 | undo_scale = (1. / np.array(scale)) * scale_orig 427 | else: 428 | undo_scale = 1. / np.array(scale) 429 | # NMR needs images to be square.. 430 | img, pad_vals = make_square(img) 431 | img_size = np.max(img.shape[:2]) 432 | renderer.renderer.image_size = img_size 433 | 434 | if bbox is not None: 435 | return renderer.rotated( 436 | verts=vert, 437 | deg=deg, 438 | cam=crop_cam, 439 | color_name=mesh_color, 440 | ) 441 | else: 442 | # This is camera in crop image coord. 
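        # (Added explanation, a sketch of the reasoning rather than original
        # documentation.) The predicted camera acts on normalized crop coords,
        # x -> cam[0] * (x + cam[1:]). Going to crop pixel coords of size
        # proc_img_shape[0] multiplies the scale by proc_img_shape[0] / 2 and
        # folds the +1 offset into the translation, hence the 0.5 terms below.
        # The next two blocks repeat the same re-parameterization for
        # original-image pixels and then for normalized original-image coords.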
443 | cam_crop = np.hstack([proc_img_shape[0] * cam[0] * 0.5, 444 | cam[1:] + (2./cam[0]) * 0.5]) 445 | 446 | # This is camera in orig image coord 447 | cam_orig = np.hstack([ 448 | cam_crop[0] * undo_scale, 449 | cam_crop[1:] + (start_pt - proc_img_shape[0]) / cam_crop[0] 450 | ]) 451 | 452 | # This is the camera in normalized orig_image coord 453 | new_cam = np.hstack([ 454 | cam_orig[0] * (2. / img_size), 455 | cam_orig[1:] - (1 / ((2./img_size) * cam_orig[0])) 456 | ]) 457 | new_cam = new_cam.astype(np.float32) 458 | 459 | return renderer.rotated( 460 | verts=vert, 461 | deg=deg, 462 | cam=new_cam, 463 | color_name=mesh_color, 464 | ) 465 | 466 | 467 | def make_square(img): 468 | """ 469 | Bc nmr only deals with square image, adds pad to the shorter side. 470 | """ 471 | img_size = np.max(img.shape[:2]) 472 | pad_vals = img_size - img.shape[:2] 473 | 474 | img = np.pad( 475 | array=img, 476 | pad_width=((0, pad_vals[0]), (0, pad_vals[1]), (0, 0)), 477 | mode='constant' 478 | ) 479 | 480 | return img, pad_vals 481 | 482 | 483 | def remove_pads(img, pad_vals): 484 | """ 485 | Undos padding done by make_square. 486 | """ 487 | 488 | if pad_vals[0] != 0: 489 | img = img[:-pad_vals[0], :] 490 | if pad_vals[1] != 0: 491 | img = img[:, :-pad_vals[1]] 492 | return img 493 | 494 | 495 | def compute_video_bbox(cams, kps, proc_infos, margin=10): 496 | """ 497 | Given the prediction and original image info, 498 | figures out the min/max extent (bbox) 499 | of the person in the entire video. 500 | Adjust the cameras so now ppl project in this new bbox. 501 | Needed to crop the video around the person and also to 502 | rotate the mesh. 503 | cams: N x 3, predicted camera 504 | joints: N x K x 3, predicted 3D joints for debug 505 | kp: N x K x 3, predicted 2D joints to figure out extent 506 | proc_infos: dict holding: 507 | start_pt, scale: N x 2, N x 1 508 | preprocessing done on this image. 509 | im_shape: image shape after preprocessing 510 | im_path: to the first image to figure out size of orig video 511 | """ 512 | im_path = proc_infos[0]['im_path'] 513 | img = imread(im_path) 514 | img_h, img_w = img.shape[:2] 515 | img_size = np.max([img_h, img_w]) 516 | 517 | im_shape = proc_infos[0]['im_shape'][0] 518 | 519 | new_cams = [] 520 | bboxes = [] 521 | # For each image, get the joints in the original coord frame: 522 | for i, (proc_info, kp, cam) in enumerate(zip(proc_infos, kps, cams)): 523 | scale = proc_info['scale'] 524 | start_pt = proc_info['start_pt'] 525 | 526 | undo_scale = 1. / np.array(scale) 527 | # Adjust kp_pred. 528 | # This is in 224x224 cropped space. 529 | pred_joint = ((kp + 1) * 0.5) * im_shape 530 | # This is in the original image. 531 | pred_joint_orig = (pred_joint + start_pt - im_shape) * undo_scale 532 | # in normalize coord of the original image: 533 | # kp_orig = 2 * (pred_joint_orig / img_size) - 1 534 | # This is camera in crop image coord (224x224). 535 | cam_crop = np.hstack([im_shape * cam[0] * 0.5, 536 | cam[1:] + (2./cam[0]) * 0.5]) 537 | # This is camera in orig image coord 538 | cam_orig = np.hstack([ 539 | cam_crop[0] * undo_scale, 540 | cam_crop[1:] + (start_pt - im_shape) / cam_crop[0] 541 | ]) 542 | # This is the camera in normalized orig_image coord 543 | new_cam = np.hstack([ 544 | cam_orig[0] * (2. 
/ img_size), 545 | cam_orig[1:] - (1 / ((2./img_size) * cam_orig[0])) 546 | ]) 547 | new_cams.append(new_cam.astype(np.float32)) 548 | x = pred_joint_orig[:, 0] 549 | y = pred_joint_orig[:, 1] 550 | ymin = max(0, min(y) - margin) 551 | ymax = min(img_h - 1, max(y) + margin) 552 | xmin = max(0, min(x) - margin) 553 | xmax = min(img_w - 1, max(x) + margin) 554 | bbox = np.array([ymin, ymax, xmin, xmax]) 555 | 556 | bboxes.append(bbox) 557 | 558 | # Figure out the video level bbox. 559 | # bbox is in format [ymin, ymax, xmin, xmax] 560 | bboxes = np.stack(bboxes) 561 | bbox = np.array([ 562 | np.min(bboxes[:, 0]), 563 | np.max(bboxes[:, 1]), 564 | np.min(bboxes[:, 2]), 565 | np.max(bboxes[:, 3]) 566 | ]) 567 | bbox = bbox.astype(np.int) 568 | # Now adjust the cams by this bbox offset. 569 | ymin, xmin = bbox[0], bbox[2] 570 | new_offset = np.array([xmin, ymin]) 571 | new_offset_norm = np.linalg.norm(new_offset) 572 | img_size_crop = np.max([bbox[1] - bbox[0], bbox[3] - bbox[2]]) 573 | 574 | # Rotated images: save delta translation 575 | new_cams_cropped = [] 576 | 577 | for i, (proc_info, kp, cam) in enumerate(zip(proc_infos, kps, cams)): 578 | scale = proc_info['scale'] 579 | 580 | undo_scale = 1. / np.array(scale) 581 | start_pt0 = proc_info['start_pt'] 582 | 583 | start_pt = start_pt0 - (new_offset * scale) 584 | 585 | if np.linalg.norm(proc_info['start_pt']) < new_offset_norm: 586 | print('crop is more than start pt..?') 587 | import ipdb; ipdb.set_trace() 588 | 589 | # This is camera in crop image coord (224x224). 590 | cam_crop = np.hstack([im_shape * cam[0] * 0.5, 591 | cam[1:] + (2./cam[0]) * 0.5]) 592 | 593 | # This is camera in orig image coord 594 | cam_orig = np.hstack([ 595 | cam_crop[0] * undo_scale, 596 | cam_crop[1:] + (start_pt - im_shape) / cam_crop[0] 597 | ]) 598 | 599 | # This is the camera in normalized orig_image coord 600 | new_cam = np.hstack([ 601 | cam_orig[0] * (2. / img_size_crop), 602 | cam_orig[1:] - (1 / ((2./img_size_crop) * cam_orig[0])) 603 | ]) 604 | new_cams_cropped.append(new_cam.astype(np.float32)) 605 | 606 | return bbox, new_cams_cropped 607 | 608 | 609 | def get_params_from_omega(smpl_model, regressor, omega, cam=None): 610 | cam = omega[:3] if cam is None else cam 611 | pose = omega[3:3 + 72] 612 | shape = omega[75:] 613 | smpl_model.pose[:] = pose 614 | smpl_model.betas[:] = shape 615 | verts = np.copy(smpl_model.r) 616 | joints = regressor.dot(verts) 617 | kps = cam[0] * (joints[:, :2] + cam[1:]) 618 | return { 619 | 'cam': cam, 620 | 'joints': joints, 621 | 'kps': kps, 622 | 'pose': pose, 623 | 'shape': shape, 624 | 'verts': verts, 625 | } 626 | -------------------------------------------------------------------------------- /src/tf_smpl/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonyzhang/phd/7b7f526d45913902ed93cdc49fdd59272698bd71/src/tf_smpl/__init__.py -------------------------------------------------------------------------------- /src/tf_smpl/batch_lbs.py: -------------------------------------------------------------------------------- 1 | """ Util functions for SMPL 2 | @@batch_skew 3 | @@batch_rodrigues 4 | @@batch_lrotmin 5 | @@batch_global_rigid_transformation 6 | """ 7 | import tensorflow as tf 8 | 9 | 10 | def batch_skew(vec, batch_size=None): 11 | """ 12 | vec is N x 3, batch_size is int 13 | 14 | returns N x 3 x 3. Skew_sym version of each matrix. 
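
    For example (an illustrative case, not from the original docstring), a
    single row vec = [x, y, z] maps to
        [[ 0, -z,  y],
         [ z,  0, -x],
         [-y,  x,  0]].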
15 | """ 16 | with tf.name_scope("batch_skew", values=[vec]): 17 | if batch_size is None: 18 | batch_size = vec.shape.as_list()[0] 19 | col_inds = tf.constant([1, 2, 3, 5, 6, 7]) 20 | indices = tf.reshape( 21 | tf.reshape(tf.range(0, batch_size) * 9, [-1, 1]) + col_inds, 22 | [-1, 1]) 23 | updates = tf.reshape( 24 | tf.stack( 25 | [ 26 | -vec[:, 2], vec[:, 1], vec[:, 2], -vec[:, 0], -vec[:, 1], 27 | vec[:, 0] 28 | ], 29 | axis=1), [-1]) 30 | out_shape = [batch_size * 9] 31 | res = tf.scatter_nd(indices, updates, out_shape) 32 | res = tf.reshape(res, [batch_size, 3, 3]) 33 | 34 | return res 35 | 36 | 37 | def batch_rodrigues(theta, name=None): 38 | """ 39 | Theta is N x 3 40 | """ 41 | with tf.name_scope(name, "batch_rodrigues", values=[theta]): 42 | batch_size = theta.shape.as_list()[0] 43 | angle = tf.expand_dims(tf.norm(theta + 1e-8, axis=1), -1) 44 | r = tf.expand_dims(tf.div(theta, angle), -1) 45 | 46 | angle = tf.expand_dims(angle, -1) 47 | cos = tf.cos(angle) 48 | sin = tf.sin(angle) 49 | 50 | outer = tf.matmul(r, r, transpose_b=True, name="outer") 51 | 52 | eyes = tf.tile(tf.expand_dims(tf.eye(3), 0), [batch_size, 1, 1]) 53 | R = cos * eyes + (1 - cos) * outer + sin * batch_skew( 54 | r, batch_size=batch_size) 55 | return R 56 | 57 | 58 | def batch_rot2aa(Rs): 59 | """ 60 | Rs is B x 3 x 3 61 | void cMathUtil::RotMatToAxisAngle(const tMatrix& mat, tVector& out_axis, 62 | double& out_theta) 63 | { 64 | double c = 0.5 * (mat(0, 0) + mat(1, 1) + mat(2, 2) - 1); 65 | c = cMathUtil::Clamp(c, -1.0, 1.0); 66 | 67 | out_theta = std::acos(c); 68 | 69 | if (std::abs(out_theta) < 0.00001) 70 | { 71 | out_axis = tVector(0, 0, 1, 0); 72 | } 73 | else 74 | { 75 | double m21 = mat(2, 1) - mat(1, 2); 76 | double m02 = mat(0, 2) - mat(2, 0); 77 | double m10 = mat(1, 0) - mat(0, 1); 78 | double denom = std::sqrt(m21 * m21 + m02 * m02 + m10 * m10); 79 | out_axis[0] = m21 / denom; 80 | out_axis[1] = m02 / denom; 81 | out_axis[2] = m10 / denom; 82 | out_axis[3] = 0; 83 | } 84 | } 85 | """ 86 | cos = 0.5 * (tf.trace(Rs) - 1) 87 | cos = tf.clip_by_value(cos, -1, 1) 88 | 89 | theta = tf.acos(cos) 90 | 91 | m21 = Rs[:, 2, 1] - Rs[:, 1, 2] 92 | m02 = Rs[:, 0, 2] - Rs[:, 2, 0] 93 | m10 = Rs[:, 1, 0] - Rs[:, 0, 1] 94 | denom = tf.sqrt(m21 * m21 + m02 * m02 + m10 * m10) 95 | 96 | axis0 = tf.where(tf.abs(theta) < 0.00001, m21, m21 / denom) 97 | axis1 = tf.where(tf.abs(theta) < 0.00001, m02, m02 / denom) 98 | axis2 = tf.where(tf.abs(theta) < 0.00001, m10, m10 / denom) 99 | 100 | return tf.expand_dims(theta, 1) * tf.stack([axis0, axis1, axis2], 1) 101 | 102 | 103 | def batch_lrotmin(theta, name=None): 104 | """ NOTE: not used bc I want to reuse R and this is simple. 105 | Output of this is used to compute joint-to-pose blend shape mapping. 106 | Equation 9 in SMPL paper. 107 | 108 | 109 | Args: 110 | pose: `Tensor`, N x 72 vector holding the axis-angle rep of K joints. 111 | This includes the global rotation so K=24 112 | 113 | Returns 114 | diff_vec : `Tensor`: N x 207 rotation matrix of 23=(K-1) joints with 115 | identity subtracted., 116 | """ 117 | with tf.name_scope(name, "batch_lrotmin", [theta]): 118 | with tf.name_scope("ignore_global"): 119 | theta = theta[:, 3:] 120 | 121 | # N*23 x 3 x 3 122 | Rs = batch_rodrigues(tf.reshape(theta, [-1, 3])) 123 | lrotmin = tf.reshape(Rs - tf.eye(3), [-1, 207]) 124 | 125 | return lrotmin 126 | 127 | 128 | def batch_global_rigid_transformation(Rs, Js, parent, rotate_base=False): 129 | """ 130 | Computes absolute joint locations given pose. 
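    (Clarifying note, not part of the original docstring: each joint's world
    transform is its parent's transform composed with a local transform built
    from the joint's rotation and its offset from the parent, i.e. standard
    forward kinematics along the kinematic tree.)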
131 | 
132 |     rotate_base: if True, rotates the global rotation by 90 deg in x axis.
133 |     if False, this is the original SMPL coordinate.
134 | 
135 |     Args:
136 |       Rs: N x 24 x 3 x 3 rotation matrices of K joints
137 |       Js: N x 24 x 3, joint locations before posing
138 |       parent: 24 holding the parent id for each index
139 | 
140 |     Returns
141 |       new_J : `Tensor`: N x 24 x 3 location of absolute joints
142 |       A : `Tensor`: N x 24 x 4 x 4 relative joint transformations for LBS.
143 |     """
144 |     with tf.name_scope("batch_forward_kinematics", values=[Rs, Js]):
145 |         N = Rs.shape[0].value
146 |         if rotate_base:
147 |             print('Flipping the SMPL coordinate frame!!!!')
148 |             rot_x = tf.constant(
149 |                 [[1, 0, 0], [0, -1, 0], [0, 0, -1]], dtype=Rs.dtype)
150 |             rot_x = tf.reshape(tf.tile(rot_x, [N, 1]), [N, 3, 3])
151 |             root_rotation = tf.matmul(Rs[:, 0, :, :], rot_x)
152 |         else:
153 |             root_rotation = Rs[:, 0, :, :]
154 | 
155 |         # Now Js is N x 24 x 3 x 1
156 |         Js = tf.expand_dims(Js, -1)
157 | 
158 |         def make_A(R, t, name=None):
159 |             # Rs is N x 3 x 3, ts is N x 3 x 1
160 |             with tf.name_scope(name, "Make_A", [R, t]):
161 |                 R_homo = tf.pad(R, [[0, 0], [0, 1], [0, 0]])
162 |                 t_homo = tf.concat([t, tf.ones([N, 1, 1])], 1)
163 |                 return tf.concat([R_homo, t_homo], 2)
164 | 
165 |         A0 = make_A(root_rotation, Js[:, 0])
166 |         results = [A0]
167 |         for i in range(1, parent.shape[0]):
168 |             j_here = Js[:, i] - Js[:, parent[i]]
169 |             A_here = make_A(Rs[:, i], j_here)
170 |             res_here = tf.matmul(
171 |                 results[parent[i]], A_here, name="propA%d" % i)
172 |             results.append(res_here)
173 | 
174 |         # N x 24 x 4 x 4
175 |         results = tf.stack(results, axis=1)
176 | 
177 |         new_J = results[:, :, :3, 3]
178 | 
179 |         # --- Compute relative A: Skinning is based on
180 |         # how much the bone moved (not the final location of the bone)
181 |         # but (final_bone - init_bone)
182 |         # ---
183 |         Js_w0 = tf.concat([Js, tf.zeros([N, 24, 1, 1])], 2)
184 |         init_bone = tf.matmul(results, Js_w0)
185 |         # Append empty 4 x 3:
186 |         init_bone = tf.pad(init_bone, [[0, 0], [0, 0], [0, 0], [3, 0]])
187 |         A = results - init_bone
188 | 
189 |         return new_J, A
190 | 
--------------------------------------------------------------------------------
/src/tf_smpl/batch_smpl.py:
--------------------------------------------------------------------------------
1 | """
2 | Tensorflow SMPL implementation as batch.
3 | Specify joint types:
4 |   'cocoplus': Returns COCO+ 19 joints
5 |   'lsp': Returns H3.6M-LSP 14 joints
6 | Note: To get original smpl joints, use self.J_transformed
7 | """
8 | import ipdb
9 | import numpy as np
10 | import pickle
11 | 
12 | import tensorflow as tf
13 | from src.tf_smpl.batch_lbs import (
14 |     batch_rodrigues,
15 |     batch_global_rigid_transformation
16 | )
17 | 
18 | 
19 | # There are chumpy variables so convert them to numpy.
20 | def undo_chumpy(x):
21 |     return x if isinstance(x, np.ndarray) else x.r
22 | 
23 | 
24 | class SMPL(object):
25 |     def __init__(self, pkl_path, joint_type='cocoplus', dtype=tf.float32):
26 |         """
27 |         pkl_path is the path to a SMPL model
28 |         """
29 |         # -- Load SMPL params --
30 |         with open(pkl_path, 'rb') as f:
31 |             dd = pickle.load(f, encoding='latin1')
32 |         # Mean template vertices
33 |         self.v_template = tf.Variable(
34 |             undo_chumpy(dd['v_template']),
35 |             name='v_template',
36 |             dtype=dtype,
37 |             trainable=False)
38 |         # Size of mesh [Number of vertices, 3]
39 |         self.size = [self.v_template.shape[0].value, 3]
40 |         self.num_betas = dd['shapedirs'].shape[-1]
41 |         # Shape blend shape basis: 6890 x 3 x 10
42 |         # reshaped to 6890*3 x 10, transposed to 10 x 6890*3
43 |         shapedir = np.reshape(
44 |             undo_chumpy(dd['shapedirs']), [-1, self.num_betas]).T
45 |         self.shapedirs = tf.Variable(
46 |             shapedir, name='shapedirs', dtype=dtype, trainable=False)
47 | 
48 |         # Regressor for joint locations given shape - 6890 x 24
49 |         self.J_regressor = tf.Variable(
50 |             dd['J_regressor'].T.todense(),
51 |             name="J_regressor",
52 |             dtype=dtype,
53 |             trainable=False)
54 | 
55 |         # Pose blend shape basis: 6890 x 3 x 207, reshaped to 6890*3 x 207
56 |         num_pose_basis = dd['posedirs'].shape[-1]
57 |         # 207 x 20670
58 |         posedirs = np.reshape(
59 |             undo_chumpy(dd['posedirs']), [-1, num_pose_basis]).T
60 |         self.posedirs = tf.Variable(
61 |             posedirs, name='posedirs', dtype=dtype, trainable=False)
62 | 
63 |         # indices of parents for each joint
64 |         self.parents = dd['kintree_table'][0].astype(np.int32)
65 | 
66 |         # LBS weights
67 |         self.weights = tf.Variable(
68 |             undo_chumpy(dd['weights']),
69 |             name='lbs_weights',
70 |             dtype=dtype,
71 |             trainable=False)
72 | 
73 |         # This returns 19 keypoints: 6890 x 19
74 |         self.joint_regressor = tf.Variable(
75 |             dd['cocoplus_regressor'].T.todense(),
76 |             name="cocoplus_regressor",
77 |             dtype=dtype,
78 |             trainable=False)
79 |         if joint_type == 'lsp':  # 14 LSP joints!
80 |             self.joint_regressor = self.joint_regressor[:, :14]
81 | 
82 |         if joint_type not in ['cocoplus', 'lsp']:
83 |             print('Unknown joint type: {}, it must be either "cocoplus" '
84 |                   'or "lsp"'.format(joint_type))
85 |             ipdb.set_trace()
86 | 
87 |     def __call__(self, beta, theta, get_skin=False, name=None):
88 |         """
89 |         Obtain SMPL with shape (beta) & pose (theta) inputs.
90 |         Theta includes the global rotation.
91 |         Args:
92 |           beta: N x 10
93 |           theta: N x 72 (with 3-D axis-angle rep)
94 | 
95 |         Updates:
96 |           self.J_transformed: N x 24 x 3 joint location after shaping
97 |             & posing with beta and theta
98 |         Returns:
99 |           - joints: N x 19 x 3 or N x 14 x 3 joint locations, depending on joint_type
100 |           If get_skin is True, also returns
101 |           - Verts: N x 6890 x 3
102 |         """
103 |         with tf.name_scope(name, "smpl_main", [beta, theta]):
104 |             num_batch = beta.shape[0].value
105 | 
106 |             # 1. Add shape blend shapes
107 |             # (N x 10) x (10 x 6890*3) = N x 6890 x 3
108 |             v_shaped = tf.reshape(
109 |                 tf.matmul(beta, self.shapedirs, name='shape_bs'),
110 |                 [-1, self.size[0], self.size[1]]) + self.v_template
111 | 
112 |             # 2. Infer shape-dependent joint locations.
113 |             Jx = tf.matmul(v_shaped[:, :, 0], self.J_regressor)
114 |             Jy = tf.matmul(v_shaped[:, :, 1], self.J_regressor)
115 |             Jz = tf.matmul(v_shaped[:, :, 2], self.J_regressor)
116 |             J = tf.stack([Jx, Jy, Jz], axis=2)
117 | 
118 |             # 3. Add pose blend shapes
119 |             # N x 24 x 3 x 3
120 |             # only do rodrigues if theta has axis angle representation
121 |             Rs = tf.reshape(
122 |                 batch_rodrigues(tf.reshape(theta, [-1, 3])), [-1, 24, 3, 3])
123 |             with tf.name_scope("lrotmin"):
124 |                 # Ignore global rotation.
125 |                 pose_feature = tf.reshape(Rs[:, 1:, :, :] - tf.eye(3),
126 |                                           [-1, 207])
127 | 
128 |             # (N x 207) x (207, 20670) -> N x 6890 x 3
129 |             v_posed = tf.reshape(
130 |                 tf.matmul(pose_feature, self.posedirs),
131 |                 [-1, self.size[0], self.size[1]]) + v_shaped
132 | 
133 |             # 4. Get the global joint location
134 |             self.J_transformed, A = batch_global_rigid_transformation(
135 |                 Rs, J, self.parents)
136 | 
137 |             # 5. Do skinning:
138 |             # W is N x 6890 x 24
139 |             W = tf.reshape(
140 |                 tf.tile(self.weights, [num_batch, 1]), [num_batch, -1, 24])
141 |             # (N x 6890 x 24) x (N x 24 x 16)
142 |             T = tf.reshape(
143 |                 tf.matmul(W, tf.reshape(A, [num_batch, 24, 16])),
144 |                 [num_batch, -1, 4, 4])
145 |             v_posed_homo = tf.concat(
146 |                 [v_posed, tf.ones([num_batch, v_posed.shape[1], 1])], 2)
147 |             v_homo = tf.matmul(T, tf.expand_dims(v_posed_homo, -1))
148 | 
149 |             verts = v_homo[:, :, :3, 0]
150 | 
151 |             # Get cocoplus or lsp joints:
152 |             joint_x = tf.matmul(verts[:, :, 0], self.joint_regressor)
153 |             joint_y = tf.matmul(verts[:, :, 1], self.joint_regressor)
154 |             joint_z = tf.matmul(verts[:, :, 2], self.joint_regressor)
155 |             joints = tf.stack([joint_x, joint_y, joint_z], axis=2)
156 | 
157 |             if get_skin:
158 |                 return verts, joints, Rs
159 |             else:
160 |                 return joints
161 | 
--------------------------------------------------------------------------------
/src/tf_smpl/projection.py:
--------------------------------------------------------------------------------
1 | """
2 | Util functions implementing the camera
3 | 
4 | @@batch_orth_proj_idrot
5 | @@batch_orth_proj_optcam
6 | @@procrustes2d_vis
7 | """
8 | 
9 | import tensorflow as tf
10 | 
11 | 
12 | def batch_orth_proj_idrot(X, camera, name=None):
13 |     """
14 |     X is N x num_points x 3
15 |     camera is N x 3
16 |     same as applying orth_proj_idrot to each N
17 |     """
18 |     with tf.name_scope(name, "batch_orth_proj_idrot", [X, camera]):
19 |         # TODO check X dim size.
20 |         # tf.Assert(X.shape[2] == 3, [X])
21 | 
22 |         camera = tf.reshape(camera, [-1, 1, 3], name="cam_adj_shape")
23 | 
24 |         X_trans = X[:, :, :2] + camera[:, :, 1:]
25 | 
26 |         shape = tf.shape(X_trans)
27 |         return tf.reshape(
28 |             camera[:, :, 0] * tf.reshape(X_trans, [shape[0], -1]), shape)
29 | 
30 | 
31 | def batch_orth_proj_optcam(X, X_gt, unbounded=False, name=None):
32 |     """
33 |     Solves for the best scale and translation in 2D, i.e.
34 |     gives (s, t) that minimize ||s(x + t) - x_gt||^2.
35 |     X is N x K x 2, for [x, y] pred (via identity).
36 |     X_gt is N x K x 3, the 3rd dim is visibility
37 | 
38 |     returns proj_x: N x K x 2 and best_cam: [scale, trans]
39 |     """
40 |     with tf.name_scope(name, "batch_orth_proj_optcam", [X, X_gt]):
41 |         best_cam = procrustes2d_vis(X, X_gt, unbounded=unbounded)
42 |         best_cam = tf.stop_gradient(best_cam)
43 |         proj_x = batch_orth_proj_idrot(X, best_cam)
44 |         return proj_x, best_cam
45 | 
46 | 
47 | def procrustes2d_vis(X, X_target, unbounded=True):
48 |     """
49 |     Solves for the optimal scale and translation in 2D, i.e.
50 |     gives (s, t) that minimize ||s(x + t) - x_gt||^2
51 |     on *visible* points.
52 | 
53 |     Gradient is stopped on the computed camera.
54 | 
55 |     if unbounded is False (i.e. bounded), it lower bounds the scale
56 |     so it can't be so small.
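    (Clarifying note, not part of the original docstring: when bounded, the
    scale is clipped to [0.7, 10] below; the upper bound exists only because
    tf.clip_by_value requires one.)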
57 | 58 | X: N x K x 2 or N x K x 3 (last dim is dropped) 59 | X_target: N x K x 3, 3rd dim is visibility. 60 | 61 | returns best_cam: N x 3 62 | """ 63 | assert len(X_target.shape) == 3 64 | with tf.name_scope('procrustes2d_vis', values=[X, X_target]): 65 | # Turn vis into [0, 1] 66 | vis = tf.cast(X_target[:, :, 2] > 0, tf.float32) 67 | vis_vec = tf.expand_dims(vis, 2) 68 | # Prepare data. 69 | x_target = X_target[:, :, :2] 70 | x = X[:, :, :2] 71 | # Start: 72 | # Make sure invisible points dont contribute 73 | # (They could be not always 0...) 74 | x_vis = vis_vec * x 75 | x_target_vis = vis_vec * x_target 76 | 77 | num_vis = tf.expand_dims(tf.reduce_sum(vis, 1, keepdims=True), 2) 78 | 79 | # need to compute mean ignoring the non-vis 80 | mu1 = tf.reduce_sum(x_vis, 1, keepdims=True) / num_vis 81 | mu2 = tf.reduce_sum(x_target_vis, 1, keepdims=True) / num_vis 82 | # Need to 0 out the ignore region again 83 | xmu = vis_vec * (x - mu1) 84 | y = vis_vec * (x_target - mu2) 85 | 86 | # A_inv = inv(x'x) 87 | # scale = trace(A_inv * (x'x_target) ) / 2. 88 | # trans = mu_target / scale - mu 89 | # Add noise on the diagonal to avoid numerical instability 90 | # for taking inv. 91 | eps = 1e-6 * tf.eye(2) 92 | Ainv = tf.matrix_inverse(tf.matmul(xmu, xmu, transpose_a=True) + eps) 93 | B = tf.matmul(xmu, y, transpose_a=True) 94 | 95 | scale = tf.expand_dims(tf.trace(tf.matmul(Ainv, B)) / 2., 1) 96 | 97 | if not unbounded: 98 | print('Optcam: lowerbound scale') 99 | # only need the lower bound, but setting max to 10 bc tf doesn't 100 | # take None. 101 | scale = tf.clip_by_value(scale, 0.7, 10) 102 | 103 | trans = tf.squeeze(mu2) / scale - tf.squeeze(mu1) 104 | 105 | best_cam = tf.concat([scale, trans], 1) 106 | return best_cam 107 | -------------------------------------------------------------------------------- /src/util/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jasonyzhang/phd/7b7f526d45913902ed93cdc49fdd59272698bd71/src/util/__init__.py -------------------------------------------------------------------------------- /src/util/common.py: -------------------------------------------------------------------------------- 1 | import os 2 | import os.path as osp 3 | 4 | import cv2 5 | import numpy as np 6 | 7 | 8 | def resize_img(img, scale_factor): 9 | new_size = (np.floor(np.array(img.shape[0:2]) * scale_factor)).astype(int) 10 | new_img = cv2.resize(img, (new_size[1], new_size[0])) 11 | # This is scale factor of [height, width] i.e. [y, x] 12 | actual_factor = [ 13 | new_size[0] / float(img.shape[0]), new_size[1] / float(img.shape[1]) 14 | ] 15 | return new_img, actual_factor 16 | 17 | 18 | def mkdir(dir_path): 19 | if not osp.exists(dir_path): 20 | os.makedirs(dir_path) 21 | -------------------------------------------------------------------------------- /src/util/render_utils.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | 4 | 5 | def add_alpha(img, alpha=1): 6 | shape = img.shape[:2] + (1,) 7 | alpha_channel = alpha * np.ones(shape) 8 | return np.dstack((img, alpha_channel)) 9 | 10 | 11 | def draw_text(input_image, content): 12 | """ 13 | content is a dict. 
draws key: val on image 14 | Assumes key is str, val is float 15 | """ 16 | image = input_image.copy() 17 | input_is_float = False 18 | if np.issubdtype(image.dtype, np.float): 19 | input_is_float = True 20 | image = (image * 255).astype(np.uint8) 21 | 22 | green = [57, 255, 20] 23 | margin = 15 24 | start_x = 5 25 | start_y = margin 26 | for key in sorted(content.keys()): 27 | value = content[key] 28 | if isinstance(value, str): 29 | text = '{}: {}'.format(key, value) 30 | else: 31 | text = "%s: %.2g" % (key, value) 32 | cv2.putText(image, text, (start_x, start_y), 0, 0.5, green, thickness=2) 33 | start_y += margin 34 | 35 | if input_is_float: 36 | image = image.astype(np.float32) / 255. 37 | return image 38 | 39 | 40 | def draw_skeleton(input_image, joints, draw_edges=True, vis=None, radius=None): 41 | """ 42 | joints is 3 x 19. but if not will transpose it. 43 | 0: Right heel 44 | 1: Right knee 45 | 2: Right hip 46 | 3: Left hip 47 | 4: Left knee 48 | 5: Left heel 49 | 6: Right wrist 50 | 7: Right elbow 51 | 8: Right shoulder 52 | 9: Left shoulder 53 | 10: Left elbow 54 | 11: Left wrist 55 | 12: Neck 56 | 13: Head top 57 | 14: nose 58 | 15: left_eye 59 | 16: right_eye 60 | 17: left_ear 61 | 18: right_ear 62 | 19: left big toe 63 | 20: right big toe 64 | 21: Left small toe 65 | 22: Right small toe 66 | 23: L ankle 67 | 24: R ankle 68 | """ 69 | if radius is None: 70 | radius = max(4, (np.mean(input_image.shape[:2]) * 0.01).astype(int)) 71 | 72 | colors = { 73 | 'pink': [197, 27, 125], # L lower leg 74 | 'light_pink': [233, 163, 201], # L upper leg 75 | 'light_green': [161, 215, 106], # L lower arm 76 | 'green': [77, 146, 33], # L upper arm 77 | 'red': [215, 48, 39], # head 78 | 'light_red': [252, 146, 114], # head 79 | 'light_orange': [252, 141, 89], # chest 80 | 'orange': [200,90,39], 81 | 'purple': [118, 42, 131], # R lower leg 82 | 'light_purple': [175, 141, 195], # R upper 83 | 'light_blue': [145, 191, 219], # R lower arm 84 | 'blue': [69, 117, 180], # R upper arm 85 | 'gray': [130, 130, 130], # 86 | 'white': [255, 255, 255], # 87 | } 88 | 89 | image = input_image.copy() 90 | input_is_float = False 91 | 92 | if (np.issubdtype(image.dtype, np.float32) or 93 | np.issubdtype(image.dtype, np.float64)): 94 | input_is_float = True 95 | max_val = image.max() 96 | if max_val <= 2.: # should be 1 but sometimes it's slightly above 1 97 | image = (image * 255).astype(np.uint8) 98 | else: 99 | image = (image).astype(np.uint8) 100 | 101 | if joints.shape[0] != 2: 102 | joints = joints.T 103 | joints = np.round(joints).astype(int) 104 | 105 | jcolors = [ 106 | 'light_pink', 'light_pink', 'light_pink', 'pink', 'pink', 'pink', 107 | 'light_blue', 'light_blue', 'light_blue', 'blue', 'blue', 'blue', 108 | 'purple', 'purple', 'red', 'green', 'green', 'white', 'white', 109 | 'orange','light_orange','orange','light_orange','pink','light_pink' 110 | ] 111 | 112 | if joints.shape[1] == 19: 113 | # parent indices -1 means no parents 114 | parents = np.array([ 115 | 1, 2, 8, 9, 3, 4, 7, 8, 12, 12, 9, 10, 14, -1, 13, -1, -1, 15, 16 116 | ]) 117 | # Left is dark and right is light. 
118 |         ecolors = {
119 |             0: 'light_pink',
120 |             1: 'light_pink',
121 |             2: 'light_pink',
122 |             3: 'pink',
123 |             4: 'pink',
124 |             5: 'pink',
125 |             6: 'light_blue',
126 |             7: 'light_blue',
127 |             8: 'light_blue',
128 |             9: 'blue',
129 |             10: 'blue',
130 |             11: 'blue',
131 |             12: 'purple',
132 |             17: 'light_green',
133 |             18: 'light_green',
134 |             14: 'purple'
135 |         }
136 |     elif joints.shape[1] == 14:
137 |         parents = np.array([
138 |             1,
139 |             2,
140 |             8,
141 |             9,
142 |             3,
143 |             4,
144 |             7,
145 |             8,
146 |             -1,
147 |             -1,
148 |             9,
149 |             10,
150 |             13,
151 |             -1,
152 |         ])
153 |         ecolors = {
154 |             0: 'light_pink',
155 |             1: 'light_pink',
156 |             2: 'light_pink',
157 |             3: 'pink',
158 |             4: 'pink',
159 |             5: 'pink',
160 |             6: 'light_blue',
161 |             7: 'light_blue',
162 |             10: 'light_blue',
163 |             11: 'blue',
164 |             12: 'purple'
165 |         }
166 |     elif joints.shape[1] == 25:
167 |         # parent indices -1 means no parents
168 |         parents = np.array([
169 |             24, 2, 8, 9, 3, 23, 7, 8, 12, 12, 9, 10, 14, -1, 13, -1, -1, 15,
170 |             16, 23, 24, 19, 20, 4, 1
171 |         ])
172 |         # Left is dark and right is light.
173 |         ecolors = {
174 |             0: 'light_pink',
175 |             1: 'light_pink',
176 |             2: 'light_pink',
177 |             3: 'pink',
178 |             4: 'pink',
179 |             5: 'pink',
180 |             6: 'light_blue',
181 |             7: 'light_blue',
182 |             8: 'light_blue',  # Right shoulder
183 |             9: 'blue',
184 |             10: 'blue',
185 |             11: 'blue',
186 |             12: 'purple',
187 |             17: 'light_green',
188 |             18: 'light_green',
189 |             14: 'purple',
190 |             19: 'orange',  # Left Big Toe
191 |             20: 'light_orange',  # Right Big Toe
192 |             21: 'orange',  # Left Small Toe
193 |             22: 'light_orange',  # Right Small Toe
194 |             # Ankles!
195 |             23: 'green',  # Left
196 |             24: 'gray'  # Right
197 |         }
198 |     else:
199 |         print('Unknown skeleton!!')
200 |         import ipdb
201 |         ipdb.set_trace()
202 | 
203 |     for child in range(len(parents)):
204 |         point = joints[:, child]
205 |         # If invisible skip
206 |         if vis is not None and vis[child] == 0:
207 |             continue
208 |         if draw_edges:
209 |             cv2.circle(image, (point[0], point[1]), radius, colors['white'],
210 |                        -1)
211 |             cv2.circle(image, (point[0], point[1]), radius - 1,
212 |                        colors[jcolors[child]], -1)
213 |         else:
214 |             cv2.circle(image, (point[0], point[1]), radius - 1,
215 |                        colors[jcolors[child]], 1)
216 |         pa_id = parents[child]
217 |         if draw_edges and pa_id >= 0:
218 |             if vis is not None and vis[pa_id] == 0:
219 |                 continue
220 |             point_pa = joints[:, pa_id]
221 |             cv2.circle(image, (point_pa[0], point_pa[1]), radius - 1,
222 |                        colors[jcolors[pa_id]], -1)
223 |             if child not in ecolors.keys():
224 |                 print('bad')
225 |                 import ipdb
226 |                 ipdb.set_trace()
227 |             cv2.line(image, (point[0], point[1]), (point_pa[0], point_pa[1]),
228 |                      colors[ecolors[child]], radius - 2)
229 | 
230 |     # Convert back to original dtype
231 |     if input_is_float:
232 |         if max_val <= 1.:
233 |             image = image.astype(np.float32) / 255.
234 |         else:
235 |             image = image.astype(np.float32)
236 |     return image
237 | 
--------------------------------------------------------------------------------
/src/util/smooth_bbox.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from scipy.ndimage.filters import gaussian_filter1d
3 | import scipy.signal as signal
4 | 
5 | 
6 | def get_smooth_bbox_params(kps, vis_thresh=2, kernel_size=11, sigma=3):
7 |     """
8 |     Computes smooth bounding box parameters from keypoints:
9 |         1. Computes bbox by rescaling the person to be around 150 px.
10 |         2. Linearly interpolates bbox params for missing annotations.
11 |         3. Median filtering.
12 |         4. Gaussian filtering.
13 | Recommended thresholds: 14 | * detect-and-track: 0 15 | * 3DPW: 0.1 16 | Args: 17 | kps (list or ndarray): List of kps (Nx3) or None. 18 | vis_thresh (float): Threshold for visibility. 19 | kernel_size (int): Kernel size for median filtering (must be odd). 20 | sigma (float): Sigma for gaussian smoothing. 21 | Returns: 22 | Smooth bbox params [cx, cy, scale], start index, end index 23 | """ 24 | bbox_params, start, end = get_all_bbox_params(kps, vis_thresh) 25 | smoothed = smooth_bbox_params(bbox_params, kernel_size, sigma) 26 | smoothed = np.vstack((np.zeros((start, 3)), smoothed)) 27 | return smoothed, start, end 28 | 29 | 30 | def kp_to_bbox_param(kp, vis_thresh): 31 | """ 32 | Finds the bounding box parameters from the 2D keypoints. 33 | Args: 34 | kp (Kx3): 2D Keypoints. 35 | vis_thresh (float): Threshold for visibility. 36 | Returns: 37 | [center_x, center_y, scale] 38 | """ 39 | if kp is None: 40 | return 41 | vis = kp[:, 2] > vis_thresh 42 | if not np.any(vis): 43 | return 44 | min_pt = np.min(kp[vis, :2], axis=0) 45 | max_pt = np.max(kp[vis, :2], axis=0) 46 | person_height = np.linalg.norm(max_pt - min_pt) 47 | if person_height < 0.5: 48 | return 49 | center = (min_pt + max_pt) / 2. 50 | scale = 150. / person_height 51 | 52 | return np.append(center, scale) 53 | 54 | 55 | def get_all_bbox_params(kps, vis_thresh=2): 56 | """ 57 | Finds bounding box parameters for all keypoints. 58 | Look for sequences in the middle with no predictions and linearly 59 | interpolate the bbox params for those 60 | Args: 61 | kps (list): List of kps (Kx3) or None. 62 | vis_thresh (float): Threshold for visibility. 63 | Returns: 64 | bbox_params, start_index (incl), end_index (excl) 65 | """ 66 | # keeps track of how many indices in a row with no prediction 67 | num_to_interpolate = 0 68 | start_index = -1 69 | bbox_params = np.empty(shape=(0, 3), dtype=np.float32) 70 | 71 | for i, kp in enumerate(kps): 72 | bbox_param = kp_to_bbox_param(kp, vis_thresh=vis_thresh) 73 | if bbox_param is None: 74 | num_to_interpolate += 1 75 | continue 76 | 77 | if start_index == -1: 78 | # Found the first index with a prediction! 79 | start_index = i 80 | num_to_interpolate = 0 81 | 82 | if num_to_interpolate > 0: 83 | # Linearly interpolate each param. 84 | previous = bbox_params[-1] 85 | # This will be 3x(n+2) 86 | interpolated = np.array( 87 | [np.linspace(prev, curr, num_to_interpolate + 2) 88 | for prev, curr in zip(previous, bbox_param)]) 89 | bbox_params = np.vstack((bbox_params, interpolated.T[1:-1])) 90 | num_to_interpolate = 0 91 | bbox_params = np.vstack((bbox_params, bbox_param)) 92 | 93 | return bbox_params, start_index, i - num_to_interpolate + 1 94 | 95 | 96 | def smooth_bbox_params(bbox_params, kernel_size=11, sigma=8): 97 | """ 98 | Applies median filtering and then gaussian filtering to bounding box 99 | parameters. 100 | Args: 101 | bbox_params (Nx3): [cx, cy, scale]. 102 | kernel_size (int): Kernel size for median filtering (must be odd). 103 | sigma (float): Sigma for gaussian smoothing. 104 | Returns: 105 | Smoothed bounding box parameters (Nx3). 
106 | """ 107 | smoothed = np.array([signal.medfilt(param, kernel_size) 108 | for param in bbox_params.T]).T 109 | return np.array([gaussian_filter1d(traj, sigma) for traj in smoothed.T]).T 110 | -------------------------------------------------------------------------------- /src/util/torch_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | def orthographic_proj_withz_idrot(X, cam, offset_z=0.): 5 | """ 6 | X: B x N x 3 7 | cam: B x 3: [sc, tx, ty] 8 | No rotation! 9 | Orth preserving the z. 10 | sc * ( x + [tx; ty]) 11 | as in HMR.. 12 | """ 13 | scale = cam[:, 0].contiguous().view(-1, 1, 1) 14 | trans = cam[:, 1:3].contiguous().view(cam.size(0), 1, -1) 15 | 16 | # proj = scale * X 17 | proj = X 18 | 19 | proj_xy = scale * (proj[:, :, :2] + trans) 20 | proj_z = proj[:, :, 2, None] + offset_z 21 | 22 | return torch.cat((proj_xy, proj_z), 2) 23 | 24 | 25 | def orthographic_proj_withz(X, cam, offset_z=0.): 26 | """ 27 | X: B x N x 3 28 | cam: B x 7: [sc, tx, ty, quaternions] 29 | Orth preserving the z. 30 | sc * ( x + [tx; ty]) 31 | as in HMR.. 32 | """ 33 | quat = cam[:, -4:] 34 | X_rot = quat_rotate(X, quat) 35 | 36 | scale = cam[:, 0].contiguous().view(-1, 1, 1) 37 | trans = cam[:, 1:3].contiguous().view(cam.size(0), 1, -1) 38 | 39 | # proj = scale * X_rot 40 | proj = X_rot 41 | 42 | proj_xy = scale * (proj[:, :, :2] + trans) 43 | proj_z = proj[:, :, 2, None] + offset_z 44 | 45 | return torch.cat((proj_xy, proj_z), 2) 46 | 47 | 48 | def quat_rotate(X, q): 49 | """Rotate points by quaternions. 50 | Args: 51 | X: B X N X 3 points 52 | q: B X 4 quaternions 53 | Returns: 54 | X_rot: B X N X 3 (rotated points) 55 | """ 56 | # repeat q along 2nd dim 57 | ones_x = X[[0], :, :][:, :, [0]] * 0 + 1 58 | q = torch.unsqueeze(q, 1) * ones_x 59 | 60 | q_conj = torch.cat([q[:, :, [0]], -1 * q[:, :, 1:4]], dim=-1) 61 | X = torch.cat([X[:, :, [0]] * 0, X], dim=-1) 62 | 63 | X_rot = hamilton_product(q, hamilton_product(X, q_conj)) 64 | return X_rot[:, :, 1:4] 65 | 66 | 67 | def hamilton_product(qa, qb): 68 | """Multiply qa by qb. 69 | Args: 70 | qa: B X N X 4 quaternions 71 | qb: B X N X 4 quaternions 72 | Returns: 73 | q_mult: B X N X 4 74 | """ 75 | qa_0 = qa[:, :, 0] 76 | qa_1 = qa[:, :, 1] 77 | qa_2 = qa[:, :, 2] 78 | qa_3 = qa[:, :, 3] 79 | 80 | qb_0 = qb[:, :, 0] 81 | qb_1 = qb[:, :, 1] 82 | qb_2 = qb[:, :, 2] 83 | qb_3 = qb[:, :, 3] 84 | 85 | # See https://en.wikipedia.org/wiki/Quaternion#Hamilton_product 86 | q_mult_0 = qa_0 * qb_0 - qa_1 * qb_1 - qa_2 * qb_2 - qa_3 * qb_3 87 | q_mult_1 = qa_0 * qb_1 + qa_1 * qb_0 + qa_2 * qb_3 - qa_3 * qb_2 88 | q_mult_2 = qa_0 * qb_2 - qa_1 * qb_3 + qa_2 * qb_0 + qa_3 * qb_1 89 | q_mult_3 = qa_0 * qb_3 + qa_1 * qb_2 - qa_2 * qb_1 + qa_3 * qb_0 90 | 91 | return torch.stack([q_mult_0, q_mult_1, q_mult_2, q_mult_3], dim=-1) -------------------------------------------------------------------------------- /src/util/video.py: -------------------------------------------------------------------------------- 1 | import os 2 | import shutil 3 | import subprocess 4 | import tempfile 5 | 6 | import matplotlib.pyplot as plt 7 | from tqdm import tqdm 8 | 9 | 10 | def images_to_video(output_path, images, fps): 11 | writer = VideoWriter(output_path, fps) 12 | writer.add_images(images) 13 | writer.make_video() 14 | writer.close() 15 | 16 | 17 | def sizeof_fmt(num, suffix='B'): 18 | """ 19 | Returns the filesize as human readable string. 
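    For example, sizeof_fmt(1536) returns '1.5KiB' (an illustrative value, not
    from the original docstring).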
20 | 21 | https://stackoverflow.com/questions/1094841/reusable-library-to-get-human- 22 | readable-version-of-file-size 23 | """ 24 | for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']: 25 | if abs(num) < 1024.0: 26 | return '%3.1f%s%s' % (num, unit, suffix) 27 | num /= 1024.0 28 | return '%.1f%s%s' % (num, 'Yi', suffix) 29 | 30 | 31 | def get_dir_size(dirname): 32 | """ 33 | Returns the size of the contents of a directory. (Doesn't include subdirs.) 34 | """ 35 | size = 0 36 | for fname in os.listdir(dirname): 37 | fname = os.path.join(dirname, fname) 38 | if os.path.isfile(fname): 39 | size += os.path.getsize(fname) 40 | return size 41 | 42 | 43 | class VideoWriter(object): 44 | 45 | def __init__(self, output_path, fps, temp_dir=None): 46 | self.output_path = output_path 47 | self.fps = fps 48 | self.temp_dir = temp_dir 49 | self.current_index = 0 50 | self.img_shape = None 51 | self.frame_string = 'frame{:08}.jpg' 52 | 53 | def add_images(self, images_list, show_pbar=False): 54 | """ 55 | Adds a list of images to temporary directory. 56 | 57 | Args: 58 | images_list (iterable): List of images (HxWx3). 59 | show_pbar (bool): If True, displays a progress bar. 60 | 61 | Returns: 62 | list: filenames of saved images. 63 | """ 64 | filenames = [] 65 | if show_pbar: 66 | images_list = tqdm(images_list) 67 | for image in images_list: 68 | filenames.append(self.add_image(image)) 69 | return filenames 70 | 71 | def add_image(self, image): 72 | """ 73 | Saves image to file. 74 | 75 | Args: 76 | image (HxWx3). 77 | 78 | Returns: 79 | str: filename. 80 | """ 81 | if self.temp_dir is None: 82 | self.temp_dir = tempfile.mkdtemp() 83 | if self.img_shape is None: 84 | self.img_shape = image.shape 85 | assert self.img_shape == image.shape 86 | filename = self.get_filename(self.current_index) 87 | plt.imsave(fname=filename, arr=image) 88 | self.current_index += 1 89 | return filename 90 | 91 | def get_frame(self, index): 92 | """ 93 | Read image from file. 94 | 95 | Args: 96 | index (int). 97 | 98 | Returns: 99 | Array (HxWx3). 100 | """ 101 | filename = self.get_filename(index) 102 | return plt.imread(fname=filename) 103 | 104 | def get_filename(self, index): 105 | if self.temp_dir is None: 106 | self.temp_dir = tempfile.mkdtemp() 107 | return os.path.join(self.temp_dir, self.frame_string.format(index)) 108 | 109 | def make_video(self): 110 | cmd = ('ffmpeg -y -threads 16 -r {fps} ' 111 | '-i {temp_dir}/frame%08d.jpg -profile:v baseline -level 3.0 ' 112 | '-c:v libx264 -pix_fmt yuv420p -an -vf ' 113 | '"scale=trunc(iw/2)*2:trunc(ih/2)*2" {output_path}'.format( 114 | fps=self.fps, temp_dir=self.temp_dir, output_path=self.output_path 115 | )) 116 | print(cmd) 117 | try: 118 | subprocess.call(cmd, shell=True) 119 | except OSError as e: 120 | import ipdb; ipdb.set_trace() 121 | print('OSError') 122 | 123 | def close(self): 124 | """ 125 | Clears the temp_dir. 126 | """ 127 | print('Removing {} which contains {}.'.format( 128 | self.temp_dir, 129 | self.get_temp_dir_size()) 130 | ) 131 | shutil.rmtree(self.temp_dir) 132 | self.temp_dir = None 133 | 134 | def get_temp_dir_size(self): 135 | """ 136 | Returns the size of the temp dir. 
137 | """ 138 | return sizeof_fmt(get_dir_size(self.temp_dir)) 139 | 140 | 141 | class VideoReader(object): 142 | 143 | def __init__(self, video_path, temp_dir=None): 144 | self.video_path = video_path 145 | self.temp_dir = temp_dir 146 | self.frame_string = 'frame{:08}.jpg' 147 | 148 | def read(self): 149 | if self.temp_dir is None: 150 | self.temp_dir = tempfile.mkdtemp() 151 | cmd = ('ffmpeg -i {video_path} -start_number 0 ' 152 | '{temp_dir}/frame%08d.jpg'.format( 153 | temp_dir=self.temp_dir, 154 | video_path=self.video_path 155 | )) 156 | print(cmd) 157 | subprocess.call(cmd, shell=True) 158 | self.num_frames = len(os.listdir(self.temp_dir)) 159 | 160 | def get_filename(self, index): 161 | if self.temp_dir is None: 162 | self.temp_dir = tempfile.mkdtemp() 163 | return os.path.join(self.temp_dir, self.frame_string.format(index)) 164 | 165 | def get_image(self, index): 166 | return plt.imread(self.get_filename(index)) 167 | 168 | def get_images(self): 169 | i = 0 170 | fname = self.get_filename(i) 171 | while os.path.exists(fname): 172 | yield plt.imread(self.get_filename(i)) 173 | i += 1 174 | fname = self.get_filename(i) 175 | 176 | def close(self): 177 | """ 178 | Clears the temp_dir. 179 | """ 180 | print('Removing {} which contains {}.'.format( 181 | self.temp_dir, 182 | self.get_temp_dir_size()) 183 | ) 184 | shutil.rmtree(self.temp_dir) 185 | self.temp_dir = None 186 | 187 | def get_temp_dir_size(self): 188 | """ 189 | Returns the size of the temp dir. 190 | """ 191 | return sizeof_fmt(get_dir_size(self.temp_dir)) 192 | --------------------------------------------------------------------------------