├── ISSUE_TEMPLATE.md ├── LICENCE ├── README.md ├── imgs └── viz_example.png └── src ├── cameras.py ├── data_utils.py ├── linear_model.py ├── predict_3dpose.py ├── procrustes.py └── viz.py /ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Thanks for your interest in our research! 2 | 3 | If you have problems running our code, please include 4 | 5 | 1. Your operating system 6 | 2. Your tensorflow version 7 | 3. Your python version 8 | 4. The stack trace of the error that you see 9 | 10 | This is **research code**, and its primary purpose is to reproduce the scientific results that you see in our paper. 11 | If you are looking for ways to extend our work such as 12 | 13 | * handling arbitrary images 14 | * handling videos as input 15 | * using other 2d detectors 16 | * large-scale deployment 17 | * handling missing data 18 | * you want to make a startup out of this code 19 | 20 | then pull requests are welcome, but it is very unlikely we will have the time to help you. 21 | -------------------------------------------------------------------------------- /LICENCE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 Julieta Martinez, Rayat Hossain, Javier Romero 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Overview 2 | 3 | This is the code for [this](https://youtu.be/EMjPqgLX14A) video on Youtube by Siraj Raval. 4 | 5 | ## 3d-pose-baseline 6 | 7 | This is the code for the paper 8 | 9 | Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little. 10 | _A simple yet effective baseline for 3d human pose estimation._ 11 | In ICCV, 2017. https://arxiv.org/pdf/1705.03098.pdf. 12 | 13 | The code in this repository was mostly written by 14 | [Julieta Martinez](https://github.com/una-dinosauria), 15 | [Rayat Hossain](https://github.com/rayat137) and 16 | [Javier Romero](https://github.com/libicocco). 17 | 18 | We provide a strong baseline for 3d human pose estimation that also sheds light 19 | on the challenges of current approaches. Our model is lightweight and we strive 20 | to make our code transparent, compact, and easy-to-understand. 
### Dependencies

* [h5py](http://www.h5py.org/)
* [tensorflow](https://www.tensorflow.org/) 1.0 or later

### First of all
1. Watch our video: https://youtu.be/Hmi3Pd9x1BE
2. Clone this repository and get the data. We provide the [Human3.6M](http://vision.imar.ro/human3.6m/description.php) dataset in 3d points, camera parameters to produce ground truth 2d detections, and [Stacked Hourglass](https://github.com/anewell/pose-hg-demo) detections.

```bash
git clone https://github.com/una-dinosauria/3d-pose-baseline.git
cd 3d-pose-baseline
mkdir data
cd data
wget https://www.dropbox.com/s/e35qv3n6zlkouki/h36m.zip
unzip h36m.zip
rm h36m.zip
cd ..
```

### Quick demo

For a quick demo, you can train for one epoch and visualize the results. To train, run

`python src/predict_3dpose.py --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_sh --epochs 1`

This should take less than 5 minutes to complete on a GTX 1080, and give you around 75 mm of error on the test set.

Now, to visualize the results, simply run

`python src/predict_3dpose.py --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_sh --epochs 1 --sample --load 24371`

This will produce a visualization similar to this:

![Visualization example](/imgs/viz_example.png?raw=1)

### Training

To train a model with clean 2d detections, run:

`python src/predict_3dpose.py --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise`

This corresponds to Table 2, bottom row: `Ours (GT detections) (MA)`.

To train on Stacked Hourglass detections, run

`python src/predict_3dpose.py --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_sh`

This corresponds to Table 2, next-to-last row: `Ours (SH detections) (MA)`.

On a GTX 1080 GPU, a batch of 64 takes less than 8 ms for a forward+backward pass, and less than 6 ms for a forward-only pass.

### Pre-trained model

We also provide a model pre-trained on Stacked Hourglass detections, available through [Google Drive](https://drive.google.com/file/d/0BxWzojlLp259MF9qSFpiVjl0cU0/view?usp=sharing).

To test the model, decompress the file at the top level of this project, and call

`python src/predict_3dpose.py --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_sh --epochs 200 --sample --load 4874200`

### Citing

If you use our code, please cite our work

```
@inproceedings{martinez_2017_3dbaseline,
  title={A simple yet effective baseline for 3d human pose estimation},
  author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
  booktitle={ICCV},
  year={2017}
}
```

### License
MIT

## Credits

Credits for this code go to [una-dinosauria](https://github.com/una-dinosauria/3d-pose-baseline). I've merely created a wrapper to get people started.
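For readers who want to see the model rather than the command-line flags, below is a minimal NumPy sketch of the forward pass that `src/linear_model.py` builds with the default settings used above (two residual blocks of width 1024, 16 2d joints in, 16 root-centred 3d joints out). The weights here are random placeholders and batch normalization / dropout are omitted, so this is only an illustration of the architecture, not the TensorFlow code the repository actually runs.

```python
# Illustrative sketch only: random weights stand in for trained ones, and
# batch norm / dropout (identity at inference time) are left out.
import numpy as np

HUMAN_2D_SIZE = 16 * 2   # 16 joints x (x, y)
HUMAN_3D_SIZE = 16 * 3   # 16 joints x (x, y, z), zero-centred at the hip
LINEAR_SIZE = 1024       # width of every hidden layer

rng = np.random.RandomState(42)

def dense(x, n_out):
    """A dense layer with Kaiming-scaled random weights (placeholder for a trained layer)."""
    w = rng.randn(x.shape[1], n_out) * np.sqrt(2.0 / x.shape[1])
    return x @ w + np.zeros(n_out)

def forward(pose_2d):
    """Map (batch, 32) normalized 2d joints to (batch, 48) normalized 3d joints."""
    y = np.maximum(dense(pose_2d, LINEAR_SIZE), 0.0)      # first projection + ReLU
    for _ in range(2):                                    # two "bi-linear" residual blocks
        h = np.maximum(dense(y, LINEAR_SIZE), 0.0)
        h = np.maximum(dense(h, LINEAR_SIZE), 0.0)
        y = y + h                                         # residual connection
    return dense(y, HUMAN_3D_SIZE)                        # final linear layer

print(forward(rng.randn(4, HUMAN_2D_SIZE)).shape)         # (4, 48)
```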
103 | -------------------------------------------------------------------------------- /imgs/viz_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/llSourcell/3D_Pose_Estimation/87c9d77e3bb0c1105eae74046c26f8b7f101ca45/imgs/viz_example.png -------------------------------------------------------------------------------- /src/cameras.py: -------------------------------------------------------------------------------- 1 | 2 | """Utilities to deal with the cameras of human3.6m""" 3 | 4 | from __future__ import division 5 | 6 | import h5py 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | import matplotlib.image as mpimg 10 | import data_utils 11 | import viz 12 | 13 | def project_point_radial( P, R, T, f, c, k, p ): 14 | """ 15 | Project points from 3d to 2d using camera parameters 16 | including radial and tangential distortion 17 | 18 | Args 19 | P: Nx3 points in world coordinates 20 | R: 3x3 Camera rotation matrix 21 | T: 3x1 Camera translation parameters 22 | f: (scalar) Camera focal length 23 | c: 2x1 Camera center 24 | k: 3x1 Camera radial distortion coefficients 25 | p: 2x1 Camera tangential distortion coefficients 26 | Returns 27 | Proj: Nx2 points in pixel space 28 | D: 1xN depth of each point in camera space 29 | radial: 1xN radial distortion per point 30 | tan: 1xN tangential distortion per point 31 | r2: 1xN squared radius of the projected points before distortion 32 | """ 33 | 34 | # P is a matrix of 3-dimensional points 35 | assert len(P.shape) == 2 36 | assert P.shape[1] == 3 37 | 38 | N = P.shape[0] 39 | X = R.dot( P.T - T ) # rotate and translate 40 | XX = X[:2,:] / X[2,:] 41 | r2 = XX[0,:]**2 + XX[1,:]**2 42 | 43 | radial = 1 + np.einsum( 'ij,ij->j', np.tile(k,(1, N)), np.array([r2, r2**2, r2**3]) ); 44 | tan = p[0]*XX[1,:] + p[1]*XX[0,:] 45 | 46 | XXX = XX * np.tile(radial+tan,(2,1)) + np.outer(np.array([p[1], p[0]]).reshape(-1), r2 ) 47 | 48 | Proj = (f * XXX) + c 49 | Proj = Proj.T 50 | 51 | D = X[2,] 52 | 53 | return Proj, D, radial, tan, r2 54 | 55 | def world_to_camera_frame(P, R, T): 56 | """ 57 | Convert points from world to camera coordinates 58 | 59 | Args 60 | P: Nx3 3d points in world coordinates 61 | R: 3x3 Camera rotation matrix 62 | T: 3x1 Camera translation parameters 63 | Returns 64 | X_cam: Nx3 3d points in camera coordinates 65 | """ 66 | 67 | assert len(P.shape) == 2 68 | assert P.shape[1] == 3 69 | 70 | X_cam = R.dot( P.T - T ) # rotate and translate 71 | 72 | return X_cam.T 73 | 74 | def camera_to_world_frame(P, R, T): 75 | """Inverse of world_to_camera_frame 76 | 77 | Args 78 | P: Nx3 points in camera coordinates 79 | R: 3x3 Camera rotation matrix 80 | T: 3x1 Camera translation parameters 81 | Returns 82 | X_cam: Nx3 points in world coordinates 83 | """ 84 | 85 | assert len(P.shape) == 2 86 | assert P.shape[1] == 3 87 | 88 | X_cam = R.T.dot( P.T ) + T # rotate and translate 89 | 90 | return X_cam.T 91 | 92 | def load_camera_params( hf, path ): 93 | """Load h36m camera parameters 94 | 95 | Args 96 | hf: hdf5 open file with h36m cameras data 97 | path: path or key inside hf to the camera we are interested in 98 | Returns 99 | R: 3x3 Camera rotation matrix 100 | T: 3x1 Camera translation parameters 101 | f: (scalar) Camera focal length 102 | c: 2x1 Camera center 103 | k: 3x1 Camera radial distortion coefficients 104 | p: 2x1 Camera tangential distortion coefficients 105 | name: String with camera id 106 | """ 107 | 108 | R = hf[ path.format('R') ][:] 109 | R = R.T 110 
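  # The rotation appears to be stored transposed in the h5 file (likely a row- vs
  # column-major artifact of how the cameras were exported); after this transpose,
  # R is the world-to-camera rotation expected by project_point_radial and
  # world_to_camera_frame above.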
| 111 | T = hf[ path.format('T') ][:] 112 | f = hf[ path.format('f') ][:] 113 | c = hf[ path.format('c') ][:] 114 | k = hf[ path.format('k') ][:] 115 | p = hf[ path.format('p') ][:] 116 | 117 | name = hf[ path.format('Name') ][:] 118 | name = "".join( [chr(item) for item in name] ) 119 | 120 | return R, T, f, c, k, p, name 121 | 122 | def load_cameras( bpath='cameras.h5', subjects=[1,5,6,7,8,9,11] ): 123 | """Loads the cameras of h36m 124 | 125 | Args 126 | bpath: path to hdf5 file with h36m camera data 127 | subjects: List of ints representing the subject IDs for which cameras are requested 128 | Returns 129 | rcams: dictionary of 4 tuples per subject ID containing its camera parameters for the 4 h36m cams 130 | """ 131 | rcams = {} 132 | 133 | with h5py.File(bpath,'r') as hf: 134 | for s in subjects: 135 | for c in range(4): # There are 4 cameras in human3.6m 136 | rcams[(s, c+1)] = load_camera_params(hf, 'subject%d/camera%d/{0}' % (s,c+1) ) 137 | 138 | return rcams 139 | -------------------------------------------------------------------------------- /src/data_utils.py: -------------------------------------------------------------------------------- 1 | 2 | """Utility functions for dealing with human3.6m data.""" 3 | 4 | from __future__ import division 5 | 6 | import os 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | from mpl_toolkits.mplot3d import Axes3D 10 | import cameras 11 | import viz 12 | import h5py 13 | import glob 14 | import copy 15 | 16 | # Human3.6m IDs for training and testing 17 | TRAIN_SUBJECTS = [1,5,6,7,8] 18 | TEST_SUBJECTS = [9,11] 19 | 20 | # Joints in H3.6M -- data has 32 joints, but only 17 that move; these are the indices. 21 | H36M_NAMES = ['']*32 22 | H36M_NAMES[0] = 'Hip' 23 | H36M_NAMES[1] = 'RHip' 24 | H36M_NAMES[2] = 'RKnee' 25 | H36M_NAMES[3] = 'RFoot' 26 | H36M_NAMES[6] = 'LHip' 27 | H36M_NAMES[7] = 'LKnee' 28 | H36M_NAMES[8] = 'LFoot' 29 | H36M_NAMES[12] = 'Spine' 30 | H36M_NAMES[13] = 'Thorax' 31 | H36M_NAMES[14] = 'Neck/Nose' 32 | H36M_NAMES[15] = 'Head' 33 | H36M_NAMES[17] = 'LShoulder' 34 | H36M_NAMES[18] = 'LElbow' 35 | H36M_NAMES[19] = 'LWrist' 36 | H36M_NAMES[25] = 'RShoulder' 37 | H36M_NAMES[26] = 'RElbow' 38 | H36M_NAMES[27] = 'RWrist' 39 | 40 | # Stacked Hourglass produces 16 joints. These are the names. 41 | SH_NAMES = ['']*16 42 | SH_NAMES[0] = 'RFoot' 43 | SH_NAMES[1] = 'RKnee' 44 | SH_NAMES[2] = 'RHip' 45 | SH_NAMES[3] = 'LHip' 46 | SH_NAMES[4] = 'LKnee' 47 | SH_NAMES[5] = 'LFoot' 48 | SH_NAMES[6] = 'Hip' 49 | SH_NAMES[7] = 'Spine' 50 | SH_NAMES[8] = 'Thorax' 51 | SH_NAMES[9] = 'Head' 52 | SH_NAMES[10] = 'RWrist' 53 | SH_NAMES[11] = 'RElbow' 54 | SH_NAMES[12] = 'RShoulder' 55 | SH_NAMES[13] = 'LShoulder' 56 | SH_NAMES[14] = 'LElbow' 57 | SH_NAMES[15] = 'LWrist' 58 | 59 | def load_data( bpath, subjects, actions, dim=3 ): 60 | """ 61 | Loads 2d ground truth from disk, and puts it in an easy-to-acess dictionary 62 | 63 | Args 64 | bpath: String. Path where to load the data from 65 | subjects: List of integers. Subjects whose data will be loaded 66 | actions: List of strings. The actions to load 67 | dim: Integer={2,3}. 
Load 2 or 3-dimensional data 68 | Returns: 69 | data: Dictionary with keys k=(subject, action, seqname) 70 | values v=(nx(32*2) matrix of 2d ground truth) 71 | There will be 2 entries per subject/action if loading 3d data 72 | There will be 8 entries per subject/action if loading 2d data 73 | """ 74 | 75 | if not dim in [2,3]: 76 | raise(ValueError, 'dim must be 2 or 3') 77 | 78 | data = {} 79 | 80 | for subj in subjects: 81 | for action in actions: 82 | 83 | print('Reading subject {0}, action {1}'.format(subj, action)) 84 | 85 | dpath = os.path.join( bpath, 'S{0}'.format(subj), 'MyPoses/{0}D_positions'.format(dim), '{0}*.h5'.format(action) ) 86 | print( dpath ) 87 | 88 | fnames = glob.glob( dpath ) 89 | 90 | loaded_seqs = 0 91 | for fname in fnames: 92 | seqname = os.path.basename( fname ) 93 | 94 | # This rule makes sure SittingDown is not loaded when Sitting is requested 95 | if action == "Sitting" and seqname.startswith( "SittingDown" ): 96 | continue 97 | 98 | # This rule makes sure that WalkDog and WalkTogeter are not loaded when 99 | # Walking is requested. 100 | if seqname.startswith( action ): 101 | print( fname ) 102 | loaded_seqs = loaded_seqs + 1 103 | 104 | with h5py.File( fname, 'r' ) as h5f: 105 | poses = h5f['{0}D_positions'.format(dim)][:] 106 | 107 | poses = poses.T 108 | data[ (subj, action, seqname) ] = poses 109 | 110 | if dim == 2: 111 | assert loaded_seqs == 8, "Expecting 8 sequences, found {0} instead".format( loaded_seqs ) 112 | else: 113 | assert loaded_seqs == 2, "Expecting 2 sequences, found {0} instead".format( loaded_seqs ) 114 | 115 | return data 116 | 117 | 118 | def load_stacked_hourglass(data_dir, subjects, actions): 119 | """ 120 | Load 2d detections from disk, and put it in an easy-to-acess dictionary. 121 | 122 | Args 123 | data_dir: string. Directory where to load the data from, 124 | subjects: list of integers. Subjects whose data will be loaded. 125 | actions: list of strings. The actions to load. 126 | Returns 127 | data: dictionary with keys k=(subject, action, seqname) 128 | values v=(nx(32*2) matrix of 2d stacked hourglass detections) 129 | There will be 2 entries per subject/action if loading 3d data 130 | There will be 8 entries per subject/action if loading 2d data 131 | """ 132 | # Permutation that goes from SH detections to H36M ordering. 133 | SH_TO_GT_PERM = np.array([SH_NAMES.index( h ) for h in H36M_NAMES if h != '' and h in SH_NAMES]) 134 | assert np.all( SH_TO_GT_PERM == np.array([6,2,1,0,3,4,5,7,8,9,13,14,15,12,11,10]) ) 135 | 136 | data = {} 137 | 138 | for subj in subjects: 139 | for action in actions: 140 | 141 | print('Reading subject {0}, action {1}'.format(subj, action)) 142 | 143 | dpath = os.path.join( data_dir, 'S{0}'.format(subj), 'StackedHourglass/{0}*.h5'.format(action) ) 144 | print( dpath ) 145 | 146 | fnames = glob.glob( dpath ) 147 | 148 | loaded_seqs = 0 149 | for fname in fnames: 150 | seqname = os.path.basename( fname ) 151 | seqname = seqname.replace('_',' ') 152 | 153 | # This rule makes sure SittingDown is not loaded when Sitting is requested 154 | if action == "Sitting" and seqname.startswith( "SittingDown" ): 155 | continue 156 | 157 | # This rule makes sure that WalkDog and WalkTogeter are not loaded when 158 | # Walking is requested. 
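      # e.g. with action == "Walking", "Walking 1.h5" passes the prefix test below,
      # while "WalkDog 1.h5" and "WalkTogether 1.h5" do not.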
159 | if seqname.startswith( action ): 160 | print( fname ) 161 | loaded_seqs = loaded_seqs + 1 162 | 163 | # Load the poses from the .h5 file 164 | with h5py.File( fname, 'r' ) as h5f: 165 | poses = h5f['poses'][:] 166 | 167 | # Permute the loaded data to make it compatible with H36M 168 | poses = poses[:,SH_TO_GT_PERM,:] 169 | 170 | # Reshape into n x (32*2) matrix 171 | poses = np.reshape(poses,[poses.shape[0], -1]) 172 | poses_final = np.zeros([poses.shape[0], len(H36M_NAMES)*2]) 173 | 174 | dim_to_use_x = np.where(np.array([x != '' and x != 'Neck/Nose' for x in H36M_NAMES]))[0] * 2 175 | dim_to_use_y = dim_to_use_x+1 176 | 177 | dim_to_use = np.zeros(len(SH_NAMES)*2,dtype=np.int32) 178 | dim_to_use[0::2] = dim_to_use_x 179 | dim_to_use[1::2] = dim_to_use_y 180 | poses_final[:,dim_to_use] = poses 181 | seqname = seqname+'-sh' 182 | data[ (subj, action, seqname) ] = poses_final 183 | 184 | # Make sure we loaded 8 sequences 185 | if (subj == 11 and action == 'Directions'): # <-- this video is damaged 186 | assert loaded_seqs == 7, "Expecting 7 sequences, found {0} instead. S:{1} {2}".format(loaded_seqs, subj, action ) 187 | else: 188 | assert loaded_seqs == 8, "Expecting 8 sequences, found {0} instead. S:{1} {2}".format(loaded_seqs, subj, action ) 189 | 190 | return data 191 | 192 | 193 | def normalization_stats(complete_data, dim, predict_14=False ): 194 | """ 195 | Computes normalization statistics: mean and stdev, dimensions used and ignored 196 | 197 | Args 198 | complete_data: nxd np array with poses 199 | dim. integer={2,3} dimensionality of the data 200 | predict_14. boolean. Whether to use only 14 joints 201 | Returns 202 | data_mean: np vector with the mean of the data 203 | data_std: np vector with the standard deviation of the data 204 | dimensions_to_ignore: list of dimensions not used in the model 205 | dimensions_to_use: list of dimensions used in the model 206 | """ 207 | if not dim in [2,3]: 208 | raise(ValueError, 'dim must be 2 or 3') 209 | 210 | data_mean = np.mean(complete_data, axis=0) 211 | data_std = np.std(complete_data, axis=0) 212 | 213 | # Encodes which 17 (or 14) 2d-3d pairs we are predicting 214 | dimensions_to_ignore = [] 215 | if dim == 2: 216 | dimensions_to_use = np.where(np.array([x != '' and x != 'Neck/Nose' for x in H36M_NAMES]))[0] 217 | dimensions_to_use = np.sort( np.hstack( (dimensions_to_use*2, dimensions_to_use*2+1))) 218 | dimensions_to_ignore = np.delete( np.arange(len(H36M_NAMES)*2), dimensions_to_use ) 219 | else: # dim == 3 220 | dimensions_to_use = np.where(np.array([x != '' for x in H36M_NAMES]))[0] 221 | dimensions_to_use = np.delete( dimensions_to_use, [0,7,9] if predict_14 else 0 ) 222 | 223 | dimensions_to_use = np.sort( np.hstack( (dimensions_to_use*3, 224 | dimensions_to_use*3+1, 225 | dimensions_to_use*3+2))) 226 | dimensions_to_ignore = np.delete( np.arange(len(H36M_NAMES)*3), dimensions_to_use ) 227 | 228 | return data_mean, data_std, dimensions_to_ignore, dimensions_to_use 229 | 230 | 231 | def transform_world_to_camera(poses_set, cams, ncams=4 ): 232 | """ 233 | Project 3d poses from world coordinate to camera coordinate system 234 | Args 235 | poses_set: dictionary with 3d poses 236 | cams: dictionary with cameras 237 | ncams: number of cameras per subject 238 | Return: 239 | t3d_camera: dictionary with 3d poses in camera coordinate 240 | """ 241 | t3d_camera = {} 242 | for t3dk in sorted( poses_set.keys() ): 243 | 244 | subj, action, seqname = t3dk 245 | t3d_world = poses_set[ t3dk ] 246 | 247 | for c in range( ncams ): 248 | 
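      # Note: the tuple unpacking below reuses the name `c` for the camera centre,
      # shadowing the loop variable; this is harmless only because the for statement
      # rebinds `c` from range(ncams) at the top of each iteration.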
R, T, f, c, k, p, name = cams[ (subj, c+1) ] 249 | camera_coord = cameras.world_to_camera_frame( np.reshape(t3d_world, [-1, 3]), R, T) 250 | camera_coord = np.reshape( camera_coord, [-1, len(H36M_NAMES)*3] ) 251 | 252 | sname = seqname[:-3]+"."+name+".h5" # e.g.: Waiting 1.58860488.h5 253 | t3d_camera[ (subj, action, sname) ] = camera_coord 254 | 255 | return t3d_camera 256 | 257 | 258 | def normalize_data(data, data_mean, data_std, dim_to_use ): 259 | """ 260 | Normalizes a dictionary of poses 261 | 262 | Args 263 | data: dictionary where values are 264 | data_mean: np vector with the mean of the data 265 | data_std: np vector with the standard deviation of the data 266 | dim_to_use: list of dimensions to keep in the data 267 | Returns 268 | data_out: dictionary with same keys as data, but values have been normalized 269 | """ 270 | data_out = {} 271 | 272 | for key in data.keys(): 273 | data[ key ] = data[ key ][ :, dim_to_use ] 274 | mu = data_mean[dim_to_use] 275 | stddev = data_std[dim_to_use] 276 | data_out[ key ] = np.divide( (data[key] - mu), stddev ) 277 | 278 | return data_out 279 | 280 | 281 | def unNormalizeData(normalized_data, data_mean, data_std, dimensions_to_ignore): 282 | """ 283 | Un-normalizes a matrix whose mean has been substracted and that has been divided by 284 | standard deviation. Some dimensions might also be missing 285 | 286 | Args 287 | normalized_data: nxd matrix to unnormalize 288 | data_mean: np vector with the mean of the data 289 | data_std: np vector with the standard deviation of the data 290 | dimensions_to_ignore: list of dimensions that were removed from the original data 291 | Returns 292 | orig_data: the input normalized_data, but unnormalized 293 | """ 294 | T = normalized_data.shape[0] # Batch size 295 | D = data_mean.shape[0] # Dimensionality 296 | 297 | orig_data = np.zeros((T, D), dtype=np.float32) 298 | dimensions_to_use = np.array([dim for dim in range(D) 299 | if dim not in dimensions_to_ignore]) 300 | 301 | orig_data[:, dimensions_to_use] = normalized_data 302 | 303 | # Multiply times stdev and add the mean 304 | stdMat = data_std.reshape((1, D)) 305 | stdMat = np.repeat(stdMat, T, axis=0) 306 | meanMat = data_mean.reshape((1, D)) 307 | meanMat = np.repeat(meanMat, T, axis=0) 308 | orig_data = np.multiply(orig_data, stdMat) + meanMat 309 | return orig_data 310 | 311 | 312 | def define_actions( action ): 313 | """ 314 | Given an action string, returns a list of corresponding actions. 315 | 316 | Args 317 | action: String. either "all" or one of the h36m actions 318 | Returns 319 | actions: List of strings. Actions to use. 
320 | Raises 321 | ValueError: if the action is not a valid action in Human 3.6M 322 | """ 323 | actions = ["Directions","Discussion","Eating","Greeting", 324 | "Phoning","Photo","Posing","Purchases", 325 | "Sitting","SittingDown","Smoking","Waiting", 326 | "WalkDog","Walking","WalkTogether"] 327 | 328 | if action == "All" or action == "all": 329 | return actions 330 | 331 | if not action in actions: 332 | raise( ValueError, "Unrecognized action: %s" % action ) 333 | 334 | return [action] 335 | 336 | 337 | def project_to_cameras( poses_set, cams, ncams=4 ): 338 | """ 339 | Project 3d poses using camera parameters 340 | 341 | Args 342 | poses_set: dictionary with 3d poses 343 | cams: dictionary with camera parameters 344 | ncams: number of cameras per subject 345 | Returns 346 | t2d: dictionary with 2d poses 347 | """ 348 | t2d = {} 349 | 350 | for t3dk in sorted( poses_set.keys() ): 351 | subj, a, seqname = t3dk 352 | t3d = poses_set[ t3dk ] 353 | 354 | for cam in range( ncams ): 355 | R, T, f, c, k, p, name = cams[ (subj, cam+1) ] 356 | pts2d, _, _, _, _ = cameras.project_point_radial( np.reshape(t3d, [-1, 3]), R, T, f, c, k, p ) 357 | 358 | pts2d = np.reshape( pts2d, [-1, len(H36M_NAMES)*2] ) 359 | sname = seqname[:-3]+"."+name+".h5" # e.g.: Waiting 1.58860488.h5 360 | t2d[ (subj, a, sname) ] = pts2d 361 | 362 | return t2d 363 | 364 | 365 | def read_2d_predictions( actions, data_dir ): 366 | """ 367 | Loads 2d data from precomputed Stacked Hourglass detections 368 | 369 | Args 370 | actions: list of strings. Actions to load 371 | data_dir: string. Directory where the data can be loaded from 372 | Returns 373 | train_set: dictionary with loaded 2d stacked hourglass detections for training 374 | test_set: dictionary with loaded 2d stacked hourglass detections for testing 375 | data_mean: vector with the mean of the 2d training data 376 | data_std: vector with the standard deviation of the 2d training data 377 | dim_to_ignore: list with the dimensions to not predict 378 | dim_to_use: list with the dimensions to predict 379 | """ 380 | 381 | train_set = load_stacked_hourglass( data_dir, TRAIN_SUBJECTS, actions) 382 | test_set = load_stacked_hourglass( data_dir, TEST_SUBJECTS, actions) 383 | 384 | complete_train = copy.deepcopy( np.vstack( train_set.values() )) 385 | data_mean, data_std, dim_to_ignore, dim_to_use = normalization_stats( complete_train, dim=2 ) 386 | 387 | train_set = normalize_data( train_set, data_mean, data_std, dim_to_use ) 388 | test_set = normalize_data( test_set, data_mean, data_std, dim_to_use ) 389 | 390 | return train_set, test_set, data_mean, data_std, dim_to_ignore, dim_to_use 391 | 392 | 393 | def create_2d_data( actions, data_dir, rcams ): 394 | """ 395 | Creates 2d poses by projecting 3d poses with the corresponding camera 396 | parameters. Also normalizes the 2d poses 397 | 398 | Args 399 | actions: list of strings. Actions to load 400 | data_dir: string. 
Directory where the data can be loaded from 401 | rcams: dictionary with camera parameters 402 | Returns 403 | train_set: dictionary with projected 2d poses for training 404 | test_set: dictionary with projected 2d poses for testing 405 | data_mean: vector with the mean of the 2d training data 406 | data_std: vector with the standard deviation of the 2d training data 407 | dim_to_ignore: list with the dimensions to not predict 408 | dim_to_use: list with the dimensions to predict 409 | """ 410 | 411 | # Load 3d data 412 | train_set = load_data( data_dir, TRAIN_SUBJECTS, actions, dim=3 ) 413 | test_set = load_data( data_dir, TEST_SUBJECTS, actions, dim=3 ) 414 | 415 | train_set = project_to_cameras( train_set, rcams ) 416 | test_set = project_to_cameras( test_set, rcams ) 417 | 418 | # Compute normalization statistics. 419 | complete_train = copy.deepcopy( np.vstack( train_set.values() )) 420 | data_mean, data_std, dim_to_ignore, dim_to_use = normalization_stats( complete_train, dim=2 ) 421 | 422 | # Divide every dimension independently 423 | train_set = normalize_data( train_set, data_mean, data_std, dim_to_use ) 424 | test_set = normalize_data( test_set, data_mean, data_std, dim_to_use ) 425 | 426 | return train_set, test_set, data_mean, data_std, dim_to_ignore, dim_to_use 427 | 428 | 429 | def read_3d_data( actions, data_dir, camera_frame, rcams, predict_14=False ): 430 | """ 431 | Loads 3d poses, zero-centres and normalizes them 432 | 433 | Args 434 | actions: list of strings. Actions to load 435 | data_dir: string. Directory where the data can be loaded from 436 | camera_frame: boolean. Whether to convert the data to camera coordinates 437 | rcams: dictionary with camera parameters 438 | predict_14: boolean. Whether to predict only 14 joints 439 | Returns 440 | train_set: dictionary with loaded 3d poses for training 441 | test_set: dictionary with loaded 3d poses for testing 442 | data_mean: vector with the mean of the 3d training data 443 | data_std: vector with the standard deviation of the 3d training data 444 | dim_to_ignore: list with the dimensions to not predict 445 | dim_to_use: list with the dimensions to predict 446 | train_root_positions: dictionary with the 3d positions of the root in train 447 | test_root_positions: dictionary with the 3d positions of the root in test 448 | """ 449 | # Load 3d data 450 | train_set = load_data( data_dir, TRAIN_SUBJECTS, actions, dim=3 ) 451 | test_set = load_data( data_dir, TEST_SUBJECTS, actions, dim=3 ) 452 | 453 | if camera_frame: 454 | train_set = transform_world_to_camera( train_set, rcams ) 455 | test_set = transform_world_to_camera( test_set, rcams ) 456 | 457 | # Apply 3d post-processing (centering around root) 458 | train_set, train_root_positions = postprocess_3d( train_set ) 459 | test_set, test_root_positions = postprocess_3d( test_set ) 460 | 461 | # Compute normalization statistics 462 | complete_train = copy.deepcopy( np.vstack( train_set.values() )) 463 | data_mean, data_std, dim_to_ignore, dim_to_use = normalization_stats( complete_train, dim=3, predict_14=predict_14 ) 464 | 465 | # Divide every dimension independently 466 | train_set = normalize_data( train_set, data_mean, data_std, dim_to_use ) 467 | test_set = normalize_data( test_set, data_mean, data_std, dim_to_use ) 468 | 469 | return train_set, test_set, data_mean, data_std, dim_to_ignore, dim_to_use, train_root_positions, test_root_positions 470 | 471 | 472 | def postprocess_3d( poses_set ): 473 | """ 474 | Center 3d points around root 475 | 476 | Args 477 | 
poses_set: dictionary with 3d data 478 | Returns 479 | poses_set: dictionary with 3d data centred around root (center hip) joint 480 | root_positions: dictionary with the original 3d position of each pose 481 | """ 482 | root_positions = {} 483 | for k in poses_set.keys(): 484 | # Keep track of the global position 485 | root_positions[k] = copy.deepcopy(poses_set[k][:,:3]) 486 | 487 | # Remove the root from the 3d position 488 | poses = poses_set[k] 489 | poses = poses - np.tile( poses[:,:3], [1, len(H36M_NAMES)] ) 490 | poses_set[k] = poses 491 | 492 | return poses_set, root_positions 493 | -------------------------------------------------------------------------------- /src/linear_model.py: -------------------------------------------------------------------------------- 1 | 2 | """Simple model to regress 3d human poses from 2d joint locations""" 3 | 4 | from __future__ import absolute_import 5 | from __future__ import division 6 | from __future__ import print_function 7 | 8 | from tensorflow.python.ops import variable_scope as vs 9 | 10 | import os 11 | import numpy as np 12 | from six.moves import xrange # pylint: disable=redefined-builtin 13 | import tensorflow as tf 14 | import data_utils 15 | import cameras as cam 16 | 17 | def kaiming(shape, dtype, partition_info=None): 18 | """Kaiming initialization as described in https://arxiv.org/pdf/1502.01852.pdf 19 | 20 | Args 21 | shape: dimensions of the tf array to initialize 22 | dtype: data type of the array 23 | partition_info: (Optional) info about how the variable is partitioned. 24 | See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/init_ops.py#L26 25 | Needed to be used as an initializer. 26 | Returns 27 | Tensorflow array with initial weights 28 | """ 29 | return(tf.truncated_normal(shape, dtype=dtype)*tf.sqrt(2/float(shape[0]))) 30 | 31 | class LinearModel(object): 32 | """ A simple Linear+RELU model """ 33 | 34 | def __init__(self, 35 | linear_size, 36 | num_layers, 37 | residual, 38 | batch_norm, 39 | max_norm, 40 | batch_size, 41 | learning_rate, 42 | summaries_dir, 43 | predict_14=False, 44 | dtype=tf.float32): 45 | """Creates the linear + relu model 46 | 47 | Args 48 | linear_size: integer. number of units in each layer of the model 49 | num_layers: integer. number of bilinear blocks in the model 50 | residual: boolean. Whether to add residual connections 51 | batch_norm: boolean. Whether to use batch normalization 52 | max_norm: boolean. Whether to clip weights to a norm of 1 53 | batch_size: integer. The size of the batches used during training 54 | learning_rate: float. Learning rate to start with 55 | summaries_dir: String. Directory where to log progress 56 | predict_14: boolean. Whether to predict 14 instead of 17 joints 57 | dtype: the data type to use to store internal variables 58 | """ 59 | 60 | # There are in total 17 joints in H3.6M and 16 in MPII (and therefore in stacked 61 | # hourglass detections). We settled with 16 joints in 2d just to make models 62 | # compatible (e.g. you can train on ground truth 2d and test on SH detections). 63 | # This does not seem to have an effect on prediction performance. 64 | self.HUMAN_2D_SIZE = 16 * 2 65 | 66 | # In 3d all the predictions are zero-centered around the root (hip) joint, so 67 | # we actually predict only 16 joints. The error is still computed over 17 joints, 68 | # because if one uses, e.g. Procrustes alignment, there is still error in the 69 | # hip to account for! 
70 | # There is also an option to predict only 14 joints, which makes our results 71 | # directly comparable to those in https://arxiv.org/pdf/1611.09010.pdf 72 | self.HUMAN_3D_SIZE = 14 * 3 if predict_14 else 16 * 3 73 | 74 | self.input_size = self.HUMAN_2D_SIZE 75 | self.output_size = self.HUMAN_3D_SIZE 76 | 77 | self.isTraining = tf.placeholder(tf.bool,name="isTrainingflag") 78 | self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob") 79 | 80 | # Summary writers for train and test runs 81 | self.train_writer = tf.summary.FileWriter( os.path.join(summaries_dir, 'train' )) 82 | self.test_writer = tf.summary.FileWriter( os.path.join(summaries_dir, 'test' )) 83 | 84 | self.linear_size = linear_size 85 | self.batch_size = batch_size 86 | self.learning_rate = tf.Variable( float(learning_rate), trainable=False, dtype=dtype, name="learning_rate") 87 | self.global_step = tf.Variable(0, trainable=False, name="global_step") 88 | decay_steps = 100000 # empirical 89 | decay_rate = 0.96 # empirical 90 | self.learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step, decay_steps, decay_rate) 91 | 92 | # === Transform the inputs === 93 | with vs.variable_scope("inputs"): 94 | 95 | # in=2d poses, out=3d poses 96 | enc_in = tf.placeholder(dtype, shape=[None, self.input_size], name="enc_in") 97 | dec_out = tf.placeholder(dtype, shape=[None, self.output_size], name="dec_out") 98 | 99 | self.encoder_inputs = enc_in 100 | self.decoder_outputs = dec_out 101 | 102 | # === Create the linear + relu combos === 103 | with vs.variable_scope( "linear_model" ): 104 | 105 | # === First layer, brings dimensionality up to linear_size === 106 | w1 = tf.get_variable( name="w1", initializer=kaiming, shape=[self.HUMAN_2D_SIZE, linear_size], dtype=dtype ) 107 | b1 = tf.get_variable( name="b1", initializer=kaiming, shape=[linear_size], dtype=dtype ) 108 | w1 = tf.clip_by_norm(w1,1) if max_norm else w1 109 | y3 = tf.matmul( enc_in, w1 ) + b1 110 | 111 | if batch_norm: 112 | y3 = tf.layers.batch_normalization(y3,training=self.isTraining, name="batch_normalization") 113 | y3 = tf.nn.relu( y3 ) 114 | y3 = tf.nn.dropout( y3, self.dropout_keep_prob ) 115 | 116 | # === Create multiple bi-linear layers === 117 | for idx in range( num_layers ): 118 | y3 = self.two_linear( y3, linear_size, residual, self.dropout_keep_prob, max_norm, batch_norm, dtype, idx ) 119 | 120 | # === Last linear layer has HUMAN_3D_SIZE in output === 121 | w4 = tf.get_variable( name="w4", initializer=kaiming, shape=[linear_size, self.HUMAN_3D_SIZE], dtype=dtype ) 122 | b4 = tf.get_variable( name="b4", initializer=kaiming, shape=[self.HUMAN_3D_SIZE], dtype=dtype ) 123 | w4 = tf.clip_by_norm(w4,1) if max_norm else w4 124 | y = tf.matmul(y3, w4) + b4 125 | # === End linear model === 126 | 127 | # Store the outputs here 128 | self.outputs = y 129 | self.loss = tf.reduce_mean(tf.square(y - dec_out)) 130 | self.loss_summary = tf.summary.scalar('loss/loss', self.loss) 131 | 132 | # To keep track of the loss in mm 133 | self.err_mm = tf.placeholder( tf.float32, name="error_mm" ) 134 | self.err_mm_summary = tf.summary.scalar( "loss/error_mm", self.err_mm ) 135 | 136 | # Gradients and update operation for training the model. 
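    # The apply_gradients op below is wrapped in tf.control_dependencies(update_ops)
    # so that the batch-norm moving averages (collected under tf.GraphKeys.UPDATE_OPS)
    # are refreshed on every training step.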
137 | opt = tf.train.AdamOptimizer( self.learning_rate ) 138 | update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 139 | 140 | with tf.control_dependencies(update_ops): 141 | 142 | # Update all the trainable parameters 143 | gradients = opt.compute_gradients(self.loss) 144 | self.gradients = [[] if i==None else i for i in gradients] 145 | self.updates = opt.apply_gradients(gradients, global_step=self.global_step) 146 | 147 | # Keep track of the learning rate 148 | self.learning_rate_summary = tf.summary.scalar('learning_rate/learning_rate', self.learning_rate) 149 | 150 | # To save the model 151 | self.saver = tf.train.Saver( tf.global_variables(), max_to_keep=10 ) 152 | 153 | 154 | def two_linear( self, xin, linear_size, residual, dropout_keep_prob, max_norm, batch_norm, dtype, idx ): 155 | """ 156 | Make a bi-linear block with optional residual connection 157 | 158 | Args 159 | xin: the batch that enters the block 160 | linear_size: integer. The size of the linear units 161 | residual: boolean. Whether to add a residual connection 162 | dropout_keep_prob: float [0,1]. Probability of dropping something out 163 | max_norm: boolean. Whether to clip weights to 1-norm 164 | batch_norm: boolean. Whether to do batch normalization 165 | dtype: type of the weigths. Usually tf.float32 166 | idx: integer. Number of layer (for naming/scoping) 167 | Returns 168 | y: the batch after it leaves the block 169 | """ 170 | 171 | with vs.variable_scope( "two_linear_"+str(idx) ) as scope: 172 | 173 | input_size = int(xin.get_shape()[1]) 174 | 175 | # Linear 1 176 | w2 = tf.get_variable( name="w2_"+str(idx), initializer=kaiming, shape=[input_size, linear_size], dtype=dtype) 177 | b2 = tf.get_variable( name="b2_"+str(idx), initializer=kaiming, shape=[linear_size], dtype=dtype) 178 | w2 = tf.clip_by_norm(w2,1) if max_norm else w2 179 | y = tf.matmul(xin, w2) + b2 180 | if batch_norm: 181 | y = tf.layers.batch_normalization(y,training=self.isTraining,name="batch_normalization1"+str(idx)) 182 | 183 | y = tf.nn.relu( y ) 184 | y = tf.nn.dropout( y, dropout_keep_prob ) 185 | 186 | # Linear 2 187 | w3 = tf.get_variable( name="w3_"+str(idx), initializer=kaiming, shape=[linear_size, linear_size], dtype=dtype) 188 | b3 = tf.get_variable( name="b3_"+str(idx), initializer=kaiming, shape=[linear_size], dtype=dtype) 189 | w3 = tf.clip_by_norm(w3,1) if max_norm else w3 190 | y = tf.matmul(y, w3) + b3 191 | 192 | if batch_norm: 193 | y = tf.layers.batch_normalization(y,training=self.isTraining,name="batch_normalization2"+str(idx)) 194 | 195 | y = tf.nn.relu( y ) 196 | y = tf.nn.dropout( y, dropout_keep_prob ) 197 | 198 | # Residual every 2 blocks 199 | y = (xin + y) if residual else y 200 | 201 | return y 202 | 203 | def step(self, session, encoder_inputs, decoder_outputs, dropout_keep_prob, isTraining=True): 204 | """Run a step of the model feeding the given inputs. 
205 | 206 | Args 207 | session: tensorflow session to use 208 | encoder_inputs: list of numpy vectors to feed as encoder inputs 209 | decoder_outputs: list of numpy vectors that are the expected decoder outputs 210 | dropout_keep_prob: (0,1] dropout keep probability 211 | isTraining: whether to do the backward step or only forward 212 | 213 | Returns 214 | if isTraining is True, a 4-tuple 215 | loss: the computed loss of this batch 216 | loss_summary: tf summary of this batch loss, to log on tensorboard 217 | learning_rate_summary: tf summary of learnign rate to log on tensorboard 218 | outputs: predicted 3d poses 219 | if isTraining is False, a 3-tuple 220 | (loss, loss_summary, outputs) same as above 221 | """ 222 | 223 | input_feed = {self.encoder_inputs: encoder_inputs, 224 | self.decoder_outputs: decoder_outputs, 225 | self.isTraining: isTraining, 226 | self.dropout_keep_prob: dropout_keep_prob} 227 | 228 | # Output feed: depends on whether we do a backward step or not. 229 | if isTraining: 230 | output_feed = [self.updates, # Update Op that does SGD 231 | self.loss, 232 | self.loss_summary, 233 | self.learning_rate_summary, 234 | self.outputs] 235 | 236 | outputs = session.run( output_feed, input_feed ) 237 | return outputs[1], outputs[2], outputs[3], outputs[4] 238 | 239 | else: 240 | output_feed = [self.loss, # Loss for this batch. 241 | self.loss_summary, 242 | self.outputs] 243 | 244 | outputs = session.run(output_feed, input_feed) 245 | return outputs[0], outputs[1], outputs[2] # No gradient norm 246 | 247 | def get_all_batches( self, data_x, data_y, camera_frame, training=True ): 248 | """ 249 | Obtain a list of all the batches, randomly permutted 250 | Args 251 | data_x: dictionary with 2d inputs 252 | data_y: dictionary with 3d expected outputs 253 | camera_frame: whether the 3d data is in camera coordinates 254 | training: True if this is a training batch. False otherwise. 
255 | 256 | Returns 257 | encoder_inputs: list of 2d batches 258 | decoder_outputs: list of 3d batches 259 | """ 260 | 261 | # Figure out how many frames we have 262 | n = 0 263 | for key2d in data_x.keys(): 264 | n2d, _ = data_x[ key2d ].shape 265 | n = n + n2d 266 | 267 | encoder_inputs = np.zeros((n, self.input_size), dtype=float) 268 | decoder_outputs = np.zeros((n, self.output_size), dtype=float) 269 | 270 | # Put all the data into big arrays 271 | idx = 0 272 | for key2d in data_x.keys(): 273 | (subj, b, fname) = key2d 274 | # keys should be the same if 3d is in camera coordinates 275 | key3d = key2d if (camera_frame) else (subj, b, '{0}.h5'.format(fname.split('.')[0])) 276 | key3d = (subj, b, fname[:-3]) if fname.endswith('-sh') and camera_frame else key3d 277 | 278 | n2d, _ = data_x[ key2d ].shape 279 | encoder_inputs[idx:idx+n2d, :] = data_x[ key2d ] 280 | decoder_outputs[idx:idx+n2d, :] = data_y[ key3d ] 281 | idx = idx + n2d 282 | 283 | 284 | if training: 285 | # Randomly permute everything 286 | idx = np.random.permutation( n ) 287 | encoder_inputs = encoder_inputs[idx, :] 288 | decoder_outputs = decoder_outputs[idx, :] 289 | 290 | # Make the number of examples a multiple of the batch size 291 | n_extra = n % self.batch_size 292 | if n_extra > 0: # Otherwise examples are already a multiple of batch size 293 | encoder_inputs = encoder_inputs[:-n_extra, :] 294 | decoder_outputs = decoder_outputs[:-n_extra, :] 295 | 296 | n_batches = n // self.batch_size 297 | encoder_inputs = np.split( encoder_inputs, n_batches ) 298 | decoder_outputs = np.split( decoder_outputs, n_batches ) 299 | 300 | return encoder_inputs, decoder_outputs 301 | -------------------------------------------------------------------------------- /src/predict_3dpose.py: -------------------------------------------------------------------------------- 1 | 2 | """Predicting 3d poses from 2d joints""" 3 | 4 | from __future__ import absolute_import 5 | from __future__ import division 6 | from __future__ import print_function 7 | 8 | import math 9 | import os 10 | import random 11 | import sys 12 | import time 13 | import h5py 14 | import copy 15 | 16 | import matplotlib.pyplot as plt 17 | import numpy as np 18 | from six.moves import xrange # pylint: disable=redefined-builtin 19 | import tensorflow as tf 20 | import procrustes 21 | 22 | import viz 23 | import cameras 24 | import data_utils 25 | import linear_model 26 | 27 | tf.app.flags.DEFINE_float("learning_rate", 1e-3, "Learning rate") 28 | tf.app.flags.DEFINE_float("dropout", 1, "Dropout keep probability. 1 means no dropout") 29 | tf.app.flags.DEFINE_integer("batch_size", 64, "Batch size to use during training") 30 | tf.app.flags.DEFINE_integer("epochs", 200, "How many epochs we should train for") 31 | tf.app.flags.DEFINE_boolean("camera_frame", False, "Convert 3d poses to camera coordinates") 32 | tf.app.flags.DEFINE_boolean("max_norm", False, "Apply maxnorm constraint to the weights") 33 | tf.app.flags.DEFINE_boolean("batch_norm", False, "Use batch_normalization") 34 | 35 | # Data loading 36 | tf.app.flags.DEFINE_boolean("predict_14", False, "predict 14 joints") 37 | tf.app.flags.DEFINE_boolean("use_sh", False, "Use 2d pose predictions from StackedHourglass") 38 | tf.app.flags.DEFINE_string("action","All", "The action to train on. 
'All' means all the actions") 39 | 40 | # Architecture 41 | tf.app.flags.DEFINE_integer("linear_size", 1024, "Size of each model layer.") 42 | tf.app.flags.DEFINE_integer("num_layers", 2, "Number of layers in the model.") 43 | tf.app.flags.DEFINE_boolean("residual", False, "Whether to add a residual connection every 2 layers") 44 | 45 | # Evaluation 46 | tf.app.flags.DEFINE_boolean("procrustes", False, "Apply procrustes analysis at test time") 47 | tf.app.flags.DEFINE_boolean("evaluateActionWise",False, "The dataset to use either h36m or heva") 48 | 49 | # Directories 50 | tf.app.flags.DEFINE_string("cameras_path","data/h36m/cameras.h5","Directory to load camera parameters") 51 | tf.app.flags.DEFINE_string("data_dir", "data/h36m/", "Data directory") 52 | tf.app.flags.DEFINE_string("train_dir", "experiments", "Training directory.") 53 | 54 | # Train or load 55 | tf.app.flags.DEFINE_boolean("sample", False, "Set to True for sampling.") 56 | tf.app.flags.DEFINE_boolean("use_cpu", False, "Whether to use the CPU") 57 | tf.app.flags.DEFINE_integer("load", 0, "Try to load a previous checkpoint.") 58 | 59 | # Misc 60 | tf.app.flags.DEFINE_boolean("use_fp16", False, "Train using fp16 instead of fp32.") 61 | 62 | FLAGS = tf.app.flags.FLAGS 63 | 64 | train_dir = os.path.join( FLAGS.train_dir, 65 | FLAGS.action, 66 | 'dropout_{0}'.format(FLAGS.dropout), 67 | 'epochs_{0}'.format(FLAGS.epochs) if FLAGS.epochs > 0 else '', 68 | 'lr_{0}'.format(FLAGS.learning_rate), 69 | 'residual' if FLAGS.residual else 'not_residual', 70 | 'depth_{0}'.format(FLAGS.num_layers), 71 | 'linear_size{0}'.format(FLAGS.linear_size), 72 | 'batch_size_{0}'.format(FLAGS.batch_size), 73 | 'procrustes' if FLAGS.procrustes else 'no_procrustes', 74 | 'maxnorm' if FLAGS.max_norm else 'no_maxnorm', 75 | 'batch_normalization' if FLAGS.batch_norm else 'no_batch_normalization', 76 | 'use_stacked_hourglass' if FLAGS.use_sh else 'not_stacked_hourglass', 77 | 'predict_14' if FLAGS.predict_14 else 'predict_17') 78 | 79 | print( train_dir ) 80 | summaries_dir = os.path.join( train_dir, "log" ) # Directory for TB summaries 81 | 82 | # To avoid race conditions: https://github.com/tensorflow/tensorflow/issues/7448 83 | os.system('mkdir -p {}'.format(summaries_dir)) 84 | 85 | def create_model( session, actions, batch_size ): 86 | """ 87 | Create model and initialize it or load its parameters in a session 88 | 89 | Args 90 | session: tensorflow session 91 | actions: list of string. Actions to train/test on 92 | batch_size: integer. Number of examples in each batch 93 | Returns 94 | model: The created (or loaded) model 95 | Raises 96 | ValueError if asked to load a model, but the checkpoint specified by 97 | FLAGS.load cannot be found. 
98 | """ 99 | 100 | model = linear_model.LinearModel( 101 | FLAGS.linear_size, 102 | FLAGS.num_layers, 103 | FLAGS.residual, 104 | FLAGS.batch_norm, 105 | FLAGS.max_norm, 106 | batch_size, 107 | FLAGS.learning_rate, 108 | summaries_dir, 109 | FLAGS.predict_14, 110 | dtype=tf.float16 if FLAGS.use_fp16 else tf.float32) 111 | 112 | if FLAGS.load <= 0: 113 | # Create a new model from scratch 114 | print("Creating model with fresh parameters.") 115 | session.run( tf.global_variables_initializer() ) 116 | return model 117 | 118 | # Load a previously saved model 119 | ckpt = tf.train.get_checkpoint_state( train_dir, latest_filename="checkpoint") 120 | print( "train_dir", train_dir ) 121 | 122 | if ckpt and ckpt.model_checkpoint_path: 123 | # Check if the specific checkpoint exists 124 | if FLAGS.load > 0: 125 | if os.path.isfile(os.path.join(train_dir,"checkpoint-{0}.index".format(FLAGS.load))): 126 | ckpt_name = os.path.join( os.path.join(train_dir,"checkpoint-{0}".format(FLAGS.load)) ) 127 | else: 128 | raise ValueError("Asked to load checkpoint {0}, but it does not seem to exist".format(FLAGS.load)) 129 | else: 130 | ckpt_name = os.path.basename( ckpt.model_checkpoint_path ) 131 | 132 | print("Loading model {0}".format( ckpt_name )) 133 | model.saver.restore( session, ckpt.model_checkpoint_path ) 134 | return model 135 | else: 136 | print("Could not find checkpoint. Aborting.") 137 | raise( ValueError, "Checkpoint {0} does not seem to exist".format( ckpt.model_checkpoint_path ) ) 138 | 139 | return model 140 | 141 | def train(): 142 | """Train a linear model for 3d pose estimation""" 143 | 144 | actions = data_utils.define_actions( FLAGS.action ) 145 | 146 | number_of_actions = len( actions ) 147 | 148 | # Load camera parameters 149 | SUBJECT_IDS = [1,5,6,7,8,9,11] 150 | rcams = cameras.load_cameras(FLAGS.cameras_path, SUBJECT_IDS) 151 | 152 | # Load 3d data and load (or create) 2d projections 153 | train_set_3d, test_set_3d, data_mean_3d, data_std_3d, dim_to_ignore_3d, dim_to_use_3d, train_root_positions, test_root_positions = data_utils.read_3d_data( 154 | actions, FLAGS.data_dir, FLAGS.camera_frame, rcams, FLAGS.predict_14 ) 155 | 156 | # Read stacked hourglass 2D predictions if use_sh, otherwise use groundtruth 2D projections 157 | if FLAGS.use_sh: 158 | train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.read_2d_predictions(actions, FLAGS.data_dir) 159 | else: 160 | train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.create_2d_data( actions, FLAGS.data_dir, rcams ) 161 | print( "done reading and normalizing data." ) 162 | 163 | # Avoid using the GPU if requested 164 | device_count = {"GPU": 0} if FLAGS.use_cpu else {"GPU": 1} 165 | with tf.Session(config=tf.ConfigProto( 166 | device_count=device_count, 167 | allow_soft_placement=True )) as sess: 168 | 169 | # === Create the model === 170 | print("Creating %d bi-layers of %d units." 
% (FLAGS.num_layers, FLAGS.linear_size)) 171 | model = create_model( sess, actions, FLAGS.batch_size ) 172 | model.train_writer.add_graph( sess.graph ) 173 | print("Model created") 174 | 175 | #=== This is the training loop === 176 | step_time, loss, val_loss = 0.0, 0.0, 0.0 177 | current_step = 0 if FLAGS.load <= 0 else FLAGS.load + 1 178 | previous_losses = [] 179 | 180 | step_time, loss = 0, 0 181 | current_epoch = 0 182 | log_every_n_batches = 100 183 | 184 | for _ in xrange( FLAGS.epochs ): 185 | current_epoch = current_epoch + 1 186 | 187 | # === Load training batches for one epoch === 188 | encoder_inputs, decoder_outputs = model.get_all_batches( train_set_2d, train_set_3d, FLAGS.camera_frame, training=True ) 189 | nbatches = len( encoder_inputs ) 190 | print("There are {0} train batches".format( nbatches )) 191 | start_time, loss = time.time(), 0. 192 | 193 | # === Loop through all the training batches === 194 | for i in range( nbatches ): 195 | 196 | if (i+1) % log_every_n_batches == 0: 197 | # Print progress every log_every_n_batches batches 198 | print("Working on epoch {0}, batch {1} / {2}... ".format( current_epoch, i+1, nbatches), end="" ) 199 | 200 | enc_in, dec_out = encoder_inputs[i], decoder_outputs[i] 201 | step_loss, loss_summary, lr_summary, _ = model.step( sess, enc_in, dec_out, FLAGS.dropout, isTraining=True ) 202 | 203 | if (i+1) % log_every_n_batches == 0: 204 | # Log and print progress every log_every_n_batches batches 205 | model.train_writer.add_summary( loss_summary, current_step ) 206 | model.train_writer.add_summary( lr_summary, current_step ) 207 | step_time = (time.time() - start_time) 208 | start_time = time.time() 209 | print("done in {0:.2f} ms".format( 1000*step_time / log_every_n_batches ) ) 210 | 211 | loss += step_loss 212 | current_step += 1 213 | # === end looping through training batches === 214 | 215 | loss = loss / nbatches 216 | print("=============================\n" 217 | "Global step: %d\n" 218 | "Learning rate: %.2e\n" 219 | "Train loss avg: %.4f\n" 220 | "=============================" % (model.global_step.eval(), 221 | model.learning_rate.eval(), loss) ) 222 | # === End training for an epoch === 223 | 224 | # === Testing after this epoch === 225 | isTraining = False 226 | 227 | if FLAGS.evaluateActionWise: 228 | 229 | print("{0:=^12} {1:=^6}".format("Action", "mm")) # line of 30 equal signs 230 | 231 | cum_err = 0 232 | for action in actions: 233 | 234 | print("{0:<12} ".format(action), end="") 235 | # Get 2d and 3d testing data for this action 236 | action_test_set_2d = get_action_subset( test_set_2d, action ) 237 | action_test_set_3d = get_action_subset( test_set_3d, action ) 238 | encoder_inputs, decoder_outputs = model.get_all_batches( action_test_set_2d, action_test_set_3d, FLAGS.camera_frame, training=False) 239 | 240 | act_err, _, step_time, loss = evaluate_batches( sess, model, 241 | data_mean_3d, data_std_3d, dim_to_use_3d, dim_to_ignore_3d, 242 | data_mean_2d, data_std_2d, dim_to_use_2d, dim_to_ignore_2d, 243 | current_step, encoder_inputs, decoder_outputs ) 244 | cum_err = cum_err + act_err 245 | 246 | print("{0:>6.2f}".format(act_err)) 247 | 248 | summaries = sess.run( model.err_mm_summary, {model.err_mm: float(cum_err/float(len(actions)))} ) 249 | model.test_writer.add_summary( summaries, current_step ) 250 | print("{0:<12} {1:>6.2f}".format("Average", cum_err/float(len(actions) ))) 251 | print("{0:=^19}".format('')) 252 | 253 | else: 254 | 255 | n_joints = 17 if not(FLAGS.predict_14) else 14 256 | encoder_inputs, 
decoder_outputs = model.get_all_batches( test_set_2d, test_set_3d, FLAGS.camera_frame, training=False) 257 | 258 | total_err, joint_err, step_time, loss = evaluate_batches( sess, model, 259 | data_mean_3d, data_std_3d, dim_to_use_3d, dim_to_ignore_3d, 260 | data_mean_2d, data_std_2d, dim_to_use_2d, dim_to_ignore_2d, 261 | current_step, encoder_inputs, decoder_outputs, current_epoch ) 262 | 263 | print("=============================\n" 264 | "Step-time (ms): %.4f\n" 265 | "Val loss avg: %.4f\n" 266 | "Val error avg (mm): %.2f\n" 267 | "=============================" % ( 1000*step_time, loss, total_err )) 268 | 269 | for i in range(n_joints): 270 | # 6 spaces, right-aligned, 5 decimal places 271 | print("Error in joint {0:02d} (mm): {1:>5.2f}".format(i+1, joint_err[i])) 272 | print("=============================") 273 | 274 | # Log the error to tensorboard 275 | summaries = sess.run( model.err_mm_summary, {model.err_mm: total_err} ) 276 | model.test_writer.add_summary( summaries, current_step ) 277 | 278 | # Save the model 279 | print( "Saving the model... ", end="" ) 280 | start_time = time.time() 281 | model.saver.save(sess, os.path.join(train_dir, 'checkpoint'), global_step=current_step ) 282 | print( "done in {0:.2f} ms".format(1000*(time.time() - start_time)) ) 283 | 284 | # Reset global time and loss 285 | step_time, loss = 0, 0 286 | 287 | sys.stdout.flush() 288 | 289 | 290 | def get_action_subset( poses_set, action ): 291 | """ 292 | Given a preloaded dictionary of poses, load the subset of a particular action 293 | 294 | Args 295 | poses_set: dictionary with keys k=(subject, action, seqname), 296 | values v=(nxd matrix of poses) 297 | action: string. The action that we want to filter out 298 | Returns 299 | poses_subset: dictionary with same structure as poses_set, but only with the 300 | specified action. 301 | """ 302 | return {k:v for k, v in poses_set.items() if k[1] == action} 303 | 304 | 305 | def evaluate_batches( sess, model, 306 | data_mean_3d, data_std_3d, dim_to_use_3d, dim_to_ignore_3d, 307 | data_mean_2d, data_std_2d, dim_to_use_2d, dim_to_ignore_2d, 308 | current_step, encoder_inputs, decoder_outputs, current_epoch=0 ): 309 | """ 310 | Generic method that evaluates performance of a list of batches. 311 | May be used to evaluate all actions or a single action. 312 | 313 | Args 314 | sess 315 | model 316 | data_mean_3d 317 | data_std_3d 318 | dim_to_use_3d 319 | dim_to_ignore_3d 320 | data_mean_2d 321 | data_std_2d 322 | dim_to_use_2d 323 | dim_to_ignore_2d 324 | current_step 325 | encoder_inputs 326 | decoder_outputs 327 | current_epoch 328 | Returns 329 | 330 | total_err 331 | joint_err 332 | step_time 333 | loss 334 | """ 335 | 336 | n_joints = 17 if not(FLAGS.predict_14) else 14 337 | nbatches = len( encoder_inputs ) 338 | 339 | # Loop through test examples 340 | all_dists, start_time, loss = [], time.time(), 0. 
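  # all_dists accumulates one (batch_size x n_joints) array of per-joint L2 errors
  # (in mm) per batch; the arrays are stacked and averaged after the loop.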
341 | log_every_n_batches = 100 342 | for i in range(nbatches): 343 | 344 | if current_epoch > 0 and (i+1) % log_every_n_batches == 0: 345 | print("Working on test epoch {0}, batch {1} / {2}".format( current_epoch, i+1, nbatches) ) 346 | 347 | enc_in, dec_out = encoder_inputs[i], decoder_outputs[i] 348 | dp = 1.0 # dropout keep probability is always 1 at test time 349 | step_loss, loss_summary, poses3d = model.step( sess, enc_in, dec_out, dp, isTraining=False ) 350 | loss += step_loss 351 | 352 | # denormalize 353 | enc_in = data_utils.unNormalizeData( enc_in, data_mean_2d, data_std_2d, dim_to_ignore_2d ) 354 | dec_out = data_utils.unNormalizeData( dec_out, data_mean_3d, data_std_3d, dim_to_ignore_3d ) 355 | poses3d = data_utils.unNormalizeData( poses3d, data_mean_3d, data_std_3d, dim_to_ignore_3d ) 356 | 357 | # Keep only the relevant dimensions 358 | dtu3d = np.hstack( (np.arange(3), dim_to_use_3d) ) if not(FLAGS.predict_14) else dim_to_use_3d 359 | 360 | dec_out = dec_out[:, dtu3d] 361 | poses3d = poses3d[:, dtu3d] 362 | 363 | assert dec_out.shape[0] == FLAGS.batch_size 364 | assert poses3d.shape[0] == FLAGS.batch_size 365 | 366 | if FLAGS.procrustes: 367 | # Apply per-frame procrustes alignment if asked to do so 368 | for j in range(FLAGS.batch_size): 369 | gt = np.reshape(dec_out[j,:],[-1,3]) 370 | out = np.reshape(poses3d[j,:],[-1,3]) 371 | _, Z, T, b, c = procrustes.compute_similarity_transform(gt,out,compute_optimal_scale=True) 372 | out = (b*out.dot(T))+c 373 | 374 | poses3d[j,:] = np.reshape(out,[-1,17*3] ) if not(FLAGS.predict_14) else np.reshape(out,[-1,14*3] ) 375 | 376 | # Compute Euclidean distance error per joint 377 | sqerr = (poses3d - dec_out)**2 # Squared error between prediction and expected output 378 | dists = np.zeros( (sqerr.shape[0], n_joints) ) # Array with L2 error per joint in mm 379 | dist_idx = 0 380 | for k in np.arange(0, n_joints*3, 3): 381 | # Sum across X,Y, and Z dimenstions to obtain L2 distance 382 | dists[:,dist_idx] = np.sqrt( np.sum( sqerr[:, k:k+3], axis=1 )) 383 | dist_idx = dist_idx + 1 384 | 385 | all_dists.append(dists) 386 | assert sqerr.shape[0] == FLAGS.batch_size 387 | 388 | step_time = (time.time() - start_time) / nbatches 389 | loss = loss / nbatches 390 | 391 | all_dists = np.vstack( all_dists ) 392 | 393 | # Error per joint and total for all passed batches 394 | joint_err = np.mean( all_dists, axis=0 ) 395 | total_err = np.mean( all_dists ) 396 | 397 | return total_err, joint_err, step_time, loss 398 | 399 | 400 | def sample(): 401 | """Get samples from a model and visualize them""" 402 | 403 | actions = data_utils.define_actions( FLAGS.action ) 404 | 405 | # Load camera parameters 406 | SUBJECT_IDS = [1,5,6,7,8,9,11] 407 | rcams = cameras.load_cameras(FLAGS.cameras_path, SUBJECT_IDS) 408 | 409 | # Load 3d data and load (or create) 2d projections 410 | train_set_3d, test_set_3d, data_mean_3d, data_std_3d, dim_to_ignore_3d, dim_to_use_3d, train_root_positions, test_root_positions = data_utils.read_3d_data( 411 | actions, FLAGS.data_dir, FLAGS.camera_frame, rcams, FLAGS.predict_14 ) 412 | 413 | if FLAGS.use_sh: 414 | train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.read_2d_predictions(actions, FLAGS.data_dir) 415 | else: 416 | train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.create_2d_data( actions, FLAGS.data_dir, rcams ) 417 | print( "done reading and normalizing data." 
) 418 | 419 | device_count = {"GPU": 0} if FLAGS.use_cpu else {"GPU": 1} 420 | with tf.Session(config=tf.ConfigProto( device_count = device_count )) as sess: 421 | # === Create the model === 422 | print("Creating %d layers of %d units." % (FLAGS.num_layers, FLAGS.linear_size)) 423 | batch_size = 128 424 | model = create_model(sess, actions, batch_size) 425 | print("Model loaded") 426 | 427 | for key2d in test_set_2d.keys(): 428 | 429 | (subj, b, fname) = key2d 430 | print( "Subject: {}, action: {}, fname: {}".format(subj, b, fname) ) 431 | 432 | # keys should be the same if 3d is in camera coordinates 433 | key3d = key2d if FLAGS.camera_frame else (subj, b, '{0}.h5'.format(fname.split('.')[0])) 434 | key3d = (subj, b, fname[:-3]) if (fname.endswith('-sh')) and FLAGS.camera_frame else key3d 435 | 436 | enc_in = test_set_2d[ key2d ] 437 | n2d, _ = enc_in.shape 438 | dec_out = test_set_3d[ key3d ] 439 | n3d, _ = dec_out.shape 440 | assert n2d == n3d 441 | 442 | # Split into about-same-size batches 443 | enc_in = np.array_split( enc_in, n2d // batch_size ) 444 | dec_out = np.array_split( dec_out, n3d // batch_size ) 445 | all_poses_3d = [] 446 | 447 | for bidx in range( len(enc_in) ): 448 | 449 | # Dropout probability 0 (keep probability 1) for sampling 450 | dp = 1.0 451 | _, _, poses3d = model.step(sess, enc_in[bidx], dec_out[bidx], dp, isTraining=False) 452 | 453 | # denormalize 454 | enc_in[bidx] = data_utils.unNormalizeData( enc_in[bidx], data_mean_2d, data_std_2d, dim_to_ignore_2d ) 455 | dec_out[bidx] = data_utils.unNormalizeData( dec_out[bidx], data_mean_3d, data_std_3d, dim_to_ignore_3d ) 456 | poses3d = data_utils.unNormalizeData( poses3d, data_mean_3d, data_std_3d, dim_to_ignore_3d ) 457 | all_poses_3d.append( poses3d ) 458 | 459 | # Put all the poses together 460 | enc_in, dec_out, poses3d = map( np.vstack, [enc_in, dec_out, all_poses_3d] ) 461 | 462 | # Convert back to world coordinates 463 | if FLAGS.camera_frame: 464 | N_CAMERAS = 4 465 | N_JOINTS_H36M = 32 466 | 467 | # Add global position back 468 | dec_out = dec_out + np.tile( test_root_positions[ key3d ], [1,N_JOINTS_H36M] ) 469 | 470 | # Load the appropriate camera 471 | subj, _, sname = key3d 472 | 473 | cname = sname.split('.')[1] # <-- camera name 474 | scams = {(subj,c+1): rcams[(subj,c+1)] for c in range(N_CAMERAS)} # cams of this subject 475 | scam_idx = [scams[(subj,c+1)][-1] for c in range(N_CAMERAS)].index( cname ) # index of camera used 476 | the_cam = scams[(subj, scam_idx+1)] # <-- the camera used 477 | R, T, f, c, k, p, name = the_cam 478 | assert name == cname 479 | 480 | def cam2world_centered(data_3d_camframe): 481 | data_3d_worldframe = cameras.camera_to_world_frame(data_3d_camframe.reshape((-1, 3)), R, T) 482 | data_3d_worldframe = data_3d_worldframe.reshape((-1, N_JOINTS_H36M*3)) 483 | # subtract root translation 484 | return data_3d_worldframe - np.tile( data_3d_worldframe[:,:3], (1,N_JOINTS_H36M) ) 485 | 486 | # Apply inverse rotation and translation 487 | dec_out = cam2world_centered(dec_out) 488 | poses3d = cam2world_centered(poses3d) 489 | 490 | # Grab a random batch to visualize 491 | enc_in, dec_out, poses3d = map( np.vstack, [enc_in, dec_out, poses3d] ) 492 | idx = np.random.permutation( enc_in.shape[0] ) 493 | enc_in, dec_out, poses3d = enc_in[idx, :], dec_out[idx, :], poses3d[idx, :] 494 | 495 | # Visualize random samples 496 | import matplotlib.gridspec as gridspec 497 | 498 | # 1080p = 1,920 x 1,080 499 | fig = plt.figure( figsize=(19.2, 10.8) ) 500 | 501 | gs1 = gridspec.GridSpec(5, 9) # 5 
rows, 9 columns 502 | gs1.update(wspace=-0.00, hspace=0.05) # set the spacing between axes. 503 | plt.axis('off') 504 | 505 | subplot_idx, exidx = 1, 1 506 | nsamples = 15 507 | for i in np.arange( nsamples ): 508 | 509 | # Plot 2d pose 510 | ax1 = plt.subplot(gs1[subplot_idx-1]) 511 | p2d = enc_in[exidx,:] 512 | viz.show2Dpose( p2d, ax1 ) 513 | ax1.invert_yaxis() 514 | 515 | # Plot 3d gt 516 | ax2 = plt.subplot(gs1[subplot_idx], projection='3d') 517 | p3d = dec_out[exidx,:] 518 | viz.show3Dpose( p3d, ax2 ) 519 | 520 | # Plot 3d predictions 521 | ax3 = plt.subplot(gs1[subplot_idx+1], projection='3d') 522 | p3d = poses3d[exidx,:] 523 | viz.show3Dpose( p3d, ax3, lcolor="#9b59b6", rcolor="#2ecc71" ) 524 | 525 | exidx = exidx + 1 526 | subplot_idx = subplot_idx + 3 527 | 528 | plt.show() 529 | 530 | def main(_): 531 | if FLAGS.sample: 532 | sample() 533 | else: 534 | train() 535 | 536 | if __name__ == "__main__": 537 | tf.app.run() 538 | -------------------------------------------------------------------------------- /src/procrustes.py: -------------------------------------------------------------------------------- 1 | 2 | def compute_similarity_transform(X, Y, compute_optimal_scale=False): 3 | """ 4 | A port of MATLAB's `procrustes` function to Numpy. 5 | Adapted from http://stackoverflow.com/a/18927641/1884420 6 | 7 | Args 8 | X: array NxM of targets, with N number of points and M point dimensionality 9 | Y: array NxM of inputs 10 | compute_optimal_scale: whether we compute optimal scale or force it to be 1 11 | 12 | Returns: 13 | d: squared error after transformation 14 | Z: transformed Y 15 | T: computed rotation 16 | b: scaling 17 | c: translation 18 | """ 19 | import numpy as np 20 | 21 | muX = X.mean(0) 22 | muY = Y.mean(0) 23 | 24 | X0 = X - muX 25 | Y0 = Y - muY 26 | 27 | ssX = (X0**2.).sum() 28 | ssY = (Y0**2.).sum() 29 | 30 | # centred Frobenius norm 31 | normX = np.sqrt(ssX) 32 | normY = np.sqrt(ssY) 33 | 34 | # scale to equal (unit) norm 35 | X0 = X0 / normX 36 | Y0 = Y0 / normY 37 | 38 | # optimum rotation matrix of Y 39 | A = np.dot(X0.T, Y0) 40 | U,s,Vt = np.linalg.svd(A,full_matrices=False) 41 | V = Vt.T 42 | T = np.dot(V, U.T) 43 | 44 | # Make sure we have a rotation 45 | detT = np.linalg.det(T) 46 | V[:,-1] *= np.sign( detT ) 47 | s[-1] *= np.sign( detT ) 48 | T = np.dot(V, U.T) 49 | 50 | traceTA = s.sum() 51 | 52 | if compute_optimal_scale: # Compute optimum scaling of Y. 53 | b = traceTA * normX / normY 54 | d = 1 - traceTA**2 55 | Z = normX*traceTA*np.dot(Y0, T) + muX 56 | else: # If no scaling allowed 57 | b = 1 58 | d = 1 + ssY/ssX - 2 * traceTA * normY / normX 59 | Z = normY*np.dot(Y0, T) + muX 60 | 61 | c = muX - b*np.dot(muY, T) 62 | 63 | return d, Z, T, b, c 64 | -------------------------------------------------------------------------------- /src/viz.py: -------------------------------------------------------------------------------- 1 | 2 | """Functions to visualize human poses""" 3 | 4 | import matplotlib.pyplot as plt 5 | import data_utils 6 | import numpy as np 7 | import h5py 8 | import os 9 | from mpl_toolkits.mplot3d import Axes3D 10 | 11 | def show3Dpose(channels, ax, lcolor="#3498db", rcolor="#e74c3c", add_labels=False): # blue, orange 12 | """ 13 | Visualize a 3d skeleton 14 | 15 | Args 16 | channels: 96x1 vector. The pose to plot. 17 | ax: matplotlib 3d axis to draw on 18 | lcolor: color for left part of the body 19 | rcolor: color for right part of the body 20 | add_labels: whether to add coordinate labels 21 | Returns 22 | Nothing. Draws on ax. 
23 | """ 24 | 25 | assert channels.size == len(data_utils.H36M_NAMES)*3, "channels should have 96 entries, it has %d instead" % channels.size 26 | vals = np.reshape( channels, (len(data_utils.H36M_NAMES), -1) ) 27 | 28 | I = np.array([1,2,3,1,7,8,1, 13,14,15,14,18,19,14,26,27])-1 # start points 29 | J = np.array([2,3,4,7,8,9,13,14,15,16,18,19,20,26,27,28])-1 # end points 30 | LR = np.array([1,1,1,0,0,0,0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=bool) 31 | 32 | # Make connection matrix 33 | for i in np.arange( len(I) ): 34 | x, y, z = [np.array( [vals[I[i], j], vals[J[i], j]] ) for j in range(3)] 35 | ax.plot(x, y, z, lw=2, c=lcolor if LR[i] else rcolor) 36 | 37 | RADIUS = 750 # space around the subject 38 | xroot, yroot, zroot = vals[0,0], vals[0,1], vals[0,2] 39 | ax.set_xlim3d([-RADIUS+xroot, RADIUS+xroot]) 40 | ax.set_zlim3d([-RADIUS+zroot, RADIUS+zroot]) 41 | ax.set_ylim3d([-RADIUS+yroot, RADIUS+yroot]) 42 | 43 | if add_labels: 44 | ax.set_xlabel("x") 45 | ax.set_ylabel("y") 46 | ax.set_zlabel("z") 47 | 48 | # Get rid of the ticks and tick labels 49 | ax.set_xticks([]) 50 | ax.set_yticks([]) 51 | ax.set_zticks([]) 52 | 53 | ax.get_xaxis().set_ticklabels([]) 54 | ax.get_yaxis().set_ticklabels([]) 55 | ax.set_zticklabels([]) 56 | ax.set_aspect('equal') 57 | 58 | # Get rid of the panes (actually, make them white) 59 | white = (1.0, 1.0, 1.0, 0.0) 60 | ax.w_xaxis.set_pane_color(white) 61 | ax.w_yaxis.set_pane_color(white) 62 | # Keep z pane 63 | 64 | # Get rid of the lines in 3d 65 | ax.w_xaxis.line.set_color(white) 66 | ax.w_yaxis.line.set_color(white) 67 | ax.w_zaxis.line.set_color(white) 68 | 69 | def show2Dpose(channels, ax, lcolor="#3498db", rcolor="#e74c3c", add_labels=False): 70 | """ 71 | Visualize a 2d skeleton 72 | 73 | Args 74 | channels: 64x1 vector. The pose to plot. 75 | ax: matplotlib axis to draw on 76 | lcolor: color for left part of the body 77 | rcolor: color for right part of the body 78 | add_labels: whether to add coordinate labels 79 | Returns 80 | Nothing. Draws on ax. 81 | """ 82 | 83 | assert channels.size == len(data_utils.H36M_NAMES)*2, "channels should have 64 entries, it has %d instead" % channels.size 84 | vals = np.reshape( channels, (len(data_utils.H36M_NAMES), -1) ) 85 | 86 | I = np.array([1,2,3,1,7,8,1, 13,14,14,18,19,14,26,27])-1 # start points 87 | J = np.array([2,3,4,7,8,9,13,14,16,18,19,20,26,27,28])-1 # end points 88 | LR = np.array([1,1,1,0,0,0,0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=bool) 89 | 90 | # Make connection matrix 91 | for i in np.arange( len(I) ): 92 | x, y = [np.array( [vals[I[i], j], vals[J[i], j]] ) for j in range(2)] 93 | ax.plot(x, y, lw=2, c=lcolor if LR[i] else rcolor) 94 | 95 | # Get rid of the ticks 96 | ax.set_xticks([]) 97 | ax.set_yticks([]) 98 | 99 | # Get rid of tick labels 100 | ax.get_xaxis().set_ticklabels([]) 101 | ax.get_yaxis().set_ticklabels([]) 102 | 103 | RADIUS = 350 # space around the subject 104 | xroot, yroot = vals[0,0], vals[0,1] 105 | ax.set_xlim([-RADIUS+xroot, RADIUS+xroot]) 106 | ax.set_ylim([-RADIUS+yroot, RADIUS+yroot]) 107 | if add_labels: 108 | ax.set_xlabel("x") 109 | ax.set_ylabel("z") 110 | 111 | ax.set_aspect('equal') 112 | --------------------------------------------------------------------------------