├── ISSUE_TEMPLATE.md ├── LICENCE ├── README.md ├── imgs └── viz_example.png └── src ├── cameras.py ├── data_utils.py ├── linear_model.py ├── predict_3dpose.py ├── procrustes.py └── viz.py /ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Thanks for your interest in our research! 2 | 3 | If you have problems running our code, please include 4 | 5 | 1. Your operating system 6 | 2. Your tensorflow version 7 | 3. Your python version 8 | 4. The stack trace of the error that you see 9 | 10 | This is **research code**, and its primary purpose is to reproduce the scientific results that you see in our paper. 11 | If you are looking for ways to extend our work such as 12 | 13 | * handling arbitrary images 14 | * handling videos as input 15 | * using other 2d detectors 16 | * large-scale deployment 17 | * handling missing data 18 | * you want to make a startup out of this code 19 | 20 | then pull requests are welcome, but it is very unlikely we will have the time to help you. 21 | -------------------------------------------------------------------------------- /LICENCE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 Julieta Martinez, Rayat Hossain, Javier Romero 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Overview 2 | 3 | This is the code for [this](https://youtu.be/EMjPqgLX14A) video on Youtube by Siraj Raval. 4 | 5 | ## 3d-pose-baseline 6 | 7 | This is the code for the paper 8 | 9 | Julieta Martinez, Rayat Hossain, Javier Romero, James J. Little. 10 | _A simple yet effective baseline for 3d human pose estimation._ 11 | In ICCV, 2017. https://arxiv.org/pdf/1705.03098.pdf. 12 | 13 | The code in this repository was mostly written by 14 | [Julieta Martinez](https://github.com/una-dinosauria), 15 | [Rayat Hossain](https://github.com/rayat137) and 16 | [Javier Romero](https://github.com/libicocco). 17 | 18 | We provide a strong baseline for 3d human pose estimation that also sheds light 19 | on the challenges of current approaches. Our model is lightweight and we strive 20 | to make our code transparent, compact, and easy-to-understand. 
### Dependencies

* [h5py](http://www.h5py.org/)
* [tensorflow](https://www.tensorflow.org/) 1.0 or later

### First of all
1. Watch our video: https://youtu.be/Hmi3Pd9x1BE
2. Clone this repository and get the data. We provide the [Human3.6M](http://vision.imar.ro/human3.6m/description.php) dataset in 3d points, camera parameters to produce ground truth 2d detections, and [Stacked Hourglass](https://github.com/anewell/pose-hg-demo) detections.

```bash
git clone https://github.com/una-dinosauria/3d-pose-baseline.git
cd 3d-pose-baseline
mkdir data
cd data
wget https://www.dropbox.com/s/e35qv3n6zlkouki/h36m.zip
unzip h36m.zip
rm h36m.zip
cd ..
```

### Quick demo

For a quick demo, you can train for one epoch and visualize the results. To train, run

`python src/predict_3dpose.py --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_sh --epochs 1`

This should take less than 5 minutes to complete on a GTX 1080, and give you around 75 mm of error on the test set.

Now, to visualize the results, simply run

`python src/predict_3dpose.py --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_sh --epochs 1 --sample --load 24371`

This will produce a visualization similar to this:

![Visualization example](/imgs/viz_example.png?raw=1)

### Training

To train a model with clean 2d detections, run:

`python src/predict_3dpose.py --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise`

This corresponds to Table 2, bottom row: `Ours (GT detections) (MA)`.

To train on Stacked Hourglass detections, run

`python src/predict_3dpose.py --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_sh`

This corresponds to Table 2, next-to-last row: `Ours (SH detections) (MA)`.

On a GTX 1080 GPU, a batch of 64 takes less than 8 ms for a forward+backward pass, and less than 6 ms for a forward-only pass.

### Pre-trained model

We also provide a model pre-trained on Stacked Hourglass detections, available through [Google Drive](https://drive.google.com/file/d/0BxWzojlLp259MF9qSFpiVjl0cU0/view?usp=sharing).

To test the model, decompress the file at the top level of this project, and call

`python src/predict_3dpose.py --camera_frame --residual --batch_norm --dropout 0.5 --max_norm --evaluateActionWise --use_sh --epochs 200 --sample --load 4874200`

### Citing

If you use our code, please cite our work

```
@inproceedings{martinez_2017_3dbaseline,
  title={A simple yet effective baseline for 3d human pose estimation},
  author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
  booktitle={ICCV},
  year={2017}
}
```

### License
MIT

## Credits

Credits for this code go to [una-dinosauria](https://github.com/una-dinosauria/3d-pose-baseline). I've merely created a wrapper to get people started.
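For readers who want to see the model rather than the command-line flags, below is a minimal NumPy sketch of the forward pass that `src/linear_model.py` builds with the default settings used above (two residual blocks of width 1024, 16 2d joints in, 16 root-centred 3d joints out). The weights here are random placeholders and batch normalization / dropout are omitted, so this is only an illustration of the architecture, not the TensorFlow code the repository actually runs.

```python
# Illustrative sketch only: random weights stand in for trained ones, and
# batch norm / dropout (identity at inference time) are left out.
import numpy as np

HUMAN_2D_SIZE = 16 * 2   # 16 joints x (x, y)
HUMAN_3D_SIZE = 16 * 3   # 16 joints x (x, y, z), zero-centred at the hip
LINEAR_SIZE = 1024       # width of every hidden layer

rng = np.random.RandomState(42)

def dense(x, n_out):
    """A dense layer with Kaiming-scaled random weights (placeholder for a trained layer)."""
    w = rng.randn(x.shape[1], n_out) * np.sqrt(2.0 / x.shape[1])
    return x @ w + np.zeros(n_out)

def forward(pose_2d):
    """Map (batch, 32) normalized 2d joints to (batch, 48) normalized 3d joints."""
    y = np.maximum(dense(pose_2d, LINEAR_SIZE), 0.0)      # first projection + ReLU
    for _ in range(2):                                    # two "bi-linear" residual blocks
        h = np.maximum(dense(y, LINEAR_SIZE), 0.0)
        h = np.maximum(dense(h, LINEAR_SIZE), 0.0)
        y = y + h                                         # residual connection
    return dense(y, HUMAN_3D_SIZE)                        # final linear layer

print(forward(rng.randn(4, HUMAN_2D_SIZE)).shape)         # (4, 48)
```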
103 | -------------------------------------------------------------------------------- /imgs/viz_example.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/llSourcell/3D_Pose_Estimation/87c9d77e3bb0c1105eae74046c26f8b7f101ca45/imgs/viz_example.png -------------------------------------------------------------------------------- /src/cameras.py: -------------------------------------------------------------------------------- 1 | 2 | """Utilities to deal with the cameras of human3.6m""" 3 | 4 | from __future__ import division 5 | 6 | import h5py 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | import matplotlib.image as mpimg 10 | import data_utils 11 | import viz 12 | 13 | def project_point_radial( P, R, T, f, c, k, p ): 14 | """ 15 | Project points from 3d to 2d using camera parameters 16 | including radial and tangential distortion 17 | 18 | Args 19 | P: Nx3 points in world coordinates 20 | R: 3x3 Camera rotation matrix 21 | T: 3x1 Camera translation parameters 22 | f: (scalar) Camera focal length 23 | c: 2x1 Camera center 24 | k: 3x1 Camera radial distortion coefficients 25 | p: 2x1 Camera tangential distortion coefficients 26 | Returns 27 | Proj: Nx2 points in pixel space 28 | D: 1xN depth of each point in camera space 29 | radial: 1xN radial distortion per point 30 | tan: 1xN tangential distortion per point 31 | r2: 1xN squared radius of the projected points before distortion 32 | """ 33 | 34 | # P is a matrix of 3-dimensional points 35 | assert len(P.shape) == 2 36 | assert P.shape[1] == 3 37 | 38 | N = P.shape[0] 39 | X = R.dot( P.T - T ) # rotate and translate 40 | XX = X[:2,:] / X[2,:] 41 | r2 = XX[0,:]**2 + XX[1,:]**2 42 | 43 | radial = 1 + np.einsum( 'ij,ij->j', np.tile(k,(1, N)), np.array([r2, r2**2, r2**3]) ); 44 | tan = p[0]*XX[1,:] + p[1]*XX[0,:] 45 | 46 | XXX = XX * np.tile(radial+tan,(2,1)) + np.outer(np.array([p[1], p[0]]).reshape(-1), r2 ) 47 | 48 | Proj = (f * XXX) + c 49 | Proj = Proj.T 50 | 51 | D = X[2,] 52 | 53 | return Proj, D, radial, tan, r2 54 | 55 | def world_to_camera_frame(P, R, T): 56 | """ 57 | Convert points from world to camera coordinates 58 | 59 | Args 60 | P: Nx3 3d points in world coordinates 61 | R: 3x3 Camera rotation matrix 62 | T: 3x1 Camera translation parameters 63 | Returns 64 | X_cam: Nx3 3d points in camera coordinates 65 | """ 66 | 67 | assert len(P.shape) == 2 68 | assert P.shape[1] == 3 69 | 70 | X_cam = R.dot( P.T - T ) # rotate and translate 71 | 72 | return X_cam.T 73 | 74 | def camera_to_world_frame(P, R, T): 75 | """Inverse of world_to_camera_frame 76 | 77 | Args 78 | P: Nx3 points in camera coordinates 79 | R: 3x3 Camera rotation matrix 80 | T: 3x1 Camera translation parameters 81 | Returns 82 | X_cam: Nx3 points in world coordinates 83 | """ 84 | 85 | assert len(P.shape) == 2 86 | assert P.shape[1] == 3 87 | 88 | X_cam = R.T.dot( P.T ) + T # rotate and translate 89 | 90 | return X_cam.T 91 | 92 | def load_camera_params( hf, path ): 93 | """Load h36m camera parameters 94 | 95 | Args 96 | hf: hdf5 open file with h36m cameras data 97 | path: path or key inside hf to the camera we are interested in 98 | Returns 99 | R: 3x3 Camera rotation matrix 100 | T: 3x1 Camera translation parameters 101 | f: (scalar) Camera focal length 102 | c: 2x1 Camera center 103 | k: 3x1 Camera radial distortion coefficients 104 | p: 2x1 Camera tangential distortion coefficients 105 | name: String with camera id 106 | """ 107 | 108 | R = hf[ path.format('R') ][:] 109 | R = R.T 110 
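  # The rotation appears to be stored transposed in the h5 file (likely a row- vs
  # column-major artifact of how the cameras were exported); after this transpose,
  # R is the world-to-camera rotation expected by project_point_radial and
  # world_to_camera_frame above.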
| 111 | T = hf[ path.format('T') ][:] 112 | f = hf[ path.format('f') ][:] 113 | c = hf[ path.format('c') ][:] 114 | k = hf[ path.format('k') ][:] 115 | p = hf[ path.format('p') ][:] 116 | 117 | name = hf[ path.format('Name') ][:] 118 | name = "".join( [chr(item) for item in name] ) 119 | 120 | return R, T, f, c, k, p, name 121 | 122 | def load_cameras( bpath='cameras.h5', subjects=[1,5,6,7,8,9,11] ): 123 | """Loads the cameras of h36m 124 | 125 | Args 126 | bpath: path to hdf5 file with h36m camera data 127 | subjects: List of ints representing the subject IDs for which cameras are requested 128 | Returns 129 | rcams: dictionary of 4 tuples per subject ID containing its camera parameters for the 4 h36m cams 130 | """ 131 | rcams = {} 132 | 133 | with h5py.File(bpath,'r') as hf: 134 | for s in subjects: 135 | for c in range(4): # There are 4 cameras in human3.6m 136 | rcams[(s, c+1)] = load_camera_params(hf, 'subject%d/camera%d/{0}' % (s,c+1) ) 137 | 138 | return rcams 139 | -------------------------------------------------------------------------------- /src/data_utils.py: -------------------------------------------------------------------------------- 1 | 2 | """Utility functions for dealing with human3.6m data.""" 3 | 4 | from __future__ import division 5 | 6 | import os 7 | import numpy as np 8 | import matplotlib.pyplot as plt 9 | from mpl_toolkits.mplot3d import Axes3D 10 | import cameras 11 | import viz 12 | import h5py 13 | import glob 14 | import copy 15 | 16 | # Human3.6m IDs for training and testing 17 | TRAIN_SUBJECTS = [1,5,6,7,8] 18 | TEST_SUBJECTS = [9,11] 19 | 20 | # Joints in H3.6M -- data has 32 joints, but only 17 that move; these are the indices. 21 | H36M_NAMES = ['']*32 22 | H36M_NAMES[0] = 'Hip' 23 | H36M_NAMES[1] = 'RHip' 24 | H36M_NAMES[2] = 'RKnee' 25 | H36M_NAMES[3] = 'RFoot' 26 | H36M_NAMES[6] = 'LHip' 27 | H36M_NAMES[7] = 'LKnee' 28 | H36M_NAMES[8] = 'LFoot' 29 | H36M_NAMES[12] = 'Spine' 30 | H36M_NAMES[13] = 'Thorax' 31 | H36M_NAMES[14] = 'Neck/Nose' 32 | H36M_NAMES[15] = 'Head' 33 | H36M_NAMES[17] = 'LShoulder' 34 | H36M_NAMES[18] = 'LElbow' 35 | H36M_NAMES[19] = 'LWrist' 36 | H36M_NAMES[25] = 'RShoulder' 37 | H36M_NAMES[26] = 'RElbow' 38 | H36M_NAMES[27] = 'RWrist' 39 | 40 | # Stacked Hourglass produces 16 joints. These are the names. 41 | SH_NAMES = ['']*16 42 | SH_NAMES[0] = 'RFoot' 43 | SH_NAMES[1] = 'RKnee' 44 | SH_NAMES[2] = 'RHip' 45 | SH_NAMES[3] = 'LHip' 46 | SH_NAMES[4] = 'LKnee' 47 | SH_NAMES[5] = 'LFoot' 48 | SH_NAMES[6] = 'Hip' 49 | SH_NAMES[7] = 'Spine' 50 | SH_NAMES[8] = 'Thorax' 51 | SH_NAMES[9] = 'Head' 52 | SH_NAMES[10] = 'RWrist' 53 | SH_NAMES[11] = 'RElbow' 54 | SH_NAMES[12] = 'RShoulder' 55 | SH_NAMES[13] = 'LShoulder' 56 | SH_NAMES[14] = 'LElbow' 57 | SH_NAMES[15] = 'LWrist' 58 | 59 | def load_data( bpath, subjects, actions, dim=3 ): 60 | """ 61 | Loads 2d ground truth from disk, and puts it in an easy-to-acess dictionary 62 | 63 | Args 64 | bpath: String. Path where to load the data from 65 | subjects: List of integers. Subjects whose data will be loaded 66 | actions: List of strings. The actions to load 67 | dim: Integer={2,3}. 
Load 2 or 3-dimensional data 68 | Returns: 69 | data: Dictionary with keys k=(subject, action, seqname) 70 | values v=(nx(32*2) matrix of 2d ground truth) 71 | There will be 2 entries per subject/action if loading 3d data 72 | There will be 8 entries per subject/action if loading 2d data 73 | """ 74 | 75 | if not dim in [2,3]: 76 | raise(ValueError, 'dim must be 2 or 3') 77 | 78 | data = {} 79 | 80 | for subj in subjects: 81 | for action in actions: 82 | 83 | print('Reading subject {0}, action {1}'.format(subj, action)) 84 | 85 | dpath = os.path.join( bpath, 'S{0}'.format(subj), 'MyPoses/{0}D_positions'.format(dim), '{0}*.h5'.format(action) ) 86 | print( dpath ) 87 | 88 | fnames = glob.glob( dpath ) 89 | 90 | loaded_seqs = 0 91 | for fname in fnames: 92 | seqname = os.path.basename( fname ) 93 | 94 | # This rule makes sure SittingDown is not loaded when Sitting is requested 95 | if action == "Sitting" and seqname.startswith( "SittingDown" ): 96 | continue 97 | 98 | # This rule makes sure that WalkDog and WalkTogeter are not loaded when 99 | # Walking is requested. 100 | if seqname.startswith( action ): 101 | print( fname ) 102 | loaded_seqs = loaded_seqs + 1 103 | 104 | with h5py.File( fname, 'r' ) as h5f: 105 | poses = h5f['{0}D_positions'.format(dim)][:] 106 | 107 | poses = poses.T 108 | data[ (subj, action, seqname) ] = poses 109 | 110 | if dim == 2: 111 | assert loaded_seqs == 8, "Expecting 8 sequences, found {0} instead".format( loaded_seqs ) 112 | else: 113 | assert loaded_seqs == 2, "Expecting 2 sequences, found {0} instead".format( loaded_seqs ) 114 | 115 | return data 116 | 117 | 118 | def load_stacked_hourglass(data_dir, subjects, actions): 119 | """ 120 | Load 2d detections from disk, and put it in an easy-to-acess dictionary. 121 | 122 | Args 123 | data_dir: string. Directory where to load the data from, 124 | subjects: list of integers. Subjects whose data will be loaded. 125 | actions: list of strings. The actions to load. 126 | Returns 127 | data: dictionary with keys k=(subject, action, seqname) 128 | values v=(nx(32*2) matrix of 2d stacked hourglass detections) 129 | There will be 2 entries per subject/action if loading 3d data 130 | There will be 8 entries per subject/action if loading 2d data 131 | """ 132 | # Permutation that goes from SH detections to H36M ordering. 133 | SH_TO_GT_PERM = np.array([SH_NAMES.index( h ) for h in H36M_NAMES if h != '' and h in SH_NAMES]) 134 | assert np.all( SH_TO_GT_PERM == np.array([6,2,1,0,3,4,5,7,8,9,13,14,15,12,11,10]) ) 135 | 136 | data = {} 137 | 138 | for subj in subjects: 139 | for action in actions: 140 | 141 | print('Reading subject {0}, action {1}'.format(subj, action)) 142 | 143 | dpath = os.path.join( data_dir, 'S{0}'.format(subj), 'StackedHourglass/{0}*.h5'.format(action) ) 144 | print( dpath ) 145 | 146 | fnames = glob.glob( dpath ) 147 | 148 | loaded_seqs = 0 149 | for fname in fnames: 150 | seqname = os.path.basename( fname ) 151 | seqname = seqname.replace('_',' ') 152 | 153 | # This rule makes sure SittingDown is not loaded when Sitting is requested 154 | if action == "Sitting" and seqname.startswith( "SittingDown" ): 155 | continue 156 | 157 | # This rule makes sure that WalkDog and WalkTogeter are not loaded when 158 | # Walking is requested. 
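      # e.g. with action == "Walking", "Walking 1.h5" passes the prefix test below,
      # while "WalkDog 1.h5" and "WalkTogether 1.h5" do not.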
159 | if seqname.startswith( action ): 160 | print( fname ) 161 | loaded_seqs = loaded_seqs + 1 162 | 163 | # Load the poses from the .h5 file 164 | with h5py.File( fname, 'r' ) as h5f: 165 | poses = h5f['poses'][:] 166 | 167 | # Permute the loaded data to make it compatible with H36M 168 | poses = poses[:,SH_TO_GT_PERM,:] 169 | 170 | # Reshape into n x (32*2) matrix 171 | poses = np.reshape(poses,[poses.shape[0], -1]) 172 | poses_final = np.zeros([poses.shape[0], len(H36M_NAMES)*2]) 173 | 174 | dim_to_use_x = np.where(np.array([x != '' and x != 'Neck/Nose' for x in H36M_NAMES]))[0] * 2 175 | dim_to_use_y = dim_to_use_x+1 176 | 177 | dim_to_use = np.zeros(len(SH_NAMES)*2,dtype=np.int32) 178 | dim_to_use[0::2] = dim_to_use_x 179 | dim_to_use[1::2] = dim_to_use_y 180 | poses_final[:,dim_to_use] = poses 181 | seqname = seqname+'-sh' 182 | data[ (subj, action, seqname) ] = poses_final 183 | 184 | # Make sure we loaded 8 sequences 185 | if (subj == 11 and action == 'Directions'): # <-- this video is damaged 186 | assert loaded_seqs == 7, "Expecting 7 sequences, found {0} instead. S:{1} {2}".format(loaded_seqs, subj, action ) 187 | else: 188 | assert loaded_seqs == 8, "Expecting 8 sequences, found {0} instead. S:{1} {2}".format(loaded_seqs, subj, action ) 189 | 190 | return data 191 | 192 | 193 | def normalization_stats(complete_data, dim, predict_14=False ): 194 | """ 195 | Computes normalization statistics: mean and stdev, dimensions used and ignored 196 | 197 | Args 198 | complete_data: nxd np array with poses 199 | dim. integer={2,3} dimensionality of the data 200 | predict_14. boolean. Whether to use only 14 joints 201 | Returns 202 | data_mean: np vector with the mean of the data 203 | data_std: np vector with the standard deviation of the data 204 | dimensions_to_ignore: list of dimensions not used in the model 205 | dimensions_to_use: list of dimensions used in the model 206 | """ 207 | if not dim in [2,3]: 208 | raise(ValueError, 'dim must be 2 or 3') 209 | 210 | data_mean = np.mean(complete_data, axis=0) 211 | data_std = np.std(complete_data, axis=0) 212 | 213 | # Encodes which 17 (or 14) 2d-3d pairs we are predicting 214 | dimensions_to_ignore = [] 215 | if dim == 2: 216 | dimensions_to_use = np.where(np.array([x != '' and x != 'Neck/Nose' for x in H36M_NAMES]))[0] 217 | dimensions_to_use = np.sort( np.hstack( (dimensions_to_use*2, dimensions_to_use*2+1))) 218 | dimensions_to_ignore = np.delete( np.arange(len(H36M_NAMES)*2), dimensions_to_use ) 219 | else: # dim == 3 220 | dimensions_to_use = np.where(np.array([x != '' for x in H36M_NAMES]))[0] 221 | dimensions_to_use = np.delete( dimensions_to_use, [0,7,9] if predict_14 else 0 ) 222 | 223 | dimensions_to_use = np.sort( np.hstack( (dimensions_to_use*3, 224 | dimensions_to_use*3+1, 225 | dimensions_to_use*3+2))) 226 | dimensions_to_ignore = np.delete( np.arange(len(H36M_NAMES)*3), dimensions_to_use ) 227 | 228 | return data_mean, data_std, dimensions_to_ignore, dimensions_to_use 229 | 230 | 231 | def transform_world_to_camera(poses_set, cams, ncams=4 ): 232 | """ 233 | Project 3d poses from world coordinate to camera coordinate system 234 | Args 235 | poses_set: dictionary with 3d poses 236 | cams: dictionary with cameras 237 | ncams: number of cameras per subject 238 | Return: 239 | t3d_camera: dictionary with 3d poses in camera coordinate 240 | """ 241 | t3d_camera = {} 242 | for t3dk in sorted( poses_set.keys() ): 243 | 244 | subj, action, seqname = t3dk 245 | t3d_world = poses_set[ t3dk ] 246 | 247 | for c in range( ncams ): 248 | 
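      # Note: the tuple unpacking below reuses the name `c` for the camera centre,
      # shadowing the loop variable; this is harmless only because the for statement
      # rebinds `c` from range(ncams) at the top of each iteration.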
R, T, f, c, k, p, name = cams[ (subj, c+1) ] 249 | camera_coord = cameras.world_to_camera_frame( np.reshape(t3d_world, [-1, 3]), R, T) 250 | camera_coord = np.reshape( camera_coord, [-1, len(H36M_NAMES)*3] ) 251 | 252 | sname = seqname[:-3]+"."+name+".h5" # e.g.: Waiting 1.58860488.h5 253 | t3d_camera[ (subj, action, sname) ] = camera_coord 254 | 255 | return t3d_camera 256 | 257 | 258 | def normalize_data(data, data_mean, data_std, dim_to_use ): 259 | """ 260 | Normalizes a dictionary of poses 261 | 262 | Args 263 | data: dictionary where values are 264 | data_mean: np vector with the mean of the data 265 | data_std: np vector with the standard deviation of the data 266 | dim_to_use: list of dimensions to keep in the data 267 | Returns 268 | data_out: dictionary with same keys as data, but values have been normalized 269 | """ 270 | data_out = {} 271 | 272 | for key in data.keys(): 273 | data[ key ] = data[ key ][ :, dim_to_use ] 274 | mu = data_mean[dim_to_use] 275 | stddev = data_std[dim_to_use] 276 | data_out[ key ] = np.divide( (data[key] - mu), stddev ) 277 | 278 | return data_out 279 | 280 | 281 | def unNormalizeData(normalized_data, data_mean, data_std, dimensions_to_ignore): 282 | """ 283 | Un-normalizes a matrix whose mean has been substracted and that has been divided by 284 | standard deviation. Some dimensions might also be missing 285 | 286 | Args 287 | normalized_data: nxd matrix to unnormalize 288 | data_mean: np vector with the mean of the data 289 | data_std: np vector with the standard deviation of the data 290 | dimensions_to_ignore: list of dimensions that were removed from the original data 291 | Returns 292 | orig_data: the input normalized_data, but unnormalized 293 | """ 294 | T = normalized_data.shape[0] # Batch size 295 | D = data_mean.shape[0] # Dimensionality 296 | 297 | orig_data = np.zeros((T, D), dtype=np.float32) 298 | dimensions_to_use = np.array([dim for dim in range(D) 299 | if dim not in dimensions_to_ignore]) 300 | 301 | orig_data[:, dimensions_to_use] = normalized_data 302 | 303 | # Multiply times stdev and add the mean 304 | stdMat = data_std.reshape((1, D)) 305 | stdMat = np.repeat(stdMat, T, axis=0) 306 | meanMat = data_mean.reshape((1, D)) 307 | meanMat = np.repeat(meanMat, T, axis=0) 308 | orig_data = np.multiply(orig_data, stdMat) + meanMat 309 | return orig_data 310 | 311 | 312 | def define_actions( action ): 313 | """ 314 | Given an action string, returns a list of corresponding actions. 315 | 316 | Args 317 | action: String. either "all" or one of the h36m actions 318 | Returns 319 | actions: List of strings. Actions to use. 
320 | Raises 321 | ValueError: if the action is not a valid action in Human 3.6M 322 | """ 323 | actions = ["Directions","Discussion","Eating","Greeting", 324 | "Phoning","Photo","Posing","Purchases", 325 | "Sitting","SittingDown","Smoking","Waiting", 326 | "WalkDog","Walking","WalkTogether"] 327 | 328 | if action == "All" or action == "all": 329 | return actions 330 | 331 | if not action in actions: 332 | raise( ValueError, "Unrecognized action: %s" % action ) 333 | 334 | return [action] 335 | 336 | 337 | def project_to_cameras( poses_set, cams, ncams=4 ): 338 | """ 339 | Project 3d poses using camera parameters 340 | 341 | Args 342 | poses_set: dictionary with 3d poses 343 | cams: dictionary with camera parameters 344 | ncams: number of cameras per subject 345 | Returns 346 | t2d: dictionary with 2d poses 347 | """ 348 | t2d = {} 349 | 350 | for t3dk in sorted( poses_set.keys() ): 351 | subj, a, seqname = t3dk 352 | t3d = poses_set[ t3dk ] 353 | 354 | for cam in range( ncams ): 355 | R, T, f, c, k, p, name = cams[ (subj, cam+1) ] 356 | pts2d, _, _, _, _ = cameras.project_point_radial( np.reshape(t3d, [-1, 3]), R, T, f, c, k, p ) 357 | 358 | pts2d = np.reshape( pts2d, [-1, len(H36M_NAMES)*2] ) 359 | sname = seqname[:-3]+"."+name+".h5" # e.g.: Waiting 1.58860488.h5 360 | t2d[ (subj, a, sname) ] = pts2d 361 | 362 | return t2d 363 | 364 | 365 | def read_2d_predictions( actions, data_dir ): 366 | """ 367 | Loads 2d data from precomputed Stacked Hourglass detections 368 | 369 | Args 370 | actions: list of strings. Actions to load 371 | data_dir: string. Directory where the data can be loaded from 372 | Returns 373 | train_set: dictionary with loaded 2d stacked hourglass detections for training 374 | test_set: dictionary with loaded 2d stacked hourglass detections for testing 375 | data_mean: vector with the mean of the 2d training data 376 | data_std: vector with the standard deviation of the 2d training data 377 | dim_to_ignore: list with the dimensions to not predict 378 | dim_to_use: list with the dimensions to predict 379 | """ 380 | 381 | train_set = load_stacked_hourglass( data_dir, TRAIN_SUBJECTS, actions) 382 | test_set = load_stacked_hourglass( data_dir, TEST_SUBJECTS, actions) 383 | 384 | complete_train = copy.deepcopy( np.vstack( train_set.values() )) 385 | data_mean, data_std, dim_to_ignore, dim_to_use = normalization_stats( complete_train, dim=2 ) 386 | 387 | train_set = normalize_data( train_set, data_mean, data_std, dim_to_use ) 388 | test_set = normalize_data( test_set, data_mean, data_std, dim_to_use ) 389 | 390 | return train_set, test_set, data_mean, data_std, dim_to_ignore, dim_to_use 391 | 392 | 393 | def create_2d_data( actions, data_dir, rcams ): 394 | """ 395 | Creates 2d poses by projecting 3d poses with the corresponding camera 396 | parameters. Also normalizes the 2d poses 397 | 398 | Args 399 | actions: list of strings. Actions to load 400 | data_dir: string. 
Directory where the data can be loaded from 401 | rcams: dictionary with camera parameters 402 | Returns 403 | train_set: dictionary with projected 2d poses for training 404 | test_set: dictionary with projected 2d poses for testing 405 | data_mean: vector with the mean of the 2d training data 406 | data_std: vector with the standard deviation of the 2d training data 407 | dim_to_ignore: list with the dimensions to not predict 408 | dim_to_use: list with the dimensions to predict 409 | """ 410 | 411 | # Load 3d data 412 | train_set = load_data( data_dir, TRAIN_SUBJECTS, actions, dim=3 ) 413 | test_set = load_data( data_dir, TEST_SUBJECTS, actions, dim=3 ) 414 | 415 | train_set = project_to_cameras( train_set, rcams ) 416 | test_set = project_to_cameras( test_set, rcams ) 417 | 418 | # Compute normalization statistics. 419 | complete_train = copy.deepcopy( np.vstack( train_set.values() )) 420 | data_mean, data_std, dim_to_ignore, dim_to_use = normalization_stats( complete_train, dim=2 ) 421 | 422 | # Divide every dimension independently 423 | train_set = normalize_data( train_set, data_mean, data_std, dim_to_use ) 424 | test_set = normalize_data( test_set, data_mean, data_std, dim_to_use ) 425 | 426 | return train_set, test_set, data_mean, data_std, dim_to_ignore, dim_to_use 427 | 428 | 429 | def read_3d_data( actions, data_dir, camera_frame, rcams, predict_14=False ): 430 | """ 431 | Loads 3d poses, zero-centres and normalizes them 432 | 433 | Args 434 | actions: list of strings. Actions to load 435 | data_dir: string. Directory where the data can be loaded from 436 | camera_frame: boolean. Whether to convert the data to camera coordinates 437 | rcams: dictionary with camera parameters 438 | predict_14: boolean. Whether to predict only 14 joints 439 | Returns 440 | train_set: dictionary with loaded 3d poses for training 441 | test_set: dictionary with loaded 3d poses for testing 442 | data_mean: vector with the mean of the 3d training data 443 | data_std: vector with the standard deviation of the 3d training data 444 | dim_to_ignore: list with the dimensions to not predict 445 | dim_to_use: list with the dimensions to predict 446 | train_root_positions: dictionary with the 3d positions of the root in train 447 | test_root_positions: dictionary with the 3d positions of the root in test 448 | """ 449 | # Load 3d data 450 | train_set = load_data( data_dir, TRAIN_SUBJECTS, actions, dim=3 ) 451 | test_set = load_data( data_dir, TEST_SUBJECTS, actions, dim=3 ) 452 | 453 | if camera_frame: 454 | train_set = transform_world_to_camera( train_set, rcams ) 455 | test_set = transform_world_to_camera( test_set, rcams ) 456 | 457 | # Apply 3d post-processing (centering around root) 458 | train_set, train_root_positions = postprocess_3d( train_set ) 459 | test_set, test_root_positions = postprocess_3d( test_set ) 460 | 461 | # Compute normalization statistics 462 | complete_train = copy.deepcopy( np.vstack( train_set.values() )) 463 | data_mean, data_std, dim_to_ignore, dim_to_use = normalization_stats( complete_train, dim=3, predict_14=predict_14 ) 464 | 465 | # Divide every dimension independently 466 | train_set = normalize_data( train_set, data_mean, data_std, dim_to_use ) 467 | test_set = normalize_data( test_set, data_mean, data_std, dim_to_use ) 468 | 469 | return train_set, test_set, data_mean, data_std, dim_to_ignore, dim_to_use, train_root_positions, test_root_positions 470 | 471 | 472 | def postprocess_3d( poses_set ): 473 | """ 474 | Center 3d points around root 475 | 476 | Args 477 | 
poses_set: dictionary with 3d data 478 | Returns 479 | poses_set: dictionary with 3d data centred around root (center hip) joint 480 | root_positions: dictionary with the original 3d position of each pose 481 | """ 482 | root_positions = {} 483 | for k in poses_set.keys(): 484 | # Keep track of the global position 485 | root_positions[k] = copy.deepcopy(poses_set[k][:,:3]) 486 | 487 | # Remove the root from the 3d position 488 | poses = poses_set[k] 489 | poses = poses - np.tile( poses[:,:3], [1, len(H36M_NAMES)] ) 490 | poses_set[k] = poses 491 | 492 | return poses_set, root_positions 493 | -------------------------------------------------------------------------------- /src/linear_model.py: -------------------------------------------------------------------------------- 1 | 2 | """Simple model to regress 3d human poses from 2d joint locations""" 3 | 4 | from __future__ import absolute_import 5 | from __future__ import division 6 | from __future__ import print_function 7 | 8 | from tensorflow.python.ops import variable_scope as vs 9 | 10 | import os 11 | import numpy as np 12 | from six.moves import xrange # pylint: disable=redefined-builtin 13 | import tensorflow as tf 14 | import data_utils 15 | import cameras as cam 16 | 17 | def kaiming(shape, dtype, partition_info=None): 18 | """Kaiming initialization as described in https://arxiv.org/pdf/1502.01852.pdf 19 | 20 | Args 21 | shape: dimensions of the tf array to initialize 22 | dtype: data type of the array 23 | partition_info: (Optional) info about how the variable is partitioned. 24 | See https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/init_ops.py#L26 25 | Needed to be used as an initializer. 26 | Returns 27 | Tensorflow array with initial weights 28 | """ 29 | return(tf.truncated_normal(shape, dtype=dtype)*tf.sqrt(2/float(shape[0]))) 30 | 31 | class LinearModel(object): 32 | """ A simple Linear+RELU model """ 33 | 34 | def __init__(self, 35 | linear_size, 36 | num_layers, 37 | residual, 38 | batch_norm, 39 | max_norm, 40 | batch_size, 41 | learning_rate, 42 | summaries_dir, 43 | predict_14=False, 44 | dtype=tf.float32): 45 | """Creates the linear + relu model 46 | 47 | Args 48 | linear_size: integer. number of units in each layer of the model 49 | num_layers: integer. number of bilinear blocks in the model 50 | residual: boolean. Whether to add residual connections 51 | batch_norm: boolean. Whether to use batch normalization 52 | max_norm: boolean. Whether to clip weights to a norm of 1 53 | batch_size: integer. The size of the batches used during training 54 | learning_rate: float. Learning rate to start with 55 | summaries_dir: String. Directory where to log progress 56 | predict_14: boolean. Whether to predict 14 instead of 17 joints 57 | dtype: the data type to use to store internal variables 58 | """ 59 | 60 | # There are in total 17 joints in H3.6M and 16 in MPII (and therefore in stacked 61 | # hourglass detections). We settled with 16 joints in 2d just to make models 62 | # compatible (e.g. you can train on ground truth 2d and test on SH detections). 63 | # This does not seem to have an effect on prediction performance. 64 | self.HUMAN_2D_SIZE = 16 * 2 65 | 66 | # In 3d all the predictions are zero-centered around the root (hip) joint, so 67 | # we actually predict only 16 joints. The error is still computed over 17 joints, 68 | # because if one uses, e.g. Procrustes alignment, there is still error in the 69 | # hip to account for! 
70 | # There is also an option to predict only 14 joints, which makes our results 71 | # directly comparable to those in https://arxiv.org/pdf/1611.09010.pdf 72 | self.HUMAN_3D_SIZE = 14 * 3 if predict_14 else 16 * 3 73 | 74 | self.input_size = self.HUMAN_2D_SIZE 75 | self.output_size = self.HUMAN_3D_SIZE 76 | 77 | self.isTraining = tf.placeholder(tf.bool,name="isTrainingflag") 78 | self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob") 79 | 80 | # Summary writers for train and test runs 81 | self.train_writer = tf.summary.FileWriter( os.path.join(summaries_dir, 'train' )) 82 | self.test_writer = tf.summary.FileWriter( os.path.join(summaries_dir, 'test' )) 83 | 84 | self.linear_size = linear_size 85 | self.batch_size = batch_size 86 | self.learning_rate = tf.Variable( float(learning_rate), trainable=False, dtype=dtype, name="learning_rate") 87 | self.global_step = tf.Variable(0, trainable=False, name="global_step") 88 | decay_steps = 100000 # empirical 89 | decay_rate = 0.96 # empirical 90 | self.learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step, decay_steps, decay_rate) 91 | 92 | # === Transform the inputs === 93 | with vs.variable_scope("inputs"): 94 | 95 | # in=2d poses, out=3d poses 96 | enc_in = tf.placeholder(dtype, shape=[None, self.input_size], name="enc_in") 97 | dec_out = tf.placeholder(dtype, shape=[None, self.output_size], name="dec_out") 98 | 99 | self.encoder_inputs = enc_in 100 | self.decoder_outputs = dec_out 101 | 102 | # === Create the linear + relu combos === 103 | with vs.variable_scope( "linear_model" ): 104 | 105 | # === First layer, brings dimensionality up to linear_size === 106 | w1 = tf.get_variable( name="w1", initializer=kaiming, shape=[self.HUMAN_2D_SIZE, linear_size], dtype=dtype ) 107 | b1 = tf.get_variable( name="b1", initializer=kaiming, shape=[linear_size], dtype=dtype ) 108 | w1 = tf.clip_by_norm(w1,1) if max_norm else w1 109 | y3 = tf.matmul( enc_in, w1 ) + b1 110 | 111 | if batch_norm: 112 | y3 = tf.layers.batch_normalization(y3,training=self.isTraining, name="batch_normalization") 113 | y3 = tf.nn.relu( y3 ) 114 | y3 = tf.nn.dropout( y3, self.dropout_keep_prob ) 115 | 116 | # === Create multiple bi-linear layers === 117 | for idx in range( num_layers ): 118 | y3 = self.two_linear( y3, linear_size, residual, self.dropout_keep_prob, max_norm, batch_norm, dtype, idx ) 119 | 120 | # === Last linear layer has HUMAN_3D_SIZE in output === 121 | w4 = tf.get_variable( name="w4", initializer=kaiming, shape=[linear_size, self.HUMAN_3D_SIZE], dtype=dtype ) 122 | b4 = tf.get_variable( name="b4", initializer=kaiming, shape=[self.HUMAN_3D_SIZE], dtype=dtype ) 123 | w4 = tf.clip_by_norm(w4,1) if max_norm else w4 124 | y = tf.matmul(y3, w4) + b4 125 | # === End linear model === 126 | 127 | # Store the outputs here 128 | self.outputs = y 129 | self.loss = tf.reduce_mean(tf.square(y - dec_out)) 130 | self.loss_summary = tf.summary.scalar('loss/loss', self.loss) 131 | 132 | # To keep track of the loss in mm 133 | self.err_mm = tf.placeholder( tf.float32, name="error_mm" ) 134 | self.err_mm_summary = tf.summary.scalar( "loss/error_mm", self.err_mm ) 135 | 136 | # Gradients and update operation for training the model. 
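    # The apply_gradients op below is wrapped in tf.control_dependencies(update_ops)
    # so that the batch-norm moving averages (collected under tf.GraphKeys.UPDATE_OPS)
    # are refreshed on every training step.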
137 | opt = tf.train.AdamOptimizer( self.learning_rate ) 138 | update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 139 | 140 | with tf.control_dependencies(update_ops): 141 | 142 | # Update all the trainable parameters 143 | gradients = opt.compute_gradients(self.loss) 144 | self.gradients = [[] if i==None else i for i in gradients] 145 | self.updates = opt.apply_gradients(gradients, global_step=self.global_step) 146 | 147 | # Keep track of the learning rate 148 | self.learning_rate_summary = tf.summary.scalar('learning_rate/learning_rate', self.learning_rate) 149 | 150 | # To save the model 151 | self.saver = tf.train.Saver( tf.global_variables(), max_to_keep=10 ) 152 | 153 | 154 | def two_linear( self, xin, linear_size, residual, dropout_keep_prob, max_norm, batch_norm, dtype, idx ): 155 | """ 156 | Make a bi-linear block with optional residual connection 157 | 158 | Args 159 | xin: the batch that enters the block 160 | linear_size: integer. The size of the linear units 161 | residual: boolean. Whether to add a residual connection 162 | dropout_keep_prob: float [0,1]. Probability of dropping something out 163 | max_norm: boolean. Whether to clip weights to 1-norm 164 | batch_norm: boolean. Whether to do batch normalization 165 | dtype: type of the weigths. Usually tf.float32 166 | idx: integer. Number of layer (for naming/scoping) 167 | Returns 168 | y: the batch after it leaves the block 169 | """ 170 | 171 | with vs.variable_scope( "two_linear_"+str(idx) ) as scope: 172 | 173 | input_size = int(xin.get_shape()[1]) 174 | 175 | # Linear 1 176 | w2 = tf.get_variable( name="w2_"+str(idx), initializer=kaiming, shape=[input_size, linear_size], dtype=dtype) 177 | b2 = tf.get_variable( name="b2_"+str(idx), initializer=kaiming, shape=[linear_size], dtype=dtype) 178 | w2 = tf.clip_by_norm(w2,1) if max_norm else w2 179 | y = tf.matmul(xin, w2) + b2 180 | if batch_norm: 181 | y = tf.layers.batch_normalization(y,training=self.isTraining,name="batch_normalization1"+str(idx)) 182 | 183 | y = tf.nn.relu( y ) 184 | y = tf.nn.dropout( y, dropout_keep_prob ) 185 | 186 | # Linear 2 187 | w3 = tf.get_variable( name="w3_"+str(idx), initializer=kaiming, shape=[linear_size, linear_size], dtype=dtype) 188 | b3 = tf.get_variable( name="b3_"+str(idx), initializer=kaiming, shape=[linear_size], dtype=dtype) 189 | w3 = tf.clip_by_norm(w3,1) if max_norm else w3 190 | y = tf.matmul(y, w3) + b3 191 | 192 | if batch_norm: 193 | y = tf.layers.batch_normalization(y,training=self.isTraining,name="batch_normalization2"+str(idx)) 194 | 195 | y = tf.nn.relu( y ) 196 | y = tf.nn.dropout( y, dropout_keep_prob ) 197 | 198 | # Residual every 2 blocks 199 | y = (xin + y) if residual else y 200 | 201 | return y 202 | 203 | def step(self, session, encoder_inputs, decoder_outputs, dropout_keep_prob, isTraining=True): 204 | """Run a step of the model feeding the given inputs. 
205 | 206 | Args 207 | session: tensorflow session to use 208 | encoder_inputs: list of numpy vectors to feed as encoder inputs 209 | decoder_outputs: list of numpy vectors that are the expected decoder outputs 210 | dropout_keep_prob: (0,1] dropout keep probability 211 | isTraining: whether to do the backward step or only forward 212 | 213 | Returns 214 | if isTraining is True, a 4-tuple 215 | loss: the computed loss of this batch 216 | loss_summary: tf summary of this batch loss, to log on tensorboard 217 | learning_rate_summary: tf summary of learnign rate to log on tensorboard 218 | outputs: predicted 3d poses 219 | if isTraining is False, a 3-tuple 220 | (loss, loss_summary, outputs) same as above 221 | """ 222 | 223 | input_feed = {self.encoder_inputs: encoder_inputs, 224 | self.decoder_outputs: decoder_outputs, 225 | self.isTraining: isTraining, 226 | self.dropout_keep_prob: dropout_keep_prob} 227 | 228 | # Output feed: depends on whether we do a backward step or not. 229 | if isTraining: 230 | output_feed = [self.updates, # Update Op that does SGD 231 | self.loss, 232 | self.loss_summary, 233 | self.learning_rate_summary, 234 | self.outputs] 235 | 236 | outputs = session.run( output_feed, input_feed ) 237 | return outputs[1], outputs[2], outputs[3], outputs[4] 238 | 239 | else: 240 | output_feed = [self.loss, # Loss for this batch. 241 | self.loss_summary, 242 | self.outputs] 243 | 244 | outputs = session.run(output_feed, input_feed) 245 | return outputs[0], outputs[1], outputs[2] # No gradient norm 246 | 247 | def get_all_batches( self, data_x, data_y, camera_frame, training=True ): 248 | """ 249 | Obtain a list of all the batches, randomly permutted 250 | Args 251 | data_x: dictionary with 2d inputs 252 | data_y: dictionary with 3d expected outputs 253 | camera_frame: whether the 3d data is in camera coordinates 254 | training: True if this is a training batch. False otherwise. 
255 | 256 | Returns 257 | encoder_inputs: list of 2d batches 258 | decoder_outputs: list of 3d batches 259 | """ 260 | 261 | # Figure out how many frames we have 262 | n = 0 263 | for key2d in data_x.keys(): 264 | n2d, _ = data_x[ key2d ].shape 265 | n = n + n2d 266 | 267 | encoder_inputs = np.zeros((n, self.input_size), dtype=float) 268 | decoder_outputs = np.zeros((n, self.output_size), dtype=float) 269 | 270 | # Put all the data into big arrays 271 | idx = 0 272 | for key2d in data_x.keys(): 273 | (subj, b, fname) = key2d 274 | # keys should be the same if 3d is in camera coordinates 275 | key3d = key2d if (camera_frame) else (subj, b, '{0}.h5'.format(fname.split('.')[0])) 276 | key3d = (subj, b, fname[:-3]) if fname.endswith('-sh') and camera_frame else key3d 277 | 278 | n2d, _ = data_x[ key2d ].shape 279 | encoder_inputs[idx:idx+n2d, :] = data_x[ key2d ] 280 | decoder_outputs[idx:idx+n2d, :] = data_y[ key3d ] 281 | idx = idx + n2d 282 | 283 | 284 | if training: 285 | # Randomly permute everything 286 | idx = np.random.permutation( n ) 287 | encoder_inputs = encoder_inputs[idx, :] 288 | decoder_outputs = decoder_outputs[idx, :] 289 | 290 | # Make the number of examples a multiple of the batch size 291 | n_extra = n % self.batch_size 292 | if n_extra > 0: # Otherwise examples are already a multiple of batch size 293 | encoder_inputs = encoder_inputs[:-n_extra, :] 294 | decoder_outputs = decoder_outputs[:-n_extra, :] 295 | 296 | n_batches = n // self.batch_size 297 | encoder_inputs = np.split( encoder_inputs, n_batches ) 298 | decoder_outputs = np.split( decoder_outputs, n_batches ) 299 | 300 | return encoder_inputs, decoder_outputs 301 | -------------------------------------------------------------------------------- /src/predict_3dpose.py: -------------------------------------------------------------------------------- 1 | 2 | """Predicting 3d poses from 2d joints""" 3 | 4 | from __future__ import absolute_import 5 | from __future__ import division 6 | from __future__ import print_function 7 | 8 | import math 9 | import os 10 | import random 11 | import sys 12 | import time 13 | import h5py 14 | import copy 15 | 16 | import matplotlib.pyplot as plt 17 | import numpy as np 18 | from six.moves import xrange # pylint: disable=redefined-builtin 19 | import tensorflow as tf 20 | import procrustes 21 | 22 | import viz 23 | import cameras 24 | import data_utils 25 | import linear_model 26 | 27 | tf.app.flags.DEFINE_float("learning_rate", 1e-3, "Learning rate") 28 | tf.app.flags.DEFINE_float("dropout", 1, "Dropout keep probability. 1 means no dropout") 29 | tf.app.flags.DEFINE_integer("batch_size", 64, "Batch size to use during training") 30 | tf.app.flags.DEFINE_integer("epochs", 200, "How many epochs we should train for") 31 | tf.app.flags.DEFINE_boolean("camera_frame", False, "Convert 3d poses to camera coordinates") 32 | tf.app.flags.DEFINE_boolean("max_norm", False, "Apply maxnorm constraint to the weights") 33 | tf.app.flags.DEFINE_boolean("batch_norm", False, "Use batch_normalization") 34 | 35 | # Data loading 36 | tf.app.flags.DEFINE_boolean("predict_14", False, "predict 14 joints") 37 | tf.app.flags.DEFINE_boolean("use_sh", False, "Use 2d pose predictions from StackedHourglass") 38 | tf.app.flags.DEFINE_string("action","All", "The action to train on. 
'All' means all the actions") 39 | 40 | # Architecture 41 | tf.app.flags.DEFINE_integer("linear_size", 1024, "Size of each model layer.") 42 | tf.app.flags.DEFINE_integer("num_layers", 2, "Number of layers in the model.") 43 | tf.app.flags.DEFINE_boolean("residual", False, "Whether to add a residual connection every 2 layers") 44 | 45 | # Evaluation 46 | tf.app.flags.DEFINE_boolean("procrustes", False, "Apply procrustes analysis at test time") 47 | tf.app.flags.DEFINE_boolean("evaluateActionWise",False, "The dataset to use either h36m or heva") 48 | 49 | # Directories 50 | tf.app.flags.DEFINE_string("cameras_path","data/h36m/cameras.h5","Directory to load camera parameters") 51 | tf.app.flags.DEFINE_string("data_dir", "data/h36m/", "Data directory") 52 | tf.app.flags.DEFINE_string("train_dir", "experiments", "Training directory.") 53 | 54 | # Train or load 55 | tf.app.flags.DEFINE_boolean("sample", False, "Set to True for sampling.") 56 | tf.app.flags.DEFINE_boolean("use_cpu", False, "Whether to use the CPU") 57 | tf.app.flags.DEFINE_integer("load", 0, "Try to load a previous checkpoint.") 58 | 59 | # Misc 60 | tf.app.flags.DEFINE_boolean("use_fp16", False, "Train using fp16 instead of fp32.") 61 | 62 | FLAGS = tf.app.flags.FLAGS 63 | 64 | train_dir = os.path.join( FLAGS.train_dir, 65 | FLAGS.action, 66 | 'dropout_{0}'.format(FLAGS.dropout), 67 | 'epochs_{0}'.format(FLAGS.epochs) if FLAGS.epochs > 0 else '', 68 | 'lr_{0}'.format(FLAGS.learning_rate), 69 | 'residual' if FLAGS.residual else 'not_residual', 70 | 'depth_{0}'.format(FLAGS.num_layers), 71 | 'linear_size{0}'.format(FLAGS.linear_size), 72 | 'batch_size_{0}'.format(FLAGS.batch_size), 73 | 'procrustes' if FLAGS.procrustes else 'no_procrustes', 74 | 'maxnorm' if FLAGS.max_norm else 'no_maxnorm', 75 | 'batch_normalization' if FLAGS.batch_norm else 'no_batch_normalization', 76 | 'use_stacked_hourglass' if FLAGS.use_sh else 'not_stacked_hourglass', 77 | 'predict_14' if FLAGS.predict_14 else 'predict_17') 78 | 79 | print( train_dir ) 80 | summaries_dir = os.path.join( train_dir, "log" ) # Directory for TB summaries 81 | 82 | # To avoid race conditions: https://github.com/tensorflow/tensorflow/issues/7448 83 | os.system('mkdir -p {}'.format(summaries_dir)) 84 | 85 | def create_model( session, actions, batch_size ): 86 | """ 87 | Create model and initialize it or load its parameters in a session 88 | 89 | Args 90 | session: tensorflow session 91 | actions: list of string. Actions to train/test on 92 | batch_size: integer. Number of examples in each batch 93 | Returns 94 | model: The created (or loaded) model 95 | Raises 96 | ValueError if asked to load a model, but the checkpoint specified by 97 | FLAGS.load cannot be found. 
98 | """ 99 | 100 | model = linear_model.LinearModel( 101 | FLAGS.linear_size, 102 | FLAGS.num_layers, 103 | FLAGS.residual, 104 | FLAGS.batch_norm, 105 | FLAGS.max_norm, 106 | batch_size, 107 | FLAGS.learning_rate, 108 | summaries_dir, 109 | FLAGS.predict_14, 110 | dtype=tf.float16 if FLAGS.use_fp16 else tf.float32) 111 | 112 | if FLAGS.load <= 0: 113 | # Create a new model from scratch 114 | print("Creating model with fresh parameters.") 115 | session.run( tf.global_variables_initializer() ) 116 | return model 117 | 118 | # Load a previously saved model 119 | ckpt = tf.train.get_checkpoint_state( train_dir, latest_filename="checkpoint") 120 | print( "train_dir", train_dir ) 121 | 122 | if ckpt and ckpt.model_checkpoint_path: 123 | # Check if the specific checkpoint exists 124 | if FLAGS.load > 0: 125 | if os.path.isfile(os.path.join(train_dir,"checkpoint-{0}.index".format(FLAGS.load))): 126 | ckpt_name = os.path.join( os.path.join(train_dir,"checkpoint-{0}".format(FLAGS.load)) ) 127 | else: 128 | raise ValueError("Asked to load checkpoint {0}, but it does not seem to exist".format(FLAGS.load)) 129 | else: 130 | ckpt_name = os.path.basename( ckpt.model_checkpoint_path ) 131 | 132 | print("Loading model {0}".format( ckpt_name )) 133 | model.saver.restore( session, ckpt.model_checkpoint_path ) 134 | return model 135 | else: 136 | print("Could not find checkpoint. Aborting.") 137 | raise( ValueError, "Checkpoint {0} does not seem to exist".format( ckpt.model_checkpoint_path ) ) 138 | 139 | return model 140 | 141 | def train(): 142 | """Train a linear model for 3d pose estimation""" 143 | 144 | actions = data_utils.define_actions( FLAGS.action ) 145 | 146 | number_of_actions = len( actions ) 147 | 148 | # Load camera parameters 149 | SUBJECT_IDS = [1,5,6,7,8,9,11] 150 | rcams = cameras.load_cameras(FLAGS.cameras_path, SUBJECT_IDS) 151 | 152 | # Load 3d data and load (or create) 2d projections 153 | train_set_3d, test_set_3d, data_mean_3d, data_std_3d, dim_to_ignore_3d, dim_to_use_3d, train_root_positions, test_root_positions = data_utils.read_3d_data( 154 | actions, FLAGS.data_dir, FLAGS.camera_frame, rcams, FLAGS.predict_14 ) 155 | 156 | # Read stacked hourglass 2D predictions if use_sh, otherwise use groundtruth 2D projections 157 | if FLAGS.use_sh: 158 | train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.read_2d_predictions(actions, FLAGS.data_dir) 159 | else: 160 | train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.create_2d_data( actions, FLAGS.data_dir, rcams ) 161 | print( "done reading and normalizing data." ) 162 | 163 | # Avoid using the GPU if requested 164 | device_count = {"GPU": 0} if FLAGS.use_cpu else {"GPU": 1} 165 | with tf.Session(config=tf.ConfigProto( 166 | device_count=device_count, 167 | allow_soft_placement=True )) as sess: 168 | 169 | # === Create the model === 170 | print("Creating %d bi-layers of %d units." 
% (FLAGS.num_layers, FLAGS.linear_size)) 171 | model = create_model( sess, actions, FLAGS.batch_size ) 172 | model.train_writer.add_graph( sess.graph ) 173 | print("Model created") 174 | 175 | #=== This is the training loop === 176 | step_time, loss, val_loss = 0.0, 0.0, 0.0 177 | current_step = 0 if FLAGS.load <= 0 else FLAGS.load + 1 178 | previous_losses = [] 179 | 180 | step_time, loss = 0, 0 181 | current_epoch = 0 182 | log_every_n_batches = 100 183 | 184 | for _ in xrange( FLAGS.epochs ): 185 | current_epoch = current_epoch + 1 186 | 187 | # === Load training batches for one epoch === 188 | encoder_inputs, decoder_outputs = model.get_all_batches( train_set_2d, train_set_3d, FLAGS.camera_frame, training=True ) 189 | nbatches = len( encoder_inputs ) 190 | print("There are {0} train batches".format( nbatches )) 191 | start_time, loss = time.time(), 0. 192 | 193 | # === Loop through all the training batches === 194 | for i in range( nbatches ): 195 | 196 | if (i+1) % log_every_n_batches == 0: 197 | # Print progress every log_every_n_batches batches 198 | print("Working on epoch {0}, batch {1} / {2}... ".format( current_epoch, i+1, nbatches), end="" ) 199 | 200 | enc_in, dec_out = encoder_inputs[i], decoder_outputs[i] 201 | step_loss, loss_summary, lr_summary, _ = model.step( sess, enc_in, dec_out, FLAGS.dropout, isTraining=True ) 202 | 203 | if (i+1) % log_every_n_batches == 0: 204 | # Log and print progress every log_every_n_batches batches 205 | model.train_writer.add_summary( loss_summary, current_step ) 206 | model.train_writer.add_summary( lr_summary, current_step ) 207 | step_time = (time.time() - start_time) 208 | start_time = time.time() 209 | print("done in {0:.2f} ms".format( 1000*step_time / log_every_n_batches ) ) 210 | 211 | loss += step_loss 212 | current_step += 1 213 | # === end looping through training batches === 214 | 215 | loss = loss / nbatches 216 | print("=============================\n" 217 | "Global step: %d\n" 218 | "Learning rate: %.2e\n" 219 | "Train loss avg: %.4f\n" 220 | "=============================" % (model.global_step.eval(), 221 | model.learning_rate.eval(), loss) ) 222 | # === End training for an epoch === 223 | 224 | # === Testing after this epoch === 225 | isTraining = False 226 | 227 | if FLAGS.evaluateActionWise: 228 | 229 | print("{0:=^12} {1:=^6}".format("Action", "mm")) # line of 30 equal signs 230 | 231 | cum_err = 0 232 | for action in actions: 233 | 234 | print("{0:<12} ".format(action), end="") 235 | # Get 2d and 3d testing data for this action 236 | action_test_set_2d = get_action_subset( test_set_2d, action ) 237 | action_test_set_3d = get_action_subset( test_set_3d, action ) 238 | encoder_inputs, decoder_outputs = model.get_all_batches( action_test_set_2d, action_test_set_3d, FLAGS.camera_frame, training=False) 239 | 240 | act_err, _, step_time, loss = evaluate_batches( sess, model, 241 | data_mean_3d, data_std_3d, dim_to_use_3d, dim_to_ignore_3d, 242 | data_mean_2d, data_std_2d, dim_to_use_2d, dim_to_ignore_2d, 243 | current_step, encoder_inputs, decoder_outputs ) 244 | cum_err = cum_err + act_err 245 | 246 | print("{0:>6.2f}".format(act_err)) 247 | 248 | summaries = sess.run( model.err_mm_summary, {model.err_mm: float(cum_err/float(len(actions)))} ) 249 | model.test_writer.add_summary( summaries, current_step ) 250 | print("{0:<12} {1:>6.2f}".format("Average", cum_err/float(len(actions) ))) 251 | print("{0:=^19}".format('')) 252 | 253 | else: 254 | 255 | n_joints = 17 if not(FLAGS.predict_14) else 14 256 | encoder_inputs, 
decoder_outputs = model.get_all_batches( test_set_2d, test_set_3d, FLAGS.camera_frame, training=False) 257 | 258 | total_err, joint_err, step_time, loss = evaluate_batches( sess, model, 259 | data_mean_3d, data_std_3d, dim_to_use_3d, dim_to_ignore_3d, 260 | data_mean_2d, data_std_2d, dim_to_use_2d, dim_to_ignore_2d, 261 | current_step, encoder_inputs, decoder_outputs, current_epoch ) 262 | 263 | print("=============================\n" 264 | "Step-time (ms): %.4f\n" 265 | "Val loss avg: %.4f\n" 266 | "Val error avg (mm): %.2f\n" 267 | "=============================" % ( 1000*step_time, loss, total_err )) 268 | 269 | for i in range(n_joints): 270 | # 6 spaces, right-aligned, 5 decimal places 271 | print("Error in joint {0:02d} (mm): {1:>5.2f}".format(i+1, joint_err[i])) 272 | print("=============================") 273 | 274 | # Log the error to tensorboard 275 | summaries = sess.run( model.err_mm_summary, {model.err_mm: total_err} ) 276 | model.test_writer.add_summary( summaries, current_step ) 277 | 278 | # Save the model 279 | print( "Saving the model... ", end="" ) 280 | start_time = time.time() 281 | model.saver.save(sess, os.path.join(train_dir, 'checkpoint'), global_step=current_step ) 282 | print( "done in {0:.2f} ms".format(1000*(time.time() - start_time)) ) 283 | 284 | # Reset global time and loss 285 | step_time, loss = 0, 0 286 | 287 | sys.stdout.flush() 288 | 289 | 290 | def get_action_subset( poses_set, action ): 291 | """ 292 | Given a preloaded dictionary of poses, load the subset of a particular action 293 | 294 | Args 295 | poses_set: dictionary with keys k=(subject, action, seqname), 296 | values v=(nxd matrix of poses) 297 | action: string. The action that we want to filter out 298 | Returns 299 | poses_subset: dictionary with same structure as poses_set, but only with the 300 | specified action. 301 | """ 302 | return {k:v for k, v in poses_set.items() if k[1] == action} 303 | 304 | 305 | def evaluate_batches( sess, model, 306 | data_mean_3d, data_std_3d, dim_to_use_3d, dim_to_ignore_3d, 307 | data_mean_2d, data_std_2d, dim_to_use_2d, dim_to_ignore_2d, 308 | current_step, encoder_inputs, decoder_outputs, current_epoch=0 ): 309 | """ 310 | Generic method that evaluates performance of a list of batches. 311 | May be used to evaluate all actions or a single action. 312 | 313 | Args 314 | sess 315 | model 316 | data_mean_3d 317 | data_std_3d 318 | dim_to_use_3d 319 | dim_to_ignore_3d 320 | data_mean_2d 321 | data_std_2d 322 | dim_to_use_2d 323 | dim_to_ignore_2d 324 | current_step 325 | encoder_inputs 326 | decoder_outputs 327 | current_epoch 328 | Returns 329 | 330 | total_err 331 | joint_err 332 | step_time 333 | loss 334 | """ 335 | 336 | n_joints = 17 if not(FLAGS.predict_14) else 14 337 | nbatches = len( encoder_inputs ) 338 | 339 | # Loop through test examples 340 | all_dists, start_time, loss = [], time.time(), 0. 
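  # all_dists accumulates one (batch_size x n_joints) array of per-joint L2 errors
  # (in mm) per batch; the arrays are stacked and averaged after the loop.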
341 | log_every_n_batches = 100 342 | for i in range(nbatches): 343 | 344 | if current_epoch > 0 and (i+1) % log_every_n_batches == 0: 345 | print("Working on test epoch {0}, batch {1} / {2}".format( current_epoch, i+1, nbatches) ) 346 | 347 | enc_in, dec_out = encoder_inputs[i], decoder_outputs[i] 348 | dp = 1.0 # dropout keep probability is always 1 at test time 349 | step_loss, loss_summary, poses3d = model.step( sess, enc_in, dec_out, dp, isTraining=False ) 350 | loss += step_loss 351 | 352 | # denormalize 353 | enc_in = data_utils.unNormalizeData( enc_in, data_mean_2d, data_std_2d, dim_to_ignore_2d ) 354 | dec_out = data_utils.unNormalizeData( dec_out, data_mean_3d, data_std_3d, dim_to_ignore_3d ) 355 | poses3d = data_utils.unNormalizeData( poses3d, data_mean_3d, data_std_3d, dim_to_ignore_3d ) 356 | 357 | # Keep only the relevant dimensions 358 | dtu3d = np.hstack( (np.arange(3), dim_to_use_3d) ) if not(FLAGS.predict_14) else dim_to_use_3d 359 | 360 | dec_out = dec_out[:, dtu3d] 361 | poses3d = poses3d[:, dtu3d] 362 | 363 | assert dec_out.shape[0] == FLAGS.batch_size 364 | assert poses3d.shape[0] == FLAGS.batch_size 365 | 366 | if FLAGS.procrustes: 367 | # Apply per-frame procrustes alignment if asked to do so 368 | for j in range(FLAGS.batch_size): 369 | gt = np.reshape(dec_out[j,:],[-1,3]) 370 | out = np.reshape(poses3d[j,:],[-1,3]) 371 | _, Z, T, b, c = procrustes.compute_similarity_transform(gt,out,compute_optimal_scale=True) 372 | out = (b*out.dot(T))+c 373 | 374 | poses3d[j,:] = np.reshape(out,[-1,17*3] ) if not(FLAGS.predict_14) else np.reshape(out,[-1,14*3] ) 375 | 376 | # Compute Euclidean distance error per joint 377 | sqerr = (poses3d - dec_out)**2 # Squared error between prediction and expected output 378 | dists = np.zeros( (sqerr.shape[0], n_joints) ) # Array with L2 error per joint in mm 379 | dist_idx = 0 380 | for k in np.arange(0, n_joints*3, 3): 381 | # Sum across X,Y, and Z dimenstions to obtain L2 distance 382 | dists[:,dist_idx] = np.sqrt( np.sum( sqerr[:, k:k+3], axis=1 )) 383 | dist_idx = dist_idx + 1 384 | 385 | all_dists.append(dists) 386 | assert sqerr.shape[0] == FLAGS.batch_size 387 | 388 | step_time = (time.time() - start_time) / nbatches 389 | loss = loss / nbatches 390 | 391 | all_dists = np.vstack( all_dists ) 392 | 393 | # Error per joint and total for all passed batches 394 | joint_err = np.mean( all_dists, axis=0 ) 395 | total_err = np.mean( all_dists ) 396 | 397 | return total_err, joint_err, step_time, loss 398 | 399 | 400 | def sample(): 401 | """Get samples from a model and visualize them""" 402 | 403 | actions = data_utils.define_actions( FLAGS.action ) 404 | 405 | # Load camera parameters 406 | SUBJECT_IDS = [1,5,6,7,8,9,11] 407 | rcams = cameras.load_cameras(FLAGS.cameras_path, SUBJECT_IDS) 408 | 409 | # Load 3d data and load (or create) 2d projections 410 | train_set_3d, test_set_3d, data_mean_3d, data_std_3d, dim_to_ignore_3d, dim_to_use_3d, train_root_positions, test_root_positions = data_utils.read_3d_data( 411 | actions, FLAGS.data_dir, FLAGS.camera_frame, rcams, FLAGS.predict_14 ) 412 | 413 | if FLAGS.use_sh: 414 | train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.read_2d_predictions(actions, FLAGS.data_dir) 415 | else: 416 | train_set_2d, test_set_2d, data_mean_2d, data_std_2d, dim_to_ignore_2d, dim_to_use_2d = data_utils.create_2d_data( actions, FLAGS.data_dir, rcams ) 417 | print( "done reading and normalizing data." 
) 418 | 419 | device_count = {"GPU": 0} if FLAGS.use_cpu else {"GPU": 1} 420 | with tf.Session(config=tf.ConfigProto( device_count = device_count )) as sess: 421 | # === Create the model === 422 | print("Creating %d layers of %d units." % (FLAGS.num_layers, FLAGS.linear_size)) 423 | batch_size = 128 424 | model = create_model(sess, actions, batch_size) 425 | print("Model loaded") 426 | 427 | for key2d in test_set_2d.keys(): 428 | 429 | (subj, b, fname) = key2d 430 | print( "Subject: {}, action: {}, fname: {}".format(subj, b, fname) ) 431 | 432 | # keys should be the same if 3d is in camera coordinates 433 | key3d = key2d if FLAGS.camera_frame else (subj, b, '{0}.h5'.format(fname.split('.')[0])) 434 | key3d = (subj, b, fname[:-3]) if (fname.endswith('-sh')) and FLAGS.camera_frame else key3d 435 | 436 | enc_in = test_set_2d[ key2d ] 437 | n2d, _ = enc_in.shape 438 | dec_out = test_set_3d[ key3d ] 439 | n3d, _ = dec_out.shape 440 | assert n2d == n3d 441 | 442 | # Split into about-same-size batches 443 | enc_in = np.array_split( enc_in, n2d // batch_size ) 444 | dec_out = np.array_split( dec_out, n3d // batch_size ) 445 | all_poses_3d = [] 446 | 447 | for bidx in range( len(enc_in) ): 448 | 449 | # Dropout probability 0 (keep probability 1) for sampling 450 | dp = 1.0 451 | _, _, poses3d = model.step(sess, enc_in[bidx], dec_out[bidx], dp, isTraining=False) 452 | 453 | # denormalize 454 | enc_in[bidx] = data_utils.unNormalizeData( enc_in[bidx], data_mean_2d, data_std_2d, dim_to_ignore_2d ) 455 | dec_out[bidx] = data_utils.unNormalizeData( dec_out[bidx], data_mean_3d, data_std_3d, dim_to_ignore_3d ) 456 | poses3d = data_utils.unNormalizeData( poses3d, data_mean_3d, data_std_3d, dim_to_ignore_3d ) 457 | all_poses_3d.append( poses3d ) 458 | 459 | # Put all the poses together 460 | enc_in, dec_out, poses3d = map( np.vstack, [enc_in, dec_out, all_poses_3d] ) 461 | 462 | # Convert back to world coordinates 463 | if FLAGS.camera_frame: 464 | N_CAMERAS = 4 465 | N_JOINTS_H36M = 32 466 | 467 | # Add global position back 468 | dec_out = dec_out + np.tile( test_root_positions[ key3d ], [1,N_JOINTS_H36M] ) 469 | 470 | # Load the appropriate camera 471 | subj, _, sname = key3d 472 | 473 | cname = sname.split('.')[1] # <-- camera name 474 | scams = {(subj,c+1): rcams[(subj,c+1)] for c in range(N_CAMERAS)} # cams of this subject 475 | scam_idx = [scams[(subj,c+1)][-1] for c in range(N_CAMERAS)].index( cname ) # index of camera used 476 | the_cam = scams[(subj, scam_idx+1)] # <-- the camera used 477 | R, T, f, c, k, p, name = the_cam 478 | assert name == cname 479 | 480 | def cam2world_centered(data_3d_camframe): 481 | data_3d_worldframe = cameras.camera_to_world_frame(data_3d_camframe.reshape((-1, 3)), R, T) 482 | data_3d_worldframe = data_3d_worldframe.reshape((-1, N_JOINTS_H36M*3)) 483 | # subtract root translation 484 | return data_3d_worldframe - np.tile( data_3d_worldframe[:,:3], (1,N_JOINTS_H36M) ) 485 | 486 | # Apply inverse rotation and translation 487 | dec_out = cam2world_centered(dec_out) 488 | poses3d = cam2world_centered(poses3d) 489 | 490 | # Grab a random batch to visualize 491 | enc_in, dec_out, poses3d = map( np.vstack, [enc_in, dec_out, poses3d] ) 492 | idx = np.random.permutation( enc_in.shape[0] ) 493 | enc_in, dec_out, poses3d = enc_in[idx, :], dec_out[idx, :], poses3d[idx, :] 494 | 495 | # Visualize random samples 496 | import matplotlib.gridspec as gridspec 497 | 498 | # 1080p = 1,920 x 1,080 499 | fig = plt.figure( figsize=(19.2, 10.8) ) 500 | 501 | gs1 = gridspec.GridSpec(5, 9) # 5 
rows, 9 columns 502 | gs1.update(wspace=-0.00, hspace=0.05) # set the spacing between axes. 503 | plt.axis('off') 504 | 505 | subplot_idx, exidx = 1, 1 506 | nsamples = 15 507 | for i in np.arange( nsamples ): 508 | 509 | # Plot 2d pose 510 | ax1 = plt.subplot(gs1[subplot_idx-1]) 511 | p2d = enc_in[exidx,:] 512 | viz.show2Dpose( p2d, ax1 ) 513 | ax1.invert_yaxis() 514 | 515 | # Plot 3d gt 516 | ax2 = plt.subplot(gs1[subplot_idx], projection='3d') 517 | p3d = dec_out[exidx,:] 518 | viz.show3Dpose( p3d, ax2 ) 519 | 520 | # Plot 3d predictions 521 | ax3 = plt.subplot(gs1[subplot_idx+1], projection='3d') 522 | p3d = poses3d[exidx,:] 523 | viz.show3Dpose( p3d, ax3, lcolor="#9b59b6", rcolor="#2ecc71" ) 524 | 525 | exidx = exidx + 1 526 | subplot_idx = subplot_idx + 3 527 | 528 | plt.show() 529 | 530 | def main(_): 531 | if FLAGS.sample: 532 | sample() 533 | else: 534 | train() 535 | 536 | if __name__ == "__main__": 537 | tf.app.run() 538 | -------------------------------------------------------------------------------- /src/procrustes.py: -------------------------------------------------------------------------------- 1 | 2 | def compute_similarity_transform(X, Y, compute_optimal_scale=False): 3 | """ 4 | A port of MATLAB's `procrustes` function to Numpy. 5 | Adapted from http://stackoverflow.com/a/18927641/1884420 6 | 7 | Args 8 | X: array NxM of targets, with N number of points and M point dimensionality 9 | Y: array NxM of inputs 10 | compute_optimal_scale: whether we compute optimal scale or force it to be 1 11 | 12 | Returns: 13 | d: squared error after transformation 14 | Z: transformed Y 15 | T: computed rotation 16 | b: scaling 17 | c: translation 18 | """ 19 | import numpy as np 20 | 21 | muX = X.mean(0) 22 | muY = Y.mean(0) 23 | 24 | X0 = X - muX 25 | Y0 = Y - muY 26 | 27 | ssX = (X0**2.).sum() 28 | ssY = (Y0**2.).sum() 29 | 30 | # centred Frobenius norm 31 | normX = np.sqrt(ssX) 32 | normY = np.sqrt(ssY) 33 | 34 | # scale to equal (unit) norm 35 | X0 = X0 / normX 36 | Y0 = Y0 / normY 37 | 38 | # optimum rotation matrix of Y 39 | A = np.dot(X0.T, Y0) 40 | U,s,Vt = np.linalg.svd(A,full_matrices=False) 41 | V = Vt.T 42 | T = np.dot(V, U.T) 43 | 44 | # Make sure we have a rotation 45 | detT = np.linalg.det(T) 46 | V[:,-1] *= np.sign( detT ) 47 | s[-1] *= np.sign( detT ) 48 | T = np.dot(V, U.T) 49 | 50 | traceTA = s.sum() 51 | 52 | if compute_optimal_scale: # Compute optimum scaling of Y. 53 | b = traceTA * normX / normY 54 | d = 1 - traceTA**2 55 | Z = normX*traceTA*np.dot(Y0, T) + muX 56 | else: # If no scaling allowed 57 | b = 1 58 | d = 1 + ssY/ssX - 2 * traceTA * normY / normX 59 | Z = normY*np.dot(Y0, T) + muX 60 | 61 | c = muX - b*np.dot(muY, T) 62 | 63 | return d, Z, T, b, c 64 | -------------------------------------------------------------------------------- /src/viz.py: -------------------------------------------------------------------------------- 1 | 2 | """Functions to visualize human poses""" 3 | 4 | import matplotlib.pyplot as plt 5 | import data_utils 6 | import numpy as np 7 | import h5py 8 | import os 9 | from mpl_toolkits.mplot3d import Axes3D 10 | 11 | def show3Dpose(channels, ax, lcolor="#3498db", rcolor="#e74c3c", add_labels=False): # blue, orange 12 | """ 13 | Visualize a 3d skeleton 14 | 15 | Args 16 | channels: 96x1 vector. The pose to plot. 17 | ax: matplotlib 3d axis to draw on 18 | lcolor: color for left part of the body 19 | rcolor: color for right part of the body 20 | add_labels: whether to add coordinate labels 21 | Returns 22 | Nothing. Draws on ax. 
23 | """ 24 | 25 | assert channels.size == len(data_utils.H36M_NAMES)*3, "channels should have 96 entries, it has %d instead" % channels.size 26 | vals = np.reshape( channels, (len(data_utils.H36M_NAMES), -1) ) 27 | 28 | I = np.array([1,2,3,1,7,8,1, 13,14,15,14,18,19,14,26,27])-1 # start points 29 | J = np.array([2,3,4,7,8,9,13,14,15,16,18,19,20,26,27,28])-1 # end points 30 | LR = np.array([1,1,1,0,0,0,0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=bool) 31 | 32 | # Make connection matrix 33 | for i in np.arange( len(I) ): 34 | x, y, z = [np.array( [vals[I[i], j], vals[J[i], j]] ) for j in range(3)] 35 | ax.plot(x, y, z, lw=2, c=lcolor if LR[i] else rcolor) 36 | 37 | RADIUS = 750 # space around the subject 38 | xroot, yroot, zroot = vals[0,0], vals[0,1], vals[0,2] 39 | ax.set_xlim3d([-RADIUS+xroot, RADIUS+xroot]) 40 | ax.set_zlim3d([-RADIUS+zroot, RADIUS+zroot]) 41 | ax.set_ylim3d([-RADIUS+yroot, RADIUS+yroot]) 42 | 43 | if add_labels: 44 | ax.set_xlabel("x") 45 | ax.set_ylabel("y") 46 | ax.set_zlabel("z") 47 | 48 | # Get rid of the ticks and tick labels 49 | ax.set_xticks([]) 50 | ax.set_yticks([]) 51 | ax.set_zticks([]) 52 | 53 | ax.get_xaxis().set_ticklabels([]) 54 | ax.get_yaxis().set_ticklabels([]) 55 | ax.set_zticklabels([]) 56 | ax.set_aspect('equal') 57 | 58 | # Get rid of the panes (actually, make them white) 59 | white = (1.0, 1.0, 1.0, 0.0) 60 | ax.w_xaxis.set_pane_color(white) 61 | ax.w_yaxis.set_pane_color(white) 62 | # Keep z pane 63 | 64 | # Get rid of the lines in 3d 65 | ax.w_xaxis.line.set_color(white) 66 | ax.w_yaxis.line.set_color(white) 67 | ax.w_zaxis.line.set_color(white) 68 | 69 | def show2Dpose(channels, ax, lcolor="#3498db", rcolor="#e74c3c", add_labels=False): 70 | """ 71 | Visualize a 2d skeleton 72 | 73 | Args 74 | channels: 64x1 vector. The pose to plot. 75 | ax: matplotlib axis to draw on 76 | lcolor: color for left part of the body 77 | rcolor: color for right part of the body 78 | add_labels: whether to add coordinate labels 79 | Returns 80 | Nothing. Draws on ax. 81 | """ 82 | 83 | assert channels.size == len(data_utils.H36M_NAMES)*2, "channels should have 64 entries, it has %d instead" % channels.size 84 | vals = np.reshape( channels, (len(data_utils.H36M_NAMES), -1) ) 85 | 86 | I = np.array([1,2,3,1,7,8,1, 13,14,14,18,19,14,26,27])-1 # start points 87 | J = np.array([2,3,4,7,8,9,13,14,16,18,19,20,26,27,28])-1 # end points 88 | LR = np.array([1,1,1,0,0,0,0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=bool) 89 | 90 | # Make connection matrix 91 | for i in np.arange( len(I) ): 92 | x, y = [np.array( [vals[I[i], j], vals[J[i], j]] ) for j in range(2)] 93 | ax.plot(x, y, lw=2, c=lcolor if LR[i] else rcolor) 94 | 95 | # Get rid of the ticks 96 | ax.set_xticks([]) 97 | ax.set_yticks([]) 98 | 99 | # Get rid of tick labels 100 | ax.get_xaxis().set_ticklabels([]) 101 | ax.get_yaxis().set_ticklabels([]) 102 | 103 | RADIUS = 350 # space around the subject 104 | xroot, yroot = vals[0,0], vals[0,1] 105 | ax.set_xlim([-RADIUS+xroot, RADIUS+xroot]) 106 | ax.set_ylim([-RADIUS+yroot, RADIUS+yroot]) 107 | if add_labels: 108 | ax.set_xlabel("x") 109 | ax.set_ylabel("z") 110 | 111 | ax.set_aspect('equal') 112 | --------------------------------------------------------------------------------