# Human Pose Estimation 101

If you want a slightly more rigorous tutorial and want to understand the basics of Human Pose Estimation and how the field has evolved, check out these articles I published on [2D Pose Estimation](https://blog.nanonets.com/human-pose-estimation-2d-guide/?utm_source=github&utm_medium=social&utm_campaign=pose&utm_content=cbsudux) and [3D Pose Estimation](https://blog.nanonets.com/human-pose-estimation-3d-guide/)


## Table of Contents
- [Basics](#basics)
- [Loss](#loss)
- [Evaluation metrics](#evaluation-metrics)
  - [PCP](#percentage-of-correct-parts---pcp)
  - [PCK](#percentage-of-correct-key-points---pck)
  - [PDJ](#percentage-of-detected-joints---pdj)
  - [MPJPE](#mean-per-joint-position-error---mpjpe)
  - [AUC](#auc)
- [Important applications](#important-applications)
- [Extra](#informative-roadmap-on-2d-human-pose-estimation-research)


## Basics
- Defined as the problem of localizing human joints, also called keypoints
- A rigid body consists of joints and rigid parts. A body with strong articulation is one capable of strong contortion.
- Pose estimation is the search for a specific pose in the space of all articulated poses
- The number of keypoints varies with the dataset: LSP has 14, MPII has 16, and Human3.6M uses 16
- Classified into 2D and 3D pose estimation
  - __2D Pose Estimation__
    - Estimate a 2D pose, i.e. (x, y) coordinates for each joint in pixel space, from an RGB image
  - __3D Pose Estimation__
    - Estimate a 3D pose, i.e. (x, y, z) coordinates in metric space, from an RGB image or, in earlier works, from an RGB-D sensor.
      (However, research in the past few years has focused heavily on generating 3D poses from 2D images / 2D videos)

## Loss
- The most commonly used loss function is Mean Squared Error, MSE (Least Squares loss)
- This is a regression problem: the model tries to regress to the correct coordinates, i.e. move toward the ground-truth coordinates in small increments. The model is trained to output continuous coordinates using a Mean Squared Error loss function

## Evaluation metrics

### Percentage of Correct Parts - PCP
- A limb is considered detected (a correct part) if the distance between the two predicted joint locations and the true limb joint locations is at most half the limb length (PCP at 0.5)
- Measures the detection rate of limbs
- Cons - penalizes shorter limbs
- __Calculation__
  - For a specific part, PCP = (No. of correct parts for entire dataset) / (No. of total parts for entire dataset)
  - Take a dataset with 10 images and 1 pose per image.
    Each pose has 8 parts: (upper arm, lower arm, upper leg, lower leg) x 2
    - No. of upper arms = 10 * 2 = 20
    - No. of lower arms = 20
    - No. of lower legs = No. of upper legs = 20
    - If the upper arm is detected correctly for 17 out of the 20 upper arms (10 right and 7 left) → PCP = 17/20 = 85%
- Higher is better

### Percentage of Correct Key-points - PCK
- A detected joint is considered correct if the distance between the predicted and the true joint is within a certain threshold (the threshold varies)
- PCKh@0.5 is when the threshold = 50% of the head bone link
- PCK@0.2 → distance between predicted and true joint < 0.2 * torso diameter
- Sometimes 150 mm is taken as the threshold
- Keypoints: head, shoulder, elbow, wrist, hip, knee, ankle
- PCK is used for both 2D and 3D (PCK3D)
- Higher is better

### Percentage of Detected Joints - PDJ
- A detected joint is considered correct if the distance between the predicted and the true joint is within a certain fraction of the torso diameter
- Alleviates the shorter-limb problem, since every joint shares the same threshold based on the torso diameter rather than on individual limb lengths
- PDJ at 0.2 → distance between predicted and true joint < 0.2 * torso diameter
- Typically used for 2D pose estimation
- Higher is better

### Mean Per Joint Position Error - MPJPE
- Per-joint position error = Euclidean distance between ground truth and prediction for a joint
- Mean per-joint position error = mean of the per-joint position errors over all k joints (typically, k = 16)
- Calculated after aligning the root joints (typically the pelvis) of the estimated and ground-truth 3D poses
- __PA MPJPE__
  - Procrustes-analysis MPJPE.
  - MPJPE calculated after the estimated 3D pose is aligned to the ground truth by the [Procrustes method](https://www.coursera.org/lecture/robotics-perception/pose-from-3d-point-correspondences-the-procrustes-problem-X22IH)
  - The Procrustes method is simply a similarity transformation
- Lower is better
- Used for 3D pose estimation

### AUC
- Area under the PCK curve: PCK is computed over a range of thresholds and the resulting accuracies are integrated, giving a single threshold-independent score
- Higher is better

## Important Applications
- Activity analysis
- Human-Computer Interaction (HCI)
- Virtual Reality
- Augmented Reality
- Amazon Go presents an important domain for the application of Human Pose Estimation. Cameras track and recognize people and their actions, for which pose estimation is an important component. Entities relying on services that track and measure human activities depend heavily on pose estimation

## Informative roadmap on 2D Human Pose Estimation research
- [Presentation by Wei Yang](https://www.slideshare.net/plutoyang/mmlab-seminar-2016-deep-learning-for-human-pose-estimation)
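## Toy example: computing PCK and MPJPE

The evaluation metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the 4-joint toy pose, the offsets, and the torso diameter of 20 px are made-up values chosen only to show the arithmetic.

```python
# Minimal sketch of PCK@alpha and MPJPE on a synthetic pose.
# The toy pose and torso diameter below are illustrative values, not real data.
import numpy as np

def pck(pred, gt, torso_diameter, alpha=0.2):
    """Fraction of joints whose prediction lies within alpha * torso_diameter of the ground truth."""
    dists = np.linalg.norm(pred - gt, axis=-1)  # per-joint Euclidean distances
    return float(np.mean(dists < alpha * torso_diameter))

def mpjpe(pred, gt, root_joint=0):
    """Mean per-joint Euclidean error after aligning the root joints (e.g. the pelvis)."""
    pred_aligned = pred - pred[root_joint]
    gt_aligned = gt - gt[root_joint]
    return float(np.mean(np.linalg.norm(pred_aligned - gt_aligned, axis=-1)))

# Toy 2D pose with 4 joints (units: pixels); predictions are ground truth plus small offsets
gt = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 20.0], [0.0, 20.0]])
pred = gt + np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 0.0], [0.0, 0.5]])

print(pck(pred, gt, torso_diameter=20.0))  # threshold = 0.2 * 20 = 4 px; 3 of 4 joints pass → 0.75
print(mpjpe(pred, gt))                     # mean error after root alignment, ≈ 1.63 px
```

PA-MPJPE would additionally fit a full similarity transform (rotation, scale, translation) between `pred` and `gt` before measuring the error, rather than only subtracting the root joint.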