├── README.md ├── experiments └── coco │ └── lp_net50 │ ├── 256x192_d256x2_adam_lr1e-3_lp.yaml │ └── 256x192_d256x2_adam_lr1e-3_lp_not_gcb.yaml ├── images └── 0.jpg ├── lib ├── Makefile ├── __init__.py ├── core │ ├── __pycache__ │ │ ├── config.cpython-37.pyc │ │ ├── evaluate.cpython-37.pyc │ │ ├── function.cpython-37.pyc │ │ ├── inference.cpython-37.pyc │ │ ├── loss.cpython-37.pyc │ │ └── softargmax.cpython-37.pyc │ ├── config.py │ ├── evaluate.py │ ├── function.py │ ├── inference.py │ ├── loss.py │ └── softargmax.py ├── dataset │ ├── JointsDataset.py │ ├── JointsDataset_test.py │ ├── __init__.py │ ├── __pycache__ │ │ ├── JointsDataset.cpython-37.pyc │ │ ├── __init__.cpython-37.pyc │ │ ├── coco.cpython-37.pyc │ │ └── mpii.cpython-37.pyc │ ├── coco.py │ ├── coco_test.py │ └── mpii.py ├── models │ ├── Untitled1.ipynb │ ├── __init__.py │ ├── __pycache__ │ │ ├── __init__.cpython-37.pyc │ │ ├── lp_net.cpython-37.pyc │ │ └── pose_resnet.cpython-37.pyc │ ├── ct │ │ ├── __init__.py │ │ ├── __pycache__ │ │ │ ├── __init__.cpython-37.pyc │ │ │ └── context_block.cpython-37.pyc │ │ └── context_block.py │ ├── lp_net.py │ └── pose_resnet.py ├── nms │ ├── __init__.py │ ├── __pycache__ │ │ ├── __init__.cpython-37.pyc │ │ └── nms.cpython-37.pyc │ ├── cpu_nms.c │ ├── cpu_nms.cpython-37m-x86_64-linux-gnu.so │ ├── cpu_nms.pyx │ ├── gpu_nms.cpp │ ├── gpu_nms.cpython-37m-x86_64-linux-gnu.so │ ├── gpu_nms.hpp │ ├── gpu_nms.pyx │ ├── nms.py │ ├── nms_kernel.cu │ └── setup.py └── utils │ ├── __init__.py │ ├── __pycache__ │ ├── __init__.cpython-37.pyc │ ├── transforms.cpython-37.pyc │ ├── utils.cpython-37.pyc │ └── vis.cpython-37.pyc │ ├── transforms.py │ ├── utils.py │ ├── vis.py │ └── zipreader.py ├── models └── lp_coco │ ├── lp_net_50_256x192_with_gcb.pth.tar │ └── lp_net_50_256x192_without_gcb.pth.tar ├── pose_estimation ├── __pycache__ │ └── _init_paths.cpython-37.pyc ├── _init_paths.py ├── demo.py ├── train.py └── valid.py └── requirements.txt /README.md: -------------------------------------------------------------------------------- 1 | # Simple and Lightwight Human Pose Estimation 2 | 3 | ## Introduction 4 | On COCO keypoints valid dataset, if with_gcb module achieves **66.5 of mAP**, else **64.4 of mAp**
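The `with_gcb` variant enables the global context block (GCB) shipped in `lib/models/ct/context_block.py` and switched by `MODEL.EXTRA.USE_GCB` in the experiment yaml. For orientation only, a minimal GCNet-style context block is sketched below; it assumes the standard GCNet formulation (softmax spatial pooling followed by a bottleneck channel transform, fused back by addition), so the class name, `reduction=16`, and other details are illustrative rather than a copy of this repo's module.
```
# Minimal sketch of a GCNet-style global context block (illustrative only;
# the module actually used by this repo lives in lib/models/ct/context_block.py).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.conv_mask = nn.Conv2d(channels, 1, kernel_size=1)   # spatial attention logits
        self.transform = nn.Sequential(                          # bottleneck channel transform
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.size()
        # global context: softmax-weighted pooling over all spatial positions
        attn = F.softmax(self.conv_mask(x).view(b, 1, h * w), dim=2).unsqueeze(3)
        context = torch.matmul(x.view(b, c, h * w).unsqueeze(1), attn).view(b, c, 1, 1)
        # fuse: add the transformed context onto every spatial position
        return x + self.transform(context)
```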
5 | 6 | ## Main Results 7 | ### Results on COCO val2017 dataset 8 | | Arch | with_GCB | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) | 9 | | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | 10 | | 256x192_lp_net_50_d256d256 | **yes** | 0.665 | 0.903 | 0.746 | 0.644 | 0.697 | 0.700 | 0.911 | 0.771 | 0.672 | 0.743 | 11 | | 256x192_lp_net_50_d256d256 | **no** | 0.644 | 0.885 | 0.715 | 0.619 | 0.685 | 0.679 | 0.898 | 0.742 | 0.647 | 0.725 | 12 | 13 | ### Note: 14 | - Flip test is used. 15 | 16 | ## Environment 17 | The code is developed using python 3.6 on Ubuntu 16.04 and requires NVIDIA GPUs; it is developed and tested with 4 NVIDIA P100 GPU cards. Other platforms or GPU cards are not fully tested. 18 | 19 | ## Quick start 20 | ### Installation 21 | 1. Install pytorch >= v0.4.0 following the [official instructions](https://pytorch.org/). 22 | 2. Disable cudnn for batch_norm: 23 | ``` 24 | # PYTORCH=/path/to/pytorch 25 | # for pytorch v0.4.0 26 | sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py 27 | # for pytorch v0.4.1 28 | sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py 29 | ``` 30 | Note that instructions like # PYTORCH=/path/to/pytorch indicate that you should pick a path where you'd like to have pytorch installed and then set an environment variable (PYTORCH in this case) accordingly. 31 | 3. Clone this repo; we'll call the cloned directory ${POSE_ROOT}. 32 | 4. Install dependencies: 33 | ``` 34 | pip install -r requirements.txt 35 | ``` 36 | 5. Make libs: 37 | ``` 38 | cd ${POSE_ROOT}/lib 39 | make 40 | ``` 41 | 6. Install [COCOAPI](https://github.com/cocodataset/cocoapi): 42 | ``` 43 | # COCOAPI=/path/to/clone/cocoapi 44 | git clone https://github.com/cocodataset/cocoapi.git $COCOAPI 45 | cd $COCOAPI/PythonAPI 46 | # Install into global site-packages 47 | make install 48 | # Alternatively, if you do not have permissions or prefer 49 | # not to install the COCO API into global site-packages 50 | python3 setup.py install --user 51 | ``` 52 | Note that instructions like # COCOAPI=/path/to/clone/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly. 53 | 7. Download the COCO pretrained models and put them under ${POSE_ROOT}/models/lp_coco, so that the directory looks like this: 54 | 55 | ``` 56 | ${POSE_ROOT} 57 | `-- models 58 | `-- lp_coco 59 | |-- lp_net_50_256x192_with_gcb.pth.tar 60 | `-- lp_net_50_256x192_without_gcb.pth.tar 61 | ``` 62 | 63 | 8. Init the output (training model output) and log (tensorboard log) directories: 64 | 65 | ``` 66 | mkdir output 67 | mkdir log 68 | ``` 69 | 70 | Your directory tree should look like this: 71 | 72 | ``` 73 | ${POSE_ROOT} 74 | ├── data 75 | ├── images 76 | ├── experiments 77 | ├── lib 78 | ├── log 79 | ├── models 80 | ├── output 81 | ├── pose_estimation 82 | ├── README.md 83 | └── requirements.txt 84 | ``` 85 | 86 | ### Data preparation 87 | **For COCO data**, please download from [COCO download](http://cocodataset.org/#download); 2017 Train/Val is needed for COCO keypoints training and validation. We also provide the person detection results on COCO val2017 to reproduce our multi-person pose estimation results. Please download from [OneDrive](https://1drv.ms/f/s!AhIXJn_J-blWzzDXoz5BeFl8sWM-) or [GoogleDrive](https://drive.google.com/drive/folders/1fRUDNUDxe9fjqcRZ2bnF_TKMlO0nB_dk?usp=sharing). &#10;
88 | Download and extract them under {POSE_ROOT}/data, and make them look like this: 89 | ``` 90 | ${POSE_ROOT} 91 | |-- data 92 | `-- |-- coco 93 | `-- |-- annotations 94 | | |-- person_keypoints_train2017.json 95 | | `-- person_keypoints_val2017.json 96 | |-- person_detection_results 97 | | |-- COCO_val2017_detections_AP_H_56_person.json 98 | `-- images 99 | |-- train2017 100 | | |-- 000000000009.jpg 101 | | |-- 000000000025.jpg 102 | | |-- 000000000030.jpg 103 | | |-- ... 104 | `-- val2017 105 | |-- 000000000139.jpg 106 | |-- 000000000285.jpg 107 | |-- 000000000632.jpg 108 | |-- ... 109 | ``` 110 | ### Valid on COCO val2017 using pretrained models 111 | 112 | ``` 113 | python pose_estimation/valid.py \ 114 | --cfg experiments/coco/lp_net50/256x192_d256x2_adam_lr1e-3_lp.yaml \ 115 | --flip-test \ 116 | --model-file models/lp_coco/lp_net_50_256x192_with_gcb.pth.tar 117 | ``` 118 | 119 | ### Training on COCO train2017 120 | 121 | ``` 122 | python pose_estimation/train.py \ 123 | --cfg experiments/coco/lp_net50/256x192_d256x2_adam_lr1e-3_lp.yaml 124 | ``` 125 | ### Demo 126 | The region of human need to be given in ```demo.py``` 127 | 128 | ``` 129 | python pose_estimation/demo.py \ 130 | --cfg experiments/coco/lp_net50/256x192_d256x2_adam_lr1e-3_lp.yaml \ 131 | --model-file ./models/lp_coco/lp_net_50_256x192_with_gcb.pth.tar 132 | --img-file ./images/0.jpg 133 | ``` 134 | -------------------------------------------------------------------------------- /experiments/coco/lp_net50/256x192_d256x2_adam_lr1e-3_lp.yaml: -------------------------------------------------------------------------------- 1 | GPUS: '0' 2 | DATA_DIR: '' 3 | OUTPUT_DIR: 'output' 4 | LOG_DIR: 'log' 5 | WORKERS: 4 6 | PRINT_FREQ: 100 7 | 8 | DATASET: 9 | DATASET: 'coco' 10 | ROOT: 'data/coco/' 11 | TEST_SET: 'val2017' 12 | TRAIN_SET: 'train2017' 13 | FLIP: true 14 | ROT_FACTOR: 40 15 | SCALE_FACTOR: 0.3 16 | MODEL: 17 | NAME: 'lp_net' 18 | INIT_WEIGHTS: False 19 | IMAGE_SIZE: 20 | - 192 21 | - 256 22 | NUM_JOINTS: 17 23 | EXTRA: 24 | USE_GCB: true 25 | TARGET_TYPE: 'gaussian' 26 | HEATMAP_SIZE: 27 | - 48 28 | - 64 29 | SIGMA: 2 30 | FINAL_CONV_KERNEL: 1 31 | DECONV_WITH_BIAS: false 32 | NUM_DECONV_LAYERS: 2 33 | NUM_DECONV_FILTERS: 34 | - 256 35 | - 256 36 | NUM_DECONV_KERNELS: 37 | - 4 38 | - 4 39 | NUM_LAYERS: 50 40 | LOSS: 41 | USE_TARGET_WEIGHT: true 42 | TRAIN: 43 | BATCH_SIZE: 32 44 | SHUFFLE: true 45 | BEGIN_EPOCH: 0 46 | END_EPOCH: 150 47 | RESUME: True 48 | OPTIMIZER: 'adam' 49 | LR: 0.001 50 | LR_FACTOR: 0.1 51 | LR_STEP: 52 | - 90 53 | - 120 54 | WD: 0.0001 55 | GAMMA1: 0.99 56 | GAMMA2: 0.0 57 | MOMENTUM: 0.9 58 | NESTEROV: false 59 | TEST: 60 | BATCH_SIZE: 32 61 | COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json' 62 | BBOX_THRE: 1.0 63 | FLIP_TEST: false 64 | IMAGE_THRE: 0.0 65 | IN_VIS_THRE: 0.2 66 | MODEL_FILE: '' 67 | NMS_THRE: 1.0 68 | OKS_THRE: 0.9 69 | USE_GT_BBOX: true 70 | DEBUG: 71 | DEBUG: true 72 | SAVE_BATCH_IMAGES_GT: true 73 | SAVE_BATCH_IMAGES_PRED: true 74 | SAVE_HEATMAPS_GT: true 75 | SAVE_HEATMAPS_PRED: true 76 | -------------------------------------------------------------------------------- /experiments/coco/lp_net50/256x192_d256x2_adam_lr1e-3_lp_not_gcb.yaml: -------------------------------------------------------------------------------- 1 | GPUS: '0' 2 | DATA_DIR: '' 3 | OUTPUT_DIR: 'output' 4 | LOG_DIR: 'log' 5 | WORKERS: 4 6 | PRINT_FREQ: 100 7 | 8 | DATASET: 9 | DATASET: 'coco' 10 | ROOT: 'data/coco/' 11 | TEST_SET: 'val2017' 12 | 
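# TRAIN_SET / TEST_SET name the COCO splits: the image folders and the matching annotations/person_keypoints_*.json files under ROOT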
TRAIN_SET: 'train2017' 13 | FLIP: true 14 | ROT_FACTOR: 40 15 | SCALE_FACTOR: 0.3 16 | MODEL: 17 | NAME: 'lp_net' 18 | INIT_WEIGHTS: False 19 | IMAGE_SIZE: 20 | - 192 21 | - 256 22 | NUM_JOINTS: 17 23 | EXTRA: 24 | USE_GCB: false 25 | TARGET_TYPE: 'gaussian' 26 | HEATMAP_SIZE: 27 | - 48 28 | - 64 29 | SIGMA: 2 30 | FINAL_CONV_KERNEL: 1 31 | DECONV_WITH_BIAS: false 32 | NUM_DECONV_LAYERS: 2 33 | NUM_DECONV_FILTERS: 34 | - 256 35 | - 256 36 | NUM_DECONV_KERNELS: 37 | - 4 38 | - 4 39 | NUM_LAYERS: 50 40 | LOSS: 41 | USE_TARGET_WEIGHT: true 42 | TRAIN: 43 | BATCH_SIZE: 32 44 | SHUFFLE: true 45 | BEGIN_EPOCH: 0 46 | END_EPOCH: 150 47 | RESUME: True 48 | OPTIMIZER: 'adam' 49 | LR: 0.001 50 | LR_FACTOR: 0.1 51 | LR_STEP: 52 | - 90 53 | - 120 54 | WD: 0.0001 55 | GAMMA1: 0.99 56 | GAMMA2: 0.0 57 | MOMENTUM: 0.9 58 | NESTEROV: false 59 | TEST: 60 | BATCH_SIZE: 32 61 | COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json' 62 | BBOX_THRE: 1.0 63 | FLIP_TEST: false 64 | IMAGE_THRE: 0.0 65 | IN_VIS_THRE: 0.2 66 | MODEL_FILE: '' 67 | NMS_THRE: 1.0 68 | OKS_THRE: 0.9 69 | USE_GT_BBOX: true 70 | DEBUG: 71 | DEBUG: true 72 | SAVE_BATCH_IMAGES_GT: true 73 | SAVE_BATCH_IMAGES_PRED: true 74 | SAVE_HEATMAPS_GT: true 75 | SAVE_HEATMAPS_PRED: true 76 | -------------------------------------------------------------------------------- /images/0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/images/0.jpg -------------------------------------------------------------------------------- /lib/Makefile: -------------------------------------------------------------------------------- 1 | all: 2 | cd nms; python setup.py build_ext --inplace; rm -rf build; cd ../../ 3 | clean: 4 | cd nms; rm *.so; cd ../../ 5 | -------------------------------------------------------------------------------- /lib/__init__.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | from dataset.coco import COCODataset -------------------------------------------------------------------------------- /lib/core/__pycache__/config.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/core/__pycache__/config.cpython-37.pyc -------------------------------------------------------------------------------- /lib/core/__pycache__/evaluate.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/core/__pycache__/evaluate.cpython-37.pyc -------------------------------------------------------------------------------- /lib/core/__pycache__/function.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/core/__pycache__/function.cpython-37.pyc -------------------------------------------------------------------------------- /lib/core/__pycache__/inference.cpython-37.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/core/__pycache__/inference.cpython-37.pyc -------------------------------------------------------------------------------- /lib/core/__pycache__/loss.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/core/__pycache__/loss.cpython-37.pyc -------------------------------------------------------------------------------- /lib/core/__pycache__/softargmax.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/core/__pycache__/softargmax.cpython-37.pyc -------------------------------------------------------------------------------- /lib/core/config.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import os 12 | import yaml 13 | 14 | import numpy as np 15 | from easydict import EasyDict as edict 16 | 17 | 18 | config = edict() 19 | 20 | config.OUTPUT_DIR = '' 21 | config.LOG_DIR = '' 22 | config.DATA_DIR = '' 23 | config.GPUS = '0' 24 | config.WORKERS = 4 25 | config.PRINT_FREQ = 20 26 | 27 | # Cudnn related params 28 | config.CUDNN = edict() 29 | config.CUDNN.BENCHMARK = True 30 | config.CUDNN.DETERMINISTIC = False 31 | config.CUDNN.ENABLED = True 32 | 33 | # pose_resnet related params 34 | POSE_RESNET = edict() 35 | POSE_RESNET.NUM_LAYERS = 50 36 | POSE_RESNET.DECONV_WITH_BIAS = False 37 | POSE_RESNET.NUM_DECONV_LAYERS = 3 38 | POSE_RESNET.NUM_DECONV_FILTERS = [256, 256, 256] 39 | POSE_RESNET.NUM_DECONV_KERNELS = [4, 4, 4] 40 | POSE_RESNET.FINAL_CONV_KERNEL = 1 41 | POSE_RESNET.TARGET_TYPE = 'gaussian' 42 | POSE_RESNET.HEATMAP_SIZE = [64, 64] # width * height, ex: 24 * 32 43 | POSE_RESNET.SIGMA = 2 44 | 45 | LP_NET = edict() 46 | LP_NET.NUM_LAYERS = 50 47 | LP_NET.DECONV_WITH_BIAS = False 48 | LP_NET.NUM_DECONV_LAYERS = 3 49 | LP_NET.NUM_DECONV_FILTERS = [256, 256, 256] 50 | LP_NET.NUM_DECONV_KERNELS = [4, 4, 4] 51 | LP_NET.FINAL_CONV_KERNEL = 1 52 | LP_NET.TARGET_TYPE = 'gaussian' 53 | LP_NET.HEATMAP_SIZE = [64, 64] # width * height, ex: 24 * 32 54 | LP_NET.SIGMA = 2 55 | LP_NET.USE_GCB = True 56 | 57 | MODEL_EXTRAS = { 58 | 'pose_resnet': POSE_RESNET, 59 | 'lp_net': LP_NET, 60 | } 61 | 62 | # common params for NETWORK 63 | config.MODEL = edict() 64 | config.MODEL.NAME = 'pose_resnet' 65 | config.MODEL.INIT_WEIGHTS = True 66 | config.MODEL.PRETRAINED = '' 67 | config.MODEL.NUM_JOINTS = 16 68 | config.MODEL.IMAGE_SIZE = [256, 256] # width * height, ex: 192 * 256 69 | config.MODEL.EXTRA = MODEL_EXTRAS[config.MODEL.NAME] 70 | 71 | config.MODEL.STYLE = '' 72 | 73 | config.LOSS = edict() 74 | config.LOSS.USE_TARGET_WEIGHT = True 75 | 76 | # DATASET related params 77 | config.DATASET = edict() 78 | config.DATASET.ROOT = '' 79 | config.DATASET.DATASET = 'mpii' 80 | config.DATASET.TRAIN_SET = 'train' 81 | config.DATASET.TEST_SET = 'valid' 82 
| config.DATASET.DATA_FORMAT = 'jpg' 83 | config.DATASET.HYBRID_JOINTS_TYPE = '' 84 | config.DATASET.SELECT_DATA = False 85 | 86 | # training data augmentation 87 | config.DATASET.FLIP = True 88 | config.DATASET.SCALE_FACTOR = 0.25 89 | config.DATASET.ROT_FACTOR = 30 90 | 91 | # train 92 | config.TRAIN = edict() 93 | 94 | config.TRAIN.LR_FACTOR = 0.1 95 | config.TRAIN.LR_STEP = [90, 110] 96 | config.TRAIN.LR = 0.001 97 | 98 | config.TRAIN.OPTIMIZER = 'adam' 99 | config.TRAIN.MOMENTUM = 0.9 100 | config.TRAIN.WD = 0.0001 101 | config.TRAIN.NESTEROV = False 102 | config.TRAIN.GAMMA1 = 0.99 103 | config.TRAIN.GAMMA2 = 0.0 104 | 105 | config.TRAIN.BEGIN_EPOCH = 0 106 | config.TRAIN.END_EPOCH = 140 107 | 108 | config.TRAIN.RESUME = False 109 | config.TRAIN.CHECKPOINT = '' 110 | 111 | config.TRAIN.BATCH_SIZE = 32 112 | config.TRAIN.SHUFFLE = True 113 | 114 | # testing 115 | config.TEST = edict() 116 | 117 | # size of images for each device 118 | config.TEST.BATCH_SIZE = 32 119 | # Test Model Epoch 120 | config.TEST.FLIP_TEST = False 121 | config.TEST.POST_PROCESS = True 122 | config.TEST.SHIFT_HEATMAP = True 123 | 124 | config.TEST.USE_GT_BBOX = False 125 | # nms 126 | config.TEST.OKS_THRE = 0.5 127 | config.TEST.IN_VIS_THRE = 0.0 128 | config.TEST.COCO_BBOX_FILE = '' 129 | config.TEST.BBOX_THRE = 1.0 130 | config.TEST.MODEL_FILE = '' 131 | config.TEST.IMAGE_THRE = 0.0 132 | config.TEST.NMS_THRE = 1.0 133 | 134 | # debug 135 | config.DEBUG = edict() 136 | config.DEBUG.DEBUG = False 137 | config.DEBUG.SAVE_BATCH_IMAGES_GT = False 138 | config.DEBUG.SAVE_BATCH_IMAGES_PRED = False 139 | config.DEBUG.SAVE_HEATMAPS_GT = False 140 | config.DEBUG.SAVE_HEATMAPS_PRED = False 141 | 142 | 143 | def _update_dict(k, v): 144 | if k == 'DATASET': 145 | if 'MEAN' in v and v['MEAN']: 146 | v['MEAN'] = np.array([eval(x) if isinstance(x, str) else x 147 | for x in v['MEAN']]) 148 | if 'STD' in v and v['STD']: 149 | v['STD'] = np.array([eval(x) if isinstance(x, str) else x 150 | for x in v['STD']]) 151 | if k == 'MODEL': 152 | if 'EXTRA' in v and 'HEATMAP_SIZE' in v['EXTRA']: 153 | if isinstance(v['EXTRA']['HEATMAP_SIZE'], int): 154 | v['EXTRA']['HEATMAP_SIZE'] = np.array( 155 | [v['EXTRA']['HEATMAP_SIZE'], v['EXTRA']['HEATMAP_SIZE']]) 156 | else: 157 | v['EXTRA']['HEATMAP_SIZE'] = np.array( 158 | v['EXTRA']['HEATMAP_SIZE']) 159 | if 'IMAGE_SIZE' in v: 160 | if isinstance(v['IMAGE_SIZE'], int): 161 | v['IMAGE_SIZE'] = np.array([v['IMAGE_SIZE'], v['IMAGE_SIZE']]) 162 | else: 163 | v['IMAGE_SIZE'] = np.array(v['IMAGE_SIZE']) 164 | for vk, vv in v.items(): 165 | if vk in config[k]: 166 | config[k][vk] = vv 167 | else: 168 | raise ValueError("{}.{} not exist in config.py".format(k, vk)) 169 | 170 | 171 | def update_config(config_file): 172 | exp_config = None 173 | with open(config_file) as f: 174 | exp_config = edict(yaml.load(f)) 175 | for k, v in exp_config.items(): 176 | if k in config: 177 | if isinstance(v, dict): 178 | _update_dict(k, v) 179 | else: 180 | if k == 'SCALES': 181 | config[k][0] = (tuple(v)) 182 | else: 183 | config[k] = v 184 | else: 185 | raise ValueError("{} not exist in config.py".format(k)) 186 | 187 | 188 | def gen_config(config_file): 189 | cfg = dict(config) 190 | for k, v in cfg.items(): 191 | if isinstance(v, edict): 192 | cfg[k] = dict(v) 193 | 194 | with open(config_file, 'w') as f: 195 | yaml.dump(dict(cfg), f, default_flow_style=False) 196 | 197 | 198 | def update_dir(model_dir, log_dir, data_dir): 199 | if model_dir: 200 | config.OUTPUT_DIR = model_dir 201 | 202 | if log_dir: 203 | 
config.LOG_DIR = log_dir 204 | 205 | if data_dir: 206 | config.DATA_DIR = data_dir 207 | 208 | config.DATASET.ROOT = os.path.join( 209 | config.DATA_DIR, config.DATASET.ROOT) 210 | 211 | config.TEST.COCO_BBOX_FILE = os.path.join( 212 | config.DATA_DIR, config.TEST.COCO_BBOX_FILE) 213 | 214 | config.MODEL.PRETRAINED = os.path.join( 215 | config.DATA_DIR, config.MODEL.PRETRAINED) 216 | 217 | 218 | def get_model_name(cfg): 219 | name = cfg.MODEL.NAME 220 | full_name = cfg.MODEL.NAME 221 | extra = cfg.MODEL.EXTRA 222 | if name in ['pose_resnet', 'lp_net']: 223 | name = '{model}_{num_layers}'.format( 224 | model=name, 225 | num_layers=extra.NUM_LAYERS) 226 | deconv_suffix = ''.join( 227 | 'd{}'.format(num_filters) 228 | for num_filters in extra.NUM_DECONV_FILTERS) 229 | full_name = '{height}x{width}_{name}_{deconv_suffix}'.format( 230 | height=cfg.MODEL.IMAGE_SIZE[1], 231 | width=cfg.MODEL.IMAGE_SIZE[0], 232 | name=name, 233 | deconv_suffix=deconv_suffix) 234 | else: 235 | raise ValueError('Unkown model: {}'.format(cfg.MODEL)) 236 | 237 | return name, full_name 238 | 239 | 240 | if __name__ == '__main__': 241 | import sys 242 | gen_config(sys.argv[1]) 243 | -------------------------------------------------------------------------------- /lib/core/evaluate.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import numpy as np 12 | 13 | from core.inference import get_max_preds 14 | 15 | 16 | def calc_dists(preds, target, normalize): 17 | preds = preds.astype(np.float32) 18 | target = target.astype(np.float32) 19 | dists = np.zeros((preds.shape[1], preds.shape[0])) 20 | for n in range(preds.shape[0]): 21 | for c in range(preds.shape[1]): 22 | if target[n, c, 0] > 1 and target[n, c, 1] > 1: 23 | normed_preds = preds[n, c, :] / normalize[n] 24 | normed_targets = target[n, c, :] / normalize[n] 25 | dists[c, n] = np.linalg.norm(normed_preds - normed_targets) 26 | else: 27 | dists[c, n] = -1 28 | return dists 29 | 30 | 31 | def dist_acc(dists, thr=0.5): 32 | ''' Return percentage below threshold while ignoring values with a -1 ''' 33 | dist_cal = np.not_equal(dists, -1) 34 | num_dist_cal = dist_cal.sum() 35 | if num_dist_cal > 0: 36 | return np.less(dists[dist_cal], thr).sum() * 1.0 / num_dist_cal 37 | else: 38 | return -1 39 | 40 | 41 | def accuracy(output, target, hm_type='gaussian', thr=0.5): 42 | ''' 43 | Calculate accuracy according to PCK, 44 | but uses ground truth heatmap rather than x,y locations 45 | First value to be returned is average accuracy across 'idxs', 46 | followed by individual accuracies 47 | ''' 48 | idx = list(range(output.shape[1])) 49 | norm = 1.0 50 | if hm_type == 'gaussian': 51 | pred, _ = get_max_preds(output) 52 | target, _ = get_max_preds(target) 53 | h = output.shape[2] 54 | w = output.shape[3] 55 | norm = np.ones((pred.shape[0], 2)) * np.array([h, w]) / 10 56 | dists = calc_dists(pred, target, norm) 57 | 58 | acc = np.zeros((len(idx) + 1)) 59 | avg_acc = 0 60 | cnt = 0 61 | 62 | for i in range(len(idx)): 63 | acc[i + 1] = dist_acc(dists[idx[i]]) 64 | if acc[i + 1] >= 0: 65 | avg_acc = avg_acc + acc[i + 1] 66 | cnt += 1 
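# joints whose distance could not be computed (dist_acc returned -1) are skipped, so the average below is taken only over the cnt joints that actually contributed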
67 | 68 | avg_acc = avg_acc / cnt if cnt != 0 else 0 69 | if cnt != 0: 70 | acc[0] = avg_acc 71 | return acc, avg_acc, cnt, pred 72 | -------------------------------------------------------------------------------- /lib/core/function.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft human-pose-estimation 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import logging 12 | import time 13 | import os 14 | 15 | import numpy as np 16 | import torch 17 | 18 | from core.config import get_model_name 19 | from core.evaluate import accuracy 20 | from core.inference import get_final_preds 21 | from utils.transforms import flip_back 22 | from utils.vis import save_debug_images 23 | 24 | 25 | logger = logging.getLogger(__name__) 26 | 27 | 28 | def train(config, train_loader, model, criterion, optimizer, epoch, 29 | output_dir, tb_log_dir, writer_dict): 30 | batch_time = AverageMeter() 31 | data_time = AverageMeter() 32 | losses = AverageMeter() 33 | acc = AverageMeter() 34 | 35 | # switch to train mode 36 | model.train() 37 | 38 | end = time.time() 39 | for i, (input, target, target_weight, meta) in enumerate(train_loader): 40 | # measure data loading time 41 | data_time.update(time.time() - end) 42 | 43 | # compute output 44 | output = model(input) 45 | target = target.cuda(non_blocking=True) 46 | target_weight = target_weight.cuda(non_blocking=True) 47 | 48 | loss = criterion(output, target, target_weight) 49 | 50 | # compute gradient and do update step 51 | optimizer.zero_grad() 52 | loss.backward() 53 | optimizer.step() 54 | 55 | # measure accuracy and record loss 56 | losses.update(loss.item(), input.size(0)) 57 | 58 | _, avg_acc, cnt, pred = accuracy(output.detach().cpu().numpy(), 59 | target.detach().cpu().numpy()) 60 | acc.update(avg_acc, cnt) 61 | 62 | # measure elapsed time 63 | batch_time.update(time.time() - end) 64 | end = time.time() 65 | 66 | if i % config.PRINT_FREQ == 0: 67 | msg = 'Epoch: [{0}][{1}/{2}]\t' \ 68 | 'Time {batch_time.val:.3f}s ({batch_time.avg:.3f}s)\t' \ 69 | 'Speed {speed:.1f} samples/s\t' \ 70 | 'Data {data_time.val:.3f}s ({data_time.avg:.3f}s)\t' \ 71 | 'Loss {loss.val:.5f} ({loss.avg:.5f})\t' \ 72 | 'Accuracy {acc.val:.3f} ({acc.avg:.3f})'.format( 73 | epoch, i, len(train_loader), batch_time=batch_time, 74 | speed=input.size(0)/batch_time.val, 75 | data_time=data_time, loss=losses, acc=acc) 76 | logger.info(msg) 77 | 78 | writer = writer_dict['writer'] 79 | global_steps = writer_dict['train_global_steps'] 80 | writer.add_scalar('train_loss', losses.val, global_steps) 81 | writer.add_scalar('train_acc', acc.val, global_steps) 82 | writer_dict['train_global_steps'] = global_steps + 1 83 | 84 | prefix = '{}_{}'.format(os.path.join(output_dir, 'train'), i) 85 | save_debug_images(config, input, meta, target, pred*4, output, 86 | prefix) 87 | 88 | 89 | def validate(config, val_loader, val_dataset, model, criterion, output_dir, 90 | tb_log_dir, writer_dict=None): 91 | batch_time = AverageMeter() 92 | losses = AverageMeter() 93 | acc = AverageMeter() 94 | 95 | # switch to evaluate mode 96 | model.eval() 97 | 98 | num_samples = len(val_dataset) 99 | all_preds = 
np.zeros((num_samples, config.MODEL.NUM_JOINTS, 3), 100 | dtype=np.float32) 101 | all_boxes = np.zeros((num_samples, 6)) 102 | image_path = [] 103 | filenames = [] 104 | imgnums = [] 105 | idx = 0 106 | with torch.no_grad(): 107 | end = time.time() 108 | for i, (input, target, target_weight, meta) in enumerate(val_loader): 109 | # compute output 110 | output = model(input) 111 | if config.TEST.FLIP_TEST: 112 | # this part is ugly, because pytorch has not supported negative index 113 | # input_flipped = model(input[:, :, :, ::-1]) 114 | input_flipped = np.flip(input.cpu().numpy(), 3).copy() 115 | input_flipped = torch.from_numpy(input_flipped).cuda() 116 | output_flipped = model(input_flipped) 117 | output_flipped = flip_back(output_flipped.cpu().numpy(), 118 | val_dataset.flip_pairs) 119 | output_flipped = torch.from_numpy(output_flipped.copy()).cuda() 120 | 121 | # feature is not aligned, shift flipped heatmap for higher accuracy 122 | if config.TEST.SHIFT_HEATMAP: 123 | output_flipped[:, :, :, 1:] = \ 124 | output_flipped.clone()[:, :, :, 0:-1] 125 | # output_flipped[:, :, :, 0] = 0 126 | 127 | output = (output + output_flipped) * 0.5 128 | 129 | target = target.cuda(non_blocking=True) 130 | target_weight = target_weight.cuda(non_blocking=True) 131 | 132 | loss = criterion(output, target, target_weight) 133 | 134 | num_images = input.size(0) 135 | # measure accuracy and record loss 136 | losses.update(loss.item(), num_images) 137 | _, avg_acc, cnt, pred = accuracy(output.cpu().numpy(), 138 | target.cpu().numpy()) 139 | 140 | acc.update(avg_acc, cnt) 141 | 142 | # measure elapsed time 143 | batch_time.update(time.time() - end) 144 | end = time.time() 145 | 146 | c = meta['center'].numpy() 147 | s = meta['scale'].numpy() 148 | score = meta['score'].numpy() 149 | 150 | # if i == 0: 151 | # np.save('c.npy', c) 152 | # np.save('s.npy', s) 153 | # np.save('output.npy', output.clone().cpu().numpy()) 154 | 155 | preds, maxvals = get_final_preds( 156 | config, output.clone().cpu().numpy(), c, s) 157 | 158 | all_preds[idx:idx + num_images, :, 0:2] = preds[:, :, 0:2] 159 | all_preds[idx:idx + num_images, :, 2:3] = maxvals 160 | # double check this all_boxes parts 161 | all_boxes[idx:idx + num_images, 0:2] = c[:, 0:2] 162 | all_boxes[idx:idx + num_images, 2:4] = s[:, 0:2] 163 | all_boxes[idx:idx + num_images, 4] = np.prod(s*200, 1) 164 | all_boxes[idx:idx + num_images, 5] = score 165 | image_path.extend(meta['image']) 166 | if config.DATASET.DATASET == 'posetrack': 167 | filenames.extend(meta['filename']) 168 | imgnums.extend(meta['imgnum'].numpy()) 169 | 170 | idx += num_images 171 | 172 | if i % config.PRINT_FREQ == 0: 173 | msg = 'Test: [{0}/{1}]\t' \ 174 | 'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' \ 175 | 'Loss {loss.val:.4f} ({loss.avg:.4f})\t' \ 176 | 'Accuracy {acc.val:.3f} ({acc.avg:.3f})'.format( 177 | i, len(val_loader), batch_time=batch_time, 178 | loss=losses, acc=acc) 179 | logger.info(msg) 180 | 181 | prefix = '{}_{}'.format(os.path.join(output_dir, 'val'), i) 182 | save_debug_images(config, input, meta, target, pred*4, output, 183 | prefix) 184 | 185 | name_values, perf_indicator = val_dataset.evaluate( 186 | config, all_preds, output_dir, all_boxes, image_path, 187 | filenames, imgnums) 188 | 189 | _, full_arch_name = get_model_name(config) 190 | if isinstance(name_values, list): 191 | for name_value in name_values: 192 | _print_name_value(name_value, full_arch_name) 193 | else: 194 | _print_name_value(name_values, full_arch_name) 195 | 196 | if writer_dict: 197 | writer = 
writer_dict['writer'] 198 | global_steps = writer_dict['valid_global_steps'] 199 | writer.add_scalar('valid_loss', losses.avg, global_steps) 200 | writer.add_scalar('valid_acc', acc.avg, global_steps) 201 | if isinstance(name_values, list): 202 | for name_value in name_values: 203 | writer.add_scalars('valid', dict(name_value), global_steps) 204 | else: 205 | writer.add_scalars('valid', dict(name_values), global_steps) 206 | writer_dict['valid_global_steps'] = global_steps + 1 207 | 208 | return perf_indicator 209 | 210 | 211 | # markdown format output 212 | def _print_name_value(name_value, full_arch_name): 213 | names = name_value.keys() 214 | values = name_value.values() 215 | num_values = len(name_value) 216 | logger.info( 217 | '| Arch ' + 218 | ' '.join(['| {}'.format(name) for name in names]) + 219 | ' |' 220 | ) 221 | logger.info('|---' * (num_values+1) + '|') 222 | logger.info( 223 | '| ' + full_arch_name + ' ' + 224 | ' '.join(['| {:.3f}'.format(value) for value in values]) + 225 | ' |' 226 | ) 227 | 228 | 229 | class AverageMeter(object): 230 | """Computes and stores the average and current value""" 231 | def __init__(self): 232 | self.reset() 233 | 234 | def reset(self): 235 | self.val = 0 236 | self.avg = 0 237 | self.sum = 0 238 | self.count = 0 239 | 240 | def update(self, val, n=1): 241 | self.val = val 242 | self.sum += val * n 243 | self.count += n 244 | self.avg = self.sum / self.count if self.count != 0 else 0 245 | -------------------------------------------------------------------------------- /lib/core/inference.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import math 12 | import torch 13 | import numpy as np 14 | 15 | from utils.transforms import transform_preds 16 | from .softargmax import SoftArgmax2D 17 | 18 | beta_soft_argmax = SoftArgmax2D(beta=160) 19 | 20 | def get_max_preds(batch_heatmaps): 21 | batch_size = batch_heatmaps.shape[0] 22 | num_joints = batch_heatmaps.shape[1] 23 | 24 | heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1)) 25 | 26 | maxvals = np.amax(heatmaps_reshaped, 2) 27 | maxvals = maxvals.reshape((batch_size, num_joints, 1)) 28 | 29 | pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) 30 | pred_mask = pred_mask.astype(np.float32) 31 | 32 | preds = beta_soft_argmax(torch.from_numpy(batch_heatmaps)).numpy() 33 | preds *= pred_mask 34 | 35 | return preds, maxvals 36 | 37 | 38 | def get_final_preds(config, batch_heatmaps, center, scale): 39 | heatmap_height = batch_heatmaps.shape[2] 40 | heatmap_width = batch_heatmaps.shape[3] 41 | 42 | # post-processing 43 | if config.TEST.POST_PROCESS: 44 | preds, maxval = get_max_preds(batch_heatmaps) 45 | 46 | # Transform back 47 | for i in range(preds.shape[0]): 48 | preds[i] = transform_preds(preds[i], center[i], scale[i], 49 | [heatmap_width, heatmap_height]) 50 | 51 | return preds, maxval 52 | 53 | 54 | # def get_max_preds(batch_heatmaps): 55 | # ''' 56 | # get predictions from score maps 57 | # heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) 58 | # ''' 59 | # assert isinstance(batch_heatmaps, np.ndarray), \ 60 | # 'batch_heatmaps should be numpy.ndarray' 61 | # assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim' 62 | # 63 | # batch_size = batch_heatmaps.shape[0] 64 | # num_joints = batch_heatmaps.shape[1] 65 | # width = batch_heatmaps.shape[3] 66 | # heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1)) 67 | # idx = np.argmax(heatmaps_reshaped, 2) 68 | # maxvals = np.amax(heatmaps_reshaped, 2) 69 | # 70 | # maxvals = maxvals.reshape((batch_size, num_joints, 1)) 71 | # idx = idx.reshape((batch_size, num_joints, 1)) 72 | # 73 | # preds = np.tile(idx, (1, 1, 2)).astype(np.float32) 74 | # 75 | # preds[:, :, 0] = (preds[:, :, 0]) % width 76 | # preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) 77 | # 78 | # pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) 79 | # pred_mask = pred_mask.astype(np.float32) 80 | # 81 | # preds *= pred_mask 82 | # return preds, maxvals 83 | # 84 | # 85 | # def get_final_preds(config, batch_heatmaps, center, scale): 86 | # coords, maxvals = get_max_preds(batch_heatmaps) 87 | # 88 | # heatmap_height = batch_heatmaps.shape[2] 89 | # heatmap_width = batch_heatmaps.shape[3] 90 | # 91 | # # post-processing 92 | # if config.TEST.POST_PROCESS: 93 | # for n in range(coords.shape[0]): 94 | # for p in range(coords.shape[1]): 95 | # hm = batch_heatmaps[n][p] 96 | # px = int(math.floor(coords[n][p][0] + 0.5)) 97 | # py = int(math.floor(coords[n][p][1] + 0.5)) 98 | # if 1 < px < heatmap_width-1 and 1 < py < heatmap_height-1: 99 | # diff = np.array([hm[py][px+1] - hm[py][px-1], 100 | # hm[py+1][px]-hm[py-1][px]]) 101 | # coords[n][p] += np.sign(diff) * .25 102 | # 103 | # preds = coords.copy() 104 | # 105 | # # Transform back 106 | # for i in range(coords.shape[0]): 107 | # preds[i] = transform_preds(coords[i], 
center[i], scale[i], 108 | # [heatmap_width, heatmap_height]) 109 | # 110 | # return preds, maxvals -------------------------------------------------------------------------------- /lib/core/loss.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import torch.nn as nn 12 | 13 | 14 | class JointsMSELoss(nn.Module): 15 | def __init__(self, use_target_weight): 16 | super(JointsMSELoss, self).__init__() 17 | self.criterion = nn.MSELoss(size_average=True) 18 | self.use_target_weight = use_target_weight 19 | 20 | def forward(self, output, target, target_weight): 21 | batch_size = output.size(0) 22 | num_joints = output.size(1) 23 | heatmaps_pred = output.reshape((batch_size, num_joints, -1)).split(1, 1) 24 | heatmaps_gt = target.reshape((batch_size, num_joints, -1)).split(1, 1) 25 | loss = 0 26 | 27 | for idx in range(num_joints): 28 | heatmap_pred = heatmaps_pred[idx].squeeze() 29 | heatmap_gt = heatmaps_gt[idx].squeeze() 30 | if self.use_target_weight: 31 | loss += 0.5 * self.criterion( 32 | heatmap_pred.mul(target_weight[:, idx]), 33 | heatmap_gt.mul(target_weight[:, idx]) 34 | ) 35 | else: 36 | loss += 0.5 * self.criterion(heatmap_pred, heatmap_gt) 37 | 38 | return loss / num_joints 39 | -------------------------------------------------------------------------------- /lib/core/softargmax.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from torch.nn import functional as F 4 | 5 | 6 | class SoftArgmax2D(nn.Module): 7 | """ 8 | Creates a module that computes Soft-Argmax 2D of a given input heatmap. 9 | Returns the index of the maximum 2d coordinates of the give map. 10 | :param beta: The smoothing parameter. 11 | :param return_xy: The output order is [x, y]. 12 | """ 13 | 14 | def __init__(self, beta: int = 100, return_xy: bool = False): 15 | if not 0.0 <= beta: 16 | raise ValueError(f"Invalid beta: {beta}") 17 | super().__init__() 18 | self.beta = beta 19 | self.return_xy = return_xy 20 | 21 | def forward(self, heatmap: torch.Tensor) -> torch.Tensor: 22 | """ 23 | :param heatmap: The input heatmap is of size B x N x H x W. 24 | :return: The index of the maximum 2d coordinates is of size B x N x 2. 
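In other words, forward() performs a soft-argmax: the heatmap is scaled by beta, softmax-normalized over all H x W locations, and the returned coordinates are the expectations of the grid indices under that distribution, so the estimate approaches the hard argmax as beta grows while staying differentiable.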
25 | """ 26 | heatmap = heatmap.mul(self.beta) 27 | batch_size, num_channel, height, width = heatmap.size() 28 | device: str = heatmap.device 29 | 30 | softmax: torch.Tensor = F.softmax( 31 | heatmap.view(batch_size, num_channel, height * width), dim=2 32 | ).view(batch_size, num_channel, height, width) 33 | 34 | xx, yy = torch.meshgrid(list(map(torch.arange, [height, width]))) 35 | 36 | approx_x = ( 37 | softmax.mul(xx.float().to(device)) 38 | .view(batch_size, num_channel, height * width) 39 | .sum(2) 40 | .unsqueeze(2) 41 | ) 42 | approx_y = ( 43 | softmax.mul(yy.float().to(device)) 44 | .view(batch_size, num_channel, height * width) 45 | .sum(2) 46 | .unsqueeze(2) 47 | ) 48 | 49 | output = [approx_x, approx_y] if self.return_xy else [approx_y, approx_x] 50 | output = torch.cat(output, 2) 51 | return output 52 | -------------------------------------------------------------------------------- /lib/dataset/JointsDataset.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import copy 12 | import logging 13 | import random 14 | 15 | import cv2 16 | import numpy as np 17 | import torch 18 | from torch.utils.data import Dataset 19 | 20 | from utils.transforms import get_affine_transform 21 | from utils.transforms import affine_transform 22 | from utils.transforms import fliplr_joints 23 | 24 | 25 | logger = logging.getLogger(__name__) 26 | 27 | 28 | class JointsDataset(Dataset): 29 | def __init__(self, cfg, root, image_set, is_train, transform=None): 30 | self.num_joints = 0 31 | self.pixel_std = 200 32 | self.flip_pairs = [] 33 | self.parent_ids = [] 34 | 35 | self.is_train = is_train 36 | self.root = root 37 | self.image_set = image_set 38 | 39 | self.output_path = cfg.OUTPUT_DIR 40 | self.data_format = cfg.DATASET.DATA_FORMAT 41 | 42 | self.scale_factor = cfg.DATASET.SCALE_FACTOR 43 | self.rotation_factor = cfg.DATASET.ROT_FACTOR 44 | self.flip = cfg.DATASET.FLIP 45 | 46 | self.image_size = cfg.MODEL.IMAGE_SIZE 47 | self.target_type = cfg.MODEL.EXTRA.TARGET_TYPE 48 | self.heatmap_size = cfg.MODEL.EXTRA.HEATMAP_SIZE 49 | self.sigma = cfg.MODEL.EXTRA.SIGMA 50 | 51 | self.transform = transform 52 | self.db = [] 53 | 54 | def _get_db(self): 55 | raise NotImplementedError 56 | 57 | def evaluate(self, cfg, preds, output_dir, *args, **kwargs): 58 | raise NotImplementedError 59 | 60 | def __len__(self,): 61 | return len(self.db) 62 | 63 | def __getitem__(self, idx): 64 | db_rec = copy.deepcopy(self.db[idx]) 65 | 66 | image_file = db_rec['image'] 67 | filename = db_rec['filename'] if 'filename' in db_rec else '' 68 | imgnum = db_rec['imgnum'] if 'imgnum' in db_rec else '' 69 | 70 | if self.data_format == 'zip': 71 | from utils import zipreader 72 | data_numpy = zipreader.imread( 73 | image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) 74 | else: 75 | data_numpy = cv2.imread( 76 | image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) 77 | 78 | if data_numpy is None: 79 | logger.error('=> fail to read {}'.format(image_file)) 80 | raise ValueError('Fail to read {}'.format(image_file)) 81 | 82 | joints = db_rec['joints_3d'] 83 | 
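# per-joint visibility flags of shape [num_joints, 3]; generate_target() below turns column 0 into target_weight (1: visible, 0: invisible)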
joints_vis = db_rec['joints_3d_vis'] 84 | 85 | c = db_rec['center'] 86 | s = db_rec['scale'] 87 | score = db_rec['score'] if 'score' in db_rec else 1 88 | r = 0 89 | 90 | if self.is_train: 91 | sf = self.scale_factor 92 | rf = self.rotation_factor 93 | s = s * np.clip(np.random.randn()*sf + 1, 1 - sf, 1 + sf) 94 | r = np.clip(np.random.randn()*rf, -rf*2, rf*2) \ 95 | if random.random() <= 0.6 else 0 96 | 97 | if self.flip and random.random() <= 0.5: 98 | data_numpy = data_numpy[:, ::-1, :] 99 | joints, joints_vis = fliplr_joints( 100 | joints, joints_vis, data_numpy.shape[1], self.flip_pairs) 101 | c[0] = data_numpy.shape[1] - c[0] - 1 102 | 103 | trans = get_affine_transform(c, s, r, self.image_size) 104 | input = cv2.warpAffine( 105 | data_numpy, 106 | trans, 107 | (int(self.image_size[0]), int(self.image_size[1])), 108 | flags=cv2.INTER_LINEAR) 109 | 110 | if self.transform: 111 | input = self.transform(input) 112 | 113 | for i in range(self.num_joints): 114 | if joints_vis[i, 0] > 0.0: 115 | joints[i, 0:2] = affine_transform(joints[i, 0:2], trans) 116 | 117 | target, target_weight = self.generate_target(joints, joints_vis) 118 | 119 | target = torch.from_numpy(target) 120 | target_weight = torch.from_numpy(target_weight) 121 | 122 | meta = { 123 | 'image': image_file, 124 | 'filename': filename, 125 | 'imgnum': imgnum, 126 | 'joints': joints, 127 | 'joints_vis': joints_vis, 128 | 'center': c, 129 | 'scale': s, 130 | 'rotation': r, 131 | 'score': score 132 | } 133 | 134 | return input, target, target_weight, meta 135 | 136 | def select_data(self, db): 137 | db_selected = [] 138 | for rec in db: 139 | num_vis = 0 140 | joints_x = 0.0 141 | joints_y = 0.0 142 | for joint, joint_vis in zip( 143 | rec['joints_3d'], rec['joints_3d_vis']): 144 | if joint_vis[0] <= 0: 145 | continue 146 | num_vis += 1 147 | 148 | joints_x += joint[0] 149 | joints_y += joint[1] 150 | if num_vis == 0: 151 | continue 152 | 153 | joints_x, joints_y = joints_x / num_vis, joints_y / num_vis 154 | 155 | area = rec['scale'][0] * rec['scale'][1] * (self.pixel_std**2) 156 | joints_center = np.array([joints_x, joints_y]) 157 | bbox_center = np.array(rec['center']) 158 | diff_norm2 = np.linalg.norm((joints_center-bbox_center), 2) 159 | ks = np.exp(-1.0*(diff_norm2**2) / ((0.2)**2*2.0*area)) 160 | 161 | metric = (0.2 / 16) * num_vis + 0.45 - 0.2 / 16 162 | if ks > metric: 163 | db_selected.append(rec) 164 | 165 | logger.info('=> num db: {}'.format(len(db))) 166 | logger.info('=> num selected db: {}'.format(len(db_selected))) 167 | return db_selected 168 | 169 | def generate_target(self, joints, joints_vis): 170 | ''' 171 | :param joints: [num_joints, 3] 172 | :param joints_vis: [num_joints, 3] 173 | :return: target, target_weight(1: visible, 0: invisible) 174 | ''' 175 | target_weight = np.ones((self.num_joints, 1), dtype=np.float32) 176 | target_weight[:, 0] = joints_vis[:, 0] 177 | 178 | assert self.target_type == 'gaussian', \ 179 | 'Only support gaussian map now!' 
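# each joint is rendered as an unnormalized 2D Gaussian (peak value 1, std = self.sigma) on a heatmap of heatmap_size, centered at the joint position downscaled by feat_stride; joints whose Gaussian falls entirely outside the map get target_weight 0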
180 | 181 | if self.target_type == 'gaussian': 182 | target = np.zeros((self.num_joints, 183 | self.heatmap_size[1], 184 | self.heatmap_size[0]), 185 | dtype=np.float32) 186 | 187 | tmp_size = self.sigma * 3 188 | 189 | for joint_id in range(self.num_joints): 190 | feat_stride = self.image_size / self.heatmap_size 191 | mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5) 192 | mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5) 193 | # Check that any part of the gaussian is in-bounds 194 | ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)] 195 | br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)] 196 | if ul[0] >= self.heatmap_size[0] or ul[1] >= self.heatmap_size[1] \ 197 | or br[0] < 0 or br[1] < 0: 198 | # If not, just return the image as is 199 | target_weight[joint_id] = 0 200 | continue 201 | 202 | # # Generate gaussian 203 | size = 2 * tmp_size + 1 204 | x = np.arange(0, size, 1, np.float32) 205 | y = x[:, np.newaxis] 206 | x0 = y0 = size // 2 207 | # The gaussian is not normalized, we want the center value to equal 1 208 | g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * self.sigma ** 2)) 209 | 210 | # Usable gaussian range 211 | g_x = max(0, -ul[0]), min(br[0], self.heatmap_size[0]) - ul[0] 212 | g_y = max(0, -ul[1]), min(br[1], self.heatmap_size[1]) - ul[1] 213 | # Image range 214 | img_x = max(0, ul[0]), min(br[0], self.heatmap_size[0]) 215 | img_y = max(0, ul[1]), min(br[1], self.heatmap_size[1]) 216 | 217 | v = target_weight[joint_id] 218 | if v > 0.5: 219 | target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = \ 220 | g[g_y[0]:g_y[1], g_x[0]:g_x[1]] 221 | 222 | return target, target_weight 223 | -------------------------------------------------------------------------------- /lib/dataset/JointsDataset_test.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import copy 12 | import logging 13 | import random 14 | 15 | import cv2 16 | import numpy as np 17 | import torch 18 | from torch.utils.data import Dataset 19 | 20 | from utils.transforms import get_affine_transform 21 | from utils.transforms import affine_transform 22 | from utils.transforms import fliplr_joints 23 | 24 | 25 | logger = logging.getLogger(__name__) 26 | 27 | 28 | class JointsDataset(Dataset): 29 | def __init__(self, root, image_set, is_train, transform=None): 30 | self.num_joints = 0 31 | self.pixel_std = 200 32 | self.flip_pairs = [] 33 | self.parent_ids = [] 34 | 35 | self.is_train = is_train 36 | self.root = root 37 | self.image_set = image_set 38 | 39 | self.output_path = 'output' 40 | self.data_format = 'jpg' 41 | 42 | self.scale_factor = 0.3 43 | self.rotation_factor = 40 44 | self.flip = True 45 | 46 | self.image_size = [192, 256] 47 | self.target_type = 'gaussion' 48 | self.heatmap_size = [24, 32] 49 | self.sigma = 2 50 | 51 | self.transform = transform 52 | self.db = [] 53 | 54 | def _get_db(self): 55 | raise NotImplementedError 56 | 57 | def evaluate(self, preds, output_dir, *args, **kwargs): 58 | raise NotImplementedError 59 | 60 | def __len__(self,): 61 | return len(self.db) 62 | 63 | def __getitem__(self, idx): 64 | db_rec = copy.deepcopy(self.db[idx]) 65 | 66 | image_file = db_rec['image'] 67 | filename = db_rec['filename'] if 'filename' in db_rec else '' 68 | imgnum = db_rec['imgnum'] if 'imgnum' in db_rec else '' 69 | 70 | if self.data_format == 'zip': 71 | from utils import zipreader 72 | data_numpy = zipreader.imread( 73 | image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) 74 | else: 75 | data_numpy = cv2.imread( 76 | image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) 77 | 78 | if data_numpy is None: 79 | logger.error('=> fail to read {}'.format(image_file)) 80 | raise ValueError('Fail to read {}'.format(image_file)) 81 | 82 | joints = db_rec['joints_3d'] 83 | joints_vis = db_rec['joints_3d_vis'] 84 | 85 | c = db_rec['center'] 86 | s = db_rec['scale'] 87 | score = db_rec['score'] if 'score' in db_rec else 1 88 | r = 0 89 | 90 | if self.is_train: 91 | sf = self.scale_factor 92 | rf = self.rotation_factor 93 | s = s * np.clip(np.random.randn()*sf + 1, 1 - sf, 1 + sf) 94 | r = np.clip(np.random.randn()*rf, -rf*2, rf*2) \ 95 | if random.random() <= 0.6 else 0 96 | 97 | if self.flip and random.random() <= 0.5: 98 | data_numpy = data_numpy[:, ::-1, :] 99 | joints, joints_vis = fliplr_joints( 100 | joints, joints_vis, data_numpy.shape[1], self.flip_pairs) 101 | c[0] = data_numpy.shape[1] - c[0] - 1 102 | 103 | trans = get_affine_transform(c, s, r, self.image_size) 104 | input = cv2.warpAffine( 105 | data_numpy, 106 | trans, 107 | (int(self.image_size[0]), int(self.image_size[1])), 108 | flags=cv2.INTER_LINEAR) 109 | 110 | if self.transform: 111 | input = self.transform(input) 112 | 113 | for i in range(self.num_joints): 114 | if joints_vis[i, 0] > 0.0: 115 | joints[i, 0:2] = affine_transform(joints[i, 0:2], trans) 116 | 117 | target, target_weight = self.generate_target(joints, joints_vis) 118 | 119 | target = torch.from_numpy(target) 120 | target_weight = torch.from_numpy(target_weight) 121 | 122 | meta = { 123 | 'image': image_file, 124 | 'filename': filename, 125 | 
'imgnum': imgnum, 126 | 'joints': joints, 127 | 'joints_vis': joints_vis, 128 | 'center': c, 129 | 'scale': s, 130 | 'rotation': r, 131 | 'score': score 132 | } 133 | 134 | return input, target, target_weight, meta 135 | 136 | def select_data(self, db): 137 | db_selected = [] 138 | for rec in db: 139 | num_vis = 0 140 | joints_x = 0.0 141 | joints_y = 0.0 142 | for joint, joint_vis in zip( 143 | rec['joints_3d'], rec['joints_3d_vis']): 144 | if joint_vis[0] <= 0: 145 | continue 146 | num_vis += 1 147 | 148 | joints_x += joint[0] 149 | joints_y += joint[1] 150 | if num_vis == 0: 151 | continue 152 | 153 | joints_x, joints_y = joints_x / num_vis, joints_y / num_vis 154 | 155 | area = rec['scale'][0] * rec['scale'][1] * (self.pixel_std**2) 156 | joints_center = np.array([joints_x, joints_y]) 157 | bbox_center = np.array(rec['center']) 158 | diff_norm2 = np.linalg.norm((joints_center-bbox_center), 2) 159 | ks = np.exp(-1.0*(diff_norm2**2) / ((0.2)**2*2.0*area)) 160 | 161 | metric = (0.2 / 16) * num_vis + 0.45 - 0.2 / 16 162 | if ks > metric: 163 | db_selected.append(rec) 164 | 165 | logger.info('=> num db: {}'.format(len(db))) 166 | logger.info('=> num selected db: {}'.format(len(db_selected))) 167 | return db_selected 168 | 169 | def generate_target(self, joints, joints_vis): 170 | ''' 171 | :param joints: [num_joints, 3] 172 | :param joints_vis: [num_joints, 3] 173 | :return: target, target_weight(1: visible, 0: invisible) 174 | ''' 175 | target_weight = np.ones((self.num_joints, 1), dtype=np.float32) 176 | target_weight[:, 0] = joints_vis[:, 0] 177 | 178 | assert self.target_type == 'gaussian', \ 179 | 'Only support gaussian map now!' 180 | 181 | if self.target_type == 'gaussian': 182 | target = np.zeros((self.num_joints, 183 | self.heatmap_size[1], 184 | self.heatmap_size[0]), 185 | dtype=np.float32) 186 | 187 | tmp_size = self.sigma * 3 188 | 189 | for joint_id in range(self.num_joints): 190 | feat_stride = self.image_size / self.heatmap_size 191 | mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5) 192 | mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5) 193 | # Check that any part of the gaussian is in-bounds 194 | ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)] 195 | br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)] 196 | if ul[0] >= self.heatmap_size[0] or ul[1] >= self.heatmap_size[1] \ 197 | or br[0] < 0 or br[1] < 0: 198 | # If not, just return the image as is 199 | target_weight[joint_id] = 0 200 | continue 201 | 202 | # # Generate gaussian 203 | size = 2 * tmp_size + 1 204 | x = np.arange(0, size, 1, np.float32) 205 | y = x[:, np.newaxis] 206 | x0 = y0 = size // 2 207 | # The gaussian is not normalized, we want the center value to equal 1 208 | g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * self.sigma ** 2)) 209 | 210 | # Usable gaussian range 211 | g_x = max(0, -ul[0]), min(br[0], self.heatmap_size[0]) - ul[0] 212 | g_y = max(0, -ul[1]), min(br[1], self.heatmap_size[1]) - ul[1] 213 | # Image range 214 | img_x = max(0, ul[0]), min(br[0], self.heatmap_size[0]) 215 | img_y = max(0, ul[1]), min(br[1], self.heatmap_size[1]) 216 | 217 | v = target_weight[joint_id] 218 | if v > 0.5: 219 | target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = \ 220 | g[g_y[0]:g_y[1], g_x[0]:g_x[1]] 221 | 222 | return target, target_weight 223 | -------------------------------------------------------------------------------- /lib/dataset/__init__.py: -------------------------------------------------------------------------------- 1 | # 
------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from .mpii import MPIIDataset as mpii 12 | from .coco import COCODataset as coco 13 | -------------------------------------------------------------------------------- /lib/dataset/__pycache__/JointsDataset.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/dataset/__pycache__/JointsDataset.cpython-37.pyc -------------------------------------------------------------------------------- /lib/dataset/__pycache__/__init__.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/dataset/__pycache__/__init__.cpython-37.pyc -------------------------------------------------------------------------------- /lib/dataset/__pycache__/coco.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/dataset/__pycache__/coco.cpython-37.pyc -------------------------------------------------------------------------------- /lib/dataset/__pycache__/mpii.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/dataset/__pycache__/mpii.cpython-37.pyc -------------------------------------------------------------------------------- /lib/dataset/coco.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import logging 12 | import os 13 | import pickle 14 | from collections import defaultdict 15 | from collections import OrderedDict 16 | 17 | import json_tricks as json 18 | import numpy as np 19 | from pycocotools.coco import COCO 20 | from pycocotools.cocoeval import COCOeval 21 | 22 | from dataset.JointsDataset import JointsDataset 23 | from nms.nms import oks_nms 24 | 25 | 26 | logger = logging.getLogger(__name__) 27 | 28 | 29 | class COCODataset(JointsDataset): 30 | ''' 31 | "keypoints": { 32 | 0: "nose", 33 | 1: "left_eye", 34 | 2: "right_eye", 35 | 3: "left_ear", 36 | 4: "right_ear", 37 | 5: "left_shoulder", 38 | 6: "right_shoulder", 39 | 7: "left_elbow", 40 | 8: "right_elbow", 41 | 9: "left_wrist", 42 | 10: "right_wrist", 43 | 11: "left_hip", 44 | 12: "right_hip", 45 | 13: "left_knee", 46 | 14: "right_knee", 47 | 15: "left_ankle", 48 | 16: "right_ankle" 49 | }, 50 | "skeleton": [ 51 | [16,14],[14,12],[17,15],[15,13],[12,13],[6,12],[7,13], [6,7],[6,8], 52 | [7,9],[8,10],[9,11],[2,3],[1,2],[1,3],[2,4],[3,5],[4,6],[5,7]] 53 | ''' 54 | def __init__(self, cfg, root, image_set, is_train, transform=None): 55 | super().__init__(cfg, root, image_set, is_train, transform) 56 | self.nms_thre = cfg.TEST.NMS_THRE 57 | self.image_thre = cfg.TEST.IMAGE_THRE 58 | self.oks_thre = cfg.TEST.OKS_THRE 59 | self.in_vis_thre = cfg.TEST.IN_VIS_THRE 60 | self.bbox_file = cfg.TEST.COCO_BBOX_FILE 61 | self.use_gt_bbox = cfg.TEST.USE_GT_BBOX 62 | self.image_width = cfg.MODEL.IMAGE_SIZE[0] 63 | self.image_height = cfg.MODEL.IMAGE_SIZE[1] 64 | self.aspect_ratio = self.image_width * 1.0 / self.image_height 65 | self.pixel_std = 200 66 | self.coco = COCO(self._get_ann_file_keypoint()) 67 | 68 | # deal with class names 69 | cats = [cat['name'] 70 | for cat in self.coco.loadCats(self.coco.getCatIds())] 71 | self.classes = ['__background__'] + cats 72 | logger.info('=> classes: {}'.format(self.classes)) 73 | self.num_classes = len(self.classes) 74 | self._class_to_ind = dict(zip(self.classes, range(self.num_classes))) 75 | self._class_to_coco_ind = dict(zip(cats, self.coco.getCatIds())) 76 | self._coco_ind_to_class_ind = dict([(self._class_to_coco_ind[cls], 77 | self._class_to_ind[cls]) 78 | for cls in self.classes[1:]]) 79 | 80 | # load image file names 81 | self.image_set_index = self._load_image_set_index() 82 | self.num_images = len(self.image_set_index) 83 | logger.info('=> num_images: {}'.format(self.num_images)) 84 | 85 | self.num_joints = 17 86 | self.flip_pairs = [[1, 2], [3, 4], [5, 6], [7, 8], 87 | [9, 10], [11, 12], [13, 14], [15, 16]] 88 | self.parent_ids = None 89 | 90 | self.db = self._get_db() 91 | 92 | if is_train and cfg.DATASET.SELECT_DATA: 93 | self.db = self.select_data(self.db) 94 | 95 | logger.info('=> load {} samples'.format(len(self.db))) 96 | 97 | def _get_ann_file_keypoint(self): 98 | """ self.root / annotations / person_keypoints_train2017.json """ 99 | prefix = 'person_keypoints' \ 100 | if 'test' not in self.image_set else 'image_info' 101 | return os.path.join(self.root, 'annotations', 102 | prefix + '_' + self.image_set + '.json') 103 | 104 | def _load_image_set_index(self): 105 | """ image id: int """ 106 | image_ids = self.coco.getImgIds() 107 | return image_ids 108 | 109 | def _get_db(self): 110 | if self.is_train or 
self.use_gt_bbox: 111 | # use ground truth bbox 112 | gt_db = self._load_coco_keypoint_annotations() 113 | else: 114 | # use bbox from detection 115 | gt_db = self._load_coco_person_detection_results() 116 | return gt_db 117 | 118 | def _load_coco_keypoint_annotations(self): 119 | """ ground truth bbox and keypoints """ 120 | gt_db = [] 121 | for index in self.image_set_index: 122 | gt_db.extend(self._load_coco_keypoint_annotation_kernal(index)) 123 | return gt_db 124 | 125 | def _load_coco_keypoint_annotation_kernal(self, index): 126 | """ 127 | coco ann: [u'segmentation', u'area', u'iscrowd', u'image_id', u'bbox', u'category_id', u'id'] 128 | iscrowd: 129 | crowd instances are handled by marking their overlaps with all categories to -1 130 | and later excluded in training 131 | bbox: 132 | [x1, y1, w, h] 133 | :param index: coco image id 134 | :return: db entry 135 | """ 136 | im_ann = self.coco.loadImgs(index)[0] 137 | width = im_ann['width'] 138 | height = im_ann['height'] 139 | 140 | annIds = self.coco.getAnnIds(imgIds=index, iscrowd=False) 141 | objs = self.coco.loadAnns(annIds) 142 | 143 | # sanitize bboxes 144 | valid_objs = [] 145 | for obj in objs: 146 | x, y, w, h = obj['bbox'] 147 | x1 = np.max((0, x)) 148 | y1 = np.max((0, y)) 149 | x2 = np.min((width - 1, x1 + np.max((0, w - 1)))) 150 | y2 = np.min((height - 1, y1 + np.max((0, h - 1)))) 151 | if obj['area'] > 0 and x2 >= x1 and y2 >= y1: 152 | # obj['clean_bbox'] = [x1, y1, x2, y2] 153 | obj['clean_bbox'] = [x1, y1, x2-x1, y2-y1] 154 | valid_objs.append(obj) 155 | objs = valid_objs 156 | 157 | rec = [] 158 | for obj in objs: 159 | cls = self._coco_ind_to_class_ind[obj['category_id']] 160 | if cls != 1: 161 | continue 162 | 163 | # ignore objs without keypoints annotation 164 | if max(obj['keypoints']) == 0: 165 | continue 166 | 167 | joints_3d = np.zeros((self.num_joints, 3), dtype=np.float) 168 | joints_3d_vis = np.zeros((self.num_joints, 3), dtype=np.float) 169 | for ipt in range(self.num_joints): 170 | joints_3d[ipt, 0] = obj['keypoints'][ipt * 3 + 0] 171 | joints_3d[ipt, 1] = obj['keypoints'][ipt * 3 + 1] 172 | joints_3d[ipt, 2] = 0 173 | t_vis = obj['keypoints'][ipt * 3 + 2] 174 | if t_vis > 1: 175 | t_vis = 1 176 | joints_3d_vis[ipt, 0] = t_vis 177 | joints_3d_vis[ipt, 1] = t_vis 178 | joints_3d_vis[ipt, 2] = 0 179 | 180 | center, scale = self._box2cs(obj['clean_bbox'][:4]) 181 | rec.append({ 182 | 'image': self.image_path_from_index(index), 183 | 'center': center, 184 | 'scale': scale, 185 | 'joints_3d': joints_3d, 186 | 'joints_3d_vis': joints_3d_vis, 187 | 'filename': '', 188 | 'imgnum': 0, 189 | }) 190 | 191 | return rec 192 | 193 | def _box2cs(self, box): 194 | x, y, w, h = box[:4] 195 | return self._xywh2cs(x, y, w, h) 196 | 197 | def _xywh2cs(self, x, y, w, h): 198 | center = np.zeros((2), dtype=np.float32) 199 | center[0] = x + w * 0.5 200 | center[1] = y + h * 0.5 201 | 202 | if w > self.aspect_ratio * h: 203 | h = w * 1.0 / self.aspect_ratio 204 | elif w < self.aspect_ratio * h: 205 | w = h * self.aspect_ratio 206 | scale = np.array( 207 | [w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std], 208 | dtype=np.float32) 209 | if center[0] != -1: 210 | scale = scale * 1.25 211 | 212 | return center, scale 213 | 214 | def image_path_from_index(self, index): 215 | """ example: images / train2017 / 000000119993.jpg """ 216 | file_name = '%012d.jpg' % index 217 | if '2014' in self.image_set: 218 | file_name = 'COCO_%s_' % self.image_set + file_name 219 | 220 | prefix = 'test2017' if 'test' in self.image_set else 
self.image_set 221 | 222 | data_name = prefix + '.zip@' if self.data_format == 'zip' else prefix 223 | 224 | image_path = os.path.join( 225 | self.root, 'images', data_name, file_name) 226 | 227 | return image_path 228 | 229 | def _load_coco_person_detection_results(self): 230 | all_boxes = None 231 | with open(self.bbox_file, 'r') as f: 232 | all_boxes = json.load(f) 233 | 234 | if not all_boxes: 235 | logger.error('=> Load %s fail!' % self.bbox_file) 236 | return None 237 | 238 | logger.info('=> Total boxes: {}'.format(len(all_boxes))) 239 | 240 | kpt_db = [] 241 | num_boxes = 0 242 | for n_img in range(0, len(all_boxes)): 243 | det_res = all_boxes[n_img] 244 | if det_res['category_id'] != 1: 245 | continue 246 | img_name = self.image_path_from_index(det_res['image_id']) 247 | box = det_res['bbox'] 248 | score = det_res['score'] 249 | 250 | if score < self.image_thre: 251 | continue 252 | 253 | num_boxes = num_boxes + 1 254 | 255 | center, scale = self._box2cs(box) 256 | joints_3d = np.zeros((self.num_joints, 3), dtype=np.float) 257 | joints_3d_vis = np.ones( 258 | (self.num_joints, 3), dtype=np.float) 259 | kpt_db.append({ 260 | 'image': img_name, 261 | 'center': center, 262 | 'scale': scale, 263 | 'score': score, 264 | 'joints_3d': joints_3d, 265 | 'joints_3d_vis': joints_3d_vis, 266 | }) 267 | 268 | logger.info('=> Total boxes after fliter low score@{}: {}'.format( 269 | self.image_thre, num_boxes)) 270 | return kpt_db 271 | 272 | # need double check this API and classes field 273 | def evaluate(self, cfg, preds, output_dir, all_boxes, img_path, 274 | *args, **kwargs): 275 | res_folder = os.path.join(output_dir, 'results') 276 | if not os.path.exists(res_folder): 277 | os.makedirs(res_folder) 278 | res_file = os.path.join( 279 | res_folder, 'keypoints_%s_results.json' % self.image_set) 280 | 281 | # person x (keypoints) 282 | _kpts = [] 283 | for idx, kpt in enumerate(preds): 284 | _kpts.append({ 285 | 'keypoints': kpt, 286 | 'center': all_boxes[idx][0:2], 287 | 'scale': all_boxes[idx][2:4], 288 | 'area': all_boxes[idx][4], 289 | 'score': all_boxes[idx][5], 290 | 'image': int(img_path[idx][-16:-4]) 291 | }) 292 | # image x person x (keypoints) 293 | kpts = defaultdict(list) 294 | for kpt in _kpts: 295 | kpts[kpt['image']].append(kpt) 296 | 297 | # rescoring and oks nms 298 | num_joints = self.num_joints 299 | in_vis_thre = self.in_vis_thre 300 | oks_thre = self.oks_thre 301 | oks_nmsed_kpts = [] 302 | for img in kpts.keys(): 303 | img_kpts = kpts[img] 304 | for n_p in img_kpts: 305 | box_score = n_p['score'] 306 | kpt_score = 0 307 | valid_num = 0 308 | for n_jt in range(0, num_joints): 309 | t_s = n_p['keypoints'][n_jt][2] 310 | if t_s > in_vis_thre: 311 | kpt_score = kpt_score + t_s 312 | valid_num = valid_num + 1 313 | if valid_num != 0: 314 | kpt_score = kpt_score / valid_num 315 | # rescoring 316 | n_p['score'] = kpt_score * box_score 317 | keep = oks_nms([img_kpts[i] for i in range(len(img_kpts))], 318 | oks_thre) 319 | if len(keep) == 0: 320 | oks_nmsed_kpts.append(img_kpts) 321 | else: 322 | oks_nmsed_kpts.append([img_kpts[_keep] for _keep in keep]) 323 | 324 | self._write_coco_keypoint_results( 325 | oks_nmsed_kpts, res_file) 326 | if 'test' not in self.image_set: 327 | info_str = self._do_python_keypoint_eval( 328 | res_file, res_folder) 329 | name_value = OrderedDict(info_str) 330 | return name_value, name_value['AP'] 331 | else: 332 | return {'Null': 0}, 0 333 | 334 | def _write_coco_keypoint_results(self, keypoints, res_file): 335 | data_pack = [{'cat_id': 
self._class_to_coco_ind[cls], 336 | 'cls_ind': cls_ind, 337 | 'cls': cls, 338 | 'ann_type': 'keypoints', 339 | 'keypoints': keypoints 340 | } 341 | for cls_ind, cls in enumerate(self.classes) if not cls == '__background__'] 342 | 343 | results = self._coco_keypoint_results_one_category_kernel(data_pack[0]) 344 | logger.info('=> Writing results json to %s' % res_file) 345 | with open(res_file, 'w') as f: 346 | json.dump(results, f, sort_keys=True, indent=4) 347 | try: 348 | json.load(open(res_file)) 349 | except Exception: 350 | content = [] 351 | with open(res_file, 'r') as f: 352 | for line in f: 353 | content.append(line) 354 | content[-1] = ']' 355 | with open(res_file, 'w') as f: 356 | for c in content: 357 | f.write(c) 358 | 359 | def _coco_keypoint_results_one_category_kernel(self, data_pack): 360 | cat_id = data_pack['cat_id'] 361 | keypoints = data_pack['keypoints'] 362 | cat_results = [] 363 | 364 | for img_kpts in keypoints: 365 | if len(img_kpts) == 0: 366 | continue 367 | 368 | _key_points = np.array([img_kpts[k]['keypoints'] 369 | for k in range(len(img_kpts))]) 370 | key_points = np.zeros( 371 | (_key_points.shape[0], self.num_joints * 3), dtype=np.float) 372 | 373 | for ipt in range(self.num_joints): 374 | key_points[:, ipt * 3 + 0] = _key_points[:, ipt, 0] 375 | key_points[:, ipt * 3 + 1] = _key_points[:, ipt, 1] 376 | key_points[:, ipt * 3 + 2] = _key_points[:, ipt, 2] # keypoints score. 377 | 378 | result = [{'image_id': img_kpts[k]['image'], 379 | 'category_id': cat_id, 380 | 'keypoints': list(key_points[k]), 381 | 'score': img_kpts[k]['score'], 382 | 'center': list(img_kpts[k]['center']), 383 | 'scale': list(img_kpts[k]['scale']) 384 | } for k in range(len(img_kpts))] 385 | cat_results.extend(result) 386 | 387 | return cat_results 388 | 389 | def _do_python_keypoint_eval(self, res_file, res_folder): 390 | coco_dt = self.coco.loadRes(res_file) 391 | coco_eval = COCOeval(self.coco, coco_dt, 'keypoints') 392 | coco_eval.params.useSegm = None 393 | coco_eval.evaluate() 394 | coco_eval.accumulate() 395 | coco_eval.summarize() 396 | stats_names = ['AP', 'Ap .5', 'AP .75', 'AP (M)', 'AP (L)', 'AR', 'AR .5', 'AR .75', 'AR (M)', 'AR (L)'] 397 | 398 | info_str = [] 399 | for ind, name in enumerate(stats_names): 400 | info_str.append((name, coco_eval.stats[ind])) 401 | 402 | eval_file = os.path.join( 403 | res_folder, 'keypoints_%s_results.pkl' % self.image_set) 404 | 405 | with open(eval_file, 'wb') as f: 406 | pickle.dump(coco_eval, f, pickle.HIGHEST_PROTOCOL) 407 | logger.info('=> coco eval results saved to %s' % eval_file) 408 | 409 | return info_str 410 | -------------------------------------------------------------------------------- /lib/dataset/coco_test.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
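# Usage sketch for the cfg-driven dataset in lib/dataset/coco.py above (an assumed
# pattern, not quoted from pose_estimation/train.py or valid.py; the torchvision
# transform and the ImageNet normalization values are likewise assumptions):
#
#     import torchvision.transforms as transforms
#     from dataset.coco import COCODataset
#
#     # cfg is assumed to be loaded from one of the experiment yaml files
#     normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
#                                      std=[0.229, 0.224, 0.225])
#     valid_dataset = COCODataset(
#         cfg, cfg.DATASET.ROOT, cfg.DATASET.TEST_SET, is_train=False,
#         transform=transforms.Compose([transforms.ToTensor(), normalize]))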
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import logging 12 | import os 13 | import pickle 14 | from collections import defaultdict 15 | from collections import OrderedDict 16 | 17 | import json_tricks as json 18 | import numpy as np 19 | from pycocotools.coco import COCO 20 | from pycocotools.cocoeval import COCOeval 21 | 22 | from dataset.JointsDataset_test import JointsDataset 23 | from nms.nms import oks_nms 24 | 25 | 26 | logger = logging.getLogger(__name__) 27 | 28 | 29 | class COCODataset(JointsDataset): 30 | ''' 31 | "keypoints": { 32 | 0: "nose", 33 | 1: "left_eye", 34 | 2: "right_eye", 35 | 3: "left_ear", 36 | 4: "right_ear", 37 | 5: "left_shoulder", 38 | 6: "right_shoulder", 39 | 7: "left_elbow", 40 | 8: "right_elbow", 41 | 9: "left_wrist", 42 | 10: "right_wrist", 43 | 11: "left_hip", 44 | 12: "right_hip", 45 | 13: "left_knee", 46 | 14: "right_knee", 47 | 15: "left_ankle", 48 | 16: "right_ankle" 49 | }, 50 | "skeleton": [ 51 | [16,14],[14,12],[17,15],[15,13],[12,13],[6,12],[7,13], [6,7],[6,8], 52 | [7,9],[8,10],[9,11],[2,3],[1,2],[1,3],[2,4],[3,5],[4,6],[5,7]] 53 | ''' 54 | def __init__(self, root, image_set, is_train, transform=None): 55 | super().__init__(root, image_set, is_train, transform) 56 | self.nms_thre = 1. 57 | self.image_thre = 0. 58 | self.oks_thre = 0.9 59 | self.in_vis_thre = 0.2 60 | self.bbox_file = 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json' 61 | self.use_gt_bbox = True 62 | self.image_width = 192 63 | self.image_height = 256 64 | self.aspect_ratio = self.image_width * 1.0 / self.image_height 65 | self.pixel_std = 200 66 | self.coco = COCO(self._get_ann_file_keypoint()) 67 | 68 | # deal with class names 69 | cats = [cat['name'] 70 | for cat in self.coco.loadCats(self.coco.getCatIds())] 71 | self.classes = ['__background__'] + cats 72 | logger.info('=> classes: {}'.format(self.classes)) 73 | self.num_classes = len(self.classes) 74 | self._class_to_ind = dict(zip(self.classes, range(self.num_classes))) 75 | self._class_to_coco_ind = dict(zip(cats, self.coco.getCatIds())) 76 | self._coco_ind_to_class_ind = dict([(self._class_to_coco_ind[cls], 77 | self._class_to_ind[cls]) 78 | for cls in self.classes[1:]]) 79 | 80 | # load image file names 81 | self.image_set_index = self._load_image_set_index() 82 | self.num_images = len(self.image_set_index) 83 | logger.info('=> num_images: {}'.format(self.num_images)) 84 | 85 | self.num_joints = 17 86 | self.flip_pairs = [[1, 2], [3, 4], [5, 6], [7, 8], 87 | [9, 10], [11, 12], [13, 14], [15, 16]] 88 | self.parent_ids = None 89 | 90 | self.db = self._get_db() 91 | 92 | if is_train and False: 93 | self.db = self.select_data(self.db) 94 | 95 | logger.info('=> load {} samples'.format(len(self.db))) 96 | 97 | def _get_ann_file_keypoint(self): 98 | """ self.root / annotations / person_keypoints_train2017.json """ 99 | prefix = 'person_keypoints' \ 100 | if 'test' not in self.image_set else 'image_info' 101 | return os.path.join(self.root, 'annotations', 102 | prefix + '_' + self.image_set + '.json') 103 | 104 | def _load_image_set_index(self): 105 | """ image id: int """ 106 | image_ids = self.coco.getImgIds() 107 | return image_ids 108 | 109 | def _get_db(self): 110 | if self.is_train or self.use_gt_bbox: 111 | # use ground truth bbox 112 | gt_db = 
self._load_coco_keypoint_annotations() 113 | else: 114 | # use bbox from detection 115 | gt_db = self._load_coco_person_detection_results() 116 | return gt_db 117 | 118 | def _load_coco_keypoint_annotations(self): 119 | """ ground truth bbox and keypoints """ 120 | gt_db = [] 121 | for index in self.image_set_index: 122 | gt_db.extend(self._load_coco_keypoint_annotation_kernal(index)) 123 | return gt_db 124 | 125 | def _load_coco_keypoint_annotation_kernal(self, index): 126 | """ 127 | coco ann: [u'segmentation', u'area', u'iscrowd', u'image_id', u'bbox', u'category_id', u'id'] 128 | iscrowd: 129 | crowd instances are handled by marking their overlaps with all categories to -1 130 | and later excluded in training 131 | bbox: 132 | [x1, y1, w, h] 133 | :param index: coco image id 134 | :return: db entry 135 | """ 136 | im_ann = self.coco.loadImgs(index)[0] 137 | width = im_ann['width'] 138 | height = im_ann['height'] 139 | 140 | annIds = self.coco.getAnnIds(imgIds=index, iscrowd=False) 141 | objs = self.coco.loadAnns(annIds) 142 | 143 | # sanitize bboxes 144 | valid_objs = [] 145 | for obj in objs: 146 | x, y, w, h = obj['bbox'] 147 | x1 = np.max((0, x)) 148 | y1 = np.max((0, y)) 149 | x2 = np.min((width - 1, x1 + np.max((0, w - 1)))) 150 | y2 = np.min((height - 1, y1 + np.max((0, h - 1)))) 151 | if obj['area'] > 0 and x2 >= x1 and y2 >= y1: 152 | # obj['clean_bbox'] = [x1, y1, x2, y2] 153 | obj['clean_bbox'] = [x1, y1, x2-x1, y2-y1] 154 | valid_objs.append(obj) 155 | objs = valid_objs 156 | 157 | rec = [] 158 | for obj in objs: 159 | cls = self._coco_ind_to_class_ind[obj['category_id']] 160 | if cls != 1: 161 | continue 162 | 163 | # ignore objs without keypoints annotation 164 | if max(obj['keypoints']) == 0: 165 | continue 166 | 167 | joints_3d = np.zeros((self.num_joints, 3), dtype=np.float) 168 | joints_3d_vis = np.zeros((self.num_joints, 3), dtype=np.float) 169 | for ipt in range(self.num_joints): 170 | joints_3d[ipt, 0] = obj['keypoints'][ipt * 3 + 0] 171 | joints_3d[ipt, 1] = obj['keypoints'][ipt * 3 + 1] 172 | joints_3d[ipt, 2] = 0 173 | t_vis = obj['keypoints'][ipt * 3 + 2] 174 | if t_vis > 1: 175 | t_vis = 1 176 | joints_3d_vis[ipt, 0] = t_vis 177 | joints_3d_vis[ipt, 1] = t_vis 178 | joints_3d_vis[ipt, 2] = 0 179 | 180 | center, scale = self._box2cs(obj['clean_bbox'][:4]) 181 | rec.append({ 182 | 'image': self.image_path_from_index(index), 183 | 'center': center, 184 | 'scale': scale, 185 | 'joints_3d': joints_3d, 186 | 'joints_3d_vis': joints_3d_vis, 187 | 'filename': '', 188 | 'imgnum': 0, 189 | }) 190 | 191 | return rec 192 | 193 | def _box2cs(self, box): 194 | x, y, w, h = box[:4] 195 | return self._xywh2cs(x, y, w, h) 196 | 197 | def _xywh2cs(self, x, y, w, h): 198 | center = np.zeros((2), dtype=np.float32) 199 | center[0] = x + w * 0.5 200 | center[1] = y + h * 0.5 201 | 202 | if w > self.aspect_ratio * h: 203 | h = w * 1.0 / self.aspect_ratio 204 | elif w < self.aspect_ratio * h: 205 | w = h * self.aspect_ratio 206 | scale = np.array( 207 | [w * 1.0 / self.pixel_std, h * 1.0 / self.pixel_std], 208 | dtype=np.float32) 209 | if center[0] != -1: 210 | scale = scale * 1.25 211 | 212 | return center, scale 213 | 214 | def image_path_from_index(self, index): 215 | """ example: images / train2017 / 000000119993.jpg """ 216 | file_name = '%012d.jpg' % index 217 | if '2014' in self.image_set: 218 | file_name = 'COCO_%s_' % self.image_set + file_name 219 | 220 | prefix = 'test2017' if 'test' in self.image_set else self.image_set 221 | 222 | data_name = prefix + '.zip@' if 
self.data_format == 'zip' else prefix 223 | 224 | image_path = os.path.join( 225 | self.root, 'images', data_name, file_name) 226 | 227 | return image_path 228 | 229 | def _load_coco_person_detection_results(self): 230 | all_boxes = None 231 | with open(self.bbox_file, 'r') as f: 232 | all_boxes = json.load(f) 233 | 234 | if not all_boxes: 235 | logger.error('=> Load %s fail!' % self.bbox_file) 236 | return None 237 | 238 | logger.info('=> Total boxes: {}'.format(len(all_boxes))) 239 | 240 | kpt_db = [] 241 | num_boxes = 0 242 | for n_img in range(0, len(all_boxes)): 243 | det_res = all_boxes[n_img] 244 | if det_res['category_id'] != 1: 245 | continue 246 | img_name = self.image_path_from_index(det_res['image_id']) 247 | box = det_res['bbox'] 248 | score = det_res['score'] 249 | 250 | if score < self.image_thre: 251 | continue 252 | 253 | num_boxes = num_boxes + 1 254 | 255 | center, scale = self._box2cs(box) 256 | joints_3d = np.zeros((self.num_joints, 3), dtype=np.float) 257 | joints_3d_vis = np.ones( 258 | (self.num_joints, 3), dtype=np.float) 259 | kpt_db.append({ 260 | 'image': img_name, 261 | 'center': center, 262 | 'scale': scale, 263 | 'score': score, 264 | 'joints_3d': joints_3d, 265 | 'joints_3d_vis': joints_3d_vis, 266 | }) 267 | 268 | logger.info('=> Total boxes after fliter low score@{}: {}'.format( 269 | self.image_thre, num_boxes)) 270 | return kpt_db 271 | 272 | # need double check this API and classes field 273 | def evaluate(self, cfg, preds, output_dir, all_boxes, img_path, 274 | *args, **kwargs): 275 | res_folder = os.path.join(output_dir, 'results') 276 | if not os.path.exists(res_folder): 277 | os.makedirs(res_folder) 278 | res_file = os.path.join( 279 | res_folder, 'keypoints_%s_results.json' % self.image_set) 280 | 281 | # person x (keypoints) 282 | _kpts = [] 283 | for idx, kpt in enumerate(preds): 284 | _kpts.append({ 285 | 'keypoints': kpt, 286 | 'center': all_boxes[idx][0:2], 287 | 'scale': all_boxes[idx][2:4], 288 | 'area': all_boxes[idx][4], 289 | 'score': all_boxes[idx][5], 290 | 'image': int(img_path[idx][-16:-4]) 291 | }) 292 | # image x person x (keypoints) 293 | kpts = defaultdict(list) 294 | for kpt in _kpts: 295 | kpts[kpt['image']].append(kpt) 296 | 297 | # rescoring and oks nms 298 | num_joints = self.num_joints 299 | in_vis_thre = self.in_vis_thre 300 | oks_thre = self.oks_thre 301 | oks_nmsed_kpts = [] 302 | for img in kpts.keys(): 303 | img_kpts = kpts[img] 304 | for n_p in img_kpts: 305 | box_score = n_p['score'] 306 | kpt_score = 0 307 | valid_num = 0 308 | for n_jt in range(0, num_joints): 309 | t_s = n_p['keypoints'][n_jt][2] 310 | if t_s > in_vis_thre: 311 | kpt_score = kpt_score + t_s 312 | valid_num = valid_num + 1 313 | if valid_num != 0: 314 | kpt_score = kpt_score / valid_num 315 | # rescoring 316 | n_p['score'] = kpt_score * box_score 317 | keep = oks_nms([img_kpts[i] for i in range(len(img_kpts))], 318 | oks_thre) 319 | if len(keep) == 0: 320 | oks_nmsed_kpts.append(img_kpts) 321 | else: 322 | oks_nmsed_kpts.append([img_kpts[_keep] for _keep in keep]) 323 | 324 | self._write_coco_keypoint_results( 325 | oks_nmsed_kpts, res_file) 326 | if 'test' not in self.image_set: 327 | info_str = self._do_python_keypoint_eval( 328 | res_file, res_folder) 329 | name_value = OrderedDict(info_str) 330 | return name_value, name_value['AP'] 331 | else: 332 | return {'Null': 0}, 0 333 | 334 | def _write_coco_keypoint_results(self, keypoints, res_file): 335 | data_pack = [{'cat_id': self._class_to_coco_ind[cls], 336 | 'cls_ind': cls_ind, 337 | 'cls': cls, 
338 | 'ann_type': 'keypoints', 339 | 'keypoints': keypoints 340 | } 341 | for cls_ind, cls in enumerate(self.classes) if not cls == '__background__'] 342 | 343 | results = self._coco_keypoint_results_one_category_kernel(data_pack[0]) 344 | logger.info('=> Writing results json to %s' % res_file) 345 | with open(res_file, 'w') as f: 346 | json.dump(results, f, sort_keys=True, indent=4) 347 | try: 348 | json.load(open(res_file)) 349 | except Exception: 350 | content = [] 351 | with open(res_file, 'r') as f: 352 | for line in f: 353 | content.append(line) 354 | content[-1] = ']' 355 | with open(res_file, 'w') as f: 356 | for c in content: 357 | f.write(c) 358 | 359 | def _coco_keypoint_results_one_category_kernel(self, data_pack): 360 | cat_id = data_pack['cat_id'] 361 | keypoints = data_pack['keypoints'] 362 | cat_results = [] 363 | 364 | for img_kpts in keypoints: 365 | if len(img_kpts) == 0: 366 | continue 367 | 368 | _key_points = np.array([img_kpts[k]['keypoints'] 369 | for k in range(len(img_kpts))]) 370 | key_points = np.zeros( 371 | (_key_points.shape[0], self.num_joints * 3), dtype=np.float) 372 | 373 | for ipt in range(self.num_joints): 374 | key_points[:, ipt * 3 + 0] = _key_points[:, ipt, 0] 375 | key_points[:, ipt * 3 + 1] = _key_points[:, ipt, 1] 376 | key_points[:, ipt * 3 + 2] = _key_points[:, ipt, 2] # keypoints score. 377 | 378 | result = [{'image_id': img_kpts[k]['image'], 379 | 'category_id': cat_id, 380 | 'keypoints': list(key_points[k]), 381 | 'score': img_kpts[k]['score'], 382 | 'center': list(img_kpts[k]['center']), 383 | 'scale': list(img_kpts[k]['scale']) 384 | } for k in range(len(img_kpts))] 385 | cat_results.extend(result) 386 | 387 | return cat_results 388 | 389 | def _do_python_keypoint_eval(self, res_file, res_folder): 390 | coco_dt = self.coco.loadRes(res_file) 391 | coco_eval = COCOeval(self.coco, coco_dt, 'keypoints') 392 | coco_eval.params.useSegm = None 393 | coco_eval.evaluate() 394 | coco_eval.accumulate() 395 | coco_eval.summarize() 396 | stats_names = ['AP', 'Ap .5', 'AP .75', 'AP (M)', 'AP (L)', 'AR', 'AR .5', 'AR .75', 'AR (M)', 'AR (L)'] 397 | 398 | info_str = [] 399 | for ind, name in enumerate(stats_names): 400 | info_str.append((name, coco_eval.stats[ind])) 401 | 402 | eval_file = os.path.join( 403 | res_folder, 'keypoints_%s_results.pkl' % self.image_set) 404 | 405 | with open(eval_file, 'wb') as f: 406 | pickle.dump(coco_eval, f, pickle.HIGHEST_PROTOCOL) 407 | logger.info('=> coco eval results saved to %s' % eval_file) 408 | 409 | return info_str 410 | -------------------------------------------------------------------------------- /lib/dataset/mpii.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
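# Usage sketch for lib/dataset/coco_test.py above (paths are assumptions): unlike
# coco.py, this variant hard-codes its evaluation settings (oks_thre=0.9,
# in_vis_thre=0.2, ground-truth boxes, 192x256 input), so it can be built without
# an experiment config:
#
#     from dataset.coco_test import COCODataset as COCODatasetTest
#
#     valid_dataset = COCODatasetTest('data/coco', 'val2017', is_train=False)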
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from collections import OrderedDict 12 | import logging 13 | import os 14 | import json_tricks as json 15 | 16 | import numpy as np 17 | from scipy.io import loadmat, savemat 18 | 19 | from dataset.JointsDataset import JointsDataset 20 | 21 | 22 | logger = logging.getLogger(__name__) 23 | 24 | 25 | class MPIIDataset(JointsDataset): 26 | def __init__(self, cfg, root, image_set, is_train, transform=None): 27 | super().__init__(cfg, root, image_set, is_train, transform) 28 | 29 | self.num_joints = 16 30 | self.flip_pairs = [[0, 5], [1, 4], [2, 3], [10, 15], [11, 14], [12, 13]] 31 | self.parent_ids = [1, 2, 6, 6, 3, 4, 6, 6, 7, 8, 11, 12, 7, 7, 13, 14] 32 | 33 | self.db = self._get_db() 34 | 35 | if is_train and cfg.DATASET.SELECT_DATA: 36 | self.db = self.select_data(self.db) 37 | 38 | logger.info('=> load {} samples'.format(len(self.db))) 39 | 40 | def _get_db(self): 41 | # create train/val split 42 | file_name = os.path.join(self.root, 43 | 'annot', 44 | self.image_set+'.json') 45 | with open(file_name) as anno_file: 46 | anno = json.load(anno_file) 47 | 48 | gt_db = [] 49 | for a in anno: 50 | image_name = a['image'] 51 | 52 | c = np.array(a['center'], dtype=np.float) 53 | s = np.array([a['scale'], a['scale']], dtype=np.float) 54 | 55 | # Adjust center/scale slightly to avoid cropping limbs 56 | if c[0] != -1: 57 | c[1] = c[1] + 15 * s[1] 58 | s = s * 1.25 59 | 60 | # MPII uses matlab format, index is based 1, 61 | # we should first convert to 0-based index 62 | c = c - 1 63 | 64 | joints_3d = np.zeros((self.num_joints, 3), dtype=np.float) 65 | joints_3d_vis = np.zeros((self.num_joints, 3), dtype=np.float) 66 | if self.image_set != 'test': 67 | joints = np.array(a['joints']) 68 | joints[:, 0:2] = joints[:, 0:2] - 1 69 | joints_vis = np.array(a['joints_vis']) 70 | assert len(joints) == self.num_joints, \ 71 | 'joint num diff: {} vs {}'.format(len(joints), 72 | self.num_joints) 73 | 74 | joints_3d[:, 0:2] = joints[:, 0:2] 75 | joints_3d_vis[:, 0] = joints_vis[:] 76 | joints_3d_vis[:, 1] = joints_vis[:] 77 | 78 | image_dir = 'images.zip@' if self.data_format == 'zip' else 'images' 79 | gt_db.append({ 80 | 'image': os.path.join(self.root, image_dir, image_name), 81 | 'center': c, 82 | 'scale': s, 83 | 'joints_3d': joints_3d, 84 | 'joints_3d_vis': joints_3d_vis, 85 | 'filename': '', 86 | 'imgnum': 0, 87 | }) 88 | 89 | return gt_db 90 | 91 | def evaluate(self, cfg, preds, output_dir, *args, **kwargs): 92 | # convert 0-based index to 1-based index 93 | preds = preds[:, :, 0:2] + 1.0 94 | 95 | if output_dir: 96 | pred_file = os.path.join(output_dir, 'pred.mat') 97 | savemat(pred_file, mdict={'preds': preds}) 98 | 99 | if 'test' in cfg.DATASET.TEST_SET: 100 | return {'Null': 0.0}, 0.0 101 | 102 | SC_BIAS = 0.6 103 | threshold = 0.5 104 | 105 | gt_file = os.path.join(cfg.DATASET.ROOT, 106 | 'annot', 107 | 'gt_{}.mat'.format(cfg.DATASET.TEST_SET)) 108 | gt_dict = loadmat(gt_file) 109 | dataset_joints = gt_dict['dataset_joints'] 110 | jnt_missing = gt_dict['jnt_missing'] 111 | pos_gt_src = gt_dict['pos_gt_src'] 112 | headboxes_src = gt_dict['headboxes_src'] 113 | 114 | pos_pred_src = np.transpose(preds, [1, 2, 0]) 115 | 116 | head = np.where(dataset_joints == 'head')[1][0] 117 | lsho = np.where(dataset_joints == 'lsho')[1][0] 118 | 
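        # PCKh@0.5: a predicted joint counts as correct when its distance to the
        # ground truth is below 0.5 * head size, where head size is the head-box
        # diagonal scaled by SC_BIAS = 0.6 (the standard MPII convention). The
        # look-ups around this point map joint names from the .mat ground truth
        # to column indices so that left/right pairs can be averaged below.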
lelb = np.where(dataset_joints == 'lelb')[1][0] 119 | lwri = np.where(dataset_joints == 'lwri')[1][0] 120 | lhip = np.where(dataset_joints == 'lhip')[1][0] 121 | lkne = np.where(dataset_joints == 'lkne')[1][0] 122 | lank = np.where(dataset_joints == 'lank')[1][0] 123 | 124 | rsho = np.where(dataset_joints == 'rsho')[1][0] 125 | relb = np.where(dataset_joints == 'relb')[1][0] 126 | rwri = np.where(dataset_joints == 'rwri')[1][0] 127 | rkne = np.where(dataset_joints == 'rkne')[1][0] 128 | rank = np.where(dataset_joints == 'rank')[1][0] 129 | rhip = np.where(dataset_joints == 'rhip')[1][0] 130 | 131 | jnt_visible = 1 - jnt_missing 132 | uv_error = pos_pred_src - pos_gt_src 133 | uv_err = np.linalg.norm(uv_error, axis=1) 134 | headsizes = headboxes_src[1, :, :] - headboxes_src[0, :, :] 135 | headsizes = np.linalg.norm(headsizes, axis=0) 136 | headsizes *= SC_BIAS 137 | scale = np.multiply(headsizes, np.ones((len(uv_err), 1))) 138 | scaled_uv_err = np.divide(uv_err, scale) 139 | scaled_uv_err = np.multiply(scaled_uv_err, jnt_visible) 140 | jnt_count = np.sum(jnt_visible, axis=1) 141 | less_than_threshold = np.multiply((scaled_uv_err <= threshold), 142 | jnt_visible) 143 | PCKh = np.divide(100.*np.sum(less_than_threshold, axis=1), jnt_count) 144 | 145 | # save 146 | rng = np.arange(0, 0.5+0.01, 0.01) 147 | pckAll = np.zeros((len(rng), 16)) 148 | 149 | for r in range(len(rng)): 150 | threshold = rng[r] 151 | less_than_threshold = np.multiply(scaled_uv_err <= threshold, 152 | jnt_visible) 153 | pckAll[r, :] = np.divide(100.*np.sum(less_than_threshold, axis=1), 154 | jnt_count) 155 | 156 | PCKh = np.ma.array(PCKh, mask=False) 157 | PCKh.mask[6:8] = True 158 | 159 | jnt_count = np.ma.array(jnt_count, mask=False) 160 | jnt_count.mask[6:8] = True 161 | jnt_ratio = jnt_count / np.sum(jnt_count).astype(np.float64) 162 | 163 | name_value = [ 164 | ('Head', PCKh[head]), 165 | ('Shoulder', 0.5 * (PCKh[lsho] + PCKh[rsho])), 166 | ('Elbow', 0.5 * (PCKh[lelb] + PCKh[relb])), 167 | ('Wrist', 0.5 * (PCKh[lwri] + PCKh[rwri])), 168 | ('Hip', 0.5 * (PCKh[lhip] + PCKh[rhip])), 169 | ('Knee', 0.5 * (PCKh[lkne] + PCKh[rkne])), 170 | ('Ankle', 0.5 * (PCKh[lank] + PCKh[rank])), 171 | ('Mean', np.sum(PCKh * jnt_ratio)), 172 | ('Mean@0.1', np.sum(pckAll[11, :] * jnt_ratio)) 173 | ] 174 | name_value = OrderedDict(name_value) 175 | 176 | return name_value, name_value['Mean'] 177 | -------------------------------------------------------------------------------- /lib/models/Untitled1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 74, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "from __future__ import absolute_import\n", 10 | "from __future__ import division\n", 11 | "from __future__ import print_function\n", 12 | "\n", 13 | "import os\n", 14 | "import logging\n", 15 | "\n", 16 | "import torch\n", 17 | "import torch.nn as nn\n", 18 | "from lib.models.ct.context_block import ContextBlock\n", 19 | "from collections import OrderedDict\n", 20 | "\n", 21 | "BN_MOMENTUM = 0.1\n", 22 | "logger = logging.getLogger(__name__)\n", 23 | "\n", 24 | "\n", 25 | "class DepthwiseConv2D(nn.Module):\n", 26 | " def __init__(self, in_channels, kernel_size, stride, bias=False):\n", 27 | " super(DepthwiseConv2D, self).__init__()\n", 28 | " padding = (kernel_size - 1) // 2\n", 29 | "\n", 30 | " self.depthwise_conv = nn.Conv2d(in_channels, in_channels, kernel_size=kernel_size, padding=padding, stride=stride, 
groups=in_channels, bias=bias)\n", 31 | "\n", 32 | " def forward(self, x):\n", 33 | " out = self.depthwise_conv(x)\n", 34 | " return out\n", 35 | "\n", 36 | "\n", 37 | "class Bottleneck(nn.Module):\n", 38 | " expansion = 1\n", 39 | " USE_GCB = False\n", 40 | "\n", 41 | " def __init__(self, inplanes, planes, stride=1, downsample=None):\n", 42 | " super(Bottleneck, self).__init__()\n", 43 | " \n", 44 | " #self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)\n", 45 | " self.conv1 = GhostModule(inplanes, planes, kernel_size=1, relu=True)\n", 46 | " self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)\n", 47 | " self.conv2 = DepthwiseConv2D(planes, kernel_size=3, stride=stride)\n", 48 | " self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)\n", 49 | " #self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)\n", 50 | " self.conv3 = GhostModule(planes, planes * self.expansion, kernel_size=1, relu=False)\n", 51 | " self.bn3 = nn.BatchNorm2d(planes * self.expansion, momentum=BN_MOMENTUM)\n", 52 | " if self.USE_GCB:\n", 53 | " self.gcb4 = ContextBlock(planes)\n", 54 | " else:\n", 55 | " self.gcb4 = None\n", 56 | "\n", 57 | " self.relu = nn.ReLU(inplace=True)\n", 58 | " self.downsample = downsample\n", 59 | " self.stride = stride\n", 60 | "\n", 61 | " def forward(self, x):\n", 62 | " residual = x\n", 63 | "\n", 64 | " out = self.conv1(x)\n", 65 | " out = self.bn1(out)\n", 66 | " out = self.relu(out)\n", 67 | "\n", 68 | " out = self.conv2(out)\n", 69 | " out = self.bn2(out)\n", 70 | " out = self.relu(out)\n", 71 | "\n", 72 | " out = self.conv3(out)\n", 73 | " out = self.bn3(out)\n", 74 | "\n", 75 | " if self.gcb4 is not None:\n", 76 | " out = self.gcb4(out)\n", 77 | "\n", 78 | " if self.downsample is not None:\n", 79 | " residual = self.downsample(x)\n", 80 | "\n", 81 | " out += residual\n", 82 | " out = self.relu(out)\n", 83 | "\n", 84 | " return out\n", 85 | "\n", 86 | "\n", 87 | "class PoseResNet(nn.Module):\n", 88 | "\n", 89 | " def __init__(self, block, layers):\n", 90 | " self.inplanes = 64\n", 91 | "# extra = cfg.MODEL.EXTRA\n", 92 | "# self.deconv_with_bias = extra.DECONV_WITH_BIAS\n", 93 | "# self.use_gcb = extra.USE_GCB\n", 94 | " self.deconv_with_bias = False\n", 95 | " self.use_gcb = False\n", 96 | "\n", 97 | " super(PoseResNet, self).__init__()\n", 98 | " self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,\n", 99 | " bias=False)\n", 100 | " self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)\n", 101 | " self.relu = nn.ReLU(inplace=True)\n", 102 | " self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\n", 103 | " self.layer1 = self._make_layer(block, 64, layers[0])\n", 104 | " self.layer2 = self._make_layer(block, 128, layers[1], stride=2)\n", 105 | " self.layer3 = self._make_layer(block, 256, layers[2], stride=2)\n", 106 | " self.layer4 = self._make_layer(block, 512, layers[3])\n", 107 | "\n", 108 | " # used for deconv layers\n", 109 | "# self.deconv_layers = self._make_deconv_layer(\n", 110 | "# extra.NUM_DECONV_LAYERS,\n", 111 | "# extra.NUM_DECONV_FILTERS,\n", 112 | "# extra.NUM_DECONV_KERNELS,\n", 113 | "# )\n", 114 | " self.deconv_layers = self._make_deconv_layer(\n", 115 | " 2, [256, 256], [4, 4]\n", 116 | " )\n", 117 | "# self.final_layer = nn.Conv2d(\n", 118 | "# in_channels=extra.NUM_DECONV_FILTERS[-1],\n", 119 | "# out_channels=cfg.MODEL.NUM_JOINTS,\n", 120 | "# kernel_size=extra.FINAL_CONV_KERNEL,\n", 121 | "# stride=1,\n", 122 | "# padding=1 if extra.FINAL_CONV_KERNEL == 3 else 0\n", 123 | "# )\n", 124 | 
" self.final_layer = nn.Conv2d(\n", 125 | " in_channels=256,\n", 126 | " out_channels=17,\n", 127 | " kernel_size=1,\n", 128 | " stride=1,\n", 129 | " padding=0\n", 130 | " )\n", 131 | "\n", 132 | " def _make_layer(self, block, planes, blocks, stride=1):\n", 133 | " downsample = None\n", 134 | " block.USE_GCB = self.use_gcb\n", 135 | "\n", 136 | " if stride != 1 or self.inplanes != planes * block.expansion:\n", 137 | " downsample = nn.Sequential(\n", 138 | " nn.Conv2d(self.inplanes, planes * block.expansion,\n", 139 | " kernel_size=1, stride=stride, bias=False),\n", 140 | " nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),\n", 141 | " )\n", 142 | "\n", 143 | " layers = []\n", 144 | " layers.append(block(self.inplanes, planes, stride, downsample))\n", 145 | " self.inplanes = planes * block.expansion\n", 146 | " for i in range(1, blocks):\n", 147 | " layers.append(block(self.inplanes, planes))\n", 148 | "\n", 149 | " return nn.Sequential(*layers)\n", 150 | "\n", 151 | " def _get_deconv_cfg(self, deconv_kernel, index):\n", 152 | " if deconv_kernel == 4:\n", 153 | " padding = 1\n", 154 | " output_padding = 0\n", 155 | " elif deconv_kernel == 3:\n", 156 | " padding = 1\n", 157 | " output_padding = 1\n", 158 | " elif deconv_kernel == 2:\n", 159 | " padding = 0\n", 160 | " output_padding = 0\n", 161 | "\n", 162 | " return deconv_kernel, padding, output_padding\n", 163 | "\n", 164 | " # (3, [256, 256, 256], [4, 4, 4])\n", 165 | " def _make_deconv_layer(self, num_layers, num_filters, num_kernels):\n", 166 | " assert num_layers == len(num_filters), \\\n", 167 | " 'ERROR: num_deconv_layers is different len(num_deconv_filters)'\n", 168 | " assert num_layers == len(num_kernels), \\\n", 169 | " 'ERROR: num_deconv_layers is different len(num_deconv_filters)'\n", 170 | "\n", 171 | " layers = []\n", 172 | " for i in range(num_layers):\n", 173 | " kernel, padding, output_padding = \\\n", 174 | " self._get_deconv_cfg(num_kernels[i], i)\n", 175 | "\n", 176 | " planes = num_filters[i]\n", 177 | " layers.append(\n", 178 | " nn.ConvTranspose2d(\n", 179 | " in_channels=self.inplanes,\n", 180 | " out_channels=planes,\n", 181 | " kernel_size=kernel,\n", 182 | " stride=2,\n", 183 | " padding=padding,\n", 184 | " output_padding=output_padding,\n", 185 | " groups=planes,\n", 186 | " bias=self.deconv_with_bias))\n", 187 | " layers.append(nn.Conv2d(planes, planes, kernel_size=1,\n", 188 | " bias=False))\n", 189 | " layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM))\n", 190 | " layers.append(nn.ReLU(inplace=True))\n", 191 | " self.inplanes = planes\n", 192 | "\n", 193 | " return nn.Sequential(*layers)\n", 194 | "\n", 195 | " def forward(self, x):\n", 196 | " x = self.conv1(x)\n", 197 | " x = self.bn1(x)\n", 198 | " x = self.relu(x)\n", 199 | " x = self.maxpool(x)\n", 200 | "\n", 201 | " x = self.layer1(x)\n", 202 | " x = self.layer2(x)\n", 203 | " x = self.layer3(x)\n", 204 | " x = self.layer4(x)\n", 205 | "\n", 206 | " x = self.deconv_layers(x)\n", 207 | " x = self.final_layer(x)\n", 208 | "\n", 209 | " return x\n", 210 | "\n", 211 | "resnet_spec = {50: (Bottleneck, [3, 4, 6, 3]),\n", 212 | " 101: (Bottleneck, [3, 4, 23, 3]),\n", 213 | " 152: (Bottleneck, [3, 8, 36, 3])}\n", 214 | "\n", 215 | "def get_pose_net():\n", 216 | " num_layers = 152\n", 217 | " block_class, layers = resnet_spec[num_layers]\n", 218 | " model = PoseResNet(block_class, layers)\n", 219 | " return model\n" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 75, 225 | "metadata": {}, 226 | "outputs": [ 
227 | { 228 | "name": "stdout", 229 | "output_type": "stream", 230 | "text": [ 231 | "test time elapsed 4.385964912280702ms\n" 232 | ] 233 | } 234 | ], 235 | "source": [ 236 | "net = get_pose_net()\n", 237 | "import time\n", 238 | "X = torch.rand(1, 3, 192, 256)\n", 239 | "tsince = int(round(time.time()*1000))\n", 240 | "a = net(X)\n", 241 | "ttime_elapsed = 1000 / (int(round(time.time()*1000)) - tsince)\n", 242 | "print ('test time elapsed {}ms'.format(ttime_elapsed))" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": 76, 248 | "metadata": {}, 249 | "outputs": [ 250 | { 251 | "name": "stdout", 252 | "output_type": "stream", 253 | "text": [ 254 | "[INFO] Register count_convNd() for .\n", 255 | "[INFO] Register count_bn() for .\n", 256 | "[INFO] Register zero_ops() for .\n", 257 | "[INFO] Register zero_ops() for .\n", 258 | "\u001b[91m[WARN] Cannot find rule for . Treat it as zero Macs and zero Params.\u001b[00m\n", 259 | "\u001b[91m[WARN] Cannot find rule for . Treat it as zero Macs and zero Params.\u001b[00m\n", 260 | "\u001b[91m[WARN] Cannot find rule for . Treat it as zero Macs and zero Params.\u001b[00m\n", 261 | "\u001b[91m[WARN] Cannot find rule for . Treat it as zero Macs and zero Params.\u001b[00m\n", 262 | "[INFO] Register count_convNd() for .\n", 263 | "\u001b[91m[WARN] Cannot find rule for . Treat it as zero Macs and zero Params.\u001b[00m\n", 264 | "1283828736.0 3870289.0\n" 265 | ] 266 | }, 267 | { 268 | "data": { 269 | "text/plain": [ 270 | "('1.284G', '3.870M')" 271 | ] 272 | }, 273 | "execution_count": 76, 274 | "metadata": {}, 275 | "output_type": "execute_result" 276 | } 277 | ], 278 | "source": [ 279 | "from thop import profile\n", 280 | "from thop import clever_format\n", 281 | "macs, params = profile(net, inputs=(X, ))\n", 282 | "print(macs, params)\n", 283 | "macs, params = clever_format([macs, params], \"%.3f\")\n", 284 | "macs, params" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 22, 290 | "metadata": {}, 291 | "outputs": [ 292 | { 293 | "ename": "SyntaxError", 294 | "evalue": "invalid syntax (, line 1)", 295 | "output_type": "error", 296 | "traceback": [ 297 | "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m ('1.095G', '2.732M') no GCB ('1.097G', '2.897M')with GCB\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" 298 | ] 299 | } 300 | ], 301 | "source": [ 302 | "('1.095G', '2.732M') no GCB ('1.097G', '2.897M')with GCB\n", 303 | "('803.712M', '1.609M') no GCB ('805.454M', '1.774M') with GCB \n", 304 | "101 ('1.044G', '3.158M')withGCB ('1.041G', '2.845M')no\n", 305 | "152 ('1.288G', '4.306M')withGCB ('1.284G', '3.870M')no" 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": 24, 311 | "metadata": {}, 312 | "outputs": [], 313 | "source": [ 314 | "import torch\n", 315 | "import torch.nn as nn\n", 316 | "import math\n", 317 | "def _make_divisible(v, divisor, min_value=None):\n", 318 | " \"\"\"\n", 319 | " This function is taken from the original tf repo.\n", 320 | " It ensures that all layers have a channel number that is divisible by 8\n", 321 | " It can be seen here:\n", 322 | " https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py\n", 323 | " \"\"\"\n", 324 | " if min_value is None:\n", 325 | " min_value = divisor\n", 326 | " new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)\n", 327 | " # Make sure that round down does not go down by more 
than 10%.\n", 328 | " if new_v < 0.9 * v:\n", 329 | " new_v += divisor\n", 330 | " return new_v\n", 331 | "\n", 332 | "\n", 333 | "class SELayer(nn.Module):\n", 334 | " def __init__(self, channel, reduction=4):\n", 335 | " super(SELayer, self).__init__()\n", 336 | " self.avg_pool = nn.AdaptiveAvgPool2d(1)\n", 337 | " self.fc = nn.Sequential(\n", 338 | " nn.Linear(channel, channel // reduction),\n", 339 | " nn.ReLU(inplace=True),\n", 340 | " nn.Linear(channel // reduction, channel), )\n", 341 | "\n", 342 | " def forward(self, x):\n", 343 | " b, c, _, _ = x.size()\n", 344 | " y = self.avg_pool(x).view(b, c)\n", 345 | " y = self.fc(y).view(b, c, 1, 1)\n", 346 | " y = torch.clamp(y, 0, 1)\n", 347 | " return x * y\n", 348 | "\n", 349 | "\n", 350 | "def depthwise_conv(inp, oup, kernel_size=3, stride=1, relu=False):\n", 351 | " return nn.Sequential(\n", 352 | " nn.Conv2d(inp, oup, kernel_size, stride, kernel_size//2, groups=inp, bias=False),\n", 353 | " nn.BatchNorm2d(oup),\n", 354 | " nn.ReLU(inplace=True) if relu else nn.Sequential(),\n", 355 | " )\n", 356 | "\n", 357 | "class GhostModule(nn.Module):\n", 358 | " def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):\n", 359 | " super(GhostModule, self).__init__()\n", 360 | " self.oup = oup\n", 361 | " init_channels = math.ceil(oup / ratio)\n", 362 | " new_channels = init_channels*(ratio-1)\n", 363 | "\n", 364 | " self.primary_conv = nn.Sequential(\n", 365 | " nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size//2, bias=False),\n", 366 | " nn.BatchNorm2d(init_channels),\n", 367 | " nn.ReLU(inplace=True) if relu else nn.Sequential(),\n", 368 | " )\n", 369 | "\n", 370 | " self.cheap_operation = nn.Sequential(\n", 371 | " nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size//2, groups=init_channels, bias=False),\n", 372 | " nn.BatchNorm2d(new_channels),\n", 373 | " nn.ReLU(inplace=True) if relu else nn.Sequential(),\n", 374 | " )\n", 375 | "\n", 376 | " def forward(self, x):\n", 377 | " x1 = self.primary_conv(x)\n", 378 | " x2 = self.cheap_operation(x1)\n", 379 | " out = torch.cat([x1,x2], dim=1)\n", 380 | " return out[:,:self.oup,:,:]\n" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": null, 386 | "metadata": {}, 387 | "outputs": [], 388 | "source": [] 389 | } 390 | ], 391 | "metadata": { 392 | "kernelspec": { 393 | "display_name": "Python 3", 394 | "language": "python", 395 | "name": "python3" 396 | }, 397 | "language_info": { 398 | "codemirror_mode": { 399 | "name": "ipython", 400 | "version": 3 401 | }, 402 | "file_extension": ".py", 403 | "mimetype": "text/x-python", 404 | "name": "python", 405 | "nbconvert_exporter": "python", 406 | "pygments_lexer": "ipython3", 407 | "version": "3.6.12" 408 | } 409 | }, 410 | "nbformat": 4, 411 | "nbformat_minor": 4 412 | } 413 | -------------------------------------------------------------------------------- /lib/models/__init__.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
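# Usage sketch (an assumed pattern, not quoted from pose_estimation/train.py):
# importing both model modules below lets the training/validation code resolve the
# network named in the config, e.g.
#
#     import models
#     model = eval('models.' + cfg.MODEL.NAME + '.get_pose_net')(cfg, is_train=True)
#
# where cfg.MODEL.NAME would be 'lp_net' or 'pose_resnet'.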
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import models.pose_resnet 12 | import models.lp_net 13 | -------------------------------------------------------------------------------- /lib/models/__pycache__/__init__.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/models/__pycache__/__init__.cpython-37.pyc -------------------------------------------------------------------------------- /lib/models/__pycache__/lp_net.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/models/__pycache__/lp_net.cpython-37.pyc -------------------------------------------------------------------------------- /lib/models/__pycache__/pose_resnet.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/models/__pycache__/pose_resnet.cpython-37.pyc -------------------------------------------------------------------------------- /lib/models/ct/__init__.py: -------------------------------------------------------------------------------- 1 | from .context_block import ContextBlock 2 | 3 | __all__ = [ 4 | 'ContextBlock', 5 | ] -------------------------------------------------------------------------------- /lib/models/ct/__pycache__/__init__.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/models/ct/__pycache__/__init__.cpython-37.pyc -------------------------------------------------------------------------------- /lib/models/ct/__pycache__/context_block.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/models/ct/__pycache__/context_block.cpython-37.pyc -------------------------------------------------------------------------------- /lib/models/ct/context_block.py: -------------------------------------------------------------------------------- 1 | from mmcv.cnn import constant_init, kaiming_init 2 | import torch 3 | from torch import nn 4 | 5 | 6 | def last_zero_init(m): 7 | if isinstance(m, nn.Sequential): 8 | constant_init(m[-1], val=0) 9 | else: 10 | constant_init(m, val=0) 11 | 12 | 13 | class ContextBlock(nn.Module): 14 | 15 | def __init__(self, 16 | inplanes, 17 | ratio=1/32, 18 | fusion_types=('channel_add', 'channel_mul')): 19 | super(ContextBlock, self).__init__() 20 | self.inplanes = inplanes 21 | self.ratio = ratio 22 | self.planes = int(inplanes * ratio) 23 | self.fusion_types = fusion_types 24 | self.avg_pool = nn.AdaptiveAvgPool2d(1) 25 | if 'channel_add' in fusion_types: 26 | self.channel_add_conv = nn.Sequential( 27 | nn.Conv2d(self.inplanes, self.planes, kernel_size=1), 28 | nn.LayerNorm([self.planes, 1, 1]), 29 | nn.ReLU(inplace=True), # yapf: disable 30 | nn.Conv2d(self.planes, self.inplanes, kernel_size=1)) 31 | 
else: 32 | self.channel_add_conv = None 33 | if 'channel_mul' in fusion_types: 34 | self.channel_mul_conv = nn.Sequential( 35 | nn.Conv2d(self.inplanes, self.planes, kernel_size=1), 36 | nn.LayerNorm([self.planes, 1, 1]), 37 | nn.ReLU(inplace=True), # yapf: disable 38 | nn.Conv2d(self.planes, self.inplanes, kernel_size=1)) 39 | else: 40 | self.channel_mul_conv = None 41 | self.reset_parameters() 42 | 43 | def reset_parameters(self): 44 | if self.channel_add_conv is not None: 45 | last_zero_init(self.channel_add_conv) 46 | if self.channel_mul_conv is not None: 47 | last_zero_init(self.channel_mul_conv) 48 | 49 | def spatial_pool(self, x): 50 | batch, channel, height, width = x.size() 51 | context = self.avg_pool(x) 52 | 53 | return context 54 | 55 | def forward(self, x): 56 | # [N, C, 1, 1] 57 | context = self.spatial_pool(x) 58 | 59 | out = x 60 | if self.channel_mul_conv is not None: 61 | # [N, C, 1, 1] 62 | channel_mul_term = torch.sigmoid(self.channel_mul_conv(context)) 63 | out = out * channel_mul_term 64 | if self.channel_add_conv is not None: 65 | # [N, C, 1, 1] 66 | channel_add_term = self.channel_add_conv(context) 67 | out = out + channel_add_term 68 | 69 | return out 70 | -------------------------------------------------------------------------------- /lib/models/lp_net.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import os 6 | import logging 7 | 8 | import torch 9 | import torch.nn as nn 10 | from .ct.context_block import ContextBlock 11 | from collections import OrderedDict 12 | 13 | BN_MOMENTUM = 0.1 14 | logger = logging.getLogger(__name__) 15 | 16 | 17 | class DepthwiseConv2D(nn.Module): 18 | def __init__(self, in_channels, kernel_size, stride, bias=False): 19 | super(DepthwiseConv2D, self).__init__() 20 | padding = (kernel_size - 1) // 2 21 | 22 | self.depthwise_conv = nn.Conv2d(in_channels, in_channels, kernel_size=kernel_size, padding=padding, stride=stride, groups=in_channels, bias=bias) 23 | 24 | def forward(self, x): 25 | out = self.depthwise_conv(x) 26 | return out 27 | 28 | 29 | class Bottleneck(nn.Module): 30 | expansion = 1 31 | USE_GCB = True 32 | 33 | def __init__(self, inplanes, planes, stride=1, downsample=None): 34 | super(Bottleneck, self).__init__() 35 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 36 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 37 | self.conv2 = DepthwiseConv2D(planes, kernel_size=3, stride=stride) 38 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 39 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, 40 | bias=False) 41 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, 42 | momentum=BN_MOMENTUM) 43 | if self.USE_GCB: 44 | self.gcb4 = ContextBlock(planes) 45 | else: 46 | self.gcb4 = None 47 | 48 | self.relu = nn.ReLU(inplace=True) 49 | self.downsample = downsample 50 | self.stride = stride 51 | 52 | def forward(self, x): 53 | residual = x 54 | 55 | out = self.conv1(x) 56 | out = self.bn1(out) 57 | out = self.relu(out) 58 | 59 | out = self.conv2(out) 60 | out = self.bn2(out) 61 | out = self.relu(out) 62 | 63 | out = self.conv3(out) 64 | out = self.bn3(out) 65 | 66 | if self.gcb4 is not None: 67 | out = self.gcb4(out) 68 | 69 | if self.downsample is not None: 70 | residual = self.downsample(x) 71 | 72 | out += residual 73 | out = self.relu(out) 74 | 75 | return out 76 | 77 | 78 | class PoseResNet(nn.Module): 
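    # Lightweight pose network: a ResNet-style backbone whose bottleneck uses a
    # 1x1 conv -> 3x3 depthwise conv -> 1x1 conv (expansion 1) with an optional
    # global-context block, followed by grouped deconvolution layers and a final
    # 1x1 conv that outputs one heatmap per joint.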
79 | 80 | def __init__(self, block, layers, cfg, **kwargs): 81 | self.inplanes = 64 82 | extra = cfg.MODEL.EXTRA 83 | self.deconv_with_bias = extra.DECONV_WITH_BIAS 84 | self.use_gcb = extra.USE_GCB 85 | 86 | super(PoseResNet, self).__init__() 87 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 88 | bias=False) 89 | self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM) 90 | self.relu = nn.ReLU(inplace=True) 91 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 92 | self.layer1 = self._make_layer(block, 64, layers[0]) 93 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 94 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 95 | self.layer4 = self._make_layer(block, 512, layers[3]) 96 | 97 | # used for deconv layers 98 | self.deconv_layers = self._make_deconv_layer( 99 | extra.NUM_DECONV_LAYERS, 100 | extra.NUM_DECONV_FILTERS, 101 | extra.NUM_DECONV_KERNELS, 102 | ) 103 | 104 | self.final_layer = nn.Conv2d( 105 | in_channels=extra.NUM_DECONV_FILTERS[-1], 106 | out_channels=cfg.MODEL.NUM_JOINTS, 107 | kernel_size=extra.FINAL_CONV_KERNEL, 108 | stride=1, 109 | padding=1 if extra.FINAL_CONV_KERNEL == 3 else 0 110 | ) 111 | 112 | def _make_layer(self, block, planes, blocks, stride=1): 113 | downsample = None 114 | block.USE_GCB = self.use_gcb 115 | 116 | if stride != 1 or self.inplanes != planes * block.expansion: 117 | downsample = nn.Sequential( 118 | nn.Conv2d(self.inplanes, planes * block.expansion, 119 | kernel_size=1, stride=stride, bias=False), 120 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM), 121 | ) 122 | 123 | layers = [] 124 | layers.append(block(self.inplanes, planes, stride, downsample)) 125 | self.inplanes = planes * block.expansion 126 | for i in range(1, blocks): 127 | layers.append(block(self.inplanes, planes)) 128 | 129 | return nn.Sequential(*layers) 130 | 131 | def _get_deconv_cfg(self, deconv_kernel, index): 132 | if deconv_kernel == 4: 133 | padding = 1 134 | output_padding = 0 135 | elif deconv_kernel == 3: 136 | padding = 1 137 | output_padding = 1 138 | elif deconv_kernel == 2: 139 | padding = 0 140 | output_padding = 0 141 | 142 | return deconv_kernel, padding, output_padding 143 | 144 | # (3, [256, 256, 256], [4, 4, 4]) 145 | def _make_deconv_layer(self, num_layers, num_filters, num_kernels): 146 | assert num_layers == len(num_filters), \ 147 | 'ERROR: num_deconv_layers is different len(num_deconv_filters)' 148 | assert num_layers == len(num_kernels), \ 149 | 'ERROR: num_deconv_layers is different len(num_deconv_filters)' 150 | 151 | layers = [] 152 | for i in range(num_layers): 153 | kernel, padding, output_padding = \ 154 | self._get_deconv_cfg(num_kernels[i], i) 155 | 156 | planes = num_filters[i] 157 | layers.append( 158 | nn.ConvTranspose2d( 159 | in_channels=self.inplanes, 160 | out_channels=planes, 161 | kernel_size=kernel, 162 | stride=2, 163 | padding=padding, 164 | output_padding=output_padding, 165 | groups=planes, 166 | bias=self.deconv_with_bias)) 167 | layers.append(nn.Conv2d(planes, planes, kernel_size=1, 168 | bias=False)) 169 | layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)) 170 | layers.append(nn.ReLU(inplace=True)) 171 | self.inplanes = planes 172 | 173 | return nn.Sequential(*layers) 174 | 175 | def forward(self, x): 176 | x = self.conv1(x) 177 | x = self.bn1(x) 178 | x = self.relu(x) 179 | x = self.maxpool(x) 180 | 181 | x = self.layer1(x) 182 | x = self.layer2(x) 183 | x = self.layer3(x) 184 | x = self.layer4(x) 185 | 186 | x = self.deconv_layers(x) 187 
| x = self.final_layer(x) 188 | 189 | return x 190 | 191 | resnet_spec = {50: (Bottleneck, [3, 4, 6, 3]), 192 | 101: (Bottleneck, [3, 4, 23, 3]), 193 | 152: (Bottleneck, [3, 8, 36, 3])} 194 | 195 | def get_pose_net(cfg, is_train, **kwargs): 196 | num_layers = cfg.MODEL.EXTRA.NUM_LAYERS 197 | block_class, layers = resnet_spec[num_layers] 198 | model = PoseResNet(block_class, layers, cfg, **kwargs) 199 | if is_train and cfg.MODEL.INIT_WEIGHTS: 200 | model.init_weights(cfg.MODEL.PRETRAINED) 201 | return model 202 | -------------------------------------------------------------------------------- /lib/models/pose_resnet.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import os 12 | import logging 13 | 14 | import torch 15 | import torch.nn as nn 16 | from collections import OrderedDict 17 | 18 | 19 | BN_MOMENTUM = 0.1 20 | logger = logging.getLogger(__name__) 21 | 22 | 23 | def conv3x3(in_planes, out_planes, stride=1): 24 | """3x3 convolution with padding""" 25 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 26 | padding=1, bias=False) 27 | 28 | 29 | class BasicBlock(nn.Module): 30 | expansion = 1 31 | 32 | def __init__(self, inplanes, planes, stride=1, downsample=None): 33 | super(BasicBlock, self).__init__() 34 | self.conv1 = conv3x3(inplanes, planes, stride) 35 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 36 | self.relu = nn.ReLU(inplace=True) 37 | self.conv2 = conv3x3(planes, planes) 38 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 39 | self.downsample = downsample 40 | self.stride = stride 41 | 42 | def forward(self, x): 43 | residual = x 44 | 45 | out = self.conv1(x) 46 | out = self.bn1(out) 47 | out = self.relu(out) 48 | 49 | out = self.conv2(out) 50 | out = self.bn2(out) 51 | 52 | if self.downsample is not None: 53 | residual = self.downsample(x) 54 | 55 | out += residual 56 | out = self.relu(out) 57 | 58 | return out 59 | 60 | 61 | class Bottleneck(nn.Module): 62 | expansion = 4 63 | 64 | def __init__(self, inplanes, planes, stride=1, downsample=None): 65 | super(Bottleneck, self).__init__() 66 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 67 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 68 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 69 | padding=1, bias=False) 70 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 71 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, 72 | bias=False) 73 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, 74 | momentum=BN_MOMENTUM) 75 | self.relu = nn.ReLU(inplace=True) 76 | self.downsample = downsample 77 | self.stride = stride 78 | 79 | def forward(self, x): 80 | residual = x 81 | 82 | out = self.conv1(x) 83 | out = self.bn1(out) 84 | out = self.relu(out) 85 | 86 | out = self.conv2(out) 87 | out = self.bn2(out) 88 | out = self.relu(out) 89 | 90 | out = self.conv3(out) 91 | out = self.bn3(out) 92 | 93 | if self.downsample is not None: 94 | residual = self.downsample(x) 95 | 96 | out += residual 97 | out = self.relu(out) 98 | 99 | return out 100 | 101 | 102 | 
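# Caffe-style bottleneck: identical to Bottleneck above except that the stride is
# applied in the first 1x1 convolution instead of the 3x3 convolution, which
# matches ResNet weights converted from Caffe.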
class Bottleneck_CAFFE(nn.Module): 103 | expansion = 4 104 | 105 | def __init__(self, inplanes, planes, stride=1, downsample=None): 106 | super(Bottleneck_CAFFE, self).__init__() 107 | # add stride to conv1x1 108 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False) 109 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 110 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, 111 | padding=1, bias=False) 112 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 113 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, 114 | bias=False) 115 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, 116 | momentum=BN_MOMENTUM) 117 | self.relu = nn.ReLU(inplace=True) 118 | self.downsample = downsample 119 | self.stride = stride 120 | 121 | def forward(self, x): 122 | residual = x 123 | 124 | out = self.conv1(x) 125 | out = self.bn1(out) 126 | out = self.relu(out) 127 | 128 | out = self.conv2(out) 129 | out = self.bn2(out) 130 | out = self.relu(out) 131 | 132 | out = self.conv3(out) 133 | out = self.bn3(out) 134 | 135 | if self.downsample is not None: 136 | residual = self.downsample(x) 137 | 138 | out += residual 139 | out = self.relu(out) 140 | 141 | return out 142 | 143 | 144 | class PoseResNet(nn.Module): 145 | 146 | def __init__(self, block, layers, cfg, **kwargs): 147 | self.inplanes = 64 148 | extra = cfg.MODEL.EXTRA 149 | self.deconv_with_bias = extra.DECONV_WITH_BIAS 150 | 151 | super(PoseResNet, self).__init__() 152 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 153 | bias=False) 154 | self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM) 155 | self.relu = nn.ReLU(inplace=True) 156 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 157 | self.layer1 = self._make_layer(block, 64, layers[0]) 158 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 159 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 160 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 161 | 162 | # used for deconv layers 163 | self.deconv_layers = self._make_deconv_layer( 164 | extra.NUM_DECONV_LAYERS, 165 | extra.NUM_DECONV_FILTERS, 166 | extra.NUM_DECONV_KERNELS, 167 | ) 168 | 169 | self.final_layer = nn.Conv2d( 170 | in_channels=extra.NUM_DECONV_FILTERS[-1], 171 | out_channels=cfg.MODEL.NUM_JOINTS, 172 | kernel_size=extra.FINAL_CONV_KERNEL, 173 | stride=1, 174 | padding=1 if extra.FINAL_CONV_KERNEL == 3 else 0 175 | ) 176 | 177 | def _make_layer(self, block, planes, blocks, stride=1): 178 | downsample = None 179 | if stride != 1 or self.inplanes != planes * block.expansion: 180 | downsample = nn.Sequential( 181 | nn.Conv2d(self.inplanes, planes * block.expansion, 182 | kernel_size=1, stride=stride, bias=False), 183 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM), 184 | ) 185 | 186 | layers = [] 187 | layers.append(block(self.inplanes, planes, stride, downsample)) 188 | self.inplanes = planes * block.expansion 189 | for i in range(1, blocks): 190 | layers.append(block(self.inplanes, planes)) 191 | 192 | return nn.Sequential(*layers) 193 | 194 | def _get_deconv_cfg(self, deconv_kernel, index): 195 | if deconv_kernel == 4: 196 | padding = 1 197 | output_padding = 0 198 | elif deconv_kernel == 3: 199 | padding = 1 200 | output_padding = 1 201 | elif deconv_kernel == 2: 202 | padding = 0 203 | output_padding = 0 204 | 205 | return deconv_kernel, padding, output_padding 206 | 207 | def _make_deconv_layer(self, num_layers, num_filters, num_kernels): 208 | 
assert num_layers == len(num_filters), \ 209 | 'ERROR: num_deconv_layers is different len(num_deconv_filters)' 210 | assert num_layers == len(num_kernels), \ 211 | 'ERROR: num_deconv_layers is different len(num_deconv_filters)' 212 | 213 | layers = [] 214 | for i in range(num_layers): 215 | kernel, padding, output_padding = \ 216 | self._get_deconv_cfg(num_kernels[i], i) 217 | 218 | planes = num_filters[i] 219 | layers.append( 220 | nn.ConvTranspose2d( 221 | in_channels=self.inplanes, 222 | out_channels=planes, 223 | kernel_size=kernel, 224 | stride=2, 225 | padding=padding, 226 | output_padding=output_padding, 227 | bias=self.deconv_with_bias)) 228 | layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)) 229 | layers.append(nn.ReLU(inplace=True)) 230 | self.inplanes = planes 231 | 232 | return nn.Sequential(*layers) 233 | 234 | def forward(self, x): 235 | x = self.conv1(x) 236 | x = self.bn1(x) 237 | x = self.relu(x) 238 | x = self.maxpool(x) 239 | 240 | x = self.layer1(x) 241 | x = self.layer2(x) 242 | x = self.layer3(x) 243 | x = self.layer4(x) 244 | 245 | x = self.deconv_layers(x) 246 | x = self.final_layer(x) 247 | 248 | return x 249 | 250 | def init_weights(self, pretrained=''): 251 | if os.path.isfile(pretrained): 252 | logger.info('=> init deconv weights from normal distribution') 253 | for name, m in self.deconv_layers.named_modules(): 254 | if isinstance(m, nn.ConvTranspose2d): 255 | logger.info('=> init {}.weight as normal(0, 0.001)'.format(name)) 256 | logger.info('=> init {}.bias as 0'.format(name)) 257 | nn.init.normal_(m.weight, std=0.001) 258 | if self.deconv_with_bias: 259 | nn.init.constant_(m.bias, 0) 260 | elif isinstance(m, nn.BatchNorm2d): 261 | logger.info('=> init {}.weight as 1'.format(name)) 262 | logger.info('=> init {}.bias as 0'.format(name)) 263 | nn.init.constant_(m.weight, 1) 264 | nn.init.constant_(m.bias, 0) 265 | logger.info('=> init final conv weights from normal distribution') 266 | for m in self.final_layer.modules(): 267 | if isinstance(m, nn.Conv2d): 268 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 269 | logger.info('=> init {}.weight as normal(0, 0.001)'.format(name)) 270 | logger.info('=> init {}.bias as 0'.format(name)) 271 | nn.init.normal_(m.weight, std=0.001) 272 | nn.init.constant_(m.bias, 0) 273 | 274 | # pretrained_state_dict = torch.load(pretrained) 275 | logger.info('=> loading pretrained model {}'.format(pretrained)) 276 | # self.load_state_dict(pretrained_state_dict, strict=False) 277 | checkpoint = torch.load(pretrained) 278 | if isinstance(checkpoint, OrderedDict): 279 | state_dict = checkpoint 280 | elif isinstance(checkpoint, dict) and 'state_dict' in checkpoint: 281 | state_dict_old = checkpoint['state_dict'] 282 | state_dict = OrderedDict() 283 | # delete 'module.' 
because it is saved from DataParallel module 284 | for key in state_dict_old.keys(): 285 | if key.startswith('module.'): 286 | # state_dict[key[7:]] = state_dict[key] 287 | # state_dict.pop(key) 288 | state_dict[key[7:]] = state_dict_old[key] 289 | else: 290 | state_dict[key] = state_dict_old[key] 291 | else: 292 | raise RuntimeError( 293 | 'No state_dict found in checkpoint file {}'.format(pretrained)) 294 | self.load_state_dict(state_dict, strict=False) 295 | else: 296 | logger.error('=> imagenet pretrained model dose not exist') 297 | logger.error('=> please download it first') 298 | raise ValueError('imagenet pretrained model does not exist') 299 | 300 | 301 | resnet_spec = {18: (BasicBlock, [2, 2, 2, 2]), 302 | 34: (BasicBlock, [3, 4, 6, 3]), 303 | 50: (Bottleneck, [3, 4, 6, 3]), 304 | 101: (Bottleneck, [3, 4, 23, 3]), 305 | 152: (Bottleneck, [3, 8, 36, 3])} 306 | 307 | 308 | def get_pose_net(cfg, is_train, **kwargs): 309 | num_layers = cfg.MODEL.EXTRA.NUM_LAYERS 310 | style = cfg.MODEL.STYLE 311 | 312 | block_class, layers = resnet_spec[num_layers] 313 | 314 | if style == 'caffe': 315 | block_class = Bottleneck_CAFFE 316 | 317 | model = PoseResNet(block_class, layers, cfg, **kwargs) 318 | 319 | if is_train and cfg.MODEL.INIT_WEIGHTS: 320 | model.init_weights(cfg.MODEL.PRETRAINED) 321 | 322 | return model 323 | -------------------------------------------------------------------------------- /lib/nms/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/nms/__init__.py -------------------------------------------------------------------------------- /lib/nms/__pycache__/__init__.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/nms/__pycache__/__init__.cpython-37.pyc -------------------------------------------------------------------------------- /lib/nms/__pycache__/nms.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/nms/__pycache__/nms.cpython-37.pyc -------------------------------------------------------------------------------- /lib/nms/cpu_nms.cpython-37m-x86_64-linux-gnu.so: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/nms/cpu_nms.cpython-37m-x86_64-linux-gnu.so -------------------------------------------------------------------------------- /lib/nms/cpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn) 5 | # ------------------------------------------------------------------------------ 6 | 7 | import numpy as np 8 | cimport numpy as np 9 | 10 | cdef inline np.float32_t max(np.float32_t a, np.float32_t b): 11 | return a if a >= b else b 12 | 13 | cdef inline np.float32_t min(np.float32_t a, np.float32_t b): 14 | return a if a <= b else b 15 | 16 | def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh): 17 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0] 18 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1] 19 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2] 20 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3] 21 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4] 22 | 23 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1) 24 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1].astype('i') 25 | 26 | cdef int ndets = dets.shape[0] 27 | cdef np.ndarray[np.int_t, ndim=1] suppressed = \ 28 | np.zeros((ndets), dtype=np.int) 29 | 30 | # nominal indices 31 | cdef int _i, _j 32 | # sorted indices 33 | cdef int i, j 34 | # temp variables for box i's (the box currently under consideration) 35 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea 36 | # variables for computing overlap with box j (lower scoring box) 37 | cdef np.float32_t xx1, yy1, xx2, yy2 38 | cdef np.float32_t w, h 39 | cdef np.float32_t inter, ovr 40 | 41 | keep = [] 42 | for _i in range(ndets): 43 | i = order[_i] 44 | if suppressed[i] == 1: 45 | continue 46 | keep.append(i) 47 | ix1 = x1[i] 48 | iy1 = y1[i] 49 | ix2 = x2[i] 50 | iy2 = y2[i] 51 | iarea = areas[i] 52 | for _j in range(_i + 1, ndets): 53 | j = order[_j] 54 | if suppressed[j] == 1: 55 | continue 56 | xx1 = max(ix1, x1[j]) 57 | yy1 = max(iy1, y1[j]) 58 | xx2 = min(ix2, x2[j]) 59 | yy2 = min(iy2, y2[j]) 60 | w = max(0.0, xx2 - xx1 + 1) 61 | h = max(0.0, yy2 - yy1 + 1) 62 | inter = w * h 63 | ovr = inter / (iarea + areas[j] - inter) 64 | if ovr >= thresh: 65 | suppressed[j] = 1 66 | 67 | return keep 68 | -------------------------------------------------------------------------------- /lib/nms/gpu_nms.cpython-37m-x86_64-linux-gnu.so: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/nms/gpu_nms.cpython-37m-x86_64-linux-gnu.so -------------------------------------------------------------------------------- /lib/nms/gpu_nms.hpp: -------------------------------------------------------------------------------- 1 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num, 2 | int boxes_dim, float nms_overlap_thresh, int device_id); 3 | -------------------------------------------------------------------------------- /lib/nms/gpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn) 5 | # ------------------------------------------------------------------------------ 6 | 7 | import numpy as np 8 | cimport numpy as np 9 | 10 | assert sizeof(int) == sizeof(np.int32_t) 11 | 12 | cdef extern from "gpu_nms.hpp": 13 | void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int) 14 | 15 | def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh, 16 | np.int32_t device_id=0): 17 | cdef int boxes_num = dets.shape[0] 18 | cdef int boxes_dim = dets.shape[1] 19 | cdef int num_out 20 | cdef np.ndarray[np.int32_t, ndim=1] \ 21 | keep = np.zeros(boxes_num, dtype=np.int32) 22 | cdef np.ndarray[np.float32_t, ndim=1] \ 23 | scores = dets[:, 4] 24 | cdef np.ndarray[np.int32_t, ndim=1] \ 25 | order = scores.argsort()[::-1].astype(np.int32) 26 | cdef np.ndarray[np.float32_t, ndim=2] \ 27 | sorted_dets = dets[order, :] 28 | _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id) 29 | keep = keep[:num_out] 30 | return list(order[keep]) 31 | -------------------------------------------------------------------------------- /lib/nms/nms.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import numpy as np 12 | 13 | from .cpu_nms import cpu_nms 14 | from .gpu_nms import gpu_nms 15 | 16 | 17 | def py_nms_wrapper(thresh): 18 | def _nms(dets): 19 | return nms(dets, thresh) 20 | return _nms 21 | 22 | 23 | def cpu_nms_wrapper(thresh): 24 | def _nms(dets): 25 | return cpu_nms(dets, thresh) 26 | return _nms 27 | 28 | 29 | def gpu_nms_wrapper(thresh, device_id): 30 | def _nms(dets): 31 | return gpu_nms(dets, thresh, device_id) 32 | return _nms 33 | 34 | 35 | def nms(dets, thresh): 36 | """ 37 | greedily select boxes with high confidence and overlap with current maximum <= thresh 38 | rule out overlap >= thresh 39 | :param dets: [[x1, y1, x2, y2 score]] 40 | :param thresh: retain overlap < thresh 41 | :return: indexes to keep 42 | """ 43 | if dets.shape[0] == 0: 44 | return [] 45 | 46 | x1 = dets[:, 0] 47 | y1 = dets[:, 1] 48 | x2 = dets[:, 2] 49 | y2 = dets[:, 3] 50 | scores = dets[:, 4] 51 | 52 | areas = (x2 - x1 + 1) * (y2 - y1 + 1) 53 | order = scores.argsort()[::-1] 54 | 55 | keep = [] 56 | while order.size > 0: 57 | i = order[0] 58 | keep.append(i) 59 | xx1 = np.maximum(x1[i], x1[order[1:]]) 60 | yy1 = np.maximum(y1[i], y1[order[1:]]) 61 | xx2 = np.minimum(x2[i], x2[order[1:]]) 62 | yy2 = np.minimum(y2[i], y2[order[1:]]) 63 | 64 | w = np.maximum(0.0, xx2 - xx1 + 1) 65 | h = np.maximum(0.0, yy2 - yy1 + 1) 66 | inter = w * h 67 | ovr = inter / (areas[i] + areas[order[1:]] - inter) 68 | 69 | inds = np.where(ovr <= thresh)[0] 70 | order = order[inds + 1] 71 | 72 | return keep 73 | 74 | def oks_iou(g, d, a_g, a_d, sigmas=None, in_vis_thre=None): 75 | if not isinstance(sigmas, np.ndarray): 76 | sigmas = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0 77 | vars = (sigmas * 2) ** 2 78 | xg = g[0::3] 79 | yg = g[1::3] 80 | vg = g[2::3] 81 | ious = 
np.zeros((d.shape[0])) 82 | for n_d in range(0, d.shape[0]): 83 | xd = d[n_d, 0::3] 84 | yd = d[n_d, 1::3] 85 | vd = d[n_d, 2::3] 86 | dx = xd - xg 87 | dy = yd - yg 88 | e = (dx ** 2 + dy ** 2) / vars / ((a_g + a_d[n_d]) / 2 + np.spacing(1)) / 2 89 | if in_vis_thre is not None: 90 | ind = list(vg > in_vis_thre) and list(vd > in_vis_thre) 91 | e = e[ind] 92 | ious[n_d] = np.sum(np.exp(-e)) / e.shape[0] if e.shape[0] != 0 else 0.0 93 | return ious 94 | 95 | def oks_nms(kpts_db, thresh, sigmas=None, in_vis_thre=None): 96 | """ 97 | greedily select boxes with high confidence and overlap with current maximum <= thresh 98 | rule out overlap >= thresh, overlap = oks 99 | :param kpts_db 100 | :param thresh: retain overlap < thresh 101 | :return: indexes to keep 102 | """ 103 | if len(kpts_db) == 0: 104 | return [] 105 | 106 | scores = np.array([kpts_db[i]['score'] for i in range(len(kpts_db))]) 107 | kpts = np.array([kpts_db[i]['keypoints'].flatten() for i in range(len(kpts_db))]) 108 | areas = np.array([kpts_db[i]['area'] for i in range(len(kpts_db))]) 109 | 110 | order = scores.argsort()[::-1] 111 | 112 | keep = [] 113 | while order.size > 0: 114 | i = order[0] 115 | keep.append(i) 116 | 117 | oks_ovr = oks_iou(kpts[i], kpts[order[1:]], areas[i], areas[order[1:]], sigmas, in_vis_thre) 118 | 119 | inds = np.where(oks_ovr <= thresh)[0] 120 | order = order[inds + 1] 121 | 122 | return keep 123 | 124 | -------------------------------------------------------------------------------- /lib/nms/nms_kernel.cu: -------------------------------------------------------------------------------- 1 | // ------------------------------------------------------------------ 2 | // Copyright (c) Microsoft 3 | // Licensed under The MIT License 4 | // Modified from MATLAB Faster R-CNN (https://github.com/shaoqingren/faster_rcnn) 5 | // ------------------------------------------------------------------ 6 | 7 | #include "gpu_nms.hpp" 8 | #include 9 | #include 10 | 11 | #define CUDA_CHECK(condition) \ 12 | /* Code block avoids redefinition of cudaError_t error */ \ 13 | do { \ 14 | cudaError_t error = condition; \ 15 | if (error != cudaSuccess) { \ 16 | std::cout << cudaGetErrorString(error) << std::endl; \ 17 | } \ 18 | } while (0) 19 | 20 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0)) 21 | int const threadsPerBlock = sizeof(unsigned long long) * 8; 22 | 23 | __device__ inline float devIoU(float const * const a, float const * const b) { 24 | float left = max(a[0], b[0]), right = min(a[2], b[2]); 25 | float top = max(a[1], b[1]), bottom = min(a[3], b[3]); 26 | float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f); 27 | float interS = width * height; 28 | float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1); 29 | float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1); 30 | return interS / (Sa + Sb - interS); 31 | } 32 | 33 | __global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh, 34 | const float *dev_boxes, unsigned long long *dev_mask) { 35 | const int row_start = blockIdx.y; 36 | const int col_start = blockIdx.x; 37 | 38 | // if (row_start > col_start) return; 39 | 40 | const int row_size = 41 | min(n_boxes - row_start * threadsPerBlock, threadsPerBlock); 42 | const int col_size = 43 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock); 44 | 45 | __shared__ float block_boxes[threadsPerBlock * 5]; 46 | if (threadIdx.x < col_size) { 47 | block_boxes[threadIdx.x * 5 + 0] = 48 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0]; 49 | block_boxes[threadIdx.x * 5 
+ 1] = 50 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1]; 51 | block_boxes[threadIdx.x * 5 + 2] = 52 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2]; 53 | block_boxes[threadIdx.x * 5 + 3] = 54 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3]; 55 | block_boxes[threadIdx.x * 5 + 4] = 56 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4]; 57 | } 58 | __syncthreads(); 59 | 60 | if (threadIdx.x < row_size) { 61 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x; 62 | const float *cur_box = dev_boxes + cur_box_idx * 5; 63 | int i = 0; 64 | unsigned long long t = 0; 65 | int start = 0; 66 | if (row_start == col_start) { 67 | start = threadIdx.x + 1; 68 | } 69 | for (i = start; i < col_size; i++) { 70 | if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) { 71 | t |= 1ULL << i; 72 | } 73 | } 74 | const int col_blocks = DIVUP(n_boxes, threadsPerBlock); 75 | dev_mask[cur_box_idx * col_blocks + col_start] = t; 76 | } 77 | } 78 | 79 | void _set_device(int device_id) { 80 | int current_device; 81 | CUDA_CHECK(cudaGetDevice(¤t_device)); 82 | if (current_device == device_id) { 83 | return; 84 | } 85 | // The call to cudaSetDevice must come before any calls to Get, which 86 | // may perform initialization using the GPU. 87 | CUDA_CHECK(cudaSetDevice(device_id)); 88 | } 89 | 90 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num, 91 | int boxes_dim, float nms_overlap_thresh, int device_id) { 92 | _set_device(device_id); 93 | 94 | float* boxes_dev = NULL; 95 | unsigned long long* mask_dev = NULL; 96 | 97 | const int col_blocks = DIVUP(boxes_num, threadsPerBlock); 98 | 99 | CUDA_CHECK(cudaMalloc(&boxes_dev, 100 | boxes_num * boxes_dim * sizeof(float))); 101 | CUDA_CHECK(cudaMemcpy(boxes_dev, 102 | boxes_host, 103 | boxes_num * boxes_dim * sizeof(float), 104 | cudaMemcpyHostToDevice)); 105 | 106 | CUDA_CHECK(cudaMalloc(&mask_dev, 107 | boxes_num * col_blocks * sizeof(unsigned long long))); 108 | 109 | dim3 blocks(DIVUP(boxes_num, threadsPerBlock), 110 | DIVUP(boxes_num, threadsPerBlock)); 111 | dim3 threads(threadsPerBlock); 112 | nms_kernel<<>>(boxes_num, 113 | nms_overlap_thresh, 114 | boxes_dev, 115 | mask_dev); 116 | 117 | std::vector mask_host(boxes_num * col_blocks); 118 | CUDA_CHECK(cudaMemcpy(&mask_host[0], 119 | mask_dev, 120 | sizeof(unsigned long long) * boxes_num * col_blocks, 121 | cudaMemcpyDeviceToHost)); 122 | 123 | std::vector remv(col_blocks); 124 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); 125 | 126 | int num_to_keep = 0; 127 | for (int i = 0; i < boxes_num; i++) { 128 | int nblock = i / threadsPerBlock; 129 | int inblock = i % threadsPerBlock; 130 | 131 | if (!(remv[nblock] & (1ULL << inblock))) { 132 | keep_out[num_to_keep++] = i; 133 | unsigned long long *p = &mask_host[0] + i * col_blocks; 134 | for (int j = nblock; j < col_blocks; j++) { 135 | remv[j] |= p[j]; 136 | } 137 | } 138 | } 139 | *num_out = num_to_keep; 140 | 141 | CUDA_CHECK(cudaFree(boxes_dev)); 142 | CUDA_CHECK(cudaFree(mask_dev)); 143 | } 144 | -------------------------------------------------------------------------------- /lib/nms/setup.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Copyright (c) Microsoft 3 | # Licensed under The MIT License [see LICENSE for details] 4 | # Modified from py-faster-rcnn (https://github.com/rbgirshick/py-faster-rcnn) 5 | # 
-------------------------------------------------------- 6 | 7 | import os 8 | from os.path import join as pjoin 9 | from setuptools import setup 10 | from distutils.extension import Extension 11 | from Cython.Distutils import build_ext 12 | import numpy as np 13 | 14 | 15 | def find_in_path(name, path): 16 | "Find a file in a search path" 17 | # Adapted fom 18 | # http://code.activestate.com/recipes/52224-find-a-file-given-a-search-path/ 19 | for dir in path.split(os.pathsep): 20 | binpath = pjoin(dir, name) 21 | if os.path.exists(binpath): 22 | return os.path.abspath(binpath) 23 | return None 24 | 25 | 26 | def locate_cuda(): 27 | """Locate the CUDA environment on the system 28 | Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64' 29 | and values giving the absolute path to each directory. 30 | Starts by looking for the CUDAHOME env variable. If not found, everything 31 | is based on finding 'nvcc' in the PATH. 32 | """ 33 | 34 | # first check if the CUDAHOME env variable is in use 35 | if 'CUDAHOME' in os.environ: 36 | home = os.environ['CUDAHOME'] 37 | nvcc = pjoin(home, 'bin', 'nvcc') 38 | else: 39 | # otherwise, search the PATH for NVCC 40 | default_path = pjoin(os.sep, 'usr', 'local', 'cuda', 'bin') 41 | nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path) 42 | if nvcc is None: 43 | raise EnvironmentError('The nvcc binary could not be ' 44 | 'located in your $PATH. Either add it to your path, or set $CUDAHOME') 45 | home = os.path.dirname(os.path.dirname(nvcc)) 46 | 47 | cudaconfig = {'home':home, 'nvcc':nvcc, 48 | 'include': pjoin(home, 'include'), 49 | 'lib64': pjoin(home, 'lib64')} 50 | for k, v in cudaconfig.items(): 51 | if not os.path.exists(v): 52 | raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v)) 53 | 54 | return cudaconfig 55 | CUDA = locate_cuda() 56 | 57 | 58 | # Obtain the numpy include directory. This logic works across numpy versions. 59 | try: 60 | numpy_include = np.get_include() 61 | except AttributeError: 62 | numpy_include = np.get_numpy_include() 63 | 64 | 65 | def customize_compiler_for_nvcc(self): 66 | """inject deep into distutils to customize how the dispatch 67 | to gcc/nvcc works. 68 | If you subclass UnixCCompiler, it's not trivial to get your subclass 69 | injected in, and still have the right customizations (i.e. 70 | distutils.sysconfig.customize_compiler) run on it. So instead of going 71 | the OO route, I have this. Note, it's kindof like a wierd functional 72 | subclassing going on.""" 73 | 74 | # tell the compiler it can processes .cu 75 | self.src_extensions.append('.cu') 76 | 77 | # save references to the default compiler_so and _comple methods 78 | default_compiler_so = self.compiler_so 79 | super = self._compile 80 | 81 | # now redefine the _compile method. This gets executed for each 82 | # object but distutils doesn't have the ability to change compilers 83 | # based on source extension: we add it. 
84 | def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts): 85 | if os.path.splitext(src)[1] == '.cu': 86 | # use the cuda for .cu files 87 | self.set_executable('compiler_so', CUDA['nvcc']) 88 | # use only a subset of the extra_postargs, which are 1-1 translated 89 | # from the extra_compile_args in the Extension class 90 | postargs = extra_postargs['nvcc'] 91 | else: 92 | postargs = extra_postargs['gcc'] 93 | 94 | super(obj, src, ext, cc_args, postargs, pp_opts) 95 | # reset the default compiler_so, which we might have changed for cuda 96 | self.compiler_so = default_compiler_so 97 | 98 | # inject our redefined _compile method into the class 99 | self._compile = _compile 100 | 101 | 102 | # run the customize_compiler 103 | class custom_build_ext(build_ext): 104 | def build_extensions(self): 105 | customize_compiler_for_nvcc(self.compiler) 106 | build_ext.build_extensions(self) 107 | 108 | 109 | ext_modules = [ 110 | Extension( 111 | "cpu_nms", 112 | ["cpu_nms.pyx"], 113 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]}, 114 | include_dirs = [numpy_include] 115 | ), 116 | Extension('gpu_nms', 117 | ['nms_kernel.cu', 'gpu_nms.pyx'], 118 | library_dirs=[CUDA['lib64']], 119 | libraries=['cudart'], 120 | language='c++', 121 | runtime_library_dirs=[CUDA['lib64']], 122 | # this syntax is specific to this build system 123 | # we're only going to use certain compiler args with nvcc and not with 124 | # gcc the implementation of this trick is in customize_compiler() below 125 | extra_compile_args={'gcc': ["-Wno-unused-function"], 126 | 'nvcc': ['-arch=sm_35', 127 | '--ptxas-options=-v', 128 | '-c', 129 | '--compiler-options', 130 | "'-fPIC'"]}, 131 | include_dirs = [numpy_include, CUDA['include']] 132 | ), 133 | ] 134 | 135 | setup( 136 | name='nms', 137 | ext_modules=ext_modules, 138 | # inject our custom trigger 139 | cmdclass={'build_ext': custom_build_ext}, 140 | ) 141 | -------------------------------------------------------------------------------- /lib/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/utils/__init__.py -------------------------------------------------------------------------------- /lib/utils/__pycache__/__init__.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/utils/__pycache__/__init__.cpython-37.pyc -------------------------------------------------------------------------------- /lib/utils/__pycache__/transforms.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/utils/__pycache__/transforms.cpython-37.pyc -------------------------------------------------------------------------------- /lib/utils/__pycache__/utils.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/utils/__pycache__/utils.cpython-37.pyc -------------------------------------------------------------------------------- /lib/utils/__pycache__/vis.cpython-37.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/lib/utils/__pycache__/vis.cpython-37.pyc -------------------------------------------------------------------------------- /lib/utils/transforms.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import numpy as np 12 | import cv2 13 | 14 | 15 | def flip_back(output_flipped, matched_parts): 16 | ''' 17 | ouput_flipped: numpy.ndarray(batch_size, num_joints, height, width) 18 | ''' 19 | assert output_flipped.ndim == 4,\ 20 | 'output_flipped should be [batch_size, num_joints, height, width]' 21 | 22 | output_flipped = output_flipped[:, :, :, ::-1] 23 | 24 | for pair in matched_parts: 25 | tmp = output_flipped[:, pair[0], :, :].copy() 26 | output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :] 27 | output_flipped[:, pair[1], :, :] = tmp 28 | 29 | return output_flipped 30 | 31 | 32 | def fliplr_joints(joints, joints_vis, width, matched_parts): 33 | """ 34 | flip coords 35 | """ 36 | # Flip horizontal 37 | joints[:, 0] = width - joints[:, 0] - 1 38 | 39 | # Change left-right parts 40 | for pair in matched_parts: 41 | joints[pair[0], :], joints[pair[1], :] = \ 42 | joints[pair[1], :], joints[pair[0], :].copy() 43 | joints_vis[pair[0], :], joints_vis[pair[1], :] = \ 44 | joints_vis[pair[1], :], joints_vis[pair[0], :].copy() 45 | 46 | return joints*joints_vis, joints_vis 47 | 48 | 49 | def transform_preds(coords, center, scale, output_size): 50 | target_coords = np.zeros(coords.shape) 51 | trans = get_affine_transform(center, scale, 0, output_size, inv=1) 52 | for p in range(coords.shape[0]): 53 | target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans) 54 | return target_coords 55 | 56 | 57 | def get_affine_transform(center, 58 | scale, 59 | rot, 60 | output_size, 61 | shift=np.array([0, 0], dtype=np.float32), 62 | inv=0): 63 | if not isinstance(scale, np.ndarray) and not isinstance(scale, list): 64 | print(scale) 65 | scale = np.array([scale, scale]) 66 | 67 | scale_tmp = scale * 200.0 68 | src_w = scale_tmp[0] 69 | dst_w = output_size[0] 70 | dst_h = output_size[1] 71 | 72 | rot_rad = np.pi * rot / 180 73 | src_dir = get_dir([0, src_w * -0.5], rot_rad) 74 | dst_dir = np.array([0, dst_w * -0.5], np.float32) 75 | 76 | src = np.zeros((3, 2), dtype=np.float32) 77 | dst = np.zeros((3, 2), dtype=np.float32) 78 | src[0, :] = center + scale_tmp * shift 79 | src[1, :] = center + src_dir + scale_tmp * shift 80 | dst[0, :] = [dst_w * 0.5, dst_h * 0.5] 81 | dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir 82 | 83 | src[2:, :] = get_3rd_point(src[0, :], src[1, :]) 84 | dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :]) 85 | 86 | if inv: 87 | trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) 88 | else: 89 | trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) 90 | 91 | return trans 92 | 93 | 94 | def affine_transform(pt, t): 95 | '''仿射变换''' 96 | new_pt = np.array([pt[0], pt[1], 1.]).T 97 | new_pt = np.dot(t, new_pt) 98 | return new_pt[:2] 99 | 100 | 101 | def get_3rd_point(a, b): 102 | direct = a - b 
103 | return b + np.array([-direct[1], direct[0]], dtype=np.float32) 104 | 105 | 106 | def get_dir(src_point, rot_rad): 107 | sn, cs = np.sin(rot_rad), np.cos(rot_rad) 108 | 109 | src_result = [0, 0] 110 | src_result[0] = src_point[0] * cs - src_point[1] * sn 111 | src_result[1] = src_point[0] * sn + src_point[1] * cs 112 | 113 | return src_result 114 | 115 | 116 | def crop(img, center, scale, output_size, rot=0): 117 | trans = get_affine_transform(center, scale, rot, output_size) 118 | 119 | dst_img = cv2.warpAffine(img, 120 | trans, 121 | (int(output_size[0]), int(output_size[1])), 122 | flags=cv2.INTER_LINEAR) 123 | 124 | return dst_img 125 | -------------------------------------------------------------------------------- /lib/utils/utils.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import os 12 | import logging 13 | import time 14 | from pathlib import Path 15 | 16 | import torch 17 | import torch.optim as optim 18 | 19 | from core.config import get_model_name 20 | 21 | 22 | def create_logger(cfg, cfg_name, phase='train'): 23 | root_output_dir = Path(cfg.OUTPUT_DIR) 24 | # set up logger 25 | if not root_output_dir.exists(): 26 | print('=> creating {}'.format(root_output_dir)) 27 | root_output_dir.mkdir() 28 | 29 | dataset = cfg.DATASET.DATASET + '_' + cfg.DATASET.HYBRID_JOINTS_TYPE \ 30 | if cfg.DATASET.HYBRID_JOINTS_TYPE else cfg.DATASET.DATASET 31 | dataset = dataset.replace(':', '_') 32 | model, _ = get_model_name(cfg) 33 | cfg_name = os.path.basename(cfg_name).split('.')[0] 34 | 35 | final_output_dir = root_output_dir / dataset / model / cfg_name 36 | 37 | print('=> creating {}'.format(final_output_dir)) 38 | final_output_dir.mkdir(parents=True, exist_ok=True) 39 | 40 | time_str = time.strftime('%Y-%m-%d-%H-%M') 41 | log_file = '{}_{}_{}.log'.format(cfg_name, time_str, phase) 42 | final_log_file = final_output_dir / log_file 43 | head = '%(asctime)-15s %(message)s' 44 | logging.basicConfig(filename=str(final_log_file), 45 | format=head) 46 | logger = logging.getLogger() 47 | logger.setLevel(logging.INFO) 48 | console = logging.StreamHandler() 49 | logging.getLogger('').addHandler(console) 50 | 51 | tensorboard_log_dir = Path(cfg.LOG_DIR) / dataset / model / \ 52 | (cfg_name + '_' + time_str) 53 | print('=> creating {}'.format(tensorboard_log_dir)) 54 | tensorboard_log_dir.mkdir(parents=True, exist_ok=True) 55 | 56 | return logger, str(final_output_dir), str(tensorboard_log_dir) 57 | 58 | 59 | def get_optimizer(cfg, model): 60 | optimizer = None 61 | if cfg.TRAIN.OPTIMIZER == 'sgd': 62 | optimizer = optim.SGD( 63 | model.parameters(), 64 | lr=cfg.TRAIN.LR, 65 | momentum=cfg.TRAIN.MOMENTUM, 66 | weight_decay=cfg.TRAIN.WD, 67 | nesterov=cfg.TRAIN.NESTEROV 68 | ) 69 | elif cfg.TRAIN.OPTIMIZER == 'adam': 70 | optimizer = optim.Adam( 71 | model.parameters(), 72 | lr=cfg.TRAIN.LR 73 | ) 74 | 75 | return optimizer 76 | 77 | 78 | def save_checkpoint(states, is_best, output_dir, 79 | filename='checkpoint.pth.tar'): 80 | torch.save(states, os.path.join(output_dir, filename)) 81 | if is_best and 'state_dict' in states: 82 | 
torch.save(states['state_dict'], 83 | os.path.join(output_dir, 'model_best.pth.tar')) 84 | -------------------------------------------------------------------------------- /lib/utils/vis.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import math 12 | 13 | import numpy as np 14 | import torchvision 15 | import cv2 16 | 17 | from core.inference import get_max_preds 18 | 19 | 20 | def save_batch_image_with_joints(batch_image, batch_joints, batch_joints_vis, 21 | file_name, nrow=8, padding=2): 22 | ''' 23 | batch_image: [batch_size, channel, height, width] 24 | batch_joints: [batch_size, num_joints, 3], 25 | batch_joints_vis: [batch_size, num_joints, 1], 26 | } 27 | ''' 28 | grid = torchvision.utils.make_grid(batch_image, nrow, padding, True) 29 | ndarr = grid.mul(255).clamp(0, 255).byte().permute(1, 2, 0).cpu().numpy() 30 | ndarr = ndarr.copy() 31 | 32 | nmaps = batch_image.size(0) 33 | xmaps = min(nrow, nmaps) 34 | ymaps = int(math.ceil(float(nmaps) / xmaps)) 35 | height = int(batch_image.size(2) + padding) 36 | width = int(batch_image.size(3) + padding) 37 | k = 0 38 | for y in range(ymaps): 39 | for x in range(xmaps): 40 | if k >= nmaps: 41 | break 42 | joints = batch_joints[k] 43 | joints_vis = batch_joints_vis[k] 44 | 45 | for joint, joint_vis in zip(joints, joints_vis): 46 | joint[0] = x * width + padding + joint[0] 47 | joint[1] = y * height + padding + joint[1] 48 | if joint_vis[0]: 49 | cv2.circle(ndarr, (int(joint[0]), int(joint[1])), 2, [255, 0, 0], 2) 50 | k = k + 1 51 | cv2.imwrite(file_name, ndarr) 52 | 53 | 54 | def save_batch_heatmaps(batch_image, batch_heatmaps, file_name, 55 | normalize=True): 56 | ''' 57 | batch_image: [batch_size, channel, height, width] 58 | batch_heatmaps: ['batch_size, num_joints, height, width] 59 | file_name: saved file name 60 | ''' 61 | if normalize: 62 | batch_image = batch_image.clone() 63 | min = float(batch_image.min()) 64 | max = float(batch_image.max()) 65 | 66 | batch_image.add_(-min).div_(max - min + 1e-5) 67 | 68 | batch_size = batch_heatmaps.size(0) 69 | num_joints = batch_heatmaps.size(1) 70 | heatmap_height = batch_heatmaps.size(2) 71 | heatmap_width = batch_heatmaps.size(3) 72 | 73 | grid_image = np.zeros((batch_size*heatmap_height, 74 | (num_joints+1)*heatmap_width, 75 | 3), 76 | dtype=np.uint8) 77 | 78 | preds, maxvals = get_max_preds(batch_heatmaps.detach().cpu().numpy()) 79 | 80 | for i in range(batch_size): 81 | image = batch_image[i].mul(255)\ 82 | .clamp(0, 255)\ 83 | .byte()\ 84 | .permute(1, 2, 0)\ 85 | .cpu().numpy() 86 | heatmaps = batch_heatmaps[i].mul(255)\ 87 | .clamp(0, 255)\ 88 | .byte()\ 89 | .cpu().numpy() 90 | 91 | resized_image = cv2.resize(image, 92 | (int(heatmap_width), int(heatmap_height))) 93 | 94 | height_begin = heatmap_height * i 95 | height_end = heatmap_height * (i + 1) 96 | for j in range(num_joints): 97 | cv2.circle(resized_image, 98 | (int(preds[i][j][0]), int(preds[i][j][1])), 99 | 1, [0, 0, 255], 1) 100 | heatmap = heatmaps[j, :, :] 101 | colored_heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET) 102 | masked_image = colored_heatmap*0.7 + 
resized_image*0.3 103 | cv2.circle(masked_image, 104 | (int(preds[i][j][0]), int(preds[i][j][1])), 105 | 1, [0, 0, 255], 1) 106 | 107 | width_begin = heatmap_width * (j+1) 108 | width_end = heatmap_width * (j+2) 109 | grid_image[height_begin:height_end, width_begin:width_end, :] = \ 110 | masked_image 111 | # grid_image[height_begin:height_end, width_begin:width_end, :] = \ 112 | # colored_heatmap*0.7 + resized_image*0.3 113 | 114 | grid_image[height_begin:height_end, 0:heatmap_width, :] = resized_image 115 | 116 | cv2.imwrite(file_name, grid_image) 117 | 118 | 119 | def save_debug_images(config, input, meta, target, joints_pred, output, 120 | prefix): 121 | if not config.DEBUG.DEBUG: 122 | return 123 | 124 | if config.DEBUG.SAVE_BATCH_IMAGES_GT: 125 | save_batch_image_with_joints( 126 | input, meta['joints'], meta['joints_vis'], 127 | '{}_gt.jpg'.format(prefix) 128 | ) 129 | if config.DEBUG.SAVE_BATCH_IMAGES_PRED: 130 | save_batch_image_with_joints( 131 | input, joints_pred, meta['joints_vis'], 132 | '{}_pred.jpg'.format(prefix) 133 | ) 134 | if config.DEBUG.SAVE_HEATMAPS_GT: 135 | save_batch_heatmaps( 136 | input, target, '{}_hm_gt.jpg'.format(prefix) 137 | ) 138 | if config.DEBUG.SAVE_HEATMAPS_PRED: 139 | save_batch_heatmaps( 140 | input, output, '{}_hm_pred.jpg'.format(prefix) 141 | ) 142 | -------------------------------------------------------------------------------- /lib/utils/zipreader.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import os 12 | import zipfile 13 | import xml.etree.ElementTree as ET 14 | 15 | import cv2 16 | import numpy as np 17 | 18 | _im_zfile = [] 19 | _xml_path_zip = [] 20 | _xml_zfile = [] 21 | 22 | 23 | def imread(filename, flags=cv2.IMREAD_COLOR): 24 | global _im_zfile 25 | path = filename 26 | pos_at = path.index('@') 27 | if pos_at == -1: 28 | print("character '@' is not found from the given path '%s'"%(path)) 29 | assert 0 30 | path_zip = path[0: pos_at] 31 | path_img = path[pos_at + 2:] 32 | if not os.path.isfile(path_zip): 33 | print("zip file '%s' is not found"%(path_zip)) 34 | assert 0 35 | for i in range(len(_im_zfile)): 36 | if _im_zfile[i]['path'] == path_zip: 37 | data = _im_zfile[i]['zipfile'].read(path_img) 38 | return cv2.imdecode(np.frombuffer(data, np.uint8), flags) 39 | 40 | _im_zfile.append({ 41 | 'path': path_zip, 42 | 'zipfile': zipfile.ZipFile(path_zip, 'r') 43 | }) 44 | data = _im_zfile[-1]['zipfile'].read(path_img) 45 | 46 | return cv2.imdecode(np.frombuffer(data, np.uint8), flags) 47 | 48 | 49 | def xmlread(filename): 50 | global _xml_path_zip 51 | global _xml_zfile 52 | path = filename 53 | pos_at = path.index('@') 54 | if pos_at == -1: 55 | print("character '@' is not found from the given path '%s'"%(path)) 56 | assert 0 57 | path_zip = path[0: pos_at] 58 | path_xml = path[pos_at + 2:] 59 | if not os.path.isfile(path_zip): 60 | print("zip file '%s' is not found"%(path_zip)) 61 | assert 0 62 | for i in range(len(_xml_path_zip)): 63 | if _xml_path_zip[i] == path_zip: 64 | data = _xml_zfile[i].open(path_xml) 65 | return ET.fromstring(data.read()) 66 | 
_xml_path_zip.append(path_zip) 67 | print("read new xml file '%s'"%(path_zip)) 68 | _xml_zfile.append(zipfile.ZipFile(path_zip, 'r')) 69 | data = _xml_zfile[-1].open(path_xml) 70 | return ET.fromstring(data.read()) 71 | -------------------------------------------------------------------------------- /models/lp_coco/lp_net_50_256x192_with_gcb.pth.tar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/models/lp_coco/lp_net_50_256x192_with_gcb.pth.tar -------------------------------------------------------------------------------- /models/lp_coco/lp_net_50_256x192_without_gcb.pth.tar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/models/lp_coco/lp_net_50_256x192_without_gcb.pth.tar -------------------------------------------------------------------------------- /pose_estimation/__pycache__/_init_paths.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sailyung/human-pose-estimation/8ddc3103a0ae05e6d7001f1cbd0e3b8963cba864/pose_estimation/__pycache__/_init_paths.cpython-37.pyc -------------------------------------------------------------------------------- /pose_estimation/_init_paths.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft Corporation. All rights reserved. 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import os.path as osp 12 | import sys 13 | 14 | 15 | def add_path(path): 16 | if path not in sys.path: 17 | sys.path.insert(0, path) 18 | 19 | 20 | this_dir = osp.dirname(__file__) 21 | 22 | lib_path = osp.join(this_dir, '..', 'lib') 23 | add_path(lib_path) 24 | -------------------------------------------------------------------------------- /pose_estimation/demo.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft Corporation. All rights reserved. 3 | # Licensed under the MIT License. 
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | 12 | import argparse 13 | import os 14 | import pprint 15 | 16 | import torch 17 | import torch.nn.parallel 18 | import torch.backends.cudnn as cudnn 19 | import torch.optim 20 | import torch.utils.data 21 | import torch.utils.data.distributed 22 | import torchvision.transforms as transforms 23 | 24 | import _init_paths 25 | from core.config import config 26 | from core.config import update_config 27 | from core.config import update_dir 28 | from core.inference import get_final_preds 29 | from utils.utils import create_logger 30 | from utils.transforms import get_affine_transform 31 | import numpy as np 32 | import cv2 33 | import models 34 | 35 | 36 | def parse_args(): 37 | parser = argparse.ArgumentParser(description='Train keypoints network') 38 | # general 39 | parser.add_argument('--cfg', 40 | help='experiment configure file name', 41 | required=True, 42 | type=str) 43 | 44 | args, rest = parser.parse_known_args() 45 | # update config 46 | update_config(args.cfg) 47 | 48 | # training 49 | parser.add_argument('--frequent', 50 | help='frequency of logging', 51 | default=config.PRINT_FREQ, 52 | type=int) 53 | parser.add_argument('--model-file', 54 | help='model state file', 55 | type=str) 56 | parser.add_argument('--img-file', 57 | help='input your test img', 58 | type=str, 59 | default='') 60 | args = parser.parse_args() 61 | 62 | return args 63 | 64 | 65 | def reset_config(config, args): 66 | if args.model_file: 67 | config.TEST.MODEL_FILE = args.model_file 68 | 69 | def _box2cs(box, image_width, image_height): 70 | x, y, w, h = box[:4] 71 | return _xywh2cs(x, y, w, h, image_width, image_height) 72 | 73 | def _xywh2cs(x, y, w, h, image_width, image_height): 74 | center = np.zeros((2), dtype=np.float32) 75 | center[0] = x + w * 0.5 76 | center[1] = y + h * 0.5 77 | 78 | aspect_ratio = image_width * 1.0 / image_height 79 | pixel_std = 200 80 | 81 | if w > aspect_ratio * h: 82 | h = w * 1.0 / aspect_ratio 83 | elif w < aspect_ratio * h: 84 | w = h * aspect_ratio 85 | scale = np.array( 86 | [w * 1.0 / pixel_std, h * 1.0 / pixel_std], 87 | dtype=np.float32) 88 | if center[0] != -1: 89 | scale = scale * 1.25 90 | 91 | return center, scale 92 | 93 | def main(): 94 | args = parse_args() 95 | reset_config(config, args) 96 | 97 | logger, final_output_dir, tb_log_dir = create_logger( 98 | config, args.cfg, 'valid') 99 | 100 | model = eval('models.'+config.MODEL.NAME+'.get_pose_net')( 101 | config, is_train=False 102 | ) 103 | 104 | if config.TEST.MODEL_FILE: 105 | logger.info('=> loading model from {}'.format(config.TEST.MODEL_FILE)) 106 | model.load_state_dict(torch.load(config.TEST.MODEL_FILE)) 107 | else: 108 | model_state_file = os.path.join(final_output_dir, 109 | 'final_state.pth.tar') 110 | logger.info('=> loading model from {}'.format(model_state_file)) 111 | model.load_state_dict(torch.load(model_state_file)) 112 | 113 | # Loading an image 114 | image_file = args.img_file 115 | data_numpy = cv2.imread(image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) 116 | 117 | # object detection box 118 | # need to be given [left_top, w, h] 119 | # box = [391, 99, 667-391, 524-99] 120 | box = [743, 52, 955-743, 500-52] 121 | # box = [93, 262, 429-93, 595-262] 122 | c, s = _box2cs(box, config.MODEL.IMAGE_SIZE[0], 
config.MODEL.IMAGE_SIZE[1]) 123 | print(c) 124 | r = 0 125 | 126 | trans = get_affine_transform(c, s, r, config.MODEL.IMAGE_SIZE) 127 | print(trans.shape) 128 | input = cv2.warpAffine( 129 | data_numpy, 130 | trans, 131 | (int(config.MODEL.IMAGE_SIZE[0]), int(config.MODEL.IMAGE_SIZE[1])), 132 | flags=cv2.INTER_LINEAR) 133 | 134 | transform = transforms.Compose([ 135 | transforms.ToTensor(), 136 | transforms.Normalize(mean=[0.485, 0.456, 0.406], 137 | std=[0.229, 0.224, 0.225]), 138 | ]) 139 | 140 | input = transform(input).unsqueeze(0) 141 | # switch to evaluate mode 142 | model.eval() 143 | with torch.no_grad(): 144 | # compute output heatmap 145 | output = model(input) 146 | preds, maxvals = get_final_preds(config, output.clone().cpu().numpy(), np.asarray([c]), np.asarray([s])) 147 | 148 | image = data_numpy.copy() 149 | for mat in preds[0]: 150 | x, y = int(mat[0]), int(mat[1]) 151 | cv2.circle(image, (x, y), 2, (255, 0, 0), 2) 152 | 153 | # vis result 154 | # cv2.imwrite("test_lp50.jpg", image) 155 | cv2.imshow('demo', image) 156 | cv2.waitKey(0) 157 | cv2.destroyAllWindows() 158 | if __name__ == '__main__': 159 | main() 160 | 161 | -------------------------------------------------------------------------------- /pose_estimation/train.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft Corporation. All rights reserved. 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import argparse 12 | import os 13 | import pprint 14 | import shutil 15 | 16 | import torch 17 | import torch.nn.parallel 18 | import torch.backends.cudnn as cudnn 19 | import torch.optim 20 | import torch.utils.data 21 | import torch.utils.data.distributed 22 | import torchvision.transforms as transforms 23 | from tensorboardX import SummaryWriter 24 | 25 | import _init_paths 26 | from core.config import config 27 | from core.config import update_config 28 | from core.config import update_dir 29 | from core.config import get_model_name 30 | from core.loss import JointsMSELoss 31 | from core.function import train 32 | from core.function import validate 33 | from utils.utils import get_optimizer 34 | from utils.utils import save_checkpoint 35 | from utils.utils import create_logger 36 | 37 | import dataset 38 | import models 39 | 40 | 41 | def parse_args(): 42 | parser = argparse.ArgumentParser(description='Train keypoints network') 43 | # general 44 | parser.add_argument('--cfg', 45 | help='experiment configure file name', 46 | required=True, 47 | type=str) 48 | 49 | args, rest = parser.parse_known_args() 50 | # update config 51 | update_config(args.cfg) 52 | 53 | # training 54 | parser.add_argument('--frequent', 55 | help='frequency of logging', 56 | default=config.PRINT_FREQ, 57 | type=int) 58 | parser.add_argument('--gpus', 59 | help='gpus', 60 | type=str) 61 | parser.add_argument('--workers', 62 | help='num of dataloader workers', 63 | type=int) 64 | 65 | args = parser.parse_args() 66 | 67 | return args 68 | 69 | 70 | def reset_config(config, args): 71 | if args.gpus: 72 | config.GPUS = args.gpus 73 | if args.workers: 74 | config.WORKERS = args.workers 75 | 76 | 77 | def main(): 78 | args = parse_args() 79 | 
reset_config(config, args) 80 | 81 | logger, final_output_dir, tb_log_dir = create_logger( 82 | config, args.cfg, 'train') 83 | 84 | logger.info(pprint.pformat(args)) 85 | logger.info(pprint.pformat(config)) 86 | 87 | # cudnn related setting 88 | cudnn.benchmark = config.CUDNN.BENCHMARK 89 | torch.backends.cudnn.deterministic = config.CUDNN.DETERMINISTIC 90 | torch.backends.cudnn.enabled = config.CUDNN.ENABLED 91 | 92 | # models.pose_resnet.get_pose_net((NUM_LAYERS), is_train=True) 93 | model = eval('models.'+config.MODEL.NAME+'.get_pose_net')( 94 | config, is_train=True 95 | ) 96 | 97 | # copy model file 98 | this_dir = os.path.dirname(__file__) 99 | shutil.copy2( 100 | os.path.join(this_dir, '../lib/models', config.MODEL.NAME + '.py'), 101 | final_output_dir) 102 | 103 | writer_dict = { 104 | 'writer': SummaryWriter(log_dir=tb_log_dir), 105 | 'train_global_steps': 0, 106 | 'valid_global_steps': 0, 107 | } 108 | 109 | dump_input = torch.rand((config.TRAIN.BATCH_SIZE, 110 | 3, 111 | config.MODEL.IMAGE_SIZE[1], 112 | config.MODEL.IMAGE_SIZE[0])) 113 | writer_dict['writer'].add_graph(model, (dump_input, ), verbose=False) 114 | 115 | gpus = [int(i) for i in config.GPUS.split(',')] 116 | model = torch.nn.DataParallel(model, device_ids=gpus).cuda() 117 | 118 | # define loss function (criterion) and optimizer 119 | criterion = JointsMSELoss( 120 | use_target_weight=config.LOSS.USE_TARGET_WEIGHT 121 | ).cuda() 122 | 123 | optimizer = get_optimizer(config, model) 124 | 125 | lr_scheduler = torch.optim.lr_scheduler.MultiStepLR( 126 | optimizer, config.TRAIN.LR_STEP, config.TRAIN.LR_FACTOR 127 | ) 128 | 129 | # Data loading code 130 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], 131 | std=[0.229, 0.224, 0.225]) 132 | train_dataset = eval('dataset.'+config.DATASET.DATASET)( 133 | config, 134 | config.DATASET.ROOT, 135 | config.DATASET.TRAIN_SET, 136 | True, 137 | transforms.Compose([ 138 | transforms.ToTensor(), 139 | normalize, 140 | ]) 141 | ) 142 | valid_dataset = eval('dataset.'+config.DATASET.DATASET)( 143 | config, 144 | config.DATASET.ROOT, 145 | config.DATASET.TEST_SET, 146 | False, 147 | transforms.Compose([ 148 | transforms.ToTensor(), 149 | normalize, 150 | ]) 151 | ) 152 | 153 | train_loader = torch.utils.data.DataLoader( 154 | train_dataset, 155 | batch_size=config.TRAIN.BATCH_SIZE*len(gpus), 156 | shuffle=config.TRAIN.SHUFFLE, 157 | num_workers=config.WORKERS, 158 | pin_memory=True 159 | ) 160 | valid_loader = torch.utils.data.DataLoader( 161 | valid_dataset, 162 | batch_size=config.TEST.BATCH_SIZE*len(gpus), 163 | shuffle=False, 164 | num_workers=config.WORKERS, 165 | pin_memory=True 166 | ) 167 | 168 | best_perf = 0.0 169 | best_model = False 170 | for epoch in range(config.TRAIN.BEGIN_EPOCH, config.TRAIN.END_EPOCH): 171 | lr_scheduler.step() 172 | 173 | # train for one epoch 174 | train(config, train_loader, model, criterion, optimizer, epoch, 175 | final_output_dir, tb_log_dir, writer_dict) 176 | 177 | 178 | # evaluate on validation set 179 | perf_indicator = validate(config, valid_loader, valid_dataset, model, 180 | criterion, final_output_dir, tb_log_dir, 181 | writer_dict) 182 | 183 | if perf_indicator > best_perf: 184 | best_perf = perf_indicator 185 | best_model = True 186 | else: 187 | best_model = False 188 | 189 | logger.info('=> saving checkpoint to {}'.format(final_output_dir)) 190 | save_checkpoint({ 191 | 'epoch': epoch + 1, 192 | 'model': get_model_name(config), 193 | 'state_dict': model.state_dict(), 194 | 'perf': perf_indicator, 195 | 'optimizer': 
optimizer.state_dict(), 196 | }, best_model, final_output_dir) 197 | 198 | final_model_state_file = os.path.join(final_output_dir, 199 | 'final_state.pth.tar') 200 | logger.info('saving final model state to {}'.format( 201 | final_model_state_file)) 202 | torch.save(model.module.state_dict(), final_model_state_file) 203 | writer_dict['writer'].close() 204 | 205 | 206 | if __name__ == '__main__': 207 | main() 208 | -------------------------------------------------------------------------------- /pose_estimation/valid.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft Corporation. All rights reserved. 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | 12 | import argparse 13 | import os 14 | import pprint 15 | 16 | import torch 17 | import torch.nn.parallel 18 | import torch.backends.cudnn as cudnn 19 | import torch.optim 20 | import torch.utils.data 21 | import torch.utils.data.distributed 22 | import torchvision.transforms as transforms 23 | 24 | import _init_paths 25 | from core.config import config 26 | from core.config import update_config 27 | from core.config import update_dir 28 | from core.loss import JointsMSELoss 29 | from core.function import validate 30 | from utils.utils import create_logger 31 | 32 | import dataset 33 | import models 34 | 35 | 36 | def parse_args(): 37 | parser = argparse.ArgumentParser(description='Train keypoints network') 38 | # general 39 | parser.add_argument('--cfg', 40 | help='experiment configure file name', 41 | required=True, 42 | type=str) 43 | 44 | args, rest = parser.parse_known_args() 45 | # update config 46 | update_config(args.cfg) 47 | 48 | # training 49 | parser.add_argument('--frequent', 50 | help='frequency of logging', 51 | default=config.PRINT_FREQ, 52 | type=int) 53 | parser.add_argument('--gpus', 54 | help='gpus', 55 | type=str) 56 | parser.add_argument('--workers', 57 | help='num of dataloader workers', 58 | type=int) 59 | parser.add_argument('--model-file', 60 | help='model state file', 61 | type=str) 62 | parser.add_argument('--use-detect-bbox', 63 | help='use detect bbox', 64 | action='store_true') 65 | parser.add_argument('--flip-test', 66 | help='use flip test', 67 | action='store_true') 68 | parser.add_argument('--post-process', 69 | help='use post process', 70 | action='store_true') 71 | parser.add_argument('--shift-heatmap', 72 | help='shift heatmap', 73 | action='store_true') 74 | parser.add_argument('--coco-bbox-file', 75 | help='coco detection bbox file', 76 | type=str) 77 | 78 | args = parser.parse_args() 79 | 80 | return args 81 | 82 | 83 | def reset_config(config, args): 84 | if args.gpus: 85 | config.GPUS = args.gpus 86 | if args.workers: 87 | config.WORKERS = args.workers 88 | if args.use_detect_bbox: 89 | config.TEST.USE_GT_BBOX = not args.use_detect_bbox 90 | if args.flip_test: 91 | config.TEST.FLIP_TEST = args.flip_test 92 | if args.post_process: 93 | config.TEST.POST_PROCESS = args.post_process 94 | if args.shift_heatmap: 95 | config.TEST.SHIFT_HEATMAP = args.shift_heatmap 96 | if args.model_file: 97 | config.TEST.MODEL_FILE = args.model_file 98 | if args.coco_bbox_file: 99 | config.TEST.COCO_BBOX_FILE = 
args.coco_bbox_file 100 | 101 | 102 | def main(): 103 | args = parse_args() 104 | reset_config(config, args) 105 | 106 | logger, final_output_dir, tb_log_dir = create_logger( 107 | config, args.cfg, 'valid') 108 | 109 | logger.info(pprint.pformat(args)) 110 | logger.info(pprint.pformat(config)) 111 | 112 | # cudnn related setting 113 | cudnn.benchmark = config.CUDNN.BENCHMARK 114 | torch.backends.cudnn.deterministic = config.CUDNN.DETERMINISTIC 115 | torch.backends.cudnn.enabled = config.CUDNN.ENABLED 116 | 117 | model = eval('models.'+config.MODEL.NAME+'.get_pose_net')( 118 | config, is_train=False 119 | ) 120 | 121 | if config.TEST.MODEL_FILE: 122 | logger.info('=> loading model from {}'.format(config.TEST.MODEL_FILE)) 123 | model.load_state_dict(torch.load(config.TEST.MODEL_FILE)) 124 | else: 125 | model_state_file = os.path.join(final_output_dir, 126 | 'final_state.pth.tar') 127 | logger.info('=> loading model from {}'.format(model_state_file)) 128 | model.load_state_dict(torch.load(model_state_file)) 129 | 130 | gpus = [int(i) for i in config.GPUS.split(',')] 131 | model = torch.nn.DataParallel(model, device_ids=gpus).cuda() 132 | 133 | # define loss function (criterion) and optimizer 134 | criterion = JointsMSELoss( 135 | use_target_weight=config.LOSS.USE_TARGET_WEIGHT 136 | ).cuda() 137 | 138 | # Data loading code 139 | normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], 140 | std=[0.229, 0.224, 0.225]) 141 | valid_dataset = eval('dataset.'+config.DATASET.DATASET)( 142 | config, 143 | config.DATASET.ROOT, 144 | config.DATASET.TEST_SET, 145 | False, 146 | transforms.Compose([ 147 | transforms.ToTensor(), 148 | normalize, 149 | ]) 150 | ) 151 | valid_loader = torch.utils.data.DataLoader( 152 | valid_dataset, 153 | batch_size=config.TEST.BATCH_SIZE*len(gpus), 154 | shuffle=False, 155 | num_workers=config.WORKERS, 156 | pin_memory=True 157 | ) 158 | 159 | # evaluate on validation set 160 | validate(config, valid_loader, valid_dataset, model, criterion, 161 | final_output_dir, tb_log_dir) 162 | 163 | 164 | if __name__ == '__main__': 165 | main() 166 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | EasyDict 2 | opencv-python 3 | Cython 4 | scipy 5 | pandas 6 | pyyaml 7 | json_tricks 8 | scikit-image 9 | tensorboardX 10 | torchvision 11 | --------------------------------------------------------------------------------
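
For quick reference, here is a minimal usage sketch (not part of the repository) showing how the training and validation entry points above are typically invoked. The YAML config and checkpoint paths are the ones present in the directory tree; the `--gpus` and `--workers` values are placeholders and should be adjusted to your hardware.

```
cd ${POSE_ROOT}

# Train lp_net_50 with the GCB config (placeholder --gpus/--workers values)
python pose_estimation/train.py \
    --cfg experiments/coco/lp_net50/256x192_d256x2_adam_lr1e-3_lp.yaml \
    --gpus 0,1,2,3 \
    --workers 8

# Evaluate a pretrained checkpoint on COCO val2017 with flip test enabled
python pose_estimation/valid.py \
    --cfg experiments/coco/lp_net50/256x192_d256x2_adam_lr1e-3_lp.yaml \
    --model-file models/lp_coco/lp_net_50_256x192_with_gcb.pth.tar \
    --flip-test
```

If `--model-file` is omitted, `valid.py` falls back to the `final_state.pth.tar` that `train.py` writes into the run's output directory.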