├── .gitignore
├── .vscode
│   └── settings.json
├── README.md
├── configs
│   ├── baseline_phase1.yaml
│   ├── baseline_phase2.yaml
│   └── baseline_phase3.yaml
├── doc
│   ├── cxk.gif
│   ├── dance5_.gif
│   ├── data.md
│   ├── micheal2.gif
│   └── out3.gif
├── eval.py
├── exp
│   ├── eval
│   │   └── hvd_start.sh
│   ├── phase1
│   │   └── hvd_start.sh
│   ├── phase2
│   │   └── hvd_start.sh
│   └── phase3
│       └── hvd_start.sh
├── lib
│   ├── core
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── evaluate.py
│   │   ├── loss.py
│   │   └── trainer.py
│   ├── data_utils
│   │   ├── img_utils.py
│   │   ├── insta_utils.py
│   │   ├── insta_utils_imgs.py
│   │   ├── kp_utils.py
│   │   ├── mpii3d_utils.py
│   │   ├── penn_action_utils.py
│   │   ├── posetrack_utils.py
│   │   ├── threedpw_utils.py
│   │   └── transforms
│   │       ├── __init__.py
│   │       ├── basic.py
│   │       ├── color_jitter.py
│   │       ├── crop.py
│   │       ├── random_erase.py
│   │       └── random_hflip.py
│   ├── dataset
│   │   ├── __init__.py
│   │   ├── dataset_image.py
│   │   ├── dataset_video.py
│   │   └── loaders.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── ktd.py
│   │   ├── maed.py
│   │   ├── ops
│   │   │   ├── __init__.py
│   │   │   └── drop.py
│   │   ├── resnetv2.py
│   │   ├── smpl.py
│   │   ├── spin.py
│   │   ├── tokenpose.py
│   │   └── vision_transformer.py
│   └── utils
│       ├── __init__.py
│       ├── demo_utils.py
│       ├── eval_utils.py
│       ├── fbx_output.py
│       ├── geometry.py
│       ├── pose_tracker.py
│       ├── renderer.py
│       ├── smooth_bbox.py
│       ├── utils.py
│       └── vis.py
├── requirements.txt
├── scripts
│   ├── eval.sh
│   ├── prepare_insta.sh
│   ├── prepare_training_data.sh
│   └── run.sh
├── tox.ini
└── train_hvd.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | 
--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
1 | {
2 |     "python.formatting.provider": "autopep8"
3 | }
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Capturing the Motion of Every Joint: 3D Human Pose and Shape Estimation with Independent Tokens
2 | 
3 | [[project]](https://yangsenius.github.io/INT_HMR_Model/) [[arxiv]](https://arxiv.org/abs/2303.00298) [[paper]](https://openreview.net/forum?id=0Vv4H4Ch0la) [[examples]](https://yangsenius.github.io/INT_HMR_Model/)
4 | 
5 | 
6 | 
7 | *The multi-person videos above are based on the VIBE detection and tracking framework.*
8 | 
9 | 
10 | 
11 | > [**Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens**](https://openreview.net/pdf?id=0Vv4H4Ch0la),
12 | > [Sen Yang](https://yangsenius.github.io/INT_HMR_Model/), [Wen Heng](), [Gang Liu](https://scholar.google.com/citations?user=ZyzfB9sAAAAJ&hl=zh-CN&authuser=1), [Guozhong Luo](https://github.com/guozhongluo), [Wankou Yang](https://scholar.google.com/citations?user=inPYAuYAAAAJ&hl=zh-CN), [Gang Yu](https://www.skicyyu.org/),
13 | > *The Eleventh International Conference on Learning Representations, ICLR 2023 spotlight*
14 | 
15 | 
16 | 
17 | ## Getting Started
18 | 
19 | 
20 | This repo requires `python>=3.6` and `PyTorch>=1.8`. We recommend using a `conda` virtual environment:
21 | 
22 | ```
23 | conda create -n int_hmr python=3.6 && conda activate int_hmr
24 | ```
25 | 
26 | Install `PyTorch` by following the official guide on the [PyTorch website](https://pytorch.org/get-started/locally/).
27 | 
28 | The models in the paper were trained with the distributed training framework `Horovod`. If you want to run distributed training with this code, install `Horovod` by following the [official documentation](https://horovod.readthedocs.io/en/stable/); we used horovod 0.3.3.
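For GPU clusters, a typical NCCL-enabled Horovod build looks roughly like the sketch below. This is an illustrative sketch only, not the exact command used for the paper; the `HOROVOD_*` build flags you need depend on your CUDA/NCCL/MPI setup, so please check the official guide.

```
# Illustrative: build Horovod with the PyTorch extension and NCCL collectives
HOROVOD_WITH_PYTORCH=1 HOROVOD_GPU_OPERATIONS=NCCL pip install --no-cache-dir horovod

# Check that the PyTorch extension and NCCL ops were actually built
horovodrun --check-build
```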
29 | 
30 | Then install the remaining Python dependencies with `pip`:
31 | 
32 | ```
33 | pip install -r requirements.txt
34 | ```
35 | 
36 | ## Data preparation
37 | 
38 | We follow the steps of the [MAED](https://github.com/ziniuwan/maed) repo to prepare the training data. Please refer to [data.md](doc/data.md).
39 | 
40 | ## Training
41 | 
42 | 
43 | To run on a machine with 4 GPUs:
44 | 
45 | ```
46 | sh hvd_start.sh 4 localhost:4
47 | ```
48 | 
49 | To run on 4 machines with 4 GPUs each:
50 | 
51 | ```
52 | sh hvd_start.sh 16 server1_ip:4,server2_ip:4,server3_ip:4,server4_ip:4
53 | ```
54 | Below are the training commands for the proposed progressive 3-stage training scheme on a single machine with 4 GPUs.
55 | 
56 | 1. Image-based pre-training:
57 | ```
58 | sh exp/phase1/hvd_start.sh 4 localhost:4
59 | ```
60 | 2. Image/video-based pre-training:
61 | ```
62 | sh exp/phase2/hvd_start.sh 4 localhost:4
63 | ```
64 | 3. Fine-tuning:
65 | ```
66 | sh exp/phase3/hvd_start.sh 4 localhost:4
67 | ```
68 | 
69 | ## Evaluation
70 | 
71 | ```
72 | sh exp/eval/hvd_start.sh 4 localhost:4
73 | ```
74 | 
75 | ## Pretrained models
76 | 
77 | | PA-MPJPE (3DPW test set) | Length of temp. embed. | Link |
78 | |:-:|:-:|-|
79 | | 42.0 (T=64) | 16 | [Model-1 Google drive](https://drive.google.com/file/d/1ffCEhjXxOQ5EIx3Xt2NF0EoOx-3Py0he/view?usp=drive_link) |
80 | | 42.3 (T=64) | 64 | [Model-2 Google drive](https://drive.google.com/file/d/1Kq25NESN6d2QQtUJ02Fjme4BEobWe3Az/view?usp=drive_link) |
81 | 
82 | ## Citation
83 | If you find this repository useful, please give it a star 🌟 or consider citing our work:
84 | 
85 | ```
86 | @inproceedings{
87 | yang2023capturing,
88 | title={Capturing the Motion of Every Joint: 3D Human Pose and Shape Estimation with Independent Tokens},
89 | author={Sen Yang and Wen Heng and Gang Liu and GUOZHONG LUO and Wankou Yang and Gang YU},
90 | booktitle={The Eleventh International Conference on Learning Representations (ICLR) },
91 | year={2023},
92 | url={https://openreview.net/forum?id=0Vv4H4Ch0la}
93 | }
94 | ```
95 | 
96 | ## Credit
97 | Thanks to [MAED](https://github.com/ziniuwan/maed) and [VIBE](https://github.com/mkocabas/VIBE) for their great open-source code.
98 | 
99 | 
--------------------------------------------------------------------------------
/configs/baseline_phase1.yaml:
--------------------------------------------------------------------------------
1 | DEBUG: false
2 | DEBUG_FREQ: 1
3 | LOGDIR: ''
4 | DEVICE: 'cuda'
5 | EXP_NAME: 'token3d'
6 | OUTPUT_DIR: 'results/'
7 | NUM_WORKERS: 8
8 | SEED_VALUE: -1
9 | SAVE_FREQ: 1
10 | DATASET:
11 |   SEQLEN: 16
12 |   SAMPLE_POOL: 128
13 |   OVERLAP: 0.5
14 |   RANDOM_SAMPLE: false
15 |   RANDOM_START: true
16 |   SIZE_JITTER: 0.
17 |   ROT_JITTER: 0
18 |   RANDOM_FLIP: 0.5
19 |   RANDOM_CROP_P: 0.2
20 |   RANDOM_CROP_SIZE: 0.6
21 |   COLOR_JITTER: 0.
22 |   ERASE_PROB: 0.
23 |   ERASE_PART: 0.
24 |   ERASE_FILL: False
25 |   ERASE_KP: False
26 |   ERASE_MARGIN: 0.
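  # Note: COLOR_JITTER and the ERASE_* options above are all 0/False here, i.e. these
  # augmentations are disabled for the image-based pre-training phase; they are enabled
  # in baseline_phase2.yaml and baseline_phase3.yaml. WIDTH/HEIGHT below set the size of
  # the input crop fed to the model.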
27 | WIDTH: 224 28 | HEIGHT: 224 29 | EVAL: 30 | SAMPLE_POOL: 128 31 | SEQLEN: 16 32 | BATCH_SIZE: 8 33 | INTERPOLATION: 1 34 | BBOX_SCALE: 1.1 35 | LOSS: 36 | KP_2D_W: 300.0 37 | KP_3D_W: 600.0 38 | SHAPE_W: 0.06 39 | POSE_W: 60.0 40 | SMPL_NORM: 1.0 41 | ACCL_W: 0.0 42 | TRAIN: 43 | BATCH_SIZE_3D: 0 44 | BATCH_SIZE_2D: 0 45 | BATCH_SIZE_IMG: 120 46 | IMG_USE_FREQ: 1 47 | NUM_ITERS_PER_EPOCH: -1 48 | RESUME: '' 49 | START_EPOCH: 0 50 | END_EPOCH: 100 # 51 | DATASETS_2D: [] 52 | DATASETS_3D: [] 53 | DATASETS_IMG: 54 | - 'coco2014-all' 55 | - 'lspet' 56 | - 'mpii' 57 | - 'mpii3d' 58 | - 'h36m' 59 | DATASET_EVAL: '3dpw' 60 | EVAL_SET: 'test' 61 | OPTIM: 62 | LR: 0.0001 63 | WD: 0.00001 64 | OPTIM: 'Adam' 65 | WARMUP_EPOCH: 0 66 | WARMUP_FACTOR: 0.1 67 | MILESTONES: [60, 90] 68 | MODEL: 69 | ENABLE_TEMP_MODELING: False 70 | ENCODER: 71 | NUM_BLOCKS: 6 72 | NUM_HEADS: 12 73 | SPA_TEMP_MODE: 'vanilla' 74 | 75 | -------------------------------------------------------------------------------- /configs/baseline_phase2.yaml: -------------------------------------------------------------------------------- 1 | DEBUG: false 2 | DEBUG_FREQ: 1 3 | LOGDIR: '' 4 | DEVICE: 'cuda' 5 | EXP_NAME: 'token3d' 6 | OUTPUT_DIR: 'results/' 7 | NUM_WORKERS: 8 8 | SEED_VALUE: -1 9 | SAVE_FREQ: 1 10 | DATASET: 11 | SEQLEN: 16 12 | SAMPLE_POOL: 128 13 | OVERLAP: 0.5 14 | RANDOM_SAMPLE: false 15 | RANDOM_START: true 16 | SIZE_JITTER: 0. 17 | ROT_JITTER: 0 18 | RANDOM_FLIP: 0.5 19 | RANDOM_CROP_P: 0.2 20 | RANDOM_CROP_SIZE: 0.6 21 | COLOR_JITTER: 0.3 22 | ERASE_PROB: 0.3 23 | ERASE_PART: 0.7 24 | ERASE_FILL: False 25 | ERASE_KP: False 26 | ERASE_MARGIN: 0.2 27 | WIDTH: 224 28 | HEIGHT: 224 29 | EVAL: 30 | SAMPLE_POOL: 128 31 | SEQLEN: 16 32 | BATCH_SIZE: 8 33 | INTERPOLATION: 1 34 | BBOX_SCALE: 1.1 35 | LOSS: 36 | KP_2D_W: 300.0 37 | KP_3D_W: 600.0 38 | SHAPE_W: 0.06 39 | POSE_W: 60.0 40 | SMPL_NORM: 1.0 41 | ACCL_W: 0.0 42 | TEMP_W: 0.0 43 | TRAIN: 44 | BATCH_SIZE_3D: 4 45 | BATCH_SIZE_2D: 3 46 | BATCH_SIZE_IMG: 7 47 | IMG_USE_FREQ: 1 48 | NUM_ITERS_PER_EPOCH: -1 49 | RESUME: '' 50 | START_EPOCH: 0 51 | END_EPOCH: 100 # 52 | DATASETS_2D: 53 | - 'insta' 54 | - 'posetrack' 55 | - 'pennaction' 56 | DATASETS_3D: 57 | # - '3dpw' 58 | - 'mpii3d' 59 | - 'h36m' 60 | DATASETS_IMG: 61 | - 'coco2014-all' 62 | - 'lspet' 63 | - 'mpii' 64 | DATASET_EVAL: 'h36m' 65 | EVAL_SET: 'val' 66 | OPTIM: 67 | LR: 0.0001 68 | WD: 0.00001 69 | OPTIM: 'Adam' 70 | WARMUP_EPOCH: 0 71 | WARMUP_FACTOR: 0.1 72 | MILESTONES: [60, 90] 73 | MODEL: 74 | ENABLE_TEMP_MODELING: true 75 | ENABLE_TEMP_EMBEDDING: true 76 | ENCODER: 77 | NUM_BLOCKS: 6 78 | NUM_HEADS: 12 79 | SPA_TEMP_MODE: 'vanilla' 80 | MASK_RATIO: 0. 81 | TEMPORAL_LAYERS: 3 82 | TEMPORAL_NUM_HEADS: 12 83 | LOAD_PRETRAINED_HEAD: True 84 | 85 | -------------------------------------------------------------------------------- /configs/baseline_phase3.yaml: -------------------------------------------------------------------------------- 1 | DEBUG: false 2 | DEBUG_FREQ: 1 3 | LOGDIR: '' 4 | DEVICE: 'cuda' 5 | EXP_NAME: 'token3d' 6 | OUTPUT_DIR: 'results/' 7 | NUM_WORKERS: 8 8 | SEED_VALUE: -1 9 | SAVE_FREQ: 1 10 | DATASET: 11 | SEQLEN: 16 12 | SAMPLE_POOL: 128 13 | OVERLAP: 0.5 14 | RANDOM_SAMPLE: false 15 | RANDOM_START: true 16 | SIZE_JITTER: 0. 
17 |   ROT_JITTER: 0
18 |   RANDOM_FLIP: 0.5
19 |   RANDOM_CROP_P: 0.2
20 |   RANDOM_CROP_SIZE: 0.6
21 |   COLOR_JITTER: 0.3
22 |   ERASE_PROB: 0.3
23 |   ERASE_PART: 0.7
24 |   ERASE_FILL: False
25 |   ERASE_KP: False
26 |   ERASE_MARGIN: 0.2
27 |   WIDTH: 224
28 |   HEIGHT: 224
29 | EVAL:
30 |   SAMPLE_POOL: 128
31 |   SEQLEN: 64 # T=64
32 |   BATCH_SIZE: 8
33 |   INTERPOLATION: 1
34 |   BBOX_SCALE: 1.1
35 | LOSS:
36 |   KP_2D_W: 300.0
37 |   KP_3D_W: 600.0
38 |   SHAPE_W: 0.06
39 |   POSE_W: 60.0
40 |   SMPL_NORM: 0.01
41 |   ACCL_W: 0.0
42 |   TEMP_W: 600.0
43 | TRAIN:
44 |   BATCH_SIZE_3D: 4
45 |   BATCH_SIZE_2D: 0
46 |   BATCH_SIZE_IMG: 0
47 |   IMG_USE_FREQ: 1
48 |   NUM_ITERS_PER_EPOCH: -1
49 |   RESUME: ''
50 |   START_EPOCH: 0
51 |   END_EPOCH: 50
52 |   DATASETS_2D: []
53 |   DATASETS_3D:
54 |     - '3dpw'
55 |     - 'h36m'
56 |   DATASETS_IMG: []
57 |   DATASET_EVAL: '3dpw'
58 |   EVAL_SET: 'test'
59 |   OPTIM:
60 |     LR: 0.0001
61 |     WD: 0.00001
62 |     OPTIM: 'sgd'
63 |     WARMUP_EPOCH: 0
64 |     WARMUP_FACTOR: 0.1
65 |     MILESTONES: [30, 40]
66 | MODEL:
67 |   ENABLE_TEMP_MODELING: true
68 |   ENABLE_TEMP_EMBEDDING: true
69 |   ENCODER:
70 |     NUM_BLOCKS: 6
71 |     NUM_HEADS: 12
72 |     SPA_TEMP_MODE: 'vanilla'
73 |   MASK_RATIO: 0.
74 |   TEMPORAL_LAYERS: 3
75 |   TEMPORAL_NUM_HEADS: 12
76 |   LOAD_PRETRAINED_HEAD: True
77 | 
78 | 
--------------------------------------------------------------------------------
/doc/cxk.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/doc/cxk.gif
--------------------------------------------------------------------------------
/doc/dance5_.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/doc/dance5_.gif
--------------------------------------------------------------------------------
/doc/data.md:
--------------------------------------------------------------------------------
1 | Throughout the documentation we refer to the repo root folder as `$ROOT`. All the datasets listed below should be put in or linked to `$ROOT/data`.
2 | 
3 | # Data Preparation
4 | 
5 | ## 1. Download Datasets
6 | You should first download the datasets used in MAED.
7 | 
8 | - **InstaVariety**
9 | 
10 | Download the
11 | [preprocessed tfrecords](https://github.com/akanazawa/human_dynamics/blob/master/doc/insta_variety.md#pre-processed-tfrecords)
12 | provided by the authors of Temporal HMR.
13 | 
14 | Directory structure:
15 | ```shell script
16 | insta_variety
17 | |-- train
18 | |   |-- insta_variety_00_copy00_hmr_noS5.ckpt-642561.tfrecord
19 | |   |-- insta_variety_01_copy00_hmr_noS5.ckpt-642561.tfrecord
20 | |   `-- ...
21 | `-- test
22 |     |-- insta_variety_00_copy00_hmr_noS5.ckpt-642561.tfrecord
23 |     |-- insta_variety_01_copy00_hmr_noS5.ckpt-642561.tfrecord
24 |     `-- ...
25 | ```
26 | 
27 | The original InstaVariety is saved in tfrecord format, which is not convenient to use with PyTorch. You can run this
28 | [script](../scripts/prepare_insta.sh), which extracts the frames of every tfrecord and saves them as JPEG images.
29 | 
30 | Directory structure after extraction:
31 | ```shell script
32 | insta_variety_img
33 | |-- train
34 |     |-- insta_variety_00_copy00_hmr_noS5.ckpt-642561.tfrecord
35 |     |   |-- 0
36 |     |   |-- 1
37 |     |   `-- ...
38 |     |-- insta_variety_01_copy00_hmr_noS5.ckpt-642561.tfrecord
39 |     |   |-- 0
40 |     |   |-- 1
41 |     |   `-- ...
42 |     `-- ...
43 | ```
44 | 
45 | - **[MPI-INF-3DHP](http://gvv.mpi-inf.mpg.de/3dhp-dataset)**
46 | 
47 | Download the dataset using the bash script provided by the authors. We will be using standard cameras only, so wall and ceiling
48 | cameras aren't needed. Then, run
49 | [the script from the official VIBE repo](https://gist.github.com/mkocabas/cc6fe78aac51f97859e45f46476882b6) to extract frames from the videos.
50 | 
51 | Directory structure:
52 | ```shell script
53 | $ROOT/data
54 | mpi_inf_3dhp
55 | |-- S1
56 | |   |-- Seq1
57 | |   |-- Seq2
58 | |-- S2
59 | |   |-- Seq1
60 | |   |-- Seq2
61 | |-- ...
62 | `-- util
63 | ```
64 | 
65 | - **[Human 3.6M](http://vision.imar.ro/human3.6m/description.php)**
66 | 
67 | Human 3.6M is not an open dataset at the moment, so it is optional in our training code. **However, Human 3.6M has a non-negligible effect on the final performance of MAED.**
68 | 
69 | Once you have access to the Human 3.6M dataset, you can refer to [the script](https://github.com/nkolot/SPIN/blob/master/datasets/preprocess/h36m_train.py) from the official SPIN repository to preprocess it.
70 | Directory structure:
71 | ```shell script
72 | human3.6m
73 | |-- annot
74 | |-- dataset_extras
75 | |-- S1
76 | |-- S11
77 | |-- S5
78 | |-- S6
79 | |-- S7
80 | |-- S8
81 | `-- S9
82 | ```
83 | 
84 | - **[3DPW](https://virtualhumans.mpi-inf.mpg.de/3DPW)**
85 | 
86 | Directory structure:
87 | ```shell script
88 | 3dpw
89 | |-- imageFiles
90 | |   |-- courtyard_arguing_00
91 | |   |-- courtyard_backpack_00
92 | |   |-- ...
93 | `-- sequenceFiles
94 |     |-- test
95 |     |-- train
96 |     `-- validation
97 | ```
98 | 
99 | - **[PennAction](http://dreamdragon.github.io/PennAction/)**
100 | 
101 | Directory structure:
102 | ```shell script
103 | pennaction
104 | |-- frames
105 | |   |-- 0000
106 | |   |-- 0001
107 | |   |-- ...
108 | `-- labels
109 |     |-- 0000.mat
110 |     |-- 0001.mat
111 |     `-- ...
112 | ```
113 | 
114 | - **[PoseTrack](https://posetrack.net/)**
115 | 
116 | Directory structure:
117 | ```shell script
118 | posetrack
119 | |-- images
120 | |   |-- train
121 | |   |-- val
122 | |   |-- test
123 | `-- posetrack_data
124 |     `-- annotations
125 |         |-- train
126 |         |-- val
127 |         `-- test
128 | ```
129 | 
130 | - **[MPII](http://human-pose.mpi-inf.mpg.de/)**
131 | 
132 | Directory structure:
133 | ```shell script
134 | mpii
135 | |-- 099992483.jpg
136 | |-- 099990098.jpg
137 | `-- ...
138 | ```
139 | 
140 | - **[COCO 2014-All](https://cocodataset.org/)**
141 | 
142 | Directory structure:
143 | ```shell script
144 | coco2014-all
145 | |-- COCO_train2014_000000000001.jpg
146 | |-- COCO_train2014_000000000002.jpg
147 | `-- ...
148 | ```
149 | 
150 | - **[LSPet](http://sam.johnson.io/research/lspet.html)**
151 | 
152 | Directory structure:
153 | ```shell script
154 | lspet
155 | |-- im00001.jpg
156 | |-- im00002.jpg
157 | `-- ...
158 | ```
159 | 
160 | ## 2. Download Annotation (pt format)
161 | Download annotation data for MAED from [Google Drive](https://drive.google.com/drive/folders/1vApUaFNqo-uNP7RtVRxBy2YJJ1IprnQ8?usp=sharing) and move the whole directory to `$ROOT/data`.
162 | 
163 | ## 3. Download SMPL data
164 | Download SMPL data for MAED from [Google Drive](https://drive.google.com/drive/folders/1RqkUInP_0DohMvYpnFpqo7z_KWxjQVa6?usp=sharing) and move the whole directory to `$ROOT/data`.
165 | 
166 | ## It's Done!
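As a quick sanity check, you can verify that the folders in the reference layout below actually exist before launching training. The snippet is purely illustrative; trim the list to the datasets you actually prepared (e.g. Human 3.6M is optional).

```shell script
# Illustrative check: report any expected folder that is missing under $ROOT/data
for d in insta_variety insta_variety_img 3dpw mpii3d posetrack pennaction coco2014-all lspet mpii smpl_data database; do
    [ -d "data/$d" ] || echo "missing: data/$d"
done
```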
167 | After downloading all the datasets and annotations, the directory structure of `$ROOT/data` should be like: 168 | ```shell script 169 | $ROOT/data 170 | |-- insta_variety 171 | |-- insta_variety_img 172 | |-- 3dpw 173 | |-- mpii3d 174 | |-- posetrack 175 | |-- pennaction 176 | |-- coco2014-all 177 | |-- lspet 178 | |-- mpii 179 | |-- smpl_data 180 | |-- J_regressor_extra.npy 181 | `-- ... 182 | `-- database 183 | |-- insta_train_db.pt 184 | |-- 3dpw_train_db.pt 185 | |-- lspet_train_db.pt 186 | `-- ... 187 | ``` -------------------------------------------------------------------------------- /doc/micheal2.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/doc/micheal2.gif -------------------------------------------------------------------------------- /doc/out3.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/doc/out3.gif -------------------------------------------------------------------------------- /eval.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import torchvision 4 | 5 | from lib.dataset import VideoDataset 6 | from lib.data_utils.transforms import * 7 | from lib.models import MAED 8 | from lib.models.tokenpose import Token3d 9 | from lib.core.evaluate import Evaluator 10 | from lib.core.config import parse_args 11 | from torch.utils.data import DataLoader 12 | 13 | 14 | def main(cfg, args): 15 | print(f'...Evaluating on {args.eval_ds.lower()} {args.eval_set.lower()} set...') 16 | device = "cuda" 17 | 18 | model = Token3d( 19 | num_blocks=cfg.MODEL.ENCODER.NUM_BLOCKS, 20 | num_heads=cfg.MODEL.ENCODER.NUM_HEADS, 21 | st_mode=cfg.MODEL.ENCODER.SPA_TEMP_MODE, 22 | mask_ratio=cfg.MODEL.MASK_RATIO, 23 | temporal_layers=cfg.MODEL.TEMPORAL_LAYERS, 24 | temporal_num_heads=cfg.MODEL.TEMPORAL_NUM_HEADS, 25 | enable_temp_modeling=cfg.MODEL.ENABLE_TEMP_MODELING, 26 | enable_temp_embedding=cfg.MODEL.ENABLE_TEMP_EMBEDDING 27 | ) 28 | 29 | print("model params:{:.3f}M (/1000^2)".format( 30 | sum([p.numel() for p in model.parameters()]) / 1000**2)) 31 | 32 | if args.pretrained != '' and os.path.isfile(args.pretrained): 33 | checkpoint = torch.load(args.pretrained, map_location='cpu') 34 | # best_performance = checkpoint['performance'] 35 | history_best_performance = checkpoint['history_best_peformance'] \ 36 | if 'history_best_peformance' in checkpoint else checkpoint['performance'] 37 | state_dict = {} 38 | for k, w in checkpoint['state_dict'].items(): 39 | if k.startswith('module.'): 40 | state_dict[k[len('module.'):]] = w 41 | elif k in model.state_dict(): 42 | state_dict[k] = w 43 | else: 44 | continue 45 | 46 | temp_embedding_shape = state_dict['temporal_pos_embedding'].shape 47 | if model.temporal_pos_embedding.shape[1] != temp_embedding_shape[1]: 48 | model.temporal_pos_embedding = torch.nn.Parameter( 49 | torch.zeros(1, temp_embedding_shape[1], temp_embedding_shape[2])) 50 | 51 | # checkpoint['state_dict'] = {k[len('module.'):]: w for k, w in checkpoint['state_dict'].items() if k.startswith('module.') else} 52 | model.load_state_dict(state_dict, strict=False) 53 | print(f'==> Loaded pretrained model from {args.pretrained}...') 54 | print( 55 | f'==> History best Performance on 3DPW test set {history_best_performance}') 56 | else: 57 | 
print(f'{args.pretrained} is not a pretrained model!!!!') 58 | exit() 59 | 60 | model = model.to(device) 61 | 62 | transforms = torchvision.transforms.Compose([ 63 | CropVideo(cfg.DATASET.HEIGHT, cfg.DATASET.WIDTH, 64 | default_bbox_scale=cfg.EVAL.BBOX_SCALE), 65 | StackFrames(), 66 | ToTensorVideo(), 67 | NormalizeVideo(), 68 | ]) 69 | 70 | test_db = VideoDataset( 71 | args.eval_ds.lower(), 72 | set=args.eval_set.lower(), 73 | transforms=transforms, 74 | sample_pool=cfg.EVAL.SAMPLE_POOL, 75 | random_sample=False, random_start=False, 76 | verbose=True, 77 | debug=cfg.DEBUG) 78 | 79 | test_loader = DataLoader( 80 | dataset=test_db, 81 | batch_size=cfg.EVAL.BATCH_SIZE, 82 | shuffle=False, 83 | num_workers=cfg.NUM_WORKERS, 84 | ) 85 | 86 | Evaluator().run( 87 | model=model, 88 | dataloader=test_loader, 89 | seqlen=cfg.EVAL.SEQLEN, 90 | interp=cfg.EVAL.INTERPOLATION, 91 | save_path=args.output_path, 92 | device=cfg.DEVICE, 93 | ) 94 | 95 | 96 | if __name__ == '__main__': 97 | args, cfg, cfg_file = parse_args() 98 | 99 | main(cfg, args) 100 | -------------------------------------------------------------------------------- /exp/eval/hvd_start.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # smplx: view operation requires contiguous tensor, replacing by reshape operation 3 | sed -i "347c rel_joints.reshape(-1, 3, 1)).view(-1, joints.shape[1], 4, 4)" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/smplx/lbs.py 4 | 5 | sed -i "96,97d" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/horovod/torch/mpi_ops.py 6 | 7 | unset OMPI_MCA_plm_rsh_agent 8 | export NCCL_SOCKET_IFNAME=eth1 9 | export NCCL_IB_DISABLE=1 10 | export NCCL_DEBUG=INFO 11 | export LANG=zh_CN.UTF-8 12 | 13 | date=`date +%Y%m%d_%H%M%S` 14 | export LANG=en_US.UTF-8 15 | 16 | # link to the work dir to save checkpoints and logs 17 | 18 | if [ ! -d 'workdir' ];then 19 | mkdir -p workdir 20 | fi 21 | work_dir=workdir/token3d_training_dir 22 | model_name=hvd_token3d_phase3 23 | config_yaml=configs/baseline_phase3.yaml 24 | 25 | 26 | exp_dir=${work_dir}/${model_name} 27 | 28 | if [ ! 
-d ${exp_dir} ];then 29 | mkdir -p ${exp_dir} 30 | fi 31 | echo 'current work dir is: '${exp_dir} 32 | 33 | echo ">>>> eval" 34 | #'epoch_100.pth.tar' # 'model_best.pth.tar' #44_9_model_best.pth.tar' 35 | best_name='model_best.pth.tar' 36 | best_from=$exp_dir/$best_name 37 | #best_from='/cfs/cfs-31b43a0b8/personal/brucesyang/baseline_training_dir/tp_baseline_token3dpretrain/coco/transpose_r/token3dpretrain/checkpoint.pth' 38 | 39 | python eval.py --cfg $config_yaml\ 40 | --pretrained $best_from \ 41 | --eval_ds 3dpw \ 42 | --eval_set val \ 43 | 2>&1 | tee -a ${exp_dir}/eval_output.log 44 | 45 | python eval.py --cfg $config_yaml\ 46 | --pretrained $best_from \ 47 | --eval_ds 3dpw \ 48 | --eval_set test \ 49 | 2>&1 | tee -a ${exp_dir}/eval_output.log 50 | 51 | python eval.py --cfg $config_yaml\ 52 | --pretrained $best_from \ 53 | --eval_ds h36m \ 54 | --eval_set val \ 55 | 2>&1 | tee -a ${exp_dir}/eval_output.log 56 | -------------------------------------------------------------------------------- /exp/phase1/hvd_start.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # smplx: view operation requires contiguous tensor, replacing by reshape operation 3 | sed -i "347c rel_joints.reshape(-1, 3, 1)).view(-1, joints.shape[1], 4, 4)" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/smplx/lbs.py 4 | 5 | sed -i "96,97d" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/horovod/torch/mpi_ops.py 6 | 7 | unset OMPI_MCA_plm_rsh_agent 8 | export NCCL_SOCKET_IFNAME=eth1 9 | export NCCL_IB_DISABLE=1 10 | export NCCL_DEBUG=INFO 11 | export LANG=zh_CN.UTF-8 12 | 13 | date=`date +%Y%m%d_%H%M%S` 14 | export LANG=en_US.UTF-8 15 | 16 | # link to the work dir to save checkpoints and logs 17 | if [ ! -d 'workdir' ];then 18 | mkdir -p workdir 19 | fi 20 | work_dir=workdir/token3d_training_dir 21 | model_name=hvd_token3d_phase1 22 | exp_dir=${work_dir}/${model_name} 23 | resume_name='checkpoint.pt' 24 | 25 | resume_from=$exp_dir/$resume_name 26 | best_from=$work_dir/$best_name 27 | 28 | if [ ! 
-d ${exp_dir} ];then 29 | mkdir -p ${exp_dir} 30 | fi 31 | echo 'current work dir is: '${exp_dir} 32 | 33 | # for example 34 | # To run on a machine with 4 GPUs: 35 | # horovodrun -np 4 -H localhost:4 python train.py 36 | 37 | # To run on 4 machines with 4 GPUs each 38 | # horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py 39 | 40 | 41 | gpu_min=$1 # total_min_gpu_num 42 | node_list=$2 #server1_ip:gpu_num,server2_ip:gpu_num 43 | 44 | horovodrun -np ${gpu_min} -H ${node_list} \ 45 | python train_hvd.py --cfg configs/baseline_phase1.yaml\ 46 | --resume $resume_from \ 47 | --logdir ${exp_dir} 2>&1 | tee -a ${exp_dir}/hvd_output.log 48 | -------------------------------------------------------------------------------- /exp/phase2/hvd_start.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # smplx: view operation requires contiguous tensor, replacing by reshape operation 3 | sed -i "347c rel_joints.reshape(-1, 3, 1)).view(-1, joints.shape[1], 4, 4)" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/smplx/lbs.py 4 | 5 | sed -i "96,97d" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/horovod/torch/mpi_ops.py 6 | 7 | unset OMPI_MCA_plm_rsh_agent 8 | export NCCL_SOCKET_IFNAME=eth1 9 | export NCCL_IB_DISABLE=1 10 | export NCCL_DEBUG=INFO 11 | export LANG=zh_CN.UTF-8 12 | 13 | date=`date +%Y%m%d_%H%M%S` 14 | export LANG=en_US.UTF-8 15 | 16 | # link to the work dir to save checkpoints and logs 17 | if [ ! -d 'workdir' ];then 18 | mkdir -p workdir 19 | fi 20 | work_dir=workdir/token3d_training_dir 21 | model_name=hvd_token3d_phase2 22 | exp_dir=${work_dir}/${model_name} 23 | resume_name='checkpoint.pt' 24 | best_name='hvd_token3d_phase1/epoch_100.pth.tar' 25 | 26 | resume_from=$exp_dir/$resume_name 27 | best_from=$work_dir/$best_name 28 | 29 | if [ ! -d ${exp_dir} ];then 30 | mkdir -p ${exp_dir} 31 | fi 32 | echo 'current work dir is: '${exp_dir} 33 | 34 | # for example 35 | # To run on a machine with 4 GPUs: 36 | # horovodrun -np 4 -H localhost:4 python train.py 37 | 38 | # To run on 4 machines with 4 GPUs each 39 | # horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py 40 | 41 | 42 | gpu_min=$1 # total_min_gpu_num 43 | node_list=$2 #server1_ip:gpu_num,server2_ip:gpu_num 44 | 45 | horovodrun -np ${gpu_min} -H ${node_list} \ 46 | python train_hvd.py --cfg configs/baseline_phase2.yaml\ 47 | --resume $resume_from \ 48 | --pretrained $best_from \ 49 | --logdir ${exp_dir} 2>&1 | tee -a ${exp_dir}/hvd_output.log 50 | -------------------------------------------------------------------------------- /exp/phase3/hvd_start.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # smplx: view operation requires contiguous tensor, replacing by reshape operation 3 | sed -i "347c rel_joints.reshape(-1, 3, 1)).view(-1, joints.shape[1], 4, 4)" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/smplx/lbs.py 4 | 5 | sed -i "96,97d" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/horovod/torch/mpi_ops.py 6 | 7 | unset OMPI_MCA_plm_rsh_agent 8 | export NCCL_SOCKET_IFNAME=eth1 9 | export NCCL_IB_DISABLE=1 10 | export NCCL_DEBUG=INFO 11 | export LANG=zh_CN.UTF-8 12 | 13 | date=`date +%Y%m%d_%H%M%S` 14 | export LANG=en_US.UTF-8 15 | 16 | # link to the work dir to save checkpoints and logs 17 | if [ ! 
-d 'workdir' ];then 18 | mkdir -p workdir 19 | fi 20 | work_dir=workdir/token3d_training_dir 21 | model_name=hvd_token3d_phase3 22 | exp_dir=${work_dir}/${model_name} 23 | resume_name='checkpoint.pt' 24 | best_name='hvd_token3d_phase2/epoch_100.pth.tar' 25 | 26 | resume_from=$exp_dir/$resume_name 27 | best_from=$work_dir/$best_name 28 | 29 | if [ ! -d ${exp_dir} ];then 30 | mkdir -p ${exp_dir} 31 | fi 32 | echo 'current work dir is: '${exp_dir} 33 | 34 | # for example 35 | # To run on a machine with 4 GPUs: 36 | # horovodrun -np 4 -H localhost:4 python train.py 37 | 38 | # To run on 4 machines with 4 GPUs each 39 | # horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py 40 | 41 | 42 | gpu_min=$1 # total_min_gpu_num 43 | node_list=$2 #server1_ip:gpu_num,server2_ip:gpu_num 44 | 45 | horovodrun -np ${gpu_min} -H ${node_list} \ 46 | python train_hvd.py --cfg configs/baseline_phase3.yaml\ 47 | --resume $resume_from \ 48 | --pretrained $best_from \ 49 | --logdir ${exp_dir} 2>&1 | tee -a ${exp_dir}/hvd_output.log 50 | -------------------------------------------------------------------------------- /lib/core/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/lib/core/__init__.py -------------------------------------------------------------------------------- /lib/core/config.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import argparse 3 | from pickle import FALSE 4 | from yacs.config import CfgNode as CN 5 | 6 | # CONSTANTS 7 | # You may modify them at will 8 | DB_DIR = 'data/database' 9 | DATA_DIR = 'data/smpl_data' 10 | INSTA_DIR = 'data/insta_variety' 11 | INSTA_IMG_DIR = 'data/insta_variety_img' 12 | MPII3D_DIR = 'data/mpi_inf_3dhp' 13 | THREEDPW_DIR = 'data/3dpw' 14 | HUMAN36M_DIR = 'data/human3.6m' 15 | PENNACTION_DIR = 'data/penn_action' 16 | POSETRACK_DIR = 'data/posetrack' 17 | 18 | # Configuration variables 19 | cfg = CN() 20 | 21 | cfg.OUTPUT_DIR = 'results' 22 | cfg.EXP_NAME = 'default' 23 | cfg.DEVICE = 'cuda' 24 | cfg.DEBUG = True 25 | cfg.LOGDIR = '' 26 | cfg.NUM_WORKERS = 8 27 | cfg.DEBUG_FREQ = 1000 28 | cfg.SEED_VALUE = -1 29 | cfg.SAVE_FREQ = 5 30 | 31 | cfg.CUDNN = CN() 32 | cfg.CUDNN.BENCHMARK = True 33 | cfg.CUDNN.DETERMINISTIC = False 34 | cfg.CUDNN.ENABLED = True 35 | 36 | cfg.TRAIN = CN() 37 | cfg.TRAIN.DATASETS_2D = ['insta'] 38 | cfg.TRAIN.DATASETS_3D = ['mpii3d'] 39 | cfg.TRAIN.DATASETS_IMG = ['coco2014-all'] 40 | cfg.TRAIN.DATASET_EVAL = 'ThreeDPW' 41 | cfg.TRAIN.EVAL_SET = 'val' 42 | cfg.TRAIN.BATCH_SIZE_3D = 4 43 | cfg.TRAIN.BATCH_SIZE_2D = 4 44 | cfg.TRAIN.BATCH_SIZE_IMG = 8 45 | cfg.TRAIN.IMG_USE_FREQ = 1 46 | cfg.TRAIN.START_EPOCH = 0 47 | cfg.TRAIN.END_EPOCH = 5 48 | cfg.TRAIN.RESUME = '' 49 | cfg.TRAIN.NUM_ITERS_PER_EPOCH = -1 50 | 51 | # <====== optimizer 52 | cfg.TRAIN.OPTIM = CN() 53 | cfg.TRAIN.OPTIM.OPTIM = 'Adam' 54 | cfg.TRAIN.OPTIM.LR = 1e-4 55 | cfg.TRAIN.OPTIM.WD = 1e-4 56 | cfg.TRAIN.OPTIM.MOMENTUM = 0.9 57 | cfg.TRAIN.OPTIM.WARMUP_EPOCH = 2 58 | cfg.TRAIN.OPTIM.WARMUP_FACTOR = 0.1 59 | cfg.TRAIN.OPTIM.MILESTONES = [10, 15] 60 | 61 | cfg.DATASET = CN() 62 | cfg.DATASET.SEQLEN = 20 63 | cfg.DATASET.OVERLAP = 0.5 64 | cfg.DATASET.SAMPLE_POOL = 64 65 | cfg.DATASET.SIZE_JITTER = 0.2 66 | cfg.DATASET.ROT_JITTER = 30 67 | cfg.DATASET.RANDOM_SAMPLE = True 68 | cfg.DATASET.RANDOM_START = False 69 | cfg.DATASET.RANDOM_FLIP = 
0.5 70 | cfg.DATASET.WIDTH = 224 71 | cfg.DATASET.HEIGHT = 224 72 | cfg.DATASET.RANDOM_CROP_P = 0.0 73 | cfg.DATASET.RANDOM_CROP_SIZE = 0.5 74 | cfg.DATASET.COLOR_JITTER = 0.3 75 | cfg.DATASET.ERASE_PROB = 0.3 76 | cfg.DATASET.ERASE_PART = 0.7 77 | cfg.DATASET.ERASE_FILL = False 78 | cfg.DATASET.ERASE_KP = False 79 | cfg.DATASET.ERASE_MARGIN = 0.2 80 | 81 | 82 | cfg.LOSS = CN() 83 | cfg.LOSS.KP_2D_W = 60. 84 | cfg.LOSS.KP_3D_W = 30. 85 | cfg.LOSS.SHAPE_W = 0.001 86 | cfg.LOSS.POSE_W = 1.0 87 | cfg.LOSS.SMPL_NORM = 1. 88 | cfg.LOSS.ACCL_W = 0. 89 | cfg.LOSS.DELTA_NORM = 0.0001 90 | cfg.LOSS.TEMP_W = 0. 91 | 92 | cfg.MODEL = CN() 93 | 94 | # GRU model hyperparams 95 | cfg.MODEL.DECODER = CN() 96 | cfg.MODEL.PROJ_MODE = 'linear' 97 | cfg.MODEL.USE_JOINT2D_HEAD = False 98 | cfg.MODEL.USE_ROT2TOKEN_HEAD = False 99 | cfg.MODEL.CONTRAINT_TOKEN_DELTA = False 100 | cfg.MODEL.DECODER.BACKBONE = 'ktd' 101 | cfg.MODEL.DECODER.HIDDEN_DIM = 1024 102 | cfg.MODEL.ENCODER = CN() 103 | cfg.MODEL.ENCODER.BACKBONE = 'ste' 104 | cfg.MODEL.ENCODER.NUM_BLOCKS = 6 105 | cfg.MODEL.ENCODER.NUM_HEADS = 12 106 | cfg.MODEL.ENCODER.SPA_TEMP_MODE = 'vanilla' 107 | # temporal 108 | cfg.MODEL.MASK_RATIO = 0. 109 | cfg.MODEL.TEMPORAL_LAYERS = 3 110 | cfg.MODEL.TEMPORAL_NUM_HEADS = 12 111 | 112 | cfg.MODEL.LOAD_PRETRAINED_HEAD = True 113 | 114 | cfg.MODEL.ENABLE_TEMP_MODELING = True 115 | cfg.MODEL.ENABLE_TEMP_EMBEDDING = False 116 | 117 | cfg.EVAL = CN() 118 | cfg.EVAL.SEQLEN = 16 119 | cfg.EVAL.SAMPLE_POOL = 128 120 | cfg.EVAL.BATCH_SIZE = 32 121 | cfg.EVAL.INTERPOLATION = 1 122 | cfg.EVAL.BBOX_SCALE = 1.3 123 | 124 | def get_cfg_defaults(): 125 | """Get a yacs CfgNode object with default values for my_project.""" 126 | # Return a clone so that the defaults will not be altered 127 | # This is for the "local variable" use pattern 128 | return cfg.clone() 129 | 130 | 131 | def update_cfg(cfg_file): 132 | cfg = get_cfg_defaults() 133 | cfg.merge_from_file(cfg_file) 134 | return cfg.clone() 135 | 136 | 137 | def parse_args(): 138 | parser = argparse.ArgumentParser() 139 | parser.add_argument('--cfg', type=str, help='cfg file path') 140 | parser.add_argument('--pretrained', type=str, 141 | help='stage 1 checkpoint file path', default='') 142 | parser.add_argument('--resume', type=str, help='resume', default='') 143 | parser.add_argument('--eval_ds', type=str, help='eval set name', default='3dpw') 144 | parser.add_argument('--eval_set', type=str, 145 | help='eval set in [test|val]', default='test') 146 | parser.add_argument('--image_root', type=str, 147 | help='inference image root', default='') 148 | parser.add_argument('--image_list', type=str, 149 | help='inference image list', default='') 150 | parser.add_argument('--output_path', type=str, 151 | help='path to save the inference file generated in evaluation', default='') 152 | parser.add_argument('--logdir', type=str, 153 | help="workdir save logs, checkpoints, best_models") 154 | args = parser.parse_args() 155 | print(args, end='\n\n') 156 | 157 | cfg_file = args.cfg 158 | if args.cfg is not None: 159 | cfg = update_cfg(args.cfg) 160 | else: 161 | cfg = get_cfg_defaults() 162 | 163 | return args, cfg, cfg_file 164 | -------------------------------------------------------------------------------- /lib/core/evaluate.py: -------------------------------------------------------------------------------- 1 | import time 2 | import torch 3 | import shutil 4 | import logging 5 | import numpy as np 6 | import os.path as osp 7 | import traceback 8 | import joblib 9 | from tqdm import 
tqdm 10 | from collections import defaultdict 11 | from scipy.interpolate import interp1d 12 | 13 | from lib.core.config import DATA_DIR, DB_DIR 14 | from lib.models.smpl import REGRESSOR_DICT, JID_DICT 15 | from lib.utils.utils import move_dict_to_device, AverageMeter 16 | 17 | from lib.utils.eval_utils import ( 18 | compute_accel, 19 | compute_error_accel, 20 | compute_error_verts, 21 | batch_compute_similarity_transform_torch, 22 | ) 23 | logger = logging.getLogger(__name__) 24 | 25 | class Evaluator(): 26 | def __init__(self): 27 | self.evaluation_accumulators = defaultdict(list) 28 | 29 | def inference(self, 30 | model, 31 | dataloader, 32 | seqlen=8, 33 | interp=1, 34 | device='cpu', 35 | verbose=True, desc='[Evaluating] ' 36 | ): 37 | """ 38 | Args: 39 | interp (int >= 1): 1 out of frame is predicted by the model, while the rest is obtained by interpolation. interp = 1 means all the frames are predicted by the model. 40 | """ 41 | model.eval() 42 | dataset_name = dataloader.dataset.dataset_name 43 | 44 | start = time.time() 45 | 46 | summary_string = '' 47 | 48 | self.evaluation_accumulators = defaultdict(list) 49 | 50 | flatten_dim = lambda x: x.reshape((-1, ) + x.shape[2:]) 51 | 52 | J_regressor = torch.from_numpy(np.load(osp.join(DATA_DIR, REGRESSOR_DICT[dataset_name]))).float() if REGRESSOR_DICT[dataset_name] else None 53 | Jid = JID_DICT[dataset_name] 54 | 55 | tqdm_bar = tqdm(range(len(dataloader)), desc=desc) if verbose else range(len(dataloader)) 56 | test_iter = iter(dataloader) 57 | 58 | for i in tqdm_bar: 59 | target = next(test_iter) 60 | move_dict_to_device(target, device) 61 | 62 | # <============= 63 | with torch.no_grad(): 64 | pred_verts_seq = [] 65 | pred_j3d_seq = [] 66 | pred_j2d_seq = [] 67 | pred_theta_seq = [] 68 | pred_rotmat_seq = [] 69 | valid_joints = [joint_id for joint_id in range(target['kp_3d'].shape[2]) if target['kp_3d'][0,0,joint_id,-1]] 70 | 71 | orig_len = target['images'].shape[1] 72 | interp_len = target['images'][:, ::interp].shape[1] 73 | sample_freq = interp_len // seqlen 74 | 75 | for i in range(sample_freq): 76 | inp = target['images'][:, ::interp][:, i::sample_freq] 77 | 78 | preds = model(inp, J_regressor=J_regressor) 79 | 80 | pred_verts_seq.append(preds['verts'].cpu().numpy()) 81 | pred_j3d_seq.append(preds['kp_3d'][:,:,Jid].cpu().numpy()) 82 | pred_j2d_seq.append(preds['kp_2d'][:,:,Jid].cpu().numpy()) 83 | pred_theta_seq.append(preds['theta'].cpu().numpy()) 84 | pred_rotmat_seq.append(preds['rotmat'].cpu().numpy()) 85 | 86 | # valid_seq is used to filter out repeated frames 87 | valid_seq = flatten_dim(target['valid']).cpu().numpy() 88 | 89 | # register pred 90 | pred_verts_seq = self.interpolate(self.merge_sequence(pred_verts_seq), orig_len, interp_len)[valid_seq] # (NT, 6890, 3) 91 | pred_j3d_seq = self.interpolate(self.merge_sequence(pred_j3d_seq), orig_len, interp_len)[valid_seq] # (NT, n_kp, 3) 92 | pred_j2d_seq = self.interpolate(self.merge_sequence(pred_j2d_seq), orig_len, interp_len)[valid_seq] # (NT, n_kp, 2) 93 | pred_theta_seq = self.interpolate(self.merge_sequence(pred_theta_seq), orig_len, interp_len)[valid_seq] # (NT, 3+72+10) 94 | pred_rotmat_seq = self.interpolate(self.merge_sequence(pred_rotmat_seq), orig_len, interp_len)[valid_seq] # (NT, 3, 3) 95 | 96 | self.evaluation_accumulators['pred_verts'].append(pred_verts_seq) 97 | self.evaluation_accumulators['pred_theta'].append(pred_theta_seq) 98 | self.evaluation_accumulators['pred_rotmat'].append(pred_rotmat_seq) 99 | 
self.evaluation_accumulators['pred_j3d'].append(pred_j3d_seq) 100 | self.evaluation_accumulators['pred_j2d'].append(pred_j2d_seq) 101 | 102 | # register target 103 | target_j3d_seq = flatten_dim(target['kp_3d'][:, :, valid_joints]).cpu().numpy()[valid_seq] # (NT, n_kp, 4) 104 | target_j2d_seq = flatten_dim(target['kp_2d'][:, :, valid_joints]).cpu().numpy()[valid_seq] # (NT, n_kp, 3) 105 | target_theta_seq = flatten_dim(target['theta']).cpu().numpy()[valid_seq] # (NT, 3+72+10) 106 | self.evaluation_accumulators['target_theta'].append(target_theta_seq) 107 | self.evaluation_accumulators['target_j3d'].append(target_j3d_seq) 108 | self.evaluation_accumulators['target_j2d'].append(target_j2d_seq) 109 | 110 | # register some other infomation 111 | vid_name = np.reshape(np.array(target['instance_id']).T, (-1,))[valid_seq] # (NT,) 112 | paths = np.reshape(np.array(target['paths']).T, (-1,))[valid_seq] # (NT,) 113 | bboxes = np.reshape(target['bbox'].cpu().numpy(), (-1,4))[valid_seq] # (NT, 4) 114 | self.evaluation_accumulators['instance_id'].append(vid_name) 115 | self.evaluation_accumulators['bboxes'].append(bboxes) 116 | self.evaluation_accumulators['paths'].append(paths) 117 | 118 | # =============> 119 | 120 | batch_time = time.time() - start 121 | 122 | summary_string = f'{desc} | batch: {batch_time * 10.0:.4}ms ' 123 | 124 | if verbose: 125 | tqdm_bar.set_description(summary_string) 126 | 127 | def merge_sequence(self, seq): 128 | if seq is None: 129 | return None 130 | seq = np.stack(seq, axis=2) #(N, T//num_of_seq, num_of_seq, ...) 131 | assert len(seq.shape) >= 3 132 | seq = seq.reshape((-1, ) + seq.shape[3:]) #(NT, ...) 133 | return seq 134 | 135 | def evaluate(self, save_path=''): 136 | # stack accumulators along axis 0 137 | for k, v in self.evaluation_accumulators.items(): 138 | self.evaluation_accumulators[k] = np.concatenate(v, axis=0) 139 | 140 | pred_j3ds = self.evaluation_accumulators['pred_j3d'] #(N, n_kp, 3) 141 | target_j3ds = self.evaluation_accumulators['target_j3d'][:,:,:-1] #(N, n_kp, 3) 142 | vis = self.evaluation_accumulators['target_j3d'][:,:,-1:] #(N, n_kp, 1) 143 | num_pred = len(pred_j3ds) 144 | target_j3ds *= vis 145 | pred_j3ds *= vis 146 | 147 | pred_j3ds = torch.from_numpy(pred_j3ds).float() 148 | target_j3ds = torch.from_numpy(target_j3ds).float() 149 | 150 | pred_pelvis = (pred_j3ds[:,[2],:] + pred_j3ds[:,[3],:]) / 2.0 151 | target_pelvis = (target_j3ds[:,[2],:] + target_j3ds[:,[3],:]) / 2.0 152 | 153 | pred_j3ds -= pred_pelvis 154 | target_j3ds -= target_pelvis 155 | 156 | 157 | # reduce cpu memory 158 | pred_j3ds = pred_j3ds.cuda() 159 | target_j3ds = target_j3ds.cuda() 160 | del pred_pelvis, target_pelvis 161 | 162 | 163 | # Absolute error (MPJPE) 164 | errors = torch.sqrt(((pred_j3ds - target_j3ds) ** 2).sum(dim=-1)).mean(dim=-1).cpu().numpy() 165 | S1_hat = batch_compute_similarity_transform_torch(pred_j3ds, target_j3ds) 166 | errors_pa = torch.sqrt(((S1_hat - target_j3ds) ** 2).sum(dim=-1)).mean(dim=-1).cpu().numpy() 167 | pred_verts = self.evaluation_accumulators['pred_verts'] 168 | target_theta = self.evaluation_accumulators['target_theta'] 169 | pve = compute_error_verts(target_theta=target_theta, pred_verts=pred_verts) 170 | 171 | pred_j3ds = pred_j3ds.cpu().numpy() 172 | target_j3ds = target_j3ds.cpu().numpy() 173 | 174 | accel_err = compute_error_accel(joints_pred=pred_j3ds, joints_gt=target_j3ds) 175 | accel = compute_accel(pred_j3ds) 176 | 177 | m2mm = 1000 178 | 179 | eval_dict = { 180 | 'mpjpe': np.mean(errors) * m2mm, 181 | 'pa-mpjpe': 
np.mean(errors_pa) * m2mm, 182 | 'pve': np.mean(pve) * m2mm, 183 | 'accel': np.mean(accel) * m2mm, 184 | 'accel_err': np.mean(accel_err) * m2mm 185 | } 186 | 187 | if save_path: 188 | self.save_result(save_path, mpjpe=errors, pa_mpjpe=errors_pa, accel=accel_err) 189 | 190 | return eval_dict, num_pred 191 | 192 | def log(self, eval_dict, num_pred, desc=''): 193 | print(f"Evaluated on {int(num_pred)} number of poses.") 194 | print(f'{desc}' + ' '.join([f'{k.upper()}: {v:.4f},'for k,v in eval_dict.items()])) 195 | 196 | def run(self, model, dataloader, 197 | seqlen=8, interp=1, device='cpu', 198 | save_path='', verbose=True, desc='[Evaluating]' 199 | ): 200 | self.inference(model, dataloader, seqlen=seqlen, interp=interp, device=device, verbose=verbose, desc=desc) 201 | #self.count_attn(model) 202 | eval_dict, num_pred = self.evaluate(save_path) 203 | self.log(eval_dict, num_pred) 204 | 205 | def count_attn(self, model): 206 | result = {} 207 | result["vid_name"] = np.concatenate(self.evaluation_accumulators['instance_id'], axis=0) 208 | 209 | for i, blk in enumerate(model.backbone.blocks): 210 | result[f"attn_s_{i}"] = blk.attn.attn_count_s 211 | result[f"attn_t_{i}"] = blk.attn.attn_count_t 212 | 213 | joblib.dump(result, "attn.pt") 214 | 215 | def save_result(self, save_path, *args, **kwargs): 216 | save_fields = [ 217 | 'pred_theta', 218 | #'pred_j3d', 219 | #'pred_j2d', 220 | 'pred_verts', 221 | 'paths', 222 | 'bboxes', 223 | #'pred_rotmat' 224 | ] 225 | save_dic = {k: v for k, v in self.evaluation_accumulators.items() if k in save_fields} 226 | save_dic.update(kwargs) 227 | joblib.dump(save_dic, osp.join(save_path, 'inference.pkl')) 228 | 229 | def interpolate(self, sequence, orig_len, interp_len): 230 | """ 231 | Args: 232 | sequence (np array): size (N*interp_len, ...) 233 | orig_len (int) 234 | interp_len (int): larger than or equal to orig_len 235 | 236 | Return: 237 | A np array of size (N*orig_len, ...) 238 | """ 239 | if orig_len == interp_len: return sequence 240 | sequence = sequence.reshape((-1, interp_len) + sequence.shape[1:]) # (N, interp_len, ...) 241 | x = np.linspace(1., 0., num=interp_len, endpoint=False)[::-1] # (interp_len, ) 242 | f = interp1d(x, sequence, axis=1, fill_value="extrapolate") 243 | 244 | new_x = np.linspace(0., 1., num=orig_len, endpoint=True) # (orig_len, ) 245 | ret = f(new_x) 246 | ret = ret.reshape((-1,) + ret.shape[2:]) 247 | return ret 248 | -------------------------------------------------------------------------------- /lib/data_utils/img_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import cv2 3 | import torch 4 | import io 5 | import numpy as np 6 | import os.path as osp 7 | 8 | from skimage.util.shape import view_as_windows 9 | from PIL import Image 10 | 11 | import shutil 12 | import uuid 13 | 14 | 15 | 16 | def download(p, cache_dir='/dockerdata'): 17 | new_p = '{}/{}'.format(cache_dir, p) 18 | if (not os.path.exists(new_p)): 19 | subdir = os.path.dirname(new_p) 20 | if (subdir != ''): 21 | os.makedirs(subdir, exist_ok=True) 22 | uuid_str = uuid.uuid4().hex 23 | tmp_new_p = new_p + "." 
+ uuid_str 24 | shutil.copyfile(p, tmp_new_p) 25 | shutil.move(tmp_new_p, new_p) 26 | return new_p 27 | 28 | 29 | 30 | def get_bbox_from_kp2d(kp_2d): 31 | # get bbox 32 | if len(kp_2d.shape) > 2: 33 | ul = np.array([kp_2d[:, :, 0].min(axis=1), kp_2d[:, :, 1].min(axis=1)]) # upper left 34 | lr = np.array([kp_2d[:, :, 0].max(axis=1), kp_2d[:, :, 1].max(axis=1)]) # lower right 35 | else: 36 | ul = np.array([kp_2d[:, 0].min(), kp_2d[:, 1].min()]) # upper left 37 | lr = np.array([kp_2d[:, 0].max(), kp_2d[:, 1].max()]) # lower right 38 | 39 | # ul[1] -= (lr[1] - ul[1]) * 0.10 # prevent cutting the head 40 | w = lr[0] - ul[0] 41 | h = lr[1] - ul[1] 42 | c_x, c_y = ul[0] + w / 2, ul[1] + h / 2 43 | # to keep the aspect ratio 44 | w = h = np.where(w / h > 1, w, h) 45 | w = h = h * 1.1 46 | 47 | bbox = np.array([c_x, c_y, w, h]) # shape = (4,N) 48 | return bbox 49 | 50 | def split_into_chunks(vid_names, seqlen, stride, pad=True): 51 | video_start_end_indices = [] 52 | 53 | video_names, group = np.unique(vid_names, return_index=True) 54 | perm = np.argsort(group) 55 | video_names, group = video_names[perm], group[perm] 56 | 57 | indices = np.split(np.arange(0, vid_names.shape[0]), group[1:]) 58 | 59 | for idx in range(len(video_names)): 60 | indexes = indices[idx] 61 | if pad: 62 | padlen = (seqlen - indexes.shape[0] % seqlen) % seqlen 63 | indexes = np.pad(indexes, ((0, padlen)), 'reflect') 64 | if indexes.shape[0] < seqlen: 65 | continue 66 | chunks = view_as_windows(indexes, (seqlen,), step=stride) 67 | chunks = chunks.tolist() 68 | #start_finish = chunks[:, (0, -1)].tolist() 69 | #video_start_end_indices += start_finish 70 | video_start_end_indices += chunks 71 | 72 | return video_start_end_indices 73 | 74 | def pad_image(img, h, w): 75 | img = img.copy() 76 | img_h, img_w, _ = img.shape 77 | pad_top = (h - img_h) // 2 78 | pad_bottom = h - img_h - pad_top 79 | pad_left = (w - img_w) // 2 80 | pad_right = w - img_w - pad_left 81 | 82 | img = np.pad(img, ((pad_top, pad_bottom),(pad_left, pad_right),(0, 0))) 83 | 84 | return img 85 | # 86 | # def read_img(path, convert='RGB', check_exist=False): 87 | # if check_exist and not osp.exists(path): 88 | # return None 89 | # try: 90 | # img = Image.open(path) 91 | # if convert: 92 | # img = img.convert(convert) 93 | # except: 94 | # raise IOError('File error: ', path) 95 | # return np.array(img) 96 | 97 | 98 | def read_img(path, convert='RGB', check_exist=False): 99 | 100 | if isinstance(path, bytes): 101 | path = path.decode() 102 | if 'mpi_inf_3dhp' in path and 'mpi_inf_3dhp_test_set' not in path: 103 | path = path[:-10] + 'frame_' + path[-10:] 104 | path = path.replace('v', 'V') 105 | 106 | if check_exist and not osp.exists(path): 107 | return None 108 | try: 109 | # img = Image.open(download(path)) 110 | img = Image.open(path) 111 | if convert: 112 | img = img.convert(convert) 113 | except: 114 | img_list = [ 115 | './data/mpi_inf_3dhp/mpi_inf_3dhp_test_set/TS2/imageSequence/img_005398.jpg', 116 | './data/mpi_inf_3dhp/mpi_inf_3dhp_test_set/TS2/imageSequence/img_003276.jpg' 117 | ] 118 | if path in img_list: 119 | img = Image.open('./data/mpi_inf_3dhp/mpi_inf_3dhp_test_set/TS2/imageSequence/img_003277.jpg') 120 | else: 121 | raise IOError('File error: ', path) 122 | return np.array(img) 123 | -------------------------------------------------------------------------------- /lib/data_utils/insta_utils_imgs.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | sys.path.append('.') 4 | 5 | 
import argparse 6 | import numpy as np 7 | import os.path as osp 8 | from multiprocessing import Process, Pool 9 | from glob import glob 10 | from tqdm import tqdm 11 | import tensorflow as tf 12 | from PIL import Image 13 | 14 | from lib.core.config import INSTA_DIR, INSTA_IMG_DIR 15 | 16 | 17 | def process_single_record(fname, outdir, split): 18 | sess = tf.Session() 19 | #print(fname) 20 | record_name = fname.split('/')[-1] 21 | for vid_idx, serialized_ex in enumerate(tf.python_io.tf_record_iterator(fname)): 22 | #print(vid_idx) 23 | os.makedirs(osp.join(outdir, split, record_name, str(vid_idx)), exist_ok=True) 24 | example = tf.train.Example() 25 | example.ParseFromString(serialized_ex) 26 | 27 | N = int(example.features.feature['meta/N'].int64_list.value[0]) 28 | 29 | images_data = example.features.feature[ 30 | 'image/encoded'].bytes_list.value 31 | 32 | 33 | for i in range(N): 34 | image = np.expand_dims(sess.run(tf.image.decode_jpeg(images_data[i], channels=3)), axis=0) 35 | #video.append(image) 36 | image = Image.fromarray(np.squeeze(image, axis=0)) 37 | image.save(osp.join(outdir, split, record_name, str(vid_idx), str(i)+".jpg")) 38 | 39 | 40 | if __name__ == '__main__': 41 | parser = argparse.ArgumentParser() 42 | parser.add_argument('--inp_dir', type=str, help='tfrecords file path', default=INSTA_DIR) 43 | parser.add_argument('--n', type=int, help='total num of workers') 44 | parser.add_argument('--i', type=int, help='current index of worker (from 0 to n-1)') 45 | parser.add_argument('--split', type=str, help='train or test') 46 | parser.add_argument('--out_dir', type=str, help='output images path', default=INSTA_IMG_DIR) 47 | args = parser.parse_args() 48 | 49 | fpaths = glob(f'{args.inp_dir}/{args.split}/*.tfrecord') 50 | fpaths = sorted(fpaths) 51 | 52 | total = len(fpaths) 53 | fpaths = fpaths[args.i*total//args.n : (args.i+1)*total//args.n] 54 | 55 | #print(fpaths) 56 | #print(len(fpaths)) 57 | 58 | os.makedirs(args.out_dir, exist_ok=True) 59 | 60 | for idx, fp in enumerate(fpaths): 61 | process_single_record(fp, args.out_dir, args.split) -------------------------------------------------------------------------------- /lib/data_utils/mpii3d_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script is borrowed from https://github.com/mkocabas/VIBE. 3 | Adhere to their license to use this script. 4 | 5 | We hacked it a little bit to make it happy in our framework. 
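Example invocation for building the MPI-INF-3DHP training database (the paths shown are
the argparse defaults taken from lib/core/config.py; adjust them to your own data layout):

    python lib/data_utils/mpii3d_utils.py --inp_dir data/mpi_inf_3dhp --out_dir data/database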
6 | """ 7 | 8 | import sys 9 | sys.path.append('.') 10 | import os 11 | import cv2 12 | import h5py 13 | import glob 14 | import json 15 | import joblib 16 | import argparse 17 | import numpy as np 18 | from tqdm import tqdm 19 | import os.path as osp 20 | import scipy.io as sio 21 | 22 | from lib.core.config import DB_DIR, MPII3D_DIR 23 | from lib.utils.utils import tqdm_enumerate 24 | from lib.data_utils.kp_utils import convert_kps 25 | from lib.data_utils.img_utils import get_bbox_from_kp2d 26 | 27 | 28 | def read_openpose(json_file, gt_part, dataset): 29 | # get only the arms/legs joints 30 | op_to_12 = [11, 10, 9, 12, 13, 14, 4, 3, 2, 5, 6, 7] 31 | # read the openpose detection 32 | json_data = json.load(open(json_file, 'r')) 33 | people = json_data['people'] 34 | if len(people) == 0: 35 | # no openpose detection 36 | keyp25 = np.zeros([25,3]) 37 | else: 38 | # size of person in pixels 39 | scale = max(max(gt_part[:,0])-min(gt_part[:,0]),max(gt_part[:,1])-min(gt_part[:,1])) 40 | # go through all people and find a match 41 | dist_conf = np.inf*np.ones(len(people)) 42 | for i, person in enumerate(people): 43 | # openpose keypoints 44 | op_keyp25 = np.reshape(person['pose_keypoints_2d'], [25,3]) 45 | op_keyp12 = op_keyp25[op_to_12, :2] 46 | op_conf12 = op_keyp25[op_to_12, 2:3] > 0 47 | # all the relevant joints should be detected 48 | if min(op_conf12) > 0: 49 | # weighted distance of keypoints 50 | dist_conf[i] = np.mean(np.sqrt(np.sum(op_conf12*(op_keyp12 - gt_part[:12, :2])**2, axis=1))) 51 | # closest match 52 | p_sel = np.argmin(dist_conf) 53 | # the exact threshold is not super important but these are the values we used 54 | if dataset == 'mpii': 55 | thresh = 30 56 | elif dataset == 'coco': 57 | thresh = 10 58 | else: 59 | thresh = 0 60 | # dataset-specific thresholding based on pixel size of person 61 | if min(dist_conf)/scale > 0.1 and min(dist_conf) < thresh: 62 | keyp25 = np.zeros([25,3]) 63 | else: 64 | keyp25 = np.reshape(people[p_sel]['pose_keypoints_2d'], [25,3]) 65 | return keyp25 66 | 67 | 68 | def read_calibration(calib_file, vid_list): 69 | Ks, Rs, Ts = [], [], [] 70 | file = open(calib_file, 'r') 71 | content = file.readlines() 72 | for vid_i in vid_list: 73 | K = np.array([float(s) for s in content[vid_i * 7 + 5][11:-2].split()]) 74 | K = np.reshape(K, (4, 4)) 75 | RT = np.array([float(s) for s in content[vid_i * 7 + 6][11:-2].split()]) 76 | RT = np.reshape(RT, (4, 4)) 77 | R = RT[:3, :3] 78 | T = RT[:3, 3] / 1000 79 | Ks.append(K) 80 | Rs.append(R) 81 | Ts.append(T) 82 | return Ks, Rs, Ts 83 | 84 | 85 | def read_data_train(dataset_path, user_list, seq_list, vid_list, debug=False): 86 | h, w = 2048, 2048 87 | dataset = { 88 | 'vid_name': [], 89 | 'frame_id': [], 90 | 'joints3D': [], 91 | 'joints2D': [], 92 | 'bbox': [], 93 | 'img_name': [], 94 | } 95 | 96 | for user_i in user_list: 97 | for seq_i in seq_list: 98 | seq_path = os.path.join(dataset_path, 99 | 'S' + str(user_i), 100 | 'Seq' + str(seq_i)) 101 | # mat file with annotations 102 | annot_file = os.path.join(seq_path, 'annot.mat') 103 | annot2 = sio.loadmat(annot_file)['annot2'] 104 | annot3 = sio.loadmat(annot_file)['annot3'] 105 | # calibration file and camera parameters 106 | for j, vid_i in enumerate(vid_list): 107 | # image folder 108 | imgs_path = os.path.join(seq_path, 109 | 'video_' + str(vid_i)) 110 | # print(annot2, annot3, imgs_path) 111 | # per frame 112 | 113 | 114 | if not os.path.isdir(imgs_path): 115 | continue 116 | pattern = os.path.join(imgs_path, '*.jpg') 117 | img_list = 
sorted(glob.glob(pattern)) 118 | vid_used_frames = [] 119 | vid_used_joints = [] 120 | vid_used_bbox = [] 121 | vid_segments = [] 122 | vid_uniq_id = "subj" + str(user_i) + '_seq' + str(seq_i) + "_vid" + str(vid_i) + "_seg0" 123 | for i, img_i in tqdm_enumerate(img_list, desc="sub{}_seq{}_vid{}".format(user_i, seq_i, vid_i)): 124 | 125 | # for each image we store the relevant annotations 126 | img_name = img_i.split('/')[-1] 127 | joints_2d_raw = np.reshape(annot2[vid_i][0][i], (1, 28, 2)) 128 | joints_2d_raw= np.append(joints_2d_raw, np.ones((1,28,1)), axis=2) 129 | joints_2d = convert_kps(joints_2d_raw, "mpii3d", "spin").reshape((-1,3)) 130 | 131 | joints_3d_raw = np.reshape(annot3[vid_i][0][i], (1, 28, 3)) / 1000 132 | joints_3d = convert_kps(joints_3d_raw, "mpii3d", "spin").reshape((-1,3)) 133 | 134 | bbox = get_bbox_from_kp2d(joints_2d[~np.all(joints_2d == 0, axis=1)]).reshape(4) 135 | 136 | joints_3d = joints_3d - joints_3d[39] # 4 is the root 137 | 138 | # check that all joints are visible 139 | x_in = np.logical_and(joints_2d[:, 0] < w, joints_2d[:, 0] >= 0) 140 | y_in = np.logical_and(joints_2d[:, 1] < h, joints_2d[:, 1] >= 0) 141 | ok_pts = np.logical_and(x_in, y_in) 142 | 143 | if np.sum(ok_pts) < joints_2d.shape[0]: 144 | vid_uniq_id = "_".join(vid_uniq_id.split("_")[:-1])+ "_seg" +\ 145 | str(int(dataset['vid_name'][-1].split("_")[-1][3:])+1) 146 | continue 147 | 148 | dataset['vid_name'].append(vid_uniq_id) 149 | dataset['frame_id'].append(img_name.split(".")[0]) 150 | dataset['img_name'].append(img_i) 151 | dataset['joints2D'].append(joints_2d) 152 | dataset['joints3D'].append(joints_3d) 153 | dataset['bbox'].append(bbox) 154 | vid_segments.append(vid_uniq_id) 155 | vid_used_frames.append(img_i) 156 | vid_used_joints.append(joints_2d) 157 | vid_used_bbox.append(bbox) 158 | 159 | vid_segments= np.array(vid_segments) 160 | ids = np.zeros((len(set(vid_segments))+1)) 161 | ids[-1] = len(vid_used_frames) + 1 162 | if (np.where(vid_segments[:-1] != vid_segments[1:])[0]).size != 0: 163 | ids[1:-1] = (np.where(vid_segments[:-1] != vid_segments[1:])[0]) + 1 164 | 165 | 166 | for k in dataset.keys(): 167 | dataset[k] = np.array(dataset[k]) 168 | 169 | valid = np.zeros([len(dataset['joints3D']), 49, 1]) 170 | valid[:, 25:39, :] = 1 171 | valid[:, (39, 41, 43), :] = 1 172 | dataset['joints3D'] = np.concatenate([dataset['joints3D'], valid], axis=-1) 173 | 174 | return dataset 175 | 176 | 177 | def read_test_data(dataset_path): 178 | 179 | dataset = { 180 | 'vid_name': [], 181 | 'frame_id': [], 182 | 'joints3D': [], 183 | 'joints2D': [], 184 | 'bbox': [], 185 | 'img_name': [], 186 | 'features': [], 187 | "valid_i": [] 188 | } 189 | 190 | user_list = range(1, 7) 191 | 192 | for user_i in user_list: 193 | print('Subject', user_i) 194 | seq_path = os.path.join(dataset_path, 195 | 'mpi_inf_3dhp_test_set', 196 | 'TS' + str(user_i)) 197 | # mat file with annotations 198 | annot_file = os.path.join(seq_path, 'annot_data.mat') 199 | mat_as_h5 = h5py.File(annot_file, 'r') 200 | annot2 = np.array(mat_as_h5['annot2']) 201 | annot3 = np.array(mat_as_h5['univ_annot3']) 202 | valid = np.array(mat_as_h5['valid_frame']) 203 | 204 | vid_used_frames = [] 205 | vid_used_joints = [] 206 | vid_used_bbox = [] 207 | vid_segments = [] 208 | vid_uniq_id = "subj" + str(user_i) + "_seg0" 209 | 210 | 211 | for frame_i, valid_i in tqdm(enumerate(valid)): 212 | 213 | img_i = os.path.join('mpi_inf_3dhp_test_set', 214 | 'TS' + str(user_i), 215 | 'imageSequence', 216 | 'img_' + str(frame_i + 1).zfill(6) + '.jpg') 217 | 
218 | joints_2d_raw = np.expand_dims(annot2[frame_i, 0, :, :], axis = 0) 219 | joints_2d_raw = np.append(joints_2d_raw, np.ones((1, 17, 1)), axis=2) 220 | 221 | 222 | joints_2d = convert_kps(joints_2d_raw, src="mpii3d_test", dst="spin").reshape((-1, 3)) 223 | 224 | joints_3d_raw = np.reshape(annot3[frame_i, 0, :, :], (1, 17, 3)) / 1000 225 | joints_3d = convert_kps(joints_3d_raw, "mpii3d_test", "spin").reshape((-1, 3)) 226 | joints_3d = joints_3d - joints_3d[39] # substract pelvis zero is the root for test 227 | 228 | bbox = get_bbox_from_kp2d(joints_2d[~np.all(joints_2d == 0, axis=1)]).reshape(4) 229 | 230 | 231 | # check that all joints are visible 232 | img_file = os.path.join(dataset_path, img_i) 233 | I = cv2.imread(img_file) 234 | h, w, _ = I.shape 235 | x_in = np.logical_and(joints_2d[:, 0] < w, joints_2d[:, 0] >= 0) 236 | y_in = np.logical_and(joints_2d[:, 1] < h, joints_2d[:, 1] >= 0) 237 | ok_pts = np.logical_and(x_in, y_in) 238 | 239 | if np.sum(ok_pts) < joints_2d.shape[0]: 240 | vid_uniq_id = "_".join(vid_uniq_id.split("_")[:-1]) + "_seg" + \ 241 | str(int(dataset['vid_name'][-1].split("_")[-1][3:]) + 1) 242 | continue 243 | 244 | print(joints_3d.shape) 245 | dataset['vid_name'].append(vid_uniq_id) 246 | dataset['frame_id'].append(img_file.split("/")[-1].split(".")[0]) 247 | dataset['img_name'].append(img_file) 248 | dataset['joints2D'].append(joints_2d) 249 | dataset['joints3D'].append(joints_3d) 250 | dataset['bbox'].append(bbox) 251 | dataset['valid_i'].append(valid_i) 252 | 253 | vid_segments.append(vid_uniq_id) 254 | vid_used_frames.append(img_file) 255 | vid_used_joints.append(joints_2d) 256 | vid_used_bbox.append(bbox) 257 | 258 | vid_segments = np.array(vid_segments) 259 | ids = np.zeros((len(set(vid_segments)) + 1)) 260 | ids[-1] = len(vid_used_frames) + 1 261 | if (np.where(vid_segments[:-1] != vid_segments[1:])[0]).size != 0: 262 | ids[1:-1] = (np.where(vid_segments[:-1] != vid_segments[1:])[0]) + 1 263 | 264 | for k in dataset.keys(): 265 | dataset[k] = np.array(dataset[k]) 266 | 267 | valid = np.zeros([len(dataset['joints3D']), 49, 1]) 268 | valid[:, 25:39, :] = 1 269 | valid[:, (39, 41, 43), :] = 1 270 | dataset['joints3D'] = np.concatenate([dataset['joints3D'], valid], axis=-1) 271 | 272 | return dataset 273 | 274 | if __name__ == '__main__': 275 | parser = argparse.ArgumentParser() 276 | parser.add_argument('--inp_dir', type=str, help='dataset directory', default=MPII3D_DIR) 277 | parser.add_argument('--out_dir', type=str, help='output directory', default=DB_DIR) 278 | parser.add_argument('--sub', nargs='+', type=int, default=[1,2,3,4,5,6,7,8]) 279 | parser.add_argument('--seq', nargs='+', type=int, default=[1,2]) 280 | parser.add_argument('--vid', nargs='+', type=int, default=[0,1,2,3,4,5,6,7,8]) 281 | args = parser.parse_args() 282 | 283 | print(args.sub) 284 | print(args.seq) 285 | print(args.vid) 286 | 287 | dataset = read_data_train(args.inp_dir, args.sub, args.seq, args.vid) 288 | joblib.dump(dataset, osp.join(args.out_dir, 'mpii3d_train_db.pt')) 289 | 290 | #dataset = read_test_data(args.inp_dir) 291 | #joblib.dump(dataset, osp.join(args.out_dir, 'mpii3d_val_db.pt')) 292 | 293 | 294 | 295 | -------------------------------------------------------------------------------- /lib/data_utils/penn_action_utils.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | This script is borrowed from https://github.com/mkocabas/VIBE. 4 | Adhere to their license to use this script. 
5 | 6 | We hacked it a little bit to make it happy in our framework. 7 | """ 8 | 9 | import sys 10 | sys.path.append('.') 11 | 12 | import glob 13 | import torch 14 | import joblib 15 | import argparse 16 | from tqdm import tqdm 17 | import os.path as osp 18 | from skimage import io 19 | from scipy.io import loadmat 20 | 21 | from lib.data_utils.kp_utils import * 22 | from lib.core.config import DB_DIR, PENNACTION_DIR 23 | from lib.data_utils.img_utils import get_bbox_from_kp2d 24 | 25 | 26 | def calc_kpt_bound(kp_2d): 27 | MAX_COORD = 10000 28 | x = kp_2d[:, 0] 29 | y = kp_2d[:, 1] 30 | z = kp_2d[:, 2] 31 | u = MAX_COORD 32 | d = -1 33 | l = MAX_COORD 34 | r = -1 35 | for idx, vis in enumerate(z): 36 | if vis == 0: # skip invisible joint 37 | continue 38 | u = min(u, y[idx]) 39 | d = max(d, y[idx]) 40 | l = min(l, x[idx]) 41 | r = max(r, x[idx]) 42 | return u, d, l, r 43 | 44 | 45 | def load_mat(path): 46 | mat = loadmat(path) 47 | del mat['pose'], mat['__header__'], mat['__globals__'], mat['__version__'], mat['train'], mat['action'] 48 | mat['nframes'] = mat['nframes'][0][0] 49 | 50 | return mat 51 | 52 | 53 | def read_data(folder): 54 | dataset = { 55 | 'img_name' : [], 56 | 'joints2D': [], 57 | 'bbox': [], 58 | 'vid_name': [], 59 | } 60 | 61 | file_names = sorted(glob.glob(folder + '/labels/'+'*.mat')) 62 | 63 | for fname in tqdm(file_names): 64 | vid_dict=load_mat(fname) 65 | imgs = sorted(glob.glob(folder + '/frames/'+ fname.strip().split('/')[-1].split('.')[0]+'/*.jpg')) 66 | kp_2d = np.zeros((vid_dict['nframes'], 13, 3)) 67 | perm_idxs = get_perm_idxs('pennaction', 'common') 68 | 69 | kp_2d[:, :, 0] = vid_dict['x'] 70 | kp_2d[:, :, 1] = vid_dict['y'] 71 | kp_2d[:, :, 2] = vid_dict['visibility'] 72 | kp_2d = kp_2d[:, perm_idxs, :] 73 | 74 | # fix inconsistency 75 | n_kp_2d = np.zeros((kp_2d.shape[0], 14, 3)) 76 | n_kp_2d[:, :12, :] = kp_2d[:, :-1, :] 77 | n_kp_2d[:, 13, :] = kp_2d[:, 12, :] 78 | kp_2d = n_kp_2d 79 | 80 | bbox = np.zeros((vid_dict['nframes'], 4)) 81 | 82 | for fr_id, fr in enumerate(kp_2d): 83 | u, d, l, r = calc_kpt_bound(fr) 84 | center = np.array([(l + r) * 0.5, (u + d) * 0.5], dtype=np.float32) 85 | c_x, c_y = center[0], center[1] 86 | w, h = r - l, d - u 87 | w = h = np.where(w / h > 1, w, h) 88 | 89 | bbox[fr_id,:] = np.array([c_x, c_y, w, h]) 90 | 91 | dataset['vid_name'].append(np.array([f'{fname}']* vid_dict['nframes'])) 92 | dataset['img_name'].append(np.array(imgs)) 93 | dataset['joints2D'].append(kp_2d) 94 | dataset['bbox'].append(bbox) 95 | 96 | for k in dataset.keys(): 97 | dataset[k] = np.array(dataset[k]) 98 | dataset[k] = np.concatenate(dataset[k]) 99 | 100 | dataset['joints2D'] = convert_kps(dataset['joints2D'], src='pennaction', dst='spin') 101 | 102 | return dataset 103 | 104 | 105 | if __name__ == '__main__': 106 | parser = argparse.ArgumentParser() 107 | parser.add_argument('--inp_dir', type=str, help='dataset directory', default=PENNACTION_DIR) 108 | parser.add_argument('--out_dir', type=str, help='output directory', default=DB_DIR) 109 | args = parser.parse_args() 110 | 111 | dataset = read_data(args.inp_dir) 112 | joblib.dump(dataset, osp.join(args.out_dir, 'pennaction_train_db.pt')) -------------------------------------------------------------------------------- /lib/data_utils/posetrack_utils.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | This script is borrowed from https://github.com/mkocabas/VIBE. 4 | Adhere to their license to use this script. 
5 | 6 | We hacked it a little bit to make it happy in our framework. 7 | """ 8 | 9 | import sys 10 | sys.path.append('.') 11 | 12 | import glob 13 | import joblib 14 | import argparse 15 | import numpy as np 16 | import json 17 | import os.path as osp 18 | 19 | from lib.core.config import DB_DIR, POSETRACK_DIR 20 | from lib.utils.utils import tqdm_enumerate 21 | from lib.data_utils.kp_utils import get_posetrack_original_kp_names, convert_kps 22 | 23 | def read_data(folder, set): 24 | dataset = { 25 | 'img_name' : [] , 26 | 'joints2D': [], 27 | 'bbox': [], 28 | 'vid_name': [], 29 | } 30 | 31 | file_names = glob.glob(osp.join(folder, 'posetrack_data/annotations/', f'{set}/*.json')) 32 | file_names = sorted(file_names) 33 | nn_corrupted = 0 34 | tot_frames = 0 35 | min_frame_number = 8 36 | 37 | for fid,fname in tqdm_enumerate(file_names): 38 | if fname == osp.join(folder, 'annotations/train/021133_mpii_train.json'): 39 | continue 40 | 41 | with open(fname, 'r') as entry: 42 | anns = json.load(entry) 43 | # num_frames = anns['images'][0]['nframes'] 44 | anns['images'] = [item for item in anns['images'] if item['is_labeled'] ] 45 | num_frames = len(anns['images']) 46 | frame2imgname = dict() 47 | for el in anns['images']: 48 | frame2imgname[el['frame_id']] = el['file_name'] 49 | 50 | num_people = -1 51 | for x in anns['annotations']: 52 | if num_people < x['track_id']: 53 | num_people = x['track_id'] 54 | num_people += 1 55 | posetrack_joints = get_posetrack_original_kp_names() 56 | idxs = [anns['categories'][0]['keypoints'].index(h) for h in posetrack_joints if h in anns['categories'][0]['keypoints']] 57 | for x in anns['annotations']: 58 | kps = np.array(x['keypoints']).reshape((17,3)) 59 | kps = kps[idxs,:] 60 | x['keypoints'] = list(kps.flatten()) 61 | 62 | tot_frames += num_people * num_frames 63 | for p_id in range(num_people): 64 | 65 | annot_pid = [(item['keypoints'], item['bbox'], item['image_id']) 66 | for item in anns['annotations'] 67 | if item['track_id'] == p_id and not(np.count_nonzero(item['keypoints']) == 0) ] 68 | 69 | if len(annot_pid) < min_frame_number: 70 | nn_corrupted += len(annot_pid) 71 | continue 72 | 73 | bbox = np.zeros((len(annot_pid),4)) 74 | # perm_idxs = get_perm_idxs('posetrack', 'common') 75 | kp_2d = np.zeros((len(annot_pid), len(annot_pid[0][0])//3 ,3)) 76 | img_paths = np.zeros((len(annot_pid))) 77 | 78 | for i, (key2djnts, bbox_p, image_id) in enumerate(annot_pid): 79 | 80 | if (bbox_p[2]==0 or bbox_p[3]==0) : 81 | nn_corrupted +=1 82 | continue 83 | 84 | img_paths[i] = image_id 85 | key2djnts[2::3] = len(key2djnts[2::3])*[1] 86 | 87 | kp_2d[i,:] = np.array(key2djnts).reshape(int(len(key2djnts)/3),3) # [perm_idxs, :] 88 | for kp_loc in kp_2d[i,:]: 89 | if kp_loc[0] == 0 and kp_loc[1] == 0: 90 | kp_loc[2] = 0 91 | 92 | 93 | x_tl = bbox_p[0] 94 | y_tl = bbox_p[1] 95 | w = bbox_p[2] 96 | h = bbox_p[3] 97 | bbox_p[0] = x_tl + w / 2 98 | bbox_p[1] = y_tl + h / 2 99 | # 100 | 101 | w = h = np.where(w / h > 1, w, h) 102 | w = h = h * 0.8 103 | bbox_p[2] = w 104 | bbox_p[3] = h 105 | bbox[i, :] = bbox_p 106 | 107 | img_paths = list(img_paths) 108 | img_paths = [osp.join(folder, frame2imgname[item]) if item != 0 else 0 for item in img_paths ] 109 | 110 | bbx_idxs = [] 111 | for bbx_id, bbx in enumerate(bbox): 112 | if np.count_nonzero(bbx) == 0: 113 | bbx_idxs += [bbx_id] 114 | 115 | kp_2d = np.delete(kp_2d, bbx_idxs, 0) 116 | img_paths = np.delete(np.array(img_paths), bbx_idxs, 0) 117 | bbox = np.delete(bbox, np.where(~bbox.any(axis=1))[0], axis=0) 118 | 119 | # 
Convert to common 2d keypoint format 120 | if bbox.size == 0 or bbox.shape[0] < min_frame_number: 121 | nn_corrupted += 1 122 | continue 123 | 124 | kp_2d = convert_kps(kp_2d, src='posetrack', dst='spin') 125 | 126 | dataset['vid_name'].append(np.array([f'{fname}_{p_id}']*img_paths.shape[0])) 127 | dataset['img_name'].append(np.array(img_paths)) 128 | dataset['joints2D'].append(kp_2d) 129 | dataset['bbox'].append(np.array(bbox)) 130 | 131 | 132 | assert kp_2d.shape[0] == img_paths.shape[0] == bbox.shape[0] 133 | 134 | 135 | 136 | print(nn_corrupted, tot_frames) 137 | for k in dataset.keys(): 138 | dataset[k] = np.array(dataset[k]) 139 | 140 | for k in dataset.keys(): 141 | dataset[k] = np.concatenate(dataset[k]) 142 | 143 | for k,v in dataset.items(): 144 | print(k, v.shape) 145 | 146 | return dataset 147 | 148 | 149 | if __name__ == '__main__': 150 | parser = argparse.ArgumentParser() 151 | parser.add_argument('--inp_dir', type=str, help='dataset directory', default=POSETRACK_DIR) 152 | parser.add_argument('--out_dir', type=str, help='output directory', default=DB_DIR) 153 | args = parser.parse_args() 154 | 155 | dataset_train = read_data(args.inp_dir, 'train') 156 | joblib.dump(dataset_train, osp.join(args.out_dir, 'posetrack_train_db.pt')) 157 | -------------------------------------------------------------------------------- /lib/data_utils/threedpw_utils.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | This script is borrowed from https://github.com/mkocabas/VIBE. 4 | Adhere to their license to use this script. 5 | 6 | We hacked it a little bit to make it happy in our framework. 7 | """ 8 | 9 | import sys 10 | sys.path.append('.') 11 | 12 | import os 13 | import cv2 14 | import torch 15 | import joblib 16 | import argparse 17 | import numpy as np 18 | import pickle as pkl 19 | import os.path as osp 20 | from tqdm import tqdm 21 | 22 | from lib.data_utils.kp_utils import * 23 | from lib.core.config import DB_DIR, DATA_DIR, THREEDPW_DIR 24 | from lib.utils.smooth_bbox import get_smooth_bbox_params 25 | from lib.models.smpl import SMPL, SMPL_MODEL_DIR, H36M_TO_J14 26 | from lib.utils.geometry import batch_rodrigues, rotation_matrix_to_angle_axis 27 | from lib.data_utils.kp_utils import convert_kps 28 | 29 | NUM_JOINTS = 24 30 | VIS_THRESH = 0.3 31 | MIN_KP = 6 32 | 33 | def read_data(folder, set, debug=False): 34 | 35 | dataset = { 36 | 'vid_name': [], 37 | 'frame_id': [], 38 | 'joints3D': [], 39 | 'joints2D': [], 40 | 'shape': [], 41 | 'pose': [], 42 | 'bbox': [], 43 | 'img_name': [], 44 | 'valid': [], 45 | } 46 | 47 | sequences = [x.split('.')[0] for x in os.listdir(osp.join(folder, 'sequenceFiles', set))] 48 | 49 | J_regressor = None 50 | 51 | smpl = SMPL(SMPL_MODEL_DIR, batch_size=1, create_transl=False) 52 | if set == 'test' or set == 'validation': 53 | J_regressor = torch.from_numpy(np.load(osp.join(DATA_DIR, 'J_regressor_h36m.npy'))).float() 54 | 55 | for i, seq in tqdm(enumerate(sequences)): 56 | 57 | data_file = osp.join(folder, 'sequenceFiles', set, seq + '.pkl') 58 | 59 | data = pkl.load(open(data_file, 'rb'), encoding='latin1') 60 | 61 | img_dir = osp.join(folder, 'imageFiles', seq) 62 | 63 | num_people = len(data['poses']) 64 | num_frames = len(data['img_frame_ids']) 65 | assert (data['poses2d'][0].shape[0] == num_frames) 66 | 67 | for p_id in range(num_people): 68 | pose = torch.from_numpy(data['poses'][p_id]).float() 69 | shape = 
torch.from_numpy(data['betas'][p_id][:10]).float().repeat(pose.size(0), 1) 70 | trans = torch.from_numpy(data['trans'][p_id]).float() 71 | j2d = data['poses2d'][p_id].transpose(0,2,1) 72 | cam_pose = data['cam_poses'] 73 | campose_valid = data['campose_valid'][p_id] 74 | 75 | # ======== Align the mesh params ======== # 76 | rot = pose[:, :3] 77 | rot_mat = batch_rodrigues(rot) 78 | 79 | Rc = torch.from_numpy(cam_pose[:, :3, :3]).float() 80 | Rs = torch.bmm(Rc, rot_mat.reshape(-1, 3, 3)) 81 | rot = rotation_matrix_to_angle_axis(Rs) 82 | pose[:, :3] = rot 83 | # ======== Align the mesh params ======== # 84 | 85 | output = smpl(betas=shape, body_pose=pose[:,3:], global_orient=pose[:,:3], transl=trans) 86 | # verts = output.vertices 87 | j3d = output.joints 88 | 89 | if J_regressor is not None: 90 | vertices = output.vertices 91 | J_regressor_batch = J_regressor[None, :].expand(vertices.shape[0], -1, -1).to(vertices.device) 92 | j3d = torch.matmul(J_regressor_batch, vertices) 93 | j3d = j3d[:, H36M_TO_J14, :] 94 | 95 | img_paths = [] 96 | for i_frame in range(num_frames): 97 | img_path = os.path.join(img_dir + '/image_{:05d}.jpg'.format(i_frame)) 98 | img_paths.append(img_path) 99 | 100 | bbox_params, time_pt1, time_pt2 = get_smooth_bbox_params(j2d, vis_thresh=VIS_THRESH, sigma=8) 101 | 102 | # process bbox_params 103 | c_x = bbox_params[:,0] 104 | c_y = bbox_params[:,1] 105 | scale = bbox_params[:,2] 106 | w = h = 150. / scale 107 | w = h = h * 1.1 108 | bbox = np.vstack([c_x,c_y,w,h]).T 109 | 110 | # process keypoints 111 | j2d[:, :, 2] = j2d[:, :, 2] > 0.3 # set the visibility flags 112 | # Convert to common 2d keypoint format 113 | perm_idxs = get_perm_idxs('3dpw', 'common') 114 | perm_idxs += [0, 0] # no neck, top head 115 | j2d = j2d[:, perm_idxs] 116 | j2d[:, 12:, 2] = 0.0 117 | 118 | # print('j2d', j2d[time_pt1:time_pt2].shape) 119 | # print('campose', campose_valid[time_pt1:time_pt2].shape) 120 | 121 | img_paths_array = np.array(img_paths)[time_pt1:time_pt2] 122 | dataset['vid_name'].append(np.array([f'{seq}_{p_id}']*num_frames)[time_pt1:time_pt2]) 123 | dataset['frame_id'].append(np.arange(0, num_frames)[time_pt1:time_pt2]) 124 | dataset['img_name'].append(img_paths_array) 125 | dataset['joints3D'].append(j3d.numpy()[time_pt1:time_pt2]) 126 | dataset['joints2D'].append(j2d[time_pt1:time_pt2]) 127 | dataset['shape'].append(shape.numpy()[time_pt1:time_pt2]) 128 | dataset['pose'].append(pose.numpy()[time_pt1:time_pt2]) 129 | dataset['bbox'].append(bbox) 130 | dataset['valid'].append(campose_valid[time_pt1:time_pt2]) 131 | 132 | for k in dataset.keys(): 133 | dataset[k] = np.concatenate(dataset[k]) 134 | print(k, dataset[k].shape) 135 | 136 | # Filter out keypoints 137 | indices_to_use = np.where((dataset['joints2D'][:, :, 2] > VIS_THRESH).sum(-1) > MIN_KP)[0] 138 | for k in dataset.keys(): 139 | dataset[k] = dataset[k][indices_to_use] 140 | 141 | dataset['joints2D'] = convert_kps(dataset['joints2D'], src='common', dst='spin') 142 | valid = np.zeros([len(dataset['joints3D']), 49, 1]) 143 | valid[:, 25:39, :] = 1 144 | if set != 'train': 145 | dataset['joints3D'] = convert_kps(dataset['joints3D'], src='common', dst='spin') 146 | dataset['joints3D'] = np.concatenate([dataset['joints3D'], valid], axis=-1) 147 | 148 | return dataset 149 | 150 | 151 | if __name__ == '__main__': 152 | parser = argparse.ArgumentParser() 153 | parser.add_argument('--inp_dir', type=str, help='dataset directory', default=THREEDPW_DIR) 154 | parser.add_argument('--out_dir', type=str, help='output directory', 
default=DB_DIR) 155 | args = parser.parse_args() 156 | 157 | debug = False 158 | 159 | dataset = read_data(args.inp_dir, 'validation', debug=debug) 160 | joblib.dump(dataset, osp.join(args.out_dir, '3dpw_val_db.pt')) 161 | 162 | dataset = read_data(args.inp_dir, 'train', debug=debug) 163 | joblib.dump(dataset, osp.join(args.out_dir, '3dpw_train_db.pt')) 164 | 165 | dataset = read_data(args.inp_dir, 'test', debug=debug) 166 | joblib.dump(dataset, osp.join(args.out_dir, '3dpw_test_db.pt')) 167 | -------------------------------------------------------------------------------- /lib/data_utils/transforms/__init__.py: -------------------------------------------------------------------------------- 1 | from .crop import * 2 | from .color_jitter import * 3 | from .basic import * 4 | from .random_erase import * 5 | from .random_hflip import * -------------------------------------------------------------------------------- /lib/data_utils/transforms/basic.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torchvision.transforms.functional as F 3 | import numpy as np 4 | 5 | from PIL import Image 6 | 7 | class _NormalizeBase(object): 8 | def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], patch_size=224, inplace=False): 9 | self.mean = mean 10 | self.std = std 11 | self.inplace = inplace 12 | self.patch_size = patch_size 13 | 14 | def normalize_2d_kp(self, kp_2d): 15 | # Normalize keypoints between -1, 1 16 | ratio = 1.0 / self.patch_size 17 | kp_2d = 2.0 * kp_2d * ratio - 1.0 18 | 19 | return kp_2d 20 | 21 | def __call__(self, instance): 22 | raise NotImplementedError() 23 | 24 | class NormalizeVideo(_NormalizeBase): 25 | def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], patch_size=224, inplace=False): 26 | super(NormalizeVideo, self).__init__(mean, std, patch_size, inplace) 27 | 28 | def __call__(self, instance): 29 | clip = instance['clip'] 30 | new_clip = [] 31 | for c in clip: 32 | new_clip.append(F.normalize(c, self.mean, self.std, self.inplace)) 33 | new_clip = torch.stack(new_clip, dim=0) 34 | 35 | ret = {k: v for k, v in instance.items() if k not in ['clip', 'kp_2d', 'kp_2d_full']} 36 | ret.update({'clip': new_clip}) 37 | 38 | if 'kp_2d' in instance: 39 | kp = instance['kp_2d'] 40 | new_kp = kp 41 | new_kp[:,:, :2] = self.normalize_2d_kp(kp[:,:, :2]) 42 | ret.update({'kp_2d':new_kp}) 43 | 44 | if 'kp_2d_full' in instance: 45 | kp = instance['kp_2d_full'] 46 | new_kp = kp 47 | new_kp[:,:, :2] = self.normalize_2d_kp(kp[:,:, :2]) 48 | ret.update({'kp_2d_full':new_kp}) 49 | 50 | return ret 51 | 52 | class NormalizeImage(_NormalizeBase): 53 | def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], patch_size=224, inplace=False): 54 | super(NormalizeImage, self).__init__(mean, std, patch_size, inplace) 55 | 56 | def __call__(self, instance): 57 | image = instance['image'] 58 | new_image = F.normalize(image, self.mean, self.std, self.inplace) 59 | 60 | if 'kp_2d' in instance: 61 | kp = instance['kp_2d'] 62 | new_kp = kp 63 | new_kp[:, :2] = self.normalize_2d_kp(kp[:, :2]) 64 | 65 | ret = {k: v for k, v in instance.items() if k not in ['image', 'kp_2d']} 66 | 67 | if 'kp_2d' in instance: 68 | ret.update({'kp_2d':new_kp}) 69 | 70 | ret.update({'image': new_image}) 71 | 72 | return ret 73 | 74 | class StackFrames(object): 75 | """Stack a list of PIL Images or numpy arrays along a new dimension. 76 | 77 | Args: 78 | roll (float): whether to convert BGR to RGB. 
Default value is False 79 | """ 80 | def __init__(self, roll=False): 81 | self.roll = roll 82 | 83 | def __call__(self, instance): 84 | clip = instance['clip'] 85 | if self.roll: 86 | stacked_clip = np.stack([np.array(x)[:, :, ::-1] for x in clip], axis=0) 87 | else: 88 | stacked_clip = np.stack([np.array(x) for x in clip], axis=0) 89 | 90 | ret = {k:v for k, v in instance.items() if k!='clip'} 91 | ret.update({'clip': stacked_clip}) 92 | return ret 93 | 94 | class ToTensorVideo(object): 95 | """ Converts a sequence of PIL.Image (RGB) or numpy.ndarray (T x H x W x C) in the range [0, 255] 96 | to a torch.FloatTensor of shape (T x C x H x W) in the range [0.0, 1.0] """ 97 | def __call__(self, instance): 98 | clip = instance['clip'] 99 | new_clip = [] 100 | for img in clip: 101 | img = F.to_tensor(img) 102 | new_clip.append(img) 103 | clip = torch.stack(new_clip, dim=0) 104 | 105 | ret = {k: torch.from_numpy(v) for k, v in instance.items() if k!='clip'} 106 | ret.update({'clip': clip}) 107 | return ret 108 | 109 | class ToTensorImage(object): 110 | """ Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] 111 | to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] """ 112 | def __call__(self, instance): 113 | image = instance['image'] 114 | image = F.to_tensor(image) 115 | ret = {k: torch.from_numpy(v) for k, v in instance.items() if k!='image'} 116 | ret.update({'image': image}) 117 | return ret -------------------------------------------------------------------------------- /lib/data_utils/transforms/color_jitter.py: -------------------------------------------------------------------------------- 1 | import random 2 | import torchvision.transforms.functional as F 3 | import numpy as np 4 | 5 | from PIL import Image 6 | 7 | 8 | class _ColorJitter(object): 9 | def __init__(self, brightness=0, contrast=0, saturation=0, hue=0): 10 | self.brightness = brightness 11 | self.contrast = contrast 12 | self.saturation = saturation 13 | self.hue = hue 14 | 15 | def get_params(self, brightness, contrast, saturation, hue): 16 | if brightness > 0: 17 | brightness_factor = random.uniform( 18 | max(0, 1 - brightness), 1 + brightness) 19 | else: 20 | brightness_factor = None 21 | 22 | if contrast > 0: 23 | contrast_factor = random.uniform( 24 | max(0, 1 - contrast), 1 + contrast) 25 | else: 26 | contrast_factor = None 27 | 28 | if saturation > 0: 29 | saturation_factor = random.uniform( 30 | max(0, 1 - saturation), 1 + saturation) 31 | else: 32 | saturation_factor = None 33 | 34 | if hue > 0: 35 | hue_factor = random.uniform(-hue, hue) 36 | else: 37 | hue_factor = None 38 | return brightness_factor, contrast_factor, saturation_factor, hue_factor 39 | 40 | class ColorJitterVideo(_ColorJitter): 41 | """Randomly change the brightness, contrast and saturation and hue of the clip 42 | Args: 43 | brightness (float): How much to jitter brightness. brightness_factor 44 | is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]. 45 | contrast (float): How much to jitter contrast. contrast_factor 46 | is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]. 47 | saturation (float): How much to jitter saturation. saturation_factor 48 | is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]. 49 | hue(float): How much to jitter hue. hue_factor is chosen uniformly from 50 | [-hue, hue]. Should be >=0 and <= 0.5. 
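A hypothetical usage sketch for `ColorJitterVideo`: the four factors are sampled once per call, so every frame of the clip receives the same photometric change. The jitter strengths and the dummy 4-frame clip are assumptions, and the repo root is assumed to be on `PYTHONPATH`.

```
from PIL import Image
from lib.data_utils.transforms.color_jitter import ColorJitterVideo

jitter = ColorJitterVideo(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1)
instance = {'clip': [Image.new('RGB', (224, 224), 'gray') for _ in range(4)]}

out = jitter(instance)        # factors drawn once, applied to all four frames in random order
assert len(out['clip']) == 4  # keys other than 'clip' are passed through untouched
```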
51 | """ 52 | 53 | def __init__(self, brightness=0, contrast=0, saturation=0, hue=0): 54 | super(ColorJitterVideo, self).__init__(brightness, contrast, saturation, hue) 55 | 56 | def __call__(self, instance): 57 | """ 58 | Args: 59 | instance (dict): must contain key 'clip', and instance['clip'] is a list of PIL Image or numpy array 60 | """ 61 | if isinstance(instance['clip'][0], Image.Image): 62 | clip = instance['clip'] 63 | elif isinstance(instance['clip'][0], np.ndarray): 64 | clip = [Image.fromarray(c) for c in instance['clip']] 65 | else: 66 | clip = instance['clip'][0] 67 | raise TypeError( 68 | f'Color jitter not yet implemented for {type(clip)}') 69 | 70 | brightness, contrast, saturation, hue = self.get_params( 71 | self.brightness, self.contrast, self.saturation, self.hue) 72 | # Create img transform function sequence 73 | img_transforms = [] 74 | if brightness is not None: 75 | img_transforms.append(lambda img: F.adjust_brightness(img, brightness)) 76 | if saturation is not None: 77 | img_transforms.append(lambda img: F.adjust_saturation(img, saturation)) 78 | if hue is not None: 79 | img_transforms.append(lambda img: F.adjust_hue(img, hue)) 80 | if contrast is not None: 81 | img_transforms.append(lambda img: F.adjust_contrast(img, contrast)) 82 | random.shuffle(img_transforms) 83 | # Apply to all images 84 | jittered_clip = [] 85 | for img in clip: 86 | jittered_img = img.copy() 87 | for func in img_transforms: 88 | jittered_img = func(jittered_img) 89 | jittered_clip.append(jittered_img) 90 | 91 | 92 | ret = {k:v for k, v in instance.items() if k!='clip'} 93 | ret.update({'clip': jittered_clip}) 94 | 95 | return ret 96 | 97 | class ColorJitterImage(_ColorJitter): 98 | """Randomly change the brightness, contrast and saturation and hue of the images 99 | Args: 100 | brightness (float): How much to jitter brightness. brightness_factor 101 | is chosen uniformly from [max(0, 1 - brightness), 1 + brightness]. 102 | contrast (float): How much to jitter contrast. contrast_factor 103 | is chosen uniformly from [max(0, 1 - contrast), 1 + contrast]. 104 | saturation (float): How much to jitter saturation. saturation_factor 105 | is chosen uniformly from [max(0, 1 - saturation), 1 + saturation]. 106 | hue(float): How much to jitter hue. hue_factor is chosen uniformly from 107 | [-hue, hue]. Should be >=0 and <= 0.5. 108 | """ 109 | 110 | def __init__(self, brightness=0, contrast=0, saturation=0, hue=0): 111 | super(ColorJitterImage, self).__init__(brightness, contrast, saturation, hue) 112 | 113 | def __call__(self, instance): 114 | """ 115 | Args: 116 | instance (dict): must contain key 'image'. 117 | instance['image'] PIL Image or numpy arrays. 
118 | """ 119 | if isinstance(instance['image'], Image.Image): 120 | image = instance['image'] 121 | elif isinstance(instance['image'], np.ndarray): 122 | image = Image.fromarray(instance['image']) 123 | else: 124 | image = instance['image'] 125 | raise TypeError( 126 | f'Random Erase not yet implemented for {type(image)}') 127 | 128 | brightness, contrast, saturation, hue = self.get_params( 129 | self.brightness, self.contrast, self.saturation, self.hue) 130 | # Create img transform function sequence 131 | img_transforms = [] 132 | if brightness is not None: 133 | img_transforms.append(lambda img: F.adjust_brightness(img, brightness)) 134 | if saturation is not None: 135 | img_transforms.append(lambda img: F.adjust_saturation(img, saturation)) 136 | if hue is not None: 137 | img_transforms.append(lambda img: F.adjust_hue(img, hue)) 138 | if contrast is not None: 139 | img_transforms.append(lambda img: F.adjust_contrast(img, contrast)) 140 | random.shuffle(img_transforms) 141 | 142 | # Apply to images 143 | jittered_img = image.copy() 144 | for func in img_transforms: 145 | jittered_img = func(jittered_img) 146 | 147 | ret = {k:v for k, v in instance.items() if k!='image'} 148 | ret.update({'image': jittered_img}) 149 | 150 | return ret -------------------------------------------------------------------------------- /lib/data_utils/transforms/crop.py: -------------------------------------------------------------------------------- 1 | import os 2 | import cv2 3 | import random 4 | 5 | import numpy as np 6 | import os.path as osp 7 | import torchvision.transforms as transforms 8 | import torchvision.transforms.functional as F 9 | 10 | from PIL import Image 11 | 12 | class _CropBase(object): 13 | """Crop PIL Image or numpy array according to specified bounding box. 14 | Also affine keypoints coordinates to make sure keypoints are on the right position of the croped clip. 15 | 16 | In order to make the augmentation process simple and efficient, some image-level augmention are done here 17 | in a coupling manner, such as random rotation and random scale jitter. 18 | 19 | Args: 20 | patch_height (float or int): cropped clip height. Default value is 224. 21 | patch_width (float or int): cropped clip width. Default value is 224. 22 | rot_jitter (float): how much to randomly rotate clip and keypoints. rotation angle 23 | is chosen uniformly from [-rot_jitter, rot_jitter]. 24 | size_jitter (float): how much to randomly rescale clip and keypoints. scale factor 25 | is chosen uniformly from [1.3 - size_jitter, 1.3 + size_jitter]. 
26 | random_crop_p (float): how much probability to apply random crop gen_augmentation 27 | random_crop_size (float): max ratio of height and width to be cropped 28 | """ 29 | def __init__(self, patch_height=224, patch_width=224, rot_jitter=0., 30 | size_jitter=0., random_crop_p=0., random_crop_size=0.5, default_bbox_scale=1.3): 31 | self.patch_width = patch_width 32 | self.patch_height = patch_height 33 | self.size_jitter = size_jitter 34 | self.rot_jitter = rot_jitter 35 | self.random_crop_p = random_crop_p 36 | self.random_crop_size = random_crop_size 37 | self.s = default_bbox_scale 38 | 39 | def gen_augmentation(self): 40 | scale = random.uniform(self.s - self.size_jitter, self.s + self.size_jitter) 41 | rot = random.uniform(-self.rot_jitter, self.rot_jitter) 42 | if np.random.rand() < self.random_crop_p: 43 | scale = np.random.uniform(self.s - self.random_crop_size, self.s) 44 | shift_w = np.random.uniform(-(self.s-scale)/2.0, (self.s-scale)/2.0) 45 | shift_h = np.random.uniform(-(self.s-scale)/2.0, (self.s-scale)/2.0) 46 | return (scale, scale), rot, (shift_w, shift_h) 47 | else: 48 | return (scale, scale), rot, (0, 0) 49 | 50 | def rotate_2d(self, pt_2d, rot_rad): 51 | x = pt_2d[0] 52 | y = pt_2d[1] 53 | sn, cs = np.sin(rot_rad), np.cos(rot_rad) 54 | xx = x * cs - y * sn 55 | yy = x * sn + y * cs 56 | return np.array([xx, yy], dtype=np.float32) 57 | 58 | def gen_trans(self, bbox, scale, rot, shift): 59 | # augment size with scale 60 | src_w = bbox[2] * scale[0] 61 | src_h = bbox[3] * scale[1] 62 | src_center = bbox[:2] + bbox[2:] * shift 63 | 64 | # augment rotation 65 | rot_rad = np.pi * rot / 180 66 | src_downdir = self.rotate_2d(np.array([0, src_h * 0.5], dtype=np.float32), rot_rad) 67 | src_rightdir = self.rotate_2d(np.array([src_w * 0.5, 0], dtype=np.float32), rot_rad) 68 | 69 | dst_w = self.patch_width 70 | dst_h = self.patch_height 71 | dst_center = np.array([dst_w * 0.5, dst_h * 0.5], dtype=np.float32) 72 | dst_downdir = np.array([0, dst_h * 0.5], dtype=np.float32) 73 | dst_rightdir = np.array([dst_w * 0.5, 0], dtype=np.float32) 74 | 75 | src = np.zeros((3, 2), dtype=np.float32) 76 | src[0, :] = src_center 77 | src[1, :] = src_center + src_downdir 78 | src[2, :] = src_center + src_rightdir 79 | 80 | dst = np.zeros((3, 2), dtype=np.float32) 81 | dst[0, :] = dst_center 82 | dst[1, :] = dst_center + dst_downdir 83 | dst[2, :] = dst_center + dst_rightdir 84 | 85 | trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) 86 | 87 | return trans 88 | 89 | def trans_image(self, image, trans): 90 | affined = cv2.warpAffine(image.copy(), trans, (int(self.patch_width), int(self.patch_height)), 91 | flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT) 92 | affined = Image.fromarray(affined) 93 | return affined 94 | 95 | def trans_keypoints(self, kp_2d, trans): 96 | if len(kp_2d.shape) == 1: 97 | # a single keypoint 98 | src_pt = np.array([kp_2d[0], kp_2d[1], 1.]).T 99 | dst_pt = np.dot(trans, src_pt) 100 | return np.concatenate([dst_pt[0:2], kp_2d[-1:]], axis=0) 101 | else: 102 | # list of keypoints 103 | new_kp = np.zeros_like(kp_2d) 104 | for i, kp in enumerate(kp_2d): 105 | new_kp[i] = self.trans_keypoints(kp, trans) 106 | return new_kp 107 | 108 | def __call__(self, instance): 109 | raise NotImplementedError() 110 | 111 | 112 | 113 | class CropImage(_CropBase): 114 | """Crop PIL Image or numpy array and keypoints according to specified bounding box. 
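The heart of `_CropBase` is `gen_trans`: three corresponding points (the bbox center plus a "down" and a "right" offset, optionally rotated and scale-jittered) define an affine map onto the output patch, and the same 2x3 matrix warps the image via `cv2.warpAffine` and moves the keypoints via `trans_keypoints`. A standalone toy check of that recipe (not part of the repo; no rotation or jitter, made-up bbox):

```
import cv2
import numpy as np

bbox = np.array([320., 240., 100., 120.], dtype=np.float32)  # center x, center y, w, h
patch_w = patch_h = 224
scale = 1.3                                                   # default bbox scale

src_w, src_h = bbox[2] * scale, bbox[3] * scale
src_center = bbox[:2]
src = np.float32([src_center,
                  src_center + [0, src_h * 0.5],              # "down" point
                  src_center + [src_w * 0.5, 0]])             # "right" point
dst = np.float32([[patch_w * 0.5, patch_h * 0.5],
                  [patch_w * 0.5, patch_h],
                  [patch_w,       patch_h * 0.5]])

trans = cv2.getAffineTransform(src, dst)                      # 2x3 affine matrix

kp = np.array([320., 240., 1.])                               # (x, y, conf) at the bbox center
x, y = trans @ np.array([kp[0], kp[1], 1.])
print(x, y)                                                   # the bbox center lands at the patch center (112, 112)
```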
115 | """ 116 | def __init__(self, patch_height=224, patch_width=224, rot_jitter=0., 117 | size_jitter=0., random_crop_p=0., random_crop_size=0.5, default_bbox_scale=1.3): 118 | super(CropImage, self).__init__(patch_height, patch_width, rot_jitter, 119 | size_jitter, random_crop_p, random_crop_size, default_bbox_scale) 120 | 121 | def __call__(self, instance): 122 | if 'bbox' not in instance.keys(): 123 | # do nothing if bbox is not specified 124 | return instance 125 | 126 | image, bbox = instance['image'], instance['bbox'] 127 | kp_2d = instance['kp_2d'] if 'kp_2d' in instance else None 128 | 129 | scale, rot, shift = self.gen_augmentation() 130 | trans = self.gen_trans(bbox, scale, rot, shift) 131 | image = self.trans_image(image, trans) 132 | if kp_2d is not None: 133 | kp_2d = self.trans_keypoints(kp_2d, trans) 134 | 135 | ret = {k: v for k, v in instance.items() if k not in ['image', 'kp_2d']} 136 | ret.update({'image': image}) 137 | if kp_2d is not None: 138 | ret.update({'kp_2d': kp_2d}) 139 | return ret 140 | 141 | 142 | 143 | class CropVideo(_CropBase): 144 | """Crop a sequence of PIL Image or numpy array and keypoints according to specified bounding box. 145 | """ 146 | def __init__(self, patch_height=224, patch_width=224, rot_jitter=0., 147 | size_jitter=0., random_crop_p=0., random_crop_size=0.5, default_bbox_scale=1.3): 148 | super(CropVideo, self).__init__(patch_height, patch_width, rot_jitter, 149 | size_jitter, random_crop_p, random_crop_size, default_bbox_scale) 150 | 151 | def __call__(self, instance): 152 | if 'bbox' not in instance.keys(): 153 | # do nothing if bbox is not specified 154 | return instance 155 | 156 | clip, bboxs = instance['clip'], instance['bbox'] 157 | 158 | kp_2d = instance['kp_2d'] if 'kp_2d' in instance else [None] * len(clip) 159 | 160 | scale, rot, shift = self.gen_augmentation() 161 | 162 | clip_croped = [] 163 | keypoints_affine = [] 164 | for frame, bbox, keypoint in zip(clip, bboxs, kp_2d): 165 | trans = self.gen_trans(bbox, scale, rot, shift) 166 | clip_croped.append(self.trans_image(frame, trans)) 167 | if keypoint is not None: 168 | keypoints_affine.append(self.trans_keypoints(keypoint, trans)) 169 | 170 | if len(keypoints_affine) > 0: 171 | keypoints_affine = np.stack(keypoints_affine, axis=0) 172 | 173 | ret = {k: v for k, v in instance.items() if k not in ['clip', 'kp_2d']} 174 | ret.update({'clip': clip_croped}) 175 | if len(keypoints_affine) > 0: 176 | ret.update({'kp_2d': keypoints_affine}) 177 | return ret 178 | -------------------------------------------------------------------------------- /lib/data_utils/transforms/random_erase.py: -------------------------------------------------------------------------------- 1 | import random 2 | import numpy as np 3 | 4 | from PIL import Image 5 | 6 | class _RandomEraseBase(object): 7 | """Randomly erase the lower part of the clip 8 | Args: 9 | prob (float): The probability to apply random erase 10 | max_erase_part (float): The maximum ratio of the erase part 11 | random_filling (bool): if True, fill the erased part with random pixel, otherwise with zero pixel. 12 | erase_kp (bool): if True, mask out the keypoints in the erased part. 13 | margin: (float): if is set to True, the keypoints in the margin between erased part and unerased part will not be masked out. 
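A minimal sketch (not part of the repo) of the erase-and-mask idea described above, mirroring `_erase_bottom`: fill a strip of the image with random pixels and zero the confidence of any 2D keypoint that lies deeper than `margin` inside the erased band. The image size, keypoints and ratios are made up, and the visibility flag is written per keypoint.

```
import numpy as np

h, w = 224, 224
img = np.zeros((h, w, 3), dtype=np.uint8)
kp_2d = np.array([[100.,  60., 1.],    # well above the erased band -> stays visible
                  [100., 210., 1.]])   # inside the erased band      -> masked out
erased_ratio, margin = 0.3, 0.1

erased_h = int(h * erased_ratio)
img[-erased_h:] = np.random.randint(256, size=(erased_h, w, 3), dtype=np.uint8)  # random filling

for kp in kp_2d:
    if erased_h - (h - kp[1]) > h * margin:   # keypoint sits deeper than the margin inside the strip
        kp[2] = 0.

print(kp_2d[:, 2])   # -> [1. 0.]
```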
14 | """ 15 | 16 | def __init__(self, prob=0, max_erase_part=0.5, random_filling=True, erase_kp=True, margin=0.1): 17 | self.prob = prob 18 | self.max_erase_part = max_erase_part 19 | self.random_filling = random_filling 20 | self.erase_kp = erase_kp 21 | self.margin = margin 22 | 23 | def _erase_top(self, img, kp_2d, kp_3d, erased_ratio): 24 | h,w,_ = img.shape 25 | erased_h = int(h * erased_ratio) 26 | if erased_h > 0: 27 | if self.random_filling: 28 | img[:erased_h] = np.random.randint(256, size=(erased_h, w, 3), dtype=np.uint8) 29 | else: 30 | img[:erased_h] = 0 31 | if self.erase_kp: 32 | for i, kp in enumerate(kp_2d): 33 | if erased_h - kp[1] > h * self.margin: 34 | kp_2d[2] = 0. 35 | if kp_3d is not None: 36 | kp_3d[t, i, -1] = 0 37 | return img, kp_2d, kp_3d 38 | 39 | def _erase_bottom(self, img, kp_2d, kp_3d, erased_ratio): 40 | h,w,_ = img.shape 41 | erased_h = int(h * erased_ratio) 42 | if erased_h > 0: 43 | if self.random_filling: 44 | img[-erased_h:] = np.random.randint(256, size=(erased_h, w, 3), dtype=np.uint8) 45 | else: 46 | img[-erased_h:] = 0 47 | if self.erase_kp: 48 | for i, kp in enumerate(kp_2d): 49 | if erased_h - (h - kp[1]) > h * self.margin: 50 | kp_2d[2] = 0. 51 | if kp_3d is not None: 52 | kp_3d[t, i, -1] = 0 53 | return img, kp_2d, kp_3d 54 | 55 | def _erase_left(self, img, kp_2d, kp_3d, erased_ratio): 56 | h,w,_ = img.shape 57 | erased_w = int(w * erased_ratio) 58 | if erased_w > 0: 59 | if self.random_filling: 60 | img[:erased_w] = np.random.randint(256, size=(h, erased_w, 3), dtype=np.uint8) 61 | else: 62 | img[:erased_w] = 0 63 | if self.erase_kp: 64 | for i, kp in enumerate(kp_2d): 65 | if erased_w - kp[0] > w * self.margin: 66 | kp_2d[2] = 0. 67 | if kp_3d is not None: 68 | kp_3d[t, i, -1] = 0 69 | return img, kp_2d, kp_3d 70 | 71 | def _erase_right(self, img, kp_2d, kp_3d, erased_ratio): 72 | h,w,_ = img.shape 73 | erased_w = int(w * erased_ratio) 74 | if erased_w > 0: 75 | if self.random_filling: 76 | img[-erased_w:] = np.random.randint(256, size=(h, erased_w, 3), dtype=np.uint8) 77 | else: 78 | img[-erased_w:] = 0 79 | if self.erase_kp: 80 | for i, kp in enumerate(kp_2d): 81 | if erased_w - (w - kp[0]) > w * self.margin: 82 | kp_2d[2] = 0. 83 | if kp_3d is not None: 84 | kp_3d[t, i, -1] = 0 85 | return img, kp_2d, kp_3d 86 | 87 | def __call__(self, instance): 88 | raise NotImplementedError() 89 | 90 | 91 | class RandomEraseVideo(_RandomEraseBase): 92 | """Randomly erase the lower part of the clip 93 | Args: 94 | prob (float): The probability to apply random erase 95 | max_erase_part (float): The maximum ratio of the erase part 96 | random_filling (bool): if True, fill the erased part with random pixel, otherwise with zero pixel. 97 | erase_kp (bool): if True, mask out the keypoints in the erased part. 98 | margin: (float): if is set to True, the keypoints in the margin between erased part and unerased part will not be masked out. 99 | """ 100 | 101 | def __init__(self, prob=0, max_erase_part=0.5, random_filling=True, erase_kp=True, margin=0.1): 102 | super(RandomEraseVideo, self).__init__(prob, max_erase_part, random_filling, erase_kp, margin) 103 | 104 | def __call__(self, instance): 105 | """ 106 | Args: 107 | instance (dict): must contain key 'image'. 108 | instance['image'] PIL Image or numpy arrays. 
109 | """ 110 | if isinstance(instance['clip'][0], Image.Image): 111 | clip = instance['clip'] 112 | elif isinstance(instance['clip'][0], np.ndarray): 113 | clip = [Image.fromarray(c) for c in instance['clip']] 114 | else: 115 | clip = instance['clip'][0] 116 | raise TypeError( 117 | f'Random Erase not yet implemented for {type(clip)}') 118 | 119 | kp_2d = instance['kp_2d'].copy() 120 | kp_3d = instance['kp_3d'].copy() if 'kp_3d' in instance else None 121 | 122 | # Apply to all images 123 | erased_clip = [] 124 | erased_kp_2ds = [] 125 | 126 | erased_part = random.choice([self._erase_left, self._erase_right, self._erase_top, self._erase_bottom]) 127 | for t, (kp_2d_frame, img) in enumerate(zip(kp_2d, clip)): 128 | erased_img = img.copy() 129 | erased_kp_2d = kp_2d_frame.copy() 130 | if np.random.rand() < self.prob: 131 | erased_ratio = np.random.rand() * self.max_erase_part 132 | erased_img = np.array(erased_img) 133 | erased_img, erased_kp_2d, _ = erased_part(erased_img, erased_kp_2d, None, erased_ratio) 134 | 135 | erased_img = Image.fromarray(erased_img) 136 | 137 | erased_kp_2ds.append(erased_kp_2d) 138 | erased_clip.append(erased_img) 139 | 140 | erased_kp_2ds = np.stack(erased_kp_2ds, axis=0) 141 | 142 | 143 | ret = {k:v for k, v in instance.items() if k not in ['clip', 'kp_2d', 'kp_3d']} 144 | ret.update({'clip': erased_clip, 'kp_2d': erased_kp_2ds}) 145 | if kp_3d is not None: 146 | ret.update({'kp_3d': kp_3d}) 147 | 148 | 149 | return ret 150 | 151 | 152 | class RandomEraseImage(_RandomEraseBase): 153 | """Randomly erase the lower part of the clip 154 | Args: 155 | prob (float): The probability to apply random erase 156 | max_erase_part (float): The maximum ratio of the erase part 157 | random_filling (bool): if True, fill the erased part with random pixel, otherwise with zero pixel. 158 | erase_kp (bool): if True, mask out the keypoints in the erased part. 159 | margin: (float): if is set to True, the keypoints in the margin between erased part and unerased part will not be masked out. 160 | """ 161 | 162 | def __init__(self, prob=0, max_erase_part=0.5, random_filling=True, erase_kp=True, margin=0.1): 163 | super(RandomEraseImage, self).__init__(prob, max_erase_part, random_filling, erase_kp, margin) 164 | 165 | def __call__(self, instance): 166 | """ 167 | Args: 168 | instance (dict): must contain key 'image'. 169 | instance['image'] PIL Image or numpy arrays. 
170 | """ 171 | if isinstance(instance['image'], Image.Image): 172 | image = instance['image'] 173 | elif isinstance(instance['image'], np.ndarray): 174 | image = Image.fromarray(instance['image']) 175 | else: 176 | image = instance['image'] 177 | raise TypeError( 178 | f'Random Erase not yet implemented for {type(image)}') 179 | 180 | kp_2d = instance['kp_2d'].copy() 181 | kp_3d = instance['kp_3d'].copy() if 'kp_3d' in instance else None 182 | 183 | erased_part = random.choice([self._erase_left, self._erase_right, self._erase_top, self._erase_bottom]) 184 | erased_img = image.copy() 185 | erased_kp_2d = kp_2d.copy() 186 | if np.random.rand() < self.prob: 187 | erased_ratio = np.random.rand() * self.max_erase_part 188 | erased_img = np.array(erased_img) 189 | erased_img, erased_kp_2d, _ = erased_part(erased_img, kp_2d, None, erased_ratio) 190 | erased_img = Image.fromarray(erased_img) 191 | 192 | 193 | ret = {k:v for k, v in instance.items() if k not in ['image', 'kp_2d', 'kp_3d']} 194 | ret.update({'image': erased_img, 'kp_2d': erased_kp_2d}) 195 | if kp_3d is not None: 196 | ret.update({'kp_3d': kp_3d}) 197 | 198 | return ret -------------------------------------------------------------------------------- /lib/data_utils/transforms/random_hflip.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import random 3 | import torchvision.transforms.functional as F 4 | 5 | from lib.data_utils.kp_utils import keypoint_2d_hflip, keypoint_3d_hflip, smpl_pose_hflip 6 | 7 | from PIL import Image 8 | 9 | class RandomHorizontalFlipImage(object): 10 | """Horizontally flip the input image, keypoints and smpl pose randomly with a given probability. 11 | 12 | Args: 13 | p (float): probability of the image being flipped. Default value is 0.5 14 | """ 15 | def __init__(self, p=0.5): 16 | self.p = p 17 | 18 | def __call__(self, instance): 19 | """ 20 | instance (dict): must contain key 'image' and 'kp_2d'. Optional: support 'kp_3d' and 'pose' flip. 21 | 22 | instance['image'] is a list of PIL Images or numpy arrays. 23 | instance['kp_2d'] is a numpy array. 24 | instance['kp_3d'] is a numpy array. 25 | instance['pose'] is a numpy array. 26 | 27 | Returns: 28 | same as input, while image and keypoints are flipped. 
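The flip transforms delegate the joint-aware work to `keypoint_2d_hflip`, `keypoint_3d_hflip` and `smpl_pose_hflip` in `kp_utils`. As a generic illustration of what 2D keypoint flipping involves (this is not the repo's implementation; the mirroring convention and the joint pairs are assumptions): mirror the x coordinate about the image width and swap each left/right joint pair.

```
import numpy as np

def hflip_kp2d(kp_2d, img_width, flip_pairs):
    """kp_2d: (J, 3) array of (x, y, conf); flip_pairs: (left_idx, right_idx) tuples."""
    out = kp_2d.copy()
    out[:, 0] = img_width - 1 - out[:, 0]   # mirror x
    for l, r in flip_pairs:                 # a flipped left wrist becomes the right wrist, etc.
        out[[l, r]] = out[[r, l]]
    return out

# made-up 3-joint skeleton (left wrist, right wrist, nose) on a 224-wide image
kp = np.array([[ 50., 10., 1.],
               [150., 12., 1.],
               [112.,  5., 1.]])
print(hflip_kp2d(kp, img_width=224, flip_pairs=[(0, 1)]))
```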
29 | """ 30 | if isinstance(instance['image'], Image.Image): 31 | image = instance['image'] 32 | elif isinstance(instance['image'], np.ndarray): 33 | image = Image.fromarray(instance['image']) 34 | else: 35 | image = instance['image'] 36 | raise TypeError( 37 | f'Random Horizontal Flip not yet implemented for {type(image)}') 38 | 39 | kp_2d = instance['kp_2d'].copy() 40 | kp_3d = instance['kp_3d'].copy() if 'kp_3d' in instance else None 41 | pose = instance['pose'].copy() if 'pose' in instance else None 42 | 43 | img_width = image.size[0] 44 | 45 | if random.random() < self.p: 46 | flipped_image = F.hflip(image) 47 | flipped_kp_2d = keypoint_2d_hflip(kp_2d, img_width) 48 | flipped_kp_3d = keypoint_3d_hflip(kp_3d) if kp_3d is not None else None 49 | flipped_pose = smpl_pose_hflip(pose) if pose is not None else None 50 | else: 51 | flipped_image = image 52 | flipped_kp_2d = kp_2d 53 | flipped_kp_3d = kp_3d 54 | flipped_pose = pose 55 | 56 | ret = {k:v for k, v in instance.items() if k not in ['image', 'kp_2d', 'kp_3d', 'pose']} 57 | ret.update({'image': flipped_image, 'kp_2d':flipped_kp_2d}) 58 | if flipped_kp_3d is not None: 59 | ret.update({'kp_3d': flipped_kp_3d}) 60 | if flipped_pose is not None: 61 | ret.update({'pose': flipped_pose}) 62 | 63 | return ret 64 | 65 | 66 | class RandomHorizontalFlipVideo(object): 67 | """Horizontally flip the given list of PIL Images randomly with a given probability. 68 | 69 | Args: 70 | p (float): probability of the image being flipped. Default value is 0.5 71 | """ 72 | def __init__(self, p=0.5): 73 | self.p = p 74 | 75 | def __call__(self, instance): 76 | """ 77 | instance (dict): must contain key 'clip' and 'kp_2d'. Optional: support 'kp_3d' and 'pose' flip. 78 | 79 | instance['clip'] is a list of PIL Images or numpy arrays. 80 | instance['kp_2d'] is a numpy array. 81 | instance['kp_3d'] is a numpy array. 82 | instance['pose'] is a numpy array. 83 | 84 | Returns: 85 | same as input, while clip and keypoints are flipped. 
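All of the transforms in this package consume and return a single dict, so they can be chained. A hedged sketch of an image-branch pipeline (the `Compose` wrapper, the ordering and the dummy inputs are assumptions; the repo wires its actual pipelines elsewhere):

```
import numpy as np
from torchvision.transforms import Compose

from lib.data_utils.transforms.basic import ToTensorImage, NormalizeImage
from lib.data_utils.transforms.color_jitter import ColorJitterImage
from lib.data_utils.transforms.crop import CropImage

pipeline = Compose([
    CropImage(patch_height=224, patch_width=224),                 # bbox crop, keypoints follow the affine
    ColorJitterImage(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.1),
    ToTensorImage(),                                              # HWC uint8 -> CHW float in [0, 1]
    NormalizeImage(),                                             # ImageNet stats; kp_2d scaled to [-1, 1]
])

instance = {
    'image': np.zeros((480, 640, 3), dtype=np.uint8),             # dummy frame
    'bbox':  np.array([320., 240., 100., 120.]),                  # center x, center y, w, h
    'kp_2d': np.zeros((49, 3)),                                    # 49 joints in the SPIN layout
}
out = pipeline(instance)
print(out['image'].shape, out['kp_2d'].shape)                      # torch.Size([3, 224, 224]) torch.Size([49, 3])
```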
86 | """ 87 | if isinstance(instance['clip'][0], Image.Image): 88 | clip = instance['clip'] 89 | elif isinstance(instance['clip'][0], np.ndarray): 90 | clip = [Image.fromarray(c) for c in instance['clip']] 91 | else: 92 | clip = instance['clip'][0] 93 | raise TypeError( 94 | f'Random Horizontal Flip not yet implemented for {type(clip)}') 95 | 96 | kp_2d = instance['kp_2d'].copy() 97 | kp_3d = instance['kp_3d'].copy() if 'kp_3d' in instance else None 98 | pose = instance['pose'].copy() if 'pose' in instance else None 99 | 100 | img_width = clip[0].size[0] 101 | 102 | if random.random() < self.p: 103 | flipped_clip = [] 104 | for img in clip: 105 | flipped_clip.append(F.hflip(img)) 106 | flipped_kp_2d = keypoint_2d_hflip(kp_2d, img_width) 107 | flipped_kp_3d = keypoint_3d_hflip(kp_3d) if kp_3d is not None else None 108 | flipped_pose = smpl_pose_hflip(pose) if pose is not None else None 109 | else: 110 | flipped_clip = clip 111 | flipped_kp_2d = kp_2d 112 | flipped_kp_3d = kp_3d 113 | flipped_pose = pose 114 | 115 | ret = {k:v for k, v in instance.items() if k not in ['clip', 'kp_2d', 'kp_3d', 'pose']} 116 | ret.update({'clip': flipped_clip, 'kp_2d':flipped_kp_2d}) 117 | if flipped_kp_3d is not None: 118 | ret.update({'kp_3d': flipped_kp_3d}) 119 | if flipped_pose is not None: 120 | ret.update({'pose': flipped_pose}) 121 | 122 | return ret -------------------------------------------------------------------------------- /lib/dataset/__init__.py: -------------------------------------------------------------------------------- 1 | from .dataset_video import VideoDataset 2 | from .dataset_image import ImageDataset 3 | -------------------------------------------------------------------------------- /lib/dataset/dataset_image.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import os 3 | # import mc 4 | import cv2 5 | import torch 6 | import numpy as np 7 | import joblib 8 | import os.path as osp 9 | from PIL import Image 10 | from torch.utils.data import Dataset 11 | 12 | from lib.utils.geometry import rotation_matrix_to_angle_axis 13 | from lib.data_utils.img_utils import read_img 14 | from lib.core.config import DB_DIR 15 | 16 | 17 | class ImageDataset(Dataset): 18 | def __init__(self, dataset_name, set, 19 | transforms=None, 20 | verbose=True, debug=False): 21 | 22 | self.dataset_name = dataset_name 23 | self.set = set 24 | self.transforms = transforms 25 | 26 | self.debug = debug 27 | self.verbose = verbose 28 | 29 | self.db = self._load_db() 30 | 31 | if self.verbose: 32 | print(f'{self.dataset_name} - Number of dataset objects {self.__len__()}') 33 | 34 | def _load_db(self): 35 | db_file = osp.join(DB_DIR, f'{self.dataset_name}_{self.set}_db.pt') 36 | 37 | if osp.isfile(db_file): 38 | db = joblib.load(db_file) 39 | else: 40 | raise ValueError(f'{db_file} do not exists') 41 | 42 | if self.verbose: 43 | print(f'Loaded {self.dataset_name} dataset from {db_file}') 44 | return db 45 | 46 | def __len__(self): 47 | return len(self.db['img_name']) 48 | 49 | def __getitem__(self, index): 50 | kp_2d = self.db['joints2D'][index] 51 | kp_3d = self.db['joints3D'][index] if 'joints3D' in self.db else np.zeros([49, 4]) 52 | # path_names = self.db['img_name'][index].split('/') 53 | # # if 'human3.6m' in path_names: 54 | # # 55 | path_name = self.db['img_name'][index] 56 | image = read_img(path_name) 57 | shape = self.db['shape'][index] if 'shape' in self.db else np.zeros([10]) 58 | cam = self.db['cam'][index] if 'cam' in self.db else 
np.array([1., 0., 0.]) 59 | bbox = self.db['bbox'][index] 60 | 61 | pose = self.db['pose'][index].astype(np.float32) if 'pose' in self.db else np.zeros([72]) 62 | if len(pose.shape) > 1: 63 | pose = rotation_matrix_to_angle_axis(torch.from_numpy(pose)).numpy().flatten() 64 | 65 | target = { 66 | 'image': image, 67 | 'kp_2d': kp_2d, 68 | 'kp_3d': kp_3d, 69 | 'pose':pose, 70 | 'shape':shape, 71 | 'cam':cam, 72 | 'bbox': bbox 73 | } 74 | if self.transforms: 75 | target = self.transforms(target) 76 | 77 | target['theta'] = torch.cat([target['cam'].float(), target['pose'].float(), target['shape'].float()], dim=0) # camera, pose and shape 78 | target['w_smpl'] = torch.tensor(1).float() if 'pose' in self.db else torch.tensor(0).float() 79 | 80 | new_target = {} 81 | for k, v in target.items(): 82 | if k in ['pose', 'cam', 'shape']: 83 | continue 84 | new_target[k] = v.float() 85 | 86 | return new_target 87 | -------------------------------------------------------------------------------- /lib/dataset/dataset_video.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import os 3 | import torch 4 | import logging 5 | import numpy as np 6 | import os.path as osp 7 | import joblib 8 | import random 9 | 10 | from torch.utils.data import Dataset 11 | 12 | from lib.core.config import DB_DIR 13 | from lib.models.smpl import OP_TO_J14, J49_TO_J14 14 | from lib.data_utils.kp_utils import convert_kps 15 | from lib.data_utils.img_utils import split_into_chunks, read_img 16 | 17 | logger = logging.getLogger(__name__) 18 | 19 | class VideoDataset(Dataset): 20 | def __init__(self, dataset_name, set, transforms, 21 | seqlen=0, overlap=0., sample_pool=64, 22 | random_sample=True, random_start=False, 23 | pad=True, verbose=True, debug=False): 24 | 25 | self.dataset_name = dataset_name 26 | self.set = set 27 | self.transforms = transforms 28 | 29 | assert seqlen > 0 or sample_pool > 0 30 | self.seqlen = seqlen if seqlen > 0 else sample_pool 31 | self.sample_pool = sample_pool if sample_pool > 0 else seqlen 32 | self.sample_freq = self.sample_pool // self.seqlen 33 | #assert self.sample_pool % self.seqlen == 0 34 | 35 | self.overlap = overlap 36 | self.stride = max(int(self.sample_pool * (1-overlap)), 1) if overlap < 1 else overlap 37 | 38 | self.random_sample = random_sample 39 | self.random_start = random_start 40 | assert not (self.random_sample and self.random_start) 41 | # Either random sample or random start, cannot be both 42 | 43 | self.debug = debug 44 | self.verbose = verbose 45 | 46 | self.db = self._load_db() 47 | self.vid_indices = split_into_chunks(self.db['vid_name'], self.sample_pool, self.stride, pad) 48 | 49 | if self.verbose: 50 | print(f'{self.dataset_name} - Dataset overlap ratio: {self.overlap}') 51 | print(f'{self.dataset_name} - Number of dataset objects {self.__len__()}') 52 | 53 | 54 | def __len__(self): 55 | return len(self.vid_indices) 56 | 57 | def __getitem__(self, index): 58 | is_train = self.set == 'train' 59 | target = {} 60 | 61 | # determine sample index 62 | sample_idx, full_sample_idx = self.gen_sample_index(index) 63 | 64 | # load and process 2D&3D keypoints 65 | kp_2d, kp_3d = self.get_keypoints(sample_idx) 66 | 67 | # load SMPL parameters: theta, beta along with cam params. 
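`ImageDataset` above (and `VideoDataset` below, per frame) packs camera, pose and shape into a single 85-D `theta` vector. A tiny sketch with dummy values; the reading of the camera triple as weak-perspective scale/translation is the usual SPIN-style convention and an assumption here.

```
import torch

cam   = torch.tensor([1., 0., 0.])   # default camera: scale, x-translation, y-translation
pose  = torch.zeros(72)              # SMPL pose as axis-angle, 24 joints x 3
shape = torch.zeros(10)              # SMPL shape betas

theta = torch.cat([cam, pose, shape], dim=0)
print(theta.shape)                   # -> torch.Size([85])
```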
68 | cam, pose, shape, w_smpl = self.get_smpl_params(sample_idx) 69 | target['w_smpl'] = w_smpl 70 | 71 | # bounding box 72 | if self.dataset_name != 'insta': 73 | bbox = self.db['bbox'][sample_idx] 74 | if not is_train: 75 | target['bbox'] = self.db['bbox'][sample_idx] 76 | 77 | # images 78 | image_paths = self.db['img_name'][sample_idx] 79 | # new_img_paths = [] 80 | # for path_name in image_paths: 81 | # if isinstance(path_name, bytes): 82 | # path_name = path_name.decode() 83 | # if 'mpi_inf_3dhp' in path_name: 84 | # path_name = path_name[:-10] + 'frame_' + path_name[-10:] 85 | # path_name = path_name.replace('v', 'V') 86 | # new_img_paths.append(path_name) 87 | 88 | images = [read_img(path) for path in image_paths] 89 | 90 | if not is_train: 91 | target['paths'] = self.db['img_name'][sample_idx].tolist() 92 | 93 | # preprocess and augmentation 94 | raw_inp = { 95 | 'clip': images, 96 | 'kp_2d': kp_2d, 97 | 'kp_3d':kp_3d, 98 | 'pose':pose, 99 | 'shape':shape, 100 | 'cam':cam, 101 | } 102 | if self.dataset_name != 'insta': 103 | raw_inp['bbox'] = bbox 104 | transformed = self.transforms(raw_inp) 105 | 106 | target['images'] = transformed['clip'].float() 107 | target['kp_2d'] = transformed['kp_2d'].float() 108 | target['kp_3d'] = transformed['kp_3d'].float() 109 | 110 | theta = torch.cat([transformed['cam'].float(), transformed['pose'].float(), transformed['shape'].float()], dim=1) #(T, 85) 111 | target['theta'] = theta.float() # camera, pose and shape 112 | 113 | # optional info for evaluation 114 | if self.dataset_name == 'mpii3d' and not is_train: 115 | target['valid'] = self.db['valid_i'][sample_idx] 116 | vn = self.db['vid_name'][sample_idx] 117 | fi = self.db['frame_id'][sample_idx] 118 | target['instance_id'] = [f'{v}/{f}'for v,f in zip(vn,fi)] 119 | if self.dataset_name in ['3dpw', 'h36m'] and not is_train: 120 | vn = self.db['vid_name'][sample_idx] 121 | fi = self.db['frame_id'][sample_idx] 122 | target['instance_id'] = [f'{v}/{f}'for v,f in zip(vn,fi)] 123 | if not is_train: 124 | valid = np.array(full_sample_idx) 125 | valid = valid - np.roll(valid, 1) 126 | valid = valid > 0 127 | valid[0] = True 128 | target['valid'] = torch.from_numpy(valid) 129 | 130 | # record data source for further use 131 | target['index'] = torch.tensor([index]) 132 | 133 | return target 134 | 135 | def _load_db(self): 136 | db_file = osp.join(DB_DIR, f'{self.dataset_name}_{self.set}_db.pt') 137 | print('db_file', db_file, DB_DIR) 138 | 139 | if osp.isfile(db_file): 140 | db = joblib.load(db_file) 141 | else: 142 | raise ValueError(f'{db_file} do not exists') 143 | 144 | if self.verbose: 145 | print(f'Loaded {self.dataset_name} dataset from {db_file}') 146 | return db 147 | 148 | def gen_sample_index(self, index): 149 | full_sample_idx = self.vid_indices[index] 150 | 151 | if self.random_sample: 152 | sample_idx = [] 153 | for i in range(self.seqlen): 154 | sample_idx.append(full_sample_idx[self.sample_freq*i + random.randint(0, self.sample_freq-1)]) 155 | elif self.random_start: 156 | start = random.randint(0, self.sample_freq-1) 157 | sample_idx = full_sample_idx[start::self.sample_freq][:self.seqlen] 158 | else: 159 | sample_idx = full_sample_idx[::self.sample_freq][:self.seqlen] 160 | 161 | return sample_idx, full_sample_idx 162 | 163 | def get_keypoints(self, sample_idx): 164 | if 'joints2D' in self.db: 165 | kp_2d = self.db['joints2D'][sample_idx] 166 | else: 167 | kp_2d = np.zeros([self.seqlen, 49, 3]) 168 | 169 | if 'joints3D' in self.db: 170 | kp_3d = self.db['joints3D'][sample_idx] 171 | 
else: 172 | kp_3d = np.zeros([self.seqlen, 49, 4]) 173 | 174 | return kp_2d, kp_3d 175 | 176 | def get_smpl_params(self, sample_idx): 177 | # w_smpl indicates whether the instance's SMPL parameters are valid 178 | if 'pose' in self.db: 179 | assert 'shape' in self.db 180 | pose = self.db['pose'][sample_idx] 181 | shape = self.db['shape'][sample_idx] 182 | w_smpl = torch.ones(self.seqlen).float() 183 | else: 184 | pose = np.zeros((self.seqlen, 72)) 185 | shape = np.zeros((self.seqlen, 10)) 186 | w_smpl = torch.zeros(self.seqlen).float() 187 | 188 | cam = np.concatenate([np.ones((self.seqlen, 1)), np.zeros((self.seqlen, 2))], axis=1) #(T, 3) 189 | return cam, pose, shape, w_smpl 190 | -------------------------------------------------------------------------------- /lib/dataset/loaders.py: -------------------------------------------------------------------------------- 1 | from torch.utils.data import ConcatDataset, DataLoader, Subset 2 | from torch.utils.data.distributed import DistributedSampler 3 | import torch.distributed as dist 4 | from lib.dataset import * 5 | import torch 6 | import joblib 7 | import os.path as osp 8 | 9 | def get_data_loaders(cfg, transforms_3d, transforms_2d, transforms_val, transforms_img, rank, world_size, verbose=True): 10 | def get_2d_datasets(dataset_names): 11 | datasets = [] 12 | for dataset_name in dataset_names: 13 | db = VideoDataset( 14 | dataset_name=dataset_name, 15 | set='train', 16 | transforms=transforms_2d, 17 | seqlen=cfg.DATASET.SEQLEN, 18 | overlap=cfg.DATASET.OVERLAP, 19 | sample_pool=cfg.DATASET.SAMPLE_POOL, 20 | random_sample=cfg.DATASET.RANDOM_SAMPLE, 21 | random_start=cfg.DATASET.RANDOM_START, 22 | verbose=verbose, 23 | debug=cfg.DEBUG 24 | ) 25 | datasets.append(db) 26 | return ConcatDataset(datasets) 27 | 28 | def get_3d_datasets(dataset_names): 29 | datasets = [] 30 | for dataset_name in dataset_names: 31 | db = VideoDataset( 32 | dataset_name=dataset_name, 33 | set='train', 34 | transforms=transforms_3d, 35 | seqlen=cfg.DATASET.SEQLEN, 36 | overlap=cfg.DATASET.OVERLAP if dataset_name != '3dpw' else 8, 37 | sample_pool=cfg.DATASET.SAMPLE_POOL, 38 | random_sample=cfg.DATASET.RANDOM_SAMPLE, 39 | random_start=cfg.DATASET.RANDOM_START, 40 | verbose=verbose, 41 | debug=cfg.DEBUG, 42 | ) 43 | datasets.append(db) 44 | return ConcatDataset(datasets) 45 | 46 | def get_img_datasets(dataset_names): 47 | datasets = [] 48 | for dataset_name in dataset_names: 49 | db = ImageDataset( 50 | dataset_name=dataset_name, 51 | set='train', 52 | transforms=transforms_img, 53 | verbose=verbose, 54 | debug=cfg.DEBUG, 55 | ) 56 | if dataset_name == 'mpii3d': 57 | db = Subset(db, list(range(len(db)))[::5]) 58 | datasets.append(db) 59 | return ConcatDataset(datasets) 60 | 61 | # ===== Video 2D keypoints datasets ===== 62 | train_2d_dataset_names = cfg.TRAIN.DATASETS_2D 63 | data_2d_batch_size = cfg.TRAIN.BATCH_SIZE_2D 64 | 65 | if data_2d_batch_size: 66 | train_2d_db = get_2d_datasets(train_2d_dataset_names) 67 | train_2d_sampler = DistributedSampler(train_2d_db, rank=rank, num_replicas=world_size) 68 | train_2d_loader = DataLoader( 69 | dataset=train_2d_db, 70 | batch_size=data_2d_batch_size, 71 | #shuffle=True, 72 | num_workers=cfg.NUM_WORKERS, 73 | sampler=train_2d_sampler 74 | ) 75 | else: 76 | train_2d_loader = None 77 | 78 | # ===== Video 3D keypoint datasets ===== 79 | train_3d_dataset_names = cfg.TRAIN.DATASETS_3D 80 | data_3d_batch_size = cfg.TRAIN.BATCH_SIZE_3D 81 | 82 | if data_3d_batch_size: 83 | train_3d_db = get_3d_datasets(train_3d_dataset_names) 84 | 
train_3d_sampler = DistributedSampler(train_3d_db, rank=rank, num_replicas=world_size) 85 | train_3d_loader = DataLoader( 86 | dataset=train_3d_db, 87 | batch_size=data_3d_batch_size, 88 | #shuffle=True, 89 | num_workers=cfg.NUM_WORKERS, 90 | sampler=train_3d_sampler 91 | ) 92 | else: 93 | train_3d_loader = None 94 | 95 | # ===== Image datasets ===== 96 | train_img_dataset_names = cfg.TRAIN.DATASETS_IMG 97 | data_img_batch_size = cfg.TRAIN.BATCH_SIZE_IMG 98 | 99 | if data_img_batch_size: 100 | train_img_db = get_img_datasets(train_img_dataset_names) 101 | train_img_sampler = DistributedSampler(train_img_db, rank=rank, num_replicas=world_size) 102 | train_img_loader = DataLoader( 103 | dataset=train_img_db, 104 | batch_size=data_img_batch_size, 105 | num_workers=cfg.NUM_WORKERS, 106 | sampler=train_img_sampler, 107 | ) 108 | else: 109 | train_img_loader = None 110 | 111 | eval_set = 'test' if data_3d_batch_size else 'val' 112 | 113 | # ===== Evaluation dataset ===== 114 | valid_db = VideoDataset( 115 | dataset_name=cfg.TRAIN.DATASET_EVAL, 116 | set=cfg.TRAIN.EVAL_SET, 117 | transforms=transforms_val, 118 | overlap=0, 119 | sample_pool=cfg.EVAL.SAMPLE_POOL, 120 | random_sample=False, 121 | random_start=False, 122 | verbose=verbose, 123 | debug=cfg.DEBUG 124 | ) 125 | #valid_sampler = DistributedSampler(valid_db, rank=rank, num_replicas=world_size) 126 | valid_sampler = None 127 | 128 | valid_loader = DataLoader( 129 | dataset=valid_db, 130 | batch_size=cfg.EVAL.BATCH_SIZE, 131 | shuffle=False, 132 | num_workers=cfg.NUM_WORKERS, 133 | sampler=valid_sampler 134 | ) 135 | 136 | return train_2d_loader, train_3d_loader, valid_loader, train_img_loader 137 | -------------------------------------------------------------------------------- /lib/models/__init__.py: -------------------------------------------------------------------------------- 1 | from .maed import MAED 2 | from .ops import * 3 | -------------------------------------------------------------------------------- /lib/models/ktd.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pickle 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | from lib.models.smpl import SMPL, SMPL_MODEL_DIR 7 | from lib.models.spin import projection 8 | from lib.utils.geometry import rot6d_to_rotmat, rotation_matrix_to_angle_axis 9 | 10 | ANCESTOR_INDEX = [ 11 | [], 12 | [0], 13 | [0], 14 | [0], 15 | [0, 1], 16 | [0, 2], 17 | [0, 3], 18 | [0, 1, 4], 19 | [0, 2, 5], 20 | [0, 3, 6], 21 | [0, 1, 4, 7], 22 | [0, 2, 5, 8], 23 | [0, 3, 6, 9], 24 | [0, 3, 6, 9], 25 | [0, 3, 6, 9], 26 | [0, 3, 6, 9, 12], 27 | [0, 3, 6, 9, 13], 28 | [0, 3, 6, 9, 14], 29 | [0, 3, 6, 9, 13, 16], 30 | [0, 3, 6, 9, 14, 17], 31 | [0, 3, 6, 9, 13, 16, 18], 32 | [0, 3, 6, 9, 14, 17, 19], 33 | [0, 3, 6, 9, 13, 16, 18, 20], 34 | [0, 3, 6, 9, 14, 17, 19, 21] 35 | ] 36 | 37 | class KTD(nn.Module): 38 | def __init__(self, feat_dim=2048, hidden_dim=1024, **kwargs): 39 | super(KTD, self).__init__() 40 | 41 | self.feat_dim = feat_dim 42 | self.smpl = SMPL( 43 | SMPL_MODEL_DIR, 44 | create_transl=False, 45 | create_global_orient=False, 46 | create_body_pose=False, 47 | create_betas=False, 48 | ) 49 | npose_per_joint = 6 50 | nshape = 10 51 | ncam = 3 52 | 53 | self.fc1 = nn.Linear(feat_dim, hidden_dim) 54 | self.drop1 = nn.Dropout() 55 | self.fc2 = nn.Linear(hidden_dim, hidden_dim) 56 | self.drop2 = nn.Dropout() 57 | 58 | self.joint_regs = nn.ModuleList() 59 | for joint_idx, ancestor_idx in 
enumerate(ANCESTOR_INDEX): 60 | regressor = nn.Linear(hidden_dim + npose_per_joint * len(ancestor_idx), npose_per_joint) 61 | nn.init.xavier_uniform_(regressor.weight, gain=0.01) 62 | self.joint_regs.append(regressor) 63 | 64 | self.decshape = nn.Linear(hidden_dim, nshape) 65 | self.deccam = nn.Linear(hidden_dim, ncam) 66 | nn.init.xavier_uniform_(self.decshape.weight, gain=0.01) 67 | nn.init.xavier_uniform_(self.deccam.weight, gain=0.01) 68 | 69 | def forward(self, x, seqlen, J_regressor=None, 70 | return_shape_cam=False, **kwargs): 71 | nt = x.shape[0] 72 | N = nt//seqlen 73 | 74 | x = self.fc1(x) 75 | x = self.drop1(x) 76 | x = self.fc2(x) 77 | x = self.drop2(x) 78 | pred_shape = self.decshape(x) 79 | pred_cam = self.deccam(x) 80 | 81 | pose = [] 82 | for ancestor_idx, reg in zip(ANCESTOR_INDEX, self.joint_regs): 83 | ances = torch.cat([x] + [pose[i] for i in ancestor_idx], dim=1) 84 | pose.append(reg(ances)) 85 | 86 | pred_pose = torch.cat(pose, dim=1) 87 | 88 | if return_shape_cam: 89 | return pred_shape, pred_cam 90 | output_regress = self.get_output(pred_pose, pred_shape, pred_cam, J_regressor) 91 | 92 | return output_regress 93 | 94 | def get_output(self, pred_pose, pred_shape, pred_cam, J_regressor): 95 | output = {} 96 | 97 | nt = pred_pose.shape[0] 98 | pred_rotmat = rot6d_to_rotmat(pred_pose).reshape(nt, -1, 3, 3) 99 | 100 | pred_output = self.smpl( 101 | betas=pred_shape, 102 | body_pose=pred_rotmat[:, 1:], 103 | global_orient=pred_rotmat[:, 0].unsqueeze(1), 104 | pose2rot=False 105 | ) 106 | 107 | pred_vertices = pred_output.vertices[:nt] 108 | pred_joints = pred_output.joints[:nt] 109 | 110 | if J_regressor is not None: 111 | J_regressor_batch = J_regressor[None, :].expand(pred_vertices.shape[0], -1, -1).to(pred_vertices.device) 112 | pred_joints = torch.matmul(J_regressor_batch, pred_vertices) 113 | 114 | pred_keypoints_2d = projection(pred_joints, pred_cam) 115 | 116 | pose = rotation_matrix_to_angle_axis(pred_rotmat.reshape(-1, 3, 3)).reshape(nt, -1) 117 | 118 | output['theta'] = torch.cat([pred_cam, pose, pred_shape], dim=1) 119 | output['verts'] = pred_vertices 120 | output['kp_2d'] = pred_keypoints_2d 121 | output['kp_3d'] = pred_joints 122 | output['rotmat'] = pred_rotmat 123 | 124 | return output 125 | -------------------------------------------------------------------------------- /lib/models/maed.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | 3 | from torchvision.models import resnet50 4 | from lib.utils.utils import determine_output_feature_dim 5 | from lib.models.ktd import KTD 6 | from lib.models.spin import Regressor as Iterative 7 | from lib.models.vision_transformer import vit_custom_resnet50_224_in21k 8 | 9 | class MAED(nn.Module): 10 | def __init__(self, 11 | encoder='ste', num_blocks=6, num_heads=12, st_mode='parallel', 12 | decoder='ktd', hidden_dim=1024, 13 | **kwargs): 14 | super(MAED, self).__init__() 15 | 16 | self._init_encoder(encoder, num_blocks, num_heads, st_mode, **kwargs) 17 | self._init_decoder(decoder, hidden_dim, **kwargs) 18 | 19 | 20 | def _init_decoder(self, decoder, hidden_dim=1024, **kwargs): 21 | _, feat_dim = determine_output_feature_dim(inp_size=(1, 3, 224, 224), model=self.encoder) 22 | 23 | self.decoder_type = decoder 24 | if decoder.lower() == 'ktd': 25 | self.decoder = KTD(feat_dim=feat_dim, hidden_dim=hidden_dim, **kwargs) 26 | elif decoder.lower() == 'iterative': 27 | self.decoder = Iterative(feat_dim=feat_dim, hidden_dim=hidden_dim, **kwargs) 28 | else: 29 | 
raise NotImplementedError(decoder) 30 | 31 | 32 | def _init_encoder(self, encoder, num_blocks, num_heads, st_mode, **kwargs): 33 | 34 | self.encoder_type = encoder 35 | if encoder.lower() == 'cnn': 36 | self.encoder = resnet50(pretrained=True) 37 | self.encoder.fc = nn.Identity() 38 | elif encoder.lower() == 'ste': 39 | self.encoder = vit_custom_resnet50_224_in21k(num_blocks, num_heads, st_mode, num_classes=-1) 40 | else: 41 | raise NotImplementedError(encoder) 42 | 43 | def extract_feature(self, x): 44 | 45 | batch_size, seqlen = x.shape[:2] 46 | 47 | x = x.reshape(-1, x.shape[-3], x.shape[-2], x.shape[-1]) # (N,T,3,H,W) -> (NT,3,H,W) 48 | xf = self.encoder(x) 49 | xf = xf.reshape(batch_size, seqlen, -1) 50 | return xf 51 | 52 | def forward(self, x, J_regressor=None, **kwargs): 53 | batch_size, seqlen = x.shape[:2] 54 | 55 | x = x.reshape(-1, x.shape[-3], x.shape[-2], x.shape[-1]) # (N,T,3,H,W) -> (NT,3,H,W) 56 | 57 | xf = self.encoder(x, seqlen=seqlen) if self.encoder_type == 'ste' else self.encoder(x) #(NT, 2048, 7, 7) 58 | 59 | output = self.decoder(xf, seqlen=seqlen, J_regressor=J_regressor, **kwargs) 60 | 61 | output['theta'] = output['theta'].reshape(batch_size, seqlen, -1) 62 | output['verts'] = output['verts'].reshape(batch_size, seqlen, -1, 3) 63 | output['kp_2d'] = output['kp_2d'].reshape(batch_size, seqlen, -1, 2) 64 | output['kp_3d'] = output['kp_3d'].reshape(batch_size, seqlen, -1, 3) 65 | output['rotmat'] = output['rotmat'].reshape(batch_size, seqlen, -1, 3, 3) 66 | 67 | return output -------------------------------------------------------------------------------- /lib/models/ops/__init__.py: -------------------------------------------------------------------------------- 1 | from .drop import DropPath -------------------------------------------------------------------------------- /lib/models/ops/drop.py: -------------------------------------------------------------------------------- 1 | """ DropBlock, DropPath 2 | PyTorch implementations of DropBlock and DropPath (Stochastic Depth) regularization layers. 3 | Papers: 4 | DropBlock: A regularization method for convolutional networks (https://arxiv.org/abs/1810.12890) 5 | Deep Networks with Stochastic Depth (https://arxiv.org/abs/1603.09382) 6 | Code: 7 | DropBlock impl inspired by two Tensorflow impl that I liked: 8 | - https://github.com/tensorflow/tpu/blob/master/models/official/resnet/resnet_model.py#L74 9 | - https://github.com/clovaai/assembled-cnn/blob/master/nets/blocks.py 10 | Hacked together by / Copyright 2020 Ross Wightman 11 | """ 12 | 13 | import torch 14 | import torch.nn as nn 15 | import torch.nn.functional as F 16 | 17 | def drop_path(x, drop_prob: float = 0., training: bool = False): 18 | """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). 19 | This is the same as the DropConnect impl I created for EfficientNet, etc networks, however, 20 | the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... 21 | See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for 22 | changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 23 | 'survival rate' as the argument. 24 | """ 25 | if drop_prob == 0. 
or not training: 26 | return x 27 | keep_prob = 1 - drop_prob 28 | shape = (x.shape[0],) + (1,) * (x.ndim - 1) # work with diff dim tensors, not just 2D ConvNets 29 | random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device) 30 | random_tensor.floor_() # binarize 31 | output = x.div(keep_prob) * random_tensor 32 | return output 33 | 34 | 35 | class DropPath(nn.Module): 36 | """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). 37 | """ 38 | def __init__(self, drop_prob=None): 39 | super(DropPath, self).__init__() 40 | self.drop_prob = drop_prob 41 | 42 | def forward(self, x): 43 | return drop_path(x, self.drop_prob, self.training) -------------------------------------------------------------------------------- /lib/models/smpl.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script is brought from https://github.com/nkolot/SPIN 3 | Adhere to their licence to use this script 4 | """ 5 | 6 | import torch 7 | import numpy as np 8 | import os.path as osp 9 | from smplx import SMPL as _SMPL 10 | from smplx.body_models import ModelOutput 11 | from smplx.lbs import vertices2joints 12 | 13 | from lib.core.config import DATA_DIR 14 | 15 | # Map joints to SMPL joints 16 | JOINT_MAP = { 17 | 'OP Nose': 24, 'OP Neck': 12, 'OP RShoulder': 17, 18 | 'OP RElbow': 19, 'OP RWrist': 21, 'OP LShoulder': 16, 19 | 'OP LElbow': 18, 'OP LWrist': 20, 'OP MidHip': 0, 20 | 'OP RHip': 2, 'OP RKnee': 5, 'OP RAnkle': 8, 21 | 'OP LHip': 1, 'OP LKnee': 4, 'OP LAnkle': 7, 22 | 'OP REye': 25, 'OP LEye': 26, 'OP REar': 27, 23 | 'OP LEar': 28, 'OP LBigToe': 29, 'OP LSmallToe': 30, 24 | 'OP LHeel': 31, 'OP RBigToe': 32, 'OP RSmallToe': 33, 'OP RHeel': 34, 25 | 'Right Ankle': 8, 'Right Knee': 5, 'Right Hip': 45, 26 | 'Left Hip': 46, 'Left Knee': 4, 'Left Ankle': 7, 27 | 'Right Wrist': 21, 'Right Elbow': 19, 'Right Shoulder': 17, 28 | 'Left Shoulder': 16, 'Left Elbow': 18, 'Left Wrist': 20, 29 | 'Neck (LSP)': 47, 'Top of Head (LSP)': 48, 30 | 'Pelvis (MPII)': 49, 'Thorax (MPII)': 50, 31 | 'Spine (H36M)': 51, 'Jaw (H36M)': 52, 32 | 'Head (H36M)': 53, 'Nose': 24, 'Left Eye': 26, 33 | 'Right Eye': 25, 'Left Ear': 28, 'Right Ear': 27 34 | } 35 | JOINT_NAMES = [ 36 | 'OP Nose', 'OP Neck', 'OP RShoulder', 37 | 'OP RElbow', 'OP RWrist', 'OP LShoulder', 38 | 'OP LElbow', 'OP LWrist', 'OP MidHip', 39 | 'OP RHip', 'OP RKnee', 'OP RAnkle', 40 | 'OP LHip', 'OP LKnee', 'OP LAnkle', 41 | 'OP REye', 'OP LEye', 'OP REar', 42 | 'OP LEar', 'OP LBigToe', 'OP LSmallToe', 43 | 'OP LHeel', 'OP RBigToe', 'OP RSmallToe', 'OP RHeel', 44 | 'Right Ankle', 'Right Knee', 'Right Hip', 45 | 'Left Hip', 'Left Knee', 'Left Ankle', 46 | 'Right Wrist', 'Right Elbow', 'Right Shoulder', 47 | 'Left Shoulder', 'Left Elbow', 'Left Wrist', 48 | 'Neck (LSP)', 'Top of Head (LSP)', 49 | 'Pelvis (MPII)', 'Thorax (MPII)', 50 | 'Spine (H36M)', 'Jaw (H36M)', 51 | 'Head (H36M)', 'Nose', 'Left Eye', 52 | 'Right Eye', 'Left Ear', 'Right Ear' 53 | ] 54 | 55 | JOINT_IDS = {JOINT_NAMES[i]: i for i in range(len(JOINT_NAMES))} 56 | JOINT_REGRESSOR_TRAIN_EXTRA = osp.join(DATA_DIR, 'J_regressor_extra.npy') 57 | SMPL_MEAN_PARAMS = osp.join(DATA_DIR, 'smpl_mean_params.npz') 58 | SMPL_MODEL_DIR = DATA_DIR 59 | # H36M_TO_J17 = [6, 5, 4, 1, 2, 3, 16, 15, 14, 11, 12, 13, 8, 10, 0, 7, 9] 60 | # rankle,rknee,rhip,lhip,lknee,lankle,rwrist,relbow,rshoulder,lshoulder,lelbow,lwrist,neck,headtop,hip,Spine,Head 61 | 62 | H36M_TO_J17 = [6, 5, 4, 1, 2, 3, 16, 15, 14, 11, 12, 13, 8, 0, 7, 9, 
10] 63 | H36M_TO_J14 = [6, 5, 4, 1, 2, 3, 16, 15, 14, 11, 12, 13, 8, 10] 64 | H36M_TO_MPII3D = [6, 5, 4, 1, 2, 3, 16, 15, 14, 11, 12, 13, 8, 10, 0, 7, 9] 65 | 66 | OP_TO_J14 = [11, 10, 9, 12, 13, 14, 4, 3, 2, 5, 6, 7, 1, -1] 67 | J49_TO_J14 = list(range(25, 39)) 68 | J49_TO_MPII3D = list(range(25, 39)) + [39, 41, 43] 69 | J49_TO_H36M = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 41, 42, 43] 70 | # rankle,rknee.,rhip,lhip,lknee,lankle,rwrist,relbow,rshoulder,lshoulder,lelbow,lwrist,neck,hip,Spine,Jaw,Head 71 | 72 | REGRESSOR_DICT = { 73 | '3dpw': 'J_regressor_h36m.npy', 74 | 'mpii3d': None, 75 | 'h36m': 'J_regressor_h36m.npy' 76 | } 77 | JID_DICT = { 78 | '3dpw': H36M_TO_J14, 79 | 'h36m': H36M_TO_J17, 80 | 'mpii3d': J49_TO_MPII3D 81 | } 82 | 83 | 84 | class SMPL(_SMPL): 85 | """ Extension of the official SMPL implementation to support more joints """ 86 | 87 | def __init__(self, *args, **kwargs): 88 | super(SMPL, self).__init__(*args, **kwargs) 89 | joints = [JOINT_MAP[i] for i in JOINT_NAMES] 90 | J_regressor_extra = np.load(JOINT_REGRESSOR_TRAIN_EXTRA) 91 | self.register_buffer('J_regressor_extra', torch.tensor(J_regressor_extra, dtype=torch.float32)) 92 | self.joint_map = torch.tensor(joints, dtype=torch.long) 93 | 94 | def forward(self, *args, **kwargs): 95 | kwargs['get_skin'] = True 96 | smpl_output = super(SMPL, self).forward(*args, **kwargs) 97 | extra_joints = vertices2joints(self.J_regressor_extra, smpl_output.vertices) 98 | joints = torch.cat([smpl_output.joints, extra_joints], dim=1) 99 | joints = joints[:, self.joint_map, :] 100 | output = ModelOutput(vertices=smpl_output.vertices, 101 | global_orient=smpl_output.global_orient, 102 | body_pose=smpl_output.body_pose, 103 | joints=joints, 104 | betas=smpl_output.betas, 105 | full_pose=smpl_output.full_pose) 106 | return output 107 | 108 | 109 | def get_smpl_faces(): 110 | smpl = SMPL(SMPL_MODEL_DIR, batch_size=1, create_transl=False) 111 | return smpl.faces 112 | -------------------------------------------------------------------------------- /lib/models/spin.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script is brought from https://github.com/nkolot/SPIN 3 | Adhere to their licence to use this script 4 | """ 5 | 6 | import math 7 | import torch 8 | import numpy as np 9 | import os.path as osp 10 | import torch.nn as nn 11 | 12 | from lib.core.config import DATA_DIR 13 | from lib.utils.geometry import rotation_matrix_to_angle_axis, rot6d_to_rotmat 14 | from lib.models.smpl import SMPL, SMPL_MODEL_DIR, H36M_TO_J17, SMPL_MEAN_PARAMS 15 | 16 | 17 | class Regressor(nn.Module): 18 | def __init__(self, smpl_mean_params=SMPL_MEAN_PARAMS, feat_dim=2048, hidden_dim=1024, **kwargs): 19 | super(Regressor, self).__init__() 20 | 21 | self.smpl = SMPL( 22 | SMPL_MODEL_DIR, 23 | create_transl=False, 24 | create_global_orient=False, 25 | create_body_pose=False, 26 | create_betas=False, 27 | ) 28 | npose = 24 * 6 29 | nshape = 10 30 | 31 | self.fc1 = nn.Linear(feat_dim + npose + nshape + 3, hidden_dim) 32 | self.drop1 = nn.Dropout() 33 | self.fc2 = nn.Linear(hidden_dim, hidden_dim) 34 | self.drop2 = nn.Dropout() 35 | self.decpose = nn.Linear(hidden_dim, npose) 36 | self.decshape = nn.Linear(hidden_dim, nshape) 37 | self.deccam = nn.Linear(hidden_dim, 3) 38 | nn.init.xavier_uniform_(self.decpose.weight, gain=0.01) 39 | nn.init.xavier_uniform_(self.decshape.weight, gain=0.01) 40 | nn.init.xavier_uniform_(self.deccam.weight, gain=0.01) 41 | 42 | mean_params = 
np.load(smpl_mean_params) 43 | init_pose = torch.from_numpy(mean_params['pose'][:]).unsqueeze(0) 44 | init_shape = torch.from_numpy(mean_params['shape'][:].astype('float32')).unsqueeze(0) 45 | init_cam = torch.from_numpy(mean_params['cam']).unsqueeze(0) 46 | self.register_buffer('init_pose', init_pose) 47 | self.register_buffer('init_shape', init_shape) 48 | self.register_buffer('init_cam', init_cam) 49 | 50 | 51 | def iterative_regress(self, x, init_pose=None, init_shape=None, init_cam=None, n_iter=3): 52 | nt = x.shape[0] 53 | 54 | if init_pose is None: 55 | init_pose = self.init_pose.expand(nt, -1) 56 | if init_shape is None: 57 | init_shape = self.init_shape.expand(nt, -1) 58 | if init_cam is None: 59 | init_cam = self.init_cam.expand(nt, -1) 60 | 61 | pred_pose = init_pose 62 | pred_shape = init_shape 63 | pred_cam = init_cam 64 | for i in range(n_iter): 65 | xc = torch.cat([x, pred_pose, pred_shape, pred_cam], 1) 66 | xc = self.fc1(xc) 67 | xc = self.drop1(xc) 68 | xc = self.fc2(xc) 69 | xc = self.drop2(xc) 70 | pred_pose = self.decpose(xc) + pred_pose 71 | pred_shape = self.decshape(xc) + pred_shape 72 | pred_cam = self.deccam(xc) + pred_cam 73 | 74 | return pred_pose, pred_shape, pred_cam 75 | 76 | def forward(self, x, seqlen, J_regressor=None, 77 | init_pose=None, init_shape=None, init_cam=None, n_iter=3, **kwargs): 78 | nt = x.shape[0] 79 | N = nt//seqlen 80 | 81 | pred_pose, pred_shape, pred_cam = self.iterative_regress(x, init_pose, init_shape, init_cam, n_iter=3) 82 | output_regress = self.get_output(pred_pose, pred_shape, pred_cam, J_regressor) 83 | 84 | return output_regress 85 | 86 | 87 | def get_output(self, pred_pose, pred_shape, pred_cam, J_regressor): 88 | output = {} 89 | nt = pred_pose.shape[0] 90 | pred_rotmat = rot6d_to_rotmat(pred_pose).reshape(nt, -1, 3, 3) 91 | 92 | pred_output = self.smpl( 93 | betas=pred_shape, 94 | body_pose=pred_rotmat[:, 1:], 95 | global_orient=pred_rotmat[:, 0].unsqueeze(1), 96 | pose2rot=False 97 | ) 98 | pred_vertices = pred_output.vertices[:nt] 99 | pred_joints = pred_output.joints[:nt] 100 | if J_regressor is not None: 101 | J_regressor_batch = J_regressor[None, :].expand(pred_vertices.shape[0], -1, -1).to(pred_vertices.device) 102 | pred_joints = torch.matmul(J_regressor_batch, pred_vertices) 103 | pred_keypoints_2d = projection(pred_joints, pred_cam) 104 | pose = rotation_matrix_to_angle_axis(pred_rotmat.reshape(-1, 3, 3)).reshape(nt, -1) 105 | output['theta'] = torch.cat([pred_cam, pose, pred_shape], dim=1) 106 | output['verts'] = pred_vertices 107 | output['kp_2d'] = pred_keypoints_2d 108 | output['kp_3d'] = pred_joints 109 | output['rotmat'] = pred_rotmat 110 | return output 111 | 112 | 113 | def projection(pred_joints, pred_camera): 114 | pred_cam_t = torch.stack([pred_camera[:, 1], 115 | pred_camera[:, 2], 116 | 2 * 5000. / (224. * pred_camera[:, 0] + 1e-9)], dim=-1) 117 | batch_size = pred_joints.shape[0] 118 | camera_center = torch.zeros(batch_size, 2) 119 | pred_keypoints_2d = perspective_projection(pred_joints, 120 | rotation=torch.eye(3).unsqueeze(0).expand(batch_size, -1, -1).to(pred_joints.device), 121 | translation=pred_cam_t, 122 | focal_length=5000., 123 | camera_center=camera_center) 124 | # Normalize keypoints to [-1,1] 125 | pred_keypoints_2d = pred_keypoints_2d / (224. / 2.) 126 | return pred_keypoints_2d 127 | 128 | 129 | def perspective_projection(points, rotation, translation, 130 | focal_length, camera_center): 131 | """ 132 | This function computes the perspective projection of a set of points. 
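    With intrinsics K assembled from focal_length and camera_center (cx, cy), each point
    is mapped as X_cam = R @ X + t and then projected to
    u = focal_length * X_cam_x / X_cam_z + cx,  v = focal_length * X_cam_y / X_cam_z + cy.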
133 | Input: 134 | points (bs, N, 3): 3D points 135 | rotation (bs, 3, 3): Camera rotation 136 | translation (bs, 3): Camera translation 137 | focal_length (bs,) or scalar: Focal length 138 | camera_center (bs, 2): Camera center 139 | """ 140 | batch_size = points.shape[0] 141 | K = torch.zeros([batch_size, 3, 3], device=points.device) 142 | K[:,0,0] = focal_length 143 | K[:,1,1] = focal_length 144 | K[:,2,2] = 1. 145 | K[:,:-1, -1] = camera_center 146 | 147 | # Transform points 148 | points = torch.einsum('bij,bkj->bki', rotation, points) 149 | points = points + translation.unsqueeze(1) 150 | 151 | # Apply perspective distortion 152 | projected_points = points / points[:,:,-1].unsqueeze(-1) 153 | 154 | # Apply camera intrinsics 155 | projected_points = torch.einsum('bij,bkj->bki', K, projected_points) 156 | 157 | return projected_points[:, :, :-1] 158 | -------------------------------------------------------------------------------- /lib/models/tokenpose.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import numpy as np 4 | # , HybridEmbed, PatchEmbed, Block 5 | from lib.models.vision_transformer import VisionTransformer, trunc_normal_, Block 6 | from lib.utils.geometry import rot6d_to_rotmat, rotation_matrix_to_angle_axis 7 | from lib.models.smpl import SMPL, SMPL_MODEL_DIR, SMPL_MEAN_PARAMS 8 | from lib.models.spin import projection 9 | 10 | 11 | class TokenPoseRot6d(VisionTransformer): 12 | def __init__(self, img_size=224, joints_num=24, pred_rot_dim=6, patch_size=16, in_chans=3, num_classes=1000, embed_dim=768, depth=12, 13 | num_heads=12, mlp_ratio=4, qkv_bias=False, qk_scale=None, representation_size=None, 14 | drop_rate=0, attn_drop_rate=0, drop_path_rate=0, hybrid_backbone=None, 15 | token_init_mode='normal', proj_rot_mode='linear', use_joint2d_head=False, 16 | norm_layer=nn.LayerNorm, st_mode='vanilla', contraint_token_delta=False, seq_length=16, 17 | use_rot6d_to_token_head=False, mask_ratio=0., 18 | temporal_layers=3, temporal_num_heads=1, 19 | enable_temp_modeling=True, enable_temp_embedding=False): 20 | 21 | super().__init__(img_size=img_size, patch_size=patch_size, in_chans=in_chans, 22 | num_classes=num_classes, embed_dim=embed_dim, depth=depth, num_heads=num_heads, 23 | mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale, 24 | representation_size=representation_size, drop_rate=drop_rate, 25 | attn_drop_rate=attn_drop_rate, drop_path_rate=drop_path_rate, 26 | hybrid_backbone=hybrid_backbone, norm_layer=norm_layer, st_mode=st_mode) 27 | 28 | # joints tokens 29 | self.joint3d_tokens = nn.Parameter(torch.zeros(1, joints_num, embed_dim)) 30 | self.shape_token = nn.Parameter(torch.zeros(1, 1, embed_dim)) 31 | self.cam_token = nn.Parameter(torch.zeros(1, 1, embed_dim)) 32 | 33 | self.joints_num = joints_num 34 | 35 | self._init_tokens(mode=token_init_mode) 36 | 37 | self.return_tokens = contraint_token_delta 38 | 39 | self.joint3d_head = nn.Linear(embed_dim, pred_rot_dim) 40 | self.shape_head = nn.Linear(embed_dim, 10) 41 | self.cam_head = nn.Linear(embed_dim, 3) 42 | 43 | self.apply(self._init_weights) 44 | 45 | self.smpl = SMPL( 46 | SMPL_MODEL_DIR, 47 | create_transl=False, 48 | create_global_orient=False, 49 | create_body_pose=False, 50 | create_betas=False, 51 | ) 52 | 53 | self.enable_temp_modeling = False 54 | self.reconstruct = False 55 | 56 | if enable_temp_modeling: 57 | self.enable_temp_modeling = enable_temp_modeling 58 | # stochastic depth decay rule 59 | dpr = [x.item() for x 
in torch.linspace(0, drop_path_rate, depth)] 60 | self.temporal_transformer = nn.ModuleList([ 61 | Block(dim=embed_dim, num_heads=temporal_num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale, 62 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer, 63 | st_mode='vanilla') for i in range(temporal_layers)]) 64 | 65 | self.enable_pos_embedding = False 66 | if enable_temp_embedding: 67 | self.enable_pos_embedding = True 68 | self.temporal_pos_embedding = nn.Parameter(torch.zeros(1, seq_length, embed_dim)) 69 | trunc_normal_(self.temporal_pos_embedding, std=.02) 70 | 71 | self.mask_ratio = mask_ratio 72 | 73 | if mask_ratio > 0.: 74 | self.reconstruct = True 75 | self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim)) 76 | 77 | del self.head, self.pre_logits 78 | 79 | def random_masking(self, x, mask_ratio): 80 | """ 81 | Perform per-sample random masking by per-sample shuffling. 82 | Per-sample shuffling is done by argsort random noise. 83 | x: [N, L, D], sequence 84 | """ 85 | N, L, D = x.shape # batch, length, dim 86 | len_keep = int(L * (1 - mask_ratio)) 87 | 88 | noise = torch.rand(N, L, device=x.device) # noise in [0, 1] 89 | 90 | # sort noise for each sample 91 | # ascend: small is keep, large is remove 92 | ids_shuffle = torch.argsort(noise, dim=1) 93 | ids_restore = torch.argsort(ids_shuffle, dim=1) 94 | 95 | # keep the first subset 96 | # ids_keep = ids_shuffle[:, :len_keep] 97 | # x_keep = torch.gather(x, dim=1, index=ids_keep.unsqueeze(-1).repeat(1, 1, D)) 98 | 99 | # generate the binary mask: 0 is keep, 1 is remove 100 | mask = torch.ones([N, L], device=x.device) 101 | mask[:, :len_keep] = 0 102 | # unshuffle to get the binary mask 103 | mask = torch.gather(mask, dim=1, index=ids_restore) 104 | 105 | x[mask.long(), :] = self.mask_token # .expand(mask.shape[0], mask.shape[1], -1) 106 | x_masked = x 107 | 108 | return x_masked 109 | 110 | def _init_tokens(self, mode='normal'): 111 | if mode == 'normal': 112 | trunc_normal_(self.joint3d_tokens, std=.02) 113 | trunc_normal_(self.shape_token, std=.02) 114 | trunc_normal_(self.cam_token, std=.02) 115 | else: 116 | print("zero initialize tokens") 117 | pass 118 | 119 | def forward_features(self, x, seqlen=1): 120 | B = x.shape[0] # (NT, 3, H, W) 121 | 122 | x = self.patch_embed(x) # (NT, 14*14, 2048) (bs, seq, embedding_size) 123 | 124 | joint3d_tokens = self.joint3d_tokens.expand(B, -1, -1) 125 | shape_token = self.shape_token.expand(B, -1, -1) 126 | cam_token = self.cam_token.expand(B, -1, -1) 127 | 128 | cls_token = self.cls_token.expand(B, -1, -1) 129 | x = torch.cat([cls_token, x], dim=1) 130 | x = x + self.pos_embed[:, :, :] 131 | 132 | x = torch.cat([joint3d_tokens, shape_token, cam_token, x], dim=1) 133 | # [NT, HW+24+1+1, embedding_size] 134 | 135 | x = self.pos_drop(x) 136 | 137 | for blk in self.blocks: 138 | x = blk(x, seqlen) 139 | 140 | x = self.norm(x) 141 | joint3d_tokens = x[:, :self.joints_num] 142 | shape_token = x[:, self.joints_num] 143 | cam_token = x[:, self.joints_num + 1] 144 | 145 | return joint3d_tokens, shape_token, cam_token 146 | 147 | def temporal_modeling(self, joint3d_tokens, seq_length, mask_ratio=0.): 148 | # joint3d_tokens [B, N, C] 149 | B, N, C = joint3d_tokens.shape 150 | joint3d_tokens = joint3d_tokens.reshape(-1,seq_length, N, C).permute(0, 2, 1, 3) 151 | joint3d_tokens_temporal = joint3d_tokens.reshape(-1, seq_length, C) 152 | 153 | # [bs*N, seq_length, C] 154 | if self.enable_pos_embedding: 155 | if self.temporal_pos_embedding.shape[1] 
!=seq_length: 156 | 157 | temporal_pos_embedding = torch.nn.functional.interpolate( 158 | self.temporal_pos_embedding.data.permute(0,2,1), 159 | size=seq_length, 160 | mode='linear' 161 | ).permute(0, 2, 1) 162 | self.temporal_pos_embedding = torch.nn.Parameter(temporal_pos_embedding) 163 | joint3d_tokens_temporal += self.temporal_pos_embedding[:, :seq_length, :] 164 | 165 | if self.training and mask_ratio > 0.: 166 | joint3d_tokens_temporal_masked = self.random_masking( 167 | joint3d_tokens_temporal, mask_ratio) 168 | else: 169 | joint3d_tokens_temporal_masked = joint3d_tokens_temporal 170 | 171 | for blk in self.temporal_transformer: 172 | joint3d_tokens_temporal_masked = blk(joint3d_tokens_temporal_masked) 173 | 174 | pred_joint3d_tokens_temporal = joint3d_tokens_temporal_masked.reshape( 175 | -1, N, seq_length, C).permute(0, 2, 1, 3) 176 | pred_joint3d_tokens_temporal = pred_joint3d_tokens_temporal.reshape(B, N, C) 177 | return pred_joint3d_tokens_temporal 178 | 179 | def forward(self, x, J_regressor=None, **kwargs): 180 | 181 | batch_size, seqlen = x.shape[:2] 182 | x = x.reshape(-1, x.shape[-3], x.shape[-2], x.shape[-1]) # (NT, 3, H, W) 183 | 184 | joint3d_tokens, shape_token, cam_token = self.forward_features(x, seqlen) 185 | 186 | if self.enable_temp_modeling: 187 | joint3d_tokens_before = joint3d_tokens.clone().detach_() 188 | joint3d_tokens = self.temporal_modeling( 189 | joint3d_tokens, seqlen, mask_ratio=self.mask_ratio) 190 | 191 | # # [bs*seq_length, N, embed_dim] 192 | pred_joints_rot6d = self.joint3d_head(joint3d_tokens) # [b, 24, 6] 193 | pred_shape = self.shape_head(shape_token) 194 | pred_cam = self.cam_head(cam_token) 195 | 196 | output = {} 197 | 198 | # mse loss 199 | if self.reconstruct and self.training: 200 | reconstruct_loss = (joint3d_tokens - joint3d_tokens_before)**2 201 | reconstruct_loss = reconstruct_loss.mean() 202 | output['reconstruct_loss'] = reconstruct_loss 203 | 204 | nt = pred_joints_rot6d.shape[0] 205 | pred_rotmat = rot6d_to_rotmat(pred_joints_rot6d).reshape(nt, -1, 3, 3) 206 | 207 | pred_output = self.smpl( 208 | betas=pred_shape, 209 | body_pose=pred_rotmat[:, 1:], 210 | global_orient=pred_rotmat[:, 0].unsqueeze(1), 211 | pose2rot=False 212 | ) 213 | 214 | pred_vertices = pred_output.vertices[:nt] 215 | pred_joints = pred_output.joints[:nt] 216 | 217 | if J_regressor is not None: 218 | J_regressor_batch = J_regressor[None, :].expand( 219 | pred_vertices.shape[0], -1, -1).to(pred_vertices.device) 220 | pred_joints = torch.matmul(J_regressor_batch, pred_vertices) 221 | 222 | pred_keypoints_2d = projection(pred_joints, pred_cam) 223 | 224 | pose = rotation_matrix_to_angle_axis( 225 | pred_rotmat.reshape(-1, 3, 3)).reshape(nt, -1) 226 | 227 | output['theta'] = torch.cat([pred_cam, pose, pred_shape], dim=1) 228 | output['verts'] = pred_vertices 229 | output['kp_2d'] = pred_keypoints_2d 230 | output['kp_3d'] = pred_joints 231 | output['rotmat'] = pred_rotmat 232 | 233 | output['theta'] = output['theta'].reshape(batch_size, seqlen, -1) 234 | output['verts'] = output['verts'].reshape(batch_size, seqlen, -1, 3) 235 | output['kp_2d'] = output['kp_2d'].reshape(batch_size, seqlen, -1, 2) 236 | output['kp_3d'] = output['kp_3d'].reshape(batch_size, seqlen, -1, 3) 237 | output['rotmat'] = output['rotmat'].reshape(batch_size, seqlen, -1, 3, 3) 238 | 239 | return output 240 | 241 | 242 | from lib.models.resnetv2 import ResNetV2 243 | import torch.utils.model_zoo as model_zoo 244 | from lib.models.vision_transformer import _conv_filter, model_urls 245 | from 
functools import partial 246 | 247 | 248 | def Token3d(num_blocks, num_heads, st_mode, pretrained=True, proj_rot_mode='linear', 249 | use_joint2d_head=False, contraint_token_delta=False, 250 | use_rot6d_to_token_head=False, mask_ratio=0., 251 | temporal_layers=3, temporal_num_heads=1, 252 | enable_temp_modeling=True, enable_temp_embedding=False, 253 | **kwargs): 254 | """ Hybrid model with a R50 and a Vit of custom layers . 255 | """ 256 | # create a ResNetV2 w/o pre-activation, that uses StdConv and GroupNorm and has 3 stages, no head 257 | backbone = ResNetV2( 258 | layers=(3, 4, 9), num_classes=0, global_pool='', in_chans=kwargs.get('in_chans', 3), 259 | preact=False, stem_type='same') 260 | model = TokenPoseRot6d( 261 | patch_size=16, embed_dim=768, depth=num_blocks, num_heads=num_heads, 262 | hybrid_backbone=backbone, mlp_ratio=4, qkv_bias=True, 263 | representation_size=768, norm_layer=partial(nn.LayerNorm, eps=1e-6), 264 | st_mode=st_mode, proj_rot_mode=proj_rot_mode, 265 | use_joint2d_head=use_joint2d_head, 266 | contraint_token_delta=contraint_token_delta, 267 | use_rot6d_to_token_head=use_rot6d_to_token_head, 268 | mask_ratio=mask_ratio, 269 | temporal_layers=temporal_layers, 270 | temporal_num_heads=temporal_num_heads, 271 | enable_temp_modeling=enable_temp_modeling, 272 | enable_temp_embedding=enable_temp_embedding, 273 | **kwargs) 274 | if pretrained: 275 | state_dict = model_zoo.load_url( 276 | model_urls['vit_base_resnet50_224_in21k'], progress=False, map_location='cpu') 277 | state_dict = _conv_filter(state_dict) 278 | del state_dict['head.weight'] 279 | del state_dict['head.bias'] 280 | model.load_state_dict(state_dict, strict=False) 281 | return model -------------------------------------------------------------------------------- /lib/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/lib/utils/__init__.py -------------------------------------------------------------------------------- /lib/utils/demo_utils.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | This script is brought from https://github.com/mkocabas/VIBE 4 | Adhere to their licence to use this script 5 | """ 6 | 7 | import os 8 | import cv2 9 | import time 10 | import json 11 | import torch 12 | import subprocess 13 | import numpy as np 14 | import os.path as osp 15 | from pytube import YouTube 16 | from collections import OrderedDict 17 | 18 | from lib.utils.smooth_bbox import get_smooth_bbox_params, get_all_bbox_params 19 | from lib.utils.geometry import rotation_matrix_to_angle_axis 20 | 21 | 22 | def download_youtube_clip(url, download_folder): 23 | return YouTube(url).streams.first().download(output_path=download_folder) 24 | 25 | 26 | def trim_videos(filename, start_time, end_time, output_filename): 27 | command = ['ffmpeg', 28 | '-i', '"%s"' % filename, 29 | '-ss', str(start_time), 30 | '-t', str(end_time - start_time), 31 | '-c:v', 'libx264', '-c:a', 'copy', 32 | '-threads', '1', 33 | '-loglevel', 'panic', 34 | '"%s"' % output_filename] 35 | # command = ' '.join(command) 36 | subprocess.call(command) 37 | 38 | 39 | def video_to_images(vid_file, img_folder=None, return_info=False): 40 | if img_folder is None: 41 | img_folder = osp.join('/tmp', osp.basename(vid_file).replace('.', '_')) 42 | 43 | os.makedirs(img_folder, exist_ok=True) 44 | 45 | command = 
['/mnt/lustre/wanziniu/ffmpeg-4.3.1-amd64-static/ffmpeg', 46 | '-i', vid_file, 47 | '-f', 'image2', 48 | '-v', 'error', 49 | f'{img_folder}/%06d.png'] 50 | print(f'Running \"{" ".join(command)}\"') 51 | subprocess.call(command) 52 | 53 | print(f'Images saved to \"{img_folder}\"') 54 | 55 | img_shape = cv2.imread(osp.join(img_folder, '000001.png')).shape 56 | 57 | if return_info: 58 | return img_folder, len(os.listdir(img_folder)), img_shape 59 | else: 60 | return img_folder 61 | 62 | 63 | def download_url(url, outdir): 64 | print(f'Downloading files from {url}') 65 | cmd = ['wget', '-c', url, '-P', outdir] 66 | subprocess.call(cmd) 67 | 68 | 69 | def download_ckpt(outdir='data/vibe_data', use_3dpw=False): 70 | os.makedirs(outdir, exist_ok=True) 71 | 72 | if use_3dpw: 73 | ckpt_file = 'data/vibe_data/vibe_model_w_3dpw.pth.tar' 74 | url = 'https://www.dropbox.com/s/41ozgqorcp095ja/vibe_model_w_3dpw.pth.tar' 75 | if not os.path.isfile(ckpt_file): 76 | download_url(url=url, outdir=outdir) 77 | else: 78 | ckpt_file = 'data/vibe_data/vibe_model_wo_3dpw.pth.tar' 79 | url = 'https://www.dropbox.com/s/amj2p8bmf6g56k6/vibe_model_wo_3dpw.pth.tar' 80 | if not os.path.isfile(ckpt_file): 81 | download_url(url=url, outdir=outdir) 82 | 83 | return ckpt_file 84 | 85 | 86 | def images_to_video(img_folder, output_vid_file): 87 | os.makedirs(img_folder, exist_ok=True) 88 | 89 | command = [ 90 | 'ffmpeg', '-y', '-threads', '16', '-i', f'{img_folder}/%06d.png', '-profile:v', 'baseline', 91 | '-level', '3.0', '-c:v', 'libx264', '-pix_fmt', 'yuv420p', '-an', '-v', 'error', output_vid_file, 92 | ] 93 | 94 | print(f'Running \"{" ".join(command)}\"') 95 | subprocess.call(command) 96 | 97 | 98 | def convert_crop_cam_to_orig_img(cam, bbox, img_width, img_height): 99 | ''' 100 | Convert predicted camera from cropped image coordinates 101 | to original image coordinates 102 | :param cam (ndarray, shape=(3,)): weak perspective camera in cropped img coordinates 103 | :param bbox (ndarray, shape=(4,)): bbox coordinates (c_x, c_y, h) 104 | :param img_width (int): original image width 105 | :param img_height (int): original image height 106 | :return: 107 | ''' 108 | cx, cy, w, h = bbox[:,0], bbox[:,1], bbox[:,2], bbox[:,3] 109 | hw, hh = img_width / 2., img_height / 2. 110 | sx = cam[:,0] * (1. / (img_width / w)) 111 | sy = cam[:,0] * (1. 
/ (img_height / h)) 112 | tx = ((cx - hw) / hw / sx) + cam[:,1] 113 | ty = ((cy - hh) / hh / sy) + cam[:,2] 114 | orig_cam = np.stack([sx, sy, tx, ty]).T 115 | return orig_cam 116 | 117 | 118 | def prepare_rendering_results(vibe_results, nframes): 119 | frame_results = [{} for _ in range(nframes)] 120 | for person_id, person_data in vibe_results.items(): 121 | for idx, frame_id in enumerate(person_data['frame_ids']): 122 | frame_results[frame_id][person_id] = { 123 | 'verts': person_data['verts'][idx], 124 | 'cam': person_data['orig_cam'][idx], 125 | } 126 | 127 | # naive depth ordering based on the scale of the weak perspective camera 128 | for frame_id, frame_data in enumerate(frame_results): 129 | # sort based on y-scale of the cam in original image coords 130 | sort_idx = np.argsort([v['cam'][1] for k,v in frame_data.items()]) 131 | frame_results[frame_id] = OrderedDict( 132 | {list(frame_data.keys())[i]:frame_data[list(frame_data.keys())[i]] for i in sort_idx} 133 | ) 134 | 135 | return frame_results 136 | -------------------------------------------------------------------------------- /lib/utils/eval_utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | Some functions are borrowed from https://github.com/akanazawa/human_dynamics/blob/master/src/evaluation/eval_util.py 3 | Adhere to their licence to use these functions 4 | """ 5 | 6 | import torch 7 | import numpy as np 8 | 9 | 10 | def compute_accel(joints): 11 | """ 12 | Computes acceleration of 3D joints. 13 | Args: 14 | joints (Nx25x3). 15 | Returns: 16 | Accelerations (N-2). 17 | """ 18 | velocities = joints[1:] - joints[:-1] 19 | acceleration = velocities[1:] - velocities[:-1] 20 | acceleration_normed = np.linalg.norm(acceleration, axis=2) 21 | return np.mean(acceleration_normed, axis=1) 22 | 23 | 24 | def compute_error_accel(joints_gt, joints_pred, vis=None): 25 | """ 26 | Computes acceleration error: 27 | 1/(n-2) \sum_{i=1}^{n-1} X_{i-1} - 2X_i + X_{i+1} 28 | Note that for each frame that is not visible, three entries in the 29 | acceleration error should be zero'd out. 30 | Args: 31 | joints_gt (Nx14x3). 32 | joints_pred (Nx14x3). 33 | vis (N). 34 | Returns: 35 | error_accel (N-2). 36 | """ 37 | # (N-2)x14x3 38 | accel_gt = joints_gt[:-2] - 2 * joints_gt[1:-1] + joints_gt[2:] 39 | accel_pred = joints_pred[:-2] - 2 * joints_pred[1:-1] + joints_pred[2:] 40 | 41 | normed = np.linalg.norm(accel_pred - accel_gt, axis=2) 42 | 43 | if vis is None: 44 | new_vis = np.ones(len(normed), dtype=bool) 45 | else: 46 | invis = np.logical_not(vis) 47 | invis1 = np.roll(invis, -1) 48 | invis2 = np.roll(invis, -2) 49 | new_invis = np.logical_or(invis, np.logical_or(invis1, invis2))[:-2] 50 | new_vis = np.logical_not(new_invis) 51 | 52 | return np.mean(normed[new_vis], axis=1) 53 | 54 | 55 | def compute_error_verts(pred_verts, target_verts=None, target_theta=None): 56 | """ 57 | Computes MPJPE over 6890 surface vertices. 58 | Args: 59 | verts_gt (Nx6890x3). 60 | verts_pred (Nx6890x3). 61 | Returns: 62 | error_verts (N). 
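    If target_verts is None, ground-truth vertices are first regenerated from target_theta,
    which is expected to be laid out per frame as [cam(3) | pose(72) | shape(10)], via a
    CPU SMPL forward pass evaluated in chunks of 5000 frames.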
63 | """ 64 | 65 | if target_verts is None: 66 | from lib.models.smpl import SMPL_MODEL_DIR 67 | from lib.models.smpl import SMPL 68 | device = 'cpu' 69 | smpl = SMPL( 70 | SMPL_MODEL_DIR, 71 | batch_size=1, # target_theta.shape[0], 72 | ).to(device) 73 | 74 | betas = torch.from_numpy(target_theta[:,75:]).to(device) 75 | pose = torch.from_numpy(target_theta[:,3:75]).to(device) 76 | 77 | target_verts = [] 78 | b_ = torch.split(betas, 5000) 79 | p_ = torch.split(pose, 5000) 80 | 81 | for b,p in zip(b_,p_): 82 | output = smpl(betas=b, body_pose=p[:, 3:], global_orient=p[:, :3], pose2rot=True) 83 | target_verts.append(output.vertices.detach().cpu().numpy()) 84 | 85 | target_verts = np.concatenate(target_verts, axis=0) 86 | 87 | assert len(pred_verts) == len(target_verts) 88 | error_per_vert = np.sqrt(np.sum((target_verts - pred_verts) ** 2, axis=2)) 89 | return np.mean(error_per_vert, axis=1) 90 | 91 | 92 | def compute_similarity_transform(S1, S2): 93 | ''' 94 | Computes a similarity transform (sR, t) that takes 95 | a set of 3D points S1 (3 x N) closest to a set of 3D points S2, 96 | where R is an 3x3 rotation matrix, t 3x1 translation, s scale. 97 | i.e. solves the orthogonal Procrutes problem. 98 | ''' 99 | transposed = False 100 | if S1.shape[0] != 3 and S1.shape[0] != 2: 101 | S1 = S1.T 102 | S2 = S2.T 103 | transposed = True 104 | assert(S2.shape[1] == S1.shape[1]) 105 | 106 | # 1. Remove mean. 107 | mu1 = S1.mean(axis=1, keepdims=True) 108 | mu2 = S2.mean(axis=1, keepdims=True) 109 | X1 = S1 - mu1 110 | X2 = S2 - mu2 111 | 112 | # 2. Compute variance of X1 used for scale. 113 | var1 = np.sum(X1**2) 114 | 115 | # 3. The outer product of X1 and X2. 116 | K = X1.dot(X2.T) 117 | 118 | # 4. Solution that Maximizes trace(R'K) is R=U*V', where U, V are 119 | # singular vectors of K. 120 | U, s, Vh = np.linalg.svd(K) 121 | V = Vh.T 122 | # Construct Z that fixes the orientation of R to get det(R)=1. 123 | Z = np.eye(U.shape[0]) 124 | Z[-1, -1] *= np.sign(np.linalg.det(U.dot(V.T))) 125 | # Construct R. 126 | R = V.dot(Z.dot(U.T)) 127 | 128 | # 5. Recover scale. 129 | scale = np.trace(R.dot(K)) / var1 130 | 131 | # 6. Recover translation. 132 | t = mu2 - scale*(R.dot(mu1)) 133 | 134 | # 7. Error: 135 | S1_hat = scale*R.dot(S1) + t 136 | 137 | if transposed: 138 | S1_hat = S1_hat.T 139 | 140 | return S1_hat 141 | 142 | 143 | def compute_similarity_transform_torch(S1, S2): 144 | ''' 145 | Computes a similarity transform (sR, t) that takes 146 | a set of 3D points S1 (3 x N) closest to a set of 3D points S2, 147 | where R is an 3x3 rotation matrix, t 3x1 translation, s scale. 148 | i.e. solves the orthogonal Procrutes problem. 149 | ''' 150 | transposed = False 151 | if S1.shape[0] != 3 and S1.shape[0] != 2: 152 | S1 = S1.T 153 | S2 = S2.T 154 | transposed = True 155 | assert (S2.shape[1] == S1.shape[1]) 156 | 157 | # 1. Remove mean. 158 | mu1 = S1.mean(axis=1, keepdims=True) 159 | mu2 = S2.mean(axis=1, keepdims=True) 160 | X1 = S1 - mu1 161 | X2 = S2 - mu2 162 | 163 | # print('X1', X1.shape) 164 | 165 | # 2. Compute variance of X1 used for scale. 166 | var1 = torch.sum(X1 ** 2) 167 | 168 | # print('var', var1.shape) 169 | 170 | # 3. The outer product of X1 and X2. 171 | K = X1.mm(X2.T) 172 | 173 | # 4. Solution that Maximizes trace(R'K) is R=U*V', where U, V are 174 | # singular vectors of K. 175 | U, s, V = torch.svd(K) 176 | # V = Vh.T 177 | # Construct Z that fixes the orientation of R to get det(R)=1. 
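    # Without this step U @ V.T may have determinant -1, i.e. a reflection rather than a
    # rotation; negating the last column through Z gives the closest proper rotation (det(R)=+1).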
178 | Z = torch.eye(U.shape[0], device=S1.device) 179 | Z[-1, -1] *= torch.sign(torch.det(U @ V.T)) 180 | # Construct R. 181 | R = V.mm(Z.mm(U.T)) 182 | 183 | # print('R', X1.shape) 184 | 185 | # 5. Recover scale. 186 | scale = torch.trace(R.mm(K)) / var1 187 | # print(R.shape, mu1.shape) 188 | # 6. Recover translation. 189 | t = mu2 - scale * (R.mm(mu1)) 190 | # print(t.shape) 191 | 192 | # 7. Error: 193 | S1_hat = scale * R.mm(S1) + t 194 | 195 | if transposed: 196 | S1_hat = S1_hat.T 197 | 198 | return S1_hat 199 | 200 | 201 | def batch_compute_similarity_transform_torch(S1, S2): 202 | ''' 203 | Computes a similarity transform (sR, t) that takes 204 | a set of 3D points S1 (3 x N) closest to a set of 3D points S2, 205 | where R is an 3x3 rotation matrix, t 3x1 translation, s scale. 206 | i.e. solves the orthogonal Procrutes problem. 207 | ''' 208 | transposed = False 209 | if S1.shape[0] != 3 and S1.shape[0] != 2: 210 | S1 = S1.permute(0,2,1) 211 | S2 = S2.permute(0,2,1) 212 | transposed = True 213 | assert(S2.shape[1] == S1.shape[1]) 214 | 215 | # 1. Remove mean. 216 | mu1 = S1.mean(axis=-1, keepdims=True) 217 | mu2 = S2.mean(axis=-1, keepdims=True) 218 | 219 | X1 = S1 - mu1 220 | X2 = S2 - mu2 221 | 222 | # 2. Compute variance of X1 used for scale. 223 | var1 = torch.sum(X1**2, dim=1).sum(dim=1) 224 | 225 | # 3. The outer product of X1 and X2. 226 | K = X1.bmm(X2.permute(0,2,1)) 227 | 228 | # 4. Solution that Maximizes trace(R'K) is R=U*V', where U, V are 229 | # singular vectors of K. 230 | U, s, V = torch.svd(K) 231 | 232 | # Construct Z that fixes the orientation of R to get det(R)=1. 233 | Z = torch.eye(U.shape[1], device=S1.device).unsqueeze(0) 234 | Z = Z.repeat(U.shape[0],1,1) 235 | Z[:,-1, -1] *= torch.sign(torch.det(U.bmm(V.permute(0,2,1)))) 236 | 237 | # Construct R. 238 | R = V.bmm(Z.bmm(U.permute(0,2,1))) 239 | 240 | # 5. Recover scale. 241 | scale = torch.cat([torch.trace(x).unsqueeze(0) for x in R.bmm(K)]) / var1 242 | 243 | # 6. Recover translation. 244 | t = mu2 - (scale.unsqueeze(-1).unsqueeze(-1) * (R.bmm(mu1))) 245 | 246 | # 7. Error: 247 | S1_hat = scale.unsqueeze(-1).unsqueeze(-1) * R.bmm(S1) + t 248 | 249 | if transposed: 250 | S1_hat = S1_hat.permute(0,2,1) 251 | 252 | return S1_hat 253 | 254 | 255 | def align_by_pelvis(joints): 256 | """ 257 | Assumes joints is 14 x 3 in LSP order. 258 | Then hips are: [3, 2] 259 | Takes mid point of these points, then subtracts it. 260 | """ 261 | 262 | left_id = 2 263 | right_id = 3 264 | 265 | pelvis = (joints[left_id, :] + joints[right_id, :]) / 2.0 266 | return joints - np.expand_dims(pelvis, axis=0) 267 | 268 | 269 | def compute_errors(gt3ds, preds): 270 | """ 271 | Gets MPJPE after pelvis alignment + MPJPE after Procrustes. 272 | Evaluates on the 14 common joints. 273 | Inputs: 274 | - gt3ds: N x 14 x 3 275 | - preds: N x 14 x 3 276 | """ 277 | errors, errors_pa = [], [] 278 | for i, (gt3d, pred) in enumerate(zip(gt3ds, preds)): 279 | gt3d = gt3d.reshape(-1, 3) 280 | # Root align. 281 | gt3d = align_by_pelvis(gt3d) 282 | pred3d = align_by_pelvis(pred) 283 | 284 | joint_error = np.sqrt(np.sum((gt3d - pred3d)**2, axis=1)) 285 | errors.append(np.mean(joint_error)) 286 | 287 | # Get PA error. 
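        # "PA" = Procrustes-aligned: the prediction is first registered to the ground truth
        # with a similarity transform (scale, rotation, translation), so the resulting
        # PA-MPJPE measures articulation error independent of global orientation and scale.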
288 | pred3d_sym = compute_similarity_transform(pred3d, gt3d) 289 | pa_error = np.sqrt(np.sum((gt3d - pred3d_sym)**2, axis=1)) 290 | errors_pa.append(np.mean(pa_error)) 291 | 292 | return errors, errors_pa 293 | -------------------------------------------------------------------------------- /lib/utils/fbx_output.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | This script is brought from https://github.com/mkocabas/VIBE 4 | Adhere to their licence to use this script 5 | """ 6 | # Notes: 7 | # + Male and female gender models only 8 | # + Script can be run from command line or in Blender Editor (Text Editor>Run Script) 9 | # + Command line: Install mathutils module in your bpy virtualenv with 'pip install mathutils==2.81.2' 10 | 11 | import os 12 | import sys 13 | import bpy 14 | import time 15 | import joblib 16 | import argparse 17 | import numpy as np 18 | import addon_utils 19 | from math import radians 20 | from mathutils import Matrix, Vector, Quaternion, Euler 21 | 22 | # Globals 23 | male_model_path = 'data/SMPL_unity_v.1.0.0/smpl/Models/SMPL_m_unityDoubleBlends_lbs_10_scale5_207_v1.0.0.fbx' 24 | female_model_path = 'data/SMPL_unity_v.1.0.0/smpl/Models/SMPL_f_unityDoubleBlends_lbs_10_scale5_207_v1.0.0.fbx' 25 | 26 | fps_source = 30 27 | fps_target = 30 28 | 29 | gender = 'male' 30 | 31 | start_origin = 1 32 | 33 | bone_name_from_index = { 34 | 0 : 'Pelvis', 35 | 1 : 'L_Hip', 36 | 2 : 'R_Hip', 37 | 3 : 'Spine1', 38 | 4 : 'L_Knee', 39 | 5 : 'R_Knee', 40 | 6 : 'Spine2', 41 | 7 : 'L_Ankle', 42 | 8: 'R_Ankle', 43 | 9: 'Spine3', 44 | 10: 'L_Foot', 45 | 11: 'R_Foot', 46 | 12: 'Neck', 47 | 13: 'L_Collar', 48 | 14: 'R_Collar', 49 | 15: 'Head', 50 | 16: 'L_Shoulder', 51 | 17: 'R_Shoulder', 52 | 18: 'L_Elbow', 53 | 19: 'R_Elbow', 54 | 20: 'L_Wrist', 55 | 21: 'R_Wrist', 56 | 22: 'L_Hand', 57 | 23: 'R_Hand' 58 | } 59 | 60 | # Helper functions 61 | 62 | # Computes rotation matrix through Rodrigues formula as in cv2.Rodrigues 63 | # Source: smpl/plugins/blender/corrective_bpy_sh.py 64 | def Rodrigues(rotvec): 65 | theta = np.linalg.norm(rotvec) 66 | r = (rotvec/theta).reshape(3, 1) if theta > 0. 
else rotvec 67 | cost = np.cos(theta) 68 | mat = np.asarray([[0, -r[2], r[1]], 69 | [r[2], 0, -r[0]], 70 | [-r[1], r[0], 0]]) 71 | return(cost*np.eye(3) + (1-cost)*r.dot(r.T) + np.sin(theta)*mat) 72 | 73 | 74 | # Setup scene 75 | def setup_scene(model_path, fps_target): 76 | scene = bpy.data.scenes['Scene'] 77 | 78 | ########################### 79 | # Engine independent setup 80 | ########################### 81 | 82 | scene.render.fps = fps_target 83 | 84 | # Remove default cube 85 | if 'Cube' in bpy.data.objects: 86 | bpy.data.objects['Cube'].select_set(True) 87 | bpy.ops.object.delete() 88 | 89 | # Import gender specific .fbx template file 90 | bpy.ops.import_scene.fbx(filepath=model_path) 91 | 92 | 93 | # Process single pose into keyframed bone orientations 94 | def process_pose(current_frame, pose, trans, pelvis_position): 95 | 96 | if pose.shape[0] == 72: 97 | rod_rots = pose.reshape(24, 3) 98 | else: 99 | rod_rots = pose.reshape(26, 3) 100 | 101 | mat_rots = [Rodrigues(rod_rot) for rod_rot in rod_rots] 102 | 103 | # Set the location of the Pelvis bone to the translation parameter 104 | armature = bpy.data.objects['Armature'] 105 | bones = armature.pose.bones 106 | 107 | # Pelvis: X-Right, Y-Up, Z-Forward (Blender -Y) 108 | 109 | # Set absolute pelvis location relative to Pelvis bone head 110 | bones[bone_name_from_index[0]].location = Vector((100*trans[1], 100*trans[2], 100*trans[0])) - pelvis_position 111 | 112 | # bones['Root'].location = Vector(trans) 113 | bones[bone_name_from_index[0]].keyframe_insert('location', frame=current_frame) 114 | 115 | for index, mat_rot in enumerate(mat_rots, 0): 116 | if index >= 24: 117 | continue 118 | 119 | bone = bones[bone_name_from_index[index]] 120 | 121 | bone_rotation = Matrix(mat_rot).to_quaternion() 122 | quat_x_90_cw = Quaternion((1.0, 0.0, 0.0), radians(-90)) 123 | quat_z_90_cw = Quaternion((0.0, 0.0, 1.0), radians(-90)) 124 | 125 | if index == 0: 126 | # Rotate pelvis so that avatar stands upright and looks along negative Y avis 127 | bone.rotation_quaternion = (quat_x_90_cw @ quat_z_90_cw) @ bone_rotation 128 | else: 129 | bone.rotation_quaternion = bone_rotation 130 | 131 | bone.keyframe_insert('rotation_quaternion', frame=current_frame) 132 | 133 | return 134 | 135 | 136 | # Process all the poses from the pose file 137 | def process_poses( 138 | input_path, 139 | gender, 140 | fps_source, 141 | fps_target, 142 | start_origin, 143 | person_id=1, 144 | ): 145 | 146 | print('Processing: ' + input_path) 147 | 148 | data = joblib.load(input_path) 149 | poses = data[person_id]['pose'] 150 | trans = np.zeros((poses.shape[0], 3)) 151 | 152 | if gender == 'female': 153 | model_path = female_model_path 154 | for k,v in bone_name_from_index.items(): 155 | bone_name_from_index[k] = 'f_avg_' + v 156 | elif gender == 'male': 157 | model_path = male_model_path 158 | for k,v in bone_name_from_index.items(): 159 | bone_name_from_index[k] = 'm_avg_' + v 160 | else: 161 | print('ERROR: Unsupported gender: ' + gender) 162 | sys.exit(1) 163 | 164 | # Limit target fps to source fps 165 | if fps_target > fps_source: 166 | fps_target = fps_source 167 | 168 | print(f'Gender: {gender}') 169 | print(f'Number of source poses: {str(poses.shape[0])}') 170 | print(f'Source frames-per-second: {str(fps_source)}') 171 | print(f'Target frames-per-second: {str(fps_target)}') 172 | print('--------------------------------------------------') 173 | 174 | setup_scene(model_path, fps_target) 175 | 176 | scene = bpy.data.scenes['Scene'] 177 | sample_rate = 
int(fps_source/fps_target) 178 | scene.frame_end = (int)(poses.shape[0]/sample_rate) 179 | 180 | # Retrieve pelvis world position. 181 | # Unit is [cm] due to Armature scaling. 182 | # Need to make copy since reference will change when bone location is modified. 183 | bpy.ops.object.mode_set(mode='EDIT') 184 | pelvis_position = Vector(bpy.data.armatures[0].edit_bones[bone_name_from_index[0]].head) 185 | bpy.ops.object.mode_set(mode='OBJECT') 186 | 187 | source_index = 0 188 | frame = 1 189 | 190 | offset = np.array([0.0, 0.0, 0.0]) 191 | 192 | while source_index < poses.shape[0]: 193 | print('Adding pose: ' + str(source_index)) 194 | 195 | if start_origin: 196 | if source_index == 0: 197 | offset = np.array([trans[source_index][0], trans[source_index][1], 0]) 198 | 199 | # Go to new frame 200 | scene.frame_set(frame) 201 | 202 | process_pose(frame, poses[source_index], (trans[source_index] - offset), pelvis_position) 203 | source_index += sample_rate 204 | frame += 1 205 | 206 | return frame 207 | 208 | 209 | def export_animated_mesh(output_path): 210 | # Create output directory if needed 211 | output_dir = os.path.dirname(output_path) 212 | if not os.path.isdir(output_dir): 213 | os.makedirs(output_dir, exist_ok=True) 214 | 215 | # Select only skinned mesh and rig 216 | bpy.ops.object.select_all(action='DESELECT') 217 | bpy.data.objects['Armature'].select_set(True) 218 | bpy.data.objects['Armature'].children[0].select_set(True) 219 | 220 | if output_path.endswith('.glb'): 221 | print('Exporting to glTF binary (.glb)') 222 | # Currently exporting without shape/pose shapes for smaller file sizes 223 | bpy.ops.export_scene.gltf(filepath=output_path, export_format='GLB', export_selected=True, export_morph=False) 224 | elif output_path.endswith('.fbx'): 225 | print('Exporting to FBX binary (.fbx)') 226 | bpy.ops.export_scene.fbx(filepath=output_path, use_selection=True, add_leaf_bones=False) 227 | else: 228 | print('ERROR: Unsupported export format: ' + output_path) 229 | sys.exit(1) 230 | 231 | return 232 | 233 | 234 | if __name__ == '__main__': 235 | try: 236 | if bpy.app.background: 237 | 238 | parser = argparse.ArgumentParser(description='Create keyframed animated skinned SMPL mesh from VIBE output') 239 | parser.add_argument('--input', dest='input_path', type=str, required=True, 240 | help='Input file or directory') 241 | parser.add_argument('--output', dest='output_path', type=str, required=True, 242 | help='Output file or directory') 243 | parser.add_argument('--fps_source', type=int, default=fps_source, 244 | help='Source framerate') 245 | parser.add_argument('--fps_target', type=int, default=fps_target, 246 | help='Target framerate') 247 | parser.add_argument('--gender', type=str, default=gender, 248 | help='Always use specified gender') 249 | parser.add_argument('--start_origin', type=int, default=start_origin, 250 | help='Start animation centered above origin') 251 | parser.add_argument('--person_id', type=int, default=1, 252 | help='Detected person ID to use for fbx animation') 253 | 254 | args = parser.parse_args() 255 | 256 | input_path = args.input_path 257 | output_path = args.output_path 258 | 259 | if not os.path.exists(input_path): 260 | print('ERROR: Invalid input path') 261 | sys.exit(1) 262 | 263 | fps_source = args.fps_source 264 | fps_target = args.fps_target 265 | 266 | gender = args.gender 267 | 268 | start_origin = args.start_origin 269 | 270 | # end if bpy.app.background 271 | 272 | startTime = time.perf_counter() 273 | 274 | # Process data 275 | cwd = os.getcwd() 
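        # The input .pkl is expected to be a joblib-serialized dict keyed by person id, where
        # data[person_id]['pose'] holds per-frame SMPL poses as flattened axis-angle vectors
        # (typically 72 values = 24 joints x 3); translations are zeroed out in process_poses().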
276 | 277 | # Turn relative input/output paths into absolute paths 278 | if not input_path.startswith(os.path.sep): 279 | input_path = os.path.join(cwd, input_path) 280 | 281 | if not output_path.startswith(os.path.sep): 282 | output_path = os.path.join(cwd, output_path) 283 | 284 | print('Input path: ' + input_path) 285 | print('Output path: ' + output_path) 286 | 287 | if not (output_path.endswith('.fbx') or output_path.endswith('.glb')): 288 | print('ERROR: Invalid output format (must be .fbx or .glb)') 289 | sys.exit(1) 290 | 291 | # Process pose file 292 | if input_path.endswith('.pkl'): 293 | if not os.path.isfile(input_path): 294 | print('ERROR: Invalid input file') 295 | sys.exit(1) 296 | 297 | poses_processed = process_poses( 298 | input_path=input_path, 299 | gender=gender, 300 | fps_source=fps_source, 301 | fps_target=fps_target, 302 | start_origin=start_origin, 303 | person_id=args.person_id 304 | ) 305 | export_animated_mesh(output_path) 306 | 307 | print('--------------------------------------------------') 308 | print('Animation export finished.') 309 | print(f'Poses processed: {str(poses_processed)}') 310 | print(f'Processing time : {time.perf_counter() - startTime:.2f} s') 311 | print('--------------------------------------------------') 312 | sys.exit(0) 313 | 314 | except SystemExit as ex: 315 | if ex.code is None: 316 | exit_status = 0 317 | else: 318 | exit_status = ex.code 319 | 320 | print('Exiting. Exit status: ' + str(exit_status)) 321 | 322 | # Only exit to OS when we are not running in Blender GUI 323 | if bpy.app.background: 324 | sys.exit(exit_status) -------------------------------------------------------------------------------- /lib/utils/geometry.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | This script is brought from https://github.com/mkocabas/VIBE 4 | Adhere to their licence to use this script 5 | """ 6 | 7 | import torch 8 | import numpy as np 9 | from torch.nn import functional as F 10 | 11 | 12 | def batch_rodrigues(axisang): 13 | # This function is borrowed from https://github.com/MandyMo/pytorch_HMR/blob/master/src/util.py#L37 14 | # axisang N x 3 15 | axisang_norm = torch.norm(axisang + 1e-8, p=2, dim=1) 16 | angle = torch.unsqueeze(axisang_norm, -1) 17 | axisang_normalized = torch.div(axisang, angle) 18 | angle = angle * 0.5 19 | v_cos = torch.cos(angle) 20 | v_sin = torch.sin(angle) 21 | quat = torch.cat([v_cos, v_sin * axisang_normalized], dim=1) 22 | rot_mat = quat2mat(quat) 23 | rot_mat = rot_mat.view(rot_mat.shape[0], 9) 24 | return rot_mat 25 | 26 | 27 | def quat2mat(quat): 28 | """ 29 | This function is borrowed from https://github.com/MandyMo/pytorch_HMR/blob/master/src/util.py#L50 30 | 31 | Convert quaternion coefficients to rotation matrix. 
32 | Args: 33 | quat: size = [batch_size, 4] 4 <===>(w, x, y, z) 34 | Returns: 35 | Rotation matrix corresponding to the quaternion -- size = [batch_size, 3, 3] 36 | """ 37 | norm_quat = quat 38 | norm_quat = norm_quat / norm_quat.norm(p=2, dim=1, keepdim=True) 39 | w, x, y, z = norm_quat[:, 0], norm_quat[:, 1], norm_quat[:, 40 | 2], norm_quat[:, 41 | 3] 42 | 43 | batch_size = quat.size(0) 44 | 45 | w2, x2, y2, z2 = w.pow(2), x.pow(2), y.pow(2), z.pow(2) 46 | wx, wy, wz = w * x, w * y, w * z 47 | xy, xz, yz = x * y, x * z, y * z 48 | 49 | rotMat = torch.stack([ 50 | w2 + x2 - y2 - z2, 2 * xy - 2 * wz, 2 * wy + 2 * xz, 2 * wz + 2 * xy, 51 | w2 - x2 + y2 - z2, 2 * yz - 2 * wx, 2 * xz - 2 * wy, 2 * wx + 2 * yz, 52 | w2 - x2 - y2 + z2 53 | ], 54 | dim=1).view(batch_size, 3, 3) 55 | return rotMat 56 | 57 | 58 | def rotation_matrix_to_angle_axis(rotation_matrix): 59 | """ 60 | This function is borrowed from https://github.com/kornia/kornia 61 | 62 | Convert 3x4 rotation matrix to Rodrigues vector 63 | 64 | Args: 65 | rotation_matrix (Tensor): rotation matrix. 66 | 67 | Returns: 68 | Tensor: Rodrigues vector transformation. 69 | 70 | Shape: 71 | - Input: :math:`(N, 3, 4)` 72 | - Output: :math:`(N, 3)` 73 | 74 | Example: 75 | >>> input = torch.rand(2, 3, 4) # Nx4x4 76 | >>> output = tgm.rotation_matrix_to_angle_axis(input) # Nx3 77 | """ 78 | if rotation_matrix.shape[1:] == (3,3): 79 | rot_mat = rotation_matrix.reshape(-1, 3, 3) 80 | hom = torch.tensor([0, 0, 1], dtype=torch.float32, 81 | device=rotation_matrix.device).reshape(1, 3, 1).expand(rot_mat.shape[0], -1, -1) 82 | rotation_matrix = torch.cat([rot_mat, hom], dim=-1) 83 | 84 | quaternion = rotation_matrix_to_quaternion(rotation_matrix) 85 | aa = quaternion_to_angle_axis(quaternion) 86 | aa[torch.isnan(aa)] = 0.0 87 | return aa 88 | 89 | 90 | def quaternion_to_angle_axis(quaternion: torch.Tensor) -> torch.Tensor: 91 | """ 92 | This function is borrowed from https://github.com/kornia/kornia 93 | 94 | Convert quaternion vector to angle axis of rotation. 95 | 96 | Adapted from ceres C++ library: ceres-solver/include/ceres/rotation.h 97 | 98 | Args: 99 | quaternion (torch.Tensor): tensor with quaternions. 100 | 101 | Return: 102 | torch.Tensor: tensor with angle axis of rotation. 103 | 104 | Shape: 105 | - Input: :math:`(*, 4)` where `*` means, any number of dimensions 106 | - Output: :math:`(*, 3)` 107 | 108 | Example: 109 | >>> quaternion = torch.rand(2, 4) # Nx4 110 | >>> angle_axis = tgm.quaternion_to_angle_axis(quaternion) # Nx3 111 | """ 112 | if not torch.is_tensor(quaternion): 113 | raise TypeError("Input type is not a torch.Tensor. Got {}".format( 114 | type(quaternion))) 115 | 116 | if not quaternion.shape[-1] == 4: 117 | raise ValueError("Input must be a tensor of shape Nx4 or 4. 
Got {}" 118 | .format(quaternion.shape)) 119 | # unpack input and compute conversion 120 | q1: torch.Tensor = quaternion[..., 1] 121 | q2: torch.Tensor = quaternion[..., 2] 122 | q3: torch.Tensor = quaternion[..., 3] 123 | sin_squared_theta: torch.Tensor = q1 * q1 + q2 * q2 + q3 * q3 124 | 125 | sin_theta: torch.Tensor = torch.sqrt(sin_squared_theta) 126 | cos_theta: torch.Tensor = quaternion[..., 0] 127 | two_theta: torch.Tensor = 2.0 * torch.where( 128 | cos_theta < 0.0, 129 | torch.atan2(-sin_theta, -cos_theta), 130 | torch.atan2(sin_theta, cos_theta)) 131 | 132 | k_pos: torch.Tensor = two_theta / sin_theta 133 | k_neg: torch.Tensor = 2.0 * torch.ones_like(sin_theta) 134 | k: torch.Tensor = torch.where(sin_squared_theta > 0.0, k_pos, k_neg) 135 | 136 | angle_axis: torch.Tensor = torch.zeros_like(quaternion)[..., :3] 137 | angle_axis[..., 0] += q1 * k 138 | angle_axis[..., 1] += q2 * k 139 | angle_axis[..., 2] += q3 * k 140 | return angle_axis 141 | 142 | 143 | def rotation_matrix_to_quaternion(rotation_matrix, eps=1e-6): 144 | """ 145 | This function is borrowed from https://github.com/kornia/kornia 146 | 147 | Convert 3x4 rotation matrix to 4d quaternion vector 148 | 149 | This algorithm is based on algorithm described in 150 | https://github.com/KieranWynn/pyquaternion/blob/master/pyquaternion/quaternion.py#L201 151 | 152 | Args: 153 | rotation_matrix (Tensor): the rotation matrix to convert. 154 | 155 | Return: 156 | Tensor: the rotation in quaternion 157 | 158 | Shape: 159 | - Input: :math:`(N, 3, 4)` 160 | - Output: :math:`(N, 4)` 161 | 162 | Example: 163 | >>> input = torch.rand(4, 3, 4) # Nx3x4 164 | >>> output = tgm.rotation_matrix_to_quaternion(input) # Nx4 165 | """ 166 | if not torch.is_tensor(rotation_matrix): 167 | raise TypeError("Input type is not a torch.Tensor. Got {}".format( 168 | type(rotation_matrix))) 169 | 170 | if len(rotation_matrix.shape) > 3: 171 | raise ValueError( 172 | "Input size must be a three dimensional tensor. Got {}".format( 173 | rotation_matrix.shape)) 174 | if not rotation_matrix.shape[-2:] == (3, 4): 175 | raise ValueError( 176 | "Input size must be a N x 3 x 4 tensor. 
Got {}".format( 177 | rotation_matrix.shape)) 178 | 179 | rmat_t = torch.transpose(rotation_matrix, 1, 2) 180 | 181 | mask_d2 = rmat_t[:, 2, 2] < eps 182 | 183 | mask_d0_d1 = rmat_t[:, 0, 0] > rmat_t[:, 1, 1] 184 | mask_d0_nd1 = rmat_t[:, 0, 0] < -rmat_t[:, 1, 1] 185 | 186 | t0 = 1 + rmat_t[:, 0, 0] - rmat_t[:, 1, 1] - rmat_t[:, 2, 2] 187 | q0 = torch.stack([rmat_t[:, 1, 2] - rmat_t[:, 2, 1], 188 | t0, rmat_t[:, 0, 1] + rmat_t[:, 1, 0], 189 | rmat_t[:, 2, 0] + rmat_t[:, 0, 2]], -1) 190 | t0_rep = t0.repeat(4, 1).t() 191 | 192 | t1 = 1 - rmat_t[:, 0, 0] + rmat_t[:, 1, 1] - rmat_t[:, 2, 2] 193 | q1 = torch.stack([rmat_t[:, 2, 0] - rmat_t[:, 0, 2], 194 | rmat_t[:, 0, 1] + rmat_t[:, 1, 0], 195 | t1, rmat_t[:, 1, 2] + rmat_t[:, 2, 1]], -1) 196 | t1_rep = t1.repeat(4, 1).t() 197 | 198 | t2 = 1 - rmat_t[:, 0, 0] - rmat_t[:, 1, 1] + rmat_t[:, 2, 2] 199 | q2 = torch.stack([rmat_t[:, 0, 1] - rmat_t[:, 1, 0], 200 | rmat_t[:, 2, 0] + rmat_t[:, 0, 2], 201 | rmat_t[:, 1, 2] + rmat_t[:, 2, 1], t2], -1) 202 | t2_rep = t2.repeat(4, 1).t() 203 | 204 | t3 = 1 + rmat_t[:, 0, 0] + rmat_t[:, 1, 1] + rmat_t[:, 2, 2] 205 | q3 = torch.stack([t3, rmat_t[:, 1, 2] - rmat_t[:, 2, 1], 206 | rmat_t[:, 2, 0] - rmat_t[:, 0, 2], 207 | rmat_t[:, 0, 1] - rmat_t[:, 1, 0]], -1) 208 | t3_rep = t3.repeat(4, 1).t() 209 | 210 | mask_c0 = mask_d2 * mask_d0_d1 211 | mask_c1 = mask_d2 * ~mask_d0_d1 212 | mask_c2 = ~mask_d2 * mask_d0_nd1 213 | mask_c3 = ~mask_d2 * ~mask_d0_nd1 214 | mask_c0 = mask_c0.view(-1, 1).type_as(q0) 215 | mask_c1 = mask_c1.view(-1, 1).type_as(q1) 216 | mask_c2 = mask_c2.view(-1, 1).type_as(q2) 217 | mask_c3 = mask_c3.view(-1, 1).type_as(q3) 218 | 219 | q = q0 * mask_c0 + q1 * mask_c1 + q2 * mask_c2 + q3 * mask_c3 220 | q /= torch.sqrt(t0_rep * mask_c0 + t1_rep * mask_c1 + # noqa 221 | t2_rep * mask_c2 + t3_rep * mask_c3) # noqa 222 | q *= 0.5 223 | return q 224 | 225 | 226 | def estimate_translation_np(S, joints_2d, joints_conf, focal_length=5000., img_size=224.): 227 | """ 228 | This function is borrowed from https://github.com/nkolot/SPIN/utils/geometry.py 229 | 230 | Find camera translation that brings 3D joints S closest to 2D the corresponding joints_2d. 231 | Input: 232 | S: (25, 3) 3D joint locations 233 | joints: (25, 3) 2D joint locations and confidence 234 | Returns: 235 | (3,) camera translation vector 236 | """ 237 | 238 | num_joints = S.shape[0] 239 | # focal length 240 | f = np.array([focal_length,focal_length]) 241 | # optical center 242 | center = np.array([img_size/2., img_size/2.]) 243 | 244 | # transformations 245 | Z = np.reshape(np.tile(S[:,2],(2,1)).T,-1) 246 | XY = np.reshape(S[:,0:2],-1) 247 | O = np.tile(center,num_joints) 248 | F = np.tile(f,num_joints) 249 | weight2 = np.reshape(np.tile(np.sqrt(joints_conf),(2,1)).T,-1) 250 | 251 | # least squares 252 | Q = np.array([F*np.tile(np.array([1,0]),num_joints), F*np.tile(np.array([0,1]),num_joints), O-np.reshape(joints_2d,-1)]).T 253 | c = (np.reshape(joints_2d,-1)-O)*Z - F*XY 254 | 255 | # weighted least squares 256 | W = np.diagflat(weight2) 257 | Q = np.dot(W,Q) 258 | c = np.dot(W,c) 259 | 260 | # square matrix 261 | A = np.dot(Q.T,Q) 262 | b = np.dot(Q.T,c) 263 | 264 | # solution 265 | trans = np.linalg.solve(A, b) 266 | 267 | return trans 268 | 269 | 270 | def estimate_translation(S, joints_2d, focal_length=5000., img_size=224.): 271 | """ 272 | This function is borrowed from https://github.com/nkolot/SPIN/utils/geometry.py 273 | 274 | Find camera translation that brings 3D joints S closest to 2D the corresponding joints_2d. 
275 | Input: 276 | S: (B, 49, 3) 3D joint locations 277 | joints: (B, 49, 3) 2D joint locations and confidence 278 | Returns: 279 | (B, 3) camera translation vectors 280 | """ 281 | 282 | device = S.device 283 | # Use only joints 25:49 (GT joints) 284 | S = S[:, 25:, :].cpu().numpy() 285 | joints_2d = joints_2d[:, 25:, :].cpu().numpy() 286 | joints_conf = joints_2d[:, :, -1] 287 | joints_2d = joints_2d[:, :, :-1] 288 | trans = np.zeros((S.shape[0], 3), dtype=np.float32) 289 | # Find the translation for each example in the batch 290 | for i in range(S.shape[0]): 291 | S_i = S[i] 292 | joints_i = joints_2d[i] 293 | conf_i = joints_conf[i] 294 | trans[i] = estimate_translation_np(S_i, joints_i, conf_i, focal_length=focal_length, img_size=img_size) 295 | return torch.from_numpy(trans).to(device) 296 | 297 | 298 | def rot6d_to_rotmat_spin(x): 299 | """Convert 6D rotation representation to 3x3 rotation matrix. 300 | Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019 301 | Input: 302 | (B,6) Batch of 6-D rotation representations 303 | Output: 304 | (B,3,3) Batch of corresponding rotation matrices 305 | """ 306 | x = x.view(-1,3,2) 307 | a1 = x[:, :, 0] 308 | a2 = x[:, :, 1] 309 | b1 = F.normalize(a1) 310 | b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1) 311 | 312 | # inp = a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1 313 | # denom = inp.pow(2).sum(dim=1).sqrt().unsqueeze(-1) + 1e-8 314 | # b2 = inp / denom 315 | 316 | b3 = torch.cross(b1, b2) 317 | return torch.stack((b1, b2, b3), dim=-1) 318 | 319 | 320 | def rot6d_to_rotmat(x): 321 | x = x.view(-1,3,2) 322 | 323 | # Normalize the first vector 324 | b1 = F.normalize(x[:, :, 0], dim=1, eps=1e-6) 325 | 326 | dot_prod = torch.sum(b1 * x[:, :, 1], dim=1, keepdim=True) 327 | # Compute the second vector by finding the orthogonal complement to it 328 | b2 = F.normalize(x[:, :, 1] - dot_prod * b1, dim=-1, eps=1e-6) 329 | 330 | # Finish building the basis by taking the cross product 331 | b3 = torch.cross(b1, b2, dim=1) 332 | rot_mats = torch.stack([b1, b2, b3], dim=-1) 333 | 334 | return rot_mats -------------------------------------------------------------------------------- /lib/utils/pose_tracker.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | This script is brought from https://github.com/mkocabas/VIBE 4 | Adhere to their licence to use this script 5 | """ 6 | 7 | import os 8 | import json 9 | import shutil 10 | import subprocess 11 | import numpy as np 12 | import os.path as osp 13 | 14 | 15 | def run_openpose( 16 | video_file, 17 | output_folder, 18 | staf_folder, 19 | vis=False, 20 | ): 21 | pwd = os.getcwd() 22 | 23 | os.chdir(staf_folder) 24 | 25 | render = 1 if vis else 0 26 | display = 2 if vis else 0 27 | cmd = [ 28 | 'build/examples/openpose/openpose.bin', 29 | '--model_pose', 'BODY_21A', 30 | '--tracking', '1', 31 | '--render_pose', str(render), 32 | '--video', video_file, 33 | '--write_json', output_folder, 34 | '--display', str(display) 35 | ] 36 | 37 | print('Executing', ' '.join(cmd)) 38 | subprocess.call(cmd) 39 | os.chdir(pwd) 40 | 41 | 42 | def read_posetrack_keypoints(output_folder): 43 | 44 | people = dict() 45 | 46 | for idx, result_file in enumerate(sorted(os.listdir(output_folder))): 47 | json_file = osp.join(output_folder, result_file) 48 | data = json.load(open(json_file)) 49 | # print(idx, data) 50 | for person in data['people']: 51 | person_id = 
person['person_id'][0] 52 | joints2d = person['pose_keypoints_2d'] 53 | if person_id in people.keys(): 54 | people[person_id]['joints2d'].append(joints2d) 55 | people[person_id]['frames'].append(idx) 56 | else: 57 | people[person_id] = { 58 | 'joints2d': [], 59 | 'frames': [], 60 | } 61 | people[person_id]['joints2d'].append(joints2d) 62 | people[person_id]['frames'].append(idx) 63 | 64 | for k in people.keys(): 65 | people[k]['joints2d'] = np.array(people[k]['joints2d']).reshape((len(people[k]['joints2d']), -1, 3)) 66 | people[k]['frames'] = np.array(people[k]['frames']) 67 | 68 | return people 69 | 70 | 71 | def run_posetracker(video_file, staf_folder, posetrack_output_folder='/tmp', display=False): 72 | posetrack_output_folder = os.path.join( 73 | posetrack_output_folder, 74 | f'{os.path.basename(video_file)}_posetrack' 75 | ) 76 | 77 | # run posetrack on video 78 | run_openpose( 79 | video_file, 80 | posetrack_output_folder, 81 | vis=display, 82 | staf_folder=staf_folder 83 | ) 84 | 85 | people_dict = read_posetrack_keypoints(posetrack_output_folder) 86 | 87 | shutil.rmtree(posetrack_output_folder) 88 | 89 | return people_dict -------------------------------------------------------------------------------- /lib/utils/renderer.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | This script is brought from https://github.com/mkocabas/VIBE 4 | Adhere to their licence to use this script 5 | """ 6 | 7 | import os 8 | 9 | import math 10 | import trimesh 11 | import pyrender 12 | import numpy as np 13 | from pyrender.constants import RenderFlags 14 | from lib.models.smpl import get_smpl_faces 15 | 16 | class WeakPerspectiveCamera(pyrender.Camera): 17 | def __init__(self, 18 | scale, 19 | translation, 20 | znear=pyrender.camera.DEFAULT_Z_NEAR, 21 | zfar=None, 22 | name=None): 23 | super(WeakPerspectiveCamera, self).__init__( 24 | znear=znear, 25 | zfar=zfar, 26 | name=name, 27 | ) 28 | self.scale = scale 29 | self.translation = translation 30 | 31 | def get_projection_matrix(self, width=None, height=None): 32 | P = np.eye(4) 33 | P[0, 0] = self.scale[0] 34 | P[1, 1] = self.scale[1] 35 | P[0, 3] = self.translation[0] * self.scale[0] 36 | P[1, 3] = -self.translation[1] * self.scale[1] 37 | P[2, 2] = -1 38 | return P 39 | 40 | 41 | class Renderer: 42 | def __init__(self, resolution=(224,224), orig_img=False, wireframe=False): 43 | self.resolution = resolution 44 | 45 | self.faces = get_smpl_faces() 46 | self.orig_img = orig_img 47 | self.wireframe = wireframe 48 | self.renderer = pyrender.OffscreenRenderer( 49 | viewport_width=self.resolution[0], 50 | viewport_height=self.resolution[1], 51 | point_size=1.0 52 | ) 53 | 54 | # set the scene 55 | self.scene = pyrender.Scene(bg_color=[0.0, 0.0, 0.0, 0.0], ambient_light=(0.3, 0.3, 0.3)) 56 | 57 | light = pyrender.PointLight(color=[1.0, 1.0, 1.0], intensity=1) 58 | 59 | light_pose = np.eye(4) 60 | light_pose[:3, 3] = [0, -1, 1] 61 | self.scene.add(light, pose=light_pose) 62 | 63 | light_pose[:3, 3] = [0, 1, 1] 64 | self.scene.add(light, pose=light_pose) 65 | 66 | light_pose[:3, 3] = [1, 1, 2] 67 | self.scene.add(light, pose=light_pose) 68 | 69 | def set_faces(self, indices): 70 | inter = [np.intersect1d(face, indices, assume_unique=True) for face in self.faces] 71 | idx = [x.size == 3 for x in inter] 72 | self.faces = self.faces[idx] 73 | 74 | def render(self, img, verts, cam, angle=None, axis=None, mesh_filename=None, color=[1.0, 1.0, 0.9]): 75 | 76 | mesh = 
trimesh.Trimesh(vertices=verts, faces=self.faces, process=False) 77 | 78 | Rx = trimesh.transformations.rotation_matrix(math.radians(180), [1, 0, 0]) 79 | mesh.apply_transform(Rx) 80 | 81 | if mesh_filename is not None: 82 | mesh.export(mesh_filename) 83 | 84 | if angle and axis: 85 | R = trimesh.transformations.rotation_matrix(math.radians(angle), axis) 86 | mesh.apply_transform(R) 87 | 88 | sx, sy, tx, ty = cam 89 | 90 | camera = WeakPerspectiveCamera( 91 | scale=[sx, sy], 92 | translation=[tx, ty], 93 | zfar=1000. 94 | ) 95 | 96 | material = pyrender.MetallicRoughnessMaterial( 97 | metallicFactor=0.0, 98 | alphaMode='OPAQUE', 99 | baseColorFactor=(color[0], color[1], color[2], 1.0) 100 | ) 101 | 102 | mesh = pyrender.Mesh.from_trimesh(mesh, material=material) 103 | 104 | mesh_node = self.scene.add(mesh, 'mesh') 105 | 106 | camera_pose = np.eye(4) 107 | cam_node = self.scene.add(camera, pose=camera_pose) 108 | 109 | if self.wireframe: 110 | render_flags = RenderFlags.RGBA | RenderFlags.ALL_WIREFRAME 111 | else: 112 | render_flags = RenderFlags.RGBA 113 | 114 | rgb, _ = self.renderer.render(self.scene, flags=render_flags) 115 | valid_mask = (rgb[:, :, -1] > 0)[:, :, np.newaxis] 116 | output_img = rgb[:, :, :-1] * valid_mask + (1 - valid_mask) * img 117 | image = output_img.astype(np.uint8) 118 | 119 | self.scene.remove_node(mesh_node) 120 | self.scene.remove_node(cam_node) 121 | 122 | return image 123 | -------------------------------------------------------------------------------- /lib/utils/smooth_bbox.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script is borrowed from https://github.com/akanazawa/human_dynamics/blob/master/src/util/smooth_bbox.py 3 | Adhere to their licence to use this script 4 | """ 5 | 6 | import numpy as np 7 | import scipy.signal as signal 8 | from scipy.ndimage.filters import gaussian_filter1d 9 | 10 | 11 | def get_smooth_bbox_params(kps, vis_thresh=2, kernel_size=11, sigma=3): 12 | """ 13 | Computes smooth bounding box parameters from keypoints: 14 | 1. Computes bbox by rescaling the person to be around 150 px. 15 | 2. Linearly interpolates bbox params for missing annotations. 16 | 3. Median filtering 17 | 4. Gaussian filtering. 18 | 19 | Recommended thresholds: 20 | * detect-and-track: 0 21 | * 3DPW: 0.1 22 | 23 | Args: 24 | kps (list): List of kps (Nx3) or None. 25 | vis_thresh (float): Threshold for visibility. 26 | kernel_size (int): Kernel size for median filtering (must be odd). 27 | sigma (float): Sigma for gaussian smoothing. 28 | 29 | Returns: 30 | Smooth bbox params [cx, cy, scale], start index, end index 31 | """ 32 | bbox_params, start, end = get_all_bbox_params(kps, vis_thresh) 33 | smoothed = smooth_bbox_params(bbox_params, kernel_size, sigma) 34 | smoothed = np.vstack((np.zeros((start, 3)), smoothed)) 35 | return smoothed, start, end 36 | 37 | 38 | def kp_to_bbox_param(kp, vis_thresh): 39 | """ 40 | Finds the bounding box parameters from the 2D keypoints. 41 | 42 | Args: 43 | kp (Kx3): 2D Keypoints. 44 | vis_thresh (float): Threshold for visibility. 45 | 46 | Returns: 47 | [center_x, center_y, scale] 48 | """ 49 | if kp is None: 50 | return 51 | vis = kp[:, 2] > vis_thresh 52 | if not np.any(vis): 53 | return 54 | min_pt = np.min(kp[vis, :2], axis=0) 55 | max_pt = np.max(kp[vis, :2], axis=0) 56 | person_height = np.linalg.norm(max_pt - min_pt) 57 | if person_height < 0.5: 58 | return 59 | center = (min_pt + max_pt) / 2. 60 | scale = 150. 
/ person_height 61 | return np.append(center, scale) 62 | 63 | 64 | def get_all_bbox_params(kps, vis_thresh=2): 65 | """ 66 | Finds bounding box parameters for all keypoints. 67 | 68 | Look for sequences in the middle with no predictions and linearly 69 | interpolate the bbox params for those 70 | 71 | Args: 72 | kps (list): List of kps (Kx3) or None. 73 | vis_thresh (float): Threshold for visibility. 74 | 75 | Returns: 76 | bbox_params, start_index (incl), end_index (excl) 77 | """ 78 | # keeps track of how many indices in a row with no prediction 79 | num_to_interpolate = 0 80 | start_index = -1 81 | bbox_params = np.empty(shape=(0, 3), dtype=np.float32) 82 | 83 | for i, kp in enumerate(kps): 84 | bbox_param = kp_to_bbox_param(kp, vis_thresh=vis_thresh) 85 | if bbox_param is None: 86 | num_to_interpolate += 1 87 | continue 88 | 89 | if start_index == -1: 90 | # Found the first index with a prediction! 91 | start_index = i 92 | num_to_interpolate = 0 93 | 94 | if num_to_interpolate > 0: 95 | # Linearly interpolate each param. 96 | previous = bbox_params[-1] 97 | # This will be 3x(n+2) 98 | interpolated = np.array( 99 | [np.linspace(prev, curr, num_to_interpolate + 2) 100 | for prev, curr in zip(previous, bbox_param)]) 101 | bbox_params = np.vstack((bbox_params, interpolated.T[1:-1])) 102 | num_to_interpolate = 0 103 | bbox_params = np.vstack((bbox_params, bbox_param)) 104 | 105 | return bbox_params, start_index, i - num_to_interpolate + 1 106 | 107 | 108 | def smooth_bbox_params(bbox_params, kernel_size=11, sigma=8): 109 | """ 110 | Applies median filtering and then gaussian filtering to bounding box 111 | parameters. 112 | 113 | Args: 114 | bbox_params (Nx3): [cx, cy, scale]. 115 | kernel_size (int): Kernel size for median filtering (must be odd). 116 | sigma (float): Sigma for gaussian smoothing. 117 | 118 | Returns: 119 | Smoothed bounding box parameters (Nx3). 
120 | """ 121 | smoothed = np.array([signal.medfilt(param, kernel_size) 122 | for param in bbox_params.T]).T 123 | return np.array([gaussian_filter1d(traj, sigma) for traj in smoothed.T]).T 124 | -------------------------------------------------------------------------------- /lib/utils/utils.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | This script is brought from https://github.com/mkocabas/VIBE 4 | Adhere to their licence to use this script 5 | """ 6 | 7 | import os 8 | import yaml 9 | import time 10 | import torch 11 | import shutil 12 | import logging 13 | import operator 14 | import torch 15 | from tqdm import tqdm 16 | from os import path as osp 17 | from functools import reduce 18 | from typing import List, Union 19 | 20 | 21 | def move_dict_to_device(dic, device, tensor2float=False): 22 | for k,v in dic.items(): 23 | if isinstance(v, torch.Tensor): 24 | if tensor2float: 25 | dic[k] = v.float().to(device) 26 | else: 27 | dic[k] = v.to(device) 28 | elif isinstance(v, dict): 29 | move_dict_to_device(v, device) 30 | 31 | 32 | def get_from_dict(dict, keys): 33 | return reduce(operator.getitem, keys, dict) 34 | 35 | 36 | def tqdm_enumerate(iter, desc=""): 37 | i = 0 38 | for y in tqdm(iter, desc=desc): 39 | yield i, y 40 | i += 1 41 | 42 | 43 | def iterdict(d): 44 | for k,v in d.items(): 45 | if isinstance(v, dict): 46 | d[k] = dict(v) 47 | iterdict(v) 48 | return d 49 | 50 | 51 | def accuracy(output, target): 52 | _, pred = output.topk(1) 53 | pred = pred.view(-1) 54 | 55 | correct = pred.eq(target).sum() 56 | 57 | return correct.item(), target.size(0) - correct.item() 58 | 59 | 60 | def lr_decay(optimizer, step, lr, decay_step, gamma): 61 | lr = lr * gamma ** (step/decay_step) 62 | for param_group in optimizer.param_groups: 63 | param_group['lr'] = lr 64 | return lr 65 | 66 | 67 | def step_decay(optimizer, step, lr, decay_step, gamma): 68 | lr = lr * gamma ** (step / decay_step) 69 | for param_group in optimizer.param_groups: 70 | param_group['lr'] = lr 71 | return lr 72 | 73 | 74 | def read_yaml(filename): 75 | return yaml.load(open(filename, 'r')) 76 | 77 | 78 | def write_yaml(filename, object): 79 | with open(filename, 'w') as f: 80 | yaml.dump(object, f) 81 | 82 | 83 | def save_cfgnode_to_yaml(cfgnode, filename, mode='w'): 84 | with open(filename, mode) as f: 85 | f.write(cfgnode.dump()) 86 | 87 | 88 | def save_to_file(obj, filename, mode='w'): 89 | with open(filename, mode) as f: 90 | f.write(obj) 91 | 92 | 93 | def concatenate_dicts(dict_list, dim=0): 94 | rdict = dict.fromkeys(dict_list[0].keys()) 95 | for k in rdict.keys(): 96 | rdict[k] = torch.cat([d[k] for d in dict_list], dim=dim) 97 | return rdict 98 | 99 | 100 | def bool_to_string(x: Union[List[bool],bool]) -> Union[List[str],str]: 101 | """ 102 | boolean to string conversion 103 | :param x: list or bool to be converted 104 | :return: string converted thing 105 | """ 106 | if isinstance(x, bool): 107 | return [str(x)] 108 | for i, j in enumerate(x): 109 | x[i]=str(j) 110 | return x 111 | 112 | 113 | def checkpoint2model(checkpoint, key='gen_state_dict'): 114 | state_dict = checkpoint[key] 115 | print(f'Performance of loaded model on 3DPW is {checkpoint["performance"]:.2f}mm') 116 | # del state_dict['regressor.mean_theta'] 117 | return state_dict 118 | 119 | 120 | def get_optimizer(model, optim_type, lr, weight_decay, momentum): 121 | if optim_type in ['sgd', 'SGD']: 122 | opt = torch.optim.SGD( 123 | lr=lr, 124 | params=[{'params': p, 'name': 
n} for n, p in model.named_parameters()], 125 | momentum=momentum 126 | ) 127 | elif optim_type in ['Adam', 'adam', 'ADAM']: 128 | opt = torch.optim.Adam( 129 | lr=lr, 130 | params=[{'params': p, 'name': n} for n, p in model.named_parameters()], 131 | weight_decay=weight_decay 132 | ) 133 | else: 134 | raise ModuleNotFoundError 135 | return opt 136 | 137 | 138 | def create_logger(logdir, phase='train'): 139 | os.makedirs(logdir, exist_ok=True) 140 | 141 | log_file = osp.join(logdir, f'{phase}_log.txt') 142 | 143 | head = '%(asctime)-15s %(message)s' 144 | logging.basicConfig(filename=log_file, 145 | format=head) 146 | logger = logging.getLogger() 147 | logger.setLevel(logging.INFO) 148 | console = logging.StreamHandler() 149 | logging.getLogger('').addHandler(console) 150 | 151 | return logger 152 | 153 | 154 | class AverageMeter(object): 155 | def __init__(self): 156 | self.val = 0 157 | self.avg = 0 158 | self.sum = 0 159 | self.count = 0 160 | 161 | def update(self, val, n=1): 162 | self.val = val 163 | self.sum += val * n 164 | self.count += n 165 | self.avg = self.sum / self.count 166 | 167 | 168 | def prepare_output_dir(cfg, cfg_file): 169 | 170 | # ==== create logdir 171 | logtime = time.strftime('%Y-%m-%d:_%H-%M-%S') 172 | logdir = f'{logtime}_{cfg.EXP_NAME}' 173 | 174 | logdir = osp.join(cfg.OUTPUT_DIR, logdir) 175 | os.makedirs(logdir, exist_ok=True) 176 | 177 | cfg.LOGDIR = logdir 178 | #cfg.TRAIN.PRETRAINED = osp.join(logdir, "model_best.pth.tar") 179 | 180 | # save config 181 | save_cfgnode_to_yaml(cfg, osp.join(cfg.LOGDIR, 'config.yaml')) 182 | 183 | return cfg 184 | 185 | def determine_output_feature_dim(inp_size, model): 186 | with torch.no_grad(): 187 | # FIXME this is hacky, but most reliable way of determining the exact dim of the output feature 188 | # map for all networks, the feature metadata has reliable channel and stride info, but using 189 | # stride to calc feature dim requires info about padding of each stage that isn't captured. 
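    # Example with hypothetical shapes, for clarity: for a 224x224 RGB input one
    # might call
    #   feat_size, feat_dim = determine_output_feature_dim((1, 3, 224, 224), backbone)
    # where `backbone` is any nn.Module; the dummy forward pass below reads the
    # spatial size from o.shape[-2:] and the channel count from o.shape[1].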
190 | training = model.training 191 | if training: 192 | model.eval() 193 | o = model(torch.zeros(inp_size)) 194 | if isinstance(o, (list, tuple)): 195 | o = o[-1] # last feature if backbone outputs list/tuple of features 196 | feature_size = o.shape[-2:] 197 | feature_dim = o.shape[1] 198 | model.train(training) 199 | return feature_size, feature_dim -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | tqdm 2 | yacs 3 | numpy 4 | smplx==0.1.13 5 | h5py 6 | joblib 7 | tensorboard 8 | scikit-image 9 | scikit-video 10 | opencv-python 11 | trimesh 12 | pyrender 13 | scipy 14 | # chumpy ##git+https://github.com/mattloper/chumpy.git -------------------------------------------------------------------------------- /scripts/eval.sh: -------------------------------------------------------------------------------- 1 | export PYTHONPATH="./:$PYTHONPATH" 2 | srun \ 3 | --partition=innova \ 4 | --nodes=1 \ 5 | --ntasks-per-node=1 \ 6 | --gres=gpu:1 \ 7 | python eval.py --cfg $1 --pretrained $2 --eval_ds $3 --eval_set $4 8 | -------------------------------------------------------------------------------- /scripts/prepare_insta.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | n=1 3 | for ((i=0;i<$n;i++)) 4 | do 5 | srun \ 6 | --job-name=insta_data \ 7 | --kill-on-bad-exit=1 \ 8 | python lib/data_utils/insta_utils_imgs.py --inp_dir ./data/insta_variety --n $n --i $i >log/$i.log 2>&1 & 9 | done 10 | -------------------------------------------------------------------------------- /scripts/prepare_training_data.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | #mkdir -p ./data/vibe_db 4 | export PYTHONPATH="./:$PYTHONPATH" 5 | 6 | # AMASS 7 | #python lib/data_utils/amass_utils.py --dir ./data/amass 8 | 9 | # InstaVariety 10 | # Comment this if you already downloaded the preprocessed file 11 | #python lib/data_utils/insta_utils.py --dir ./data/insta_variety 12 | 13 | # 3DPW 14 | #python lib/data_utils/threedpw_utils.py --dir ./data/3dpw 15 | 16 | # MPI-INF-3D-HP 17 | python lib/data_utils/mpii3d_utils.py --dir ./data/mpi_inf_3dhp 18 | 19 | # PoseTrack 20 | python lib/data_utils/posetrack_utils.py --dir ./data/posetrack 21 | 22 | # PennAction 23 | python lib/data_utils/penn_action_utils.py --dir ./data/penn_action 24 | -------------------------------------------------------------------------------- /scripts/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ### rand a 5 digit port number 4 | function rand(){ 5 | min=$1 6 | max=$(($2-$min+1)) 7 | num=$(($RANDOM+1000000000)) 8 | echo $(($num%$max+$min)) 9 | } 10 | export MASTER_PORT=$(rand 10000 20000) 11 | echo "MASTER_PORT="$MASTER_PORT 12 | 13 | export PYTHONPATH="./:$PYTHONPATH" 14 | srun \ 15 | --mpi=pmi2 \ 16 | --partition=innova \ 17 | --nodes=$1 \ 18 | --ntasks-per-node=$2 \ 19 | --gres=gpu:$2 \ 20 | --kill-on-bad-exit=1 \ 21 | python train.py --cfg=$3 --pretrained=$4 22 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [flake8] 2 | max-line-length = 88 3 | ignore = F401,E402,F403,W503,W504 --------------------------------------------------------------------------------
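For reference, the two Slurm wrappers in `scripts/` take their arguments positionally: `scripts/run.sh` expects `<nodes> <gpus_per_node> <config> <pretrained>` (the partition name is hard-coded inside the script), while `scripts/eval.sh` expects `<config> <pretrained> <eval_ds> <eval_set>`. A sketch of possible invocations, assuming a Slurm cluster; the checkpoint paths and the dataset/split values are illustrative examples, not names guaranteed by `eval.py`:

```
# 2 nodes x 4 GPUs with the phase-2 config (checkpoint path is an example)
sh scripts/run.sh 2 4 configs/baseline_phase2.yaml results/phase1_checkpoint.pth.tar

# single-GPU evaluation of a trained checkpoint (dataset/split names are examples)
sh scripts/eval.sh configs/baseline_phase3.yaml results/model_best.pth.tar 3dpw test
```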