├── .gitignore
├── .vscode
│   └── settings.json
├── README.md
├── configs
│   ├── baseline_phase1.yaml
│   ├── baseline_phase2.yaml
│   └── baseline_phase3.yaml
├── doc
│   ├── cxk.gif
│   ├── dance5_.gif
│   ├── data.md
│   ├── micheal2.gif
│   └── out3.gif
├── eval.py
├── exp
│   ├── eval
│   │   └── hvd_start.sh
│   ├── phase1
│   │   └── hvd_start.sh
│   ├── phase2
│   │   └── hvd_start.sh
│   └── phase3
│       └── hvd_start.sh
├── lib
│   ├── core
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── evaluate.py
│   │   ├── loss.py
│   │   └── trainer.py
│   ├── data_utils
│   │   ├── img_utils.py
│   │   ├── insta_utils.py
│   │   ├── insta_utils_imgs.py
│   │   ├── kp_utils.py
│   │   ├── mpii3d_utils.py
│   │   ├── penn_action_utils.py
│   │   ├── posetrack_utils.py
│   │   ├── threedpw_utils.py
│   │   └── transforms
│   │       ├── __init__.py
│   │       ├── basic.py
│   │       ├── color_jitter.py
│   │       ├── crop.py
│   │       ├── random_erase.py
│   │       └── random_hflip.py
│   ├── dataset
│   │   ├── __init__.py
│   │   ├── dataset_image.py
│   │   ├── dataset_video.py
│   │   └── loaders.py
│   ├── models
│   │   ├── __init__.py
│   │   ├── ktd.py
│   │   ├── maed.py
│   │   ├── ops
│   │   │   ├── __init__.py
│   │   │   └── drop.py
│   │   ├── resnetv2.py
│   │   ├── smpl.py
│   │   ├── spin.py
│   │   ├── tokenpose.py
│   │   └── vision_transformer.py
│   └── utils
│       ├── __init__.py
│       ├── demo_utils.py
│       ├── eval_utils.py
│       ├── fbx_output.py
│       ├── geometry.py
│       ├── pose_tracker.py
│       ├── renderer.py
│       ├── smooth_bbox.py
│       ├── utils.py
│       └── vis.py
├── requirements.txt
├── scripts
│   ├── eval.sh
│   ├── prepare_insta.sh
│   ├── prepare_training_data.sh
│   └── run.sh
├── tox.ini
└── train_hvd.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 |
--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
1 | {
2 | "python.formatting.provider": "autopep8"
3 | }
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Capturing the Motion of Every Joint: 3D Human Pose and Shape Estimation with Independent Tokens
2 |
3 | [[project]](https://yangsenius.github.io/INT_HMR_Model/) [[arxiv]](https://arxiv.org/abs/2303.00298) [[paper]](https://openreview.net/forum?id=0Vv4H4Ch0la) [[examples]](https://yangsenius.github.io/INT_HMR_Model/)
4 |
5 |
6 |
7 | *The multi-person videos above are based on the VIBE detection and tracking framework.*
8 |
9 |
10 |
11 | > [**Capturing the motion of every joint: 3D human pose and shape estimation with independent tokens**](https://openreview.net/pdf?id=0Vv4H4Ch0la),
12 | > [Sen Yang](https://yangsenius.github.io/INT_HMR_Model/), [Wen Heng](), [Gang Liu](https://scholar.google.com/citations?user=ZyzfB9sAAAAJ&hl=zh-CN&authuser=1), [Guozhong Luo](https://github.com/guozhongluo), [Wankou Yang](https://scholar.google.com/citations?user=inPYAuYAAAAJ&hl=zh-CN), [Gang Yu](https://www.skicyyu.org/),
13 | > *The Eleventh International Conference on Learning Representations, ICLR2023 spotlight*
14 |
15 |
16 |
17 | ## Getting Started
18 |
19 |
20 | This repo requires `python>=3.6` and `PyTorch>=1.8`. We recommend using a `conda` virtual environment:
21 |
22 | ```
23 | conda create -n int_hmr python=3.6 && conda activate int_hmr
24 | ```
25 |
26 | Install `PyTorch` following the steps of the official guide on [PyTorch website](https://pytorch.org/get-started/locally/).
27 |
28 | The models in the paper were trained with the distributed training framework `Horovod`. If you want to train the model in a distributed setting with this code, please install `Horovod` following its [documentation](https://horovod.readthedocs.io/en/stable/); we used horovod 0.3.3.
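
For reference, a GPU build of Horovod with NCCL support can be installed roughly as sketched below; the exact build flags and version depend on your CUDA/NCCL setup, so treat this as an illustrative sketch rather than the exact command we used.

```
# build Horovod's PyTorch extension with NCCL-backed GPU operations (adjust to your environment)
HOROVOD_GPU_OPERATIONS=NCCL pip install horovod
# verify that the PyTorch extension was built
horovodrun --check-build
```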
29 |
30 | Then install the remaining dependencies with `pip`:
31 |
32 | ```
33 | pip install -r requirements.txt
34 | ```
35 |
36 | ## Data preparation
37 |
38 | We follow the steps of the [MAED](https://github.com/ziniuwan/maed) repo to prepare the training data. Please refer to [data.md](doc/data.md).
39 |
40 | ## Training
41 |
42 |
43 | To run on a machine with 4 GPUs:
44 |
45 | ```
46 | sh hvd_start.sh 4 localhost:4
47 | ```
48 |
49 | To run on 4 machines with 4 GPUs each:
50 |
51 | ```
52 | sh hvd_start.sh 16 server1_ip:4,server2_ip:4,server3_ip:4,server4_ip:4
53 | ```
54 | Below are the training commands for the proposed progressive 3-stage training scheme on a single machine with 4 GPUs.
55 |
56 | 1. Image-based pre-training:
57 | ```
58 | sh exp/phase1/hvd_start.sh 4 localhost:4
59 | ```
60 | 2. Image/video-based pre-training:
61 | ```
62 | sh exp/phase2/hvd_start.sh 4 localhost:4
63 | ```
64 | 3. Fine-tuning:
65 | ```
66 | sh exp/phase3/hvd_start.sh 4 localhost:4
67 | ```
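
Each `hvd_start.sh` is a thin wrapper around `horovodrun`: the first argument is the total number of GPU processes and the second is the host list. For instance, the phase-1 command above roughly expands to the following (the paths are the defaults set inside `exp/phase1/hvd_start.sh`):

```
horovodrun -np 4 -H localhost:4 \
    python train_hvd.py --cfg configs/baseline_phase1.yaml \
    --resume workdir/token3d_training_dir/hvd_token3d_phase1/checkpoint.pt \
    --logdir workdir/token3d_training_dir/hvd_token3d_phase1
```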
68 |
69 | ## Evaluation
70 |
71 | ```
72 | sh exp/eval/hvd_start.sh 4 localhost:4
73 | ```
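
`exp/eval/hvd_start.sh` simply calls `eval.py` once per dataset/split, so a single checkpoint can also be evaluated directly without Horovod. A minimal sketch (the checkpoint path below is a placeholder):

```
python eval.py --cfg configs/baseline_phase3.yaml \
    --pretrained /path/to/model_best.pth.tar \
    --eval_ds 3dpw \
    --eval_set test
```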
74 |
75 | ## Pretrained models
76 |
77 | |PA-MPJPE (3DPW test set)|Length of temp embed.|Link|
78 | |:-:|:-:|-|
79 | |42.0 (T=64)|16|[Model-1 Google drive](https://drive.google.com/file/d/1ffCEhjXxOQ5EIx3Xt2NF0EoOx-3Py0he/view?usp=drive_link)|
80 | |42.3 (T=64)|64|[Model-2 Google drive](https://drive.google.com/file/d/1Kq25NESN6d2QQtUJ02Fjme4BEobWe3Az/view?usp=drive_link)|
81 |
82 | ## Citation
83 | If you find this repository useful, please give it a star 🌟 or consider citing our work:
84 |
85 | ```
86 | @inproceedings{
87 | yang2023capturing,
88 | title={Capturing the Motion of Every Joint: 3D Human Pose and Shape Estimation with Independent Tokens},
89 | author={Sen Yang and Wen Heng and Gang Liu and GUOZHONG LUO and Wankou Yang and Gang YU},
90 | booktitle={The Eleventh International Conference on Learning Representations (ICLR) },
91 | year={2023},
92 | url={https://openreview.net/forum?id=0Vv4H4Ch0la}
93 | }
94 | ```
95 |
96 | ## Credit
97 | Thanks to the great open-source code of [MAED](https://github.com/ziniuwan/maed) and [VIBE](https://github.com/mkocabas/VIBE).
98 |
99 |
--------------------------------------------------------------------------------
/configs/baseline_phase1.yaml:
--------------------------------------------------------------------------------
1 | DEBUG: false
2 | DEBUG_FREQ: 1
3 | LOGDIR: ''
4 | DEVICE: 'cuda'
5 | EXP_NAME: 'token3d'
6 | OUTPUT_DIR: 'results/'
7 | NUM_WORKERS: 8
8 | SEED_VALUE: -1
9 | SAVE_FREQ: 1
10 | DATASET:
11 | SEQLEN: 16
12 | SAMPLE_POOL: 128
13 | OVERLAP: 0.5
14 | RANDOM_SAMPLE: false
15 | RANDOM_START: true
16 | SIZE_JITTER: 0.
17 | ROT_JITTER: 0
18 | RANDOM_FLIP: 0.5
19 | RANDOM_CROP_P: 0.2
20 | RANDOM_CROP_SIZE: 0.6
21 | COLOR_JITTER: 0.
22 | ERASE_PROB: 0.
23 | ERASE_PART: 0.
24 | ERASE_FILL: False
25 | ERASE_KP: False
26 | ERASE_MARGIN: 0.
27 | WIDTH: 224
28 | HEIGHT: 224
29 | EVAL:
30 | SAMPLE_POOL: 128
31 | SEQLEN: 16
32 | BATCH_SIZE: 8
33 | INTERPOLATION: 1
34 | BBOX_SCALE: 1.1
35 | LOSS:
36 | KP_2D_W: 300.0
37 | KP_3D_W: 600.0
38 | SHAPE_W: 0.06
39 | POSE_W: 60.0
40 | SMPL_NORM: 1.0
41 | ACCL_W: 0.0
42 | TRAIN:
43 | BATCH_SIZE_3D: 0
44 | BATCH_SIZE_2D: 0
45 | BATCH_SIZE_IMG: 120
46 | IMG_USE_FREQ: 1
47 | NUM_ITERS_PER_EPOCH: -1
48 | RESUME: ''
49 | START_EPOCH: 0
50 | END_EPOCH: 100 #
51 | DATASETS_2D: []
52 | DATASETS_3D: []
53 | DATASETS_IMG:
54 | - 'coco2014-all'
55 | - 'lspet'
56 | - 'mpii'
57 | - 'mpii3d'
58 | - 'h36m'
59 | DATASET_EVAL: '3dpw'
60 | EVAL_SET: 'test'
61 | OPTIM:
62 | LR: 0.0001
63 | WD: 0.00001
64 | OPTIM: 'Adam'
65 | WARMUP_EPOCH: 0
66 | WARMUP_FACTOR: 0.1
67 | MILESTONES: [60, 90]
68 | MODEL:
69 | ENABLE_TEMP_MODELING: False
70 | ENCODER:
71 | NUM_BLOCKS: 6
72 | NUM_HEADS: 12
73 | SPA_TEMP_MODE: 'vanilla'
74 |
75 |
--------------------------------------------------------------------------------
/configs/baseline_phase2.yaml:
--------------------------------------------------------------------------------
1 | DEBUG: false
2 | DEBUG_FREQ: 1
3 | LOGDIR: ''
4 | DEVICE: 'cuda'
5 | EXP_NAME: 'token3d'
6 | OUTPUT_DIR: 'results/'
7 | NUM_WORKERS: 8
8 | SEED_VALUE: -1
9 | SAVE_FREQ: 1
10 | DATASET:
11 | SEQLEN: 16
12 | SAMPLE_POOL: 128
13 | OVERLAP: 0.5
14 | RANDOM_SAMPLE: false
15 | RANDOM_START: true
16 | SIZE_JITTER: 0.
17 | ROT_JITTER: 0
18 | RANDOM_FLIP: 0.5
19 | RANDOM_CROP_P: 0.2
20 | RANDOM_CROP_SIZE: 0.6
21 | COLOR_JITTER: 0.3
22 | ERASE_PROB: 0.3
23 | ERASE_PART: 0.7
24 | ERASE_FILL: False
25 | ERASE_KP: False
26 | ERASE_MARGIN: 0.2
27 | WIDTH: 224
28 | HEIGHT: 224
29 | EVAL:
30 | SAMPLE_POOL: 128
31 | SEQLEN: 16
32 | BATCH_SIZE: 8
33 | INTERPOLATION: 1
34 | BBOX_SCALE: 1.1
35 | LOSS:
36 | KP_2D_W: 300.0
37 | KP_3D_W: 600.0
38 | SHAPE_W: 0.06
39 | POSE_W: 60.0
40 | SMPL_NORM: 1.0
41 | ACCL_W: 0.0
42 | TEMP_W: 0.0
43 | TRAIN:
44 | BATCH_SIZE_3D: 4
45 | BATCH_SIZE_2D: 3
46 | BATCH_SIZE_IMG: 7
47 | IMG_USE_FREQ: 1
48 | NUM_ITERS_PER_EPOCH: -1
49 | RESUME: ''
50 | START_EPOCH: 0
51 | END_EPOCH: 100 #
52 | DATASETS_2D:
53 | - 'insta'
54 | - 'posetrack'
55 | - 'pennaction'
56 | DATASETS_3D:
57 | # - '3dpw'
58 | - 'mpii3d'
59 | - 'h36m'
60 | DATASETS_IMG:
61 | - 'coco2014-all'
62 | - 'lspet'
63 | - 'mpii'
64 | DATASET_EVAL: 'h36m'
65 | EVAL_SET: 'val'
66 | OPTIM:
67 | LR: 0.0001
68 | WD: 0.00001
69 | OPTIM: 'Adam'
70 | WARMUP_EPOCH: 0
71 | WARMUP_FACTOR: 0.1
72 | MILESTONES: [60, 90]
73 | MODEL:
74 | ENABLE_TEMP_MODELING: true
75 | ENABLE_TEMP_EMBEDDING: true
76 | ENCODER:
77 | NUM_BLOCKS: 6
78 | NUM_HEADS: 12
79 | SPA_TEMP_MODE: 'vanilla'
80 | MASK_RATIO: 0.
81 | TEMPORAL_LAYERS: 3
82 | TEMPORAL_NUM_HEADS: 12
83 | LOAD_PRETRAINED_HEAD: True
84 |
85 |
--------------------------------------------------------------------------------
/configs/baseline_phase3.yaml:
--------------------------------------------------------------------------------
1 | DEBUG: false
2 | DEBUG_FREQ: 1
3 | LOGDIR: ''
4 | DEVICE: 'cuda'
5 | EXP_NAME: 'token3d'
6 | OUTPUT_DIR: 'results/'
7 | NUM_WORKERS: 8
8 | SEED_VALUE: -1
9 | SAVE_FREQ: 1
10 | DATASET:
11 | SEQLEN: 16
12 | SAMPLE_POOL: 128
13 | OVERLAP: 0.5
14 | RANDOM_SAMPLE: false
15 | RANDOM_START: true
16 | SIZE_JITTER: 0.
17 | ROT_JITTER: 0
18 | RANDOM_FLIP: 0.5
19 | RANDOM_CROP_P: 0.2
20 | RANDOM_CROP_SIZE: 0.6
21 | COLOR_JITTER: 0.3
22 | ERASE_PROB: 0.3
23 | ERASE_PART: 0.7
24 | ERASE_FILL: False
25 | ERASE_KP: False
26 | ERASE_MARGIN: 0.2
27 | WIDTH: 224
28 | HEIGHT: 224
29 | EVAL:
30 | SAMPLE_POOL: 128
31 |   SEQLEN: 64
32 | BATCH_SIZE: 8
33 | INTERPOLATION: 1
34 | BBOX_SCALE: 1.1
35 | LOSS:
36 | KP_2D_W: 300.0
37 | KP_3D_W: 600.0
38 | SHAPE_W: 0.06
39 | POSE_W: 60.0
40 | SMPL_NORM: 0.01
41 | ACCL_W: 0.0
42 | TEMP_W: 600.0
43 | TRAIN:
44 | BATCH_SIZE_3D: 4
45 | BATCH_SIZE_2D: 0
46 | BATCH_SIZE_IMG: 0
47 | IMG_USE_FREQ: 1
48 | NUM_ITERS_PER_EPOCH: -1
49 | RESUME: ''
50 | START_EPOCH: 0
51 | END_EPOCH: 50 #
52 | DATASETS_2D: []
53 | DATASETS_3D:
54 | - '3dpw'
55 | - 'h36m'
56 | DATASETS_IMG: []
57 | DATASET_EVAL: '3dpw'
58 | EVAL_SET: 'test'
59 | OPTIM:
60 | LR: 0.0001
61 | WD: 0.00001
62 | OPTIM: 'sgd'
63 | WARMUP_EPOCH: 0
64 | WARMUP_FACTOR: 0.1
65 | MILESTONES: [30, 40]
66 | MODEL:
67 | ENABLE_TEMP_MODELING: true
68 | ENABLE_TEMP_EMBEDDING: true
69 | ENCODER:
70 | NUM_BLOCKS: 6
71 | NUM_HEADS: 12
72 | SPA_TEMP_MODE: 'vanilla'
73 | MASK_RATIO: 0.
74 | TEMPORAL_LAYERS: 3
75 | TEMPORAL_NUM_HEADS: 12
76 | LOAD_PRETRAINED_HEAD: True
77 |
78 |
--------------------------------------------------------------------------------
/doc/cxk.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/doc/cxk.gif
--------------------------------------------------------------------------------
/doc/dance5_.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/doc/dance5_.gif
--------------------------------------------------------------------------------
/doc/data.md:
--------------------------------------------------------------------------------
1 | Throughout the documentation we refer to the repo root folder as `$ROOT`. All the datasets listed below should be put in or linked to `$ROOT/data`.
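
For example, if your datasets are stored elsewhere, they can be linked in as sketched below (`/path/to/datasets` is only a placeholder for wherever your copies live):

```shell script
mkdir -p $ROOT/data
ln -s /path/to/datasets/3dpw $ROOT/data/3dpw
ln -s /path/to/datasets/mpi_inf_3dhp $ROOT/data/mpi_inf_3dhp
```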
2 |
3 | # Data Preparation
4 |
5 | ## 1. Download Datasets
6 | First, download the datasets used in MAED.
7 |
8 | - **InstaVariety**
9 |
10 | Download the
11 | [preprocessed tfrecords](https://github.com/akanazawa/human_dynamics/blob/master/doc/insta_variety.md#pre-processed-tfrecords)
12 | provided by the authors of Temporal HMR.
13 |
14 | Directory structure:
15 | ```shell script
16 | insta_variety
17 | |-- train
18 | | |-- insta_variety_00_copy00_hmr_noS5.ckpt-642561.tfrecord
19 | | |-- insta_variety_01_copy00_hmr_noS5.ckpt-642561.tfrecord
20 | | `-- ...
21 | `-- test
22 | |-- insta_variety_00_copy00_hmr_noS5.ckpt-642561.tfrecord
23 | |-- insta_variety_01_copy00_hmr_noS5.ckpt-642561.tfrecord
24 | `-- ...
25 | ```
26 |
27 | The original InstaVariety dataset is stored in tfrecord format, which is not directly usable with PyTorch. You can run this
28 | [script](../scripts/prepare_insta.sh), which extracts the frames of every tfrecord and saves them as JPEG images (see the example invocation after the directory listing below).
29 |
30 | Directory structure after extraction:
31 | ```shell script
32 | insta_variety_img
33 | |-- train
34 | |-- insta_variety_00_copy00_hmr_noS5.ckpt-642561.tfrecord
35 | | |-- 0
36 | | |-- 1
37 | | `-- ...
38 | |-- insta_variety_01_copy00_hmr_noS5.ckpt-642561.tfrecord
39 | | |-- 0
40 | | |-- 1
41 | | `-- ...
42 | `-- ...
43 | ```
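
The frame-extraction utility shipped in this repo, `lib/data_utils/insta_utils_imgs.py`, can also be invoked directly and sharded across workers. A minimal single-worker sketch, assuming the default `data/insta_variety` input and `data/insta_variety_img` output directories from `lib/core/config.py`:

```shell script
# extract the training split with a single worker (worker 0 of 1)
python lib/data_utils/insta_utils_imgs.py --split train --n 1 --i 0
```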
44 |
45 | - **[MPI-3D-HP](http://gvv.mpi-inf.mpg.de/3dhp-dataset)**
46 |
47 | Download the dataset using the bash script provided by the authors. We will be using standard cameras only, so wall and ceiling
48 | cameras aren't needed. Then, run
49 | [the script from the official VIBE repo](https://gist.github.com/mkocabas/cc6fe78aac51f97859e45f46476882b6) to extract frames of videos.
50 |
51 | Directory structure:
52 | ```shell script
53 | $ROOT/data
54 | mpi_inf_3dhp
55 | |-- S1
56 | | |-- Seq1
57 | | |-- Seq2
58 | |-- S2
59 | | |-- Seq1
60 | | |-- Seq2
61 | |-- ...
62 | `-- util
63 | ```
64 |
65 | - **[Human 3.6M](http://vision.imar.ro/human3.6m/description.php)**
66 |
67 | Human 3.6M is no longer an open dataset, so it is optional in our training code. **However, Human 3.6M has a non-negligible effect on the final performance of MAED.**
68 |
69 | Once you have access to the Human 3.6M dataset, you can refer to [the script](https://github.com/nkolot/SPIN/blob/master/datasets/preprocess/h36m_train.py) from the official SPIN repository to preprocess it.
70 | Directory structure:
71 | ```shell script
72 | human3.6m
73 | |-- annot
74 | |-- dataset_extras
75 | |-- S1
76 | |-- S11
77 | |-- S5
78 | |-- S6
79 | |-- S7
80 | |-- S8
81 | `-- S9
82 | ```
83 |
84 | - **[3DPW](https://virtualhumans.mpi-inf.mpg.de/3DPW)**
85 |
86 | Directory structure:
87 | ```shell script
88 | 3dpw
89 | |-- imageFiles
90 | | |-- courtyard_arguing_00
91 | | |-- courtyard_backpack_00
92 | | |-- ...
93 | `-- sequenceFiles
94 | |-- test
95 | |-- train
96 | `-- validation
97 | ```
98 |
99 | - **[PennAction](http://dreamdragon.github.io/PennAction/)**
100 |
101 | Directory structure:
102 | ```shell script
103 | pennaction
104 | |-- frames
105 | | |-- 0000
106 | | |-- 0001
107 | | |-- ...
108 | `-- labels
109 | |-- 0000.mat
110 | |-- 0001.mat
111 | `-- ...
112 | ```
113 |
114 | - **[PoseTrack](https://posetrack.net/)**
115 |
116 | Directory structure:
117 | ```shell script
118 | posetrack
119 | |-- images
120 | | |-- train
121 | | |-- val
122 | | |-- test
123 | `-- posetrack_data
124 | `-- annotations
125 | |-- train
126 | |-- val
127 | `-- test
128 | ```
129 |
130 | - **[MPII](http://human-pose.mpi-inf.mpg.de/)**
131 |
132 | Directory structure:
133 | ```shell script
134 | mpii
135 | |-- 099992483.jpg
136 | |-- 099990098.jpg
137 | `-- ...
138 | ```
139 |
140 | - **[COCO 2014-All](https://cocodataset.org/)**
141 |
142 | Directory structure:
143 | ```shell script
144 | coco2014-all
145 | |-- COCO_train2014_000000000001.jpg
146 | |-- COCO_train2014_000000000002.jpg
147 | `-- ...
148 | ```
149 |
150 | - **[LSPet](http://sam.johnson.io/research/lspet.html)**
151 |
152 | Directory structure:
153 | ```shell script
154 | lspet
155 | |-- im00001.jpg
156 | |-- im00002.jpg
157 | `-- ...
158 | ```
159 |
160 | ## 2. Download Annotation (pt format)
161 | Download annotation data for MAED from [Google Drive](https://drive.google.com/drive/folders/1vApUaFNqo-uNP7RtVRxBy2YJJ1IprnQ8?usp=sharing) and move the whole directory to `$ROOT/data`.
162 |
163 | ## 3. Download SMPL data
164 | Download SMPL data for MAED from [Google Drive](https://drive.google.com/drive/folders/1RqkUInP_0DohMvYpnFpqo7z_KWxjQVa6?usp=sharing) and move the whole directory to `$ROOT/data`.
165 |
166 | ## It's Done!
167 | After downloading all the datasets and annotations, the directory structure of `$ROOT/data` should look like:
168 | ```shell script
169 | $ROOT/data
170 | |-- insta_variety
171 | |-- insta_variety_img
172 | |-- 3dpw
173 | |-- mpii3d
174 | |-- posetrack
175 | |-- pennaction
176 | |-- coco2014-all
177 | |-- lspet
178 | |-- mpii
179 | |-- smpl_data
180 | |-- J_regressor_extra.npy
181 | `-- ...
182 | `-- database
183 | |-- insta_train_db.pt
184 | |-- 3dpw_train_db.pt
185 | |-- lspet_train_db.pt
186 | `-- ...
187 | ```
--------------------------------------------------------------------------------
/doc/micheal2.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/doc/micheal2.gif
--------------------------------------------------------------------------------
/doc/out3.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/doc/out3.gif
--------------------------------------------------------------------------------
/eval.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 | import torchvision
4 |
5 | from lib.dataset import VideoDataset
6 | from lib.data_utils.transforms import *
7 | from lib.models import MAED
8 | from lib.models.tokenpose import Token3d
9 | from lib.core.evaluate import Evaluator
10 | from lib.core.config import parse_args
11 | from torch.utils.data import DataLoader
12 |
13 |
14 | def main(cfg, args):
15 | print(f'...Evaluating on {args.eval_ds.lower()} {args.eval_set.lower()} set...')
16 | device = "cuda"
17 |
18 | model = Token3d(
19 | num_blocks=cfg.MODEL.ENCODER.NUM_BLOCKS,
20 | num_heads=cfg.MODEL.ENCODER.NUM_HEADS,
21 | st_mode=cfg.MODEL.ENCODER.SPA_TEMP_MODE,
22 | mask_ratio=cfg.MODEL.MASK_RATIO,
23 | temporal_layers=cfg.MODEL.TEMPORAL_LAYERS,
24 | temporal_num_heads=cfg.MODEL.TEMPORAL_NUM_HEADS,
25 | enable_temp_modeling=cfg.MODEL.ENABLE_TEMP_MODELING,
26 | enable_temp_embedding=cfg.MODEL.ENABLE_TEMP_EMBEDDING
27 | )
28 |
29 | print("model params:{:.3f}M (/1000^2)".format(
30 | sum([p.numel() for p in model.parameters()]) / 1000**2))
31 |
32 | if args.pretrained != '' and os.path.isfile(args.pretrained):
33 | checkpoint = torch.load(args.pretrained, map_location='cpu')
34 | # best_performance = checkpoint['performance']
35 | history_best_performance = checkpoint['history_best_peformance'] \
36 | if 'history_best_peformance' in checkpoint else checkpoint['performance']
37 | state_dict = {}
38 | for k, w in checkpoint['state_dict'].items():
39 | if k.startswith('module.'):
40 | state_dict[k[len('module.'):]] = w
41 | elif k in model.state_dict():
42 | state_dict[k] = w
43 | else:
44 | continue
45 |
46 | temp_embedding_shape = state_dict['temporal_pos_embedding'].shape
47 | if model.temporal_pos_embedding.shape[1] != temp_embedding_shape[1]:
48 | model.temporal_pos_embedding = torch.nn.Parameter(
49 | torch.zeros(1, temp_embedding_shape[1], temp_embedding_shape[2]))
50 |
51 | # checkpoint['state_dict'] = {k[len('module.'):]: w for k, w in checkpoint['state_dict'].items() if k.startswith('module.') else}
52 | model.load_state_dict(state_dict, strict=False)
53 | print(f'==> Loaded pretrained model from {args.pretrained}...')
54 | print(
55 | f'==> History best Performance on 3DPW test set {history_best_performance}')
56 | else:
57 | print(f'{args.pretrained} is not a pretrained model!!!!')
58 | exit()
59 |
60 | model = model.to(device)
61 |
62 | transforms = torchvision.transforms.Compose([
63 | CropVideo(cfg.DATASET.HEIGHT, cfg.DATASET.WIDTH,
64 | default_bbox_scale=cfg.EVAL.BBOX_SCALE),
65 | StackFrames(),
66 | ToTensorVideo(),
67 | NormalizeVideo(),
68 | ])
69 |
70 | test_db = VideoDataset(
71 | args.eval_ds.lower(),
72 | set=args.eval_set.lower(),
73 | transforms=transforms,
74 | sample_pool=cfg.EVAL.SAMPLE_POOL,
75 | random_sample=False, random_start=False,
76 | verbose=True,
77 | debug=cfg.DEBUG)
78 |
79 | test_loader = DataLoader(
80 | dataset=test_db,
81 | batch_size=cfg.EVAL.BATCH_SIZE,
82 | shuffle=False,
83 | num_workers=cfg.NUM_WORKERS,
84 | )
85 |
86 | Evaluator().run(
87 | model=model,
88 | dataloader=test_loader,
89 | seqlen=cfg.EVAL.SEQLEN,
90 | interp=cfg.EVAL.INTERPOLATION,
91 | save_path=args.output_path,
92 | device=cfg.DEVICE,
93 | )
94 |
95 |
96 | if __name__ == '__main__':
97 | args, cfg, cfg_file = parse_args()
98 |
99 | main(cfg, args)
100 |
--------------------------------------------------------------------------------
/exp/eval/hvd_start.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # smplx: the view operation requires a contiguous tensor, so replace it with a reshape operation
3 | sed -i "347c rel_joints.reshape(-1, 3, 1)).view(-1, joints.shape[1], 4, 4)" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/smplx/lbs.py
4 |
5 | sed -i "96,97d" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/horovod/torch/mpi_ops.py
6 |
7 | unset OMPI_MCA_plm_rsh_agent
8 | export NCCL_SOCKET_IFNAME=eth1
9 | export NCCL_IB_DISABLE=1
10 | export NCCL_DEBUG=INFO
11 | export LANG=zh_CN.UTF-8
12 |
13 | date=`date +%Y%m%d_%H%M%S`
14 | export LANG=en_US.UTF-8
15 |
16 | # link to the work dir to save checkpoints and logs
17 |
18 | if [ ! -d 'workdir' ];then
19 | mkdir -p workdir
20 | fi
21 | work_dir=workdir/token3d_training_dir
22 | model_name=hvd_token3d_phase3
23 | config_yaml=configs/baseline_phase3.yaml
24 |
25 |
26 | exp_dir=${work_dir}/${model_name}
27 |
28 | if [ ! -d ${exp_dir} ];then
29 | mkdir -p ${exp_dir}
30 | fi
31 | echo 'current work dir is: '${exp_dir}
32 |
33 | echo ">>>> eval"
34 | #'epoch_100.pth.tar' # 'model_best.pth.tar' #44_9_model_best.pth.tar'
35 | best_name='model_best.pth.tar'
36 | best_from=$exp_dir/$best_name
37 | #best_from='/cfs/cfs-31b43a0b8/personal/brucesyang/baseline_training_dir/tp_baseline_token3dpretrain/coco/transpose_r/token3dpretrain/checkpoint.pth'
38 |
39 | python eval.py --cfg $config_yaml\
40 | --pretrained $best_from \
41 | --eval_ds 3dpw \
42 | --eval_set val \
43 | 2>&1 | tee -a ${exp_dir}/eval_output.log
44 |
45 | python eval.py --cfg $config_yaml\
46 | --pretrained $best_from \
47 | --eval_ds 3dpw \
48 | --eval_set test \
49 | 2>&1 | tee -a ${exp_dir}/eval_output.log
50 |
51 | python eval.py --cfg $config_yaml\
52 | --pretrained $best_from \
53 | --eval_ds h36m \
54 | --eval_set val \
55 | 2>&1 | tee -a ${exp_dir}/eval_output.log
56 |
--------------------------------------------------------------------------------
/exp/phase1/hvd_start.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # smplx: the view operation requires a contiguous tensor, so replace it with a reshape operation
3 | sed -i "347c rel_joints.reshape(-1, 3, 1)).view(-1, joints.shape[1], 4, 4)" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/smplx/lbs.py
4 |
5 | sed -i "96,97d" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/horovod/torch/mpi_ops.py
6 |
7 | unset OMPI_MCA_plm_rsh_agent
8 | export NCCL_SOCKET_IFNAME=eth1
9 | export NCCL_IB_DISABLE=1
10 | export NCCL_DEBUG=INFO
11 | export LANG=zh_CN.UTF-8
12 |
13 | date=`date +%Y%m%d_%H%M%S`
14 | export LANG=en_US.UTF-8
15 |
16 | # link to the work dir to save checkpoints and logs
17 | if [ ! -d 'workdir' ];then
18 | mkdir -p workdir
19 | fi
20 | work_dir=workdir/token3d_training_dir
21 | model_name=hvd_token3d_phase1
22 | exp_dir=${work_dir}/${model_name}
23 | resume_name='checkpoint.pt'
24 |
25 | resume_from=$exp_dir/$resume_name
26 | best_from=$work_dir/$best_name
27 |
28 | if [ ! -d ${exp_dir} ];then
29 | mkdir -p ${exp_dir}
30 | fi
31 | echo 'current work dir is: '${exp_dir}
32 |
33 | # for example
34 | # To run on a machine with 4 GPUs:
35 | # horovodrun -np 4 -H localhost:4 python train.py
36 |
37 | # To run on 4 machines with 4 GPUs each
38 | # horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py
39 |
40 |
41 | gpu_min=$1 # total_min_gpu_num
42 | node_list=$2 #server1_ip:gpu_num,server2_ip:gpu_num
43 |
44 | horovodrun -np ${gpu_min} -H ${node_list} \
45 | python train_hvd.py --cfg configs/baseline_phase1.yaml\
46 | --resume $resume_from \
47 | --logdir ${exp_dir} 2>&1 | tee -a ${exp_dir}/hvd_output.log
48 |
--------------------------------------------------------------------------------
/exp/phase2/hvd_start.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # smplx: the view operation requires a contiguous tensor, so replace it with a reshape operation
3 | sed -i "347c rel_joints.reshape(-1, 3, 1)).view(-1, joints.shape[1], 4, 4)" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/smplx/lbs.py
4 |
5 | sed -i "96,97d" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/horovod/torch/mpi_ops.py
6 |
7 | unset OMPI_MCA_plm_rsh_agent
8 | export NCCL_SOCKET_IFNAME=eth1
9 | export NCCL_IB_DISABLE=1
10 | export NCCL_DEBUG=INFO
11 | export LANG=zh_CN.UTF-8
12 |
13 | date=`date +%Y%m%d_%H%M%S`
14 | export LANG=en_US.UTF-8
15 |
16 | # link to the work dir to save checkpoints and logs
17 | if [ ! -d 'workdir' ];then
18 | mkdir -p workdir
19 | fi
20 | work_dir=workdir/token3d_training_dir
21 | model_name=hvd_token3d_phase2
22 | exp_dir=${work_dir}/${model_name}
23 | resume_name='checkpoint.pt'
24 | best_name='hvd_token3d_phase1/epoch_100.pth.tar'
25 |
26 | resume_from=$exp_dir/$resume_name
27 | best_from=$work_dir/$best_name
28 |
29 | if [ ! -d ${exp_dir} ];then
30 | mkdir -p ${exp_dir}
31 | fi
32 | echo 'current work dir is: '${exp_dir}
33 |
34 | # for example
35 | # To run on a machine with 4 GPUs:
36 | # horovodrun -np 4 -H localhost:4 python train.py
37 |
38 | # To run on 4 machines with 4 GPUs each
39 | # horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py
40 |
41 |
42 | gpu_min=$1 # total_min_gpu_num
43 | node_list=$2 #server1_ip:gpu_num,server2_ip:gpu_num
44 |
45 | horovodrun -np ${gpu_min} -H ${node_list} \
46 | python train_hvd.py --cfg configs/baseline_phase2.yaml\
47 | --resume $resume_from \
48 | --pretrained $best_from \
49 | --logdir ${exp_dir} 2>&1 | tee -a ${exp_dir}/hvd_output.log
50 |
--------------------------------------------------------------------------------
/exp/phase3/hvd_start.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | # smplx: the view operation requires a contiguous tensor, so replace it with a reshape operation
3 | sed -i "347c rel_joints.reshape(-1, 3, 1)).view(-1, joints.shape[1], 4, 4)" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/smplx/lbs.py
4 |
5 | sed -i "96,97d" ~/anconda3/envs/int_hmr/lib/python3.6/site-packages/horovod/torch/mpi_ops.py
6 |
7 | unset OMPI_MCA_plm_rsh_agent
8 | export NCCL_SOCKET_IFNAME=eth1
9 | export NCCL_IB_DISABLE=1
10 | export NCCL_DEBUG=INFO
11 | export LANG=zh_CN.UTF-8
12 |
13 | date=`date +%Y%m%d_%H%M%S`
14 | export LANG=en_US.UTF-8
15 |
16 | # link to the work dir to save checkpoints and logs
17 | if [ ! -d 'workdir' ];then
18 | mkdir -p workdir
19 | fi
20 | work_dir=workdir/token3d_training_dir
21 | model_name=hvd_token3d_phase3
22 | exp_dir=${work_dir}/${model_name}
23 | resume_name='checkpoint.pt'
24 | best_name='hvd_token3d_phase2/epoch_100.pth.tar'
25 |
26 | resume_from=$exp_dir/$resume_name
27 | best_from=$work_dir/$best_name
28 |
29 | if [ ! -d ${exp_dir} ];then
30 | mkdir -p ${exp_dir}
31 | fi
32 | echo 'current work dir is: '${exp_dir}
33 |
34 | # for example
35 | # To run on a machine with 4 GPUs:
36 | # horovodrun -np 4 -H localhost:4 python train.py
37 |
38 | # To run on 4 machines with 4 GPUs each
39 | # horovodrun -np 16 -H server1:4,server2:4,server3:4,server4:4 python train.py
40 |
41 |
42 | gpu_min=$1 # total_min_gpu_num
43 | node_list=$2 #server1_ip:gpu_num,server2_ip:gpu_num
44 |
45 | horovodrun -np ${gpu_min} -H ${node_list} \
46 | python train_hvd.py --cfg configs/baseline_phase3.yaml\
47 | --resume $resume_from \
48 | --pretrained $best_from \
49 | --logdir ${exp_dir} 2>&1 | tee -a ${exp_dir}/hvd_output.log
50 |
--------------------------------------------------------------------------------
/lib/core/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/lib/core/__init__.py
--------------------------------------------------------------------------------
/lib/core/config.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | import argparse
3 |
4 | from yacs.config import CfgNode as CN
5 |
6 | # CONSTANTS
7 | # You may modify them at will
8 | DB_DIR = 'data/database'
9 | DATA_DIR = 'data/smpl_data'
10 | INSTA_DIR = 'data/insta_variety'
11 | INSTA_IMG_DIR = 'data/insta_variety_img'
12 | MPII3D_DIR = 'data/mpi_inf_3dhp'
13 | THREEDPW_DIR = 'data/3dpw'
14 | HUMAN36M_DIR = 'data/human3.6m'
15 | PENNACTION_DIR = 'data/penn_action'
16 | POSETRACK_DIR = 'data/posetrack'
17 |
18 | # Configuration variables
19 | cfg = CN()
20 |
21 | cfg.OUTPUT_DIR = 'results'
22 | cfg.EXP_NAME = 'default'
23 | cfg.DEVICE = 'cuda'
24 | cfg.DEBUG = True
25 | cfg.LOGDIR = ''
26 | cfg.NUM_WORKERS = 8
27 | cfg.DEBUG_FREQ = 1000
28 | cfg.SEED_VALUE = -1
29 | cfg.SAVE_FREQ = 5
30 |
31 | cfg.CUDNN = CN()
32 | cfg.CUDNN.BENCHMARK = True
33 | cfg.CUDNN.DETERMINISTIC = False
34 | cfg.CUDNN.ENABLED = True
35 |
36 | cfg.TRAIN = CN()
37 | cfg.TRAIN.DATASETS_2D = ['insta']
38 | cfg.TRAIN.DATASETS_3D = ['mpii3d']
39 | cfg.TRAIN.DATASETS_IMG = ['coco2014-all']
40 | cfg.TRAIN.DATASET_EVAL = 'ThreeDPW'
41 | cfg.TRAIN.EVAL_SET = 'val'
42 | cfg.TRAIN.BATCH_SIZE_3D = 4
43 | cfg.TRAIN.BATCH_SIZE_2D = 4
44 | cfg.TRAIN.BATCH_SIZE_IMG = 8
45 | cfg.TRAIN.IMG_USE_FREQ = 1
46 | cfg.TRAIN.START_EPOCH = 0
47 | cfg.TRAIN.END_EPOCH = 5
48 | cfg.TRAIN.RESUME = ''
49 | cfg.TRAIN.NUM_ITERS_PER_EPOCH = -1
50 |
51 | # <====== optimizer
52 | cfg.TRAIN.OPTIM = CN()
53 | cfg.TRAIN.OPTIM.OPTIM = 'Adam'
54 | cfg.TRAIN.OPTIM.LR = 1e-4
55 | cfg.TRAIN.OPTIM.WD = 1e-4
56 | cfg.TRAIN.OPTIM.MOMENTUM = 0.9
57 | cfg.TRAIN.OPTIM.WARMUP_EPOCH = 2
58 | cfg.TRAIN.OPTIM.WARMUP_FACTOR = 0.1
59 | cfg.TRAIN.OPTIM.MILESTONES = [10, 15]
60 |
61 | cfg.DATASET = CN()
62 | cfg.DATASET.SEQLEN = 20
63 | cfg.DATASET.OVERLAP = 0.5
64 | cfg.DATASET.SAMPLE_POOL = 64
65 | cfg.DATASET.SIZE_JITTER = 0.2
66 | cfg.DATASET.ROT_JITTER = 30
67 | cfg.DATASET.RANDOM_SAMPLE = True
68 | cfg.DATASET.RANDOM_START = False
69 | cfg.DATASET.RANDOM_FLIP = 0.5
70 | cfg.DATASET.WIDTH = 224
71 | cfg.DATASET.HEIGHT = 224
72 | cfg.DATASET.RANDOM_CROP_P = 0.0
73 | cfg.DATASET.RANDOM_CROP_SIZE = 0.5
74 | cfg.DATASET.COLOR_JITTER = 0.3
75 | cfg.DATASET.ERASE_PROB = 0.3
76 | cfg.DATASET.ERASE_PART = 0.7
77 | cfg.DATASET.ERASE_FILL = False
78 | cfg.DATASET.ERASE_KP = False
79 | cfg.DATASET.ERASE_MARGIN = 0.2
80 |
81 |
82 | cfg.LOSS = CN()
83 | cfg.LOSS.KP_2D_W = 60.
84 | cfg.LOSS.KP_3D_W = 30.
85 | cfg.LOSS.SHAPE_W = 0.001
86 | cfg.LOSS.POSE_W = 1.0
87 | cfg.LOSS.SMPL_NORM = 1.
88 | cfg.LOSS.ACCL_W = 0.
89 | cfg.LOSS.DELTA_NORM = 0.0001
90 | cfg.LOSS.TEMP_W = 0.
91 |
92 | cfg.MODEL = CN()
93 |
94 | # GRU model hyperparams
95 | cfg.MODEL.DECODER = CN()
96 | cfg.MODEL.PROJ_MODE = 'linear'
97 | cfg.MODEL.USE_JOINT2D_HEAD = False
98 | cfg.MODEL.USE_ROT2TOKEN_HEAD = False
99 | cfg.MODEL.CONTRAINT_TOKEN_DELTA = False
100 | cfg.MODEL.DECODER.BACKBONE = 'ktd'
101 | cfg.MODEL.DECODER.HIDDEN_DIM = 1024
102 | cfg.MODEL.ENCODER = CN()
103 | cfg.MODEL.ENCODER.BACKBONE = 'ste'
104 | cfg.MODEL.ENCODER.NUM_BLOCKS = 6
105 | cfg.MODEL.ENCODER.NUM_HEADS = 12
106 | cfg.MODEL.ENCODER.SPA_TEMP_MODE = 'vanilla'
107 | # temporal
108 | cfg.MODEL.MASK_RATIO = 0.
109 | cfg.MODEL.TEMPORAL_LAYERS = 3
110 | cfg.MODEL.TEMPORAL_NUM_HEADS = 12
111 |
112 | cfg.MODEL.LOAD_PRETRAINED_HEAD = True
113 |
114 | cfg.MODEL.ENABLE_TEMP_MODELING = True
115 | cfg.MODEL.ENABLE_TEMP_EMBEDDING = False
116 |
117 | cfg.EVAL = CN()
118 | cfg.EVAL.SEQLEN = 16
119 | cfg.EVAL.SAMPLE_POOL = 128
120 | cfg.EVAL.BATCH_SIZE = 32
121 | cfg.EVAL.INTERPOLATION = 1
122 | cfg.EVAL.BBOX_SCALE = 1.3
123 |
124 | def get_cfg_defaults():
125 | """Get a yacs CfgNode object with default values for my_project."""
126 | # Return a clone so that the defaults will not be altered
127 | # This is for the "local variable" use pattern
128 | return cfg.clone()
129 |
130 |
131 | def update_cfg(cfg_file):
132 | cfg = get_cfg_defaults()
133 | cfg.merge_from_file(cfg_file)
134 | return cfg.clone()
135 |
136 |
137 | def parse_args():
138 | parser = argparse.ArgumentParser()
139 | parser.add_argument('--cfg', type=str, help='cfg file path')
140 | parser.add_argument('--pretrained', type=str,
141 | help='stage 1 checkpoint file path', default='')
142 | parser.add_argument('--resume', type=str, help='resume', default='')
143 | parser.add_argument('--eval_ds', type=str, help='eval set name', default='3dpw')
144 | parser.add_argument('--eval_set', type=str,
145 | help='eval set in [test|val]', default='test')
146 | parser.add_argument('--image_root', type=str,
147 | help='inference image root', default='')
148 | parser.add_argument('--image_list', type=str,
149 | help='inference image list', default='')
150 | parser.add_argument('--output_path', type=str,
151 | help='path to save the inference file generated in evaluation', default='')
152 | parser.add_argument('--logdir', type=str,
153 | help="workdir save logs, checkpoints, best_models")
154 | args = parser.parse_args()
155 | print(args, end='\n\n')
156 |
157 | cfg_file = args.cfg
158 | if args.cfg is not None:
159 | cfg = update_cfg(args.cfg)
160 | else:
161 | cfg = get_cfg_defaults()
162 |
163 | return args, cfg, cfg_file
164 |
--------------------------------------------------------------------------------
/lib/core/evaluate.py:
--------------------------------------------------------------------------------
1 | import time
2 | import torch
3 | import shutil
4 | import logging
5 | import numpy as np
6 | import os.path as osp
7 | import traceback
8 | import joblib
9 | from tqdm import tqdm
10 | from collections import defaultdict
11 | from scipy.interpolate import interp1d
12 |
13 | from lib.core.config import DATA_DIR, DB_DIR
14 | from lib.models.smpl import REGRESSOR_DICT, JID_DICT
15 | from lib.utils.utils import move_dict_to_device, AverageMeter
16 |
17 | from lib.utils.eval_utils import (
18 | compute_accel,
19 | compute_error_accel,
20 | compute_error_verts,
21 | batch_compute_similarity_transform_torch,
22 | )
23 | logger = logging.getLogger(__name__)
24 |
25 | class Evaluator():
26 | def __init__(self):
27 | self.evaluation_accumulators = defaultdict(list)
28 |
29 | def inference(self,
30 | model,
31 | dataloader,
32 | seqlen=8,
33 | interp=1,
34 | device='cpu',
35 | verbose=True, desc='[Evaluating] '
36 | ):
37 | """
38 | Args:
39 |             interp (int >= 1): one out of every `interp` frames is predicted by the model, while the rest are obtained by interpolation. interp = 1 means all frames are predicted by the model.
40 | """
41 | model.eval()
42 | dataset_name = dataloader.dataset.dataset_name
43 |
44 | start = time.time()
45 |
46 | summary_string = ''
47 |
48 | self.evaluation_accumulators = defaultdict(list)
49 |
50 | flatten_dim = lambda x: x.reshape((-1, ) + x.shape[2:])
51 |
52 | J_regressor = torch.from_numpy(np.load(osp.join(DATA_DIR, REGRESSOR_DICT[dataset_name]))).float() if REGRESSOR_DICT[dataset_name] else None
53 | Jid = JID_DICT[dataset_name]
54 |
55 | tqdm_bar = tqdm(range(len(dataloader)), desc=desc) if verbose else range(len(dataloader))
56 | test_iter = iter(dataloader)
57 |
58 | for i in tqdm_bar:
59 | target = next(test_iter)
60 | move_dict_to_device(target, device)
61 |
62 | # <=============
63 | with torch.no_grad():
64 | pred_verts_seq = []
65 | pred_j3d_seq = []
66 | pred_j2d_seq = []
67 | pred_theta_seq = []
68 | pred_rotmat_seq = []
69 | valid_joints = [joint_id for joint_id in range(target['kp_3d'].shape[2]) if target['kp_3d'][0,0,joint_id,-1]]
70 |
71 | orig_len = target['images'].shape[1]
72 | interp_len = target['images'][:, ::interp].shape[1]
73 | sample_freq = interp_len // seqlen
74 |
75 | for i in range(sample_freq):
76 | inp = target['images'][:, ::interp][:, i::sample_freq]
77 |
78 | preds = model(inp, J_regressor=J_regressor)
79 |
80 | pred_verts_seq.append(preds['verts'].cpu().numpy())
81 | pred_j3d_seq.append(preds['kp_3d'][:,:,Jid].cpu().numpy())
82 | pred_j2d_seq.append(preds['kp_2d'][:,:,Jid].cpu().numpy())
83 | pred_theta_seq.append(preds['theta'].cpu().numpy())
84 | pred_rotmat_seq.append(preds['rotmat'].cpu().numpy())
85 |
86 | # valid_seq is used to filter out repeated frames
87 | valid_seq = flatten_dim(target['valid']).cpu().numpy()
88 |
89 | # register pred
90 | pred_verts_seq = self.interpolate(self.merge_sequence(pred_verts_seq), orig_len, interp_len)[valid_seq] # (NT, 6890, 3)
91 | pred_j3d_seq = self.interpolate(self.merge_sequence(pred_j3d_seq), orig_len, interp_len)[valid_seq] # (NT, n_kp, 3)
92 | pred_j2d_seq = self.interpolate(self.merge_sequence(pred_j2d_seq), orig_len, interp_len)[valid_seq] # (NT, n_kp, 2)
93 | pred_theta_seq = self.interpolate(self.merge_sequence(pred_theta_seq), orig_len, interp_len)[valid_seq] # (NT, 3+72+10)
94 | pred_rotmat_seq = self.interpolate(self.merge_sequence(pred_rotmat_seq), orig_len, interp_len)[valid_seq] # (NT, 3, 3)
95 |
96 | self.evaluation_accumulators['pred_verts'].append(pred_verts_seq)
97 | self.evaluation_accumulators['pred_theta'].append(pred_theta_seq)
98 | self.evaluation_accumulators['pred_rotmat'].append(pred_rotmat_seq)
99 | self.evaluation_accumulators['pred_j3d'].append(pred_j3d_seq)
100 | self.evaluation_accumulators['pred_j2d'].append(pred_j2d_seq)
101 |
102 | # register target
103 | target_j3d_seq = flatten_dim(target['kp_3d'][:, :, valid_joints]).cpu().numpy()[valid_seq] # (NT, n_kp, 4)
104 | target_j2d_seq = flatten_dim(target['kp_2d'][:, :, valid_joints]).cpu().numpy()[valid_seq] # (NT, n_kp, 3)
105 | target_theta_seq = flatten_dim(target['theta']).cpu().numpy()[valid_seq] # (NT, 3+72+10)
106 | self.evaluation_accumulators['target_theta'].append(target_theta_seq)
107 | self.evaluation_accumulators['target_j3d'].append(target_j3d_seq)
108 | self.evaluation_accumulators['target_j2d'].append(target_j2d_seq)
109 |
110 |                 # register some other information
111 | vid_name = np.reshape(np.array(target['instance_id']).T, (-1,))[valid_seq] # (NT,)
112 | paths = np.reshape(np.array(target['paths']).T, (-1,))[valid_seq] # (NT,)
113 | bboxes = np.reshape(target['bbox'].cpu().numpy(), (-1,4))[valid_seq] # (NT, 4)
114 | self.evaluation_accumulators['instance_id'].append(vid_name)
115 | self.evaluation_accumulators['bboxes'].append(bboxes)
116 | self.evaluation_accumulators['paths'].append(paths)
117 |
118 | # =============>
119 |
120 | batch_time = time.time() - start
121 |
122 | summary_string = f'{desc} | batch: {batch_time * 10.0:.4}ms '
123 |
124 | if verbose:
125 | tqdm_bar.set_description(summary_string)
126 |
127 | def merge_sequence(self, seq):
128 | if seq is None:
129 | return None
130 | seq = np.stack(seq, axis=2) #(N, T//num_of_seq, num_of_seq, ...)
131 | assert len(seq.shape) >= 3
132 | seq = seq.reshape((-1, ) + seq.shape[3:]) #(NT, ...)
133 | return seq
134 |
135 | def evaluate(self, save_path=''):
136 | # stack accumulators along axis 0
137 | for k, v in self.evaluation_accumulators.items():
138 | self.evaluation_accumulators[k] = np.concatenate(v, axis=0)
139 |
140 | pred_j3ds = self.evaluation_accumulators['pred_j3d'] #(N, n_kp, 3)
141 | target_j3ds = self.evaluation_accumulators['target_j3d'][:,:,:-1] #(N, n_kp, 3)
142 | vis = self.evaluation_accumulators['target_j3d'][:,:,-1:] #(N, n_kp, 1)
143 | num_pred = len(pred_j3ds)
144 | target_j3ds *= vis
145 | pred_j3ds *= vis
146 |
147 | pred_j3ds = torch.from_numpy(pred_j3ds).float()
148 | target_j3ds = torch.from_numpy(target_j3ds).float()
149 |
150 | pred_pelvis = (pred_j3ds[:,[2],:] + pred_j3ds[:,[3],:]) / 2.0
151 | target_pelvis = (target_j3ds[:,[2],:] + target_j3ds[:,[3],:]) / 2.0
152 |
153 | pred_j3ds -= pred_pelvis
154 | target_j3ds -= target_pelvis
155 |
156 |
157 | # reduce cpu memory
158 | pred_j3ds = pred_j3ds.cuda()
159 | target_j3ds = target_j3ds.cuda()
160 | del pred_pelvis, target_pelvis
161 |
162 |
163 | # Absolute error (MPJPE)
164 | errors = torch.sqrt(((pred_j3ds - target_j3ds) ** 2).sum(dim=-1)).mean(dim=-1).cpu().numpy()
165 | S1_hat = batch_compute_similarity_transform_torch(pred_j3ds, target_j3ds)
166 | errors_pa = torch.sqrt(((S1_hat - target_j3ds) ** 2).sum(dim=-1)).mean(dim=-1).cpu().numpy()
167 | pred_verts = self.evaluation_accumulators['pred_verts']
168 | target_theta = self.evaluation_accumulators['target_theta']
169 | pve = compute_error_verts(target_theta=target_theta, pred_verts=pred_verts)
170 |
171 | pred_j3ds = pred_j3ds.cpu().numpy()
172 | target_j3ds = target_j3ds.cpu().numpy()
173 |
174 | accel_err = compute_error_accel(joints_pred=pred_j3ds, joints_gt=target_j3ds)
175 | accel = compute_accel(pred_j3ds)
176 |
177 | m2mm = 1000
178 |
179 | eval_dict = {
180 | 'mpjpe': np.mean(errors) * m2mm,
181 | 'pa-mpjpe': np.mean(errors_pa) * m2mm,
182 | 'pve': np.mean(pve) * m2mm,
183 | 'accel': np.mean(accel) * m2mm,
184 | 'accel_err': np.mean(accel_err) * m2mm
185 | }
186 |
187 | if save_path:
188 | self.save_result(save_path, mpjpe=errors, pa_mpjpe=errors_pa, accel=accel_err)
189 |
190 | return eval_dict, num_pred
191 |
192 | def log(self, eval_dict, num_pred, desc=''):
193 |         print(f"Evaluated on {int(num_pred)} poses.")
194 | print(f'{desc}' + ' '.join([f'{k.upper()}: {v:.4f},'for k,v in eval_dict.items()]))
195 |
196 | def run(self, model, dataloader,
197 | seqlen=8, interp=1, device='cpu',
198 | save_path='', verbose=True, desc='[Evaluating]'
199 | ):
200 | self.inference(model, dataloader, seqlen=seqlen, interp=interp, device=device, verbose=verbose, desc=desc)
201 | #self.count_attn(model)
202 | eval_dict, num_pred = self.evaluate(save_path)
203 | self.log(eval_dict, num_pred)
204 |
205 | def count_attn(self, model):
206 | result = {}
207 | result["vid_name"] = np.concatenate(self.evaluation_accumulators['instance_id'], axis=0)
208 |
209 | for i, blk in enumerate(model.backbone.blocks):
210 | result[f"attn_s_{i}"] = blk.attn.attn_count_s
211 | result[f"attn_t_{i}"] = blk.attn.attn_count_t
212 |
213 | joblib.dump(result, "attn.pt")
214 |
215 | def save_result(self, save_path, *args, **kwargs):
216 | save_fields = [
217 | 'pred_theta',
218 | #'pred_j3d',
219 | #'pred_j2d',
220 | 'pred_verts',
221 | 'paths',
222 | 'bboxes',
223 | #'pred_rotmat'
224 | ]
225 | save_dic = {k: v for k, v in self.evaluation_accumulators.items() if k in save_fields}
226 | save_dic.update(kwargs)
227 | joblib.dump(save_dic, osp.join(save_path, 'inference.pkl'))
228 |
229 | def interpolate(self, sequence, orig_len, interp_len):
230 | """
231 | Args:
232 | sequence (np array): size (N*interp_len, ...)
233 | orig_len (int)
234 | interp_len (int): larger than or equal to orig_len
235 |
236 | Return:
237 | A np array of size (N*orig_len, ...)
238 | """
239 | if orig_len == interp_len: return sequence
240 | sequence = sequence.reshape((-1, interp_len) + sequence.shape[1:]) # (N, interp_len, ...)
241 | x = np.linspace(1., 0., num=interp_len, endpoint=False)[::-1] # (interp_len, )
242 | f = interp1d(x, sequence, axis=1, fill_value="extrapolate")
243 |
244 | new_x = np.linspace(0., 1., num=orig_len, endpoint=True) # (orig_len, )
245 | ret = f(new_x)
246 | ret = ret.reshape((-1,) + ret.shape[2:])
247 | return ret
248 |
--------------------------------------------------------------------------------
/lib/data_utils/img_utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import cv2
3 | import torch
4 | import io
5 | import numpy as np
6 | import os.path as osp
7 |
8 | from skimage.util.shape import view_as_windows
9 | from PIL import Image
10 |
11 | import shutil
12 | import uuid
13 |
14 |
15 |
16 | def download(p, cache_dir='/dockerdata'):
17 | new_p = '{}/{}'.format(cache_dir, p)
18 | if (not os.path.exists(new_p)):
19 | subdir = os.path.dirname(new_p)
20 | if (subdir != ''):
21 | os.makedirs(subdir, exist_ok=True)
22 | uuid_str = uuid.uuid4().hex
23 | tmp_new_p = new_p + "." + uuid_str
24 | shutil.copyfile(p, tmp_new_p)
25 | shutil.move(tmp_new_p, new_p)
26 | return new_p
27 |
28 |
29 |
30 | def get_bbox_from_kp2d(kp_2d):
31 | # get bbox
32 | if len(kp_2d.shape) > 2:
33 | ul = np.array([kp_2d[:, :, 0].min(axis=1), kp_2d[:, :, 1].min(axis=1)]) # upper left
34 | lr = np.array([kp_2d[:, :, 0].max(axis=1), kp_2d[:, :, 1].max(axis=1)]) # lower right
35 | else:
36 | ul = np.array([kp_2d[:, 0].min(), kp_2d[:, 1].min()]) # upper left
37 | lr = np.array([kp_2d[:, 0].max(), kp_2d[:, 1].max()]) # lower right
38 |
39 | # ul[1] -= (lr[1] - ul[1]) * 0.10 # prevent cutting the head
40 | w = lr[0] - ul[0]
41 | h = lr[1] - ul[1]
42 | c_x, c_y = ul[0] + w / 2, ul[1] + h / 2
43 | # to keep the aspect ratio
44 | w = h = np.where(w / h > 1, w, h)
45 | w = h = h * 1.1
46 |
47 | bbox = np.array([c_x, c_y, w, h]) # shape = (4,N)
48 | return bbox
49 |
50 | def split_into_chunks(vid_names, seqlen, stride, pad=True):
51 | video_start_end_indices = []
52 |
53 | video_names, group = np.unique(vid_names, return_index=True)
54 | perm = np.argsort(group)
55 | video_names, group = video_names[perm], group[perm]
56 |
57 | indices = np.split(np.arange(0, vid_names.shape[0]), group[1:])
58 |
59 | for idx in range(len(video_names)):
60 | indexes = indices[idx]
61 | if pad:
62 | padlen = (seqlen - indexes.shape[0] % seqlen) % seqlen
63 | indexes = np.pad(indexes, ((0, padlen)), 'reflect')
64 | if indexes.shape[0] < seqlen:
65 | continue
66 | chunks = view_as_windows(indexes, (seqlen,), step=stride)
67 | chunks = chunks.tolist()
68 | #start_finish = chunks[:, (0, -1)].tolist()
69 | #video_start_end_indices += start_finish
70 | video_start_end_indices += chunks
71 |
72 | return video_start_end_indices
73 |
74 | def pad_image(img, h, w):
75 | img = img.copy()
76 | img_h, img_w, _ = img.shape
77 | pad_top = (h - img_h) // 2
78 | pad_bottom = h - img_h - pad_top
79 | pad_left = (w - img_w) // 2
80 | pad_right = w - img_w - pad_left
81 |
82 | img = np.pad(img, ((pad_top, pad_bottom),(pad_left, pad_right),(0, 0)))
83 |
84 | return img
85 | #
86 | # def read_img(path, convert='RGB', check_exist=False):
87 | # if check_exist and not osp.exists(path):
88 | # return None
89 | # try:
90 | # img = Image.open(path)
91 | # if convert:
92 | # img = img.convert(convert)
93 | # except:
94 | # raise IOError('File error: ', path)
95 | # return np.array(img)
96 |
97 |
98 | def read_img(path, convert='RGB', check_exist=False):
99 |
100 | if isinstance(path, bytes):
101 | path = path.decode()
102 | if 'mpi_inf_3dhp' in path and 'mpi_inf_3dhp_test_set' not in path:
103 | path = path[:-10] + 'frame_' + path[-10:]
104 | path = path.replace('v', 'V')
105 |
106 | if check_exist and not osp.exists(path):
107 | return None
108 | try:
109 | # img = Image.open(download(path))
110 | img = Image.open(path)
111 | if convert:
112 | img = img.convert(convert)
113 | except:
114 | img_list = [
115 | './data/mpi_inf_3dhp/mpi_inf_3dhp_test_set/TS2/imageSequence/img_005398.jpg',
116 | './data/mpi_inf_3dhp/mpi_inf_3dhp_test_set/TS2/imageSequence/img_003276.jpg'
117 | ]
118 | if path in img_list:
119 | img = Image.open('./data/mpi_inf_3dhp/mpi_inf_3dhp_test_set/TS2/imageSequence/img_003277.jpg')
120 | else:
121 | raise IOError('File error: ', path)
122 | return np.array(img)
123 |
--------------------------------------------------------------------------------
/lib/data_utils/insta_utils_imgs.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | sys.path.append('.')
4 |
5 | import argparse
6 | import numpy as np
7 | import os.path as osp
8 | from multiprocessing import Process, Pool
9 | from glob import glob
10 | from tqdm import tqdm
11 | import tensorflow as tf
12 | from PIL import Image
13 |
14 | from lib.core.config import INSTA_DIR, INSTA_IMG_DIR
15 |
16 |
17 | def process_single_record(fname, outdir, split):
18 | sess = tf.Session()
19 | #print(fname)
20 | record_name = fname.split('/')[-1]
21 | for vid_idx, serialized_ex in enumerate(tf.python_io.tf_record_iterator(fname)):
22 | #print(vid_idx)
23 | os.makedirs(osp.join(outdir, split, record_name, str(vid_idx)), exist_ok=True)
24 | example = tf.train.Example()
25 | example.ParseFromString(serialized_ex)
26 |
27 | N = int(example.features.feature['meta/N'].int64_list.value[0])
28 |
29 | images_data = example.features.feature[
30 | 'image/encoded'].bytes_list.value
31 |
32 |
33 | for i in range(N):
34 | image = np.expand_dims(sess.run(tf.image.decode_jpeg(images_data[i], channels=3)), axis=0)
35 | #video.append(image)
36 | image = Image.fromarray(np.squeeze(image, axis=0))
37 | image.save(osp.join(outdir, split, record_name, str(vid_idx), str(i)+".jpg"))
38 |
39 |
40 | if __name__ == '__main__':
41 | parser = argparse.ArgumentParser()
42 | parser.add_argument('--inp_dir', type=str, help='tfrecords file path', default=INSTA_DIR)
43 | parser.add_argument('--n', type=int, help='total num of workers')
44 | parser.add_argument('--i', type=int, help='current index of worker (from 0 to n-1)')
45 | parser.add_argument('--split', type=str, help='train or test')
46 | parser.add_argument('--out_dir', type=str, help='output images path', default=INSTA_IMG_DIR)
47 | args = parser.parse_args()
48 |
49 | fpaths = glob(f'{args.inp_dir}/{args.split}/*.tfrecord')
50 | fpaths = sorted(fpaths)
51 |
52 | total = len(fpaths)
53 | fpaths = fpaths[args.i*total//args.n : (args.i+1)*total//args.n]
54 |
55 | #print(fpaths)
56 | #print(len(fpaths))
57 |
58 | os.makedirs(args.out_dir, exist_ok=True)
59 |
60 | for idx, fp in enumerate(fpaths):
61 | process_single_record(fp, args.out_dir, args.split)
--------------------------------------------------------------------------------
/lib/data_utils/mpii3d_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | This script is borrowed from https://github.com/mkocabas/VIBE.
3 | Adhere to their license to use this script.
4 |
5 | We hacked it a little bit to make it happy in our framework.
6 | """
7 |
8 | import sys
9 | sys.path.append('.')
10 | import os
11 | import cv2
12 | import h5py
13 | import glob
14 | import json
15 | import joblib
16 | import argparse
17 | import numpy as np
18 | from tqdm import tqdm
19 | import os.path as osp
20 | import scipy.io as sio
21 |
22 | from lib.core.config import DB_DIR, MPII3D_DIR
23 | from lib.utils.utils import tqdm_enumerate
24 | from lib.data_utils.kp_utils import convert_kps
25 | from lib.data_utils.img_utils import get_bbox_from_kp2d
26 |
27 |
28 | def read_openpose(json_file, gt_part, dataset):
29 | # get only the arms/legs joints
30 | op_to_12 = [11, 10, 9, 12, 13, 14, 4, 3, 2, 5, 6, 7]
31 | # read the openpose detection
32 | json_data = json.load(open(json_file, 'r'))
33 | people = json_data['people']
34 | if len(people) == 0:
35 | # no openpose detection
36 | keyp25 = np.zeros([25,3])
37 | else:
38 | # size of person in pixels
39 | scale = max(max(gt_part[:,0])-min(gt_part[:,0]),max(gt_part[:,1])-min(gt_part[:,1]))
40 | # go through all people and find a match
41 | dist_conf = np.inf*np.ones(len(people))
42 | for i, person in enumerate(people):
43 | # openpose keypoints
44 | op_keyp25 = np.reshape(person['pose_keypoints_2d'], [25,3])
45 | op_keyp12 = op_keyp25[op_to_12, :2]
46 | op_conf12 = op_keyp25[op_to_12, 2:3] > 0
47 | # all the relevant joints should be detected
48 | if min(op_conf12) > 0:
49 | # weighted distance of keypoints
50 | dist_conf[i] = np.mean(np.sqrt(np.sum(op_conf12*(op_keyp12 - gt_part[:12, :2])**2, axis=1)))
51 | # closest match
52 | p_sel = np.argmin(dist_conf)
53 | # the exact threshold is not super important but these are the values we used
54 | if dataset == 'mpii':
55 | thresh = 30
56 | elif dataset == 'coco':
57 | thresh = 10
58 | else:
59 | thresh = 0
60 | # dataset-specific thresholding based on pixel size of person
61 | if min(dist_conf)/scale > 0.1 and min(dist_conf) < thresh:
62 | keyp25 = np.zeros([25,3])
63 | else:
64 | keyp25 = np.reshape(people[p_sel]['pose_keypoints_2d'], [25,3])
65 | return keyp25
66 |
67 |
68 | def read_calibration(calib_file, vid_list):
69 | Ks, Rs, Ts = [], [], []
70 | file = open(calib_file, 'r')
71 | content = file.readlines()
72 | for vid_i in vid_list:
73 | K = np.array([float(s) for s in content[vid_i * 7 + 5][11:-2].split()])
74 | K = np.reshape(K, (4, 4))
75 | RT = np.array([float(s) for s in content[vid_i * 7 + 6][11:-2].split()])
76 | RT = np.reshape(RT, (4, 4))
77 | R = RT[:3, :3]
78 | T = RT[:3, 3] / 1000
79 | Ks.append(K)
80 | Rs.append(R)
81 | Ts.append(T)
82 | return Ks, Rs, Ts
83 |
84 |
85 | def read_data_train(dataset_path, user_list, seq_list, vid_list, debug=False):
86 | h, w = 2048, 2048
87 | dataset = {
88 | 'vid_name': [],
89 | 'frame_id': [],
90 | 'joints3D': [],
91 | 'joints2D': [],
92 | 'bbox': [],
93 | 'img_name': [],
94 | }
95 |
96 | for user_i in user_list:
97 | for seq_i in seq_list:
98 | seq_path = os.path.join(dataset_path,
99 | 'S' + str(user_i),
100 | 'Seq' + str(seq_i))
101 | # mat file with annotations
102 | annot_file = os.path.join(seq_path, 'annot.mat')
103 | annot2 = sio.loadmat(annot_file)['annot2']
104 | annot3 = sio.loadmat(annot_file)['annot3']
105 | # calibration file and camera parameters
106 | for j, vid_i in enumerate(vid_list):
107 | # image folder
108 | imgs_path = os.path.join(seq_path,
109 | 'video_' + str(vid_i))
110 | # print(annot2, annot3, imgs_path)
111 | # per frame
112 |
113 |
114 | if not os.path.isdir(imgs_path):
115 | continue
116 | pattern = os.path.join(imgs_path, '*.jpg')
117 | img_list = sorted(glob.glob(pattern))
118 | vid_used_frames = []
119 | vid_used_joints = []
120 | vid_used_bbox = []
121 | vid_segments = []
122 | vid_uniq_id = "subj" + str(user_i) + '_seq' + str(seq_i) + "_vid" + str(vid_i) + "_seg0"
123 | for i, img_i in tqdm_enumerate(img_list, desc="sub{}_seq{}_vid{}".format(user_i, seq_i, vid_i)):
124 |
125 | # for each image we store the relevant annotations
126 | img_name = img_i.split('/')[-1]
127 | joints_2d_raw = np.reshape(annot2[vid_i][0][i], (1, 28, 2))
128 | joints_2d_raw= np.append(joints_2d_raw, np.ones((1,28,1)), axis=2)
129 | joints_2d = convert_kps(joints_2d_raw, "mpii3d", "spin").reshape((-1,3))
130 |
131 | joints_3d_raw = np.reshape(annot3[vid_i][0][i], (1, 28, 3)) / 1000
132 | joints_3d = convert_kps(joints_3d_raw, "mpii3d", "spin").reshape((-1,3))
133 |
134 | bbox = get_bbox_from_kp2d(joints_2d[~np.all(joints_2d == 0, axis=1)]).reshape(4)
135 |
136 | joints_3d = joints_3d - joints_3d[39] # 4 is the root
137 |
138 | # check that all joints are visible
139 | x_in = np.logical_and(joints_2d[:, 0] < w, joints_2d[:, 0] >= 0)
140 | y_in = np.logical_and(joints_2d[:, 1] < h, joints_2d[:, 1] >= 0)
141 | ok_pts = np.logical_and(x_in, y_in)
142 |
143 | if np.sum(ok_pts) < joints_2d.shape[0]:
144 | vid_uniq_id = "_".join(vid_uniq_id.split("_")[:-1])+ "_seg" +\
145 | str(int(dataset['vid_name'][-1].split("_")[-1][3:])+1)
146 | continue
147 |
148 | dataset['vid_name'].append(vid_uniq_id)
149 | dataset['frame_id'].append(img_name.split(".")[0])
150 | dataset['img_name'].append(img_i)
151 | dataset['joints2D'].append(joints_2d)
152 | dataset['joints3D'].append(joints_3d)
153 | dataset['bbox'].append(bbox)
154 | vid_segments.append(vid_uniq_id)
155 | vid_used_frames.append(img_i)
156 | vid_used_joints.append(joints_2d)
157 | vid_used_bbox.append(bbox)
158 |
159 | vid_segments= np.array(vid_segments)
160 | ids = np.zeros((len(set(vid_segments))+1))
161 | ids[-1] = len(vid_used_frames) + 1
162 | if (np.where(vid_segments[:-1] != vid_segments[1:])[0]).size != 0:
163 | ids[1:-1] = (np.where(vid_segments[:-1] != vid_segments[1:])[0]) + 1
164 |
165 |
166 | for k in dataset.keys():
167 | dataset[k] = np.array(dataset[k])
168 |
169 |     valid = np.zeros([len(dataset['joints3D']), 49, 1])  # per-joint validity flags in the 49-joint SPIN format
170 |     valid[:, 25:39, :] = 1  # the 14 common joints
171 |     valid[:, (39, 41, 43), :] = 1  # plus the pelvis, spine and head joints
172 | dataset['joints3D'] = np.concatenate([dataset['joints3D'], valid], axis=-1)
173 |
174 | return dataset
175 |
176 |
177 | def read_test_data(dataset_path):
178 |
179 | dataset = {
180 | 'vid_name': [],
181 | 'frame_id': [],
182 | 'joints3D': [],
183 | 'joints2D': [],
184 | 'bbox': [],
185 | 'img_name': [],
186 | 'features': [],
187 | "valid_i": []
188 | }
189 |
190 | user_list = range(1, 7)
191 |
192 | for user_i in user_list:
193 | print('Subject', user_i)
194 | seq_path = os.path.join(dataset_path,
195 | 'mpi_inf_3dhp_test_set',
196 | 'TS' + str(user_i))
197 | # mat file with annotations
198 | annot_file = os.path.join(seq_path, 'annot_data.mat')
199 | mat_as_h5 = h5py.File(annot_file, 'r')
200 | annot2 = np.array(mat_as_h5['annot2'])
201 | annot3 = np.array(mat_as_h5['univ_annot3'])
202 | valid = np.array(mat_as_h5['valid_frame'])
203 |
204 | vid_used_frames = []
205 | vid_used_joints = []
206 | vid_used_bbox = []
207 | vid_segments = []
208 | vid_uniq_id = "subj" + str(user_i) + "_seg0"
209 |
210 |
211 | for frame_i, valid_i in tqdm(enumerate(valid)):
212 |
213 | img_i = os.path.join('mpi_inf_3dhp_test_set',
214 | 'TS' + str(user_i),
215 | 'imageSequence',
216 | 'img_' + str(frame_i + 1).zfill(6) + '.jpg')
217 |
218 | joints_2d_raw = np.expand_dims(annot2[frame_i, 0, :, :], axis = 0)
219 | joints_2d_raw = np.append(joints_2d_raw, np.ones((1, 17, 1)), axis=2)
220 |
221 |
222 | joints_2d = convert_kps(joints_2d_raw, src="mpii3d_test", dst="spin").reshape((-1, 3))
223 |
224 | joints_3d_raw = np.reshape(annot3[frame_i, 0, :, :], (1, 17, 3)) / 1000
225 | joints_3d = convert_kps(joints_3d_raw, "mpii3d_test", "spin").reshape((-1, 3))
226 |             joints_3d = joints_3d - joints_3d[39]  # subtract the pelvis (joint 39 in the 49-joint SPIN format), the root joint for the test set
227 |
228 | bbox = get_bbox_from_kp2d(joints_2d[~np.all(joints_2d == 0, axis=1)]).reshape(4)
229 |
230 |
231 | # check that all joints are visible
232 | img_file = os.path.join(dataset_path, img_i)
233 | I = cv2.imread(img_file)
234 | h, w, _ = I.shape
235 | x_in = np.logical_and(joints_2d[:, 0] < w, joints_2d[:, 0] >= 0)
236 | y_in = np.logical_and(joints_2d[:, 1] < h, joints_2d[:, 1] >= 0)
237 | ok_pts = np.logical_and(x_in, y_in)
238 |
239 | if np.sum(ok_pts) < joints_2d.shape[0]:
240 | vid_uniq_id = "_".join(vid_uniq_id.split("_")[:-1]) + "_seg" + \
241 | str(int(dataset['vid_name'][-1].split("_")[-1][3:]) + 1)
242 | continue
243 |
244 | print(joints_3d.shape)
245 | dataset['vid_name'].append(vid_uniq_id)
246 | dataset['frame_id'].append(img_file.split("/")[-1].split(".")[0])
247 | dataset['img_name'].append(img_file)
248 | dataset['joints2D'].append(joints_2d)
249 | dataset['joints3D'].append(joints_3d)
250 | dataset['bbox'].append(bbox)
251 | dataset['valid_i'].append(valid_i)
252 |
253 | vid_segments.append(vid_uniq_id)
254 | vid_used_frames.append(img_file)
255 | vid_used_joints.append(joints_2d)
256 | vid_used_bbox.append(bbox)
257 |
258 | vid_segments = np.array(vid_segments)
259 | ids = np.zeros((len(set(vid_segments)) + 1))
260 | ids[-1] = len(vid_used_frames) + 1
261 | if (np.where(vid_segments[:-1] != vid_segments[1:])[0]).size != 0:
262 | ids[1:-1] = (np.where(vid_segments[:-1] != vid_segments[1:])[0]) + 1
263 |
264 | for k in dataset.keys():
265 | dataset[k] = np.array(dataset[k])
266 |
267 | valid = np.zeros([len(dataset['joints3D']), 49, 1])
268 | valid[:, 25:39, :] = 1
269 | valid[:, (39, 41, 43), :] = 1
270 | dataset['joints3D'] = np.concatenate([dataset['joints3D'], valid], axis=-1)
271 |
272 | return dataset
273 |
274 | if __name__ == '__main__':
275 | parser = argparse.ArgumentParser()
276 | parser.add_argument('--inp_dir', type=str, help='dataset directory', default=MPII3D_DIR)
277 | parser.add_argument('--out_dir', type=str, help='output directory', default=DB_DIR)
278 | parser.add_argument('--sub', nargs='+', type=int, default=[1,2,3,4,5,6,7,8])
279 | parser.add_argument('--seq', nargs='+', type=int, default=[1,2])
280 | parser.add_argument('--vid', nargs='+', type=int, default=[0,1,2,3,4,5,6,7,8])
281 | args = parser.parse_args()
282 |
283 | print(args.sub)
284 | print(args.seq)
285 | print(args.vid)
286 |
287 | dataset = read_data_train(args.inp_dir, args.sub, args.seq, args.vid)
288 | joblib.dump(dataset, osp.join(args.out_dir, 'mpii3d_train_db.pt'))
289 |
290 | #dataset = read_test_data(args.inp_dir)
291 | #joblib.dump(dataset, osp.join(args.out_dir, 'mpii3d_val_db.pt'))
292 |
293 |
294 |
295 |
--------------------------------------------------------------------------------
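
Note: `read_data_train` above serializes a flat dict of per-frame arrays with `joblib`. A minimal sketch of loading and inspecting the resulting database (field names and shapes follow the code above; `DB_DIR` is the same default output directory used by `--out_dir`):

```
import joblib
import os.path as osp
from lib.core.config import DB_DIR

db = joblib.load(osp.join(DB_DIR, 'mpii3d_train_db.pt'))
print(sorted(db.keys()))     # ['bbox', 'frame_id', 'img_name', 'joints2D', 'joints3D', 'vid_name']
print(db['joints2D'].shape)  # (N, 49, 3): 2D keypoints converted to the SPIN format
print(db['joints3D'].shape)  # (N, 49, 4): xyz plus the per-joint validity flag appended above
```
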
/lib/data_utils/penn_action_utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | This script is borrowed from https://github.com/mkocabas/VIBE.
4 | Adhere to their license to use this script.
5 |
6 | We hacked it a little bit to make it happy in our framework.
7 | """
8 |
9 | import sys
10 | sys.path.append('.')
11 |
12 | import glob
13 | import torch
14 | import joblib
15 | import argparse
16 | from tqdm import tqdm
17 | import os.path as osp
18 | from skimage import io
19 | from scipy.io import loadmat
20 |
21 | from lib.data_utils.kp_utils import *
22 | from lib.core.config import DB_DIR, PENNACTION_DIR
23 | from lib.data_utils.img_utils import get_bbox_from_kp2d
24 |
25 |
26 | def calc_kpt_bound(kp_2d):
27 | MAX_COORD = 10000
28 | x = kp_2d[:, 0]
29 | y = kp_2d[:, 1]
30 | z = kp_2d[:, 2]
31 | u = MAX_COORD
32 | d = -1
33 | l = MAX_COORD
34 | r = -1
35 | for idx, vis in enumerate(z):
36 | if vis == 0: # skip invisible joint
37 | continue
38 | u = min(u, y[idx])
39 | d = max(d, y[idx])
40 | l = min(l, x[idx])
41 | r = max(r, x[idx])
42 | return u, d, l, r
43 |
44 |
45 | def load_mat(path):
46 | mat = loadmat(path)
47 | del mat['pose'], mat['__header__'], mat['__globals__'], mat['__version__'], mat['train'], mat['action']
48 | mat['nframes'] = mat['nframes'][0][0]
49 |
50 | return mat
51 |
52 |
53 | def read_data(folder):
54 | dataset = {
55 | 'img_name' : [],
56 | 'joints2D': [],
57 | 'bbox': [],
58 | 'vid_name': [],
59 | }
60 |
61 | file_names = sorted(glob.glob(folder + '/labels/'+'*.mat'))
62 |
63 | for fname in tqdm(file_names):
64 | vid_dict=load_mat(fname)
65 | imgs = sorted(glob.glob(folder + '/frames/'+ fname.strip().split('/')[-1].split('.')[0]+'/*.jpg'))
66 | kp_2d = np.zeros((vid_dict['nframes'], 13, 3))
67 | perm_idxs = get_perm_idxs('pennaction', 'common')
68 |
69 | kp_2d[:, :, 0] = vid_dict['x']
70 | kp_2d[:, :, 1] = vid_dict['y']
71 | kp_2d[:, :, 2] = vid_dict['visibility']
72 | kp_2d = kp_2d[:, perm_idxs, :]
73 |
74 |         # fix inconsistency: pad to the 14-joint common format, leaving the missing neck joint (index 12) as zeros
75 | n_kp_2d = np.zeros((kp_2d.shape[0], 14, 3))
76 | n_kp_2d[:, :12, :] = kp_2d[:, :-1, :]
77 | n_kp_2d[:, 13, :] = kp_2d[:, 12, :]
78 | kp_2d = n_kp_2d
79 |
80 | bbox = np.zeros((vid_dict['nframes'], 4))
81 |
82 | for fr_id, fr in enumerate(kp_2d):
83 | u, d, l, r = calc_kpt_bound(fr)
84 | center = np.array([(l + r) * 0.5, (u + d) * 0.5], dtype=np.float32)
85 | c_x, c_y = center[0], center[1]
86 | w, h = r - l, d - u
87 | w = h = np.where(w / h > 1, w, h)
88 |
89 | bbox[fr_id,:] = np.array([c_x, c_y, w, h])
90 |
91 | dataset['vid_name'].append(np.array([f'{fname}']* vid_dict['nframes']))
92 | dataset['img_name'].append(np.array(imgs))
93 | dataset['joints2D'].append(kp_2d)
94 | dataset['bbox'].append(bbox)
95 |
96 | for k in dataset.keys():
97 | dataset[k] = np.array(dataset[k])
98 | dataset[k] = np.concatenate(dataset[k])
99 |
100 | dataset['joints2D'] = convert_kps(dataset['joints2D'], src='pennaction', dst='spin')
101 |
102 | return dataset
103 |
104 |
105 | if __name__ == '__main__':
106 | parser = argparse.ArgumentParser()
107 | parser.add_argument('--inp_dir', type=str, help='dataset directory', default=PENNACTION_DIR)
108 | parser.add_argument('--out_dir', type=str, help='output directory', default=DB_DIR)
109 | args = parser.parse_args()
110 |
111 | dataset = read_data(args.inp_dir)
112 | joblib.dump(dataset, osp.join(args.out_dir, 'pennaction_train_db.pt'))
--------------------------------------------------------------------------------
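
The boxes stored by `read_data` above are square and centered on the visible keypoints. A small self-contained sketch of the same `(c_x, c_y, w, h)` convention, with made-up keypoint bounds:

```
import numpy as np

u, d, l, r = 50.0, 250.0, 80.0, 180.0      # keypoint bounds: top, bottom, left, right
c_x, c_y = (l + r) * 0.5, (u + d) * 0.5    # box center
w, h = r - l, d - u
w = h = np.where(w / h > 1, w, h)          # keep the larger side, i.e. a square box
print(c_x, c_y, float(w), float(h))        # 130.0 150.0 200.0 200.0
```
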
/lib/data_utils/posetrack_utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | This script is borrowed from https://github.com/mkocabas/VIBE.
4 | Adhere to their license to use this script.
5 |
6 | We hacked it a little bit to make it happy in our framework.
7 | """
8 |
9 | import sys
10 | sys.path.append('.')
11 |
12 | import glob
13 | import joblib
14 | import argparse
15 | import numpy as np
16 | import json
17 | import os.path as osp
18 |
19 | from lib.core.config import DB_DIR, POSETRACK_DIR
20 | from lib.utils.utils import tqdm_enumerate
21 | from lib.data_utils.kp_utils import get_posetrack_original_kp_names, convert_kps
22 |
23 | def read_data(folder, set):
24 | dataset = {
25 | 'img_name' : [] ,
26 | 'joints2D': [],
27 | 'bbox': [],
28 | 'vid_name': [],
29 | }
30 |
31 | file_names = glob.glob(osp.join(folder, 'posetrack_data/annotations/', f'{set}/*.json'))
32 | file_names = sorted(file_names)
33 | nn_corrupted = 0
34 | tot_frames = 0
35 | min_frame_number = 8
36 |
37 | for fid,fname in tqdm_enumerate(file_names):
38 |         if fname == osp.join(folder, 'posetrack_data/annotations/train/021133_mpii_train.json'):  # skip this annotation file
39 | continue
40 |
41 | with open(fname, 'r') as entry:
42 | anns = json.load(entry)
43 | # num_frames = anns['images'][0]['nframes']
44 | anns['images'] = [item for item in anns['images'] if item['is_labeled'] ]
45 | num_frames = len(anns['images'])
46 | frame2imgname = dict()
47 | for el in anns['images']:
48 | frame2imgname[el['frame_id']] = el['file_name']
49 |
50 | num_people = -1
51 | for x in anns['annotations']:
52 | if num_people < x['track_id']:
53 | num_people = x['track_id']
54 | num_people += 1
55 | posetrack_joints = get_posetrack_original_kp_names()
56 | idxs = [anns['categories'][0]['keypoints'].index(h) for h in posetrack_joints if h in anns['categories'][0]['keypoints']]
57 | for x in anns['annotations']:
58 | kps = np.array(x['keypoints']).reshape((17,3))
59 | kps = kps[idxs,:]
60 | x['keypoints'] = list(kps.flatten())
61 |
62 | tot_frames += num_people * num_frames
63 | for p_id in range(num_people):
64 |
65 | annot_pid = [(item['keypoints'], item['bbox'], item['image_id'])
66 | for item in anns['annotations']
67 | if item['track_id'] == p_id and not(np.count_nonzero(item['keypoints']) == 0) ]
68 |
69 | if len(annot_pid) < min_frame_number:
70 | nn_corrupted += len(annot_pid)
71 | continue
72 |
73 | bbox = np.zeros((len(annot_pid),4))
74 | # perm_idxs = get_perm_idxs('posetrack', 'common')
75 | kp_2d = np.zeros((len(annot_pid), len(annot_pid[0][0])//3 ,3))
76 | img_paths = np.zeros((len(annot_pid)))
77 |
78 | for i, (key2djnts, bbox_p, image_id) in enumerate(annot_pid):
79 |
80 | if (bbox_p[2]==0 or bbox_p[3]==0) :
81 | nn_corrupted +=1
82 | continue
83 |
84 | img_paths[i] = image_id
85 | key2djnts[2::3] = len(key2djnts[2::3])*[1]
86 |
87 | kp_2d[i,:] = np.array(key2djnts).reshape(int(len(key2djnts)/3),3) # [perm_idxs, :]
88 | for kp_loc in kp_2d[i,:]:
89 | if kp_loc[0] == 0 and kp_loc[1] == 0:
90 | kp_loc[2] = 0
91 |
92 |
93 | x_tl = bbox_p[0]
94 | y_tl = bbox_p[1]
95 | w = bbox_p[2]
96 | h = bbox_p[3]
97 | bbox_p[0] = x_tl + w / 2
98 | bbox_p[1] = y_tl + h / 2
99 | #
100 |
101 | w = h = np.where(w / h > 1, w, h)
102 | w = h = h * 0.8
103 | bbox_p[2] = w
104 | bbox_p[3] = h
105 | bbox[i, :] = bbox_p
106 |
107 | img_paths = list(img_paths)
108 | img_paths = [osp.join(folder, frame2imgname[item]) if item != 0 else 0 for item in img_paths ]
109 |
110 | bbx_idxs = []
111 | for bbx_id, bbx in enumerate(bbox):
112 | if np.count_nonzero(bbx) == 0:
113 | bbx_idxs += [bbx_id]
114 |
115 | kp_2d = np.delete(kp_2d, bbx_idxs, 0)
116 | img_paths = np.delete(np.array(img_paths), bbx_idxs, 0)
117 | bbox = np.delete(bbox, np.where(~bbox.any(axis=1))[0], axis=0)
118 |
119 | # Convert to common 2d keypoint format
120 | if bbox.size == 0 or bbox.shape[0] < min_frame_number:
121 | nn_corrupted += 1
122 | continue
123 |
124 | kp_2d = convert_kps(kp_2d, src='posetrack', dst='spin')
125 |
126 | dataset['vid_name'].append(np.array([f'{fname}_{p_id}']*img_paths.shape[0]))
127 | dataset['img_name'].append(np.array(img_paths))
128 | dataset['joints2D'].append(kp_2d)
129 | dataset['bbox'].append(np.array(bbox))
130 |
131 |
132 | assert kp_2d.shape[0] == img_paths.shape[0] == bbox.shape[0]
133 |
134 |
135 |
136 | print(nn_corrupted, tot_frames)
137 | for k in dataset.keys():
138 | dataset[k] = np.array(dataset[k])
139 |
140 | for k in dataset.keys():
141 | dataset[k] = np.concatenate(dataset[k])
142 |
143 | for k,v in dataset.items():
144 | print(k, v.shape)
145 |
146 | return dataset
147 |
148 |
149 | if __name__ == '__main__':
150 | parser = argparse.ArgumentParser()
151 | parser.add_argument('--inp_dir', type=str, help='dataset directory', default=POSETRACK_DIR)
152 | parser.add_argument('--out_dir', type=str, help='output directory', default=DB_DIR)
153 | args = parser.parse_args()
154 |
155 | dataset_train = read_data(args.inp_dir, 'train')
156 | joblib.dump(dataset_train, osp.join(args.out_dir, 'posetrack_train_db.pt'))
157 |
--------------------------------------------------------------------------------
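
PoseTrack annotations store boxes as top-left `(x, y, w, h)`; the loop above converts them to the center-based square format used elsewhere in this repo and shrinks the side to 0.8 of the larger dimension. A minimal sketch of that conversion with made-up numbers:

```
import numpy as np

bbox_p = [100.0, 40.0, 60.0, 120.0]   # PoseTrack-style: top-left x, top-left y, width, height
x_tl, y_tl, w, h = bbox_p
bbox_p[0] = x_tl + w / 2              # center x
bbox_p[1] = y_tl + h / 2              # center y
w = h = np.where(w / h > 1, w, h)     # square side = max(w, h)
w = h = h * 0.8                       # shrink the box
bbox_p[2], bbox_p[3] = float(w), float(h)
print(bbox_p)                         # [130.0, 100.0, 96.0, 96.0]
```
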
/lib/data_utils/threedpw_utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | This script is borrowed from https://github.com/mkocabas/VIBE.
4 | Adhere to their license to use this script.
5 |
6 | We hacked it a little bit to make it happy in our framework.
7 | """
8 |
9 | import sys
10 | sys.path.append('.')
11 |
12 | import os
13 | import cv2
14 | import torch
15 | import joblib
16 | import argparse
17 | import numpy as np
18 | import pickle as pkl
19 | import os.path as osp
20 | from tqdm import tqdm
21 |
22 | from lib.data_utils.kp_utils import *
23 | from lib.core.config import DB_DIR, DATA_DIR, THREEDPW_DIR
24 | from lib.utils.smooth_bbox import get_smooth_bbox_params
25 | from lib.models.smpl import SMPL, SMPL_MODEL_DIR, H36M_TO_J14
26 | from lib.utils.geometry import batch_rodrigues, rotation_matrix_to_angle_axis
27 | from lib.data_utils.kp_utils import convert_kps
28 |
29 | NUM_JOINTS = 24
30 | VIS_THRESH = 0.3
31 | MIN_KP = 6
32 |
33 | def read_data(folder, set, debug=False):
34 |
35 | dataset = {
36 | 'vid_name': [],
37 | 'frame_id': [],
38 | 'joints3D': [],
39 | 'joints2D': [],
40 | 'shape': [],
41 | 'pose': [],
42 | 'bbox': [],
43 | 'img_name': [],
44 | 'valid': [],
45 | }
46 |
47 | sequences = [x.split('.')[0] for x in os.listdir(osp.join(folder, 'sequenceFiles', set))]
48 |
49 | J_regressor = None
50 |
51 | smpl = SMPL(SMPL_MODEL_DIR, batch_size=1, create_transl=False)
52 | if set == 'test' or set == 'validation':
53 | J_regressor = torch.from_numpy(np.load(osp.join(DATA_DIR, 'J_regressor_h36m.npy'))).float()
54 |
55 | for i, seq in tqdm(enumerate(sequences)):
56 |
57 | data_file = osp.join(folder, 'sequenceFiles', set, seq + '.pkl')
58 |
59 | data = pkl.load(open(data_file, 'rb'), encoding='latin1')
60 |
61 | img_dir = osp.join(folder, 'imageFiles', seq)
62 |
63 | num_people = len(data['poses'])
64 | num_frames = len(data['img_frame_ids'])
65 | assert (data['poses2d'][0].shape[0] == num_frames)
66 |
67 | for p_id in range(num_people):
68 | pose = torch.from_numpy(data['poses'][p_id]).float()
69 | shape = torch.from_numpy(data['betas'][p_id][:10]).float().repeat(pose.size(0), 1)
70 | trans = torch.from_numpy(data['trans'][p_id]).float()
71 | j2d = data['poses2d'][p_id].transpose(0,2,1)
72 | cam_pose = data['cam_poses']
73 | campose_valid = data['campose_valid'][p_id]
74 |
75 | # ======== Align the mesh params ======== #
76 | rot = pose[:, :3]
77 | rot_mat = batch_rodrigues(rot)
78 |
79 | Rc = torch.from_numpy(cam_pose[:, :3, :3]).float()
80 | Rs = torch.bmm(Rc, rot_mat.reshape(-1, 3, 3))
81 | rot = rotation_matrix_to_angle_axis(Rs)
82 | pose[:, :3] = rot
83 | # ======== Align the mesh params ======== #
84 |
85 | output = smpl(betas=shape, body_pose=pose[:,3:], global_orient=pose[:,:3], transl=trans)
86 | # verts = output.vertices
87 | j3d = output.joints
88 |
89 | if J_regressor is not None:
90 | vertices = output.vertices
91 | J_regressor_batch = J_regressor[None, :].expand(vertices.shape[0], -1, -1).to(vertices.device)
92 | j3d = torch.matmul(J_regressor_batch, vertices)
93 | j3d = j3d[:, H36M_TO_J14, :]
94 |
95 | img_paths = []
96 | for i_frame in range(num_frames):
97 |                 img_path = osp.join(img_dir, 'image_{:05d}.jpg'.format(i_frame))
98 | img_paths.append(img_path)
99 |
100 | bbox_params, time_pt1, time_pt2 = get_smooth_bbox_params(j2d, vis_thresh=VIS_THRESH, sigma=8)
101 |
102 | # process bbox_params
103 | c_x = bbox_params[:,0]
104 | c_y = bbox_params[:,1]
105 | scale = bbox_params[:,2]
106 | w = h = 150. / scale
107 | w = h = h * 1.1
108 | bbox = np.vstack([c_x,c_y,w,h]).T
109 |
110 | # process keypoints
111 | j2d[:, :, 2] = j2d[:, :, 2] > 0.3 # set the visibility flags
112 | # Convert to common 2d keypoint format
113 | perm_idxs = get_perm_idxs('3dpw', 'common')
114 | perm_idxs += [0, 0] # no neck, top head
115 | j2d = j2d[:, perm_idxs]
116 | j2d[:, 12:, 2] = 0.0
117 |
118 | # print('j2d', j2d[time_pt1:time_pt2].shape)
119 | # print('campose', campose_valid[time_pt1:time_pt2].shape)
120 |
121 | img_paths_array = np.array(img_paths)[time_pt1:time_pt2]
122 | dataset['vid_name'].append(np.array([f'{seq}_{p_id}']*num_frames)[time_pt1:time_pt2])
123 | dataset['frame_id'].append(np.arange(0, num_frames)[time_pt1:time_pt2])
124 | dataset['img_name'].append(img_paths_array)
125 | dataset['joints3D'].append(j3d.numpy()[time_pt1:time_pt2])
126 | dataset['joints2D'].append(j2d[time_pt1:time_pt2])
127 | dataset['shape'].append(shape.numpy()[time_pt1:time_pt2])
128 | dataset['pose'].append(pose.numpy()[time_pt1:time_pt2])
129 | dataset['bbox'].append(bbox)
130 | dataset['valid'].append(campose_valid[time_pt1:time_pt2])
131 |
132 | for k in dataset.keys():
133 | dataset[k] = np.concatenate(dataset[k])
134 | print(k, dataset[k].shape)
135 |
136 | # Filter out keypoints
137 | indices_to_use = np.where((dataset['joints2D'][:, :, 2] > VIS_THRESH).sum(-1) > MIN_KP)[0]
138 | for k in dataset.keys():
139 | dataset[k] = dataset[k][indices_to_use]
140 |
141 | dataset['joints2D'] = convert_kps(dataset['joints2D'], src='common', dst='spin')
142 | valid = np.zeros([len(dataset['joints3D']), 49, 1])
143 | valid[:, 25:39, :] = 1
144 | if set != 'train':
145 | dataset['joints3D'] = convert_kps(dataset['joints3D'], src='common', dst='spin')
146 | dataset['joints3D'] = np.concatenate([dataset['joints3D'], valid], axis=-1)
147 |
148 | return dataset
149 |
150 |
151 | if __name__ == '__main__':
152 | parser = argparse.ArgumentParser()
153 | parser.add_argument('--inp_dir', type=str, help='dataset directory', default=THREEDPW_DIR)
154 | parser.add_argument('--out_dir', type=str, help='output directory', default=DB_DIR)
155 | args = parser.parse_args()
156 |
157 | debug = False
158 |
159 | dataset = read_data(args.inp_dir, 'validation', debug=debug)
160 | joblib.dump(dataset, osp.join(args.out_dir, '3dpw_val_db.pt'))
161 |
162 | dataset = read_data(args.inp_dir, 'train', debug=debug)
163 | joblib.dump(dataset, osp.join(args.out_dir, '3dpw_train_db.pt'))
164 |
165 | dataset = read_data(args.inp_dir, 'test', debug=debug)
166 | joblib.dump(dataset, osp.join(args.out_dir, '3dpw_test_db.pt'))
167 |
--------------------------------------------------------------------------------
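
In `read_data` above, the per-frame box comes from the smoothed parameters returned by `get_smooth_bbox_params`: the third column is a scale that is turned into a square side of `150 / scale`, then enlarged by 10%. A sketch of that step in isolation (the parameter values are made up):

```
import numpy as np

bbox_params = np.array([[320.0, 240.0, 1.25],   # (c_x, c_y, scale) per frame, made-up values
                        [322.0, 241.0, 1.20]])
c_x, c_y, scale = bbox_params[:, 0], bbox_params[:, 1], bbox_params[:, 2]
w = h = 150. / scale                            # square side from the smoothed scale
w = h = h * 1.1                                 # enlarge by 10%
bbox = np.vstack([c_x, c_y, w, h]).T            # (T, 4) center-based boxes
print(bbox.round(1))                            # [[320. 240. 132. 132.], [322. 241. 137.5 137.5]]
```
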
/lib/data_utils/transforms/__init__.py:
--------------------------------------------------------------------------------
1 | from .crop import *
2 | from .color_jitter import *
3 | from .basic import *
4 | from .random_erase import *
5 | from .random_hflip import *
--------------------------------------------------------------------------------
/lib/data_utils/transforms/basic.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torchvision.transforms.functional as F
3 | import numpy as np
4 |
5 | from PIL import Image
6 |
7 | class _NormalizeBase(object):
8 | def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], patch_size=224, inplace=False):
9 | self.mean = mean
10 | self.std = std
11 | self.inplace = inplace
12 | self.patch_size = patch_size
13 |
14 | def normalize_2d_kp(self, kp_2d):
15 | # Normalize keypoints between -1, 1
16 | ratio = 1.0 / self.patch_size
17 | kp_2d = 2.0 * kp_2d * ratio - 1.0
18 |
19 | return kp_2d
20 |
21 | def __call__(self, instance):
22 | raise NotImplementedError()
23 |
24 | class NormalizeVideo(_NormalizeBase):
25 | def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], patch_size=224, inplace=False):
26 | super(NormalizeVideo, self).__init__(mean, std, patch_size, inplace)
27 |
28 | def __call__(self, instance):
29 | clip = instance['clip']
30 | new_clip = []
31 | for c in clip:
32 | new_clip.append(F.normalize(c, self.mean, self.std, self.inplace))
33 | new_clip = torch.stack(new_clip, dim=0)
34 |
35 | ret = {k: v for k, v in instance.items() if k not in ['clip', 'kp_2d', 'kp_2d_full']}
36 | ret.update({'clip': new_clip})
37 |
38 | if 'kp_2d' in instance:
39 | kp = instance['kp_2d']
40 | new_kp = kp
41 | new_kp[:,:, :2] = self.normalize_2d_kp(kp[:,:, :2])
42 | ret.update({'kp_2d':new_kp})
43 |
44 | if 'kp_2d_full' in instance:
45 | kp = instance['kp_2d_full']
46 | new_kp = kp
47 | new_kp[:,:, :2] = self.normalize_2d_kp(kp[:,:, :2])
48 | ret.update({'kp_2d_full':new_kp})
49 |
50 | return ret
51 |
52 | class NormalizeImage(_NormalizeBase):
53 | def __init__(self, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], patch_size=224, inplace=False):
54 | super(NormalizeImage, self).__init__(mean, std, patch_size, inplace)
55 |
56 | def __call__(self, instance):
57 | image = instance['image']
58 | new_image = F.normalize(image, self.mean, self.std, self.inplace)
59 |
60 | if 'kp_2d' in instance:
61 | kp = instance['kp_2d']
62 | new_kp = kp
63 | new_kp[:, :2] = self.normalize_2d_kp(kp[:, :2])
64 |
65 | ret = {k: v for k, v in instance.items() if k not in ['image', 'kp_2d']}
66 |
67 | if 'kp_2d' in instance:
68 | ret.update({'kp_2d':new_kp})
69 |
70 | ret.update({'image': new_image})
71 |
72 | return ret
73 |
74 | class StackFrames(object):
75 | """Stack a list of PIL Images or numpy arrays along a new dimension.
76 |
77 | Args:
78 |         roll (bool): whether to convert BGR to RGB. Default value is False.
79 | """
80 | def __init__(self, roll=False):
81 | self.roll = roll
82 |
83 | def __call__(self, instance):
84 | clip = instance['clip']
85 | if self.roll:
86 | stacked_clip = np.stack([np.array(x)[:, :, ::-1] for x in clip], axis=0)
87 | else:
88 | stacked_clip = np.stack([np.array(x) for x in clip], axis=0)
89 |
90 | ret = {k:v for k, v in instance.items() if k!='clip'}
91 | ret.update({'clip': stacked_clip})
92 | return ret
93 |
94 | class ToTensorVideo(object):
95 | """ Converts a sequence of PIL.Image (RGB) or numpy.ndarray (T x H x W x C) in the range [0, 255]
96 | to a torch.FloatTensor of shape (T x C x H x W) in the range [0.0, 1.0] """
97 | def __call__(self, instance):
98 | clip = instance['clip']
99 | new_clip = []
100 | for img in clip:
101 | img = F.to_tensor(img)
102 | new_clip.append(img)
103 | clip = torch.stack(new_clip, dim=0)
104 |
105 | ret = {k: torch.from_numpy(v) for k, v in instance.items() if k!='clip'}
106 | ret.update({'clip': clip})
107 | return ret
108 |
109 | class ToTensorImage(object):
110 | """ Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255]
111 | to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] """
112 | def __call__(self, instance):
113 | image = instance['image']
114 | image = F.to_tensor(image)
115 | ret = {k: torch.from_numpy(v) for k, v in instance.items() if k!='image'}
116 | ret.update({'image': image})
117 | return ret
--------------------------------------------------------------------------------
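
These transforms consume and produce dicts, so they can be chained with any plain callable composition; keypoint coordinates end up in `[-1, 1]` of the patch via `2 * v / patch_size - 1`. A minimal sketch, assuming `torchvision.transforms.Compose` as the chaining helper (the actual training pipeline is assembled elsewhere in this repo):

```
import numpy as np
from torchvision import transforms
from lib.data_utils.transforms import ToTensorImage, NormalizeImage

pipeline = transforms.Compose([ToTensorImage(), NormalizeImage(patch_size=224)])

instance = {
    'image': np.zeros((224, 224, 3), dtype=np.uint8),             # dummy 224x224 crop
    'kp_2d': np.array([[0.0, 112.0, 1.0], [224.0, 56.0, 1.0]]),   # (x, y, confidence)
}
out = pipeline(instance)
print(out['image'].shape)  # torch.Size([3, 224, 224])
print(out['kp_2d'])        # x/y mapped to [[-1.0, 0.0, 1.0], [1.0, -0.5, 1.0]]
```
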
/lib/data_utils/transforms/color_jitter.py:
--------------------------------------------------------------------------------
1 | import random
2 | import torchvision.transforms.functional as F
3 | import numpy as np
4 |
5 | from PIL import Image
6 |
7 |
8 | class _ColorJitter(object):
9 | def __init__(self, brightness=0, contrast=0, saturation=0, hue=0):
10 | self.brightness = brightness
11 | self.contrast = contrast
12 | self.saturation = saturation
13 | self.hue = hue
14 |
15 | def get_params(self, brightness, contrast, saturation, hue):
16 | if brightness > 0:
17 | brightness_factor = random.uniform(
18 | max(0, 1 - brightness), 1 + brightness)
19 | else:
20 | brightness_factor = None
21 |
22 | if contrast > 0:
23 | contrast_factor = random.uniform(
24 | max(0, 1 - contrast), 1 + contrast)
25 | else:
26 | contrast_factor = None
27 |
28 | if saturation > 0:
29 | saturation_factor = random.uniform(
30 | max(0, 1 - saturation), 1 + saturation)
31 | else:
32 | saturation_factor = None
33 |
34 | if hue > 0:
35 | hue_factor = random.uniform(-hue, hue)
36 | else:
37 | hue_factor = None
38 | return brightness_factor, contrast_factor, saturation_factor, hue_factor
39 |
40 | class ColorJitterVideo(_ColorJitter):
41 |     """Randomly change the brightness, contrast, saturation and hue of the clip
42 | Args:
43 | brightness (float): How much to jitter brightness. brightness_factor
44 | is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].
45 | contrast (float): How much to jitter contrast. contrast_factor
46 | is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].
47 | saturation (float): How much to jitter saturation. saturation_factor
48 | is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
49 | hue(float): How much to jitter hue. hue_factor is chosen uniformly from
50 | [-hue, hue]. Should be >=0 and <= 0.5.
51 | """
52 |
53 | def __init__(self, brightness=0, contrast=0, saturation=0, hue=0):
54 | super(ColorJitterVideo, self).__init__(brightness, contrast, saturation, hue)
55 |
56 | def __call__(self, instance):
57 | """
58 | Args:
59 |             instance (dict): must contain key 'clip'; instance['clip'] is a list of PIL Images or numpy arrays
60 | """
61 | if isinstance(instance['clip'][0], Image.Image):
62 | clip = instance['clip']
63 | elif isinstance(instance['clip'][0], np.ndarray):
64 | clip = [Image.fromarray(c) for c in instance['clip']]
65 | else:
66 | clip = instance['clip'][0]
67 | raise TypeError(
68 | f'Color jitter not yet implemented for {type(clip)}')
69 |
70 | brightness, contrast, saturation, hue = self.get_params(
71 | self.brightness, self.contrast, self.saturation, self.hue)
72 | # Create img transform function sequence
73 | img_transforms = []
74 | if brightness is not None:
75 | img_transforms.append(lambda img: F.adjust_brightness(img, brightness))
76 | if saturation is not None:
77 | img_transforms.append(lambda img: F.adjust_saturation(img, saturation))
78 | if hue is not None:
79 | img_transforms.append(lambda img: F.adjust_hue(img, hue))
80 | if contrast is not None:
81 | img_transforms.append(lambda img: F.adjust_contrast(img, contrast))
82 | random.shuffle(img_transforms)
83 | # Apply to all images
84 | jittered_clip = []
85 | for img in clip:
86 | jittered_img = img.copy()
87 | for func in img_transforms:
88 | jittered_img = func(jittered_img)
89 | jittered_clip.append(jittered_img)
90 |
91 |
92 | ret = {k:v for k, v in instance.items() if k!='clip'}
93 | ret.update({'clip': jittered_clip})
94 |
95 | return ret
96 |
97 | class ColorJitterImage(_ColorJitter):
98 |     """Randomly change the brightness, contrast, saturation and hue of the image
99 | Args:
100 | brightness (float): How much to jitter brightness. brightness_factor
101 | is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].
102 | contrast (float): How much to jitter contrast. contrast_factor
103 | is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].
104 | saturation (float): How much to jitter saturation. saturation_factor
105 | is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
106 | hue(float): How much to jitter hue. hue_factor is chosen uniformly from
107 | [-hue, hue]. Should be >=0 and <= 0.5.
108 | """
109 |
110 | def __init__(self, brightness=0, contrast=0, saturation=0, hue=0):
111 | super(ColorJitterImage, self).__init__(brightness, contrast, saturation, hue)
112 |
113 | def __call__(self, instance):
114 | """
115 | Args:
116 | instance (dict): must contain key 'image'.
117 |                 instance['image'] is a PIL Image or numpy array.
118 | """
119 | if isinstance(instance['image'], Image.Image):
120 | image = instance['image']
121 | elif isinstance(instance['image'], np.ndarray):
122 | image = Image.fromarray(instance['image'])
123 | else:
124 | image = instance['image']
125 | raise TypeError(
126 |                 f'Color jitter not yet implemented for {type(image)}')
127 |
128 | brightness, contrast, saturation, hue = self.get_params(
129 | self.brightness, self.contrast, self.saturation, self.hue)
130 | # Create img transform function sequence
131 | img_transforms = []
132 | if brightness is not None:
133 | img_transforms.append(lambda img: F.adjust_brightness(img, brightness))
134 | if saturation is not None:
135 | img_transforms.append(lambda img: F.adjust_saturation(img, saturation))
136 | if hue is not None:
137 | img_transforms.append(lambda img: F.adjust_hue(img, hue))
138 | if contrast is not None:
139 | img_transforms.append(lambda img: F.adjust_contrast(img, contrast))
140 | random.shuffle(img_transforms)
141 |
142 | # Apply to images
143 | jittered_img = image.copy()
144 | for func in img_transforms:
145 | jittered_img = func(jittered_img)
146 |
147 | ret = {k:v for k, v in instance.items() if k!='image'}
148 | ret.update({'image': jittered_img})
149 |
150 | return ret
--------------------------------------------------------------------------------
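
A short usage sketch for `ColorJitterImage`: the jitter factors are drawn once per call and then applied in a random order, and a numpy input is converted to a PIL Image on the way through (the parameter values below are arbitrary):

```
import numpy as np
from lib.data_utils.transforms import ColorJitterImage

jitter = ColorJitterImage(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.0)
instance = {'image': np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)}
out = jitter(instance)
print(type(out['image']))  # <class 'PIL.Image.Image'>
```
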
/lib/data_utils/transforms/crop.py:
--------------------------------------------------------------------------------
1 | import os
2 | import cv2
3 | import random
4 |
5 | import numpy as np
6 | import os.path as osp
7 | import torchvision.transforms as transforms
8 | import torchvision.transforms.functional as F
9 |
10 | from PIL import Image
11 |
12 | class _CropBase(object):
13 |     """Crop a PIL Image or numpy array according to the specified bounding box.
14 |     The keypoint coordinates are transformed by the same affine warp so that they stay aligned with the cropped patch.
15 |
16 |     To keep the augmentation process simple and efficient, some image-level augmentations are applied here
17 |     in a coupled manner, such as random rotation and random scale jitter.
18 |
19 |     Args:
20 |         patch_height (float or int): cropped clip height. Default value is 224.
21 |         patch_width (float or int): cropped clip width. Default value is 224.
22 |         rot_jitter (float): how much to randomly rotate the clip and keypoints. The rotation angle
23 |             is chosen uniformly from [-rot_jitter, rot_jitter].
24 |         size_jitter (float): how much to randomly rescale the clip and keypoints. The scale factor
25 |             is chosen uniformly from [default_bbox_scale - size_jitter, default_bbox_scale + size_jitter].
26 |         random_crop_p (float): probability of applying the random-crop augmentation.
27 |         random_crop_size (float): maximum ratio of the height and width that can be cropped away.
28 | """
29 | def __init__(self, patch_height=224, patch_width=224, rot_jitter=0.,
30 | size_jitter=0., random_crop_p=0., random_crop_size=0.5, default_bbox_scale=1.3):
31 | self.patch_width = patch_width
32 | self.patch_height = patch_height
33 | self.size_jitter = size_jitter
34 | self.rot_jitter = rot_jitter
35 | self.random_crop_p = random_crop_p
36 | self.random_crop_size = random_crop_size
37 | self.s = default_bbox_scale
38 |
39 | def gen_augmentation(self):
40 | scale = random.uniform(self.s - self.size_jitter, self.s + self.size_jitter)
41 | rot = random.uniform(-self.rot_jitter, self.rot_jitter)
42 | if np.random.rand() < self.random_crop_p:
43 | scale = np.random.uniform(self.s - self.random_crop_size, self.s)
44 | shift_w = np.random.uniform(-(self.s-scale)/2.0, (self.s-scale)/2.0)
45 | shift_h = np.random.uniform(-(self.s-scale)/2.0, (self.s-scale)/2.0)
46 | return (scale, scale), rot, (shift_w, shift_h)
47 | else:
48 | return (scale, scale), rot, (0, 0)
49 |
50 | def rotate_2d(self, pt_2d, rot_rad):
51 | x = pt_2d[0]
52 | y = pt_2d[1]
53 | sn, cs = np.sin(rot_rad), np.cos(rot_rad)
54 | xx = x * cs - y * sn
55 | yy = x * sn + y * cs
56 | return np.array([xx, yy], dtype=np.float32)
57 |
58 | def gen_trans(self, bbox, scale, rot, shift):
59 | # augment size with scale
60 | src_w = bbox[2] * scale[0]
61 | src_h = bbox[3] * scale[1]
62 | src_center = bbox[:2] + bbox[2:] * shift
63 |
64 | # augment rotation
65 | rot_rad = np.pi * rot / 180
66 | src_downdir = self.rotate_2d(np.array([0, src_h * 0.5], dtype=np.float32), rot_rad)
67 | src_rightdir = self.rotate_2d(np.array([src_w * 0.5, 0], dtype=np.float32), rot_rad)
68 |
69 | dst_w = self.patch_width
70 | dst_h = self.patch_height
71 | dst_center = np.array([dst_w * 0.5, dst_h * 0.5], dtype=np.float32)
72 | dst_downdir = np.array([0, dst_h * 0.5], dtype=np.float32)
73 | dst_rightdir = np.array([dst_w * 0.5, 0], dtype=np.float32)
74 |
75 | src = np.zeros((3, 2), dtype=np.float32)
76 | src[0, :] = src_center
77 | src[1, :] = src_center + src_downdir
78 | src[2, :] = src_center + src_rightdir
79 |
80 | dst = np.zeros((3, 2), dtype=np.float32)
81 | dst[0, :] = dst_center
82 | dst[1, :] = dst_center + dst_downdir
83 | dst[2, :] = dst_center + dst_rightdir
84 |
85 | trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
86 |
87 | return trans
88 |
89 | def trans_image(self, image, trans):
90 | affined = cv2.warpAffine(image.copy(), trans, (int(self.patch_width), int(self.patch_height)),
91 | flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)
92 | affined = Image.fromarray(affined)
93 | return affined
94 |
95 | def trans_keypoints(self, kp_2d, trans):
96 | if len(kp_2d.shape) == 1:
97 | # a single keypoint
98 | src_pt = np.array([kp_2d[0], kp_2d[1], 1.]).T
99 | dst_pt = np.dot(trans, src_pt)
100 | return np.concatenate([dst_pt[0:2], kp_2d[-1:]], axis=0)
101 | else:
102 | # list of keypoints
103 | new_kp = np.zeros_like(kp_2d)
104 | for i, kp in enumerate(kp_2d):
105 | new_kp[i] = self.trans_keypoints(kp, trans)
106 | return new_kp
107 |
108 | def __call__(self, instance):
109 | raise NotImplementedError()
110 |
111 |
112 |
113 | class CropImage(_CropBase):
114 | """Crop PIL Image or numpy array and keypoints according to specified bounding box.
115 | """
116 | def __init__(self, patch_height=224, patch_width=224, rot_jitter=0.,
117 | size_jitter=0., random_crop_p=0., random_crop_size=0.5, default_bbox_scale=1.3):
118 | super(CropImage, self).__init__(patch_height, patch_width, rot_jitter,
119 | size_jitter, random_crop_p, random_crop_size, default_bbox_scale)
120 |
121 | def __call__(self, instance):
122 | if 'bbox' not in instance.keys():
123 | # do nothing if bbox is not specified
124 | return instance
125 |
126 | image, bbox = instance['image'], instance['bbox']
127 | kp_2d = instance['kp_2d'] if 'kp_2d' in instance else None
128 |
129 | scale, rot, shift = self.gen_augmentation()
130 | trans = self.gen_trans(bbox, scale, rot, shift)
131 | image = self.trans_image(image, trans)
132 | if kp_2d is not None:
133 | kp_2d = self.trans_keypoints(kp_2d, trans)
134 |
135 | ret = {k: v for k, v in instance.items() if k not in ['image', 'kp_2d']}
136 | ret.update({'image': image})
137 | if kp_2d is not None:
138 | ret.update({'kp_2d': kp_2d})
139 | return ret
140 |
141 |
142 |
143 | class CropVideo(_CropBase):
144 | """Crop a sequence of PIL Image or numpy array and keypoints according to specified bounding box.
145 | """
146 | def __init__(self, patch_height=224, patch_width=224, rot_jitter=0.,
147 | size_jitter=0., random_crop_p=0., random_crop_size=0.5, default_bbox_scale=1.3):
148 | super(CropVideo, self).__init__(patch_height, patch_width, rot_jitter,
149 | size_jitter, random_crop_p, random_crop_size, default_bbox_scale)
150 |
151 | def __call__(self, instance):
152 | if 'bbox' not in instance.keys():
153 | # do nothing if bbox is not specified
154 | return instance
155 |
156 | clip, bboxs = instance['clip'], instance['bbox']
157 |
158 | kp_2d = instance['kp_2d'] if 'kp_2d' in instance else [None] * len(clip)
159 |
160 | scale, rot, shift = self.gen_augmentation()
161 |
162 | clip_croped = []
163 | keypoints_affine = []
164 | for frame, bbox, keypoint in zip(clip, bboxs, kp_2d):
165 | trans = self.gen_trans(bbox, scale, rot, shift)
166 | clip_croped.append(self.trans_image(frame, trans))
167 | if keypoint is not None:
168 | keypoints_affine.append(self.trans_keypoints(keypoint, trans))
169 |
170 | if len(keypoints_affine) > 0:
171 | keypoints_affine = np.stack(keypoints_affine, axis=0)
172 |
173 | ret = {k: v for k, v in instance.items() if k not in ['clip', 'kp_2d']}
174 | ret.update({'clip': clip_croped})
175 | if len(keypoints_affine) > 0:
176 | ret.update({'kp_2d': keypoints_affine})
177 | return ret
178 |
--------------------------------------------------------------------------------
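
The crop transforms expect the center-based `(c_x, c_y, w, h)` boxes produced by the preprocessing scripts above; `gen_trans` maps the (scaled, rotated, shifted) box onto a `patch_width x patch_height` patch and the same affine is applied to the keypoints. A minimal sketch with a dummy image and no jitter, so the box center lands at the patch center:

```
import numpy as np
from lib.data_utils.transforms import CropImage

crop = CropImage(patch_height=224, patch_width=224, default_bbox_scale=1.3)
instance = {
    'image': np.zeros((480, 640, 3), dtype=np.uint8),
    'bbox': np.array([320.0, 240.0, 100.0, 100.0]),   # (c_x, c_y, w, h)
    'kp_2d': np.array([[320.0, 240.0, 1.0]]),         # one keypoint at the box center
}
out = crop(instance)
print(out['image'].size)  # (224, 224), a PIL Image
print(out['kp_2d'])       # approx. [[112. 112. 1.]]: the box center maps to the patch center
```
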
/lib/data_utils/transforms/random_erase.py:
--------------------------------------------------------------------------------
1 | import random
2 | import numpy as np
3 |
4 | from PIL import Image
5 |
6 | class _RandomEraseBase(object):
7 |     """Randomly erase one side (top, bottom, left or right) of the input
8 |     Args:
9 |         prob (float): The probability to apply random erase
10 |         max_erase_part (float): The maximum ratio of the erased part
11 |         random_filling (bool): if True, fill the erased part with random pixels, otherwise with zeros.
12 |         erase_kp (bool): if True, mask out the keypoints in the erased part.
13 |         margin (float): keypoints in the erased part are masked out only if they lie more than this fraction of the image size away from the boundary with the unerased part.
14 | """
15 |
16 | def __init__(self, prob=0, max_erase_part=0.5, random_filling=True, erase_kp=True, margin=0.1):
17 | self.prob = prob
18 | self.max_erase_part = max_erase_part
19 | self.random_filling = random_filling
20 | self.erase_kp = erase_kp
21 | self.margin = margin
22 |
23 |     def _erase_top(self, img, kp_2d, kp_3d, erased_ratio):
24 |         h, w, _ = img.shape
25 |         erased_h = int(h * erased_ratio)
26 |         if erased_h > 0:
27 |             if self.random_filling:
28 |                 img[:erased_h] = np.random.randint(256, size=(erased_h, w, 3), dtype=np.uint8)
29 |             else:
30 |                 img[:erased_h] = 0
31 |             if self.erase_kp:
32 |                 for i, kp in enumerate(kp_2d):
33 |                     if erased_h - kp[1] > h * self.margin:
34 |                         kp_2d[i, 2] = 0.  # zero the visibility flag of joint i
35 |                         if kp_3d is not None:
36 |                             kp_3d[i, -1] = 0
37 |         return img, kp_2d, kp_3d
38 |
39 |     def _erase_bottom(self, img, kp_2d, kp_3d, erased_ratio):
40 |         h, w, _ = img.shape
41 |         erased_h = int(h * erased_ratio)
42 |         if erased_h > 0:
43 |             if self.random_filling:
44 |                 img[-erased_h:] = np.random.randint(256, size=(erased_h, w, 3), dtype=np.uint8)
45 |             else:
46 |                 img[-erased_h:] = 0
47 |             if self.erase_kp:
48 |                 for i, kp in enumerate(kp_2d):
49 |                     if erased_h - (h - kp[1]) > h * self.margin:
50 |                         kp_2d[i, 2] = 0.  # zero the visibility flag of joint i
51 |                         if kp_3d is not None:
52 |                             kp_3d[i, -1] = 0
53 |         return img, kp_2d, kp_3d
54 |
55 |     def _erase_left(self, img, kp_2d, kp_3d, erased_ratio):
56 |         h, w, _ = img.shape
57 |         erased_w = int(w * erased_ratio)
58 |         if erased_w > 0:
59 |             if self.random_filling:
60 |                 img[:, :erased_w] = np.random.randint(256, size=(h, erased_w, 3), dtype=np.uint8)
61 |             else:
62 |                 img[:, :erased_w] = 0
63 |             if self.erase_kp:
64 |                 for i, kp in enumerate(kp_2d):
65 |                     if erased_w - kp[0] > w * self.margin:
66 |                         kp_2d[i, 2] = 0.  # zero the visibility flag of joint i
67 |                         if kp_3d is not None:
68 |                             kp_3d[i, -1] = 0
69 |         return img, kp_2d, kp_3d
70 |
71 |     def _erase_right(self, img, kp_2d, kp_3d, erased_ratio):
72 |         h, w, _ = img.shape
73 |         erased_w = int(w * erased_ratio)
74 |         if erased_w > 0:
75 |             if self.random_filling:
76 |                 img[:, -erased_w:] = np.random.randint(256, size=(h, erased_w, 3), dtype=np.uint8)
77 |             else:
78 |                 img[:, -erased_w:] = 0
79 |             if self.erase_kp:
80 |                 for i, kp in enumerate(kp_2d):
81 |                     if erased_w - (w - kp[0]) > w * self.margin:
82 |                         kp_2d[i, 2] = 0.  # zero the visibility flag of joint i
83 |                         if kp_3d is not None:
84 |                             kp_3d[i, -1] = 0
85 |         return img, kp_2d, kp_3d
86 |
87 | def __call__(self, instance):
88 | raise NotImplementedError()
89 |
90 |
91 | class RandomEraseVideo(_RandomEraseBase):
92 |     """Randomly erase one side of every frame in the clip
93 |     Args:
94 |         prob (float): The probability to apply random erase
95 |         max_erase_part (float): The maximum ratio of the erased part
96 |         random_filling (bool): if True, fill the erased part with random pixels, otherwise with zeros.
97 |         erase_kp (bool): if True, mask out the keypoints in the erased part.
98 |         margin (float): keypoints in the erased part are masked out only if they lie more than this fraction of the image size away from the boundary with the unerased part.
99 | """
100 |
101 | def __init__(self, prob=0, max_erase_part=0.5, random_filling=True, erase_kp=True, margin=0.1):
102 | super(RandomEraseVideo, self).__init__(prob, max_erase_part, random_filling, erase_kp, margin)
103 |
104 | def __call__(self, instance):
105 | """
106 | Args:
107 |             instance (dict): must contain keys 'clip' and 'kp_2d'.
108 |                 instance['clip'] is a list of PIL Images or numpy arrays.
109 | """
110 | if isinstance(instance['clip'][0], Image.Image):
111 | clip = instance['clip']
112 | elif isinstance(instance['clip'][0], np.ndarray):
113 | clip = [Image.fromarray(c) for c in instance['clip']]
114 | else:
115 | clip = instance['clip'][0]
116 | raise TypeError(
117 | f'Random Erase not yet implemented for {type(clip)}')
118 |
119 | kp_2d = instance['kp_2d'].copy()
120 | kp_3d = instance['kp_3d'].copy() if 'kp_3d' in instance else None
121 |
122 | # Apply to all images
123 | erased_clip = []
124 | erased_kp_2ds = []
125 |
126 | erased_part = random.choice([self._erase_left, self._erase_right, self._erase_top, self._erase_bottom])
127 | for t, (kp_2d_frame, img) in enumerate(zip(kp_2d, clip)):
128 | erased_img = img.copy()
129 | erased_kp_2d = kp_2d_frame.copy()
130 | if np.random.rand() < self.prob:
131 | erased_ratio = np.random.rand() * self.max_erase_part
132 | erased_img = np.array(erased_img)
133 | erased_img, erased_kp_2d, _ = erased_part(erased_img, erased_kp_2d, None, erased_ratio)
134 |
135 | erased_img = Image.fromarray(erased_img)
136 |
137 | erased_kp_2ds.append(erased_kp_2d)
138 | erased_clip.append(erased_img)
139 |
140 | erased_kp_2ds = np.stack(erased_kp_2ds, axis=0)
141 |
142 |
143 | ret = {k:v for k, v in instance.items() if k not in ['clip', 'kp_2d', 'kp_3d']}
144 | ret.update({'clip': erased_clip, 'kp_2d': erased_kp_2ds})
145 | if kp_3d is not None:
146 | ret.update({'kp_3d': kp_3d})
147 |
148 |
149 | return ret
150 |
151 |
152 | class RandomEraseImage(_RandomEraseBase):
153 |     """Randomly erase one side (top, bottom, left or right) of the image
154 |     Args:
155 |         prob (float): The probability to apply random erase
156 |         max_erase_part (float): The maximum ratio of the erased part
157 |         random_filling (bool): if True, fill the erased part with random pixels, otherwise with zeros.
158 |         erase_kp (bool): if True, mask out the keypoints in the erased part.
159 |         margin (float): keypoints in the erased part are masked out only if they lie more than this fraction of the image size away from the boundary with the unerased part.
160 | """
161 |
162 | def __init__(self, prob=0, max_erase_part=0.5, random_filling=True, erase_kp=True, margin=0.1):
163 | super(RandomEraseImage, self).__init__(prob, max_erase_part, random_filling, erase_kp, margin)
164 |
165 | def __call__(self, instance):
166 | """
167 | Args:
168 | instance (dict): must contain key 'image'.
169 |             instance['image'] is a PIL Image or numpy array.
170 | """
171 | if isinstance(instance['image'], Image.Image):
172 | image = instance['image']
173 | elif isinstance(instance['image'], np.ndarray):
174 | image = Image.fromarray(instance['image'])
175 | else:
176 | image = instance['image']
177 | raise TypeError(
178 | f'Random Erase not yet implemented for {type(image)}')
179 |
180 | kp_2d = instance['kp_2d'].copy()
181 | kp_3d = instance['kp_3d'].copy() if 'kp_3d' in instance else None
182 |
183 | erased_part = random.choice([self._erase_left, self._erase_right, self._erase_top, self._erase_bottom])
184 | erased_img = image.copy()
185 | erased_kp_2d = kp_2d.copy()
186 | if np.random.rand() < self.prob:
187 | erased_ratio = np.random.rand() * self.max_erase_part
188 | erased_img = np.array(erased_img)
189 | erased_img, erased_kp_2d, _ = erased_part(erased_img, kp_2d, None, erased_ratio)
190 | erased_img = Image.fromarray(erased_img)
191 |
192 |
193 | ret = {k:v for k, v in instance.items() if k not in ['image', 'kp_2d', 'kp_3d']}
194 | ret.update({'image': erased_img, 'kp_2d': erased_kp_2d})
195 | if kp_3d is not None:
196 | ret.update({'kp_3d': kp_3d})
197 |
198 | return ret
--------------------------------------------------------------------------------
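
A short usage sketch for `RandomEraseImage` (with `prob=1.0` so the erase always fires): one side of the image is filled with noise or zeros, and keypoints that fall deep enough inside the erased region get their visibility flag zeroed:

```
import numpy as np
from lib.data_utils.transforms import RandomEraseImage

erase = RandomEraseImage(prob=1.0, max_erase_part=0.5, random_filling=True)
instance = {
    'image': np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8),
    'kp_2d': np.array([[10.0, 10.0, 1.0], [200.0, 200.0, 1.0]]),
}
out = erase(instance)
print(type(out['image']))  # <class 'PIL.Image.Image'>
print(out['kp_2d'][:, 2])  # visibility flags; some may be zeroed depending on the erased side
```
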
/lib/data_utils/transforms/random_hflip.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import random
3 | import torchvision.transforms.functional as F
4 |
5 | from lib.data_utils.kp_utils import keypoint_2d_hflip, keypoint_3d_hflip, smpl_pose_hflip
6 |
7 | from PIL import Image
8 |
9 | class RandomHorizontalFlipImage(object):
10 | """Horizontally flip the input image, keypoints and smpl pose randomly with a given probability.
11 |
12 | Args:
13 | p (float): probability of the image being flipped. Default value is 0.5
14 | """
15 | def __init__(self, p=0.5):
16 | self.p = p
17 |
18 | def __call__(self, instance):
19 | """
20 | instance (dict): must contain key 'image' and 'kp_2d'. Optional: support 'kp_3d' and 'pose' flip.
21 |
22 |         instance['image'] is a PIL Image or numpy array.
23 | instance['kp_2d'] is a numpy array.
24 | instance['kp_3d'] is a numpy array.
25 | instance['pose'] is a numpy array.
26 |
27 | Returns:
28 | same as input, while image and keypoints are flipped.
29 | """
30 | if isinstance(instance['image'], Image.Image):
31 | image = instance['image']
32 | elif isinstance(instance['image'], np.ndarray):
33 | image = Image.fromarray(instance['image'])
34 | else:
35 | image = instance['image']
36 | raise TypeError(
37 | f'Random Horizontal Flip not yet implemented for {type(image)}')
38 |
39 | kp_2d = instance['kp_2d'].copy()
40 | kp_3d = instance['kp_3d'].copy() if 'kp_3d' in instance else None
41 | pose = instance['pose'].copy() if 'pose' in instance else None
42 |
43 | img_width = image.size[0]
44 |
45 | if random.random() < self.p:
46 | flipped_image = F.hflip(image)
47 | flipped_kp_2d = keypoint_2d_hflip(kp_2d, img_width)
48 | flipped_kp_3d = keypoint_3d_hflip(kp_3d) if kp_3d is not None else None
49 | flipped_pose = smpl_pose_hflip(pose) if pose is not None else None
50 | else:
51 | flipped_image = image
52 | flipped_kp_2d = kp_2d
53 | flipped_kp_3d = kp_3d
54 | flipped_pose = pose
55 |
56 | ret = {k:v for k, v in instance.items() if k not in ['image', 'kp_2d', 'kp_3d', 'pose']}
57 | ret.update({'image': flipped_image, 'kp_2d':flipped_kp_2d})
58 | if flipped_kp_3d is not None:
59 | ret.update({'kp_3d': flipped_kp_3d})
60 | if flipped_pose is not None:
61 | ret.update({'pose': flipped_pose})
62 |
63 | return ret
64 |
65 |
66 | class RandomHorizontalFlipVideo(object):
67 | """Horizontally flip the given list of PIL Images randomly with a given probability.
68 |
69 | Args:
70 | p (float): probability of the image being flipped. Default value is 0.5
71 | """
72 | def __init__(self, p=0.5):
73 | self.p = p
74 |
75 | def __call__(self, instance):
76 | """
77 | instance (dict): must contain key 'clip' and 'kp_2d'. Optional: support 'kp_3d' and 'pose' flip.
78 |
79 | instance['clip'] is a list of PIL Images or numpy arrays.
80 | instance['kp_2d'] is a numpy array.
81 | instance['kp_3d'] is a numpy array.
82 | instance['pose'] is a numpy array.
83 |
84 | Returns:
85 | same as input, while clip and keypoints are flipped.
86 | """
87 | if isinstance(instance['clip'][0], Image.Image):
88 | clip = instance['clip']
89 | elif isinstance(instance['clip'][0], np.ndarray):
90 | clip = [Image.fromarray(c) for c in instance['clip']]
91 | else:
92 | clip = instance['clip'][0]
93 | raise TypeError(
94 | f'Random Horizontal Flip not yet implemented for {type(clip)}')
95 |
96 | kp_2d = instance['kp_2d'].copy()
97 | kp_3d = instance['kp_3d'].copy() if 'kp_3d' in instance else None
98 | pose = instance['pose'].copy() if 'pose' in instance else None
99 |
100 | img_width = clip[0].size[0]
101 |
102 | if random.random() < self.p:
103 | flipped_clip = []
104 | for img in clip:
105 | flipped_clip.append(F.hflip(img))
106 | flipped_kp_2d = keypoint_2d_hflip(kp_2d, img_width)
107 | flipped_kp_3d = keypoint_3d_hflip(kp_3d) if kp_3d is not None else None
108 | flipped_pose = smpl_pose_hflip(pose) if pose is not None else None
109 | else:
110 | flipped_clip = clip
111 | flipped_kp_2d = kp_2d
112 | flipped_kp_3d = kp_3d
113 | flipped_pose = pose
114 |
115 | ret = {k:v for k, v in instance.items() if k not in ['clip', 'kp_2d', 'kp_3d', 'pose']}
116 | ret.update({'clip': flipped_clip, 'kp_2d':flipped_kp_2d})
117 | if flipped_kp_3d is not None:
118 | ret.update({'kp_3d': flipped_kp_3d})
119 | if flipped_pose is not None:
120 | ret.update({'pose': flipped_pose})
121 |
122 | return ret
--------------------------------------------------------------------------------
/lib/dataset/__init__.py:
--------------------------------------------------------------------------------
1 | from .dataset_video import VideoDataset
2 | from .dataset_image import ImageDataset
3 |
--------------------------------------------------------------------------------
/lib/dataset/dataset_image.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | import os
3 | # import mc
4 | import cv2
5 | import torch
6 | import numpy as np
7 | import joblib
8 | import os.path as osp
9 | from PIL import Image
10 | from torch.utils.data import Dataset
11 |
12 | from lib.utils.geometry import rotation_matrix_to_angle_axis
13 | from lib.data_utils.img_utils import read_img
14 | from lib.core.config import DB_DIR
15 |
16 |
17 | class ImageDataset(Dataset):
18 | def __init__(self, dataset_name, set,
19 | transforms=None,
20 | verbose=True, debug=False):
21 |
22 | self.dataset_name = dataset_name
23 | self.set = set
24 | self.transforms = transforms
25 |
26 | self.debug = debug
27 | self.verbose = verbose
28 |
29 | self.db = self._load_db()
30 |
31 | if self.verbose:
32 | print(f'{self.dataset_name} - Number of dataset objects {self.__len__()}')
33 |
34 | def _load_db(self):
35 | db_file = osp.join(DB_DIR, f'{self.dataset_name}_{self.set}_db.pt')
36 |
37 | if osp.isfile(db_file):
38 | db = joblib.load(db_file)
39 | else:
40 |             raise ValueError(f'{db_file} does not exist')
41 |
42 | if self.verbose:
43 | print(f'Loaded {self.dataset_name} dataset from {db_file}')
44 | return db
45 |
46 | def __len__(self):
47 | return len(self.db['img_name'])
48 |
49 | def __getitem__(self, index):
50 | kp_2d = self.db['joints2D'][index]
51 | kp_3d = self.db['joints3D'][index] if 'joints3D' in self.db else np.zeros([49, 4])
52 | # path_names = self.db['img_name'][index].split('/')
53 | # # if 'human3.6m' in path_names:
54 | # #
55 | path_name = self.db['img_name'][index]
56 | image = read_img(path_name)
57 | shape = self.db['shape'][index] if 'shape' in self.db else np.zeros([10])
58 | cam = self.db['cam'][index] if 'cam' in self.db else np.array([1., 0., 0.])
59 | bbox = self.db['bbox'][index]
60 |
61 | pose = self.db['pose'][index].astype(np.float32) if 'pose' in self.db else np.zeros([72])
62 | if len(pose.shape) > 1:
63 | pose = rotation_matrix_to_angle_axis(torch.from_numpy(pose)).numpy().flatten()
64 |
65 | target = {
66 | 'image': image,
67 | 'kp_2d': kp_2d,
68 | 'kp_3d': kp_3d,
69 | 'pose':pose,
70 | 'shape':shape,
71 | 'cam':cam,
72 | 'bbox': bbox
73 | }
74 | if self.transforms:
75 | target = self.transforms(target)
76 |
77 | target['theta'] = torch.cat([target['cam'].float(), target['pose'].float(), target['shape'].float()], dim=0) # camera, pose and shape
78 | target['w_smpl'] = torch.tensor(1).float() if 'pose' in self.db else torch.tensor(0).float()
79 |
80 | new_target = {}
81 | for k, v in target.items():
82 | if k in ['pose', 'cam', 'shape']:
83 | continue
84 | new_target[k] = v.float()
85 |
86 | return new_target
87 |
--------------------------------------------------------------------------------
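
`ImageDataset` reads the `{dataset}_{set}_db.pt` files written by the preprocessing scripts above and returns per-sample dicts, so it drops straight into a standard `DataLoader`. A sketch with an illustrative dataset and transform choice, assuming `read_img` yields numpy arrays as the crop transform expects (the real training pipeline and its transforms are configured elsewhere in the repo):

```
from torch.utils.data import DataLoader
from torchvision import transforms
from lib.dataset import ImageDataset
from lib.data_utils.transforms import CropImage, ToTensorImage, NormalizeImage

img_transforms = transforms.Compose([CropImage(), ToTensorImage(), NormalizeImage()])
ds = ImageDataset(dataset_name='mpii3d', set='train', transforms=img_transforms)
loader = DataLoader(ds, batch_size=32, shuffle=True, num_workers=4)

batch = next(iter(loader))
print(batch['image'].shape)  # (32, 3, 224, 224) with the default 224x224 patch
print(batch['theta'].shape)  # (32, 85): camera (3) + pose (72) + shape (10)
```
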
/lib/dataset/dataset_video.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | import os
3 | import torch
4 | import logging
5 | import numpy as np
6 | import os.path as osp
7 | import joblib
8 | import random
9 |
10 | from torch.utils.data import Dataset
11 |
12 | from lib.core.config import DB_DIR
13 | from lib.models.smpl import OP_TO_J14, J49_TO_J14
14 | from lib.data_utils.kp_utils import convert_kps
15 | from lib.data_utils.img_utils import split_into_chunks, read_img
16 |
17 | logger = logging.getLogger(__name__)
18 |
19 | class VideoDataset(Dataset):
20 | def __init__(self, dataset_name, set, transforms,
21 | seqlen=0, overlap=0., sample_pool=64,
22 | random_sample=True, random_start=False,
23 | pad=True, verbose=True, debug=False):
24 |
25 | self.dataset_name = dataset_name
26 | self.set = set
27 | self.transforms = transforms
28 |
29 | assert seqlen > 0 or sample_pool > 0
30 | self.seqlen = seqlen if seqlen > 0 else sample_pool
31 | self.sample_pool = sample_pool if sample_pool > 0 else seqlen
32 | self.sample_freq = self.sample_pool // self.seqlen
33 | #assert self.sample_pool % self.seqlen == 0
34 |
35 | self.overlap = overlap
36 | self.stride = max(int(self.sample_pool * (1-overlap)), 1) if overlap < 1 else overlap
37 |
38 | self.random_sample = random_sample
39 | self.random_start = random_start
40 | assert not (self.random_sample and self.random_start)
41 | # Either random sample or random start, cannot be both
42 |
43 | self.debug = debug
44 | self.verbose = verbose
45 |
46 | self.db = self._load_db()
47 | self.vid_indices = split_into_chunks(self.db['vid_name'], self.sample_pool, self.stride, pad)
48 |
49 | if self.verbose:
50 | print(f'{self.dataset_name} - Dataset overlap ratio: {self.overlap}')
51 | print(f'{self.dataset_name} - Number of dataset objects {self.__len__()}')
52 |
53 |
54 | def __len__(self):
55 | return len(self.vid_indices)
56 |
57 | def __getitem__(self, index):
58 | is_train = self.set == 'train'
59 | target = {}
60 |
61 | # determine sample index
62 | sample_idx, full_sample_idx = self.gen_sample_index(index)
63 |
64 | # load and process 2D&3D keypoints
65 | kp_2d, kp_3d = self.get_keypoints(sample_idx)
66 |
67 | # load SMPL parameters: theta, beta along with cam params.
68 | cam, pose, shape, w_smpl = self.get_smpl_params(sample_idx)
69 | target['w_smpl'] = w_smpl
70 |
71 | # bounding box
72 | if self.dataset_name != 'insta':
73 | bbox = self.db['bbox'][sample_idx]
74 | if not is_train:
75 | target['bbox'] = self.db['bbox'][sample_idx]
76 |
77 | # images
78 | image_paths = self.db['img_name'][sample_idx]
79 | # new_img_paths = []
80 | # for path_name in image_paths:
81 | # if isinstance(path_name, bytes):
82 | # path_name = path_name.decode()
83 | # if 'mpi_inf_3dhp' in path_name:
84 | # path_name = path_name[:-10] + 'frame_' + path_name[-10:]
85 | # path_name = path_name.replace('v', 'V')
86 | # new_img_paths.append(path_name)
87 |
88 | images = [read_img(path) for path in image_paths]
89 |
90 | if not is_train:
91 | target['paths'] = self.db['img_name'][sample_idx].tolist()
92 |
93 | # preprocess and augmentation
94 | raw_inp = {
95 | 'clip': images,
96 | 'kp_2d': kp_2d,
97 | 'kp_3d':kp_3d,
98 | 'pose':pose,
99 | 'shape':shape,
100 | 'cam':cam,
101 | }
102 | if self.dataset_name != 'insta':
103 | raw_inp['bbox'] = bbox
104 | transformed = self.transforms(raw_inp)
105 |
106 | target['images'] = transformed['clip'].float()
107 | target['kp_2d'] = transformed['kp_2d'].float()
108 | target['kp_3d'] = transformed['kp_3d'].float()
109 |
110 | theta = torch.cat([transformed['cam'].float(), transformed['pose'].float(), transformed['shape'].float()], dim=1) #(T, 85)
111 | target['theta'] = theta.float() # camera, pose and shape
112 |
113 | # optional info for evaluation
114 | if self.dataset_name == 'mpii3d' and not is_train:
115 | target['valid'] = self.db['valid_i'][sample_idx]
116 | vn = self.db['vid_name'][sample_idx]
117 | fi = self.db['frame_id'][sample_idx]
118 | target['instance_id'] = [f'{v}/{f}'for v,f in zip(vn,fi)]
119 | if self.dataset_name in ['3dpw', 'h36m'] and not is_train:
120 | vn = self.db['vid_name'][sample_idx]
121 | fi = self.db['frame_id'][sample_idx]
122 | target['instance_id'] = [f'{v}/{f}'for v,f in zip(vn,fi)]
123 | if not is_train:
124 | valid = np.array(full_sample_idx)
125 | valid = valid - np.roll(valid, 1)
126 | valid = valid > 0
127 | valid[0] = True
128 | target['valid'] = torch.from_numpy(valid)
129 |
130 | # record data source for further use
131 | target['index'] = torch.tensor([index])
132 |
133 | return target
134 |
135 | def _load_db(self):
136 | db_file = osp.join(DB_DIR, f'{self.dataset_name}_{self.set}_db.pt')
137 | print('db_file', db_file, DB_DIR)
138 |
139 | if osp.isfile(db_file):
140 | db = joblib.load(db_file)
141 | else:
142 |             raise ValueError(f'{db_file} does not exist')
143 |
144 | if self.verbose:
145 | print(f'Loaded {self.dataset_name} dataset from {db_file}')
146 | return db
147 |
148 | def gen_sample_index(self, index):
149 | full_sample_idx = self.vid_indices[index]
150 |
151 | if self.random_sample:
152 | sample_idx = []
153 | for i in range(self.seqlen):
154 | sample_idx.append(full_sample_idx[self.sample_freq*i + random.randint(0, self.sample_freq-1)])
155 | elif self.random_start:
156 | start = random.randint(0, self.sample_freq-1)
157 | sample_idx = full_sample_idx[start::self.sample_freq][:self.seqlen]
158 | else:
159 | sample_idx = full_sample_idx[::self.sample_freq][:self.seqlen]
160 |
161 | return sample_idx, full_sample_idx
162 |
163 | def get_keypoints(self, sample_idx):
164 | if 'joints2D' in self.db:
165 | kp_2d = self.db['joints2D'][sample_idx]
166 | else:
167 | kp_2d = np.zeros([self.seqlen, 49, 3])
168 |
169 | if 'joints3D' in self.db:
170 | kp_3d = self.db['joints3D'][sample_idx]
171 | else:
172 | kp_3d = np.zeros([self.seqlen, 49, 4])
173 |
174 | return kp_2d, kp_3d
175 |
176 | def get_smpl_params(self, sample_idx):
177 | # w_smpl indicates whether the instance's SMPL parameters are valid
178 | if 'pose' in self.db:
179 | assert 'shape' in self.db
180 | pose = self.db['pose'][sample_idx]
181 | shape = self.db['shape'][sample_idx]
182 | w_smpl = torch.ones(self.seqlen).float()
183 | else:
184 | pose = np.zeros((self.seqlen, 72))
185 | shape = np.zeros((self.seqlen, 10))
186 | w_smpl = torch.zeros(self.seqlen).float()
187 |
188 | cam = np.concatenate([np.ones((self.seqlen, 1)), np.zeros((self.seqlen, 2))], axis=1) #(T, 3)
189 | return cam, pose, shape, w_smpl
190 |
--------------------------------------------------------------------------------
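
*Editorial note on the sampling scheme above:* the dataset first cuts each video into chunks of `sample_pool` consecutive frames (`split_into_chunks`) and then thins every chunk down to `seqlen` frames with stride `sample_freq = sample_pool // seqlen`, either deterministically, from a random start offset, or with one random pick per bin. Below is a minimal standalone sketch of that index selection, using a plain list of frame indices instead of the repo's database (toy values, illustrative only).

```
import random

def pick_frames(full_idx, seqlen, sample_pool, random_sample=False, random_start=False):
    """Subsample `seqlen` frames out of a chunk of `sample_pool` frames.

    Mirrors the logic of VideoDataset.gen_sample_index: one frame per bin of
    size `sample_freq`, chosen deterministically, from a random offset, or
    randomly inside each bin.
    """
    assert not (random_sample and random_start)
    sample_freq = sample_pool // seqlen
    if random_sample:                      # one random frame per bin
        return [full_idx[sample_freq * i + random.randint(0, sample_freq - 1)]
                for i in range(seqlen)]
    if random_start:                       # fixed stride, random offset
        start = random.randint(0, sample_freq - 1)
        return full_idx[start::sample_freq][:seqlen]
    return full_idx[::sample_freq][:seqlen]  # deterministic stride

# e.g. a chunk of 16 frames thinned to 8:
print(pick_frames(list(range(100, 116)), seqlen=8, sample_pool=16))
```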
/lib/dataset/loaders.py:
--------------------------------------------------------------------------------
1 | from torch.utils.data import ConcatDataset, DataLoader, Subset
2 | from torch.utils.data.distributed import DistributedSampler
3 | import torch.distributed as dist
4 | from lib.dataset import *
5 | import torch
6 | import joblib
7 | import os.path as osp
8 |
9 | def get_data_loaders(cfg, transforms_3d, transforms_2d, transforms_val, transforms_img, rank, world_size, verbose=True):
10 | def get_2d_datasets(dataset_names):
11 | datasets = []
12 | for dataset_name in dataset_names:
13 | db = VideoDataset(
14 | dataset_name=dataset_name,
15 | set='train',
16 | transforms=transforms_2d,
17 | seqlen=cfg.DATASET.SEQLEN,
18 | overlap=cfg.DATASET.OVERLAP,
19 | sample_pool=cfg.DATASET.SAMPLE_POOL,
20 | random_sample=cfg.DATASET.RANDOM_SAMPLE,
21 | random_start=cfg.DATASET.RANDOM_START,
22 | verbose=verbose,
23 | debug=cfg.DEBUG
24 | )
25 | datasets.append(db)
26 | return ConcatDataset(datasets)
27 |
28 | def get_3d_datasets(dataset_names):
29 | datasets = []
30 | for dataset_name in dataset_names:
31 | db = VideoDataset(
32 | dataset_name=dataset_name,
33 | set='train',
34 | transforms=transforms_3d,
35 | seqlen=cfg.DATASET.SEQLEN,
36 | overlap=cfg.DATASET.OVERLAP if dataset_name != '3dpw' else 8,
37 | sample_pool=cfg.DATASET.SAMPLE_POOL,
38 | random_sample=cfg.DATASET.RANDOM_SAMPLE,
39 | random_start=cfg.DATASET.RANDOM_START,
40 | verbose=verbose,
41 | debug=cfg.DEBUG,
42 | )
43 | datasets.append(db)
44 | return ConcatDataset(datasets)
45 |
46 | def get_img_datasets(dataset_names):
47 | datasets = []
48 | for dataset_name in dataset_names:
49 | db = ImageDataset(
50 | dataset_name=dataset_name,
51 | set='train',
52 | transforms=transforms_img,
53 | verbose=verbose,
54 | debug=cfg.DEBUG,
55 | )
56 | if dataset_name == 'mpii3d':
57 | db = Subset(db, list(range(len(db)))[::5])
58 | datasets.append(db)
59 | return ConcatDataset(datasets)
60 |
61 | # ===== Video 2D keypoints datasets =====
62 | train_2d_dataset_names = cfg.TRAIN.DATASETS_2D
63 | data_2d_batch_size = cfg.TRAIN.BATCH_SIZE_2D
64 |
65 | if data_2d_batch_size:
66 | train_2d_db = get_2d_datasets(train_2d_dataset_names)
67 | train_2d_sampler = DistributedSampler(train_2d_db, rank=rank, num_replicas=world_size)
68 | train_2d_loader = DataLoader(
69 | dataset=train_2d_db,
70 | batch_size=data_2d_batch_size,
71 | #shuffle=True,
72 | num_workers=cfg.NUM_WORKERS,
73 | sampler=train_2d_sampler
74 | )
75 | else:
76 | train_2d_loader = None
77 |
78 | # ===== Video 3D keypoint datasets =====
79 | train_3d_dataset_names = cfg.TRAIN.DATASETS_3D
80 | data_3d_batch_size = cfg.TRAIN.BATCH_SIZE_3D
81 |
82 | if data_3d_batch_size:
83 | train_3d_db = get_3d_datasets(train_3d_dataset_names)
84 | train_3d_sampler = DistributedSampler(train_3d_db, rank=rank, num_replicas=world_size)
85 | train_3d_loader = DataLoader(
86 | dataset=train_3d_db,
87 | batch_size=data_3d_batch_size,
88 | #shuffle=True,
89 | num_workers=cfg.NUM_WORKERS,
90 | sampler=train_3d_sampler
91 | )
92 | else:
93 | train_3d_loader = None
94 |
95 | # ===== Image datasets =====
96 | train_img_dataset_names = cfg.TRAIN.DATASETS_IMG
97 | data_img_batch_size = cfg.TRAIN.BATCH_SIZE_IMG
98 |
99 | if data_img_batch_size:
100 | train_img_db = get_img_datasets(train_img_dataset_names)
101 | train_img_sampler = DistributedSampler(train_img_db, rank=rank, num_replicas=world_size)
102 | train_img_loader = DataLoader(
103 | dataset=train_img_db,
104 | batch_size=data_img_batch_size,
105 | num_workers=cfg.NUM_WORKERS,
106 | sampler=train_img_sampler,
107 | )
108 | else:
109 | train_img_loader = None
110 |
111 | eval_set = 'test' if data_3d_batch_size else 'val'
112 |
113 | # ===== Evaluation dataset =====
114 | valid_db = VideoDataset(
115 | dataset_name=cfg.TRAIN.DATASET_EVAL,
116 | set=cfg.TRAIN.EVAL_SET,
117 | transforms=transforms_val,
118 | overlap=0,
119 | sample_pool=cfg.EVAL.SAMPLE_POOL,
120 | random_sample=False,
121 | random_start=False,
122 | verbose=verbose,
123 | debug=cfg.DEBUG
124 | )
125 | #valid_sampler = DistributedSampler(valid_db, rank=rank, num_replicas=world_size)
126 | valid_sampler = None
127 |
128 | valid_loader = DataLoader(
129 | dataset=valid_db,
130 | batch_size=cfg.EVAL.BATCH_SIZE,
131 | shuffle=False,
132 | num_workers=cfg.NUM_WORKERS,
133 | sampler=valid_sampler
134 | )
135 |
136 | return train_2d_loader, train_3d_loader, valid_loader, train_img_loader
137 |
--------------------------------------------------------------------------------
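
*Editorial note on the loaders above:* `get_data_loaders` builds one loader per supervision type (video 2D, video 3D, image) and delegates shuffling to a `DistributedSampler`, which is why `shuffle=True` is commented out; each rank then sees a disjoint shard of the concatenated datasets. The toy sketch below shows that sharding behaviour in isolation, with explicit `rank`/`num_replicas` so no process group is needed; presumably the training loop also calls `set_epoch` every epoch to reshuffle.

```
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

ds = TensorDataset(torch.arange(10))  # stand-in for the ConcatDataset of video datasets

# With rank/num_replicas given explicitly, the sampler works without torch.distributed init.
sampler = DistributedSampler(ds, num_replicas=2, rank=0, shuffle=True)
loader = DataLoader(ds, batch_size=2, sampler=sampler)  # shuffling is left to the sampler

for epoch in range(2):
    sampler.set_epoch(epoch)  # different shuffle every epoch
    print(epoch, [batch[0].tolist() for batch in loader])
```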
/lib/models/__init__.py:
--------------------------------------------------------------------------------
1 | from .maed import MAED
2 | from .ops import *
3 |
--------------------------------------------------------------------------------
/lib/models/ktd.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import pickle
3 | import torch
4 | import torch.nn as nn
5 | import torch.nn.functional as F
6 | from lib.models.smpl import SMPL, SMPL_MODEL_DIR
7 | from lib.models.spin import projection
8 | from lib.utils.geometry import rot6d_to_rotmat, rotation_matrix_to_angle_axis
9 |
10 | ANCESTOR_INDEX = [
11 | [],
12 | [0],
13 | [0],
14 | [0],
15 | [0, 1],
16 | [0, 2],
17 | [0, 3],
18 | [0, 1, 4],
19 | [0, 2, 5],
20 | [0, 3, 6],
21 | [0, 1, 4, 7],
22 | [0, 2, 5, 8],
23 | [0, 3, 6, 9],
24 | [0, 3, 6, 9],
25 | [0, 3, 6, 9],
26 | [0, 3, 6, 9, 12],
27 | [0, 3, 6, 9, 13],
28 | [0, 3, 6, 9, 14],
29 | [0, 3, 6, 9, 13, 16],
30 | [0, 3, 6, 9, 14, 17],
31 | [0, 3, 6, 9, 13, 16, 18],
32 | [0, 3, 6, 9, 14, 17, 19],
33 | [0, 3, 6, 9, 13, 16, 18, 20],
34 | [0, 3, 6, 9, 14, 17, 19, 21]
35 | ]
36 |
37 | class KTD(nn.Module):
38 | def __init__(self, feat_dim=2048, hidden_dim=1024, **kwargs):
39 | super(KTD, self).__init__()
40 |
41 | self.feat_dim = feat_dim
42 | self.smpl = SMPL(
43 | SMPL_MODEL_DIR,
44 | create_transl=False,
45 | create_global_orient=False,
46 | create_body_pose=False,
47 | create_betas=False,
48 | )
49 | npose_per_joint = 6
50 | nshape = 10
51 | ncam = 3
52 |
53 | self.fc1 = nn.Linear(feat_dim, hidden_dim)
54 | self.drop1 = nn.Dropout()
55 | self.fc2 = nn.Linear(hidden_dim, hidden_dim)
56 | self.drop2 = nn.Dropout()
57 |
58 | self.joint_regs = nn.ModuleList()
59 | for joint_idx, ancestor_idx in enumerate(ANCESTOR_INDEX):
60 | regressor = nn.Linear(hidden_dim + npose_per_joint * len(ancestor_idx), npose_per_joint)
61 | nn.init.xavier_uniform_(regressor.weight, gain=0.01)
62 | self.joint_regs.append(regressor)
63 |
64 | self.decshape = nn.Linear(hidden_dim, nshape)
65 | self.deccam = nn.Linear(hidden_dim, ncam)
66 | nn.init.xavier_uniform_(self.decshape.weight, gain=0.01)
67 | nn.init.xavier_uniform_(self.deccam.weight, gain=0.01)
68 |
69 | def forward(self, x, seqlen, J_regressor=None,
70 | return_shape_cam=False, **kwargs):
71 | nt = x.shape[0]
72 | N = nt//seqlen
73 |
74 | x = self.fc1(x)
75 | x = self.drop1(x)
76 | x = self.fc2(x)
77 | x = self.drop2(x)
78 | pred_shape = self.decshape(x)
79 | pred_cam = self.deccam(x)
80 |
81 | pose = []
82 | for ancestor_idx, reg in zip(ANCESTOR_INDEX, self.joint_regs):
83 | ances = torch.cat([x] + [pose[i] for i in ancestor_idx], dim=1)
84 | pose.append(reg(ances))
85 |
86 | pred_pose = torch.cat(pose, dim=1)
87 |
88 | if return_shape_cam:
89 | return pred_shape, pred_cam
90 | output_regress = self.get_output(pred_pose, pred_shape, pred_cam, J_regressor)
91 |
92 | return output_regress
93 |
94 | def get_output(self, pred_pose, pred_shape, pred_cam, J_regressor):
95 | output = {}
96 |
97 | nt = pred_pose.shape[0]
98 | pred_rotmat = rot6d_to_rotmat(pred_pose).reshape(nt, -1, 3, 3)
99 |
100 | pred_output = self.smpl(
101 | betas=pred_shape,
102 | body_pose=pred_rotmat[:, 1:],
103 | global_orient=pred_rotmat[:, 0].unsqueeze(1),
104 | pose2rot=False
105 | )
106 |
107 | pred_vertices = pred_output.vertices[:nt]
108 | pred_joints = pred_output.joints[:nt]
109 |
110 | if J_regressor is not None:
111 | J_regressor_batch = J_regressor[None, :].expand(pred_vertices.shape[0], -1, -1).to(pred_vertices.device)
112 | pred_joints = torch.matmul(J_regressor_batch, pred_vertices)
113 |
114 | pred_keypoints_2d = projection(pred_joints, pred_cam)
115 |
116 | pose = rotation_matrix_to_angle_axis(pred_rotmat.reshape(-1, 3, 3)).reshape(nt, -1)
117 |
118 | output['theta'] = torch.cat([pred_cam, pose, pred_shape], dim=1)
119 | output['verts'] = pred_vertices
120 | output['kp_2d'] = pred_keypoints_2d
121 | output['kp_3d'] = pred_joints
122 | output['rotmat'] = pred_rotmat
123 |
124 | return output
125 |
--------------------------------------------------------------------------------
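
*Editorial note on KTD:* `ANCESTOR_INDEX` encodes the SMPL kinematic tree: entry `j` lists the ancestors of joint `j`, and the regressor for joint `j` is fed the shared feature concatenated with the 6D rotations already predicted for those ancestors, so predictions propagate from the root joint outwards. A minimal standalone sketch of that dependency pattern follows (toy dimensions, only the first few joints of the tree, plain linear layers; illustrative only).

```
import torch
import torch.nn as nn

ANCESTORS = [[], [0], [0], [0], [0, 1]]   # first 5 entries of the SMPL tree, for brevity
hidden, rot_dim = 16, 6

regs = nn.ModuleList(
    nn.Linear(hidden + rot_dim * len(anc), rot_dim) for anc in ANCESTORS
)

x = torch.randn(2, hidden)                 # per-frame feature
pose = []
for anc, reg in zip(ANCESTORS, regs):
    # condition each joint on the feature plus its ancestors' predictions
    inp = torch.cat([x] + [pose[i] for i in anc], dim=1)
    pose.append(reg(inp))

print(torch.cat(pose, dim=1).shape)        # (2, 5 * 6)
```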
/lib/models/maed.py:
--------------------------------------------------------------------------------
1 | import torch.nn as nn
2 |
3 | from torchvision.models import resnet50
4 | from lib.utils.utils import determine_output_feature_dim
5 | from lib.models.ktd import KTD
6 | from lib.models.spin import Regressor as Iterative
7 | from lib.models.vision_transformer import vit_custom_resnet50_224_in21k
8 |
9 | class MAED(nn.Module):
10 | def __init__(self,
11 | encoder='ste', num_blocks=6, num_heads=12, st_mode='parallel',
12 | decoder='ktd', hidden_dim=1024,
13 | **kwargs):
14 | super(MAED, self).__init__()
15 |
16 | self._init_encoder(encoder, num_blocks, num_heads, st_mode, **kwargs)
17 | self._init_decoder(decoder, hidden_dim, **kwargs)
18 |
19 |
20 | def _init_decoder(self, decoder, hidden_dim=1024, **kwargs):
21 | _, feat_dim = determine_output_feature_dim(inp_size=(1, 3, 224, 224), model=self.encoder)
22 |
23 | self.decoder_type = decoder
24 | if decoder.lower() == 'ktd':
25 | self.decoder = KTD(feat_dim=feat_dim, hidden_dim=hidden_dim, **kwargs)
26 | elif decoder.lower() == 'iterative':
27 | self.decoder = Iterative(feat_dim=feat_dim, hidden_dim=hidden_dim, **kwargs)
28 | else:
29 | raise NotImplementedError(decoder)
30 |
31 |
32 | def _init_encoder(self, encoder, num_blocks, num_heads, st_mode, **kwargs):
33 |
34 | self.encoder_type = encoder
35 | if encoder.lower() == 'cnn':
36 | self.encoder = resnet50(pretrained=True)
37 | self.encoder.fc = nn.Identity()
38 | elif encoder.lower() == 'ste':
39 | self.encoder = vit_custom_resnet50_224_in21k(num_blocks, num_heads, st_mode, num_classes=-1)
40 | else:
41 | raise NotImplementedError(encoder)
42 |
43 | def extract_feature(self, x):
44 |
45 | batch_size, seqlen = x.shape[:2]
46 |
47 | x = x.reshape(-1, x.shape[-3], x.shape[-2], x.shape[-1]) # (N,T,3,H,W) -> (NT,3,H,W)
48 | xf = self.encoder(x)
49 | xf = xf.reshape(batch_size, seqlen, -1)
50 | return xf
51 |
52 | def forward(self, x, J_regressor=None, **kwargs):
53 | batch_size, seqlen = x.shape[:2]
54 |
55 | x = x.reshape(-1, x.shape[-3], x.shape[-2], x.shape[-1]) # (N,T,3,H,W) -> (NT,3,H,W)
56 |
57 | xf = self.encoder(x, seqlen=seqlen) if self.encoder_type == 'ste' else self.encoder(x) #(NT, 2048, 7, 7)
58 |
59 | output = self.decoder(xf, seqlen=seqlen, J_regressor=J_regressor, **kwargs)
60 |
61 | output['theta'] = output['theta'].reshape(batch_size, seqlen, -1)
62 | output['verts'] = output['verts'].reshape(batch_size, seqlen, -1, 3)
63 | output['kp_2d'] = output['kp_2d'].reshape(batch_size, seqlen, -1, 2)
64 | output['kp_3d'] = output['kp_3d'].reshape(batch_size, seqlen, -1, 3)
65 | output['rotmat'] = output['rotmat'].reshape(batch_size, seqlen, -1, 3, 3)
66 |
67 | return output
--------------------------------------------------------------------------------
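
*Editorial note on MAED's shape handling:* `MAED.forward` flattens the clip from `(N, T, 3, H, W)` to `(N*T, 3, H, W)` before the encoder, decodes per frame, and reshapes every prediction back to `(N, T, ...)` at the end. A shape-only sketch of that round trip, with random tensors standing in for the encoder and decoder outputs (illustrative only):

```
import torch

N, T, H, W = 2, 8, 224, 224
clip = torch.randn(N, T, 3, H, W)

frames = clip.reshape(-1, 3, H, W)          # (N*T, 3, H, W) fed to the encoder
feats = torch.randn(frames.shape[0], 2048)  # stand-in for encoder output
theta = torch.randn(frames.shape[0], 85)    # stand-in for decoder output (cam + pose + shape)

theta = theta.reshape(N, T, -1)             # back to the per-clip layout
print(frames.shape, theta.shape)            # torch.Size([16, 3, 224, 224]) torch.Size([2, 8, 85])
```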
/lib/models/ops/__init__.py:
--------------------------------------------------------------------------------
1 | from .drop import DropPath
--------------------------------------------------------------------------------
/lib/models/ops/drop.py:
--------------------------------------------------------------------------------
1 | """ DropBlock, DropPath
2 | PyTorch implementations of DropBlock and DropPath (Stochastic Depth) regularization layers.
3 | Papers:
4 | DropBlock: A regularization method for convolutional networks (https://arxiv.org/abs/1810.12890)
5 | Deep Networks with Stochastic Depth (https://arxiv.org/abs/1603.09382)
6 | Code:
7 | DropBlock impl inspired by two Tensorflow impl that I liked:
8 | - https://github.com/tensorflow/tpu/blob/master/models/official/resnet/resnet_model.py#L74
9 | - https://github.com/clovaai/assembled-cnn/blob/master/nets/blocks.py
10 | Hacked together by / Copyright 2020 Ross Wightman
11 | """
12 |
13 | import torch
14 | import torch.nn as nn
15 | import torch.nn.functional as F
16 |
17 | def drop_path(x, drop_prob: float = 0., training: bool = False):
18 | """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
19 | This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
20 | the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
21 | See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
22 | changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
23 | 'survival rate' as the argument.
24 | """
25 | if drop_prob == 0. or not training:
26 | return x
27 | keep_prob = 1 - drop_prob
28 | shape = (x.shape[0],) + (1,) * (x.ndim - 1) # work with diff dim tensors, not just 2D ConvNets
29 | random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
30 | random_tensor.floor_() # binarize
31 | output = x.div(keep_prob) * random_tensor
32 | return output
33 |
34 |
35 | class DropPath(nn.Module):
36 | """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
37 | """
38 | def __init__(self, drop_prob=None):
39 | super(DropPath, self).__init__()
40 | self.drop_prob = drop_prob
41 |
42 | def forward(self, x):
43 | return drop_path(x, self.drop_prob, self.training)
--------------------------------------------------------------------------------
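
*Editorial note on DropPath:* `drop_path` zeroes an entire residual branch for a random subset of samples with probability `drop_prob` and rescales the survivors by `1/keep_prob`, so the expected activation is unchanged. A quick numeric check of that behaviour with a standalone re-implementation of the same arithmetic (illustrative only):

```
import torch

def drop_path_ref(x, drop_prob=0.2):
    # same arithmetic as drop_path() above, restricted to training mode
    keep_prob = 1 - drop_prob
    mask = (keep_prob + torch.rand(x.shape[0], 1)).floor()  # 1 with prob keep_prob, else 0
    return x.div(keep_prob) * mask

x = torch.ones(100000, 4)
y = drop_path_ref(x)
print(round(y.mean().item(), 3))                             # ~1.0: expectation preserved
print(round((y.sum(dim=1) == 0).float().mean().item(), 3))   # ~0.2: fraction of dropped samples
```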
/lib/models/smpl.py:
--------------------------------------------------------------------------------
1 | """
2 | This script is adapted from https://github.com/nkolot/SPIN.
3 | Please adhere to their licence when using it.
4 | """
5 |
6 | import torch
7 | import numpy as np
8 | import os.path as osp
9 | from smplx import SMPL as _SMPL
10 | from smplx.body_models import ModelOutput
11 | from smplx.lbs import vertices2joints
12 |
13 | from lib.core.config import DATA_DIR
14 |
15 | # Map joints to SMPL joints
16 | JOINT_MAP = {
17 | 'OP Nose': 24, 'OP Neck': 12, 'OP RShoulder': 17,
18 | 'OP RElbow': 19, 'OP RWrist': 21, 'OP LShoulder': 16,
19 | 'OP LElbow': 18, 'OP LWrist': 20, 'OP MidHip': 0,
20 | 'OP RHip': 2, 'OP RKnee': 5, 'OP RAnkle': 8,
21 | 'OP LHip': 1, 'OP LKnee': 4, 'OP LAnkle': 7,
22 | 'OP REye': 25, 'OP LEye': 26, 'OP REar': 27,
23 | 'OP LEar': 28, 'OP LBigToe': 29, 'OP LSmallToe': 30,
24 | 'OP LHeel': 31, 'OP RBigToe': 32, 'OP RSmallToe': 33, 'OP RHeel': 34,
25 | 'Right Ankle': 8, 'Right Knee': 5, 'Right Hip': 45,
26 | 'Left Hip': 46, 'Left Knee': 4, 'Left Ankle': 7,
27 | 'Right Wrist': 21, 'Right Elbow': 19, 'Right Shoulder': 17,
28 | 'Left Shoulder': 16, 'Left Elbow': 18, 'Left Wrist': 20,
29 | 'Neck (LSP)': 47, 'Top of Head (LSP)': 48,
30 | 'Pelvis (MPII)': 49, 'Thorax (MPII)': 50,
31 | 'Spine (H36M)': 51, 'Jaw (H36M)': 52,
32 | 'Head (H36M)': 53, 'Nose': 24, 'Left Eye': 26,
33 | 'Right Eye': 25, 'Left Ear': 28, 'Right Ear': 27
34 | }
35 | JOINT_NAMES = [
36 | 'OP Nose', 'OP Neck', 'OP RShoulder',
37 | 'OP RElbow', 'OP RWrist', 'OP LShoulder',
38 | 'OP LElbow', 'OP LWrist', 'OP MidHip',
39 | 'OP RHip', 'OP RKnee', 'OP RAnkle',
40 | 'OP LHip', 'OP LKnee', 'OP LAnkle',
41 | 'OP REye', 'OP LEye', 'OP REar',
42 | 'OP LEar', 'OP LBigToe', 'OP LSmallToe',
43 | 'OP LHeel', 'OP RBigToe', 'OP RSmallToe', 'OP RHeel',
44 | 'Right Ankle', 'Right Knee', 'Right Hip',
45 | 'Left Hip', 'Left Knee', 'Left Ankle',
46 | 'Right Wrist', 'Right Elbow', 'Right Shoulder',
47 | 'Left Shoulder', 'Left Elbow', 'Left Wrist',
48 | 'Neck (LSP)', 'Top of Head (LSP)',
49 | 'Pelvis (MPII)', 'Thorax (MPII)',
50 | 'Spine (H36M)', 'Jaw (H36M)',
51 | 'Head (H36M)', 'Nose', 'Left Eye',
52 | 'Right Eye', 'Left Ear', 'Right Ear'
53 | ]
54 |
55 | JOINT_IDS = {JOINT_NAMES[i]: i for i in range(len(JOINT_NAMES))}
56 | JOINT_REGRESSOR_TRAIN_EXTRA = osp.join(DATA_DIR, 'J_regressor_extra.npy')
57 | SMPL_MEAN_PARAMS = osp.join(DATA_DIR, 'smpl_mean_params.npz')
58 | SMPL_MODEL_DIR = DATA_DIR
59 | # H36M_TO_J17 = [6, 5, 4, 1, 2, 3, 16, 15, 14, 11, 12, 13, 8, 10, 0, 7, 9]
60 | # rankle,rknee,rhip,lhip,lknee,lankle,rwrist,relbow,rshoulder,lshoulder,lelbow,lwrist,neck,headtop,hip,Spine,Head
61 |
62 | H36M_TO_J17 = [6, 5, 4, 1, 2, 3, 16, 15, 14, 11, 12, 13, 8, 0, 7, 9, 10]
63 | H36M_TO_J14 = [6, 5, 4, 1, 2, 3, 16, 15, 14, 11, 12, 13, 8, 10]
64 | H36M_TO_MPII3D = [6, 5, 4, 1, 2, 3, 16, 15, 14, 11, 12, 13, 8, 10, 0, 7, 9]
65 |
66 | OP_TO_J14 = [11, 10, 9, 12, 13, 14, 4, 3, 2, 5, 6, 7, 1, -1]
67 | J49_TO_J14 = list(range(25, 39))
68 | J49_TO_MPII3D = list(range(25, 39)) + [39, 41, 43]
69 | J49_TO_H36M = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 39, 41, 42, 43]
70 | # rankle,rknee,rhip,lhip,lknee,lankle,rwrist,relbow,rshoulder,lshoulder,lelbow,lwrist,neck,hip,Spine,Jaw,Head
71 |
72 | REGRESSOR_DICT = {
73 | '3dpw': 'J_regressor_h36m.npy',
74 | 'mpii3d': None,
75 | 'h36m': 'J_regressor_h36m.npy'
76 | }
77 | JID_DICT = {
78 | '3dpw': H36M_TO_J14,
79 | 'h36m': H36M_TO_J17,
80 | 'mpii3d': J49_TO_MPII3D
81 | }
82 |
83 |
84 | class SMPL(_SMPL):
85 | """ Extension of the official SMPL implementation to support more joints """
86 |
87 | def __init__(self, *args, **kwargs):
88 | super(SMPL, self).__init__(*args, **kwargs)
89 | joints = [JOINT_MAP[i] for i in JOINT_NAMES]
90 | J_regressor_extra = np.load(JOINT_REGRESSOR_TRAIN_EXTRA)
91 | self.register_buffer('J_regressor_extra', torch.tensor(J_regressor_extra, dtype=torch.float32))
92 | self.joint_map = torch.tensor(joints, dtype=torch.long)
93 |
94 | def forward(self, *args, **kwargs):
95 | kwargs['get_skin'] = True
96 | smpl_output = super(SMPL, self).forward(*args, **kwargs)
97 | extra_joints = vertices2joints(self.J_regressor_extra, smpl_output.vertices)
98 | joints = torch.cat([smpl_output.joints, extra_joints], dim=1)
99 | joints = joints[:, self.joint_map, :]
100 | output = ModelOutput(vertices=smpl_output.vertices,
101 | global_orient=smpl_output.global_orient,
102 | body_pose=smpl_output.body_pose,
103 | joints=joints,
104 | betas=smpl_output.betas,
105 | full_pose=smpl_output.full_pose)
106 | return output
107 |
108 |
109 | def get_smpl_faces():
110 | smpl = SMPL(SMPL_MODEL_DIR, batch_size=1, create_transl=False)
111 | return smpl.faces
112 |
--------------------------------------------------------------------------------
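
*Editorial note on the joint index lists:* the extended `SMPL` above outputs 49 joints (the 25 OpenPose joints followed by the 24 extra names), and lists such as `J49_TO_J14` or `H36M_TO_J17` select the evaluation subset for each benchmark, with `JID_DICT`/`REGRESSOR_DICT` recording which subset and which external regressor to use per dataset. A tiny standalone sketch of how such an index list is applied, with a random array standing in for the model's joint output (illustrative only):

```
import numpy as np

J49_TO_J14 = list(range(25, 39))   # same slice as above: the 14 common joints

joints49 = np.random.randn(49, 3)  # stand-in for the 49-joint output of one frame
joints14 = joints49[J49_TO_J14]    # subset used for 3DPW-style evaluation
print(joints14.shape)              # (14, 3)
```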
/lib/models/spin.py:
--------------------------------------------------------------------------------
1 | """
2 | This script is adapted from https://github.com/nkolot/SPIN.
3 | Please adhere to their licence when using it.
4 | """
5 |
6 | import math
7 | import torch
8 | import numpy as np
9 | import os.path as osp
10 | import torch.nn as nn
11 |
12 | from lib.core.config import DATA_DIR
13 | from lib.utils.geometry import rotation_matrix_to_angle_axis, rot6d_to_rotmat
14 | from lib.models.smpl import SMPL, SMPL_MODEL_DIR, H36M_TO_J17, SMPL_MEAN_PARAMS
15 |
16 |
17 | class Regressor(nn.Module):
18 | def __init__(self, smpl_mean_params=SMPL_MEAN_PARAMS, feat_dim=2048, hidden_dim=1024, **kwargs):
19 | super(Regressor, self).__init__()
20 |
21 | self.smpl = SMPL(
22 | SMPL_MODEL_DIR,
23 | create_transl=False,
24 | create_global_orient=False,
25 | create_body_pose=False,
26 | create_betas=False,
27 | )
28 | npose = 24 * 6
29 | nshape = 10
30 |
31 | self.fc1 = nn.Linear(feat_dim + npose + nshape + 3, hidden_dim)
32 | self.drop1 = nn.Dropout()
33 | self.fc2 = nn.Linear(hidden_dim, hidden_dim)
34 | self.drop2 = nn.Dropout()
35 | self.decpose = nn.Linear(hidden_dim, npose)
36 | self.decshape = nn.Linear(hidden_dim, nshape)
37 | self.deccam = nn.Linear(hidden_dim, 3)
38 | nn.init.xavier_uniform_(self.decpose.weight, gain=0.01)
39 | nn.init.xavier_uniform_(self.decshape.weight, gain=0.01)
40 | nn.init.xavier_uniform_(self.deccam.weight, gain=0.01)
41 |
42 | mean_params = np.load(smpl_mean_params)
43 | init_pose = torch.from_numpy(mean_params['pose'][:]).unsqueeze(0)
44 | init_shape = torch.from_numpy(mean_params['shape'][:].astype('float32')).unsqueeze(0)
45 | init_cam = torch.from_numpy(mean_params['cam']).unsqueeze(0)
46 | self.register_buffer('init_pose', init_pose)
47 | self.register_buffer('init_shape', init_shape)
48 | self.register_buffer('init_cam', init_cam)
49 |
50 |
51 | def iterative_regress(self, x, init_pose=None, init_shape=None, init_cam=None, n_iter=3):
52 | nt = x.shape[0]
53 |
54 | if init_pose is None:
55 | init_pose = self.init_pose.expand(nt, -1)
56 | if init_shape is None:
57 | init_shape = self.init_shape.expand(nt, -1)
58 | if init_cam is None:
59 | init_cam = self.init_cam.expand(nt, -1)
60 |
61 | pred_pose = init_pose
62 | pred_shape = init_shape
63 | pred_cam = init_cam
64 | for i in range(n_iter):
65 | xc = torch.cat([x, pred_pose, pred_shape, pred_cam], 1)
66 | xc = self.fc1(xc)
67 | xc = self.drop1(xc)
68 | xc = self.fc2(xc)
69 | xc = self.drop2(xc)
70 | pred_pose = self.decpose(xc) + pred_pose
71 | pred_shape = self.decshape(xc) + pred_shape
72 | pred_cam = self.deccam(xc) + pred_cam
73 |
74 | return pred_pose, pred_shape, pred_cam
75 |
76 | def forward(self, x, seqlen, J_regressor=None,
77 | init_pose=None, init_shape=None, init_cam=None, n_iter=3, **kwargs):
78 | nt = x.shape[0]
79 | N = nt//seqlen
80 |
81 |         pred_pose, pred_shape, pred_cam = self.iterative_regress(x, init_pose, init_shape, init_cam, n_iter=n_iter)
82 | output_regress = self.get_output(pred_pose, pred_shape, pred_cam, J_regressor)
83 |
84 | return output_regress
85 |
86 |
87 | def get_output(self, pred_pose, pred_shape, pred_cam, J_regressor):
88 | output = {}
89 | nt = pred_pose.shape[0]
90 | pred_rotmat = rot6d_to_rotmat(pred_pose).reshape(nt, -1, 3, 3)
91 |
92 | pred_output = self.smpl(
93 | betas=pred_shape,
94 | body_pose=pred_rotmat[:, 1:],
95 | global_orient=pred_rotmat[:, 0].unsqueeze(1),
96 | pose2rot=False
97 | )
98 | pred_vertices = pred_output.vertices[:nt]
99 | pred_joints = pred_output.joints[:nt]
100 | if J_regressor is not None:
101 | J_regressor_batch = J_regressor[None, :].expand(pred_vertices.shape[0], -1, -1).to(pred_vertices.device)
102 | pred_joints = torch.matmul(J_regressor_batch, pred_vertices)
103 | pred_keypoints_2d = projection(pred_joints, pred_cam)
104 | pose = rotation_matrix_to_angle_axis(pred_rotmat.reshape(-1, 3, 3)).reshape(nt, -1)
105 | output['theta'] = torch.cat([pred_cam, pose, pred_shape], dim=1)
106 | output['verts'] = pred_vertices
107 | output['kp_2d'] = pred_keypoints_2d
108 | output['kp_3d'] = pred_joints
109 | output['rotmat'] = pred_rotmat
110 | return output
111 |
112 |
113 | def projection(pred_joints, pred_camera):
114 | pred_cam_t = torch.stack([pred_camera[:, 1],
115 | pred_camera[:, 2],
116 | 2 * 5000. / (224. * pred_camera[:, 0] + 1e-9)], dim=-1)
117 | batch_size = pred_joints.shape[0]
118 | camera_center = torch.zeros(batch_size, 2)
119 | pred_keypoints_2d = perspective_projection(pred_joints,
120 | rotation=torch.eye(3).unsqueeze(0).expand(batch_size, -1, -1).to(pred_joints.device),
121 | translation=pred_cam_t,
122 | focal_length=5000.,
123 | camera_center=camera_center)
124 | # Normalize keypoints to [-1,1]
125 | pred_keypoints_2d = pred_keypoints_2d / (224. / 2.)
126 | return pred_keypoints_2d
127 |
128 |
129 | def perspective_projection(points, rotation, translation,
130 | focal_length, camera_center):
131 | """
132 | This function computes the perspective projection of a set of points.
133 | Input:
134 | points (bs, N, 3): 3D points
135 | rotation (bs, 3, 3): Camera rotation
136 | translation (bs, 3): Camera translation
137 | focal_length (bs,) or scalar: Focal length
138 | camera_center (bs, 2): Camera center
139 | """
140 | batch_size = points.shape[0]
141 | K = torch.zeros([batch_size, 3, 3], device=points.device)
142 | K[:,0,0] = focal_length
143 | K[:,1,1] = focal_length
144 | K[:,2,2] = 1.
145 | K[:,:-1, -1] = camera_center
146 |
147 | # Transform points
148 | points = torch.einsum('bij,bkj->bki', rotation, points)
149 | points = points + translation.unsqueeze(1)
150 |
151 | # Apply perspective distortion
152 | projected_points = points / points[:,:,-1].unsqueeze(-1)
153 |
154 | # Apply camera intrinsics
155 | projected_points = torch.einsum('bij,bkj->bki', K, projected_points)
156 |
157 | return projected_points[:, :, :-1]
158 |
--------------------------------------------------------------------------------
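
*Editorial note on the camera model:* `projection` converts the predicted weak-perspective camera `(s, t_x, t_y)` into a full-perspective one by placing the body at depth `t_z = 2*5000/(224*s)` with focal length 5000 and the principal point at the crop centre, then rescales to the normalized `[-1, 1]` range. Below is a condensed standalone sketch of the same arithmetic with an identity camera rotation (toy points, illustrative only):

```
import torch

def project(points, cam, focal=5000., img_size=224.):
    # weak-perspective (s, tx, ty) -> normalized 2D keypoints, identity rotation
    s, tx, ty = cam[:, 0], cam[:, 1], cam[:, 2]
    tz = 2 * focal / (img_size * s + 1e-9)
    pts = points + torch.stack([tx, ty, tz], dim=-1).unsqueeze(1)
    uv = focal * pts[..., :2] / pts[..., 2:3]   # pinhole projection, principal point at 0
    return uv / (img_size / 2.)

pts = torch.tensor([[[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]]])  # two 3D points of one sample
cam = torch.tensor([[1.0, 0.0, 0.0]])                     # s = 1, no in-plane translation
print(project(pts, cam))   # [[~0, 0], [~0.1, 0]]: x maps (almost) directly to s*x
```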
/lib/models/tokenpose.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import numpy as np
4 | # , HybridEmbed, PatchEmbed, Block
5 | from lib.models.vision_transformer import VisionTransformer, trunc_normal_, Block
6 | from lib.utils.geometry import rot6d_to_rotmat, rotation_matrix_to_angle_axis
7 | from lib.models.smpl import SMPL, SMPL_MODEL_DIR, SMPL_MEAN_PARAMS
8 | from lib.models.spin import projection
9 |
10 |
11 | class TokenPoseRot6d(VisionTransformer):
12 | def __init__(self, img_size=224, joints_num=24, pred_rot_dim=6, patch_size=16, in_chans=3, num_classes=1000, embed_dim=768, depth=12,
13 | num_heads=12, mlp_ratio=4, qkv_bias=False, qk_scale=None, representation_size=None,
14 | drop_rate=0, attn_drop_rate=0, drop_path_rate=0, hybrid_backbone=None,
15 | token_init_mode='normal', proj_rot_mode='linear', use_joint2d_head=False,
16 | norm_layer=nn.LayerNorm, st_mode='vanilla', contraint_token_delta=False, seq_length=16,
17 | use_rot6d_to_token_head=False, mask_ratio=0.,
18 | temporal_layers=3, temporal_num_heads=1,
19 | enable_temp_modeling=True, enable_temp_embedding=False):
20 |
21 | super().__init__(img_size=img_size, patch_size=patch_size, in_chans=in_chans,
22 | num_classes=num_classes, embed_dim=embed_dim, depth=depth, num_heads=num_heads,
23 | mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
24 | representation_size=representation_size, drop_rate=drop_rate,
25 | attn_drop_rate=attn_drop_rate, drop_path_rate=drop_path_rate,
26 | hybrid_backbone=hybrid_backbone, norm_layer=norm_layer, st_mode=st_mode)
27 |
28 | # joints tokens
29 | self.joint3d_tokens = nn.Parameter(torch.zeros(1, joints_num, embed_dim))
30 | self.shape_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
31 | self.cam_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
32 |
33 | self.joints_num = joints_num
34 |
35 | self._init_tokens(mode=token_init_mode)
36 |
37 | self.return_tokens = contraint_token_delta
38 |
39 | self.joint3d_head = nn.Linear(embed_dim, pred_rot_dim)
40 | self.shape_head = nn.Linear(embed_dim, 10)
41 | self.cam_head = nn.Linear(embed_dim, 3)
42 |
43 | self.apply(self._init_weights)
44 |
45 | self.smpl = SMPL(
46 | SMPL_MODEL_DIR,
47 | create_transl=False,
48 | create_global_orient=False,
49 | create_body_pose=False,
50 | create_betas=False,
51 | )
52 |
53 | self.enable_temp_modeling = False
54 | self.reconstruct = False
55 |
56 | if enable_temp_modeling:
57 | self.enable_temp_modeling = enable_temp_modeling
58 | # stochastic depth decay rule
59 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)]
60 | self.temporal_transformer = nn.ModuleList([
61 | Block(dim=embed_dim, num_heads=temporal_num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
62 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer,
63 | st_mode='vanilla') for i in range(temporal_layers)])
64 |
65 | self.enable_pos_embedding = False
66 | if enable_temp_embedding:
67 | self.enable_pos_embedding = True
68 | self.temporal_pos_embedding = nn.Parameter(torch.zeros(1, seq_length, embed_dim))
69 | trunc_normal_(self.temporal_pos_embedding, std=.02)
70 |
71 | self.mask_ratio = mask_ratio
72 |
73 | if mask_ratio > 0.:
74 | self.reconstruct = True
75 | self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
76 |
77 | del self.head, self.pre_logits
78 |
79 | def random_masking(self, x, mask_ratio):
80 | """
81 | Perform per-sample random masking by per-sample shuffling.
82 | Per-sample shuffling is done by argsort random noise.
83 | x: [N, L, D], sequence
84 | """
85 | N, L, D = x.shape # batch, length, dim
86 | len_keep = int(L * (1 - mask_ratio))
87 |
88 | noise = torch.rand(N, L, device=x.device) # noise in [0, 1]
89 |
90 | # sort noise for each sample
91 | # ascend: small is keep, large is remove
92 | ids_shuffle = torch.argsort(noise, dim=1)
93 | ids_restore = torch.argsort(ids_shuffle, dim=1)
94 |
95 | # keep the first subset
96 | # ids_keep = ids_shuffle[:, :len_keep]
97 | # x_keep = torch.gather(x, dim=1, index=ids_keep.unsqueeze(-1).repeat(1, 1, D))
98 |
99 | # generate the binary mask: 0 is keep, 1 is remove
100 | mask = torch.ones([N, L], device=x.device)
101 | mask[:, :len_keep] = 0
102 | # unshuffle to get the binary mask
103 | mask = torch.gather(mask, dim=1, index=ids_restore)
104 |
105 |         x[mask.bool()] = self.mask_token.reshape(-1)  # place the learnable mask token at every masked position
106 | x_masked = x
107 |
108 | return x_masked
109 |
110 | def _init_tokens(self, mode='normal'):
111 | if mode == 'normal':
112 | trunc_normal_(self.joint3d_tokens, std=.02)
113 | trunc_normal_(self.shape_token, std=.02)
114 | trunc_normal_(self.cam_token, std=.02)
115 | else:
116 | print("zero initialize tokens")
117 | pass
118 |
119 | def forward_features(self, x, seqlen=1):
120 | B = x.shape[0] # (NT, 3, H, W)
121 |
122 | x = self.patch_embed(x) # (NT, 14*14, 2048) (bs, seq, embedding_size)
123 |
124 | joint3d_tokens = self.joint3d_tokens.expand(B, -1, -1)
125 | shape_token = self.shape_token.expand(B, -1, -1)
126 | cam_token = self.cam_token.expand(B, -1, -1)
127 |
128 | cls_token = self.cls_token.expand(B, -1, -1)
129 | x = torch.cat([cls_token, x], dim=1)
130 | x = x + self.pos_embed[:, :, :]
131 |
132 | x = torch.cat([joint3d_tokens, shape_token, cam_token, x], dim=1)
133 | # [NT, HW+24+1+1, embedding_size]
134 |
135 | x = self.pos_drop(x)
136 |
137 | for blk in self.blocks:
138 | x = blk(x, seqlen)
139 |
140 | x = self.norm(x)
141 | joint3d_tokens = x[:, :self.joints_num]
142 | shape_token = x[:, self.joints_num]
143 | cam_token = x[:, self.joints_num + 1]
144 |
145 | return joint3d_tokens, shape_token, cam_token
146 |
147 | def temporal_modeling(self, joint3d_tokens, seq_length, mask_ratio=0.):
148 | # joint3d_tokens [B, N, C]
149 | B, N, C = joint3d_tokens.shape
150 | joint3d_tokens = joint3d_tokens.reshape(-1,seq_length, N, C).permute(0, 2, 1, 3)
151 | joint3d_tokens_temporal = joint3d_tokens.reshape(-1, seq_length, C)
152 |
153 | # [bs*N, seq_length, C]
154 | if self.enable_pos_embedding:
155 |             if self.temporal_pos_embedding.shape[1] != seq_length:
156 |
157 | temporal_pos_embedding = torch.nn.functional.interpolate(
158 | self.temporal_pos_embedding.data.permute(0,2,1),
159 | size=seq_length,
160 | mode='linear'
161 | ).permute(0, 2, 1)
162 | self.temporal_pos_embedding = torch.nn.Parameter(temporal_pos_embedding)
163 | joint3d_tokens_temporal += self.temporal_pos_embedding[:, :seq_length, :]
164 |
165 | if self.training and mask_ratio > 0.:
166 | joint3d_tokens_temporal_masked = self.random_masking(
167 | joint3d_tokens_temporal, mask_ratio)
168 | else:
169 | joint3d_tokens_temporal_masked = joint3d_tokens_temporal
170 |
171 | for blk in self.temporal_transformer:
172 | joint3d_tokens_temporal_masked = blk(joint3d_tokens_temporal_masked)
173 |
174 | pred_joint3d_tokens_temporal = joint3d_tokens_temporal_masked.reshape(
175 | -1, N, seq_length, C).permute(0, 2, 1, 3)
176 | pred_joint3d_tokens_temporal = pred_joint3d_tokens_temporal.reshape(B, N, C)
177 | return pred_joint3d_tokens_temporal
178 |
179 | def forward(self, x, J_regressor=None, **kwargs):
180 |
181 | batch_size, seqlen = x.shape[:2]
182 | x = x.reshape(-1, x.shape[-3], x.shape[-2], x.shape[-1]) # (NT, 3, H, W)
183 |
184 | joint3d_tokens, shape_token, cam_token = self.forward_features(x, seqlen)
185 |
186 | if self.enable_temp_modeling:
187 | joint3d_tokens_before = joint3d_tokens.clone().detach_()
188 | joint3d_tokens = self.temporal_modeling(
189 | joint3d_tokens, seqlen, mask_ratio=self.mask_ratio)
190 |
191 | # # [bs*seq_length, N, embed_dim]
192 | pred_joints_rot6d = self.joint3d_head(joint3d_tokens) # [b, 24, 6]
193 | pred_shape = self.shape_head(shape_token)
194 | pred_cam = self.cam_head(cam_token)
195 |
196 | output = {}
197 |
198 | # mse loss
199 | if self.reconstruct and self.training:
200 | reconstruct_loss = (joint3d_tokens - joint3d_tokens_before)**2
201 | reconstruct_loss = reconstruct_loss.mean()
202 | output['reconstruct_loss'] = reconstruct_loss
203 |
204 | nt = pred_joints_rot6d.shape[0]
205 | pred_rotmat = rot6d_to_rotmat(pred_joints_rot6d).reshape(nt, -1, 3, 3)
206 |
207 | pred_output = self.smpl(
208 | betas=pred_shape,
209 | body_pose=pred_rotmat[:, 1:],
210 | global_orient=pred_rotmat[:, 0].unsqueeze(1),
211 | pose2rot=False
212 | )
213 |
214 | pred_vertices = pred_output.vertices[:nt]
215 | pred_joints = pred_output.joints[:nt]
216 |
217 | if J_regressor is not None:
218 | J_regressor_batch = J_regressor[None, :].expand(
219 | pred_vertices.shape[0], -1, -1).to(pred_vertices.device)
220 | pred_joints = torch.matmul(J_regressor_batch, pred_vertices)
221 |
222 | pred_keypoints_2d = projection(pred_joints, pred_cam)
223 |
224 | pose = rotation_matrix_to_angle_axis(
225 | pred_rotmat.reshape(-1, 3, 3)).reshape(nt, -1)
226 |
227 | output['theta'] = torch.cat([pred_cam, pose, pred_shape], dim=1)
228 | output['verts'] = pred_vertices
229 | output['kp_2d'] = pred_keypoints_2d
230 | output['kp_3d'] = pred_joints
231 | output['rotmat'] = pred_rotmat
232 |
233 | output['theta'] = output['theta'].reshape(batch_size, seqlen, -1)
234 | output['verts'] = output['verts'].reshape(batch_size, seqlen, -1, 3)
235 | output['kp_2d'] = output['kp_2d'].reshape(batch_size, seqlen, -1, 2)
236 | output['kp_3d'] = output['kp_3d'].reshape(batch_size, seqlen, -1, 3)
237 | output['rotmat'] = output['rotmat'].reshape(batch_size, seqlen, -1, 3, 3)
238 |
239 | return output
240 |
241 |
242 | from lib.models.resnetv2 import ResNetV2
243 | import torch.utils.model_zoo as model_zoo
244 | from lib.models.vision_transformer import _conv_filter, model_urls
245 | from functools import partial
246 |
247 |
248 | def Token3d(num_blocks, num_heads, st_mode, pretrained=True, proj_rot_mode='linear',
249 | use_joint2d_head=False, contraint_token_delta=False,
250 | use_rot6d_to_token_head=False, mask_ratio=0.,
251 | temporal_layers=3, temporal_num_heads=1,
252 | enable_temp_modeling=True, enable_temp_embedding=False,
253 | **kwargs):
254 |     """ Hybrid model with an R50 backbone and a ViT with a custom number of blocks and heads.
255 | """
256 | # create a ResNetV2 w/o pre-activation, that uses StdConv and GroupNorm and has 3 stages, no head
257 | backbone = ResNetV2(
258 | layers=(3, 4, 9), num_classes=0, global_pool='', in_chans=kwargs.get('in_chans', 3),
259 | preact=False, stem_type='same')
260 | model = TokenPoseRot6d(
261 | patch_size=16, embed_dim=768, depth=num_blocks, num_heads=num_heads,
262 | hybrid_backbone=backbone, mlp_ratio=4, qkv_bias=True,
263 | representation_size=768, norm_layer=partial(nn.LayerNorm, eps=1e-6),
264 | st_mode=st_mode, proj_rot_mode=proj_rot_mode,
265 | use_joint2d_head=use_joint2d_head,
266 | contraint_token_delta=contraint_token_delta,
267 | use_rot6d_to_token_head=use_rot6d_to_token_head,
268 | mask_ratio=mask_ratio,
269 | temporal_layers=temporal_layers,
270 | temporal_num_heads=temporal_num_heads,
271 | enable_temp_modeling=enable_temp_modeling,
272 | enable_temp_embedding=enable_temp_embedding,
273 | **kwargs)
274 | if pretrained:
275 | state_dict = model_zoo.load_url(
276 | model_urls['vit_base_resnet50_224_in21k'], progress=False, map_location='cpu')
277 | state_dict = _conv_filter(state_dict)
278 | del state_dict['head.weight']
279 | del state_dict['head.bias']
280 | model.load_state_dict(state_dict, strict=False)
281 | return model
--------------------------------------------------------------------------------
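
*Editorial note on the token masking:* `random_masking` builds a per-sample binary mask by ranking random noise, keeps the `len_keep` lowest-noise positions, and replaces everything else with the learnable mask token; the masked sequence is later compared against the unmasked one through `reconstruct_loss`. A minimal standalone sketch of the mask construction and application (toy sizes, a constant tensor standing in for the learnable mask token; illustrative only):

```
import torch

N, L, D = 2, 8, 4                      # sequences, tokens per sequence, token dim
x = torch.zeros(N, L, D)
mask_token = torch.full((D,), -1.0)    # stand-in for the learnable mask token
mask_ratio = 0.25
len_keep = int(L * (1 - mask_ratio))

noise = torch.rand(N, L)
ids_shuffle = torch.argsort(noise, dim=1)        # random permutation per sample
ids_restore = torch.argsort(ids_shuffle, dim=1)  # inverse permutation

mask = torch.ones(N, L)
mask[:, :len_keep] = 0                                # 0 = keep, 1 = mask (in shuffled order)
mask = torch.gather(mask, dim=1, index=ids_restore)   # back to the original token order

x[mask.bool()] = mask_token                           # overwrite the masked positions
print(mask.sum(dim=1))                                # 2 tokens masked per sample (25% of 8)
```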
/lib/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yangsenius/INT_HMR_Model/8f7ee91bf8272fce37b571c02e5df49c5cd13b20/lib/utils/__init__.py
--------------------------------------------------------------------------------
/lib/utils/demo_utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | This script is adapted from https://github.com/mkocabas/VIBE.
4 | Please adhere to their licence when using it.
5 | """
6 |
7 | import os
8 | import cv2
9 | import time
10 | import json
11 | import torch
12 | import subprocess
13 | import numpy as np
14 | import os.path as osp
15 | from pytube import YouTube
16 | from collections import OrderedDict
17 |
18 | from lib.utils.smooth_bbox import get_smooth_bbox_params, get_all_bbox_params
19 | from lib.utils.geometry import rotation_matrix_to_angle_axis
20 |
21 |
22 | def download_youtube_clip(url, download_folder):
23 | return YouTube(url).streams.first().download(output_path=download_folder)
24 |
25 |
26 | def trim_videos(filename, start_time, end_time, output_filename):
27 | command = ['ffmpeg',
28 | '-i', '"%s"' % filename,
29 | '-ss', str(start_time),
30 | '-t', str(end_time - start_time),
31 | '-c:v', 'libx264', '-c:a', 'copy',
32 | '-threads', '1',
33 | '-loglevel', 'panic',
34 | '"%s"' % output_filename]
35 | # command = ' '.join(command)
36 | subprocess.call(command)
37 |
38 |
39 | def video_to_images(vid_file, img_folder=None, return_info=False):
40 | if img_folder is None:
41 | img_folder = osp.join('/tmp', osp.basename(vid_file).replace('.', '_'))
42 |
43 | os.makedirs(img_folder, exist_ok=True)
44 |
45 | command = ['/mnt/lustre/wanziniu/ffmpeg-4.3.1-amd64-static/ffmpeg',
46 | '-i', vid_file,
47 | '-f', 'image2',
48 | '-v', 'error',
49 | f'{img_folder}/%06d.png']
50 | print(f'Running \"{" ".join(command)}\"')
51 | subprocess.call(command)
52 |
53 | print(f'Images saved to \"{img_folder}\"')
54 |
55 | img_shape = cv2.imread(osp.join(img_folder, '000001.png')).shape
56 |
57 | if return_info:
58 | return img_folder, len(os.listdir(img_folder)), img_shape
59 | else:
60 | return img_folder
61 |
62 |
63 | def download_url(url, outdir):
64 | print(f'Downloading files from {url}')
65 | cmd = ['wget', '-c', url, '-P', outdir]
66 | subprocess.call(cmd)
67 |
68 |
69 | def download_ckpt(outdir='data/vibe_data', use_3dpw=False):
70 | os.makedirs(outdir, exist_ok=True)
71 |
72 | if use_3dpw:
73 | ckpt_file = 'data/vibe_data/vibe_model_w_3dpw.pth.tar'
74 | url = 'https://www.dropbox.com/s/41ozgqorcp095ja/vibe_model_w_3dpw.pth.tar'
75 | if not os.path.isfile(ckpt_file):
76 | download_url(url=url, outdir=outdir)
77 | else:
78 | ckpt_file = 'data/vibe_data/vibe_model_wo_3dpw.pth.tar'
79 | url = 'https://www.dropbox.com/s/amj2p8bmf6g56k6/vibe_model_wo_3dpw.pth.tar'
80 | if not os.path.isfile(ckpt_file):
81 | download_url(url=url, outdir=outdir)
82 |
83 | return ckpt_file
84 |
85 |
86 | def images_to_video(img_folder, output_vid_file):
87 | os.makedirs(img_folder, exist_ok=True)
88 |
89 | command = [
90 | 'ffmpeg', '-y', '-threads', '16', '-i', f'{img_folder}/%06d.png', '-profile:v', 'baseline',
91 | '-level', '3.0', '-c:v', 'libx264', '-pix_fmt', 'yuv420p', '-an', '-v', 'error', output_vid_file,
92 | ]
93 |
94 | print(f'Running \"{" ".join(command)}\"')
95 | subprocess.call(command)
96 |
97 |
98 | def convert_crop_cam_to_orig_img(cam, bbox, img_width, img_height):
99 | '''
100 | Convert predicted camera from cropped image coordinates
101 | to original image coordinates
102 | :param cam (ndarray, shape=(3,)): weak perspective camera in cropped img coordinates
103 |     :param cam (ndarray, shape=(3,)): weak perspective camera in cropped img coordinates
104 | :param img_width (int): original image width
105 | :param img_height (int): original image height
106 | :return:
107 | '''
108 | cx, cy, w, h = bbox[:,0], bbox[:,1], bbox[:,2], bbox[:,3]
109 | hw, hh = img_width / 2., img_height / 2.
110 | sx = cam[:,0] * (1. / (img_width / w))
111 | sy = cam[:,0] * (1. / (img_height / h))
112 | tx = ((cx - hw) / hw / sx) + cam[:,1]
113 | ty = ((cy - hh) / hh / sy) + cam[:,2]
114 | orig_cam = np.stack([sx, sy, tx, ty]).T
115 | return orig_cam
116 |
117 |
118 | def prepare_rendering_results(vibe_results, nframes):
119 | frame_results = [{} for _ in range(nframes)]
120 | for person_id, person_data in vibe_results.items():
121 | for idx, frame_id in enumerate(person_data['frame_ids']):
122 | frame_results[frame_id][person_id] = {
123 | 'verts': person_data['verts'][idx],
124 | 'cam': person_data['orig_cam'][idx],
125 | }
126 |
127 | # naive depth ordering based on the scale of the weak perspective camera
128 | for frame_id, frame_data in enumerate(frame_results):
129 | # sort based on y-scale of the cam in original image coords
130 | sort_idx = np.argsort([v['cam'][1] for k,v in frame_data.items()])
131 | frame_results[frame_id] = OrderedDict(
132 | {list(frame_data.keys())[i]:frame_data[list(frame_data.keys())[i]] for i in sort_idx}
133 | )
134 |
135 | return frame_results
136 |
--------------------------------------------------------------------------------
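
*Editorial note on camera rescaling:* `convert_crop_cam_to_orig_img` maps the weak-perspective camera estimated on the square crop into the original image frame: the scale is shrunk by the crop-to-image size ratio and the bbox centre offset is folded into the translation. A single-box standalone sketch of the same formula with a made-up bbox and image size (illustrative only):

```
import numpy as np

def crop_cam_to_orig(cam, bbox, img_w, img_h):
    # same formula as convert_crop_cam_to_orig_img above, for one box
    cx, cy, w, h = bbox
    hw, hh = img_w / 2.0, img_h / 2.0
    sx = cam[0] * w / img_w
    sy = cam[0] * h / img_h
    tx = (cx - hw) / hw / sx + cam[1]
    ty = (cy - hh) / hh / sy + cam[2]
    return np.array([sx, sy, tx, ty])

# A 224 px crop centred in a 1920x1080 frame: the translation stays zero,
# only the scale shrinks by the crop/image ratio.
print(crop_cam_to_orig(np.array([1.0, 0.0, 0.0]), (960.0, 540.0, 224.0, 224.0), 1920, 1080))
```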
/lib/utils/eval_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Some functions are adapted from https://github.com/akanazawa/human_dynamics/blob/master/src/evaluation/eval_util.py.
3 | Please adhere to their licence when using them.
4 | """
5 |
6 | import torch
7 | import numpy as np
8 |
9 |
10 | def compute_accel(joints):
11 | """
12 | Computes acceleration of 3D joints.
13 | Args:
14 | joints (Nx25x3).
15 | Returns:
16 | Accelerations (N-2).
17 | """
18 | velocities = joints[1:] - joints[:-1]
19 | acceleration = velocities[1:] - velocities[:-1]
20 | acceleration_normed = np.linalg.norm(acceleration, axis=2)
21 | return np.mean(acceleration_normed, axis=1)
22 |
23 |
24 | def compute_error_accel(joints_gt, joints_pred, vis=None):
25 | """
26 |     Computes the per-frame acceleration error (averaged over joints):
27 |         e_i = || (X^gt_{i-1} - 2 X^gt_i + X^gt_{i+1}) - (X^pred_{i-1} - 2 X^pred_i + X^pred_{i+1}) ||_2,   i = 1, ..., N-2
28 | Note that for each frame that is not visible, three entries in the
29 | acceleration error should be zero'd out.
30 | Args:
31 | joints_gt (Nx14x3).
32 | joints_pred (Nx14x3).
33 | vis (N).
34 | Returns:
35 | error_accel (N-2).
36 | """
37 | # (N-2)x14x3
38 | accel_gt = joints_gt[:-2] - 2 * joints_gt[1:-1] + joints_gt[2:]
39 | accel_pred = joints_pred[:-2] - 2 * joints_pred[1:-1] + joints_pred[2:]
40 |
41 | normed = np.linalg.norm(accel_pred - accel_gt, axis=2)
42 |
43 | if vis is None:
44 | new_vis = np.ones(len(normed), dtype=bool)
45 | else:
46 | invis = np.logical_not(vis)
47 | invis1 = np.roll(invis, -1)
48 | invis2 = np.roll(invis, -2)
49 | new_invis = np.logical_or(invis, np.logical_or(invis1, invis2))[:-2]
50 | new_vis = np.logical_not(new_invis)
51 |
52 | return np.mean(normed[new_vis], axis=1)
53 |
54 |
55 | def compute_error_verts(pred_verts, target_verts=None, target_theta=None):
56 | """
57 |     Computes the mean per-vertex error over the 6890 SMPL surface vertices.
58 |     Args:
59 |         pred_verts (Nx6890x3).
60 |         target_verts (Nx6890x3), or target_theta (Nx85) used to regenerate them.
61 | Returns:
62 | error_verts (N).
63 | """
64 |
65 | if target_verts is None:
66 | from lib.models.smpl import SMPL_MODEL_DIR
67 | from lib.models.smpl import SMPL
68 | device = 'cpu'
69 | smpl = SMPL(
70 | SMPL_MODEL_DIR,
71 | batch_size=1, # target_theta.shape[0],
72 | ).to(device)
73 |
74 | betas = torch.from_numpy(target_theta[:,75:]).to(device)
75 | pose = torch.from_numpy(target_theta[:,3:75]).to(device)
76 |
77 | target_verts = []
78 | b_ = torch.split(betas, 5000)
79 | p_ = torch.split(pose, 5000)
80 |
81 | for b,p in zip(b_,p_):
82 | output = smpl(betas=b, body_pose=p[:, 3:], global_orient=p[:, :3], pose2rot=True)
83 | target_verts.append(output.vertices.detach().cpu().numpy())
84 |
85 | target_verts = np.concatenate(target_verts, axis=0)
86 |
87 | assert len(pred_verts) == len(target_verts)
88 | error_per_vert = np.sqrt(np.sum((target_verts - pred_verts) ** 2, axis=2))
89 | return np.mean(error_per_vert, axis=1)
90 |
91 |
92 | def compute_similarity_transform(S1, S2):
93 | '''
94 |     Computes a similarity transform (sR, t) that maps a set of 3D points
95 |     S1 (3 x N) as closely as possible onto a set of 3D points S2, where R is
96 |     a 3x3 rotation matrix, t a 3x1 translation and s a scale factor,
97 |     i.e. it solves the orthogonal Procrustes problem.
98 | '''
99 | transposed = False
100 | if S1.shape[0] != 3 and S1.shape[0] != 2:
101 | S1 = S1.T
102 | S2 = S2.T
103 | transposed = True
104 | assert(S2.shape[1] == S1.shape[1])
105 |
106 | # 1. Remove mean.
107 | mu1 = S1.mean(axis=1, keepdims=True)
108 | mu2 = S2.mean(axis=1, keepdims=True)
109 | X1 = S1 - mu1
110 | X2 = S2 - mu2
111 |
112 | # 2. Compute variance of X1 used for scale.
113 | var1 = np.sum(X1**2)
114 |
115 | # 3. The outer product of X1 and X2.
116 | K = X1.dot(X2.T)
117 |
118 | # 4. Solution that Maximizes trace(R'K) is R=U*V', where U, V are
119 | # singular vectors of K.
120 | U, s, Vh = np.linalg.svd(K)
121 | V = Vh.T
122 | # Construct Z that fixes the orientation of R to get det(R)=1.
123 | Z = np.eye(U.shape[0])
124 | Z[-1, -1] *= np.sign(np.linalg.det(U.dot(V.T)))
125 | # Construct R.
126 | R = V.dot(Z.dot(U.T))
127 |
128 | # 5. Recover scale.
129 | scale = np.trace(R.dot(K)) / var1
130 |
131 | # 6. Recover translation.
132 | t = mu2 - scale*(R.dot(mu1))
133 |
134 | # 7. Error:
135 | S1_hat = scale*R.dot(S1) + t
136 |
137 | if transposed:
138 | S1_hat = S1_hat.T
139 |
140 | return S1_hat
141 |
142 |
143 | def compute_similarity_transform_torch(S1, S2):
144 | '''
145 |     Computes a similarity transform (sR, t) that maps a set of 3D points
146 |     S1 (3 x N) as closely as possible onto a set of 3D points S2, where R is
147 |     a 3x3 rotation matrix, t a 3x1 translation and s a scale factor,
148 |     i.e. it solves the orthogonal Procrustes problem.
149 | '''
150 | transposed = False
151 | if S1.shape[0] != 3 and S1.shape[0] != 2:
152 | S1 = S1.T
153 | S2 = S2.T
154 | transposed = True
155 | assert (S2.shape[1] == S1.shape[1])
156 |
157 | # 1. Remove mean.
158 | mu1 = S1.mean(axis=1, keepdims=True)
159 | mu2 = S2.mean(axis=1, keepdims=True)
160 | X1 = S1 - mu1
161 | X2 = S2 - mu2
162 |
163 | # print('X1', X1.shape)
164 |
165 | # 2. Compute variance of X1 used for scale.
166 | var1 = torch.sum(X1 ** 2)
167 |
168 | # print('var', var1.shape)
169 |
170 | # 3. The outer product of X1 and X2.
171 | K = X1.mm(X2.T)
172 |
173 | # 4. Solution that Maximizes trace(R'K) is R=U*V', where U, V are
174 | # singular vectors of K.
175 | U, s, V = torch.svd(K)
176 | # V = Vh.T
177 | # Construct Z that fixes the orientation of R to get det(R)=1.
178 | Z = torch.eye(U.shape[0], device=S1.device)
179 | Z[-1, -1] *= torch.sign(torch.det(U @ V.T))
180 | # Construct R.
181 | R = V.mm(Z.mm(U.T))
182 |
183 | # print('R', X1.shape)
184 |
185 | # 5. Recover scale.
186 | scale = torch.trace(R.mm(K)) / var1
187 | # print(R.shape, mu1.shape)
188 | # 6. Recover translation.
189 | t = mu2 - scale * (R.mm(mu1))
190 | # print(t.shape)
191 |
192 | # 7. Error:
193 | S1_hat = scale * R.mm(S1) + t
194 |
195 | if transposed:
196 | S1_hat = S1_hat.T
197 |
198 | return S1_hat
199 |
200 |
201 | def batch_compute_similarity_transform_torch(S1, S2):
202 | '''
203 |     Batched version: for every item in the batch, computes a similarity
204 |     transform (sR, t) that maps the 3D points S1 as closely as possible onto
205 |     the 3D points S2, where R is a 3x3 rotation matrix, t a 3x1 translation
206 |     and s a scale factor, i.e. it solves the orthogonal Procrustes problem.
207 | '''
208 | transposed = False
209 | if S1.shape[0] != 3 and S1.shape[0] != 2:
210 | S1 = S1.permute(0,2,1)
211 | S2 = S2.permute(0,2,1)
212 | transposed = True
213 | assert(S2.shape[1] == S1.shape[1])
214 |
215 | # 1. Remove mean.
216 | mu1 = S1.mean(axis=-1, keepdims=True)
217 | mu2 = S2.mean(axis=-1, keepdims=True)
218 |
219 | X1 = S1 - mu1
220 | X2 = S2 - mu2
221 |
222 | # 2. Compute variance of X1 used for scale.
223 | var1 = torch.sum(X1**2, dim=1).sum(dim=1)
224 |
225 | # 3. The outer product of X1 and X2.
226 | K = X1.bmm(X2.permute(0,2,1))
227 |
228 | # 4. Solution that Maximizes trace(R'K) is R=U*V', where U, V are
229 | # singular vectors of K.
230 | U, s, V = torch.svd(K)
231 |
232 | # Construct Z that fixes the orientation of R to get det(R)=1.
233 | Z = torch.eye(U.shape[1], device=S1.device).unsqueeze(0)
234 | Z = Z.repeat(U.shape[0],1,1)
235 | Z[:,-1, -1] *= torch.sign(torch.det(U.bmm(V.permute(0,2,1))))
236 |
237 | # Construct R.
238 | R = V.bmm(Z.bmm(U.permute(0,2,1)))
239 |
240 | # 5. Recover scale.
241 | scale = torch.cat([torch.trace(x).unsqueeze(0) for x in R.bmm(K)]) / var1
242 |
243 | # 6. Recover translation.
244 | t = mu2 - (scale.unsqueeze(-1).unsqueeze(-1) * (R.bmm(mu1)))
245 |
246 | # 7. Error:
247 | S1_hat = scale.unsqueeze(-1).unsqueeze(-1) * R.bmm(S1) + t
248 |
249 | if transposed:
250 | S1_hat = S1_hat.permute(0,2,1)
251 |
252 | return S1_hat
253 |
254 |
255 | def align_by_pelvis(joints):
256 | """
257 | Assumes joints is 14 x 3 in LSP order.
258 | Then hips are: [3, 2]
259 | Takes mid point of these points, then subtracts it.
260 | """
261 |
262 | left_id = 2
263 | right_id = 3
264 |
265 | pelvis = (joints[left_id, :] + joints[right_id, :]) / 2.0
266 | return joints - np.expand_dims(pelvis, axis=0)
267 |
268 |
269 | def compute_errors(gt3ds, preds):
270 | """
271 | Gets MPJPE after pelvis alignment + MPJPE after Procrustes.
272 | Evaluates on the 14 common joints.
273 | Inputs:
274 | - gt3ds: N x 14 x 3
275 | - preds: N x 14 x 3
276 | """
277 | errors, errors_pa = [], []
278 | for i, (gt3d, pred) in enumerate(zip(gt3ds, preds)):
279 | gt3d = gt3d.reshape(-1, 3)
280 | # Root align.
281 | gt3d = align_by_pelvis(gt3d)
282 | pred3d = align_by_pelvis(pred)
283 |
284 | joint_error = np.sqrt(np.sum((gt3d - pred3d)**2, axis=1))
285 | errors.append(np.mean(joint_error))
286 |
287 | # Get PA error.
288 | pred3d_sym = compute_similarity_transform(pred3d, gt3d)
289 | pa_error = np.sqrt(np.sum((gt3d - pred3d_sym)**2, axis=1))
290 | errors_pa.append(np.mean(pa_error))
291 |
292 | return errors, errors_pa
293 |
--------------------------------------------------------------------------------
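
*Editorial note on the acceleration metrics:* the metrics above are second finite differences of the joint trajectories, so two sequences that differ only by a constant offset (same velocity, hence same acceleration) should give zero acceleration error. A short standalone numeric check of that property, mirroring `compute_accel`/`compute_error_accel` with synthetic trajectories (illustrative only):

```
import numpy as np

T, J = 6, 14
t = np.arange(T, dtype=np.float64).reshape(T, 1, 1)
gt = np.tile(t, (1, J, 3)) * 0.02        # constant velocity: 2 cm per frame on every axis
pred = gt + 0.005                        # constant offset -> identical acceleration

accel_gt = gt[:-2] - 2 * gt[1:-1] + gt[2:]                         # second finite difference
accel_pred = pred[:-2] - 2 * pred[1:-1] + pred[2:]
err = np.linalg.norm(accel_pred - accel_gt, axis=2).mean(axis=1)   # per-frame error, joint-averaged
print(np.abs(accel_gt).max(), err)       # both ~0 up to floating point
```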
/lib/utils/fbx_output.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | This script is adapted from https://github.com/mkocabas/VIBE.
4 | Please adhere to their licence when using it.
5 | """
6 | # Notes:
7 | # + Male and female gender models only
8 | # + Script can be run from command line or in Blender Editor (Text Editor>Run Script)
9 | # + Command line: Install mathutils module in your bpy virtualenv with 'pip install mathutils==2.81.2'
10 |
11 | import os
12 | import sys
13 | import bpy
14 | import time
15 | import joblib
16 | import argparse
17 | import numpy as np
18 | import addon_utils
19 | from math import radians
20 | from mathutils import Matrix, Vector, Quaternion, Euler
21 |
22 | # Globals
23 | male_model_path = 'data/SMPL_unity_v.1.0.0/smpl/Models/SMPL_m_unityDoubleBlends_lbs_10_scale5_207_v1.0.0.fbx'
24 | female_model_path = 'data/SMPL_unity_v.1.0.0/smpl/Models/SMPL_f_unityDoubleBlends_lbs_10_scale5_207_v1.0.0.fbx'
25 |
26 | fps_source = 30
27 | fps_target = 30
28 |
29 | gender = 'male'
30 |
31 | start_origin = 1
32 |
33 | bone_name_from_index = {
34 | 0 : 'Pelvis',
35 | 1 : 'L_Hip',
36 | 2 : 'R_Hip',
37 | 3 : 'Spine1',
38 | 4 : 'L_Knee',
39 | 5 : 'R_Knee',
40 | 6 : 'Spine2',
41 | 7 : 'L_Ankle',
42 | 8: 'R_Ankle',
43 | 9: 'Spine3',
44 | 10: 'L_Foot',
45 | 11: 'R_Foot',
46 | 12: 'Neck',
47 | 13: 'L_Collar',
48 | 14: 'R_Collar',
49 | 15: 'Head',
50 | 16: 'L_Shoulder',
51 | 17: 'R_Shoulder',
52 | 18: 'L_Elbow',
53 | 19: 'R_Elbow',
54 | 20: 'L_Wrist',
55 | 21: 'R_Wrist',
56 | 22: 'L_Hand',
57 | 23: 'R_Hand'
58 | }
59 |
60 | # Helper functions
61 |
62 | # Computes rotation matrix through Rodrigues formula as in cv2.Rodrigues
63 | # Source: smpl/plugins/blender/corrective_bpy_sh.py
64 | def Rodrigues(rotvec):
65 | theta = np.linalg.norm(rotvec)
66 | r = (rotvec/theta).reshape(3, 1) if theta > 0. else rotvec
67 | cost = np.cos(theta)
68 | mat = np.asarray([[0, -r[2], r[1]],
69 | [r[2], 0, -r[0]],
70 | [-r[1], r[0], 0]])
71 | return(cost*np.eye(3) + (1-cost)*r.dot(r.T) + np.sin(theta)*mat)
72 |
73 |
74 | # Setup scene
75 | def setup_scene(model_path, fps_target):
76 | scene = bpy.data.scenes['Scene']
77 |
78 | ###########################
79 | # Engine independent setup
80 | ###########################
81 |
82 | scene.render.fps = fps_target
83 |
84 | # Remove default cube
85 | if 'Cube' in bpy.data.objects:
86 | bpy.data.objects['Cube'].select_set(True)
87 | bpy.ops.object.delete()
88 |
89 | # Import gender specific .fbx template file
90 | bpy.ops.import_scene.fbx(filepath=model_path)
91 |
92 |
93 | # Process single pose into keyframed bone orientations
94 | def process_pose(current_frame, pose, trans, pelvis_position):
95 |
96 | if pose.shape[0] == 72:
97 | rod_rots = pose.reshape(24, 3)
98 | else:
99 | rod_rots = pose.reshape(26, 3)
100 |
101 | mat_rots = [Rodrigues(rod_rot) for rod_rot in rod_rots]
102 |
103 | # Set the location of the Pelvis bone to the translation parameter
104 | armature = bpy.data.objects['Armature']
105 | bones = armature.pose.bones
106 |
107 | # Pelvis: X-Right, Y-Up, Z-Forward (Blender -Y)
108 |
109 | # Set absolute pelvis location relative to Pelvis bone head
110 | bones[bone_name_from_index[0]].location = Vector((100*trans[1], 100*trans[2], 100*trans[0])) - pelvis_position
111 |
112 | # bones['Root'].location = Vector(trans)
113 | bones[bone_name_from_index[0]].keyframe_insert('location', frame=current_frame)
114 |
115 | for index, mat_rot in enumerate(mat_rots, 0):
116 | if index >= 24:
117 | continue
118 |
119 | bone = bones[bone_name_from_index[index]]
120 |
121 | bone_rotation = Matrix(mat_rot).to_quaternion()
122 | quat_x_90_cw = Quaternion((1.0, 0.0, 0.0), radians(-90))
123 | quat_z_90_cw = Quaternion((0.0, 0.0, 1.0), radians(-90))
124 |
125 | if index == 0:
126 |             # Rotate pelvis so that avatar stands upright and looks along the negative Y axis
127 | bone.rotation_quaternion = (quat_x_90_cw @ quat_z_90_cw) @ bone_rotation
128 | else:
129 | bone.rotation_quaternion = bone_rotation
130 |
131 | bone.keyframe_insert('rotation_quaternion', frame=current_frame)
132 |
133 | return
134 |
135 |
136 | # Process all the poses from the pose file
137 | def process_poses(
138 | input_path,
139 | gender,
140 | fps_source,
141 | fps_target,
142 | start_origin,
143 | person_id=1,
144 | ):
145 |
146 | print('Processing: ' + input_path)
147 |
148 | data = joblib.load(input_path)
149 | poses = data[person_id]['pose']
150 | trans = np.zeros((poses.shape[0], 3))
151 |
152 | if gender == 'female':
153 | model_path = female_model_path
154 | for k,v in bone_name_from_index.items():
155 | bone_name_from_index[k] = 'f_avg_' + v
156 | elif gender == 'male':
157 | model_path = male_model_path
158 | for k,v in bone_name_from_index.items():
159 | bone_name_from_index[k] = 'm_avg_' + v
160 | else:
161 | print('ERROR: Unsupported gender: ' + gender)
162 | sys.exit(1)
163 |
164 | # Limit target fps to source fps
165 | if fps_target > fps_source:
166 | fps_target = fps_source
167 |
168 | print(f'Gender: {gender}')
169 | print(f'Number of source poses: {str(poses.shape[0])}')
170 | print(f'Source frames-per-second: {str(fps_source)}')
171 | print(f'Target frames-per-second: {str(fps_target)}')
172 | print('--------------------------------------------------')
173 |
174 | setup_scene(model_path, fps_target)
175 |
176 | scene = bpy.data.scenes['Scene']
177 | sample_rate = int(fps_source/fps_target)
178 |     scene.frame_end = int(poses.shape[0] / sample_rate)
179 |
180 | # Retrieve pelvis world position.
181 | # Unit is [cm] due to Armature scaling.
182 | # Need to make copy since reference will change when bone location is modified.
183 | bpy.ops.object.mode_set(mode='EDIT')
184 | pelvis_position = Vector(bpy.data.armatures[0].edit_bones[bone_name_from_index[0]].head)
185 | bpy.ops.object.mode_set(mode='OBJECT')
186 |
187 | source_index = 0
188 | frame = 1
189 |
190 | offset = np.array([0.0, 0.0, 0.0])
191 |
192 | while source_index < poses.shape[0]:
193 | print('Adding pose: ' + str(source_index))
194 |
195 | if start_origin:
196 | if source_index == 0:
197 | offset = np.array([trans[source_index][0], trans[source_index][1], 0])
198 |
199 | # Go to new frame
200 | scene.frame_set(frame)
201 |
202 | process_pose(frame, poses[source_index], (trans[source_index] - offset), pelvis_position)
203 | source_index += sample_rate
204 | frame += 1
205 |
206 | return frame
207 |
208 |
209 | def export_animated_mesh(output_path):
210 | # Create output directory if needed
211 | output_dir = os.path.dirname(output_path)
212 | if not os.path.isdir(output_dir):
213 | os.makedirs(output_dir, exist_ok=True)
214 |
215 | # Select only skinned mesh and rig
216 | bpy.ops.object.select_all(action='DESELECT')
217 | bpy.data.objects['Armature'].select_set(True)
218 | bpy.data.objects['Armature'].children[0].select_set(True)
219 |
220 | if output_path.endswith('.glb'):
221 | print('Exporting to glTF binary (.glb)')
222 |         # Currently exporting without shape/pose blend shapes for smaller file sizes
223 | bpy.ops.export_scene.gltf(filepath=output_path, export_format='GLB', export_selected=True, export_morph=False)
224 | elif output_path.endswith('.fbx'):
225 | print('Exporting to FBX binary (.fbx)')
226 | bpy.ops.export_scene.fbx(filepath=output_path, use_selection=True, add_leaf_bones=False)
227 | else:
228 | print('ERROR: Unsupported export format: ' + output_path)
229 | sys.exit(1)
230 |
231 | return
232 |
233 |
234 | if __name__ == '__main__':
235 | try:
236 | if bpy.app.background:
237 |
238 | parser = argparse.ArgumentParser(description='Create keyframed animated skinned SMPL mesh from VIBE output')
239 | parser.add_argument('--input', dest='input_path', type=str, required=True,
240 | help='Input file or directory')
241 | parser.add_argument('--output', dest='output_path', type=str, required=True,
242 | help='Output file or directory')
243 | parser.add_argument('--fps_source', type=int, default=fps_source,
244 | help='Source framerate')
245 | parser.add_argument('--fps_target', type=int, default=fps_target,
246 | help='Target framerate')
247 | parser.add_argument('--gender', type=str, default=gender,
248 | help='Always use specified gender')
249 | parser.add_argument('--start_origin', type=int, default=start_origin,
250 | help='Start animation centered above origin')
251 | parser.add_argument('--person_id', type=int, default=1,
252 | help='Detected person ID to use for fbx animation')
253 |
254 | args = parser.parse_args()
255 |
256 | input_path = args.input_path
257 | output_path = args.output_path
258 |
259 | if not os.path.exists(input_path):
260 | print('ERROR: Invalid input path')
261 | sys.exit(1)
262 |
263 | fps_source = args.fps_source
264 | fps_target = args.fps_target
265 |
266 | gender = args.gender
267 |
268 | start_origin = args.start_origin
269 |
270 | # end if bpy.app.background
271 |
272 | startTime = time.perf_counter()
273 |
274 | # Process data
275 | cwd = os.getcwd()
276 |
277 | # Turn relative input/output paths into absolute paths
278 | if not input_path.startswith(os.path.sep):
279 | input_path = os.path.join(cwd, input_path)
280 |
281 | if not output_path.startswith(os.path.sep):
282 | output_path = os.path.join(cwd, output_path)
283 |
284 | print('Input path: ' + input_path)
285 | print('Output path: ' + output_path)
286 |
287 | if not (output_path.endswith('.fbx') or output_path.endswith('.glb')):
288 | print('ERROR: Invalid output format (must be .fbx or .glb)')
289 | sys.exit(1)
290 |
291 | # Process pose file
292 | if input_path.endswith('.pkl'):
293 | if not os.path.isfile(input_path):
294 | print('ERROR: Invalid input file')
295 | sys.exit(1)
296 |
297 | poses_processed = process_poses(
298 | input_path=input_path,
299 | gender=gender,
300 | fps_source=fps_source,
301 | fps_target=fps_target,
302 | start_origin=start_origin,
303 | person_id=args.person_id
304 | )
305 | export_animated_mesh(output_path)
306 |
307 | print('--------------------------------------------------')
308 | print('Animation export finished.')
309 | print(f'Poses processed: {str(poses_processed)}')
310 | print(f'Processing time : {time.perf_counter() - startTime:.2f} s')
311 | print('--------------------------------------------------')
312 | sys.exit(0)
313 |
314 | except SystemExit as ex:
315 | if ex.code is None:
316 | exit_status = 0
317 | else:
318 | exit_status = ex.code
319 |
320 | print('Exiting. Exit status: ' + str(exit_status))
321 |
322 | # Only exit to OS when we are not running in Blender GUI
323 | if bpy.app.background:
324 | sys.exit(exit_status)
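325 | 
326 | # Illustrative usage (paths and values below are placeholders, not prescribed
327 | # by this repo): the script needs Blender's `bpy` module, e.g. Blender run in
328 | # background mode or the pip-installable `bpy` build, plus a VIBE-style .pkl
329 | # result file:
330 | #
331 | #   python lib/utils/fbx_output.py \
332 | #       --input output/sample_video/vibe_output.pkl \
333 | #       --output output/sample_video/sample_video.fbx \
334 | #       --fps_source 30 --fps_target 30 --gender male --person_id 1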
--------------------------------------------------------------------------------
/lib/utils/geometry.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | This script is brought from https://github.com/mkocabas/VIBE
4 | Adhere to their licence to use this script
5 | """
6 |
7 | import torch
8 | import numpy as np
9 | from torch.nn import functional as F
10 |
11 |
12 | def batch_rodrigues(axisang):
13 | # This function is borrowed from https://github.com/MandyMo/pytorch_HMR/blob/master/src/util.py#L37
14 | # axisang N x 3
15 | axisang_norm = torch.norm(axisang + 1e-8, p=2, dim=1)
16 | angle = torch.unsqueeze(axisang_norm, -1)
17 | axisang_normalized = torch.div(axisang, angle)
18 | angle = angle * 0.5
19 | v_cos = torch.cos(angle)
20 | v_sin = torch.sin(angle)
21 | quat = torch.cat([v_cos, v_sin * axisang_normalized], dim=1)
22 | rot_mat = quat2mat(quat)
23 | rot_mat = rot_mat.view(rot_mat.shape[0], 9)
24 | return rot_mat
25 |
26 |
27 | def quat2mat(quat):
28 | """
29 | This function is borrowed from https://github.com/MandyMo/pytorch_HMR/blob/master/src/util.py#L50
30 |
31 | Convert quaternion coefficients to rotation matrix.
32 | Args:
33 | quat: size = [batch_size, 4] 4 <===>(w, x, y, z)
34 | Returns:
35 | Rotation matrix corresponding to the quaternion -- size = [batch_size, 3, 3]
36 | """
37 | norm_quat = quat
38 | norm_quat = norm_quat / norm_quat.norm(p=2, dim=1, keepdim=True)
39 | w, x, y, z = norm_quat[:, 0], norm_quat[:, 1], norm_quat[:,
40 | 2], norm_quat[:,
41 | 3]
42 |
43 | batch_size = quat.size(0)
44 |
45 | w2, x2, y2, z2 = w.pow(2), x.pow(2), y.pow(2), z.pow(2)
46 | wx, wy, wz = w * x, w * y, w * z
47 | xy, xz, yz = x * y, x * z, y * z
48 |
49 | rotMat = torch.stack([
50 | w2 + x2 - y2 - z2, 2 * xy - 2 * wz, 2 * wy + 2 * xz, 2 * wz + 2 * xy,
51 | w2 - x2 + y2 - z2, 2 * yz - 2 * wx, 2 * xz - 2 * wy, 2 * wx + 2 * yz,
52 | w2 - x2 - y2 + z2
53 | ],
54 | dim=1).view(batch_size, 3, 3)
55 | return rotMat
56 |
57 |
58 | def rotation_matrix_to_angle_axis(rotation_matrix):
59 | """
60 | This function is borrowed from https://github.com/kornia/kornia
61 |
62 | Convert 3x4 rotation matrix to Rodrigues vector
63 |
64 | Args:
65 | rotation_matrix (Tensor): rotation matrix.
66 |
67 | Returns:
68 | Tensor: Rodrigues vector transformation.
69 |
70 | Shape:
71 | - Input: :math:`(N, 3, 4)`
72 | - Output: :math:`(N, 3)`
73 |
74 | Example:
75 |         >>> input = torch.rand(2, 3, 4) # Nx3x4
76 | >>> output = tgm.rotation_matrix_to_angle_axis(input) # Nx3
77 | """
78 | if rotation_matrix.shape[1:] == (3,3):
79 | rot_mat = rotation_matrix.reshape(-1, 3, 3)
80 | hom = torch.tensor([0, 0, 1], dtype=torch.float32,
81 | device=rotation_matrix.device).reshape(1, 3, 1).expand(rot_mat.shape[0], -1, -1)
82 | rotation_matrix = torch.cat([rot_mat, hom], dim=-1)
83 |
84 | quaternion = rotation_matrix_to_quaternion(rotation_matrix)
85 | aa = quaternion_to_angle_axis(quaternion)
86 | aa[torch.isnan(aa)] = 0.0
87 | return aa
88 |
89 |
90 | def quaternion_to_angle_axis(quaternion: torch.Tensor) -> torch.Tensor:
91 | """
92 | This function is borrowed from https://github.com/kornia/kornia
93 |
94 | Convert quaternion vector to angle axis of rotation.
95 |
96 | Adapted from ceres C++ library: ceres-solver/include/ceres/rotation.h
97 |
98 | Args:
99 | quaternion (torch.Tensor): tensor with quaternions.
100 |
101 | Return:
102 | torch.Tensor: tensor with angle axis of rotation.
103 |
104 | Shape:
105 | - Input: :math:`(*, 4)` where `*` means, any number of dimensions
106 | - Output: :math:`(*, 3)`
107 |
108 | Example:
109 | >>> quaternion = torch.rand(2, 4) # Nx4
110 | >>> angle_axis = tgm.quaternion_to_angle_axis(quaternion) # Nx3
111 | """
112 | if not torch.is_tensor(quaternion):
113 | raise TypeError("Input type is not a torch.Tensor. Got {}".format(
114 | type(quaternion)))
115 |
116 | if not quaternion.shape[-1] == 4:
117 | raise ValueError("Input must be a tensor of shape Nx4 or 4. Got {}"
118 | .format(quaternion.shape))
119 | # unpack input and compute conversion
120 | q1: torch.Tensor = quaternion[..., 1]
121 | q2: torch.Tensor = quaternion[..., 2]
122 | q3: torch.Tensor = quaternion[..., 3]
123 | sin_squared_theta: torch.Tensor = q1 * q1 + q2 * q2 + q3 * q3
124 |
125 | sin_theta: torch.Tensor = torch.sqrt(sin_squared_theta)
126 | cos_theta: torch.Tensor = quaternion[..., 0]
127 | two_theta: torch.Tensor = 2.0 * torch.where(
128 | cos_theta < 0.0,
129 | torch.atan2(-sin_theta, -cos_theta),
130 | torch.atan2(sin_theta, cos_theta))
131 |
132 | k_pos: torch.Tensor = two_theta / sin_theta
133 | k_neg: torch.Tensor = 2.0 * torch.ones_like(sin_theta)
134 | k: torch.Tensor = torch.where(sin_squared_theta > 0.0, k_pos, k_neg)
135 |
136 | angle_axis: torch.Tensor = torch.zeros_like(quaternion)[..., :3]
137 | angle_axis[..., 0] += q1 * k
138 | angle_axis[..., 1] += q2 * k
139 | angle_axis[..., 2] += q3 * k
140 | return angle_axis
141 |
142 |
143 | def rotation_matrix_to_quaternion(rotation_matrix, eps=1e-6):
144 | """
145 | This function is borrowed from https://github.com/kornia/kornia
146 |
147 | Convert 3x4 rotation matrix to 4d quaternion vector
148 |
149 | This algorithm is based on algorithm described in
150 | https://github.com/KieranWynn/pyquaternion/blob/master/pyquaternion/quaternion.py#L201
151 |
152 | Args:
153 | rotation_matrix (Tensor): the rotation matrix to convert.
154 |
155 | Return:
156 | Tensor: the rotation in quaternion
157 |
158 | Shape:
159 | - Input: :math:`(N, 3, 4)`
160 | - Output: :math:`(N, 4)`
161 |
162 | Example:
163 | >>> input = torch.rand(4, 3, 4) # Nx3x4
164 | >>> output = tgm.rotation_matrix_to_quaternion(input) # Nx4
165 | """
166 | if not torch.is_tensor(rotation_matrix):
167 | raise TypeError("Input type is not a torch.Tensor. Got {}".format(
168 | type(rotation_matrix)))
169 |
170 | if len(rotation_matrix.shape) > 3:
171 | raise ValueError(
172 | "Input size must be a three dimensional tensor. Got {}".format(
173 | rotation_matrix.shape))
174 | if not rotation_matrix.shape[-2:] == (3, 4):
175 | raise ValueError(
176 | "Input size must be a N x 3 x 4 tensor. Got {}".format(
177 | rotation_matrix.shape))
178 |
179 | rmat_t = torch.transpose(rotation_matrix, 1, 2)
180 |
181 | mask_d2 = rmat_t[:, 2, 2] < eps
182 |
183 | mask_d0_d1 = rmat_t[:, 0, 0] > rmat_t[:, 1, 1]
184 | mask_d0_nd1 = rmat_t[:, 0, 0] < -rmat_t[:, 1, 1]
185 |
186 | t0 = 1 + rmat_t[:, 0, 0] - rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
187 | q0 = torch.stack([rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
188 | t0, rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
189 | rmat_t[:, 2, 0] + rmat_t[:, 0, 2]], -1)
190 | t0_rep = t0.repeat(4, 1).t()
191 |
192 | t1 = 1 - rmat_t[:, 0, 0] + rmat_t[:, 1, 1] - rmat_t[:, 2, 2]
193 | q1 = torch.stack([rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
194 | rmat_t[:, 0, 1] + rmat_t[:, 1, 0],
195 | t1, rmat_t[:, 1, 2] + rmat_t[:, 2, 1]], -1)
196 | t1_rep = t1.repeat(4, 1).t()
197 |
198 | t2 = 1 - rmat_t[:, 0, 0] - rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
199 | q2 = torch.stack([rmat_t[:, 0, 1] - rmat_t[:, 1, 0],
200 | rmat_t[:, 2, 0] + rmat_t[:, 0, 2],
201 | rmat_t[:, 1, 2] + rmat_t[:, 2, 1], t2], -1)
202 | t2_rep = t2.repeat(4, 1).t()
203 |
204 | t3 = 1 + rmat_t[:, 0, 0] + rmat_t[:, 1, 1] + rmat_t[:, 2, 2]
205 | q3 = torch.stack([t3, rmat_t[:, 1, 2] - rmat_t[:, 2, 1],
206 | rmat_t[:, 2, 0] - rmat_t[:, 0, 2],
207 | rmat_t[:, 0, 1] - rmat_t[:, 1, 0]], -1)
208 | t3_rep = t3.repeat(4, 1).t()
209 |
210 | mask_c0 = mask_d2 * mask_d0_d1
211 | mask_c1 = mask_d2 * ~mask_d0_d1
212 | mask_c2 = ~mask_d2 * mask_d0_nd1
213 | mask_c3 = ~mask_d2 * ~mask_d0_nd1
214 | mask_c0 = mask_c0.view(-1, 1).type_as(q0)
215 | mask_c1 = mask_c1.view(-1, 1).type_as(q1)
216 | mask_c2 = mask_c2.view(-1, 1).type_as(q2)
217 | mask_c3 = mask_c3.view(-1, 1).type_as(q3)
218 |
219 | q = q0 * mask_c0 + q1 * mask_c1 + q2 * mask_c2 + q3 * mask_c3
220 | q /= torch.sqrt(t0_rep * mask_c0 + t1_rep * mask_c1 + # noqa
221 | t2_rep * mask_c2 + t3_rep * mask_c3) # noqa
222 | q *= 0.5
223 | return q
224 |
225 |
226 | def estimate_translation_np(S, joints_2d, joints_conf, focal_length=5000., img_size=224.):
227 | """
228 | This function is borrowed from https://github.com/nkolot/SPIN/utils/geometry.py
229 |
230 |     Find the camera translation that brings the 3D joints S closest to the corresponding 2D joints joints_2d.
231 | Input:
232 | S: (25, 3) 3D joint locations
233 | joints: (25, 3) 2D joint locations and confidence
234 | Returns:
235 | (3,) camera translation vector
236 | """
237 |
238 | num_joints = S.shape[0]
239 | # focal length
240 | f = np.array([focal_length,focal_length])
241 | # optical center
242 | center = np.array([img_size/2., img_size/2.])
243 |
244 | # transformations
245 | Z = np.reshape(np.tile(S[:,2],(2,1)).T,-1)
246 | XY = np.reshape(S[:,0:2],-1)
247 | O = np.tile(center,num_joints)
248 | F = np.tile(f,num_joints)
249 | weight2 = np.reshape(np.tile(np.sqrt(joints_conf),(2,1)).T,-1)
250 |
251 | # least squares
252 | Q = np.array([F*np.tile(np.array([1,0]),num_joints), F*np.tile(np.array([0,1]),num_joints), O-np.reshape(joints_2d,-1)]).T
253 | c = (np.reshape(joints_2d,-1)-O)*Z - F*XY
254 |
255 | # weighted least squares
256 | W = np.diagflat(weight2)
257 | Q = np.dot(W,Q)
258 | c = np.dot(W,c)
259 |
260 | # square matrix
261 | A = np.dot(Q.T,Q)
262 | b = np.dot(Q.T,c)
263 |
264 | # solution
265 | trans = np.linalg.solve(A, b)
266 |
267 | return trans
268 |
269 |
270 | def estimate_translation(S, joints_2d, focal_length=5000., img_size=224.):
271 | """
272 | This function is borrowed from https://github.com/nkolot/SPIN/utils/geometry.py
273 |
274 |     Find the camera translation that brings the 3D joints S closest to the corresponding 2D joints joints_2d.
275 | Input:
276 | S: (B, 49, 3) 3D joint locations
277 | joints: (B, 49, 3) 2D joint locations and confidence
278 | Returns:
279 | (B, 3) camera translation vectors
280 | """
281 |
282 | device = S.device
283 | # Use only joints 25:49 (GT joints)
284 | S = S[:, 25:, :].cpu().numpy()
285 | joints_2d = joints_2d[:, 25:, :].cpu().numpy()
286 | joints_conf = joints_2d[:, :, -1]
287 | joints_2d = joints_2d[:, :, :-1]
288 | trans = np.zeros((S.shape[0], 3), dtype=np.float32)
289 | # Find the translation for each example in the batch
290 | for i in range(S.shape[0]):
291 | S_i = S[i]
292 | joints_i = joints_2d[i]
293 | conf_i = joints_conf[i]
294 | trans[i] = estimate_translation_np(S_i, joints_i, conf_i, focal_length=focal_length, img_size=img_size)
295 | return torch.from_numpy(trans).to(device)
296 |
297 |
298 | def rot6d_to_rotmat_spin(x):
299 | """Convert 6D rotation representation to 3x3 rotation matrix.
300 | Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019
301 | Input:
302 | (B,6) Batch of 6-D rotation representations
303 | Output:
304 | (B,3,3) Batch of corresponding rotation matrices
305 | """
306 | x = x.view(-1,3,2)
307 | a1 = x[:, :, 0]
308 | a2 = x[:, :, 1]
309 | b1 = F.normalize(a1)
310 | b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1)
311 |
312 | # inp = a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1
313 | # denom = inp.pow(2).sum(dim=1).sqrt().unsqueeze(-1) + 1e-8
314 | # b2 = inp / denom
315 |
316 | b3 = torch.cross(b1, b2)
317 | return torch.stack((b1, b2, b3), dim=-1)
318 |
319 |
320 | def rot6d_to_rotmat(x):
321 | x = x.view(-1,3,2)
322 |
323 | # Normalize the first vector
324 | b1 = F.normalize(x[:, :, 0], dim=1, eps=1e-6)
325 |
326 | dot_prod = torch.sum(b1 * x[:, :, 1], dim=1, keepdim=True)
327 | # Compute the second vector by finding the orthogonal complement to it
328 | b2 = F.normalize(x[:, :, 1] - dot_prod * b1, dim=-1, eps=1e-6)
329 |
330 | # Finish building the basis by taking the cross product
331 | b3 = torch.cross(b1, b2, dim=1)
332 | rot_mats = torch.stack([b1, b2, b3], dim=-1)
333 |
334 | return rot_mats
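335 | 
336 | 
337 | # Minimal self-check (illustrative only; the test rotation is arbitrary):
338 | # verify that the axis-angle, rotation-matrix and 6D conversions above are
339 | # mutually consistent.
340 | if __name__ == '__main__':
341 |     aa = torch.tensor([[0.0, 0.0, np.pi / 2]])   # 90 degrees about Z
342 |     rotmat = batch_rodrigues(aa).view(-1, 3, 3)
343 |     aa_back = rotation_matrix_to_angle_axis(rotmat)
344 |     print('axis-angle round-trip error:', (aa - aa_back).abs().max().item())
345 | 
346 |     # the first two columns of a rotation matrix are its 6D representation
347 |     six_d = rotmat[:, :, :2].reshape(-1, 6)
348 |     rotmat_6d = rot6d_to_rotmat(six_d)
349 |     print('6D reconstruction error:', (rotmat - rotmat_6d).abs().max().item())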
--------------------------------------------------------------------------------
/lib/utils/pose_tracker.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | This script is brought from https://github.com/mkocabas/VIBE
4 | Adhere to their licence to use this script
5 | """
6 |
7 | import os
8 | import json
9 | import shutil
10 | import subprocess
11 | import numpy as np
12 | import os.path as osp
13 |
14 |
15 | def run_openpose(
16 | video_file,
17 | output_folder,
18 | staf_folder,
19 | vis=False,
20 | ):
21 | pwd = os.getcwd()
22 |
23 | os.chdir(staf_folder)
24 |
25 | render = 1 if vis else 0
26 | display = 2 if vis else 0
27 | cmd = [
28 | 'build/examples/openpose/openpose.bin',
29 | '--model_pose', 'BODY_21A',
30 | '--tracking', '1',
31 | '--render_pose', str(render),
32 | '--video', video_file,
33 | '--write_json', output_folder,
34 | '--display', str(display)
35 | ]
36 |
37 | print('Executing', ' '.join(cmd))
38 | subprocess.call(cmd)
39 | os.chdir(pwd)
40 |
41 |
42 | def read_posetrack_keypoints(output_folder):
43 |
44 | people = dict()
45 |
46 | for idx, result_file in enumerate(sorted(os.listdir(output_folder))):
47 | json_file = osp.join(output_folder, result_file)
48 | data = json.load(open(json_file))
49 | # print(idx, data)
50 | for person in data['people']:
51 | person_id = person['person_id'][0]
52 | joints2d = person['pose_keypoints_2d']
53 | if person_id in people.keys():
54 | people[person_id]['joints2d'].append(joints2d)
55 | people[person_id]['frames'].append(idx)
56 | else:
57 | people[person_id] = {
58 | 'joints2d': [],
59 | 'frames': [],
60 | }
61 | people[person_id]['joints2d'].append(joints2d)
62 | people[person_id]['frames'].append(idx)
63 |
64 | for k in people.keys():
65 | people[k]['joints2d'] = np.array(people[k]['joints2d']).reshape((len(people[k]['joints2d']), -1, 3))
66 | people[k]['frames'] = np.array(people[k]['frames'])
67 |
68 | return people
69 |
70 |
71 | def run_posetracker(video_file, staf_folder, posetrack_output_folder='/tmp', display=False):
72 | posetrack_output_folder = os.path.join(
73 | posetrack_output_folder,
74 | f'{os.path.basename(video_file)}_posetrack'
75 | )
76 |
77 | # run posetrack on video
78 | run_openpose(
79 | video_file,
80 | posetrack_output_folder,
81 | vis=display,
82 | staf_folder=staf_folder
83 | )
84 |
85 | people_dict = read_posetrack_keypoints(posetrack_output_folder)
86 |
87 | shutil.rmtree(posetrack_output_folder)
88 |
89 | return people_dict
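90 | 
91 | 
92 | # Illustrative usage (paths are placeholders): run_posetracker shells out to a
93 | # compiled STAF/OpenPose build, so `staf_folder` must contain
94 | # build/examples/openpose/openpose.bin.
95 | #
96 | #   people = run_posetracker('sample_video.mp4', staf_folder='/path/to/openpose_staf')
97 | #   # people[person_id] -> {'joints2d': (T, K, 3) array, 'frames': (T,) array}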
--------------------------------------------------------------------------------
/lib/utils/renderer.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | This script is brought from https://github.com/mkocabas/VIBE
4 | Adhere to their licence to use this script
5 | """
6 |
7 | import os
8 |
9 | import math
10 | import trimesh
11 | import pyrender
12 | import numpy as np
13 | from pyrender.constants import RenderFlags
14 | from lib.models.smpl import get_smpl_faces
15 |
16 | class WeakPerspectiveCamera(pyrender.Camera):
17 | def __init__(self,
18 | scale,
19 | translation,
20 | znear=pyrender.camera.DEFAULT_Z_NEAR,
21 | zfar=None,
22 | name=None):
23 | super(WeakPerspectiveCamera, self).__init__(
24 | znear=znear,
25 | zfar=zfar,
26 | name=name,
27 | )
28 | self.scale = scale
29 | self.translation = translation
30 |
31 | def get_projection_matrix(self, width=None, height=None):
32 | P = np.eye(4)
33 | P[0, 0] = self.scale[0]
34 | P[1, 1] = self.scale[1]
35 | P[0, 3] = self.translation[0] * self.scale[0]
36 | P[1, 3] = -self.translation[1] * self.scale[1]
37 | P[2, 2] = -1
38 | return P
39 |
40 |
41 | class Renderer:
42 | def __init__(self, resolution=(224,224), orig_img=False, wireframe=False):
43 | self.resolution = resolution
44 |
45 | self.faces = get_smpl_faces()
46 | self.orig_img = orig_img
47 | self.wireframe = wireframe
48 | self.renderer = pyrender.OffscreenRenderer(
49 | viewport_width=self.resolution[0],
50 | viewport_height=self.resolution[1],
51 | point_size=1.0
52 | )
53 |
54 | # set the scene
55 | self.scene = pyrender.Scene(bg_color=[0.0, 0.0, 0.0, 0.0], ambient_light=(0.3, 0.3, 0.3))
56 |
57 | light = pyrender.PointLight(color=[1.0, 1.0, 1.0], intensity=1)
58 |
59 | light_pose = np.eye(4)
60 | light_pose[:3, 3] = [0, -1, 1]
61 | self.scene.add(light, pose=light_pose)
62 |
63 | light_pose[:3, 3] = [0, 1, 1]
64 | self.scene.add(light, pose=light_pose)
65 |
66 | light_pose[:3, 3] = [1, 1, 2]
67 | self.scene.add(light, pose=light_pose)
68 |
69 | def set_faces(self, indices):
70 | inter = [np.intersect1d(face, indices, assume_unique=True) for face in self.faces]
71 | idx = [x.size == 3 for x in inter]
72 | self.faces = self.faces[idx]
73 |
74 | def render(self, img, verts, cam, angle=None, axis=None, mesh_filename=None, color=[1.0, 1.0, 0.9]):
75 |
76 | mesh = trimesh.Trimesh(vertices=verts, faces=self.faces, process=False)
77 |
78 | Rx = trimesh.transformations.rotation_matrix(math.radians(180), [1, 0, 0])
79 | mesh.apply_transform(Rx)
80 |
81 | if mesh_filename is not None:
82 | mesh.export(mesh_filename)
83 |
84 | if angle and axis:
85 | R = trimesh.transformations.rotation_matrix(math.radians(angle), axis)
86 | mesh.apply_transform(R)
87 |
88 | sx, sy, tx, ty = cam
89 |
90 | camera = WeakPerspectiveCamera(
91 | scale=[sx, sy],
92 | translation=[tx, ty],
93 | zfar=1000.
94 | )
95 |
96 | material = pyrender.MetallicRoughnessMaterial(
97 | metallicFactor=0.0,
98 | alphaMode='OPAQUE',
99 | baseColorFactor=(color[0], color[1], color[2], 1.0)
100 | )
101 |
102 | mesh = pyrender.Mesh.from_trimesh(mesh, material=material)
103 |
104 | mesh_node = self.scene.add(mesh, 'mesh')
105 |
106 | camera_pose = np.eye(4)
107 | cam_node = self.scene.add(camera, pose=camera_pose)
108 |
109 | if self.wireframe:
110 | render_flags = RenderFlags.RGBA | RenderFlags.ALL_WIREFRAME
111 | else:
112 | render_flags = RenderFlags.RGBA
113 |
114 | rgb, _ = self.renderer.render(self.scene, flags=render_flags)
115 | valid_mask = (rgb[:, :, -1] > 0)[:, :, np.newaxis]
116 | output_img = rgb[:, :, :-1] * valid_mask + (1 - valid_mask) * img
117 | image = output_img.astype(np.uint8)
118 |
119 | self.scene.remove_node(mesh_node)
120 | self.scene.remove_node(cam_node)
121 |
122 | return image
123 |
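124 | # Illustrative usage (variable names are placeholders): overlay an SMPL mesh on
125 | # a frame given weak-perspective camera parameters (sx, sy, tx, ty). Requires
126 | # the SMPL model files expected by lib.models.smpl and an OpenGL-capable
127 | # environment for pyrender's offscreen rendering.
128 | #
129 | #   renderer = Renderer(resolution=(frame.shape[1], frame.shape[0]), orig_img=True)
130 | #   overlay = renderer.render(frame, verts, cam=(sx, sy, tx, ty), color=[0.7, 0.7, 0.7])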
--------------------------------------------------------------------------------
/lib/utils/smooth_bbox.py:
--------------------------------------------------------------------------------
1 | """
2 | This script is borrowed from https://github.com/akanazawa/human_dynamics/blob/master/src/util/smooth_bbox.py
3 | Adhere to their licence to use this script
4 | """
5 |
6 | import numpy as np
7 | import scipy.signal as signal
8 | from scipy.ndimage.filters import gaussian_filter1d
9 |
10 |
11 | def get_smooth_bbox_params(kps, vis_thresh=2, kernel_size=11, sigma=3):
12 | """
13 | Computes smooth bounding box parameters from keypoints:
14 | 1. Computes bbox by rescaling the person to be around 150 px.
15 | 2. Linearly interpolates bbox params for missing annotations.
16 | 3. Median filtering
17 | 4. Gaussian filtering.
18 |
19 | Recommended thresholds:
20 | * detect-and-track: 0
21 | * 3DPW: 0.1
22 |
23 | Args:
24 | kps (list): List of kps (Nx3) or None.
25 | vis_thresh (float): Threshold for visibility.
26 | kernel_size (int): Kernel size for median filtering (must be odd).
27 | sigma (float): Sigma for gaussian smoothing.
28 |
29 | Returns:
30 | Smooth bbox params [cx, cy, scale], start index, end index
31 | """
32 | bbox_params, start, end = get_all_bbox_params(kps, vis_thresh)
33 | smoothed = smooth_bbox_params(bbox_params, kernel_size, sigma)
34 | smoothed = np.vstack((np.zeros((start, 3)), smoothed))
35 | return smoothed, start, end
36 |
37 |
38 | def kp_to_bbox_param(kp, vis_thresh):
39 | """
40 | Finds the bounding box parameters from the 2D keypoints.
41 |
42 | Args:
43 | kp (Kx3): 2D Keypoints.
44 | vis_thresh (float): Threshold for visibility.
45 |
46 | Returns:
47 | [center_x, center_y, scale]
48 | """
49 | if kp is None:
50 | return
51 | vis = kp[:, 2] > vis_thresh
52 | if not np.any(vis):
53 | return
54 | min_pt = np.min(kp[vis, :2], axis=0)
55 | max_pt = np.max(kp[vis, :2], axis=0)
56 | person_height = np.linalg.norm(max_pt - min_pt)
57 | if person_height < 0.5:
58 | return
59 | center = (min_pt + max_pt) / 2.
60 | scale = 150. / person_height
61 | return np.append(center, scale)
62 |
63 |
64 | def get_all_bbox_params(kps, vis_thresh=2):
65 | """
66 | Finds bounding box parameters for all keypoints.
67 |
68 | Look for sequences in the middle with no predictions and linearly
69 | interpolate the bbox params for those
70 |
71 | Args:
72 | kps (list): List of kps (Kx3) or None.
73 | vis_thresh (float): Threshold for visibility.
74 |
75 | Returns:
76 | bbox_params, start_index (incl), end_index (excl)
77 | """
78 | # keeps track of how many indices in a row with no prediction
79 | num_to_interpolate = 0
80 | start_index = -1
81 | bbox_params = np.empty(shape=(0, 3), dtype=np.float32)
82 |
83 | for i, kp in enumerate(kps):
84 | bbox_param = kp_to_bbox_param(kp, vis_thresh=vis_thresh)
85 | if bbox_param is None:
86 | num_to_interpolate += 1
87 | continue
88 |
89 | if start_index == -1:
90 | # Found the first index with a prediction!
91 | start_index = i
92 | num_to_interpolate = 0
93 |
94 | if num_to_interpolate > 0:
95 | # Linearly interpolate each param.
96 | previous = bbox_params[-1]
97 | # This will be 3x(n+2)
98 | interpolated = np.array(
99 | [np.linspace(prev, curr, num_to_interpolate + 2)
100 | for prev, curr in zip(previous, bbox_param)])
101 | bbox_params = np.vstack((bbox_params, interpolated.T[1:-1]))
102 | num_to_interpolate = 0
103 | bbox_params = np.vstack((bbox_params, bbox_param))
104 |
105 | return bbox_params, start_index, i - num_to_interpolate + 1
106 |
107 |
108 | def smooth_bbox_params(bbox_params, kernel_size=11, sigma=8):
109 | """
110 | Applies median filtering and then gaussian filtering to bounding box
111 | parameters.
112 |
113 | Args:
114 | bbox_params (Nx3): [cx, cy, scale].
115 | kernel_size (int): Kernel size for median filtering (must be odd).
116 | sigma (float): Sigma for gaussian smoothing.
117 |
118 | Returns:
119 | Smoothed bounding box parameters (Nx3).
120 | """
121 | smoothed = np.array([signal.medfilt(param, kernel_size)
122 | for param in bbox_params.T]).T
123 | return np.array([gaussian_filter1d(traj, sigma) for traj in smoothed.T]).T
124 |
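125 | # Minimal self-check (illustrative; the synthetic keypoints are arbitrary):
126 | # push a short track with one missing detection through the smoothing pipeline.
127 | if __name__ == '__main__':
128 |     rng = np.random.RandomState(0)
129 |     kps = [np.hstack([rng.rand(25, 2) * 200, np.ones((25, 1))]) for _ in range(8)]
130 |     kps[3] = None  # simulate a frame with no detection
131 |     params, start, end = get_smooth_bbox_params(
132 |         kps, vis_thresh=0.5, kernel_size=3, sigma=1)
133 |     print(params.shape, start, end)  # -> (8, 3) bbox params [cx, cy, scale], 0, 8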
--------------------------------------------------------------------------------
/lib/utils/utils.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | This script is brought from https://github.com/mkocabas/VIBE
4 | Adhere to their licence to use this script
5 | """
6 |
7 | import os
8 | import yaml
9 | import time
10 | import torch
11 | import shutil
12 | import logging
13 | import operator
14 | import torch
15 | from tqdm import tqdm
16 | from os import path as osp
17 | from functools import reduce
18 | from typing import List, Union
19 |
20 |
21 | def move_dict_to_device(dic, device, tensor2float=False):
22 | for k,v in dic.items():
23 | if isinstance(v, torch.Tensor):
24 | if tensor2float:
25 | dic[k] = v.float().to(device)
26 | else:
27 | dic[k] = v.to(device)
28 | elif isinstance(v, dict):
29 | move_dict_to_device(v, device)
30 |
31 |
32 | def get_from_dict(dict, keys):
33 | return reduce(operator.getitem, keys, dict)
34 |
35 |
36 | def tqdm_enumerate(iter, desc=""):
37 | i = 0
38 | for y in tqdm(iter, desc=desc):
39 | yield i, y
40 | i += 1
41 |
42 |
43 | def iterdict(d):
44 | for k,v in d.items():
45 | if isinstance(v, dict):
46 | d[k] = dict(v)
47 | iterdict(v)
48 | return d
49 |
50 |
51 | def accuracy(output, target):
52 | _, pred = output.topk(1)
53 | pred = pred.view(-1)
54 |
55 | correct = pred.eq(target).sum()
56 |
57 | return correct.item(), target.size(0) - correct.item()
58 |
59 |
60 | def lr_decay(optimizer, step, lr, decay_step, gamma):
61 | lr = lr * gamma ** (step/decay_step)
62 | for param_group in optimizer.param_groups:
63 | param_group['lr'] = lr
64 | return lr
65 |
66 |
67 | def step_decay(optimizer, step, lr, decay_step, gamma):
68 | lr = lr * gamma ** (step / decay_step)
69 | for param_group in optimizer.param_groups:
70 | param_group['lr'] = lr
71 | return lr
72 |
73 |
74 | def read_yaml(filename):
75 |     return yaml.load(open(filename, 'r'), Loader=yaml.FullLoader)
76 |
77 |
78 | def write_yaml(filename, object):
79 | with open(filename, 'w') as f:
80 | yaml.dump(object, f)
81 |
82 |
83 | def save_cfgnode_to_yaml(cfgnode, filename, mode='w'):
84 | with open(filename, mode) as f:
85 | f.write(cfgnode.dump())
86 |
87 |
88 | def save_to_file(obj, filename, mode='w'):
89 | with open(filename, mode) as f:
90 | f.write(obj)
91 |
92 |
93 | def concatenate_dicts(dict_list, dim=0):
94 | rdict = dict.fromkeys(dict_list[0].keys())
95 | for k in rdict.keys():
96 | rdict[k] = torch.cat([d[k] for d in dict_list], dim=dim)
97 | return rdict
98 |
99 |
100 | def bool_to_string(x: Union[List[bool],bool]) -> Union[List[str],str]:
101 | """
102 | boolean to string conversion
103 | :param x: list or bool to be converted
104 | :return: string converted thing
105 |     :return: list of strings (a single bool is wrapped in a one-element list)
106 | if isinstance(x, bool):
107 | return [str(x)]
108 | for i, j in enumerate(x):
109 | x[i]=str(j)
110 | return x
111 |
112 |
113 | def checkpoint2model(checkpoint, key='gen_state_dict'):
114 | state_dict = checkpoint[key]
115 | print(f'Performance of loaded model on 3DPW is {checkpoint["performance"]:.2f}mm')
116 | # del state_dict['regressor.mean_theta']
117 | return state_dict
118 |
119 |
120 | def get_optimizer(model, optim_type, lr, weight_decay, momentum):
121 | if optim_type in ['sgd', 'SGD']:
122 | opt = torch.optim.SGD(
123 | lr=lr,
124 | params=[{'params': p, 'name': n} for n, p in model.named_parameters()],
125 | momentum=momentum
126 | )
127 | elif optim_type in ['Adam', 'adam', 'ADAM']:
128 | opt = torch.optim.Adam(
129 | lr=lr,
130 | params=[{'params': p, 'name': n} for n, p in model.named_parameters()],
131 | weight_decay=weight_decay
132 | )
133 | else:
134 | raise ModuleNotFoundError
135 | return opt
136 |
137 |
138 | def create_logger(logdir, phase='train'):
139 | os.makedirs(logdir, exist_ok=True)
140 |
141 | log_file = osp.join(logdir, f'{phase}_log.txt')
142 |
143 | head = '%(asctime)-15s %(message)s'
144 | logging.basicConfig(filename=log_file,
145 | format=head)
146 | logger = logging.getLogger()
147 | logger.setLevel(logging.INFO)
148 | console = logging.StreamHandler()
149 | logging.getLogger('').addHandler(console)
150 |
151 | return logger
152 |
153 |
154 | class AverageMeter(object):
155 | def __init__(self):
156 | self.val = 0
157 | self.avg = 0
158 | self.sum = 0
159 | self.count = 0
160 |
161 | def update(self, val, n=1):
162 | self.val = val
163 | self.sum += val * n
164 | self.count += n
165 | self.avg = self.sum / self.count
166 |
167 |
168 | def prepare_output_dir(cfg, cfg_file):
169 |
170 | # ==== create logdir
171 | logtime = time.strftime('%Y-%m-%d:_%H-%M-%S')
172 | logdir = f'{logtime}_{cfg.EXP_NAME}'
173 |
174 | logdir = osp.join(cfg.OUTPUT_DIR, logdir)
175 | os.makedirs(logdir, exist_ok=True)
176 |
177 | cfg.LOGDIR = logdir
178 | #cfg.TRAIN.PRETRAINED = osp.join(logdir, "model_best.pth.tar")
179 |
180 | # save config
181 | save_cfgnode_to_yaml(cfg, osp.join(cfg.LOGDIR, 'config.yaml'))
182 |
183 | return cfg
184 |
185 | def determine_output_feature_dim(inp_size, model):
186 | with torch.no_grad():
187 | # FIXME this is hacky, but most reliable way of determining the exact dim of the output feature
188 | # map for all networks, the feature metadata has reliable channel and stride info, but using
189 | # stride to calc feature dim requires info about padding of each stage that isn't captured.
190 | training = model.training
191 | if training:
192 | model.eval()
193 | o = model(torch.zeros(inp_size))
194 | if isinstance(o, (list, tuple)):
195 | o = o[-1] # last feature if backbone outputs list/tuple of features
196 | feature_size = o.shape[-2:]
197 | feature_dim = o.shape[1]
198 | model.train(training)
199 | return feature_size, feature_dim
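200 | 
201 | 
202 | # Minimal usage sketch (illustrative; the toy backbone is a stand-in for a real
203 | # one): probe a model's output feature shape and keep a running average with
204 | # the helpers above.
205 | if __name__ == '__main__':
206 |     backbone = torch.nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
207 |     feat_size, feat_dim = determine_output_feature_dim((1, 3, 64, 64), backbone)
208 |     print(feat_size, feat_dim)  # torch.Size([32, 32]) 8
209 | 
210 |     meter = AverageMeter()
211 |     for v in [1.0, 2.0, 3.0]:
212 |         meter.update(v)
213 |     print(meter.avg)  # 2.0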
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | tqdm
2 | yacs
3 | numpy
4 | smplx==0.1.13
5 | h5py
6 | joblib
7 | tensorboard
8 | scikit-image
9 | scikit-video
10 | opencv-python
11 | trimesh
12 | pyrender
13 | scipy
14 | # chumpy ##git+https://github.com/mattloper/chumpy.git
--------------------------------------------------------------------------------
/scripts/eval.sh:
--------------------------------------------------------------------------------
1 | export PYTHONPATH="./:$PYTHONPATH"
2 | srun \
3 | --partition=innova \
4 | --nodes=1 \
5 | --ntasks-per-node=1 \
6 | --gres=gpu:1 \
7 | python eval.py --cfg $1 --pretrained $2 --eval_ds $3 --eval_set $4
8 |
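9 | # Example (arguments are illustrative): evaluate a checkpoint with a config file,
10 | # a dataset name and a split understood by eval.py, e.g.
11 | #   sh scripts/eval.sh configs/baseline_phase3.yaml model_best.pth.tar 3dpw test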
--------------------------------------------------------------------------------
/scripts/prepare_insta.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | n=1
3 | for ((i=0;i<$n;i++))
4 | do
5 | srun \
6 | --job-name=insta_data \
7 | --kill-on-bad-exit=1 \
8 | python lib/data_utils/insta_utils_imgs.py --inp_dir ./data/insta_variety --n $n --i $i >log/$i.log 2>&1 &
9 | done
10 |
--------------------------------------------------------------------------------
/scripts/prepare_training_data.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | #mkdir -p ./data/vibe_db
4 | export PYTHONPATH="./:$PYTHONPATH"
5 |
6 | # AMASS
7 | #python lib/data_utils/amass_utils.py --dir ./data/amass
8 |
9 | # InstaVariety
10 | # Comment this if you already downloaded the preprocessed file
11 | #python lib/data_utils/insta_utils.py --dir ./data/insta_variety
12 |
13 | # 3DPW
14 | #python lib/data_utils/threedpw_utils.py --dir ./data/3dpw
15 |
16 | # MPI-INF-3D-HP
17 | python lib/data_utils/mpii3d_utils.py --dir ./data/mpi_inf_3dhp
18 |
19 | # PoseTrack
20 | python lib/data_utils/posetrack_utils.py --dir ./data/posetrack
21 |
22 | # PennAction
23 | python lib/data_utils/penn_action_utils.py --dir ./data/penn_action
24 |
--------------------------------------------------------------------------------
/scripts/run.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | ### generate a random 5-digit port number
4 | function rand(){
5 | min=$1
6 | max=$(($2-$min+1))
7 | num=$(($RANDOM+1000000000))
8 | echo $(($num%$max+$min))
9 | }
10 | export MASTER_PORT=$(rand 10000 20000)
11 | echo "MASTER_PORT="$MASTER_PORT
12 |
13 | export PYTHONPATH="./:$PYTHONPATH"
14 | srun \
15 | --mpi=pmi2 \
16 | --partition=innova \
17 | --nodes=$1 \
18 | --ntasks-per-node=$2 \
19 | --gres=gpu:$2 \
20 | --kill-on-bad-exit=1 \
21 |     python train_hvd.py --cfg=$3 --pretrained=$4
22 |
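23 | # Example (arguments are illustrative): 1 node x 8 GPUs, a training config and
24 | # a pretrained checkpoint:
25 | #   sh scripts/run.sh 1 8 configs/baseline_phase1.yaml data/pretrained_weights.pth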
--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
1 | [flake8]
2 | max-line-length = 88
3 | ignore = F401,E402,F403,W503,W504
--------------------------------------------------------------------------------