├── .gitignore ├── DATASET.md ├── README.md ├── configs ├── all_linear │ ├── mpii_heatmap.json │ ├── pushup_heatmap.json │ └── pushup_regression.json ├── mpii │ ├── config_blazepose_mpii_heatmap_bce.json │ ├── config_blazepose_mpii_heatmap_bce_regress_huber.json │ ├── config_blazepose_mpii_pushup_heatmap_bce.json │ ├── config_blazepose_mpii_pushup_heatmap_bce_regress_huber.json │ └── pushup_2head.json └── pushup_recognition.json ├── convert_to_onnx.py ├── images └── blazepose_full.png ├── requirements.txt ├── run_video.py ├── src ├── __init__.py ├── data_loaders │ ├── __init__.py │ ├── augmentation.py │ ├── augmentation2.py │ ├── augmentation_utils.py │ ├── humanpose.py │ ├── humanpose_2head.py │ └── pushup_recognition.py ├── metrics │ ├── f1.py │ ├── mae.py │ └── pck.py ├── models │ ├── __init__.py │ ├── blazepose_all_linear.py │ ├── blazepose_full.py │ ├── blazepose_layers.py │ ├── blazepose_legacy.py │ ├── blazepose_with_pushup_classify.py │ └── pushup_recognition.py ├── train_phase.py ├── trainers │ ├── __init__.py │ ├── blazepose_trainer.py │ ├── losses.py │ └── pushup_recognition_trainer.py └── utils │ ├── __init__.py │ ├── heatmap.py │ ├── keypoints.py │ ├── pre_processing.py │ └── visualizer.py ├── test.py ├── tools ├── lsp_data_to_json.py ├── merge_lsp.py ├── merge_lsp_lspet_pushup.py ├── merge_mpii_pushup.py ├── process_data.py ├── process_pushup_data.py ├── split_data_mpii.py ├── split_lsp_lspet.py ├── split_lsp_lspet_7points.py ├── split_mpii.py └── visualize_data.py └── train.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | __pycache__ 3 | /.vscode 4 | /experiments* 5 | !/experiments/.gitkeep 6 | /data 7 | !/data/.gitkeep 8 | /.vscode 9 | /trained_models 10 | !/trained_models/.gitkeep -------------------------------------------------------------------------------- /DATASET.md: -------------------------------------------------------------------------------- 1 | # DATASET 2 | 3 | ### I. Our custom dataset format 4 | 5 | All data should be converted to our custom dataset format before being used for training. Our format has this folder structure: 6 | ``` 7 | dataset_name/ 8 | images/ 9 | train.json 10 | val.json 11 | test.json 12 | ``` 13 | 14 | - `images` is a folder containing image files. 15 | - `train.json`, `val.json`, `test.json` are annotation files. Here are an example of labels in these files: 16 | 17 | ``` 18 | [ 19 | { 20 | "image": "001.png", 21 | "points": [[280, 540], [315, 468], [356, 354], [354, 243], [471, 331], [514, 440], [546, 540]], 22 | "visibility": [1, 1, 1, 1, 0, 0, 1] 23 | } 24 | { 25 | "image": "002.png", 26 | "points": [[269, 529], [289, 465], [305, 410], [310, 309], [455, 358], [542, 429], [560, 542]], 27 | "visibility": [1, 0, 0, 1, 1, 1, 1] 28 | }, 29 | ... 30 | ] 31 | ``` 32 | 33 | ### II. LSP and LSPET 34 | 35 | - Link to LSP dataset: . 36 | - Link to LSPET dataset: . 37 | 38 | #### 1. Convert annotation to JSON format 39 | 40 | - The annotation contains x and y locations and a binary value indicating the visbility of joints. 41 | - Use `tools/lsp_data_to_json.py` to convert LSP and LSPET annotation files to json format: 42 | - **NOTE:** We removed 6061 images from LSPET dataset due to missing points. 
43 | 44 | ``` 45 | python tools/lsp_data_to_json.py --image_folder=data/lsp_dataset/images --input_file data/lsp_dataset/joints.mat --output_file data/lsp_dataset/labels.json 46 | python tools/lsp_data_to_json.py --image_folder=data/lspet_dataset/images --input_file data/lspet_dataset/joints.mat --output_file data/lspet_dataset/labels.json 47 | ``` 48 | 49 | #### 2. Merge 2 dataset and divide into subsets 50 | 51 | + Training: 3739 from LSPET and 1800 from LSP. 52 | + Validation: 100 from LSPET and 100 from LSP. 53 | + Test: 100 from LSPET and 100 from LSP. 54 | 55 | Please update paths to LSP and LSPET in `tools/split_lsp_lspet.py` and run: 56 | 57 | ``` 58 | python tools/split_lsp_lspet.py 59 | ``` 60 | 61 | 62 | ### III. MPII Humanpose 63 | 64 | - We only use images with numOtherPeople = 0. The original dataset are divided into 3 subsets: 65 | 66 | + Training: 9503 images. 67 | + Validation: 1000 images. 68 | + Test: 1000 images. 69 | 70 | 71 | ### IV. PushUp dataset 72 | 73 | We have push-up 420 videos, divided in 3 sets: 74 | 75 | + Training: 8837 images from 317 videos. 76 | + Validation: 1189 images from 41 videos. 77 | + Test: 1013 images from 62 videos. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # BlazePose Tensorflow 2.x 2 | 3 | This is an implementation of Google BlazePose in Tensorflow 2.x. The original paper is "BlazePose: On-device Real-time Body Pose tracking" by Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, and Matthias Grundmann, which is available on [arXiv](https://arxiv.org/abs/2006.10204). You can find some demonstrations of BlazePose from [Google blog](https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html). 4 | 5 | Currently, the model being developed in this repo is based on TFLite (.tflite) model from [here](https://github.com/PINTO0309/PINTO_model_zoo/tree/master/058_BlazePose_Full_Keypoints/01_Accurate). I use [Netron.app](https://netron.app/) to visualize the architecture and try to mimic that architecture in my implementation. The visualized model architecture can be found [here](images/blazepose_full.png). Other architectures will be added in the future. 6 | 7 | **Note:** This repository is still under active development. 8 | 9 | **Update 14/12/2020:** Our PushUp Counter App is using this BlazePose model to count pushups from videos/webcam. [***Read more.***](https://github.com/vietanhdev/pushup-counter-app) 10 | 11 | ## TODOs 12 | 13 | - [ ] Implementation 14 | 15 | - [x] Initialize code for model from .tflite file. 16 | 17 | - [x] Basic dataset loader 18 | 19 | - [x] Implement loss function. 20 | 21 | - [x] Implement training code. 22 | 23 | - [x] Advanced augmentation: Random occlusion (BlazePose paper) 24 | 25 | - [x] Implement demo code for video and webcam. 26 | 27 | - [x] Support PCK metric. 28 | 29 | - [ ] Implement testing code. 30 | 31 | - [ ] Add training graph and pretrained models. 32 | 33 | - [ ] Support offset maps. 34 | 35 | - [ ] Experiment with other loss functions. 36 | 37 | - [ ] Workout counting from keypoints. 38 | 39 | - [ ] Rewrite in eager mode. 40 | 41 | - [ ] Datasets 42 | 43 | - [x] Support LSP dataset and LSPET dataset (partially). [More](DATASET.md). 44 | 45 | - [x] Support PushUps dataset. 46 | 47 | - [x] Support MPII dataset. 48 | 49 | - [ ] Support YOGA-82 dataset. 50 | 51 | - [ ] Custom dataset. 
52 | 53 | - [ ] Convert and run model in TF Lite format. 54 | 55 | - [ ] Convert and run model in TensorRT. 56 | 57 | - [ ] Convert and run model in Tensorflow.js. 58 | 59 | ## Demo 60 | 61 | - Download pretrained model for PushUp dataset [here](https://1drv.ms/u/s!Av71xxzl6mYZgddJ7IdF0wfjwI3sgw?e=l94WL5) and put into `trained_models/blazepose_pushup_v1.h5`. Test with your webcam: 62 | 63 | ``` 64 | python run_video.py -c configs/mpii/config_blazepose_mpii_pushup_heatmap_bce_regress_huber.json -m trained_models/blazepose_pushup_v1.h5 -v webcam --confidence 0.3 65 | ``` 66 | 67 | The pretrained model is only in experimental state now. It only detects 7 keypoints for Push Up counting and it may not produce a good result now. I will update other models in the future. 68 | 69 | ## Training 70 | 71 | **NOTE:** Currently, I only focus on PushUp datase, which contains 7 keypoints. Due to the copyright of this dataset, I don't have permission to publish it on the Internet. You can read the instruction and try with your own dataset. 72 | 73 | - Prepare dataset using instruction from [DATASET.md](DATASET.md). 74 | 75 | - Training heatmap branch: 76 | 77 | ``` 78 | python train.py -c configs/mpii/config_blazepose_mpii_pushup_heatmap_bce.json 79 | ``` 80 | 81 | - After heatmap branch converged, set `load_weights` to `true` and update the `pretrained_weights_path` to the best model, and continue with the regression branch: 82 | 83 | ``` 84 | python train.py -c configs/mpii/config_blazepose_mpii_pushup_heatmap_bce_regress_huber.json 85 | ``` 86 | 87 | ## Reference 88 | 89 | - Cite the original paper: 90 | 91 | ```tex 92 | @article{Bazarevsky2020BlazePoseOR, 93 | title={BlazePose: On-device Real-time Body Pose tracking}, 94 | author={Valentin Bazarevsky and I. Grishchenko and K. Raveendran and Tyler Lixuan Zhu and Fangfang Zhang and M. Grundmann}, 95 | journal={ArXiv}, 96 | year={2020}, 97 | volume={abs/2006.10204} 98 | } 99 | ``` 100 | 101 | This source code uses some code and ideas from these repos: 102 | 103 | - https://fairyonice.github.io/Achieving-top-5-in-Kaggles-facial-keypoints-detection-using-FCN.html 104 | - https://github.com/yuanyuanli85/Stacked_Hourglass_Network_Keras 105 | 106 | ## Contributions 107 | 108 | Please feel free to [submit an issue](https://github.com/vietanhdev/tf-blazepose/issues) or [pull a request](https://github.com/vietanhdev/tf-blazepose/pulls). 
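## Single-image inference

For quick tests without the video loop, the pipeline in `run_video.py` can be reduced to a few lines. The sketch below is a minimal example assembled from the same calls used in `run_video.py`; the config and model paths come from the Demo section above, and `example.jpg` is a placeholder input image:

```
import importlib
import json

import cv2
import numpy as np

from src.utils.heatmap import find_keypoints_from_heatmap

with open("configs/mpii/config_blazepose_mpii_pushup_heatmap_bce_regress_huber.json") as f:
    config = json.load(f)

# Load the trainer and data loader modules named in the config
trainer = importlib.import_module("src.trainers.{}".format(config["trainer"]))
datalib = importlib.import_module("src.data_loaders.{}".format(config["data_loader"]))
model = trainer.load_model(config, "trained_models/blazepose_pushup_v1.h5")

# Preprocess a single image and run the model
image = cv2.imread("example.jpg")
net_input = cv2.resize(image, (config["model"]["im_width"], config["model"]["im_height"]))
net_input = datalib.DataSequence.preprocess_images(np.array([net_input]))
regress_kps, heatmap = model.predict(net_input)

# Decode heatmap keypoints and scale them back to the original image size
heatmap_kps = np.array(find_keypoints_from_heatmap(heatmap)[0], dtype=float)
scale = np.array([image.shape[1] / config["model"]["im_width"],
                  image.shape[0] / config["model"]["im_height"]])
stride = np.array([config["model"]["im_width"] / config["model"]["heatmap_width"],
                   config["model"]["im_height"] / config["model"]["heatmap_height"]])
heatmap_kps[:, :2] = heatmap_kps[:, :2] * stride * scale
print(heatmap_kps)  # one (x, y, confidence) row per keypoint
```

The regression-head output `regress_kps` is normalized to [0, 1]; multiply it by the original image width and height to get pixel coordinates, as `run_video.py` does.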
109 | 110 | -------------------------------------------------------------------------------- /configs/all_linear/mpii_heatmap.json: -------------------------------------------------------------------------------- 1 | { 2 | "experiment_name": "all_linear_mpii_heatmap", 3 | "trainer": "blazepose_trainer", 4 | "data_loader": "humanpose", 5 | "data": { 6 | "train_images": "data/mpii/images/", 7 | "train_labels": "data/mpii/train.json", 8 | "val_images": "data/mpii/images/", 9 | "val_labels": "data/mpii/val.json", 10 | "test_images": "data/mpii/images/", 11 | "test_labels": "data/mpii/test.json", 12 | "symmetry_point_ids": [[12, 13], [10, 15], [11, 14], [2, 3], [1, 4], [0, 5]] 13 | }, 14 | "model" : { 15 | "im_width": 256, 16 | "im_height": 256, 17 | "heatmap_width": 128, 18 | "heatmap_height": 128, 19 | "heatmap_kp_sigma": 4, 20 | "num_keypoints": 16, 21 | "model_type": "ALL_LINEAR_TWO_HEAD" 22 | }, 23 | "train": { 24 | "train_phase": "HEATMAP", 25 | "heatmap_loss": "huber", 26 | "keypoint_loss": "huber", 27 | "loss_weights": {"heatmap": 1.0, "joints": 0.0}, 28 | "train_batch_size": 32, 29 | "val_batch_size": 32, 30 | "nb_epochs": 1000, 31 | "learning_rate": 5e-4, 32 | "load_weights": false, 33 | "pretrained_weights_path": "", 34 | "initial_epoch": 0 35 | }, 36 | "test": { 37 | "pck_ref_points_idxs" : [8, 9], 38 | "pck_thresh": 0.5 39 | } 40 | } 41 | -------------------------------------------------------------------------------- /configs/all_linear/pushup_heatmap.json: -------------------------------------------------------------------------------- 1 | { 2 | "experiment_name": "pushup_heatmap", 3 | "trainer": "blazepose_trainer", 4 | "data_loader": "humanpose", 5 | "data": { 6 | "train_images": "data/mpii_pushup/images/", 7 | "train_labels": "data/mpii_pushup/train.json", 8 | "val_images": "data/mpii_pushup/images/", 9 | "val_labels": "data/mpii_pushup/val.json", 10 | "test_images": "data/mpii_pushup/images/", 11 | "test_labels": "data/mpii_pushup/test.json", 12 | "symmetry_point_ids": [[0,6], [1,5], [2,4]] 13 | }, 14 | "model" : { 15 | "im_width": 256, 16 | "im_height": 256, 17 | "heatmap_width": 128, 18 | "heatmap_height": 128, 19 | "heatmap_kp_sigma": 4, 20 | "num_keypoints": 7, 21 | "model_type": "ALL_LINEAR_TWO_HEAD" 22 | }, 23 | "train": { 24 | "train_phase": "HEATMAP", 25 | "heatmap_loss": "huber", 26 | "keypoint_loss": "huber", 27 | "loss_weights": {"heatmap": 1.0, "joints": 0.0}, 28 | "train_batch_size": 32, 29 | "val_batch_size": 32, 30 | "nb_epochs": 1000, 31 | "learning_rate": 1e-4, 32 | "load_weights": false, 33 | "pretrained_weights_path": "", 34 | "initial_epoch": 0 35 | }, 36 | "test": { 37 | "pck_ref_points_idxs" : [2, 4], 38 | "pck_thresh": 0.25 39 | } 40 | } 41 | -------------------------------------------------------------------------------- /configs/all_linear/pushup_regression.json: -------------------------------------------------------------------------------- 1 | { 2 | "experiment_name": "pushup_regression", 3 | "trainer": "blazepose_trainer", 4 | "data_loader": "humanpose", 5 | "data": { 6 | "train_images": "data/mpii_pushup/images/", 7 | "train_labels": "data/mpii_pushup/train.json", 8 | "val_images": "data/mpii_pushup/images/", 9 | "val_labels": "data/mpii_pushup/val.json", 10 | "test_images": "data/mpii_pushup/images/", 11 | "test_labels": "data/mpii_pushup/test.json", 12 | "symmetry_point_ids": [[0,6], [1,5], [2,4]] 13 | }, 14 | "model" : { 15 | "im_width": 256, 16 | "im_height": 256, 17 | "heatmap_width": 128, 18 | "heatmap_height": 128, 19 | "heatmap_kp_sigma": 
4, 20 | "num_keypoints": 7, 21 | "model_type": "ALL_LINEAR_TWO_HEAD" 22 | }, 23 | "train": { 24 | "train_phase": "REGRESSION", 25 | "heatmap_loss": "huber", 26 | "keypoint_loss": "huber", 27 | "loss_weights": {"heatmap": 0.0, "joints": 1.0}, 28 | "train_batch_size": 32, 29 | "val_batch_size": 32, 30 | "nb_epochs": 1000, 31 | "learning_rate": 1e-4, 32 | "load_weights": false, 33 | "pretrained_weights_path": "", 34 | "initial_epoch": 0 35 | }, 36 | "test": { 37 | "pck_ref_points_idxs" : [2, 4], 38 | "pck_thresh": 0.25 39 | } 40 | } 41 | -------------------------------------------------------------------------------- /configs/mpii/config_blazepose_mpii_heatmap_bce.json: -------------------------------------------------------------------------------- 1 | { 2 | "experiment_name": "blazepose_mpii_heatmap_bce", 3 | "trainer": "blazepose_trainer", 4 | "data_loader": "humanpose", 5 | "data": { 6 | "train_images": "data/mpii/images/", 7 | "train_labels": "data/mpii/train.json", 8 | "val_images": "data/mpii/images/", 9 | "val_labels": "data/mpii/val.json", 10 | "test_images": "data/mpii/images/", 11 | "test_labels": "data/mpii/test.json", 12 | "symmetry_point_ids": [[12, 13], [10, 15], [11, 14], [2, 3], [1, 4], [0, 5]] 13 | }, 14 | "model" : { 15 | "im_width": 256, 16 | "im_height": 256, 17 | "heatmap_width": 128, 18 | "heatmap_height": 128, 19 | "heatmap_kp_sigma": 4, 20 | "num_keypoints": 16, 21 | "model_type": "SIGMOID_HEATMAP_SIGMOID_REGRESS_TWO_HEAD" 22 | }, 23 | "train": { 24 | "train_phase": "HEATMAP", 25 | "heatmap_loss": "binary_crossentropy", 26 | "keypoint_loss": "huber", 27 | "loss_weights": {"heatmap": 1.0, "joints": 0.0}, 28 | "train_batch_size": 16, 29 | "val_batch_size": 16, 30 | "nb_epochs": 1000, 31 | "learning_rate": 1e-4, 32 | "load_weights": false, 33 | "pretrained_weights_path": "", 34 | "initial_epoch": 0 35 | }, 36 | "test": { 37 | "pck_ref_points_idxs" : [8, 9], 38 | "pck_thresh": 0.5 39 | } 40 | } 41 | -------------------------------------------------------------------------------- /configs/mpii/config_blazepose_mpii_heatmap_bce_regress_huber.json: -------------------------------------------------------------------------------- 1 | { 2 | "experiment_name": "blazepose_mpii_heatmap_bce_regress_huber", 3 | "trainer": "blazepose_trainer", 4 | "data_loader": "humanpose", 5 | "data": { 6 | "train_images": "data/mpii/images/", 7 | "train_labels": "data/mpii/train.json", 8 | "val_images": "data/mpii/images/", 9 | "val_labels": "data/mpii/val.json", 10 | "test_images": "data/mpii/images/", 11 | "test_labels": "data/mpii/test.json", 12 | "symmetry_point_ids": [[12, 13], [10, 15], [11, 14], [2, 3], [1, 4], [0, 5]] 13 | }, 14 | "model" : { 15 | "im_width": 256, 16 | "im_height": 256, 17 | "heatmap_width": 128, 18 | "heatmap_height": 128, 19 | "heatmap_kp_sigma": 4, 20 | "num_keypoints": 7, 21 | "model_type": "SIGMOID_HEATMAP_LINEAR_REGRESS_TWO_HEAD" 22 | }, 23 | "train": { 24 | "train_phase": "REGRESSION", 25 | "heatmap_loss": "binary_crossentropy", 26 | "keypoint_loss": "huber", 27 | "loss_weights": {"heatmap": 0.0, "joints": 1.0}, 28 | "train_batch_size": 32, 29 | "val_batch_size": 32, 30 | "nb_epochs": 1000, 31 | "learning_rate": 1e-3, 32 | "load_weights": false, 33 | "pretrained_weights_path": "", 34 | "initial_epoch": 0 35 | }, 36 | "test": { 37 | "pck_ref_points_idxs" : [8, 9], 38 | "pck_thresh": 0.5 39 | } 40 | } 41 | -------------------------------------------------------------------------------- /configs/mpii/config_blazepose_mpii_pushup_heatmap_bce.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "experiment_name": "blazepose_mpii_pushup_heatmap_bce", 3 | "trainer": "blazepose_trainer", 4 | "data_loader": "humanpose", 5 | "data": { 6 | "train_images": "data/mpii_pushup/images/", 7 | "train_labels": "data/mpii_pushup/train.json", 8 | "val_images": "data/mpii_pushup/images/", 9 | "val_labels": "data/mpii_pushup/val.json", 10 | "test_images": "data/mpii_pushup/images/", 11 | "test_labels": "data/mpii_pushup/test.json", 12 | "symmetry_point_ids": [[0,6], [1,5], [2,4]] 13 | }, 14 | "model" : { 15 | "im_width": 256, 16 | "im_height": 256, 17 | "heatmap_width": 128, 18 | "heatmap_height": 128, 19 | "heatmap_kp_sigma": 4, 20 | "num_keypoints": 7, 21 | "model_type": "SIGMOID_HEATMAP_LINEAR_REGRESS_TWO_HEAD" 22 | }, 23 | "train": { 24 | "train_phase": "HEATMAP", 25 | "heatmap_loss": "binary_crossentropy", 26 | "keypoint_loss": "huber", 27 | "loss_weights": {"heatmap": 1.0, "joints": 0.0}, 28 | "train_batch_size": 16, 29 | "val_batch_size": 16, 30 | "nb_epochs": 1000, 31 | "learning_rate": 1e-4, 32 | "load_weights": false, 33 | "pretrained_weights_path": "", 34 | "initial_epoch": 0 35 | }, 36 | "test": { 37 | "pck_ref_points_idxs" : [2, 4], 38 | "pck_thresh": 0.25 39 | } 40 | } 41 | -------------------------------------------------------------------------------- /configs/mpii/config_blazepose_mpii_pushup_heatmap_bce_regress_huber.json: -------------------------------------------------------------------------------- 1 | { 2 | "experiment_name": "config_blazepose_mpii_pushup_heatmap_bce_regress_huber", 3 | "trainer": "blazepose_trainer", 4 | "data_loader": "humanpose", 5 | "data": { 6 | "train_images": "data/mpii_pushup/images/", 7 | "train_labels": "data/mpii_pushup/train.json", 8 | "val_images": "data/mpii_pushup/images/", 9 | "val_labels": "data/mpii_pushup/val.json", 10 | "test_images": "data/mpii_pushup/images/", 11 | "test_labels": "data/mpii_pushup/test.json", 12 | "symmetry_point_ids": [[0,6], [1,5], [2,4]] 13 | }, 14 | "model" : { 15 | "im_width": 256, 16 | "im_height": 256, 17 | "heatmap_width": 128, 18 | "heatmap_height": 128, 19 | "heatmap_kp_sigma": 4, 20 | "num_keypoints": 7, 21 | "model_type": "SIGMOID_HEATMAP_LINEAR_REGRESS_TWO_HEAD" 22 | }, 23 | "train": { 24 | "train_phase": "REGRESSION", 25 | "heatmap_loss": "binary_crossentropy", 26 | "keypoint_loss": "huber", 27 | "loss_weights": {"heatmap": 1.0, "joints": 1.0}, 28 | "train_batch_size": 16, 29 | "val_batch_size": 16, 30 | "nb_epochs": 1000, 31 | "learning_rate": 0, 32 | "load_weights": false, 33 | "pretrained_weights_path": "", 34 | "initial_epoch": 0 35 | }, 36 | "test": { 37 | "pck_ref_points_idxs" : [2, 4], 38 | "pck_thresh": 0.25 39 | } 40 | } 41 | -------------------------------------------------------------------------------- /configs/mpii/pushup_2head.json: -------------------------------------------------------------------------------- 1 | { 2 | "experiment_name": "2head_model", 3 | "trainer": "blazepose_trainer2", 4 | "data_loader": "humanpose_2head", 5 | "data": { 6 | "train_images": "data/mpii_pushup/images/", 7 | "train_labels": "data/mpii_pushup/train2.json", 8 | "val_images": "data/mpii_pushup/images/", 9 | "val_labels": "data/mpii_pushup/val2.json", 10 | "test_images": "data/mpii_pushup/images/", 11 | "test_labels": "data/mpii_pushup/test2.json", 12 | "symmetry_point_ids": [[12, 13], [10, 15], [11, 14], [2, 3], [1, 4], [0, 5]] 13 | }, 14 | "model" : { 15 | "im_width": 256, 16 | "im_height": 256, 17 | "heatmap_width": 
128, 18 | "heatmap_height": 128, 19 | "heatmap_kp_sigma": 4, 20 | "num_keypoints": 16, 21 | "model_type": "BLAZEPOSE_WITH_PUSHUP_CLASSIFY" 22 | }, 23 | "train": { 24 | "train_phase": "HEATMAP", 25 | "heatmap_loss": "focal_tversky", 26 | "is_pushup_loss": "binary_crossentropy", 27 | "loss_weights": {"heatmap": 1.0, "is_pushup": 1.0}, 28 | "train_batch_size": 16, 29 | "val_batch_size": 16, 30 | "nb_epochs": 1000, 31 | "learning_rate": 1e-4, 32 | "load_weights": false, 33 | "pretrained_weights_path": "", 34 | "initial_epoch": 0 35 | }, 36 | "test": { 37 | "pck_ref_points_idxs" : [12, 13], 38 | "pck_thresh": 0.25 39 | } 40 | } 41 | -------------------------------------------------------------------------------- /configs/pushup_recognition.json: -------------------------------------------------------------------------------- 1 | { 2 | "experiment_name": "pushup_recognition2", 3 | "trainer": "pushup_recognition_trainer", 4 | "data_loader": "pushup_recognition", 5 | "data": { 6 | "train_images": "data/mpii_pushup/images/", 7 | "train_labels": "data/mpii_pushup/train.json", 8 | "val_images": "data/mpii_pushup/images/", 9 | "val_labels": "data/mpii_pushup/val.json", 10 | "test_images": "data/mpii_pushup/images/", 11 | "test_labels": "data/mpii_pushup/test.json" 12 | }, 13 | "model" : { 14 | "im_width": 224, 15 | "im_height": 224, 16 | "model_type": "PUSHUP_RECOGNITION" 17 | }, 18 | "train": { 19 | "loss": "binary_crossentropy", 20 | "train_batch_size": 16, 21 | "val_batch_size": 16, 22 | "nb_epochs": 1000, 23 | "learning_rate": 5e-4, 24 | "load_weights": false, 25 | "pretrained_weights_path": "experiments/pushup_recognition/models/model_ep003.h5", 26 | "initial_epoch": 0 27 | }, 28 | "test": {} 29 | } 30 | -------------------------------------------------------------------------------- /convert_to_onnx.py: -------------------------------------------------------------------------------- 1 | 2 | import tensorflow.keras.backend as K 3 | K.set_learning_phase(0) 4 | import tensorflow as tf 5 | import keras2onnx 6 | from tensorflow.keras.models import load_model 7 | 8 | MODEL_PATH = "" 9 | model = load_model(MODEL_PATH) 10 | submodel = tf.keras.models.Model(inputs=model.inputs, outputs=model.get_layer("joints").outputs) 11 | submodel._name = "blazepose_heatmap_v1" 12 | print(submodel.summary()) 13 | onnx_model = keras2onnx.convert_keras(submodel, submodel.name) 14 | 15 | file = open("blazepose_heatmap_v1.1.onnx", "wb") 16 | file.write(onnx_model.SerializeToString()) 17 | file.close() -------------------------------------------------------------------------------- /images/blazepose_full.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vietanhdev/tf-blazepose/95db5ced1a4084c6a424161fbfc38c4e70612c97/images/blazepose_full.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vietanhdev/tf-blazepose/95db5ced1a4084c6a424161fbfc38c4e70612c97/requirements.txt -------------------------------------------------------------------------------- /run_video.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import importlib 3 | import json 4 | import cv2 5 | import numpy as np 6 | from src.utils.heatmap import find_keypoints_from_heatmap 7 | from src.utils.visualizer import visualize_keypoints 8 | import tensorflow as tf 9 | 10 | 
for gpu in tf.config.experimental.list_physical_devices('GPU'): 11 | tf.compat.v2.config.experimental.set_memory_growth(gpu, True) 12 | 13 | parser = argparse.ArgumentParser() 14 | parser.add_argument( 15 | '-c', 16 | '--conf_file', default="config.json", 17 | help='Configuration file') 18 | parser.add_argument( 19 | '-m', 20 | '--model', default="model.h5", 21 | help='Path to h5 model') 22 | parser.add_argument( 23 | '-confidence', 24 | '--confidence', 25 | default=0.05, 26 | help='Confidence for heatmap point') 27 | parser.add_argument( 28 | '-v', 29 | '--video', 30 | help='Path to video file') 31 | 32 | args = parser.parse_args() 33 | 34 | # Webcam 35 | if args.video == "webcam": 36 | args.video = 0 37 | 38 | confth = float(args.confidence) 39 | 40 | # Open and load the config json 41 | with open(args.conf_file) as config_buffer: 42 | config = json.loads(config_buffer.read()) 43 | 44 | # Load model 45 | trainer = importlib.import_module("src.trainers.{}".format(config["trainer"])) 46 | model = trainer.load_model(config, args.model) 47 | 48 | # Dataloader 49 | datalib = importlib.import_module("src.data_loaders.{}".format(config["data_loader"])) 50 | DataSequence = datalib.DataSequence 51 | 52 | cap = cv2.VideoCapture(args.video) 53 | cv2.namedWindow("Result", cv2.WINDOW_NORMAL) 54 | while(True): 55 | 56 | ret, origin_frame = cap.read() 57 | 58 | scale = np.array([float(origin_frame.shape[1]) / config["model"]["im_width"], 59 | float(origin_frame.shape[0]) / config["model"]["im_height"]], dtype=float) 60 | 61 | img = cv2.resize(origin_frame, (config["model"]["im_width"], config["model"]["im_height"])) 62 | input_x = DataSequence.preprocess_images(np.array([img])) 63 | 64 | regress_kps, heatmap = model.predict(input_x) 65 | heatmap_kps = find_keypoints_from_heatmap(heatmap)[0] 66 | heatmap_kps = np.array(heatmap_kps) 67 | 68 | # Scale heatmap keypoint 69 | heatmap_stride = np.array([config["model"]["im_width"] / config["model"]["heatmap_width"], 70 | config["model"]["im_height"] / config["model"]["heatmap_height"]], dtype=float) 71 | heatmap_kps[:, :2] = heatmap_kps[:, :2] * scale * heatmap_stride 72 | 73 | # Scale regression keypoint 74 | regress_kps = regress_kps.reshape((-1, 3)) 75 | regress_kps[:, :2] = regress_kps[:, :2] * np.array([origin_frame.shape[1], origin_frame.shape[0]]) 76 | 77 | # Filter heatmap keypoint by confidence 78 | heatmap_kps_visibility = np.ones((len(heatmap_kps),), dtype=int) 79 | for i in range(len(heatmap_kps)): 80 | if heatmap_kps[i, 2] < confth: 81 | heatmap_kps[i, :2] = [-1, -1] 82 | heatmap_kps_visibility[i] = 0 83 | 84 | regress_kps_visibility = np.ones((len(regress_kps),), dtype=int) 85 | for i in range(len(regress_kps)): 86 | if regress_kps[i, 2] < 0.5: 87 | regress_kps[i, :2] = [-1, -1] 88 | regress_kps_visibility[i] = 0 89 | 90 | edges = [[0,1,2,3,4,5,6]] 91 | 92 | draw = origin_frame.copy() 93 | draw = visualize_keypoints(draw, regress_kps[:, :2], visibility=regress_kps_visibility, edges=edges, point_color=(0, 255, 0), text_color=(255, 0, 0)) 94 | draw = visualize_keypoints(draw, heatmap_kps[:, :2], visibility=heatmap_kps_visibility, edges=edges, point_color=(0, 255, 0), text_color=(0, 0, 255)) 95 | cv2.imshow('Result', draw) 96 | 97 | heatmap = np.sum(heatmap[0], axis=2) 98 | heatmap = cv2.resize(heatmap, None, fx=3, fy=3) 99 | heatmap = heatmap * 1.5 100 | cv2.imshow('Heatmap', heatmap) 101 | if cv2.waitKey(1) & 0xFF == ord('q'): 102 | break 103 | 104 | # When everything done, release the capture 105 | cap.release() 106 | cv2.destroyAllWindows() 107 
| -------------------------------------------------------------------------------- /src/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vietanhdev/tf-blazepose/95db5ced1a4084c6a424161fbfc38c4e70612c97/src/__init__.py -------------------------------------------------------------------------------- /src/data_loaders/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vietanhdev/tf-blazepose/95db5ced1a4084c6a424161fbfc38c4e70612c97/src/data_loaders/__init__.py -------------------------------------------------------------------------------- /src/data_loaders/augmentation.py: -------------------------------------------------------------------------------- 1 | import random 2 | 3 | import cv2 4 | import imgaug as ia 5 | import numpy as np 6 | from imgaug import augmenters as iaa 7 | 8 | from .augmentation_utils import add_vertical_reflection 9 | 10 | seq = [None] 11 | 12 | 13 | def load_aug(): 14 | 15 | def sometimes(aug): return iaa.Sometimes(0.2, aug) 16 | 17 | seq[0] = iaa.Sequential( 18 | [ 19 | # crop images by -5% to 10% of their height/width 20 | sometimes(iaa.CropAndPad( 21 | percent=(-0.05, 0.1), 22 | pad_mode=ia.ALL, 23 | pad_cval=(0, 255) 24 | )), 25 | sometimes(iaa.Affine( 26 | scale={"x": (0.9, 1.1), "y": (0.9, 1.1)}, 27 | translate_percent={"x": (-0.05, 0.05), "y": (-0.05, 0.05)}, 28 | rotate=(-10, 10), 29 | shear=(-5, 5), 30 | order=[0, 1], 31 | # if mode is constant, use a cval between 0 and 255 32 | cval=(0, 255), 33 | # use any of scikit-image's warping modes (see 2nd image from the top for examples) 34 | mode=ia.ALL 35 | )), 36 | iaa.Sometimes(0.1, iaa.MotionBlur(k=15, angle=[-45, 45])), 37 | # execute 0 to 5 of the following (less important) augmenters per image 38 | # don't execute all of them, as that would often be way too strong 39 | iaa.SomeOf((0, 5), 40 | [ 41 | iaa.OneOf([ 42 | iaa.GaussianBlur((0, 3.0)), 43 | iaa.AverageBlur(k=(2, 5)), 44 | iaa.MedianBlur(k=(3, 5)), 45 | ]), 46 | iaa.Sharpen(alpha=(0, 1.0), lightness=( 47 | 0.75, 1.5)), # sharpen images 48 | # add gaussian noise to images 49 | iaa.AdditiveGaussianNoise(loc=0, scale=( 50 | 0.0, 0.05*255), per_channel=0.5), 51 | # change brightness of images (by -10 to 10 of original value) 52 | iaa.Add((-10, 10), per_channel=0.5), 53 | # change hue and saturation 54 | iaa.AddToHueAndSaturation((-20, 20)), 55 | # either change the brightness of the whole image (sometimes 56 | # per channel) or change the brightness of subareas 57 | iaa.OneOf([ 58 | iaa.Multiply((0.5, 1.5), per_channel=0.5), 59 | iaa.FrequencyNoiseAlpha( 60 | exponent=(-4, 0), 61 | first=iaa.Multiply((0.5, 1.5), per_channel=True), 62 | second=iaa.LinearContrast((0.5, 2.0)) 63 | ) 64 | ]), 65 | # improve or worsen the contrast 66 | iaa.LinearContrast((0.5, 2.0), per_channel=0.5), 67 | iaa.Grayscale(alpha=(0.0, 1.0)), 68 | ], 69 | random_order=True 70 | ) 71 | ], 72 | random_order=True 73 | ) 74 | 75 | 76 | def augment_img(image, landmark=None): 77 | if seq[0] is None: 78 | load_aug() 79 | 80 | if landmark is None: 81 | image_aug = seq[0](images=np.array([image])) 82 | return image_aug[0] 83 | else: 84 | 85 | landmark_xy = landmark[:, :2] 86 | image_aug, landmark_xy = seq[0](images=np.array( 87 | [image]), keypoints=np.array([landmark_xy])) 88 | image_aug = image_aug[0] 89 | landmark_xy = landmark_xy[0] 90 | 91 | # Simulate reflection 92 | if random.random() < 0.1: 93 | image_aug = 
add_vertical_reflection(image_aug, landmark_xy) 94 | 95 | landmark[:, :2] = landmark_xy 96 | # draw = image_aug.copy() 97 | # for i in range(landmark.shape[0]): 98 | # draw = cv2.circle(draw, (int(landmark[i][0]), int(landmark[i][1])), 2, (0,255,0), 2) 99 | # cv2.imshow("draw", draw) 100 | # cv2.waitKey(0) 101 | return image_aug, landmark 102 | -------------------------------------------------------------------------------- /src/data_loaders/augmentation2.py: -------------------------------------------------------------------------------- 1 | import random 2 | 3 | import cv2 4 | import imgaug as ia 5 | import numpy as np 6 | from imgaug import augmenters as iaa 7 | 8 | from .augmentation_utils import add_vertical_reflection 9 | 10 | seq = [None] 11 | 12 | 13 | def load_aug(): 14 | 15 | def sometimes(aug): return iaa.Sometimes(0.2, aug) 16 | 17 | seq[0] = iaa.Sequential( 18 | [ 19 | iaa.Crop(percent=(0, 0.3)), # random crops 20 | iaa.Fliplr(0.5), 21 | # crop images by -5% to 10% of their height/width 22 | sometimes(iaa.CropAndPad( 23 | percent=(-0.2, 0.3), 24 | pad_mode=ia.ALL, 25 | pad_cval=(0, 255) 26 | )), 27 | sometimes(iaa.Affine( 28 | scale={"x": (0.7, 1.3), "y": (0.7, 1.3)}, 29 | translate_percent={"x": (-0.2, 0.2), "y": (-0.2, 0.2)}, 30 | rotate=(-5, 5), 31 | shear=(-5, 5), 32 | order=[0, 1], 33 | # if mode is constant, use a cval between 0 and 255 34 | cval=(0, 255), 35 | # use any of scikit-image's warping modes (see 2nd image from the top for examples) 36 | mode=ia.ALL 37 | )), 38 | iaa.Sometimes(0.1, iaa.MotionBlur(k=15, angle=[-45, 45])), 39 | # execute 0 to 5 of the following (less important) augmenters per image 40 | # don't execute all of them, as that would often be way too strong 41 | iaa.SomeOf((0, 5), 42 | [ 43 | iaa.OneOf([ 44 | iaa.GaussianBlur((0, 3.0)), 45 | iaa.AverageBlur(k=(2, 5)), 46 | iaa.MedianBlur(k=(3, 5)), 47 | ]), 48 | iaa.Sharpen(alpha=(0, 1.0), lightness=( 49 | 0.75, 1.5)), # sharpen images 50 | # add gaussian noise to images 51 | iaa.AdditiveGaussianNoise(loc=0, scale=( 52 | 0.0, 0.05*255), per_channel=0.5), 53 | # change brightness of images (by -10 to 10 of original value) 54 | iaa.Add((-10, 10), per_channel=0.5), 55 | # change hue and saturation 56 | iaa.AddToHueAndSaturation((-20, 20)), 57 | # either change the brightness of the whole image (sometimes 58 | # per channel) or change the brightness of subareas 59 | iaa.OneOf([ 60 | iaa.Multiply((0.5, 1.5), per_channel=0.5), 61 | iaa.FrequencyNoiseAlpha( 62 | exponent=(-4, 0), 63 | first=iaa.Multiply((0.5, 1.5), per_channel=True), 64 | second=iaa.LinearContrast((0.5, 2.0)) 65 | ) 66 | ]), 67 | # improve or worsen the contrast 68 | iaa.LinearContrast((0.3, 3.0), per_channel=0.5), 69 | iaa.Grayscale(alpha=(0.0, 1.0)), 70 | iaa.Multiply((0.333, 3), per_channel=0.5), 71 | ], 72 | random_order=True 73 | ) 74 | ], 75 | random_order=True 76 | ) 77 | 78 | 79 | def crop(image): 80 | # print(image.shape) 81 | old_size = (image.shape[1], image.shape[0]) 82 | im_height = image.shape[0] 83 | max_y = random.randint(int(0.6 * im_height), int(0.9 * im_height)) 84 | # print(max_y) 85 | 86 | image = image[0:max_y, :, :].copy() 87 | image = cv2.resize(image, (old_size[0], old_size[1])) 88 | # print(image.shape) 89 | return image 90 | 91 | def crop0(image): 92 | # print(image.shape) 93 | old_size = (image.shape[1], image.shape[0]) 94 | im_height = image.shape[0] 95 | max_y = random.randint(int(0.0 * im_height), int(0.5 * im_height)) 96 | # print(max_y) 97 | 98 | image = image[max_y:, :, :].copy() 99 | image = 
cv2.resize(image, (old_size[0], old_size[1])) 100 | # print(image.shape) 101 | return image 102 | 103 | def crop2(image): 104 | # print(image.shape) 105 | old_size = (image.shape[1], image.shape[0]) 106 | im_width = image.shape[1] 107 | max_x = random.randint(int(0.0 * im_width), int(0.3 * im_width)) 108 | # print(max_y) 109 | 110 | image = image[:, max_x:, :].copy() 111 | image = cv2.resize(image, (old_size[0], old_size[1])) 112 | # print(image.shape) 113 | return image 114 | 115 | def crop3(image): 116 | # print(image.shape) 117 | old_size = (image.shape[1], image.shape[0]) 118 | im_width = image.shape[1] 119 | max_x = random.randint(int(0.7 * im_width), int(0.95 * im_width)) 120 | # print(max_y) 121 | 122 | image = image[:, :max_x, :].copy() 123 | image = cv2.resize(image, (old_size[0], old_size[1])) 124 | # print(image.shape) 125 | return image 126 | 127 | def augment_img(image, y, landmark=None): 128 | if seq[0] is None: 129 | load_aug() 130 | 131 | if random.random() < 0.5 and y: 132 | if random.random() < 0.5: 133 | image = crop(image) 134 | else: 135 | image = crop0(image) 136 | 137 | image = crop2(image) 138 | image = crop3(image) 139 | 140 | if landmark is None: 141 | image_aug = seq[0](images=np.array([image])) 142 | return image_aug[0] 143 | else: 144 | image_aug, landmark = seq[0](images=np.array( 145 | [image]), keypoints=np.array([landmark])) 146 | image_aug = image_aug[0] 147 | landmark = landmark[0] 148 | 149 | # Simulate reflection 150 | if random.random() < 0.1: 151 | image_aug = add_vertical_reflection(image_aug, landmark) 152 | 153 | # draw = image_aug.copy() 154 | # for i in range(landmark.shape[0]): 155 | # draw = cv2.circle(draw, (int(landmark[i][0]), int(landmark[i][1])), 2, (0,255,0), 2) 156 | # cv2.imshow("draw", draw) 157 | # cv2.waitKey(0) 158 | return image_aug, landmark 159 | -------------------------------------------------------------------------------- /src/data_loaders/augmentation_utils.py: -------------------------------------------------------------------------------- 1 | import random 2 | import numpy as np 3 | import cv2 4 | 5 | 6 | def add_vertical_reflection(image, keypoints, min_height=0.1): 7 | """Add vertical reflection 8 | 9 | Args: 10 | image: Input image 11 | keypoints: Keypoints 12 | min_height [int]: Min height ratio of reflection (over image height) 13 | 14 | Return: 15 | Augmented image 16 | """ 17 | 18 | im_height = image.shape[0] 19 | max_y = np.max(np.array(keypoints)[:, 1]) 20 | reflection_height = min(im_height - max_y - 1, max_y) 21 | 22 | if reflection_height < min_height * im_height: 23 | return image 24 | 25 | alpha = random.uniform(0.5, 0.9) 26 | beta = (1.0 - alpha) 27 | image[max_y:max_y+reflection_height, :, :] = cv2.addWeighted(image[max_y:max_y+reflection_height, :, :], 28 | alpha, 29 | cv2.flip( 30 | image[max_y-reflection_height:max_y, :, :], 0), 31 | beta, 0.0) 32 | 33 | return image 34 | 35 | 36 | def random_occlusion(image, keypoints, visibility=None, rect_ratio=None, rect_color="random"): 37 | """Generate random rectangle to occlude points 38 | From BlazePose paper: "To support the prediction of invisible points, we simulate occlusions (random 39 | rectangles filled with various colors) during training and introduce a per-point 40 | visibility classifier that indicates whether a particular point is occluded and 41 | if the position prediction is deemed inaccurate." 42 | 43 | Args: 44 | image: Input image 45 | keypoints: Keypoints in format [[x1, y1], [x2, y2], ...] 
46 | visibility [list]: List of visibilities of keypoints. 0: occluded by rectangle, 1: visible 47 | rect_ratio: Rect ratio wrt image width and height. Format ((min_width, max_width), (min_height, max_height)) 48 | Example: ((0.2, 0.5), (0.2, 0.5)) 49 | rect_color: Scalar indicating color to fill in the rectangle 50 | 51 | Return: 52 | image: Generated image 53 | visibility [list]: List of visibilities of keypoints. 0: occluded by rectangle, 1: visible 54 | """ 55 | 56 | if rect_ratio is None: 57 | rect_ratio = ((0.2, 0.5), (0.2, 0.5)) 58 | 59 | im_height, im_width = image.shape[:2] 60 | rect_width = int(im_width * random.uniform(*rect_ratio[0])) 61 | rect_height = int(im_height * random.uniform(*rect_ratio[1])) 62 | rect_x = random.randint(0, im_width - rect_width) 63 | rect_y = random.randint(0, im_height - rect_height) 64 | 65 | gen_image = image.copy() 66 | if rect_color == "random": 67 | rect_color = (random.randint(0, 255), random.randint( 68 | 0, 255), random.randint(0, 255)) 69 | gen_image = cv2.rectangle(gen_image, (rect_x, rect_y), 70 | (rect_x + rect_width, rect_y + rect_height), rect_color, -1) 71 | 72 | if visibility is None: 73 | visibility = [1] * len(keypoints) 74 | for i in range(len(visibility)): 75 | if rect_x < keypoints[i][0] and keypoints[i][0] < rect_x + rect_width \ 76 | and rect_y < keypoints[i][1] and keypoints[i][1] < rect_y + rect_height: 77 | visibility[i] = 0 78 | 79 | return gen_image, visibility 80 | -------------------------------------------------------------------------------- /src/data_loaders/humanpose.py: -------------------------------------------------------------------------------- 1 | import json 2 | import math 3 | import os 4 | import random 5 | 6 | import cv2 7 | import numpy as np 8 | from tensorflow.keras.utils import Sequence 9 | 10 | from ..utils.heatmap import gen_gt_heatmap 11 | from ..utils.keypoints import normalize_landmark 12 | from ..utils.pre_processing import square_crop_with_keypoints 13 | from ..utils.visualizer import visualize_keypoints 14 | from .augmentation import augment_img 15 | from .augmentation_utils import random_occlusion 16 | 17 | 18 | class DataSequence(Sequence): 19 | 20 | def __init__(self, image_folder, label_file, batch_size=8, input_size=(256, 256), output_heatmap=True, heatmap_size=(128, 128), heatmap_sigma=4, n_points=16, shuffle=True, augment=False, random_flip=False, random_rotate=False, random_scale_on_crop=False, clip_landmark=False, symmetry_point_ids=None): 21 | 22 | self.batch_size = batch_size 23 | self.input_size = input_size 24 | self.output_heatmap = output_heatmap 25 | self.heatmap_size = heatmap_size 26 | self.heatmap_sigma = heatmap_sigma 27 | self.image_folder = image_folder 28 | self.random_flip = random_flip 29 | self.random_rotate = random_rotate 30 | self.random_scale_on_crop = random_scale_on_crop 31 | self.augment = augment 32 | self.n_points = n_points 33 | self.symmetry_point_ids = symmetry_point_ids 34 | self.clip_landmark = clip_landmark # Clip value of landmark to range [0, 1] 35 | 36 | with open(label_file, "r") as fp: 37 | self.anno = json.load(fp) 38 | 39 | if shuffle: 40 | random.shuffle(self.anno) 41 | 42 | def __len__(self): 43 | """ 44 | Number of batch in the Sequence. 45 | :return: The number of batches in the Sequence. 46 | """ 47 | return math.ceil(len(self.anno) / float(self.batch_size)) 48 | 49 | def __getitem__(self, idx): 50 | """ 51 | Retrieve the mask and the image in batches at position idx 52 | :param idx: position of the batch in the Sequence. 
53 | :return: batches of image and the corresponding mask 54 | """ 55 | 56 | batch_data = self.anno[idx * 57 | self.batch_size: (1 + idx) * self.batch_size] 58 | 59 | batch_image = [] 60 | batch_landmark = [] 61 | batch_heatmap = [] 62 | 63 | for data in batch_data: 64 | 65 | # Load and augment data 66 | image, landmark, heatmap = self.load_data(self.image_folder, data) 67 | 68 | batch_image.append(image) 69 | batch_landmark.append(landmark) 70 | if self.output_heatmap: 71 | batch_heatmap.append(heatmap) 72 | 73 | batch_image = np.array(batch_image) 74 | batch_landmark = np.array(batch_landmark) 75 | if self.output_heatmap: 76 | batch_heatmap = np.array(batch_heatmap) 77 | 78 | batch_image = DataSequence.preprocess_images(batch_image) 79 | batch_landmark = self.preprocess_landmarks(batch_landmark) 80 | 81 | # Prevent values from going outside [0, 1] 82 | # Only applied for sigmoid output 83 | if self.clip_landmark: 84 | batch_landmark[batch_landmark < 0] = 0 85 | batch_landmark[batch_landmark > 1] = 1 86 | 87 | if self.output_heatmap: 88 | return batch_image, [batch_landmark, batch_heatmap] 89 | else: 90 | return batch_image, batch_landmark 91 | 92 | @staticmethod 93 | def preprocess_images(images): 94 | # Convert color to RGB 95 | for i in range(images.shape[0]): 96 | images[i] = cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB) 97 | mean = np.array([0.5, 0.5, 0.5], dtype=np.float) 98 | images = np.array(images, dtype=np.float32) 99 | images = images / 255.0 100 | images -= mean 101 | return images 102 | 103 | def preprocess_landmarks(self, landmarks): 104 | 105 | first_dim = landmarks.shape[0] 106 | landmarks = landmarks.reshape((-1, 3)) 107 | landmarks = normalize_landmark(landmarks, self.input_size) 108 | landmarks = landmarks.reshape((first_dim, -1)) 109 | return landmarks 110 | 111 | def load_data(self, img_folder, data): 112 | 113 | # Load image 114 | path = os.path.join(img_folder, data["image"]) 115 | image = cv2.imread(path) 116 | 117 | # Load landmark and apply square cropping for image 118 | landmark = data["points"] 119 | bbox = data["bbox"] 120 | landmark = np.array(landmark) 121 | 122 | # Convert all (-1, -1) to (0, 0) 123 | for i in range(landmark.shape[0]): 124 | if landmark[i][0] == -1 and landmark[i][1] == -1: 125 | landmark[i, :] = [0, 0] 126 | 127 | # Generate visibility mask 128 | # visible = inside image + not occluded by simulated rectangle 129 | # (see BlazePose paper for more detail) 130 | visibility = np.ones((landmark.shape[0], 1), dtype=int) 131 | for i in range(len(visibility)): 132 | if 0 > landmark[i][0] or landmark[i][0] >= image.shape[1] \ 133 | or 0 > landmark[i][1] or landmark[i][1] >= image.shape[0]: 134 | visibility[i] = 0 135 | 136 | image, landmark = square_crop_with_keypoints( 137 | image, bbox, landmark, pad_value="random") 138 | landmark = np.array(landmark) 139 | 140 | # Resize image 141 | old_img_size = np.array([image.shape[1], image.shape[0]]) 142 | image = cv2.resize(image, self.input_size) 143 | landmark = ( 144 | landmark * np.divide(np.array(self.input_size).astype(float), old_img_size)).astype(int) 145 | 146 | # Horizontal flip 147 | # and update the order of landmark points 148 | if self.random_flip and random.choice([0, 1]): 149 | image = cv2.flip(image, 1) 150 | 151 | # Mark missing keypoints 152 | missing_idxs = [] 153 | for i in range(landmark.shape[0]): 154 | if landmark[i, 0] == 0 and landmark[i, 1] == 0: 155 | missing_idxs.append(i) 156 | 157 | # Flip landmark 158 | landmark[:, 0] = self.input_size[0] - landmark[:, 0] 159 | 160 | # 
Restore missing keypoints 161 | for i in missing_idxs: 162 | landmark[i, 0] = 0 163 | landmark[i, 1] = 0 164 | 165 | # Change the indices of landmark points and visibility 166 | if self.symmetry_point_ids is not None: 167 | for p1, p2 in self.symmetry_point_ids: 168 | l = landmark[p1, :].copy() 169 | landmark[p1, :] = landmark[p2, :].copy() 170 | landmark[p2, :] = l 171 | 172 | if self.augment: 173 | image, landmark = augment_img(image, landmark) 174 | 175 | # Random occlusion 176 | # (see BlazePose paper for more detail) 177 | if self.augment and random.random() < 0.2: 178 | landmark = landmark.reshape(-1, 2) 179 | image, visibility = random_occlusion(image, landmark, visibility=visibility, 180 | rect_ratio=((0.2, 0.5), (0.2, 0.5)), rect_color="random") 181 | 182 | # Concatenate visibility into landmark 183 | visibility = np.array(visibility) 184 | visibility = visibility.reshape((landmark.shape[0], 1)) 185 | landmark = np.hstack((landmark, visibility)) 186 | 187 | # Generate heatmap 188 | gtmap = None 189 | if self.output_heatmap: 190 | gtmap_kps = landmark.copy() 191 | gtmap_kps[:, :2] = (np.array(gtmap_kps[:, :2]).astype(float) 192 | * np.array(self.heatmap_size) / np.array(self.input_size)).astype(int) 193 | gtmap = gen_gt_heatmap( 194 | gtmap_kps, self.heatmap_sigma, self.heatmap_size) 195 | # gtmap = np.clip(np.sum(gtmap, axis=2, keepdims=True), None, 1) 196 | 197 | # Uncomment following lines to debug augmentation 198 | # draw = visualize_keypoints(image, landmark, visibility, text_color=(0,0,255)) 199 | # cv2.namedWindow("draw", cv2.WINDOW_NORMAL) 200 | # cv2.imshow("draw", draw) 201 | # if self.output_heatmap: 202 | # cv2.imshow("gtmap", gtmap.sum(axis=2)) 203 | # cv2.waitKey(0) 204 | 205 | landmark = np.array(landmark) 206 | return image, landmark, gtmap 207 | -------------------------------------------------------------------------------- /src/data_loaders/humanpose_2head.py: -------------------------------------------------------------------------------- 1 | import json 2 | import math 3 | import os 4 | import random 5 | 6 | import cv2 7 | import numpy as np 8 | from tensorflow.keras.utils import Sequence 9 | 10 | from ..utils.heatmap import gen_gt_heatmap 11 | from ..utils.keypoints import normalize_landmark 12 | from ..utils.pre_processing import square_crop_with_keypoints 13 | from ..utils.visualizer import visualize_keypoints 14 | from .augmentation import augment_img 15 | from .augmentation_utils import random_occlusion 16 | 17 | 18 | class DataSequence(Sequence): 19 | 20 | def __init__(self, image_folder, label_file, batch_size=8, input_size=(256, 256), output_heatmap=True, heatmap_size=(128, 128), heatmap_sigma=4, n_points=16, shuffle=True, augment=False, random_flip=False, random_rotate=False, random_scale_on_crop=False, clip_landmark=False, symmetry_point_ids=None): 21 | 22 | self.batch_size = batch_size 23 | self.input_size = input_size 24 | self.output_heatmap = output_heatmap 25 | self.heatmap_size = heatmap_size 26 | self.heatmap_sigma = heatmap_sigma 27 | self.image_folder = image_folder 28 | self.random_flip = random_flip 29 | self.random_rotate = random_rotate 30 | self.random_scale_on_crop = random_scale_on_crop 31 | self.augment = augment 32 | self.n_points = n_points 33 | self.symmetry_point_ids = symmetry_point_ids 34 | self.clip_landmark = clip_landmark # Clip value of landmark to range [0, 1] 35 | 36 | with open(label_file, "r") as fp: 37 | self.anno = json.load(fp) 38 | 39 | if shuffle: 40 | random.shuffle(self.anno) 41 | 42 | def __len__(self): 43 | """ 
44 | Number of batch in the Sequence. 45 | :return: The number of batches in the Sequence. 46 | """ 47 | return math.ceil(len(self.anno) / float(self.batch_size)) 48 | 49 | def __getitem__(self, idx): 50 | """ 51 | Retrieve the mask and the image in batches at position idx 52 | :param idx: position of the batch in the Sequence. 53 | :return: batches of image and the corresponding mask 54 | """ 55 | 56 | batch_data = self.anno[idx * 57 | self.batch_size: (1 + idx) * self.batch_size] 58 | 59 | batch_image = [] 60 | batch_landmark = [] 61 | batch_heatmap = [] 62 | batch_pushup = [] 63 | 64 | for data in batch_data: 65 | 66 | # Load and augment data 67 | image, landmark, heatmap, is_pushup = self.load_data(self.image_folder, data) 68 | 69 | batch_image.append(image) 70 | batch_landmark.append(landmark) 71 | batch_pushup.append(is_pushup) 72 | if self.output_heatmap: 73 | batch_heatmap.append(heatmap) 74 | 75 | batch_image = np.array(batch_image) 76 | batch_landmark = np.array(batch_landmark) 77 | if self.output_heatmap: 78 | batch_heatmap = np.array(batch_heatmap) 79 | 80 | batch_image = DataSequence.preprocess_images(batch_image) 81 | batch_landmark = self.preprocess_landmarks(batch_landmark) 82 | 83 | # print(batch_pushup) 84 | batch_pushup = np.array(batch_pushup) 85 | 86 | # Prevent values from going outside [0, 1] 87 | # Only applied for sigmoid output 88 | if self.clip_landmark: 89 | batch_landmark[batch_landmark < 0] = 0 90 | batch_landmark[batch_landmark > 1] = 1 91 | 92 | # if self.output_heatmap: 93 | # return batch_image, [batch_landmark, batch_heatmap] 94 | # else: 95 | # return batch_image, batch_landmark 96 | return batch_image, [batch_heatmap, batch_pushup] 97 | 98 | @staticmethod 99 | def preprocess_images(images): 100 | # Convert color to RGB 101 | for i in range(images.shape[0]): 102 | images[i] = cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB) 103 | mean = np.array([0.5, 0.5, 0.5], dtype=np.float) 104 | images = np.array(images, dtype=np.float32) 105 | images = images / 255.0 106 | images -= mean 107 | return images 108 | 109 | def preprocess_landmarks(self, landmarks): 110 | 111 | first_dim = landmarks.shape[0] 112 | landmarks = landmarks.reshape((-1, 3)) 113 | landmarks = normalize_landmark(landmarks, self.input_size) 114 | landmarks = landmarks.reshape((first_dim, -1)) 115 | return landmarks 116 | 117 | def load_data(self, img_folder, data): 118 | 119 | # Load image 120 | path = os.path.join(img_folder, data["image"]) 121 | image = cv2.imread(path) 122 | 123 | # Load landmark and apply square cropping for image 124 | landmark = data["points"] 125 | bbox = data["bbox"] 126 | landmark = np.array(landmark) 127 | 128 | # Convert all (-1, -1) to (0, 0) 129 | for i in range(landmark.shape[0]): 130 | if landmark[i][0] == -1 and landmark[i][1] == -1: 131 | landmark[i, :] = [0, 0] 132 | 133 | # Generate visibility mask 134 | # visible = inside image + not occluded by simulated rectangle 135 | # (see BlazePose paper for more detail) 136 | visibility = np.ones((landmark.shape[0], 1), dtype=int) 137 | for i in range(len(visibility)): 138 | if 0 > landmark[i][0] or landmark[i][0] >= image.shape[1] \ 139 | or 0 > landmark[i][1] or landmark[i][1] >= image.shape[0] \ 140 | or (landmark[i][0] == 0 and landmark[i][1] == 0): 141 | visibility[i] = 0 142 | 143 | image, landmark = square_crop_with_keypoints( 144 | image, bbox, landmark, pad_value="random") 145 | landmark = np.array(landmark) 146 | 147 | # Resize image 148 | old_img_size = np.array([image.shape[1], image.shape[0]]) 149 | image = 
cv2.resize(image, self.input_size) 150 | landmark = ( 151 | landmark * np.divide(np.array(self.input_size).astype(float), old_img_size)).astype(int) 152 | 153 | # Horizontal flip 154 | # and update the order of landmark points 155 | if self.random_flip and random.choice([0, 1]): 156 | image = cv2.flip(image, 1) 157 | 158 | # Mark missing keypoints 159 | missing_idxs = [] 160 | for i in range(landmark.shape[0]): 161 | if landmark[i, 0] == 0 and landmark[i, 1] == 0: 162 | missing_idxs.append(i) 163 | 164 | # Flip landmark 165 | landmark[:, 0] = self.input_size[0] - landmark[:, 0] 166 | 167 | # Restore missing keypoints 168 | for i in missing_idxs: 169 | landmark[i, 0] = 0 170 | landmark[i, 1] = 0 171 | 172 | # Change the indices of landmark points and visibility 173 | if self.symmetry_point_ids is not None: 174 | for p1, p2 in self.symmetry_point_ids: 175 | l = landmark[p1, :].copy() 176 | landmark[p1, :] = landmark[p2, :].copy() 177 | landmark[p2, :] = l 178 | 179 | # Random occlusion 180 | # (see BlazePose paper for more detail) 181 | if self.augment and random.random() < 0.2: 182 | landmark = landmark.reshape(-1, 2) 183 | image, visibility = random_occlusion(image, landmark, visibility=visibility, 184 | rect_ratio=((0.2, 0.5), (0.2, 0.5)), rect_color="random") 185 | 186 | # Concatenate visibility into landmark 187 | visibility = np.array(visibility) 188 | visibility = visibility.reshape((landmark.shape[0], 1)) 189 | landmark = np.hstack((landmark, visibility)) 190 | 191 | if self.augment: 192 | 193 | # Mark missing keypoints 194 | missing_idxs = [] 195 | for i in range(landmark.shape[0]): 196 | if landmark[i, 0] == 0 and landmark[i, 1] == 0: 197 | missing_idxs.append(i) 198 | 199 | image, landmark = augment_img(image, landmark) 200 | 201 | # Restore missing keypoints 202 | for i in missing_idxs: 203 | landmark[i, 0] = 0 204 | landmark[i, 1] = 0 205 | 206 | 207 | # Generate heatmap 208 | gtmap = None 209 | if self.output_heatmap: 210 | gtmap_kps = landmark.copy() 211 | gtmap_kps[:, :2] = (np.array(gtmap_kps[:, :2]).astype(float) 212 | * np.array(self.heatmap_size) / np.array(self.input_size)).astype(int) 213 | gtmap = gen_gt_heatmap( 214 | gtmap_kps, self.heatmap_sigma, self.heatmap_size) 215 | 216 | # # Uncomment following lines to debug augmentation 217 | # draw = visualize_keypoints(image, landmark, visibility, text_color=(0,0,255)) 218 | # cv2.namedWindow("draw", cv2.WINDOW_NORMAL) 219 | # cv2.imshow("draw", draw) 220 | # if self.output_heatmap: 221 | # cv2.imshow("gtmap", gtmap.sum(axis=2)) 222 | # cv2.waitKey(0) 223 | 224 | landmark = np.array(landmark) 225 | return image, landmark, gtmap, int(data["is_pushing_up"]) 226 | -------------------------------------------------------------------------------- /src/data_loaders/pushup_recognition.py: -------------------------------------------------------------------------------- 1 | import json 2 | import math 3 | import os 4 | import random 5 | 6 | import cv2 7 | import numpy as np 8 | from tensorflow.keras.utils import Sequence 9 | 10 | from ..utils.heatmap import gen_gt_heatmap 11 | from ..utils.keypoints import normalize_landmark 12 | from ..utils.pre_processing import square_crop_with_keypoints 13 | from ..utils.visualizer import visualize_keypoints 14 | from .augmentation2 import augment_img 15 | from .augmentation_utils import random_occlusion 16 | 17 | 18 | class DataSequence(Sequence): 19 | 20 | def __init__(self, image_folder, label_file, batch_size=8, input_size=(256, 256), shuffle=True, augment=False, random_flip=False, 
random_rotate=False, random_scale_on_crop=False): 21 | 22 | self.batch_size = batch_size 23 | self.input_size = input_size 24 | self.image_folder = image_folder 25 | self.random_flip = random_flip 26 | self.random_rotate = random_rotate 27 | self.random_scale_on_crop = random_scale_on_crop 28 | self.augment = augment 29 | 30 | with open(label_file, "r") as fp: 31 | self.anno = json.load(fp) 32 | 33 | if shuffle: 34 | random.shuffle(self.anno) 35 | 36 | def __len__(self): 37 | """ 38 | Number of batch in the Sequence. 39 | :return: The number of batches in the Sequence. 40 | """ 41 | return math.ceil(len(self.anno) / float(self.batch_size)) 42 | 43 | def __getitem__(self, idx): 44 | """ 45 | Retrieve the mask and the image in batches at position idx 46 | :param idx: position of the batch in the Sequence. 47 | :return: batches of image and the corresponding mask 48 | """ 49 | 50 | batch_data = self.anno[idx * self.batch_size: (1 + idx) * self.batch_size] 51 | 52 | batch_image = [] 53 | batch_y = [] 54 | 55 | for data in batch_data: 56 | 57 | # Load and augment data 58 | image, y = self.load_data(self.image_folder, data) 59 | batch_image.append(image) 60 | batch_y.append(y) 61 | 62 | batch_image = np.array(batch_image) 63 | batch_y = np.array(batch_y) 64 | 65 | batch_image = DataSequence.preprocess_images(batch_image) 66 | 67 | return batch_image, batch_y 68 | 69 | @staticmethod 70 | def preprocess_images(images): 71 | # Convert color to RGB 72 | for i in range(images.shape[0]): 73 | images[i] = cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB) 74 | mean = np.array([0.5, 0.5, 0.5], dtype=np.float) 75 | images = np.array(images, dtype=np.float32) 76 | images = images / 255.0 77 | images -= mean 78 | return images 79 | 80 | 81 | def load_data(self, img_folder, data): 82 | 83 | # Load image 84 | path = os.path.join(img_folder, data["image"]) 85 | image = cv2.imread(path) 86 | 87 | # Resize image 88 | image = cv2.resize(image, self.input_size) 89 | 90 | is_pushing_up = int(data["is_pushing_up"]) 91 | 92 | # Horizontal flip 93 | # and update the order of landmark points 94 | if self.random_flip and random.choice([0, 1]): 95 | image = cv2.flip(image, 1) 96 | 97 | if self.augment: 98 | image = augment_img(image, not is_pushing_up) 99 | 100 | # cv2.imshow("Image", image) 101 | # cv2.waitKey(0) 102 | 103 | 104 | 105 | return image, is_pushing_up 106 | -------------------------------------------------------------------------------- /src/metrics/f1.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.keras.metrics import Precision, Recall 3 | 4 | class F1_Score(tf.keras.metrics.Metric): 5 | 6 | def __init__(self, name='f1_score', **kwargs): 7 | super().__init__(name=name, **kwargs) 8 | self.f1 = self.add_weight(name='f1', initializer='zeros') 9 | self.precision_fn = Precision(thresholds=0.5) 10 | self.recall_fn = Recall(thresholds=0.5) 11 | 12 | def update_state(self, y_true, y_pred, sample_weight=None): 13 | p = self.precision_fn(y_true, y_pred) 14 | r = self.recall_fn(y_true, y_pred) 15 | # since f1 is a variable, we use assign 16 | self.f1.assign(2 * ((p * r) / (p + r + 1e-6))) 17 | 18 | def result(self): 19 | return self.f1 20 | 21 | def reset_states(self): 22 | # we also need to reset the state of the precision and recall objects 23 | self.precision_fn.reset_states() 24 | self.recall_fn.reset_states() 25 | self.f1.assign(0) -------------------------------------------------------------------------------- /src/metrics/mae.py: 
-------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | 4 | from ..utils.heatmap import find_keypoints_from_heatmap 5 | 6 | 7 | @tf.function 8 | def calc_mae(batch_keypoints_true, batch_keypoints_pred, keypoint_thresh=0.1): 9 | 10 | mask = tf.greater(batch_keypoints_true[:, :, 2], keypoint_thresh) 11 | tf.boolean_mask(batch_keypoints_true[:, :, 0], mask) 12 | tf.boolean_mask(batch_keypoints_true[:, :, 1], mask) 13 | 14 | mask = tf.greater(batch_keypoints_pred[:, :, 2], keypoint_thresh) 15 | tf.boolean_mask(batch_keypoints_pred[:, :, 0], mask) 16 | tf.boolean_mask(batch_keypoints_pred[:, :, 1], mask) 17 | 18 | error = tf.abs(batch_keypoints_pred[:, :, :2] - batch_keypoints_true[:, :, :2]) 19 | n_points = tf.cast(tf.reduce_prod(tf.shape(error)), tf.float32) 20 | error = tf.reduce_sum(tf.cast(error, tf.float32)) 21 | 22 | return error, n_points 23 | 24 | 25 | def get_mae_metric(): 26 | 27 | class MAE(tf.keras.metrics.Metric): 28 | 29 | def __init__(self, name='mae', **kwargs): 30 | super(MAE, self).__init__(name=name, **kwargs) 31 | self.total_error = self.add_weight(name='total_error', initializer='zeros') 32 | self.n_total = self.add_weight(name='n_total', initializer='zeros') 33 | 34 | def reset_states(self): 35 | self.total_error.assign(0) 36 | self.n_total.assign(0) 37 | 38 | def update_state(self, y_true, y_pred, sample_weight=None): 39 | 40 | keypoint_thresh = 0.0 41 | if len(tf.shape(y_true)) == 4: # Heatmap 42 | batch_keypoints_pred = find_keypoints_from_heatmap(y_pred, normalize=True) 43 | batch_keypoints_true = find_keypoints_from_heatmap(y_true, normalize=True) 44 | keypoint_thresh = 0.1 45 | elif len(tf.shape(y_true)) == 2: # Regression 46 | batch_keypoints_pred = tf.reshape( 47 | y_pred, (tf.shape(y_pred)[0], -1, 3)) 48 | batch_keypoints_true = tf.reshape( 49 | y_true, (tf.shape(y_true)[0], -1, 3)) 50 | keypoint_thresh = 0.5 51 | else: 52 | tf.print("Error: Wrong MAE input shape.") 53 | exit(0) 54 | error, n_points = calc_mae( 55 | batch_keypoints_true, batch_keypoints_pred, keypoint_thresh=keypoint_thresh) 56 | self.total_error.assign_add(error) 57 | self.n_total.assign_add(n_points) 58 | 59 | def result(self): 60 | return self.total_error / (self.n_total + 1e-5) 61 | 62 | return MAE 63 | -------------------------------------------------------------------------------- /src/metrics/pck.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | 4 | from ..utils.heatmap import find_keypoints_from_heatmap 5 | 6 | 7 | @tf.function 8 | def calc_pck(batch_keypoints_true, batch_keypoints_pred, keypoint_thresh=0.0, ref_point_pair=(2, 4), pck_thresh=0.5): 9 | 10 | mask = tf.greater(batch_keypoints_true[:, :, 2], keypoint_thresh) 11 | tf.boolean_mask(batch_keypoints_true[:, :, 0], mask) 12 | tf.boolean_mask(batch_keypoints_true[:, :, 1], mask) 13 | 14 | mask = tf.greater(batch_keypoints_pred[:, :, 2], keypoint_thresh) 15 | tf.boolean_mask(batch_keypoints_pred[:, :, 0], mask) 16 | tf.boolean_mask(batch_keypoints_pred[:, :, 1], mask) 17 | 18 | ref_distance = tf.math.reduce_euclidean_norm( 19 | batch_keypoints_true[:, ref_point_pair[0], :2] - batch_keypoints_true[:, ref_point_pair[1], :2], axis=1, keepdims=True) 20 | error = tf.math.reduce_euclidean_norm( 21 | batch_keypoints_pred[:, :, :2] - batch_keypoints_true[:, :, :2], axis=2) 22 | wrong_matrix = tf.cast(error, tf.float32) > (tf.cast(ref_distance, tf.float32) * pck_thresh) 23 | 
n_wrongs = tf.reduce_sum(tf.cast(wrong_matrix, tf.float32)) 24 | return tf.cast(n_wrongs, tf.float32), tf.cast(tf.reduce_prod(tf.shape(wrong_matrix)), tf.float32) 25 | 26 | 27 | def get_pck_metric(ref_point_pair=(2, 4), thresh=0.2): 28 | 29 | class PCK(tf.keras.metrics.Metric): 30 | 31 | def __init__(self, name='pck', ref_point_pair=ref_point_pair, pck_thresh=thresh, **kwargs): 32 | super(PCK, self).__init__(name=name, **kwargs) 33 | self.ref_point_pair = ref_point_pair 34 | self.pck_thresh = pck_thresh 35 | self.n_wrongs = self.add_weight(name='n_wrongs', initializer='zeros') 36 | self.n_total = self.add_weight(name='n_total', initializer='zeros') 37 | 38 | def reset_states(self): 39 | self.n_wrongs.assign(0) 40 | self.n_total.assign(0) 41 | 42 | def update_state(self, y_true, y_pred, sample_weight=None): 43 | 44 | keypoint_thresh = 0.0 45 | if len(tf.shape(y_true)) == 4: # Heatmap 46 | batch_keypoints_pred = find_keypoints_from_heatmap(y_pred) 47 | batch_keypoints_true = find_keypoints_from_heatmap(y_true) 48 | keypoint_thresh = 0.1 49 | elif len(tf.shape(y_true)) == 2: # Regression 50 | batch_keypoints_pred = tf.reshape( 51 | y_pred, (tf.shape(y_pred)[0], -1, 3)) 52 | batch_keypoints_true = tf.reshape( 53 | y_true, (tf.shape(y_true)[0], -1, 3)) 54 | keypoint_thresh = 0.5 55 | else: 56 | tf.print("Error: Wrong PCK input shape.") 57 | exit(0) 58 | n_wrongs, n_points = calc_pck( 59 | batch_keypoints_true, batch_keypoints_pred, keypoint_thresh=keypoint_thresh, ref_point_pair=self.ref_point_pair, pck_thresh=self.pck_thresh) 60 | self.n_wrongs.assign_add(n_wrongs) 61 | self.n_total.assign_add(n_points) 62 | 63 | def result(self): 64 | return (self.n_total - self.n_wrongs) / (self.n_total + 1e-5) 65 | 66 | return PCK 67 | -------------------------------------------------------------------------------- /src/models/__init__.py: -------------------------------------------------------------------------------- 1 | from .blazepose_legacy import BlazePose as BlazePoseLegacy 2 | from .blazepose_full import BlazePose as BlazePoseFull 3 | from .blazepose_all_linear import BlazePose as BlazePoseAllLinear 4 | from .blazepose_with_pushup_classify import BlazePose as BlazePoseWithClassify 5 | from .pushup_recognition import PushUpRecognition 6 | 7 | class ModelCreator(): 8 | 9 | @staticmethod 10 | def create_model(model_name, n_points=0): 11 | 12 | if model_name == "SIGMOID_HEATMAP_SIGMOID_REGRESS_TWO_HEAD": 13 | return BlazePoseLegacy(n_points).build_model("TWO_HEAD") 14 | elif model_name == "SIGMOID_HEATMAP_SIGMOID_REGRESS_HEATMAP": 15 | return BlazePoseLegacy(n_points).build_model("HEATMAP") 16 | elif model_name == "SIGMOID_HEATMAP_SIGMOID_REGRESS_REGRESSION": 17 | return BlazePoseLegacy(n_points).build_model("REGRESSION") 18 | 19 | elif model_name == "SIGMOID_HEATMAP_LINEAR_REGRESS_TWO_HEAD": 20 | return BlazePoseFull(n_points).build_model("TWO_HEAD") 21 | elif model_name == "SIGMOID_HEATMAP_LINEAR_REGRESS_HEATMAP": 22 | return BlazePoseFull(n_points).build_model("HEATMAP") 23 | elif model_name == "SIGMOID_HEATMAP_LINEAR_REGRESS_REGRESSION": 24 | return BlazePoseFull(n_points).build_model("REGRESSION") 25 | 26 | elif model_name == "ALL_LINEAR_TWO_HEAD": 27 | return BlazePoseAllLinear(n_points).build_model("TWO_HEAD") 28 | elif model_name == "ALL_LINEAR_HEATMAP": 29 | return BlazePoseAllLinear(n_points).build_model("HEATMAP") 30 | elif model_name == "ALL_LINEAR_REGRESSION": 31 | return BlazePoseAllLinear(n_points).build_model("REGRESSION") 32 | 33 | elif model_name == "PUSHUP_RECOGNITION": 34 | return 
PushUpRecognition.build_model() 35 | 36 | elif model_name == "BLAZEPOSE_WITH_PUSHUP_CLASSIFY": 37 | return BlazePoseWithClassify(n_points).build_model("TWO_HEAD") 38 | -------------------------------------------------------------------------------- /src/models/blazepose_all_linear.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.keras.models import Model 3 | from .blazepose_layers import BlazeBlock 4 | 5 | 6 | class BlazePose(): 7 | def __init__(self, num_keypoints: int): 8 | 9 | self.num_keypoints = num_keypoints 10 | 11 | self.conv1 = tf.keras.layers.Conv2D( 12 | filters=24, kernel_size=3, strides=(2, 2), padding='same', activation='relu' 13 | ) 14 | 15 | self.conv2_1 = tf.keras.models.Sequential([ 16 | tf.keras.layers.DepthwiseConv2D( 17 | kernel_size=3, padding='same', activation=None), 18 | tf.keras.layers.Conv2D(filters=24, kernel_size=1, activation=None) 19 | ]) 20 | 21 | self.conv2_2 = tf.keras.models.Sequential([ 22 | tf.keras.layers.DepthwiseConv2D( 23 | kernel_size=3, padding='same', activation=None), 24 | tf.keras.layers.Conv2D(filters=24, kernel_size=1, activation=None) 25 | ]) 26 | 27 | # === Heatmap === 28 | 29 | self.conv3 = BlazeBlock(block_num=3, channel=48) 30 | self.conv4 = BlazeBlock(block_num=4, channel=96) 31 | self.conv5 = BlazeBlock(block_num=5, channel=192) 32 | self.conv6 = BlazeBlock(block_num=6, channel=288) 33 | 34 | self.conv7a = tf.keras.models.Sequential([ 35 | tf.keras.layers.DepthwiseConv2D( 36 | kernel_size=3, padding="same", activation=None), 37 | tf.keras.layers.Conv2D( 38 | filters=48, kernel_size=1, activation="relu"), 39 | tf.keras.layers.UpSampling2D(size=(2, 2), interpolation="bilinear") 40 | ]) 41 | self.conv7b = tf.keras.models.Sequential([ 42 | tf.keras.layers.DepthwiseConv2D( 43 | kernel_size=3, padding="same", activation=None), 44 | tf.keras.layers.Conv2D( 45 | filters=48, kernel_size=1, activation="relu") 46 | ]) 47 | 48 | self.conv8a = tf.keras.layers.UpSampling2D( 49 | size=(2, 2), interpolation="bilinear") 50 | self.conv8b = tf.keras.models.Sequential([ 51 | tf.keras.layers.DepthwiseConv2D( 52 | kernel_size=3, padding="same", activation=None), 53 | tf.keras.layers.Conv2D( 54 | filters=48, kernel_size=1, activation="relu") 55 | ]) 56 | 57 | self.conv9a = tf.keras.layers.UpSampling2D( 58 | size=(2, 2), interpolation="bilinear") 59 | self.conv9b = tf.keras.models.Sequential([ 60 | tf.keras.layers.DepthwiseConv2D( 61 | kernel_size=3, padding="same", activation=None), 62 | tf.keras.layers.Conv2D( 63 | filters=48, kernel_size=1, activation="relu") 64 | ]) 65 | 66 | self.conv10a = tf.keras.models.Sequential([ 67 | tf.keras.layers.DepthwiseConv2D( 68 | kernel_size=3, padding="same", activation=None), 69 | tf.keras.layers.Conv2D( 70 | filters=8, kernel_size=1, activation="relu"), 71 | tf.keras.layers.UpSampling2D(size=(2, 2), interpolation="bilinear") 72 | ]) 73 | self.conv10b = tf.keras.models.Sequential([ 74 | tf.keras.layers.DepthwiseConv2D( 75 | kernel_size=3, padding="same", activation=None), 76 | tf.keras.layers.Conv2D(filters=8, kernel_size=1, activation="relu") 77 | ]) 78 | 79 | self.conv11 = tf.keras.models.Sequential([ 80 | tf.keras.layers.DepthwiseConv2D( 81 | kernel_size=3, padding="same", activation=None), 82 | tf.keras.layers.Conv2D( 83 | filters=8, kernel_size=1, activation="relu"), 84 | tf.keras.layers.Conv2D( 85 | filters=self.num_keypoints, kernel_size=3, padding="same", activation=None) # -> Heatmap output 86 | ], name="heatmap") 87 | 88 | # === 
Regression === 89 | 90 | # In: 1, 64, 64, 48) 91 | self.conv12a = BlazeBlock(block_num=4, channel=96, name_prefix="regression_conv12a_") # input res: 64 92 | self.conv12b = tf.keras.models.Sequential([ 93 | tf.keras.layers.DepthwiseConv2D( 94 | kernel_size=3, padding="same", activation=None, name="regression_conv12b_depthwise"), 95 | tf.keras.layers.Conv2D( 96 | filters=96, kernel_size=1, activation="relu", name="regression_conv12b_conv1x1") 97 | ], name="regression_conv12b") 98 | 99 | self.conv13a = BlazeBlock(block_num=5, channel=192, name_prefix="regression_conv13a_") # input res: 32 100 | self.conv13b = tf.keras.models.Sequential([ 101 | tf.keras.layers.DepthwiseConv2D( 102 | kernel_size=3, padding="same", activation=None, name="regression_conv13b_depthwise"), 103 | tf.keras.layers.Conv2D( 104 | filters=192, kernel_size=1, activation="relu", name="regression_conv13b_conv1x1") 105 | ], name="regression_conv13b") 106 | 107 | self.conv14a = BlazeBlock(block_num=6, channel=288, name_prefix="regression_conv14a_") # input res: 16 108 | self.conv14b = tf.keras.models.Sequential([ 109 | tf.keras.layers.DepthwiseConv2D( 110 | kernel_size=3, padding="same", activation=None, name="regression_conv14b_depthwise"), 111 | tf.keras.layers.Conv2D( 112 | filters=288, kernel_size=1, activation="relu", name="regression_conv14b_conv1x1") 113 | ], name="regression_conv14b") 114 | 115 | self.conv15 = tf.keras.models.Sequential([ 116 | BlazeBlock(block_num=7, channel=288, channel_padding=0, name_prefix="regression_conv15a_"), 117 | BlazeBlock(block_num=7, channel=288, channel_padding=0, name_prefix="regression_conv15b_") 118 | ], name="regression_conv15") 119 | 120 | self.conv16 = tf.keras.models.Sequential([ 121 | tf.keras.layers.Conv2D( 122 | filters=3*self.num_keypoints, kernel_size=2, activation=None), 123 | tf.keras.layers.Reshape((3*self.num_keypoints, 1), name="regression_final_dense") 124 | ], name="joints") 125 | 126 | def build_model(self, model_type): 127 | 128 | input_x = tf.keras.layers.Input(shape=(256, 256, 3)) 129 | 130 | # Block 1 131 | # In: 1x256x256x3 132 | x = self.conv1(input_x) 133 | 134 | # Block 2 135 | # In: 1x128x128x24 136 | x = x + self.conv2_1(x) 137 | x = tf.keras.activations.relu(x) 138 | 139 | # Block 3 140 | # In: 1x128x128x24 141 | x = x + self.conv2_2(x) 142 | y0 = tf.keras.activations.relu(x) 143 | 144 | # === Heatmap === 145 | 146 | # In: 1, 128, 128, 24 147 | y1 = self.conv3(y0) 148 | y2 = self.conv4(y1) 149 | y3 = self.conv5(y2) 150 | y4 = self.conv6(y3) 151 | 152 | x = self.conv7a(y4) + self.conv7b(y3) 153 | x = self.conv8a(x) + self.conv8b(y2) 154 | # In: 1, 32, 32, 96 155 | x = self.conv9a(x) + self.conv9b(y1) 156 | # In: 1, 64, 64, 48 157 | y = self.conv10a(x) + self.conv10b(y0) 158 | heatmap = self.conv11(y) 159 | 160 | # === Regression === 161 | 162 | # Stop gradient for regression on 2-head model 163 | if model_type == "TWO_HEAD": 164 | x = tf.keras.backend.stop_gradient(x) 165 | y2 = tf.keras.backend.stop_gradient(y2) 166 | y3 = tf.keras.backend.stop_gradient(y3) 167 | y4 = tf.keras.backend.stop_gradient(y4) 168 | 169 | x = self.conv12a(x) + self.conv12b(y2) 170 | # In: 1, 32, 32, 96 171 | x = self.conv13a(x) + self.conv13b(y3) 172 | # In: 1, 16, 16, 192 173 | x = self.conv14a(x) + self.conv14b(y4) 174 | # In: 1, 8, 8, 288 175 | x = self.conv15(x) 176 | # In: 1, 2, 2, 288 177 | joints = self.conv16(x) 178 | 179 | if model_type == "TWO_HEAD": 180 | return Model(inputs=input_x, outputs=[joints, heatmap]) 181 | elif model_type == "HEATMAP": 182 | return 
Model(inputs=input_x, outputs=heatmap) 183 | elif model_type == "REGRESSION": 184 | return Model(inputs=input_x, outputs=joints) 185 | else: 186 | raise ValueError("Wrong model type.") 187 | -------------------------------------------------------------------------------- /src/models/blazepose_full.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.keras.models import Model 3 | from .blazepose_layers import BlazeBlock 4 | 5 | 6 | class BlazePose(): 7 | def __init__(self, num_keypoints: int): 8 | 9 | self.num_keypoints = num_keypoints 10 | 11 | self.conv1 = tf.keras.layers.Conv2D( 12 | filters=24, kernel_size=3, strides=(2, 2), padding='same', activation='relu' 13 | ) 14 | 15 | self.conv2_1 = tf.keras.models.Sequential([ 16 | tf.keras.layers.DepthwiseConv2D( 17 | kernel_size=3, padding='same', activation=None), 18 | tf.keras.layers.Conv2D(filters=24, kernel_size=1, activation=None) 19 | ]) 20 | 21 | self.conv2_2 = tf.keras.models.Sequential([ 22 | tf.keras.layers.DepthwiseConv2D( 23 | kernel_size=3, padding='same', activation=None), 24 | tf.keras.layers.Conv2D(filters=24, kernel_size=1, activation=None) 25 | ]) 26 | 27 | # === Heatmap === 28 | 29 | self.conv3 = BlazeBlock(block_num=3, channel=48) 30 | self.conv4 = BlazeBlock(block_num=4, channel=96) 31 | self.conv5 = BlazeBlock(block_num=5, channel=192) 32 | self.conv6 = BlazeBlock(block_num=6, channel=288) 33 | 34 | self.conv7a = tf.keras.models.Sequential([ 35 | tf.keras.layers.DepthwiseConv2D( 36 | kernel_size=3, padding="same", activation=None), 37 | tf.keras.layers.Conv2D( 38 | filters=48, kernel_size=1, activation="relu"), 39 | tf.keras.layers.UpSampling2D(size=(2, 2), interpolation="bilinear") 40 | ]) 41 | self.conv7b = tf.keras.models.Sequential([ 42 | tf.keras.layers.DepthwiseConv2D( 43 | kernel_size=3, padding="same", activation=None), 44 | tf.keras.layers.Conv2D( 45 | filters=48, kernel_size=1, activation="relu") 46 | ]) 47 | 48 | self.conv8a = tf.keras.layers.UpSampling2D( 49 | size=(2, 2), interpolation="bilinear") 50 | self.conv8b = tf.keras.models.Sequential([ 51 | tf.keras.layers.DepthwiseConv2D( 52 | kernel_size=3, padding="same", activation=None), 53 | tf.keras.layers.Conv2D( 54 | filters=48, kernel_size=1, activation="relu") 55 | ]) 56 | 57 | self.conv9a = tf.keras.layers.UpSampling2D( 58 | size=(2, 2), interpolation="bilinear") 59 | self.conv9b = tf.keras.models.Sequential([ 60 | tf.keras.layers.DepthwiseConv2D( 61 | kernel_size=3, padding="same", activation=None), 62 | tf.keras.layers.Conv2D( 63 | filters=48, kernel_size=1, activation="relu") 64 | ]) 65 | 66 | self.conv10a = tf.keras.models.Sequential([ 67 | tf.keras.layers.DepthwiseConv2D( 68 | kernel_size=3, padding="same", activation=None), 69 | tf.keras.layers.Conv2D( 70 | filters=8, kernel_size=1, activation="relu"), 71 | tf.keras.layers.UpSampling2D(size=(2, 2), interpolation="bilinear") 72 | ]) 73 | self.conv10b = tf.keras.models.Sequential([ 74 | tf.keras.layers.DepthwiseConv2D( 75 | kernel_size=3, padding="same", activation=None), 76 | tf.keras.layers.Conv2D(filters=8, kernel_size=1, activation="relu") 77 | ]) 78 | 79 | self.conv11 = tf.keras.models.Sequential([ 80 | tf.keras.layers.DepthwiseConv2D( 81 | kernel_size=3, padding="same", activation=None), 82 | tf.keras.layers.Conv2D( 83 | filters=8, kernel_size=1, activation="relu"), 84 | tf.keras.layers.Conv2D( 85 | filters=self.num_keypoints, kernel_size=3, padding="same", activation=None) # -> Heatmap output 86 | ]) 87 | 88 | # === 
Regression === 89 | 90 | # In: 1, 64, 64, 48) 91 | self.conv12a = BlazeBlock(block_num=4, channel=96, name_prefix="regression_conv12a_") # input res: 64 92 | self.conv12b = tf.keras.models.Sequential([ 93 | tf.keras.layers.DepthwiseConv2D( 94 | kernel_size=3, padding="same", activation=None, name="regression_conv12b_depthwise"), 95 | tf.keras.layers.Conv2D( 96 | filters=96, kernel_size=1, activation="relu", name="regression_conv12b_conv1x1") 97 | ], name="regression_conv12b") 98 | 99 | self.conv13a = BlazeBlock(block_num=5, channel=192, name_prefix="regression_conv13a_") # input res: 32 100 | self.conv13b = tf.keras.models.Sequential([ 101 | tf.keras.layers.DepthwiseConv2D( 102 | kernel_size=3, padding="same", activation=None, name="regression_conv13b_depthwise"), 103 | tf.keras.layers.Conv2D( 104 | filters=192, kernel_size=1, activation="relu", name="regression_conv13b_conv1x1") 105 | ], name="regression_conv13b") 106 | 107 | self.conv14a = BlazeBlock(block_num=6, channel=288, name_prefix="regression_conv14a_") # input res: 16 108 | self.conv14b = tf.keras.models.Sequential([ 109 | tf.keras.layers.DepthwiseConv2D( 110 | kernel_size=3, padding="same", activation=None, name="regression_conv14b_depthwise"), 111 | tf.keras.layers.Conv2D( 112 | filters=288, kernel_size=1, activation="relu", name="regression_conv14b_conv1x1") 113 | ], name="regression_conv14b") 114 | 115 | self.conv15 = tf.keras.models.Sequential([ 116 | BlazeBlock(block_num=7, channel=288, channel_padding=0, name_prefix="regression_conv15a_"), 117 | BlazeBlock(block_num=7, channel=288, channel_padding=0, name_prefix="regression_conv15b_") 118 | ], name="regression_conv15") 119 | 120 | self.conv16 = tf.keras.models.Sequential([ 121 | tf.keras.layers.Conv2D( 122 | filters=3*self.num_keypoints, kernel_size=2, activation=None), 123 | tf.keras.layers.Reshape((3*self.num_keypoints, 1), name="regression_final_dense") 124 | ], name="joints") 125 | 126 | def build_model(self, model_type): 127 | 128 | input_x = tf.keras.layers.Input(shape=(256, 256, 3)) 129 | 130 | # Block 1 131 | # In: 1x256x256x3 132 | x = self.conv1(input_x) 133 | 134 | # Block 2 135 | # In: 1x128x128x24 136 | x = x + self.conv2_1(x) 137 | x = tf.keras.activations.relu(x) 138 | 139 | # Block 3 140 | # In: 1x128x128x24 141 | x = x + self.conv2_2(x) 142 | y0 = tf.keras.activations.relu(x) 143 | 144 | # === Heatmap === 145 | 146 | # In: 1, 128, 128, 24 147 | y1 = self.conv3(y0) 148 | y2 = self.conv4(y1) 149 | y3 = self.conv5(y2) 150 | y4 = self.conv6(y3) 151 | 152 | x = self.conv7a(y4) + self.conv7b(y3) 153 | x = self.conv8a(x) + self.conv8b(y2) 154 | # In: 1, 32, 32, 96 155 | x = self.conv9a(x) + self.conv9b(y1) 156 | # In: 1, 64, 64, 48 157 | y = self.conv10a(x) + self.conv10b(y0) 158 | y = self.conv11(y) 159 | 160 | # In: 1, 128, 128, 8 161 | heatmap = tf.keras.layers.Activation("sigmoid", name="heatmap")(y) 162 | 163 | # === Regression === 164 | 165 | # Stop gradient for regression on 2-head model 166 | if model_type == "TWO_HEAD": 167 | x = tf.keras.backend.stop_gradient(x) 168 | y2 = tf.keras.backend.stop_gradient(y2) 169 | y3 = tf.keras.backend.stop_gradient(y3) 170 | y4 = tf.keras.backend.stop_gradient(y4) 171 | 172 | x = self.conv12a(x) + self.conv12b(y2) 173 | # In: 1, 32, 32, 96 174 | x = self.conv13a(x) + self.conv13b(y3) 175 | # In: 1, 16, 16, 192 176 | x = self.conv14a(x) + self.conv14b(y4) 177 | # In: 1, 8, 8, 288 178 | x = self.conv15(x) 179 | # In: 1, 2, 2, 288 180 | joints = self.conv16(x) 181 | 182 | if model_type == "TWO_HEAD": 183 | return 
Model(inputs=input_x, outputs=[joints, heatmap]) 184 | elif model_type == "HEATMAP": 185 | return Model(inputs=input_x, outputs=heatmap) 186 | elif model_type == "REGRESSION": 187 | return Model(inputs=input_x, outputs=joints) 188 | else: 189 | raise ValueError("Wrong model type.") 190 | -------------------------------------------------------------------------------- /src/models/blazepose_layers.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | 4 | class ChannelPadding(tf.keras.layers.Layer): 5 | def __init__(self, channels): 6 | super(ChannelPadding, self).__init__() 7 | self.channels = channels 8 | 9 | def build(self, input_shapes): 10 | self.pad_shape = tf.constant( 11 | [[0, 0], [0, 0], [0, 0], [0, self.channels - input_shapes[-1]]]) 12 | 13 | def call(self, x): 14 | return tf.pad(x, self.pad_shape) 15 | 16 | 17 | class BlazeBlock(tf.keras.Model): 18 | def __init__(self, block_num=3, channel=48, channel_padding=1, name_prefix=""): 19 | super(BlazeBlock, self).__init__() 20 | 21 | self.downsample_a = tf.keras.models.Sequential([ 22 | tf.keras.layers.DepthwiseConv2D(kernel_size=3, strides=( 23 | 2, 2), padding='same', activation=None, name=name_prefix+"downsample_a_depthwise"), 24 | tf.keras.layers.Conv2D( 25 | filters=channel, kernel_size=1, activation=None, name=name_prefix+"downsample_a_conv1x1") 26 | ]) 27 | if channel_padding: 28 | self.downsample_b = tf.keras.models.Sequential([ 29 | tf.keras.layers.MaxPool2D(pool_size=(2, 2)), 30 | ChannelPadding(channels=channel) 31 | ]) 32 | else: 33 | self.downsample_b = tf.keras.layers.MaxPool2D(pool_size=(2, 2)) 34 | 35 | self.conv = list() 36 | for i in range(block_num): 37 | self.conv.append(tf.keras.models.Sequential([ 38 | tf.keras.layers.DepthwiseConv2D( 39 | kernel_size=3, padding='same', activation=None, name=name_prefix+"conv_block_{}".format(i+1)), 40 | tf.keras.layers.Conv2D( 41 | filters=channel, kernel_size=1, activation=None) 42 | ])) 43 | 44 | def call(self, x): 45 | x = tf.keras.activations.relu( 46 | self.downsample_a(x) + self.downsample_b(x)) 47 | for i in range(len(self.conv)): 48 | x = tf.keras.activations.relu(x + self.conv[i](x)) 49 | return x 50 | -------------------------------------------------------------------------------- /src/models/blazepose_legacy.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.keras.models import Model 3 | from .blazepose_layers import BlazeBlock 4 | 5 | 6 | class BlazePose(): 7 | def __init__(self, num_keypoints: int): 8 | 9 | self.num_keypoints = num_keypoints 10 | self.conv1 = tf.keras.layers.Conv2D( 11 | filters=24, kernel_size=3, strides=(2, 2), padding='same', activation='relu' 12 | ) 13 | 14 | self.conv2_1 = tf.keras.models.Sequential([ 15 | tf.keras.layers.DepthwiseConv2D( 16 | kernel_size=3, padding='same', activation=None), 17 | tf.keras.layers.Conv2D(filters=24, kernel_size=1, activation=None) 18 | ]) 19 | 20 | self.conv2_2 = tf.keras.models.Sequential([ 21 | tf.keras.layers.DepthwiseConv2D( 22 | kernel_size=3, padding='same', activation=None), 23 | tf.keras.layers.Conv2D(filters=24, kernel_size=1, activation=None) 24 | ]) 25 | 26 | # === Heatmap === 27 | 28 | self.conv3 = BlazeBlock(block_num=3, channel=48) # input res: 128 29 | self.conv4 = BlazeBlock(block_num=4, channel=96) # input res: 64 30 | self.conv5 = BlazeBlock(block_num=5, channel=192) # input res: 32 31 | self.conv6 = BlazeBlock(block_num=6, channel=288) # input res: 
16 32 | 33 | self.conv7a = tf.keras.models.Sequential([ 34 | tf.keras.layers.DepthwiseConv2D( 35 | kernel_size=3, padding="same", activation=None), 36 | tf.keras.layers.Conv2D( 37 | filters=48, kernel_size=1, activation="relu"), 38 | tf.keras.layers.UpSampling2D(size=(2, 2), interpolation="bilinear") 39 | ]) 40 | self.conv7b = tf.keras.models.Sequential([ 41 | tf.keras.layers.DepthwiseConv2D( 42 | kernel_size=3, padding="same", activation=None), 43 | tf.keras.layers.Conv2D( 44 | filters=48, kernel_size=1, activation="relu") 45 | ]) 46 | 47 | self.conv8a = tf.keras.layers.UpSampling2D( 48 | size=(2, 2), interpolation="bilinear") 49 | self.conv8b = tf.keras.models.Sequential([ 50 | tf.keras.layers.DepthwiseConv2D( 51 | kernel_size=3, padding="same", activation=None), 52 | tf.keras.layers.Conv2D( 53 | filters=48, kernel_size=1, activation="relu") 54 | ]) 55 | 56 | self.conv9a = tf.keras.layers.UpSampling2D( 57 | size=(2, 2), interpolation="bilinear") 58 | self.conv9b = tf.keras.models.Sequential([ 59 | tf.keras.layers.DepthwiseConv2D( 60 | kernel_size=3, padding="same", activation=None), 61 | tf.keras.layers.Conv2D( 62 | filters=48, kernel_size=1, activation="relu") 63 | ]) 64 | 65 | self.conv10a = tf.keras.models.Sequential([ 66 | tf.keras.layers.DepthwiseConv2D( 67 | kernel_size=3, padding="same", activation=None), 68 | tf.keras.layers.Conv2D( 69 | filters=8, kernel_size=1, activation="relu"), 70 | tf.keras.layers.UpSampling2D(size=(2, 2), interpolation="bilinear") 71 | ]) 72 | self.conv10b = tf.keras.models.Sequential([ 73 | tf.keras.layers.DepthwiseConv2D( 74 | kernel_size=3, padding="same", activation=None), 75 | tf.keras.layers.Conv2D(filters=8, kernel_size=1, activation="relu") 76 | ]) 77 | 78 | self.conv11 = tf.keras.models.Sequential([ 79 | tf.keras.layers.DepthwiseConv2D( 80 | kernel_size=3, padding="same", activation=None), 81 | tf.keras.layers.Conv2D( 82 | filters=8, kernel_size=1, activation="relu"), 83 | # heatmap 84 | tf.keras.layers.Conv2D( 85 | filters=self.num_keypoints, kernel_size=3, padding="same", activation=None) 86 | ]) 87 | 88 | # === Regression === 89 | 90 | # In: 1, 64, 64, 48) 91 | self.conv12a = BlazeBlock(block_num=4, channel=96, name_prefix="regression_conv12a_") # input res: 64 92 | self.conv12b = tf.keras.models.Sequential([ 93 | tf.keras.layers.DepthwiseConv2D( 94 | kernel_size=3, padding="same", activation=None, name="regression_conv12b_depthwise"), 95 | tf.keras.layers.Conv2D( 96 | filters=96, kernel_size=1, activation="relu", name="regression_conv12b_conv1x1") 97 | ], name="regression_conv12b") 98 | 99 | self.conv13a = BlazeBlock(block_num=5, channel=192, name_prefix="regression_conv13a_") # input res: 32 100 | self.conv13b = tf.keras.models.Sequential([ 101 | tf.keras.layers.DepthwiseConv2D( 102 | kernel_size=3, padding="same", activation=None, name="regression_conv13b_depthwise"), 103 | tf.keras.layers.Conv2D( 104 | filters=192, kernel_size=1, activation="relu", name="regression_conv13b_conv1x1") 105 | ], name="regression_conv13b") 106 | 107 | self.conv14a = BlazeBlock(block_num=6, channel=288, name_prefix="regression_conv14a_") # input res: 16 108 | self.conv14b = tf.keras.models.Sequential([ 109 | tf.keras.layers.DepthwiseConv2D( 110 | kernel_size=3, padding="same", activation=None, name="regression_conv14b_depthwise"), 111 | tf.keras.layers.Conv2D( 112 | filters=288, kernel_size=1, activation="relu", name="regression_conv14b_conv1x1") 113 | ], name="regression_conv14b") 114 | 115 | self.conv15 = tf.keras.models.Sequential([ 116 | 
BlazeBlock(block_num=7, channel=288, channel_padding=0, name_prefix="regression_conv15a_"), 117 | BlazeBlock(block_num=7, channel=288, channel_padding=0, name_prefix="regression_conv15b_") 118 | ], name="regression_conv15") 119 | 120 | self.conv16 = tf.keras.models.Sequential([ 121 | tf.keras.layers.GlobalAveragePooling2D(), 122 | # In: 1, 1, 1, 288 123 | tf.keras.layers.Dense(units=3*self.num_keypoints, 124 | activation=None, name="regression_final_dense"), 125 | ], name="regression_conv16") 126 | 127 | def build_model(self, model_type): 128 | 129 | input_x = tf.keras.layers.Input(shape=(256, 256, 3)) 130 | 131 | # Block 1 132 | # In: 1x256x256x3 133 | x = self.conv1(input_x) 134 | 135 | # Block 2 136 | # In: 1x128x128x24 137 | x = x + self.conv2_1(x) 138 | x = tf.keras.activations.relu(x) 139 | 140 | # Block 3 141 | # In: 1x128x128x24 142 | x = x + self.conv2_2(x) 143 | y0 = tf.keras.activations.relu(x) 144 | 145 | # === Heatmap === 146 | 147 | # In: 1, 128, 128, 24 148 | y1 = self.conv3(y0) 149 | y2 = self.conv4(y1) 150 | y3 = self.conv5(y2) 151 | y4 = self.conv6(y3) 152 | 153 | x = self.conv7a(y4) + self.conv7b(y3) 154 | x = self.conv8a(x) + self.conv8b(y2) 155 | # In: 1, 32, 32, 96 156 | x = self.conv9a(x) + self.conv9b(y1) 157 | # In: 1, 64, 64, 48 158 | y = self.conv10a(x) + self.conv10b(y0) 159 | # In: 1, 128, 128, 8 160 | heatmap = tf.keras.layers.Activation("sigmoid", name="heatmap")(self.conv11(y)) 161 | 162 | # === Regression === 163 | 164 | # Stop gradient for regression on 2-head model 165 | if model_type == "TWO_HEAD": 166 | x = tf.keras.backend.stop_gradient(x) 167 | y2 = tf.keras.backend.stop_gradient(y2) 168 | y3 = tf.keras.backend.stop_gradient(y3) 169 | y4 = tf.keras.backend.stop_gradient(y4) 170 | 171 | x = self.conv12a(x) + self.conv12b(y2) 172 | # In: 1, 32, 32, 96 173 | x = self.conv13a(x) + self.conv13b(y3) 174 | # In: 1, 16, 16, 192 175 | x = self.conv14a(x) + self.conv14b(y4) 176 | # In: 1, 8, 8, 288 177 | x = self.conv15(x) 178 | # In: 1, 2, 2, 288 179 | joints = self.conv16(x) 180 | joints = tf.keras.layers.Activation("sigmoid", name="joints")(joints) 181 | 182 | if model_type == "TWO_HEAD": 183 | return Model(inputs=input_x, outputs=[joints, heatmap]) 184 | elif model_type == "HEATMAP": 185 | return Model(inputs=input_x, outputs=heatmap) 186 | elif model_type == "REGRESSION": 187 | return Model(inputs=input_x, outputs=joints) 188 | else: 189 | raise ValueError("Wrong model type.") 190 | -------------------------------------------------------------------------------- /src/models/blazepose_with_pushup_classify.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.keras.models import Model 3 | from .blazepose_layers import BlazeBlock 4 | 5 | 6 | class BlazePose(): 7 | def __init__(self, num_keypoints: int): 8 | 9 | self.num_keypoints = num_keypoints 10 | 11 | self.conv1 = tf.keras.layers.Conv2D( 12 | filters=24, kernel_size=3, strides=(2, 2), padding='same', activation='relu' 13 | ) 14 | 15 | self.conv2_1 = tf.keras.models.Sequential([ 16 | tf.keras.layers.DepthwiseConv2D( 17 | kernel_size=3, padding='same', activation=None), 18 | tf.keras.layers.Conv2D(filters=24, kernel_size=1, activation=None) 19 | ]) 20 | 21 | self.conv2_2 = tf.keras.models.Sequential([ 22 | tf.keras.layers.DepthwiseConv2D( 23 | kernel_size=3, padding='same', activation=None), 24 | tf.keras.layers.Conv2D(filters=24, kernel_size=1, activation=None) 25 | ]) 26 | 27 | # === Heatmap === 28 | 29 | self.conv3 = 
BlazeBlock(block_num=3, channel=48) 30 | self.conv4 = BlazeBlock(block_num=4, channel=96) 31 | self.conv5 = BlazeBlock(block_num=5, channel=192) 32 | self.conv6 = BlazeBlock(block_num=6, channel=288) 33 | 34 | self.conv7a = tf.keras.models.Sequential([ 35 | tf.keras.layers.DepthwiseConv2D( 36 | kernel_size=3, padding="same", activation=None), 37 | tf.keras.layers.Conv2D( 38 | filters=48, kernel_size=1, activation="relu"), 39 | tf.keras.layers.UpSampling2D(size=(2, 2), interpolation="bilinear") 40 | ]) 41 | self.conv7b = tf.keras.models.Sequential([ 42 | tf.keras.layers.DepthwiseConv2D( 43 | kernel_size=3, padding="same", activation=None), 44 | tf.keras.layers.Conv2D( 45 | filters=48, kernel_size=1, activation="relu") 46 | ]) 47 | 48 | self.conv8a = tf.keras.layers.UpSampling2D( 49 | size=(2, 2), interpolation="bilinear") 50 | self.conv8b = tf.keras.models.Sequential([ 51 | tf.keras.layers.DepthwiseConv2D( 52 | kernel_size=3, padding="same", activation=None), 53 | tf.keras.layers.Conv2D( 54 | filters=48, kernel_size=1, activation="relu") 55 | ]) 56 | 57 | self.conv9a = tf.keras.layers.UpSampling2D( 58 | size=(2, 2), interpolation="bilinear") 59 | self.conv9b = tf.keras.models.Sequential([ 60 | tf.keras.layers.DepthwiseConv2D( 61 | kernel_size=3, padding="same", activation=None), 62 | tf.keras.layers.Conv2D( 63 | filters=48, kernel_size=1, activation="relu") 64 | ]) 65 | 66 | self.conv10a = tf.keras.models.Sequential([ 67 | tf.keras.layers.DepthwiseConv2D( 68 | kernel_size=3, padding="same", activation=None), 69 | tf.keras.layers.Conv2D( 70 | filters=8, kernel_size=1, activation="relu"), 71 | tf.keras.layers.UpSampling2D(size=(2, 2), interpolation="bilinear") 72 | ]) 73 | self.conv10b = tf.keras.models.Sequential([ 74 | tf.keras.layers.DepthwiseConv2D( 75 | kernel_size=3, padding="same", activation=None), 76 | tf.keras.layers.Conv2D(filters=8, kernel_size=1, activation="relu") 77 | ]) 78 | 79 | self.conv11 = tf.keras.models.Sequential([ 80 | tf.keras.layers.DepthwiseConv2D( 81 | kernel_size=3, padding="same", activation=None), 82 | tf.keras.layers.Conv2D( 83 | filters=8, kernel_size=1, activation="relu"), 84 | tf.keras.layers.Conv2D( 85 | filters=self.num_keypoints, kernel_size=3, padding="same", activation=None) # -> Heatmap output 86 | ]) 87 | 88 | # === Regression === 89 | self.conv12 = tf.keras.models.Sequential([ 90 | tf.keras.layers.DepthwiseConv2D( 91 | kernel_size=3, padding="same", activation=None), 92 | tf.keras.layers.Conv2D( 93 | filters=48, kernel_size=1, activation="relu"), 94 | tf.keras.layers.DepthwiseConv2D( 95 | kernel_size=3, padding="same", activation=None), 96 | tf.keras.layers.Conv2D( 97 | filters=48, kernel_size=1, activation="relu"), 98 | ], name="regression_1") 99 | self.conv13 = tf.keras.models.Sequential([ 100 | tf.keras.layers.GlobalAveragePooling2D(), 101 | tf.keras.layers.Dropout(0.2), 102 | tf.keras.layers.Dense(units=1, activation=None, name="regression_2") 103 | ]) 104 | 105 | def build_model(self, model_type): 106 | 107 | input_x = tf.keras.layers.Input(shape=(256, 256, 3)) 108 | 109 | # Block 1 110 | # In: 1x256x256x3 111 | x = self.conv1(input_x) 112 | 113 | # Block 2 114 | # In: 1x128x128x24 115 | x = x + self.conv2_1(x) 116 | x = tf.keras.activations.relu(x) 117 | 118 | # Block 3 119 | # In: 1x128x128x24 120 | x = x + self.conv2_2(x) 121 | y0 = tf.keras.activations.relu(x) 122 | 123 | # === Heatmap === 124 | 125 | # In: 1, 128, 128, 24 126 | y1 = self.conv3(y0) 127 | y2 = self.conv4(y1) 128 | y3 = self.conv5(y2) 129 | y4 = self.conv6(y3) 130 | 131 | x = 
self.conv7a(y4) + self.conv7b(y3) 132 | x = self.conv8a(x) + self.conv8b(y2) 133 | # In: 1, 32, 32, 96 134 | x = self.conv9a(x) + self.conv9b(y1) 135 | # In: 1, 64, 64, 48 136 | y = self.conv10a(x) + self.conv10b(y0) 137 | y = self.conv11(y) 138 | 139 | # In: 1, 128, 128, 8 140 | heatmap = tf.keras.layers.Activation("sigmoid", name="heatmap")(y) 141 | 142 | # === Regression === 143 | 144 | # Stop gradient for regression on 2-head model 145 | # x = tf.keras.backend.stop_gradient(y4) 146 | x = self.conv12(y4) 147 | x = self.conv13(x) 148 | is_pushup = tf.keras.layers.Activation("sigmoid", name="is_pushup")(x) 149 | 150 | return Model(inputs=input_x, outputs=[heatmap, is_pushup]) -------------------------------------------------------------------------------- /src/models/pushup_recognition.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.keras.models import Model 3 | 4 | class PushUpRecognition(): 5 | 6 | @staticmethod 7 | def build_model(): 8 | 9 | input_x = tf.keras.layers.Input(shape=(224, 224, 3)) 10 | x = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3), 11 | include_top=False, 12 | weights='imagenet', 13 | alpha=0.5)(input_x) 14 | x = tf.keras.layers.GlobalAveragePooling2D()(x) 15 | x = tf.keras.layers.Dropout(0.5)(x) 16 | x = tf.keras.layers.Dense(1, activation="sigmoid")(x) 17 | 18 | return Model(inputs=input_x, outputs=x) -------------------------------------------------------------------------------- /src/train_phase.py: -------------------------------------------------------------------------------- 1 | import enum 2 | 3 | 4 | class TrainPhase(enum.Enum): 5 | HEATMAP = "HEATMAP" 6 | REGRESSION = "REGRESSION" 7 | UNKNOWN = "UNKNOWN" -------------------------------------------------------------------------------- /src/trainers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vietanhdev/tf-blazepose/95db5ced1a4084c6a424161fbfc38c4e70612c97/src/trainers/__init__.py -------------------------------------------------------------------------------- /src/trainers/blazepose_trainer.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pathlib 3 | import importlib 4 | 5 | import tensorflow as tf 6 | from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard 7 | 8 | from ..train_phase import TrainPhase 9 | from ..models import ModelCreator 10 | 11 | from .losses import euclidean_distance_loss, focal_tversky, focal_loss, get_huber_loss, get_wing_loss 12 | from ..metrics.pck import get_pck_metric 13 | from ..metrics.mae import get_mae_metric 14 | 15 | def train(config): 16 | """Train model 17 | 18 | Args: 19 | config (dict): Training configuration from configuration file 20 | """ 21 | 22 | import tensorflow as tf 23 | 24 | train_config = config["train"] 25 | test_config = config["test"] 26 | model_config = config["model"] 27 | 28 | # Dataloader 29 | datalib = importlib.import_module("src.data_loaders.{}".format(config["data_loader"])) 30 | DataSequence = datalib.DataSequence 31 | 32 | # Initialize model 33 | model = ModelCreator.create_model(model_config["model_type"], model_config["num_keypoints"]) 34 | 35 | # Freeze regression branch when training heatmap 36 | train_phase = TrainPhase(train_config.get("train_phase", "UNKNOWN")) 37 | if train_phase == train_phase.HEATMAP: 38 | print("Freeze these layers:") 39 | for layer in model.layers: 40 | if 
layer.name.startswith("regression"): 41 | print(layer.name) 42 | layer.trainable = False 43 | # Freeze heatmap branch when training regression 44 | elif train_phase == train_phase.REGRESSION: 45 | print("Freeze these layers:") 46 | for layer in model.layers: 47 | if not layer.name.startswith("regression"): 48 | print(layer.name) 49 | layer.trainable = False 50 | 51 | print(model.summary()) 52 | 53 | loss_functions = { 54 | "heatmap": train_config["heatmap_loss"], 55 | "joints": train_config["keypoint_loss"] 56 | } 57 | 58 | # Replace all names with functions for custom losses 59 | for k in loss_functions.keys(): 60 | if loss_functions[k] == "euclidean_distance_loss": 61 | loss_functions[k] = euclidean_distance_loss 62 | elif loss_functions[k] == "focal_tversky": 63 | loss_functions[k] = focal_tversky 64 | elif loss_functions[k] == "huber": 65 | loss_functions[k] = get_huber_loss(delta=1.0, weights=(1.0, 1.0)) 66 | elif loss_functions[k] == "focal": 67 | loss_functions[k] = focal_loss(gamma=2, alpha=0.25) 68 | elif loss_functions[k] == "wing_loss": 69 | loss_functions[k] = get_wing_loss() 70 | 71 | 72 | loss_weights = train_config["loss_weights"] 73 | hm_pck_metric = get_pck_metric(ref_point_pair=test_config["pck_ref_points_idxs"], thresh=test_config["pck_thresh"])(name="pck1") 74 | hm_mae_metric = get_mae_metric()(name="mae1") 75 | kp_pck_metric = get_pck_metric(ref_point_pair=test_config["pck_ref_points_idxs"], thresh=test_config["pck_thresh"])(name="pck2") 76 | kp_mae_metric = get_mae_metric()(name="mae2") 77 | model.compile(optimizer=tf.optimizers.SGD(train_config["learning_rate"], momentum=0.9), 78 | loss=loss_functions, loss_weights=loss_weights, metrics={"heatmap": [hm_pck_metric, hm_mae_metric], "joints": [kp_pck_metric, kp_mae_metric]}) 79 | 80 | # Load pretrained model 81 | if train_config["load_weights"]: 82 | print("Loading model weights: " + 83 | train_config["pretrained_weights_path"]) 84 | model.load_weights(train_config["pretrained_weights_path"]) 85 | 86 | # Create experiment folder 87 | exp_path = os.path.join("experiments/{}".format(config["experiment_name"])) 88 | pathlib.Path(exp_path).mkdir(parents=True, exist_ok=True) 89 | 90 | # Define the callbacks 91 | tb_log_path = os.path.join(exp_path, "tb_logs") 92 | tb = TensorBoard(log_dir=tb_log_path, write_graph=True) 93 | model_folder_path = os.path.join(exp_path, "models") 94 | pathlib.Path(model_folder_path).mkdir(parents=True, exist_ok=True) 95 | mc = ModelCheckpoint(filepath=os.path.join( 96 | model_folder_path, "model_ep{epoch:03d}.h5"), save_weights_only=True, save_format="h5", verbose=1) 97 | 98 | # Load data 99 | train_dataset = DataSequence( 100 | config["data"]["train_images"], 101 | config["data"]["train_labels"], 102 | batch_size=train_config["train_batch_size"], 103 | input_size=(model_config["im_width"], model_config["im_height"]), 104 | heatmap_size=(model_config["heatmap_width"], model_config["heatmap_height"]), 105 | heatmap_sigma=model_config["heatmap_kp_sigma"], 106 | n_points=model_config["num_keypoints"], 107 | symmetry_point_ids=config["data"]["symmetry_point_ids"], 108 | shuffle=True, augment=True, random_flip=True, random_rotate=True, 109 | clip_landmark=train_config["keypoint_loss"] == "binary_crossentropy", 110 | random_scale_on_crop=True) 111 | val_dataset = DataSequence( 112 | config["data"]["val_images"], 113 | config["data"]["val_labels"], 114 | batch_size=train_config["val_batch_size"], 115 | input_size=(model_config["im_width"], model_config["im_height"]), 116 | 
heatmap_size=(model_config["heatmap_width"], model_config["heatmap_height"]), 117 | heatmap_sigma=model_config["heatmap_kp_sigma"], 118 | n_points=model_config["num_keypoints"], 119 | symmetry_point_ids=config["data"]["symmetry_point_ids"], 120 | shuffle=False, augment=False, random_flip=False, random_rotate=False, 121 | clip_landmark=train_config["keypoint_loss"] == "binary_crossentropy",random_scale_on_crop=False) 122 | 123 | test_dataset = DataSequence( 124 | config["data"]["test_images"], 125 | config["data"]["test_labels"], 126 | batch_size=train_config["val_batch_size"], 127 | input_size=(model_config["im_width"], model_config["im_height"]), 128 | heatmap_size=(model_config["heatmap_width"], model_config["heatmap_height"]), 129 | heatmap_sigma=model_config["heatmap_kp_sigma"], 130 | n_points=model_config["num_keypoints"], 131 | symmetry_point_ids=config["data"]["symmetry_point_ids"], 132 | shuffle=False, augment=False, random_flip=False, random_rotate=False, 133 | clip_landmark=train_config["keypoint_loss"] == "binary_crossentropy",random_scale_on_crop=False) 134 | 135 | # Initial epoch. Use when continue training 136 | initial_epoch = train_config.get("initial_epoch", 0) 137 | 138 | 139 | # Train 140 | model.fit(train_dataset, 141 | epochs=train_config["nb_epochs"], 142 | steps_per_epoch=len(train_dataset), 143 | validation_data=val_dataset, 144 | validation_steps=len(val_dataset), 145 | callbacks=[tb, mc], 146 | initial_epoch=initial_epoch, 147 | verbose=1) 148 | 149 | 150 | def load_model(config, model_path): 151 | """Load pretrained model 152 | 153 | Args: 154 | config (dict): Model configuration 155 | model (str): Path to h5 model to be tested 156 | """ 157 | 158 | model_config = config["model"] 159 | 160 | # Initialize model and load weights 161 | model = ModelCreator.create_model(model_config["model_type"], model_config["num_keypoints"]) 162 | model.compile() 163 | model.load_weights(model_path) 164 | 165 | return model -------------------------------------------------------------------------------- /src/trainers/losses.py: -------------------------------------------------------------------------------- 1 | from tensorflow.keras.losses import binary_crossentropy 2 | import tensorflow.keras.backend as K 3 | import tensorflow as tf 4 | import math 5 | 6 | epsilon = 1e-5 7 | smooth = 1 8 | 9 | def focal_loss(gamma=2., alpha=.25): 10 | def focal_loss_fixed(y_true, y_pred): 11 | pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred)) 12 | pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred)) 13 | return -K.mean(alpha * K.pow(1. - pt_1, gamma) * K.log(pt_1)) - K.mean((1 - alpha) * K.pow(pt_0, gamma) * K.log(1. - pt_0)) 14 | return focal_loss_fixed 15 | 16 | def dsc(y_true, y_pred): 17 | smooth = 1. 18 | y_true_f = K.flatten(y_true) 19 | y_pred_f = K.flatten(y_pred) 20 | intersection = K.sum(y_true_f * y_pred_f) 21 | score = (2. 
* intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth) 22 | return score 23 | 24 | def dice_loss(y_true, y_pred): 25 | loss = 1 - dsc(y_true, y_pred) 26 | return loss 27 | 28 | def bce_dice_loss(y_true, y_pred): 29 | loss = binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred) 30 | return loss 31 | 32 | def confusion(y_true, y_pred): 33 | smooth=1 34 | y_pred_pos = K.clip(y_pred, 0, 1) 35 | y_pred_neg = 1 - y_pred_pos 36 | y_pos = K.clip(y_true, 0, 1) 37 | y_neg = 1 - y_pos 38 | tp = K.sum(y_pos * y_pred_pos) 39 | fp = K.sum(y_neg * y_pred_pos) 40 | fn = K.sum(y_pos * y_pred_neg) 41 | prec = (tp + smooth)/(tp+fp+smooth) 42 | recall = (tp+smooth)/(tp+fn+smooth) 43 | return prec, recall 44 | 45 | def tp(y_true, y_pred): 46 | smooth = 1 47 | y_pred_pos = K.round(K.clip(y_pred, 0, 1)) 48 | y_pos = K.round(K.clip(y_true, 0, 1)) 49 | tp = (K.sum(y_pos * y_pred_pos) + smooth)/ (K.sum(y_pos) + smooth) 50 | return tp 51 | 52 | def tn(y_true, y_pred): 53 | smooth = 1 54 | y_pred_pos = K.round(K.clip(y_pred, 0, 1)) 55 | y_pred_neg = 1 - y_pred_pos 56 | y_pos = K.round(K.clip(y_true, 0, 1)) 57 | y_neg = 1 - y_pos 58 | tn = (K.sum(y_neg * y_pred_neg) + smooth) / (K.sum(y_neg) + smooth ) 59 | return tn 60 | 61 | def tversky(y_true, y_pred): 62 | y_true_pos = K.flatten(y_true) 63 | y_pred_pos = K.flatten(y_pred) 64 | true_pos = K.sum(y_true_pos * y_pred_pos) 65 | false_neg = K.sum(y_true_pos * (1-y_pred_pos)) 66 | false_pos = K.sum((1-y_true_pos)*y_pred_pos) 67 | alpha = 0.7 68 | return (true_pos + smooth)/(true_pos + alpha*false_neg + (1-alpha)*false_pos + smooth) 69 | 70 | def tversky_loss(y_true, y_pred): 71 | return 1 - tversky(y_true,y_pred) 72 | 73 | def focal_tversky(y_true,y_pred): 74 | pt_1 = tversky(y_true, y_pred) 75 | gamma = 0.75 76 | return K.pow((1-pt_1), gamma) 77 | 78 | def euclidean_distance_loss(y_true, y_pred): 79 | """ 80 | Euclidean distance loss 81 | https://en.wikipedia.org/wiki/Euclidean_distance 82 | :param y_true: TensorFlow tensor 83 | :param y_pred: TensorFlow tensor of the same shape as y_true 84 | :return: float 85 | """ 86 | return K.sqrt(K.sum(K.square(y_pred - y_true), axis=-1)) 87 | 88 | def wing_loss(landmarks, labels, w=10.0, epsilon=2.0): 89 | """ 90 | Arguments: 91 | landmarks, labels: float tensors with shape [batch_size, num_landmarks, 2]. 92 | w, epsilon: a float numbers. 93 | Returns: 94 | a float tensor with shape []. 95 | """ 96 | with tf.name_scope('wing_loss'): 97 | x = landmarks - labels 98 | c = w * (1.0 - math.log(1.0 + w/epsilon)) 99 | absolute_x = tf.abs(x) 100 | losses = tf.where( 101 | tf.greater(w, absolute_x), 102 | w * tf.log(1.0 + absolute_x/epsilon), 103 | absolute_x - c 104 | ) 105 | loss = tf.reduce_mean(tf.reduce_sum(losses, axis=[1, 2]), axis=0) 106 | return loss 107 | 108 | 109 | def get_huber_loss2(delta=1.0, weights=1.0): 110 | # https://www.tensorflow.org/api_docs/python/tf/compat/v1/losses/huber_loss 111 | 112 | def huber_loss(y_true, y_pred): 113 | return tf.compat.v1.losses.huber_loss( 114 | y_true, y_pred, weights=weights, delta=delta 115 | ) 116 | 117 | return huber_loss 118 | 119 | def get_huber_loss(delta=1.0, weights=(1.0, 100.0)): 120 | ''' 121 | ' Huber loss. 
122 | ' https://jaromiru.com/2017/05/27/on-using-huber-loss-in-deep-q-learning/ 123 | ' https://en.wikipedia.org/wiki/Huber_loss 124 | ''' 125 | def huber_loss(y_true, y_pred, clip_delta=delta, weights=weights): 126 | error = y_true - y_pred 127 | cond = tf.keras.backend.abs(error) < clip_delta 128 | squared_loss = 0.5 * tf.keras.backend.square(error) 129 | linear_loss = clip_delta * (tf.keras.backend.abs(error) - 0.5 * clip_delta) 130 | total_loss = tf.where(cond, squared_loss, linear_loss) 131 | weights = (y_true * weights[1]) + weights[0] 132 | total_loss = total_loss * weights 133 | return total_loss 134 | 135 | ''' 136 | ' Same as above but returns the mean loss. 137 | ''' 138 | def huber_loss_mean(y_true, y_pred, clip_delta=delta): 139 | return tf.keras.backend.mean(huber_loss(y_true, y_pred, clip_delta)) 140 | 141 | return huber_loss 142 | 143 | 144 | def get_wing_loss(w=10.0, epsilon=2.0): 145 | """ 146 | Arguments: 147 | landmarks, labels: float tensors with shape [batch_size, num_landmarks, 2]. 148 | w, epsilon: a float numbers. 149 | Returns: 150 | a float tensor with shape []. 151 | """ 152 | 153 | def wing_loss(y_true, y_pred): 154 | with tf.name_scope('wing_loss'): 155 | x = y_pred - y_true 156 | c = w * (1.0 - math.log(1.0 + w/epsilon)) 157 | absolute_x = tf.abs(x) 158 | losses = tf.where( 159 | tf.greater(w, absolute_x), 160 | w * tf.log(1.0 + absolute_x/epsilon), 161 | absolute_x - c 162 | ) 163 | loss = tf.reduce_mean(tf.reduce_sum(losses, axis=[1, 2]), axis=0) 164 | return loss 165 | -------------------------------------------------------------------------------- /src/trainers/pushup_recognition_trainer.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pathlib 3 | import importlib 4 | 5 | import tensorflow as tf 6 | from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard 7 | 8 | from ..models import ModelCreator 9 | 10 | from .losses import euclidean_distance_loss, focal_tversky, focal_loss, get_huber_loss 11 | from ..metrics.f1 import F1_Score, Recall, Precision 12 | 13 | def train(config): 14 | """Train model 15 | 16 | Args: 17 | config (dict): Training configuration from configuration file 18 | """ 19 | 20 | train_config = config["train"] 21 | test_config = config["test"] 22 | model_config = config["model"] 23 | 24 | # Dataloader 25 | datalib = importlib.import_module("src.data_loaders.{}".format(config["data_loader"])) 26 | DataSequence = datalib.DataSequence 27 | 28 | # Initialize model 29 | model = ModelCreator.create_model(model_config["model_type"]) 30 | 31 | print(model.summary()) 32 | model.compile(optimizer=tf.optimizers.Adam(train_config["learning_rate"]), 33 | loss=train_config["loss"], metrics=[F1_Score(), Recall(), Precision()]) 34 | 35 | 36 | # Load pretrained model 37 | if train_config["load_weights"]: 38 | print("Loading model weights: " + 39 | train_config["pretrained_weights_path"]) 40 | model.load_weights(train_config["pretrained_weights_path"]) 41 | 42 | # Create experiment folder 43 | exp_path = os.path.join("experiments/{}".format(config["experiment_name"])) 44 | pathlib.Path(exp_path).mkdir(parents=True, exist_ok=True) 45 | 46 | # Define the callbacks 47 | tb_log_path = os.path.join(exp_path, "tb_logs") 48 | tb = TensorBoard(log_dir=tb_log_path, write_graph=True) 49 | model_folder_path = os.path.join(exp_path, "models") 50 | pathlib.Path(model_folder_path).mkdir(parents=True, exist_ok=True) 51 | mc = ModelCheckpoint(filepath=os.path.join( 52 | model_folder_path, 
"model_ep{epoch:03d}.h5"), save_weights_only=False, save_format="h5", verbose=1) 53 | 54 | # Load data 55 | train_dataset = DataSequence( 56 | config["data"]["train_images"], 57 | config["data"]["train_labels"], 58 | batch_size=train_config["train_batch_size"], 59 | input_size=(model_config["im_width"], model_config["im_height"]), 60 | shuffle=True, augment=True, random_flip=True, random_rotate=True, 61 | random_scale_on_crop=True) 62 | val_dataset = DataSequence( 63 | config["data"]["val_images"], 64 | config["data"]["val_labels"], 65 | batch_size=train_config["val_batch_size"], 66 | input_size=(model_config["im_width"], model_config["im_height"]), 67 | shuffle=False, augment=False, random_flip=False, random_rotate=False,random_scale_on_crop=False) 68 | 69 | # Initial epoch. Use when continue training 70 | initial_epoch = train_config.get("initial_epoch", 0) 71 | 72 | # Train 73 | model.fit(train_dataset, 74 | epochs=train_config["nb_epochs"], 75 | steps_per_epoch=len(train_dataset), 76 | validation_data=val_dataset, 77 | validation_steps=len(val_dataset), 78 | callbacks=[tb, mc], 79 | initial_epoch=initial_epoch, 80 | verbose=1) 81 | 82 | 83 | def load_model(config, model_path): 84 | """Load pretrained model 85 | 86 | Args: 87 | config (dict): Model configuration 88 | model (str): Path to h5 model to be tested 89 | """ 90 | 91 | model_config = config["model"] 92 | 93 | # Initialize model and load weights 94 | model = ModelCreator.create_model(model_config["model_type"]) 95 | model.compile() 96 | model.load_weights(model_path) 97 | 98 | return model -------------------------------------------------------------------------------- /src/utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vietanhdev/tf-blazepose/95db5ced1a4084c6a424161fbfc38c4e70612c97/src/utils/__init__.py -------------------------------------------------------------------------------- /src/utils/heatmap.py: -------------------------------------------------------------------------------- 1 | from scipy.ndimage import gaussian_filter, maximum_filter 2 | import numpy as np 3 | import tensorflow as tf 4 | 5 | 6 | def gen_point_heatmap(img, pt, sigma, type='Gaussian'): 7 | """Draw label map for 1 point 8 | 9 | Args: 10 | img: Input image 11 | pt: Point in format (x, y) 12 | sigma: Sigma param in Gaussian or Cauchy kernel 13 | type (str, optional): Type of kernel used to generate heatmap. Defaults to 'Gaussian'. 
14 | 15 | Returns: 16 | np.array: Heatmap image 17 | """ 18 | # Draw a 2D gaussian 19 | # Adopted from https://github.com/anewell/pose-hg-train/blob/master/src/pypose/draw.py 20 | 21 | # Check that any part of the gaussian is in-bounds 22 | ul = [int(pt[0] - 3 * sigma), int(pt[1] - 3 * sigma)] 23 | br = [int(pt[0] + 3 * sigma + 1), int(pt[1] + 3 * sigma + 1)] 24 | if (ul[0] >= img.shape[1] or ul[1] >= img.shape[0] or 25 | br[0] < 0 or br[1] < 0): 26 | # If not, just return the image as is 27 | return img 28 | 29 | # Generate gaussian 30 | size = 6 * sigma + 1 31 | x = np.arange(0, size, 1, float) 32 | y = x[:, np.newaxis] 33 | x0 = y0 = size // 2 34 | # The gaussian is not normalized, we want the center value to equal 1 35 | if type == 'Gaussian': 36 | g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2)) 37 | elif type == 'Cauchy': 38 | g = sigma / (((x - x0) ** 2 + (y - y0) ** 2 + sigma ** 2) ** 1.5) 39 | 40 | # Usable gaussian range 41 | g_x = max(0, -ul[0]), min(br[0], img.shape[1]) - ul[0] 42 | g_y = max(0, -ul[1]), min(br[1], img.shape[0]) - ul[1] 43 | # Image range 44 | img_x = max(0, ul[0]), min(br[0], img.shape[1]) 45 | img_y = max(0, ul[1]), min(br[1], img.shape[0]) 46 | 47 | img[img_y[0]:img_y[1], img_x[0]:img_x[1]] = g[g_y[0]:g_y[1], g_x[0]:g_x[1]] 48 | return img 49 | 50 | 51 | def gen_gt_heatmap(keypoints, sigma, heatmap_size): 52 | """Generate groundtruth heatmap 53 | 54 | Args: 55 | keypoints: Keypoints in format [[x1, y1], [x2, y2], ...] 56 | sigma: Sigma param in Gaussian 57 | heatmap_size: Heatmap size in format (width, height) 58 | 59 | Returns: 60 | Generated heatmap 61 | """ 62 | npart = keypoints.shape[0] 63 | gtmap = np.zeros(shape=(heatmap_size[1], heatmap_size[0], npart), dtype=float) 64 | for i in range(npart): 65 | if keypoints[i, 0] == 0 and keypoints[i, 1] == 0: 66 | continue 67 | is_visible = True 68 | if len(keypoints[0]) > 2: 69 | visibility = keypoints[i, 2] 70 | if visibility <= 0: 71 | is_visible = False 72 | gtmap[:, :, i] = gen_point_heatmap( 73 | gtmap[:, :, i], keypoints[i, :], sigma) 74 | if not is_visible: 75 | gtmap[:, :, i] *= 0.5 76 | return gtmap 77 | 78 | 79 | @tf.function 80 | def nms(heat, kernel=3): 81 | hmax = tf.nn.max_pool2d(heat, kernel, 1, padding='SAME') 82 | keep = tf.cast(tf.equal(heat, hmax), tf.float32) 83 | return heat*keep 84 | 85 | 86 | @tf.function 87 | def find_keypoints_from_heatmap(batch_heatmaps, normalize=False): 88 | 89 | batch, height, width, n_points = tf.shape(batch_heatmaps)[0], tf.shape( 90 | batch_heatmaps)[1], tf.shape(batch_heatmaps)[2], tf.shape(batch_heatmaps)[3] 91 | 92 | batch_heatmaps = nms(batch_heatmaps) 93 | 94 | flat_tensor = tf.reshape(batch_heatmaps, (batch, -1, n_points)) 95 | 96 | # Argmax of the flat tensor 97 | argmax = tf.argmax(flat_tensor, axis=1) 98 | argmax = tf.cast(argmax, tf.int32) 99 | scores = tf.math.reduce_max(flat_tensor, axis=1) 100 | 101 | # Convert indexes into 2D coordinates 102 | argmax_y = argmax // width 103 | argmax_x = argmax % width 104 | argmax_y = tf.cast(argmax_y, tf.float32) 105 | argmax_x = tf.cast(argmax_x, tf.float32) 106 | 107 | if normalize: 108 | argmax_x = argmax_x / tf.cast(width, tf.float32) 109 | argmax_y = argmax_y / tf.cast(height, tf.float32) 110 | 111 | # Shape: batch * 3 * n_points 112 | batch_keypoints = tf.stack((argmax_x, argmax_y, scores), axis=1) 113 | # Shape: batch * n_points * 3 114 | batch_keypoints = tf.transpose(batch_keypoints, [0, 2, 1]) 115 | 116 | return batch_keypoints 117 | 
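A usage sketch for the two heatmap utilities above (not part of the repository source): it shows how `gen_gt_heatmap` and `find_keypoints_from_heatmap` can round-trip a small set of keypoints. The coordinates, sigma, and heatmap size below are illustrative values only, and the import path assumes the repo layout shown here.

```
# Illustrative only: keypoints -> groundtruth heatmap -> recovered keypoints.
import numpy as np
import tensorflow as tf

from src.utils.heatmap import gen_gt_heatmap, find_keypoints_from_heatmap

# Three keypoints in heatmap pixel coordinates, format (x, y, visibility).
keypoints = np.array([[10, 20, 1],
                      [32, 40, 1],
                      [50, 12, 0]])

# Groundtruth heatmap with shape (height, width, n_points) = (64, 64, 3).
# The invisible third point has its peak scaled by 0.5, as done in gen_gt_heatmap.
gtmap = gen_gt_heatmap(keypoints, sigma=2, heatmap_size=(64, 64))

# find_keypoints_from_heatmap expects a batch: (batch, height, width, n_points).
batch_heatmaps = tf.convert_to_tensor(gtmap[np.newaxis, ...], dtype=tf.float32)
recovered = find_keypoints_from_heatmap(batch_heatmaps, normalize=False)
print(recovered.shape)  # (1, 3, 3): x, y and peak score for each keypoint
```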
-------------------------------------------------------------------------------- /src/utils/keypoints.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import numpy as np 4 | import cv2 5 | 6 | 7 | def unnormalize_landmark(landmark, image_size): 8 | """Unnormalize landmark by image size 9 | 10 | Args: 11 | landmark: Normalized keypoints in format [[x1, y1], [x2, y2], ...] 12 | image_size: Image size in format (width, height) 13 | 14 | Returns: 15 | Unnormalized landmark 16 | """ 17 | image_size = np.array(image_size) 18 | landmark[:, :2] = np.multiply( 19 | np.array(landmark[:, :2]), np.array(image_size).reshape((1, 2))) 20 | return landmark 21 | 22 | 23 | def normalize_landmark(landmark, image_size): 24 | """Normalize landmark by image size 25 | 26 | Args: 27 | landmark: Keypoints in format [[x1, y1], [x2, y2], ...] 28 | image_size: Image size in format (width, height) 29 | 30 | Returns: 31 | Normalized landmark 32 | """ 33 | image_size = np.array(image_size) 34 | landmark = np.array(landmark) 35 | landmark = landmark.astype(float) 36 | landmark[:, :2] = np.divide( 37 | landmark[:, :2], np.array(image_size).reshape((1, 2))) 38 | return landmark 39 | 40 | 41 | -------------------------------------------------------------------------------- /src/utils/pre_processing.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | import random 4 | 5 | def calculate_bbox_from_keypoints(kps, padding=0.1): 6 | """Estimate body bounding box from all body keypoints 7 | 8 | Args: 9 | kps: Keypoints. Shape: (n, 2) 10 | padding: Padding the smallest keypoint bounding box to form body bounding box 11 | """ 12 | 13 | kps = np.array(kps) 14 | min_x = np.min(kps[:, 0]) 15 | min_y = np.min(kps[:, 1]) 16 | max_x = np.max(kps[:, 0]) 17 | max_y = np.max(kps[:, 1]) 18 | 19 | width = max_x - min_x 20 | height = max_y - min_y 21 | 22 | x1 = min_x - padding * width 23 | x2 = max_x + padding * width 24 | y1 = min_y - padding * height 25 | y2 = max_y + padding * height 26 | 27 | return [[x1, y1], [x2, y2]] 28 | 29 | 30 | def square_padding(im, desired_size=800, return_padding=False): 31 | 32 | old_size = im.shape[:2] # old_size is in (height, width) format 33 | 34 | ratio = float(desired_size) / max(old_size) 35 | new_size = tuple([int(x*ratio) for x in old_size]) 36 | 37 | im = cv2.resize(im, (new_size[1], new_size[0])) 38 | 39 | delta_w = desired_size - new_size[1] 40 | delta_h = desired_size - new_size[0] 41 | top, bottom = delta_h // 2, delta_h - (delta_h // 2) 42 | left, right = delta_w // 2, delta_w - (delta_w // 2) 43 | 44 | color = [0, 0, 0] 45 | new_im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, 46 | value=color) 47 | 48 | if not return_padding: 49 | return new_im 50 | else: 51 | h, w = new_im.shape[:2] 52 | padding = (top / h, left / w, bottom / h, right / w) 53 | return new_im, padding 54 | 55 | 56 | def square_crop_with_keypoints(image, bbox, keypoints, pad_value=0): 57 | """Square crop an image knowing a bounding box. This function also update keypoints accordingly 58 | Steps: Extend bbox to a square -> Pad image -> Crop image -> Recalculate keypoints 59 | 60 | Args: 61 | image: Input image 62 | bbox: Bounding box. Shape: (2, 2), Format: [[x1, y1], [x2, y2]] 63 | keypoints: Keypoints in format [[x1, y1], [x2, y2], ...] 
64 | pad_value: Scalar indicating padding color 65 | 66 | Returns: 67 | cropped_image, keypoints 68 | """ 69 | 70 | bbox_width = bbox[1][0] - bbox[0][0] 71 | bbox_height = bbox[1][1] - bbox[0][1] 72 | im_height, im_width = image.shape[:2] 73 | 74 | if bbox_width > bbox_height: # Padding on y-axis 75 | pad = int((bbox_width - bbox_height) / 2) 76 | bbox[0][1] -= pad 77 | bbox[1][1] = bbox[0][1] + bbox_width 78 | elif bbox_height > bbox_width: # Padding on x-axis 79 | pad = int((bbox_height - bbox_width) / 2) 80 | bbox[0][0] -= pad 81 | bbox[1][0] = bbox[0][0] + bbox_height 82 | 83 | pad_top = 0 84 | pad_bottom = 0 85 | pad_left = 0 86 | pad_right = 0 87 | if bbox[0][0] < 0: 88 | pad_left = -bbox[0][0] 89 | bbox[0][0] = 0 90 | bbox[1][0] += pad_left 91 | if bbox[0][1] < 0: 92 | pad_top = -bbox[0][1] 93 | bbox[0][1] = 0 94 | bbox[1][1] += pad_top 95 | if bbox[1][0] >= im_width: 96 | pad_right = bbox[1][0] - im_width + 1 97 | if bbox[1][1] >= im_height: 98 | pad_bottom = bbox[1][1] - im_height + 1 99 | 100 | if pad_value == "random": 101 | pad_value = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255)) 102 | padded_image = cv2.copyMakeBorder(image, pad_top, pad_bottom, pad_left, pad_right, cv2.BORDER_CONSTANT, value=pad_value) 103 | 104 | cropped_image = padded_image[bbox[0][1]:bbox[1][1], bbox[0][0]:bbox[1][0]] 105 | 106 | # Mark missing keypoints 107 | keypoints = np.array(keypoints) 108 | missing_idxs = [] 109 | for i in range(keypoints.shape[0]): 110 | if keypoints[i, 0] == 0 and keypoints[i, 1] == 0: 111 | missing_idxs.append(i) 112 | 113 | # Update keypoints 114 | keypoints[:, 0] = keypoints[:, 0] - bbox[0][0] + pad_left 115 | keypoints[:, 1] = keypoints[:, 1] - bbox[0][1] + pad_top 116 | 117 | # Restore missing keypoints 118 | for i in missing_idxs: 119 | keypoints[i, 0] = 0 120 | keypoints[i, 1] = 0 121 | 122 | return cropped_image, keypoints 123 | 124 | -------------------------------------------------------------------------------- /src/utils/visualizer.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | 3 | 4 | def visualize_keypoints(image, keypoints, visibility=None, edges=None, point_color=(0, 255, 0), text_color=(0, 0, 0)): 5 | """Visualize keypoints 6 | 7 | Args: 8 | image: Input image 9 | keypoints: Keypoints in format [[x1, y1], [x2, y2], ...] 10 | visibility [list]: List of visibilities of keypoints. 
0: occluded, 1: visible 11 | 12 | Returns: 13 | Visualized image 14 | """ 15 | 16 | draw = image.copy() 17 | for i, p in enumerate(keypoints): 18 | x, y = p[0], p[1] 19 | tmp_point_color = point_color 20 | if visibility is not None and not int(visibility[i]): 21 | tmp_point_color = (100, 100, 100) 22 | draw = cv2.circle(draw, center=(int(x), int(y)), 23 | color=tmp_point_color, radius=5, thickness=-1) 24 | draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 25 | 0.5, text_color, 1, cv2.LINE_AA) 26 | 27 | if edges is not None and visibility is not None: 28 | for edge_chain in edges: 29 | for i in range(len(edge_chain) - 1): 30 | if visibility[edge_chain[i]] and visibility[edge_chain[i+1]]: 31 | p1 = tuple(keypoints[edge_chain[i]]) 32 | p2 = tuple(keypoints[edge_chain[i+1]]) 33 | cv2.line(draw, p1, p2, (0, 0, 255), 2) 34 | 35 | return draw 36 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import importlib 3 | import json 4 | import tensorflow as tf 5 | 6 | for gpu in tf.config.experimental.list_physical_devices('GPU'): 7 | tf.compat.v2.config.experimental.set_memory_growth(gpu, True) 8 | 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument( 11 | '-c', 12 | '--conf_file', default="config.json", 13 | help='Configuration file') 14 | parser.add_argument( 15 | '-m', 16 | '--model', default="model.h5", 17 | help='Path to h5 model') 18 | args = parser.parse_args() 19 | 20 | # Open and load the config json 21 | with open(args.conf_file) as config_buffer: 22 | config = json.loads(config_buffer.read()) 23 | 24 | trainer = importlib.import_module("src.trainers.{}".format(config["trainer"])) 25 | trainer.test(config, args.model) 26 | -------------------------------------------------------------------------------- /tools/lsp_data_to_json.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import json 4 | from scipy.io import loadmat 5 | 6 | parser = argparse.ArgumentParser() 7 | parser.add_argument( 8 | '-i', 9 | '--input_file', default="data/lsp_dataset/joints.mat", 10 | help='Path to lsp annotation file') 11 | parser.add_argument( 12 | '-f', 13 | '--image_folder', default="data/lsp_dataset/images", 14 | help='Image folder') 15 | parser.add_argument( 16 | '-o', 17 | '--output_file', default="data/lsp_dataset/labels.json", 18 | help='Output json file') 19 | args = parser.parse_args() 20 | 21 | # Load annotations 22 | annotations = loadmat(args.input_file) 23 | joints = annotations["joints"] 24 | joints_shape = joints.shape 25 | print(joints_shape) 26 | if joints_shape[0] == 3 and joints_shape[1] == 14: # LSP (3, 14, n) -> (n, 14, 3) 27 | joints = joints.swapaxes(0, 2) 28 | elif joints_shape[0] == 14 and joints_shape[1] == 3: # LSPET (14, 3, n) -> (n, 14, 3) 29 | joints = joints.swapaxes(0, 2) 30 | joints = joints.swapaxes(1, 2) 31 | 32 | # List image files 33 | images = [i for i in os.listdir(args.image_folder) if i.endswith("jpg")] 34 | images.sort() 35 | 36 | # Build new annotations 37 | labels = [] 38 | w_img_count = 0 39 | for i in range(len(images)): 40 | points = joints[i, :, :2] 41 | visibility = joints[i, :, 2] 42 | 43 | wrong_label = False 44 | for p in points: 45 | if p[0] <= 0 or p[1] <= 0: 46 | wrong_label = True 47 | break 48 | if wrong_label: 49 | w_img_count += 1 50 | print(w_img_count) 51 | continue 52 | 53 | label = {"image": 
images[i], "points": points.tolist(), "visibility": visibility.tolist()} 54 | labels.append(label) 55 | 56 | print("Len: ", len(labels)) 57 | 58 | with open(args.output_file, "w") as fp: 59 | json.dump(labels, fp) 60 | 61 | -------------------------------------------------------------------------------- /tools/merge_lsp.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | 4 | with open("data/pushup/train.json", "r") as fp: 5 | train1 = json.load(fp) 6 | for l in train1: 7 | l["is_pushing_up"] = True 8 | with open("data/lsp_lspet_7points/train.json", "r") as fp: 9 | train2 = json.load(fp) 10 | for l in train2: 11 | l["is_pushing_up"] = False 12 | 13 | with open("data/pushup/val.json", "r") as fp: 14 | val1 = json.load(fp) 15 | for l in val1: 16 | l["is_pushing_up"] = True 17 | with open("data/lsp_lspet_7points/val.json", "r") as fp: 18 | val2 = json.load(fp) 19 | for l in val2: 20 | l["is_pushing_up"] = False 21 | 22 | with open("data/pushup/test.json", "r") as fp: 23 | test1 = json.load(fp) 24 | for l in test1: 25 | l["is_pushing_up"] = True 26 | with open("data/lsp_lspet_7points/test.json", "r") as fp: 27 | test2 = json.load(fp) 28 | for l in test2: 29 | l["is_pushing_up"] = False 30 | 31 | with open("data/lsp_lspet_pushup/train.json", "w") as fp: 32 | json.dump(train1 + train2, fp) 33 | with open("data/lsp_lspet_pushup/val.json", "w") as fp: 34 | json.dump(val1 + val2, fp) 35 | with open("data/lsp_lspet_pushup/test.json", "w") as fp: 36 | json.dump(test1 + test2, fp) 37 | 38 | os.system("mkdir -p data/lsp_lspet_pushup/images") 39 | os.system("cp data/pushup/images/* data/lsp_lspet_pushup/images") 40 | os.system("cp data/lsp_lspet_7points/images/* data/lsp_lspet_pushup/images") -------------------------------------------------------------------------------- /tools/merge_lsp_lspet_pushup.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | 4 | with open("data/pushup/train.json", "r") as fp: 5 | train1 = json.load(fp) 6 | for l in train1: 7 | l["is_pushing_up"] = True 8 | with open("data/lsp_lspet_7points/train.json", "r") as fp: 9 | train2 = json.load(fp) 10 | for l in train2: 11 | l["is_pushing_up"] = False 12 | 13 | with open("data/pushup/val.json", "r") as fp: 14 | val1 = json.load(fp) 15 | for l in val1: 16 | l["is_pushing_up"] = True 17 | with open("data/lsp_lspet_7points/val.json", "r") as fp: 18 | val2 = json.load(fp) 19 | for l in val2: 20 | l["is_pushing_up"] = False 21 | 22 | with open("data/pushup/test.json", "r") as fp: 23 | test1 = json.load(fp) 24 | for l in test1: 25 | l["is_pushing_up"] = True 26 | with open("data/lsp_lspet_7points/test.json", "r") as fp: 27 | test2 = json.load(fp) 28 | for l in test2: 29 | l["is_pushing_up"] = False 30 | 31 | with open("data/lsp_lspet_pushup/train.json", "w") as fp: 32 | json.dump(train1 + train2, fp) 33 | with open("data/lsp_lspet_pushup/val.json", "w") as fp: 34 | json.dump(val1 + val2, fp) 35 | with open("data/lsp_lspet_pushup/test.json", "w") as fp: 36 | json.dump(test1 + test2, fp) 37 | 38 | os.system("mkdir -p data/lsp_lspet_pushup/images") 39 | os.system("cp data/pushup/images/* data/lsp_lspet_pushup/images") 40 | os.system("cp data/lsp_lspet_7points/images/* data/lsp_lspet_pushup/images") -------------------------------------------------------------------------------- /tools/merge_mpii_pushup.py: -------------------------------------------------------------------------------- 1 | import json 2 | import 
os 3 | 4 | with open("data/pushup/train.json", "r") as fp: 5 | train1 = json.load(fp) 6 | for l in train1: 7 | l["is_pushing_up"] = True 8 | with open("data/mpii/train.json", "r") as fp: 9 | train2 = json.load(fp) 10 | for l in train2: 11 | x = l["points"] 12 | l["points"] = [x[10], x[11], x[12], x[9], x[13], x[14], x[15]] 13 | x = l["visibility"] 14 | l["visibility"] = [x[10], x[11], x[12], x[9], x[13], x[14], x[15]] 15 | l["is_pushing_up"] = False 16 | 17 | with open("data/pushup/val.json", "r") as fp: 18 | val1 = json.load(fp) 19 | for l in val1: 20 | l["is_pushing_up"] = True 21 | with open("data/mpii/val.json", "r") as fp: 22 | val2 = json.load(fp) 23 | for l in val2: 24 | x = l["points"] 25 | l["points"] = [x[10], x[11], x[12], x[9], x[13], x[14], x[15]] 26 | x = l["visibility"] 27 | l["visibility"] = [x[10], x[11], x[12], x[9], x[13], x[14], x[15]] 28 | l["is_pushing_up"] = False 29 | 30 | with open("data/pushup/test.json", "r") as fp: 31 | test1 = json.load(fp) 32 | for l in test1: 33 | l["is_pushing_up"] = True 34 | with open("data/mpii/test.json", "r") as fp: 35 | test2 = json.load(fp) 36 | for l in test2: 37 | x = l["points"] 38 | l["points"] = [x[10], x[11], x[12], x[9], x[13], x[14], x[15]] 39 | x = l["visibility"] 40 | l["visibility"] = [x[10], x[11], x[12], x[9], x[13], x[14], x[15]] 41 | l["is_pushing_up"] = False 42 | 43 | os.system("mkdir -p data/mpii_pushup/images") 44 | with open("data/mpii_pushup/train.json", "w") as fp: 45 | json.dump(train1 + train2, fp) 46 | with open("data/mpii_pushup/val.json", "w") as fp: 47 | json.dump(val1 + val2, fp) 48 | with open("data/mpii_pushup/test.json", "w") as fp: 49 | json.dump(test1 + test2, fp) 50 | 51 | os.system("cp data/pushup/images/* data/mpii_pushup/images") 52 | os.system("cp data/mpii/images/* data/mpii_pushup/images") -------------------------------------------------------------------------------- /tools/process_data.py: -------------------------------------------------------------------------------- 1 | import json 2 | import random 3 | import cv2 4 | import os 5 | import copy 6 | import shutil 7 | 8 | data = [] 9 | with open("data/pushup/train.json") as fp: 10 | data1 = json.load(fp)["labels"] 11 | with open("data/pushup/test.json") as fp: 12 | data2 = json.load(fp)["labels"] 13 | with open("data/pushup/val.json") as fp: 14 | data3 = json.load(fp)["labels"] 15 | 16 | data = [d for d in data if d["contains_person"] and d["is_pushing_up"]] 17 | print(len(data)) 18 | 19 | mpii1 = [d for d in data1 if d["contains_person"] and not d["is_pushing_up"]] 20 | mpii2 = [d for d in data2 if d["contains_person"] and not d["is_pushing_up"]] 21 | mpii3 = [d for d in data3 if d["contains_person"] and not d["is_pushing_up"]] 22 | 23 | print(len(mpii1)) 24 | print(len(mpii2)) 25 | print(len(mpii3)) 26 | exit(0) 27 | 28 | kk = {} 29 | for d in data: 30 | d["video"] = d["image"].split("_")[0] 31 | kk[d["image"]] = d 32 | 33 | new_data = [] 34 | for k in kk.values(): 35 | new_data.append(k) 36 | 37 | data = new_data 38 | random.shuffle(data) 39 | 40 | for d in data: 41 | with open(os.path.join("labels", "{}.json".format(d["image"])), "w") as fp: 42 | json.dump(d["points"], fp) 43 | 44 | # for d in data: 45 | # # if d["video"] == "317": 46 | # # img = cv2.imread(os.path.join("data/pushup/images", d["image"])) 47 | # # cv2.imshow("Image", img) 48 | # # cv2.waitKey(0) 49 | # shutil.copy(os.path.join("data/pushup/images", d["image"]), 50 | # os.path.join("data/pushup/new_images", d["image"])) 51 | 52 | # videos = {} 53 | # for d in data: 54 | # 
if d["video"] not in videos: 55 | # videos[d["video"]] = 1 56 | # else: 57 | # videos[d["video"]] += 1 58 | 59 | # video_img_count = [] 60 | # for video, img_count in videos.items(): 61 | # print("{}:{}".format(video, img_count), end="; ") 62 | 63 | # print("We have {} videos", len(videos.keys())) -------------------------------------------------------------------------------- /tools/process_pushup_data.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pathlib 3 | import json 4 | import shutil 5 | import random 6 | import cv2 7 | import numpy as np 8 | from src.utils.pre_processing import calculate_bbox_from_keypoints, square_crop_with_keypoints, random_occlusion 9 | 10 | 11 | label_files = list(os.listdir("data/pushup/labels")) 12 | label_files = [l for l in label_files if l.endswith("json")] 13 | 14 | labels = [] 15 | for lf in label_files: 16 | with open(os.path.join("data/pushup/labels", lf)) as fp: 17 | points = json.load(fp) 18 | if len(points) != 7: 19 | continue 20 | label = { 21 | "video": int(lf.split("_")[0]), 22 | "image": lf.replace(".json", ""), 23 | "points": points, 24 | "visibility": [1,1,1,1,1,1,1] 25 | } 26 | image = cv2.imread(os.path.join("data/pushup/images", label["image"])) 27 | label["bbox"] = [[0,0],[image.shape[1],image.shape[0]]] 28 | label["bbox"] = np.array(label["bbox"]).astype(int).tolist() 29 | labels.append(label) 30 | 31 | # for label in labels: 32 | # # Visualize 33 | # for label in labels: 34 | # image = cv2.imread(os.path.join("data/pushup/images", label["image"])) 35 | 36 | # draw = image.copy() 37 | # for i, p in enumerate(label["points"]): 38 | # x, y = p 39 | # color = (0, 0, 255) if int(label["visibility"][i]) else (255, 0, 0) 40 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 41 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 42 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 43 | # p1 = tuple(label["bbox"][0]) 44 | # p2 = tuple(label["bbox"][1]) 45 | # draw = cv2.rectangle(draw, p1, p2, (0,0,255), 2) 46 | # cv2.imshow("Image", draw) 47 | # # cv2.waitKey(0) 48 | 49 | # cropped_image, keypoints = square_crop_with_keypoints(image, label["bbox"], label["points"], "random") 50 | # draw = cropped_image.copy() 51 | # for i, p in enumerate(keypoints): 52 | # x, y = p 53 | # color = (0, 0, 255) if int(label["visibility"][i]) else (255, 0, 0) 54 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 55 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 56 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 57 | 58 | # cv2.imshow("Square cropped", draw) 59 | # # cv2.waitKey(0) 60 | 61 | # # Test random occlusion 62 | # cropped_image, visibility = random_occlusion(cropped_image, keypoints, visibility=None, rect_ratio=((0.2, 0.5), (0.2, 0.5))) 63 | # draw = cropped_image.copy() 64 | # for i, p in enumerate(keypoints): 65 | # print(i) 66 | # print(visibility[i]) 67 | # x, y = p 68 | # color = (0, 0, 255) if int(visibility[i]) else (255, 0, 0) 69 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 70 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 71 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 72 | 73 | # cv2.imshow("Random occlusion", draw) 74 | # cv2.waitKey(0) 75 | 76 | print("n_labels:", len(labels)) 77 | 78 | # for label in labels: 79 | # label.pop("video", None) 80 | 81 | train_labels = [l for l in labels if l["video"] < 
480] 82 | # print("train_labels", len(train_labels)) 83 | # train_videos = set(l["video"] for l in train_labels) 84 | # print("train_videos", len(train_videos)) 85 | 86 | val_labels = [l for l in labels if l["video"] >= 480 and l["video"] < 550] 87 | # print("val_labels", len(val_labels)) 88 | # val_videos = set(l["video"] for l in val_labels) 89 | # print("val_videos", len(val_videos)) 90 | 91 | test_labels = [l for l in labels if l["video"] >= 550] 92 | # print("test_labels", len(test_labels)) 93 | # test_videos = set(l["video"] for l in test_labels) 94 | # print("test_videos", len(test_videos)) 95 | 96 | with open("data/pushup/train.json", "w") as fp: 97 | json.dump(train_labels, fp) 98 | 99 | with open("data/pushup/val.json", "w") as fp: 100 | json.dump(val_labels, fp) 101 | 102 | with open("data/pushup/test.json", "w") as fp: 103 | json.dump(test_labels, fp) 104 | 105 | 106 | -------------------------------------------------------------------------------- /tools/split_data_mpii.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pathlib 4 | import json 5 | import shutil 6 | import random 7 | import cv2 8 | import numpy as np 9 | 10 | from src.utils.pre_processing import calculate_bbox_from_keypoints, square_crop_with_keypoints 11 | from src.data_loaders.augmentation_utils import random_occlusion 12 | 13 | image_folder = "data/mpii/images" 14 | jsonfile = "data/mpii_annotations.json" 15 | 16 | # load train or val annotation 17 | with open(jsonfile) as anno_file: 18 | anno = json.load(anno_file) 19 | 20 | val_anno, train_anno = [], [] 21 | for idx, val in enumerate(anno): 22 | if val['isValidation'] == True: 23 | val_anno.append(anno[idx]) 24 | else: 25 | train_anno.append(anno[idx]) 26 | 27 | anno = train_anno + val_anno 28 | 29 | count = 0 30 | labels = [] 31 | for a in anno: 32 | 33 | if a["numOtherPeople"] != 0: 34 | continue 35 | 36 | l = {"image": a["img_paths"]} 37 | points = np.array(a["joint_self"]) 38 | l["points"] = points[:, :2].tolist() 39 | l["visibility"] = points[:, 2].tolist() 40 | 41 | for p in l["points"]: 42 | if p[0] == 0 and p[1] == 0: 43 | p[0] = -1 44 | p[1] = -1 45 | 46 | 47 | inside_points = [p for p in l["points"] if p[0] != -1 or p[1] != -1] 48 | l["bbox"] = calculate_bbox_from_keypoints(inside_points) 49 | l["bbox"] = np.array(l["bbox"]).astype(int).tolist() 50 | 51 | # Crop image 52 | image = cv2.imread(os.path.join(image_folder, l["image"])) 53 | image, keypoints = square_crop_with_keypoints(image, l["bbox"], l["points"], "random") 54 | l["points"] = keypoints.tolist() 55 | 56 | # BBox again 57 | inside_points = [p for p in l["points"] if p[0] != -1 or p[1] != -1] 58 | l["bbox"] = calculate_bbox_from_keypoints(inside_points) 59 | l["bbox"] = np.array(l["bbox"]).astype(int).tolist() 60 | 61 | l["image"] = "mpii_crop_{}.png".format(count) 62 | count += 1 63 | 64 | # cv2.imwrite(os.path.join("data/mpii/images2", l["image"]), image) 65 | 66 | # draw = image.copy() 67 | # for i, p in enumerate(l["points"]): 68 | # x, y = p 69 | # color = (0, 0, 255) if int(l["visibility"][i]) else (255, 0, 0) 70 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 71 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 72 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 73 | # draw = cv2.rectangle(draw, tuple(l["bbox"][0]), tuple(l["bbox"][1]), (0, 0, 255), 5) 74 | # cv2.imshow("Image", draw) 75 | # cv2.waitKey(0) 76 | 77 | 78 | labels.append(l) 79 | 80 
| # # Visualize 81 | # for label in labels: 82 | # image = cv2.imread(os.path.join(image_folder, label["image"])) 83 | 84 | # draw = image.copy() 85 | # for i, p in enumerate(label["points"]): 86 | # x, y = p 87 | # color = (0, 0, 255) if int(label["visibility"][i]) else (255, 0, 0) 88 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 89 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 90 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 91 | # p1 = tuple(label["bbox"][0]) 92 | # p2 = tuple(label["bbox"][1]) 93 | # draw = cv2.rectangle(draw, p1, p2, (0,0,255), 2) 94 | # cv2.imshow("Image", draw) 95 | # # cv2.waitKey(0) 96 | 97 | # cropped_image, keypoints = square_crop_with_keypoints(image, label["bbox"], label["points"], "random") 98 | # draw = cropped_image.copy() 99 | # for i, p in enumerate(keypoints): 100 | # x, y = p 101 | # color = (0, 0, 255) if int(label["visibility"][i]) else (255, 0, 0) 102 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 103 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 104 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 105 | 106 | # cv2.imshow("Square cropped", draw) 107 | # # cv2.waitKey(0) 108 | 109 | # # Test random occlusion 110 | # cropped_image, visibility = random_occlusion(cropped_image, keypoints, visibility=None, rect_ratio=((0.2, 0.5), (0.2, 0.5))) 111 | # draw = cropped_image.copy() 112 | # for i, p in enumerate(keypoints): 113 | # print(i) 114 | # print(visibility[i]) 115 | # x, y = p 116 | # color = (0, 0, 255) if int(visibility[i]) else (255, 0, 0) 117 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 118 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 119 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 120 | 121 | # cv2.imshow("Random occlusion", draw) 122 | # cv2.waitKey(0) 123 | 124 | def save_label(file_name, labels): 125 | with open(os.path.join("data/mpii/", file_name), "w") as fp: 126 | json.dump(labels, fp) 127 | 128 | with open("data/mpii/labels.json", "w") as anno_file: 129 | json.dump(labels, anno_file) 130 | 131 | 132 | n_train = 9503 133 | n_val = 1000 134 | save_label("train.json", labels[:n_train]) 135 | save_label("val.json", labels[n_train:n_train+n_val]) 136 | save_label("test.json", labels[n_train+n_val:]) 137 | 138 | print(len(labels)) 139 | 140 | # with open("data/mpii/train.json", "w") as anno_file: 141 | # json.dump(train_anno, anno_file) 142 | 143 | # with open("data/mpii/val.json", "w") as anno_file: 144 | # json.dump(val_anno, anno_file) -------------------------------------------------------------------------------- /tools/split_lsp_lspet.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script is used to concatenate datasets into a bigger dataset 3 | and split them into training, validation and test sets. 4 | """ 5 | import os 6 | import pathlib 7 | import json 8 | import shutil 9 | import random 10 | import cv2 11 | import numpy as np 12 | 13 | from src.utils.pre_processing import calculate_bbox_from_keypoints, square_crop_with_keypoints 14 | from src.data_loaders.augmentation_utils import random_occlusion 15 | 16 | class DatasetCreator: 17 | def __init__(self, root_folder): 18 | 19 | if os.path.exists(root_folder): 20 | print("Folder existed! Please choose a non-existed path. 
{}".format(root_folder)) 21 | exit(0) 22 | 23 | self.root_folder = root_folder 24 | self.image_folder = os.path.join(root_folder, "images") 25 | pathlib.Path(self.root_folder).mkdir(parents=True, exist_ok=True) 26 | pathlib.Path(self.image_folder).mkdir(parents=True, exist_ok=True) 27 | 28 | 29 | def save_label(self, file_name, labels): 30 | with open(os.path.join(self.root_folder, file_name), "w") as fp: 31 | json.dump(labels, fp) 32 | 33 | def add_set(self, image_folder, label_file, n_train, n_val, n_test, copy_images=True, shuffle=True): 34 | """Add subset into this dataset 35 | 36 | Args: 37 | image_folder (str): Path to image folder of subset 38 | label_file (str): Path to annotation file of subset 39 | n_train (int): Number of samples for training 40 | n_val (int): Number of samples for validation 41 | n_test (int): Number of samples for testing 42 | """ 43 | 44 | with open(label_file, "r") as fp: 45 | labels = json.load(fp) 46 | 47 | # Use all data? 48 | assert (len(labels) == n_train + n_val + n_test) 49 | 50 | if shuffle: 51 | random.seed(42) 52 | random.shuffle(labels) 53 | 54 | for label in labels: 55 | 56 | # Copy images 57 | if copy_images: 58 | 59 | # Rename image if duplicated 60 | if os.path.exists(os.path.join(self.image_folder, label["image"])): 61 | new_name = label["image"] 62 | filename, file_extension = os.path.splitext(label["image"]) 63 | extended_number = 2 64 | while True: 65 | new_name = "{}_ext{}{}".format(filename, extended_number, file_extension) 66 | if os.path.exists(os.path.join(self.image_folder, new_name)): 67 | extended_number += 1 68 | else: 69 | break 70 | shutil.copy( 71 | os.path.join(image_folder, label["image"]), 72 | os.path.join(self.image_folder, new_name), 73 | ) 74 | label["image"] = new_name 75 | else: 76 | shutil.copy( 77 | os.path.join(image_folder, label["image"]), 78 | os.path.join(self.image_folder, label["image"]), 79 | ) 80 | pass 81 | 82 | label["bbox"] = calculate_bbox_from_keypoints(label["points"]) 83 | label["bbox"] = np.array(label["bbox"]).astype(int).tolist() 84 | 85 | # # Visualize 86 | # for label in labels: 87 | # image = cv2.imread(os.path.join(self.image_folder, label["image"])) 88 | 89 | # draw = image.copy() 90 | # for i, p in enumerate(label["points"]): 91 | # x, y = p 92 | # color = (0, 0, 255) if int(label["visibility"][i]) else (255, 0, 0) 93 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 94 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 95 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 96 | # p1 = tuple(label["bbox"][0]) 97 | # p2 = tuple(label["bbox"][1]) 98 | # draw = cv2.rectangle(draw, p1, p2, (0,0,255), 2) 99 | # cv2.imshow("Image", draw) 100 | # # cv2.waitKey(0) 101 | 102 | # cropped_image, keypoints = square_crop_with_keypoints(image, label["bbox"], label["points"], "random") 103 | # draw = cropped_image.copy() 104 | # for i, p in enumerate(keypoints): 105 | # x, y = p 106 | # color = (0, 0, 255) if int(label["visibility"][i]) else (255, 0, 0) 107 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 108 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 109 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 110 | 111 | # cv2.imshow("Square cropped", draw) 112 | # # cv2.waitKey(0) 113 | 114 | # # Test random occlusion 115 | # cropped_image, visibility = random_occlusion(cropped_image, keypoints, visibility=None, rect_ratio=((0.2, 0.5), (0.2, 0.5))) 116 | # draw = cropped_image.copy() 117 | # 
for i, p in enumerate(keypoints): 118 | # print(i) 119 | # print(visibility[i]) 120 | # x, y = p 121 | # color = (0, 0, 255) if int(visibility[i]) else (255, 0, 0) 122 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 123 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 124 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 125 | 126 | # cv2.imshow("Random occlusion", draw) 127 | # cv2.waitKey(0) 128 | 129 | self.save_label("train.json", labels[:n_train]) 130 | self.save_label("val.json", labels[n_train:n_train+n_val]) 131 | self.save_label("test.json", labels[n_train+n_val:]) 132 | 133 | dataset = DatasetCreator("data/lsp_lspet") 134 | dataset.add_set("data/lsp_dataset/images", "data/lsp_dataset/labels.json", 1800, 100, 100) 135 | dataset.add_set("data/lspet_dataset/images", "data/lspet_dataset/labels.json", 3739, 100, 100) 136 | -------------------------------------------------------------------------------- /tools/split_lsp_lspet_7points.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script is used to concatenate datasets into a bigger dataset 3 | and split them into training, validation and test sets. 4 | """ 5 | import os 6 | import pathlib 7 | import json 8 | import shutil 9 | import random 10 | import cv2 11 | import numpy as np 12 | 13 | from src.utils.pre_processing import calculate_bbox_from_keypoints, square_crop_with_keypoints 14 | from src.data_loaders.augmentation_utils import random_occlusion 15 | 16 | 17 | class DatasetCreator: 18 | def __init__(self, root_folder): 19 | 20 | if os.path.exists(root_folder): 21 | print("Folder existed! Please choose a non-existed path. {}".format(root_folder)) 22 | exit(0) 23 | 24 | self.root_folder = root_folder 25 | self.image_folder = os.path.join(root_folder, "images") 26 | pathlib.Path(self.root_folder).mkdir(parents=True, exist_ok=True) 27 | pathlib.Path(self.image_folder).mkdir(parents=True, exist_ok=True) 28 | 29 | 30 | def save_label(self, file_name, labels): 31 | with open(os.path.join(self.root_folder, file_name), "w") as fp: 32 | json.dump(labels, fp) 33 | 34 | def add_set(self, image_folder, label_file, n_train, n_val, n_test, copy_images=True, shuffle=True): 35 | """Add subset into this dataset 36 | 37 | Args: 38 | image_folder (str): Path to image folder of subset 39 | label_file (str): Path to annotation file of subset 40 | n_train (int): Number of samples for training 41 | n_val (int): Number of samples for validation 42 | n_test (int): Number of samples for testing 43 | """ 44 | 45 | with open(label_file, "r") as fp: 46 | labels = json.load(fp) 47 | 48 | # Use all data? 
49 | assert (len(labels) == n_train + n_val + n_test) 50 | 51 | if shuffle: 52 | random.seed(42) 53 | random.shuffle(labels) 54 | 55 | for label in labels: 56 | 57 | # Copy images 58 | if copy_images: 59 | 60 | # Rename image if duplicated 61 | if os.path.exists(os.path.join(self.image_folder, label["image"])): 62 | new_name = label["image"] 63 | filename, file_extension = os.path.splitext(label["image"]) 64 | extended_number = 2 65 | while True: 66 | new_name = "{}_ext{}{}".format(filename, extended_number, file_extension) 67 | if os.path.exists(os.path.join(self.image_folder, new_name)): 68 | extended_number += 1 69 | else: 70 | break 71 | shutil.copy( 72 | os.path.join(image_folder, label["image"]), 73 | os.path.join(self.image_folder, new_name), 74 | ) 75 | label["image"] = new_name 76 | else: 77 | shutil.copy( 78 | os.path.join(image_folder, label["image"]), 79 | os.path.join(self.image_folder, label["image"]), 80 | ) 81 | pass 82 | 83 | label["bbox"] = calculate_bbox_from_keypoints(label["points"]) 84 | label["bbox"] = np.array(label["bbox"]).astype(int).tolist() 85 | 86 | # # Visualize 87 | # for label in labels: 88 | # image = cv2.imread(os.path.join(self.image_folder, label["image"])) 89 | 90 | # draw = image.copy() 91 | # for i, p in enumerate(label["points"]): 92 | # x, y = p 93 | # color = (0, 0, 255) if int(label["visibility"][i]) else (255, 0, 0) 94 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 95 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 96 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 97 | # p1 = tuple(label["bbox"][0]) 98 | # p2 = tuple(label["bbox"][1]) 99 | # draw = cv2.rectangle(draw, p1, p2, (0,0,255), 2) 100 | # cv2.imshow("Image", draw) 101 | # # cv2.waitKey(0) 102 | 103 | # cropped_image, keypoints = square_crop_with_keypoints(image, label["bbox"], label["points"], "random") 104 | # draw = cropped_image.copy() 105 | # for i, p in enumerate(keypoints): 106 | # x, y = p 107 | # color = (0, 0, 255) if int(label["visibility"][i]) else (255, 0, 0) 108 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 109 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 110 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 111 | 112 | # cv2.imshow("Square cropped", draw) 113 | # # cv2.waitKey(0) 114 | 115 | # # Test random occlusion 116 | # cropped_image, visibility = random_occlusion(cropped_image, keypoints, visibility=None, rect_ratio=((0.2, 0.5), (0.2, 0.5))) 117 | # draw = cropped_image.copy() 118 | # for i, p in enumerate(keypoints): 119 | # print(i) 120 | # print(visibility[i]) 121 | # x, y = p 122 | # color = (0, 0, 255) if int(visibility[i]) else (255, 0, 0) 123 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 124 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 125 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 126 | 127 | # cv2.imshow("Random occlusion", draw) 128 | # cv2.waitKey(0) 129 | 130 | for i, label in enumerate(labels): 131 | ps = label["points"] 132 | labels[i]["points"] = [ps[6], ps[7], ps[8], ps[13], ps[9], ps[10], ps[11]] 133 | vs = label["visibility"] 134 | labels[i]["visibility"] = [vs[6], vs[7], vs[8], vs[13], vs[9], vs[10], vs[11]] 135 | 136 | 137 | self.save_label("train.json", labels[:n_train]) 138 | self.save_label("val.json", labels[n_train:n_train+n_val]) 139 | self.save_label("test.json", labels[n_train+n_val:]) 140 | 141 | dataset = 
DatasetCreator("data/lsp_lspet_7points") 142 | dataset.add_set("data/lsp_dataset/images", "data/lsp_dataset/labels.json", 1800, 100, 100) 143 | dataset.add_set("data/lspet_dataset/images", "data/lspet_dataset/labels.json", 3739, 100, 100) 144 | -------------------------------------------------------------------------------- /tools/split_mpii.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script is used to concatenate datasets into a bigger dataset 3 | and split them into training, validation and test sets. 4 | """ 5 | import os 6 | import pathlib 7 | import json 8 | import shutil 9 | import random 10 | import cv2 11 | import numpy as np 12 | 13 | from src.utils.pre_processing import calculate_bbox_from_keypoints, square_crop_with_keypoints 14 | from src.data_loaders.augmentation_utils import random_occlusion 15 | 16 | class DatasetCreator: 17 | def __init__(self, root_folder): 18 | 19 | if os.path.exists(root_folder): 20 | print("Folder existed! Please choose a non-existed path. {}".format(root_folder)) 21 | exit(0) 22 | 23 | self.root_folder = root_folder 24 | self.image_folder = os.path.join(root_folder, "images") 25 | pathlib.Path(self.root_folder).mkdir(parents=True, exist_ok=True) 26 | pathlib.Path(self.image_folder).mkdir(parents=True, exist_ok=True) 27 | 28 | 29 | def save_label(self, file_name, labels): 30 | with open(os.path.join(self.root_folder, file_name), "w") as fp: 31 | json.dump(labels, fp) 32 | 33 | def add_set(self, image_folder, label_file, n_train, n_val, n_test, copy_images=True, shuffle=True): 34 | """Add subset into this dataset 35 | 36 | Args: 37 | image_folder (str): Path to image folder of subset 38 | label_file (str): Path to annotation file of subset 39 | n_train (int): Number of samples for training 40 | n_val (int): Number of samples for validation 41 | n_test (int): Number of samples for testing 42 | """ 43 | 44 | with open(label_file, "r") as fp: 45 | labels = json.load(fp) 46 | 47 | # Use all data? 
48 | assert (len(labels) == n_train + n_val + n_test) 49 | 50 | if shuffle: 51 | random.seed(42) 52 | random.shuffle(labels) 53 | 54 | for label in labels: 55 | 56 | # Copy images 57 | if copy_images: 58 | 59 | # Rename image if duplicated 60 | if os.path.exists(os.path.join(self.image_folder, label["image"])): 61 | new_name = label["image"] 62 | filename, file_extension = os.path.splitext(label["image"]) 63 | extended_number = 2 64 | while True: 65 | new_name = "{}_ext{}{}".format(filename, extended_number, file_extension) 66 | if os.path.exists(os.path.join(self.image_folder, new_name)): 67 | extended_number += 1 68 | else: 69 | break 70 | shutil.copy( 71 | os.path.join(image_folder, label["image"]), 72 | os.path.join(self.image_folder, new_name), 73 | ) 74 | label["image"] = new_name 75 | else: 76 | shutil.copy( 77 | os.path.join(image_folder, label["image"]), 78 | os.path.join(self.image_folder, label["image"]), 79 | ) 80 | pass 81 | 82 | label["bbox"] = calculate_bbox_from_keypoints(label["points"]) 83 | label["bbox"] = np.array(label["bbox"]).astype(int).tolist() 84 | 85 | # # Visualize 86 | # for label in labels: 87 | # image = cv2.imread(os.path.join(self.image_folder, label["image"])) 88 | 89 | # draw = image.copy() 90 | # for i, p in enumerate(label["points"]): 91 | # x, y = p 92 | # color = (0, 0, 255) if int(label["visibility"][i]) else (255, 0, 0) 93 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 94 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 95 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 96 | # p1 = tuple(label["bbox"][0]) 97 | # p2 = tuple(label["bbox"][1]) 98 | # draw = cv2.rectangle(draw, p1, p2, (0,0,255), 2) 99 | # cv2.imshow("Image", draw) 100 | # # cv2.waitKey(0) 101 | 102 | # cropped_image, keypoints = square_crop_with_keypoints(image, label["bbox"], label["points"], "random") 103 | # draw = cropped_image.copy() 104 | # for i, p in enumerate(keypoints): 105 | # x, y = p 106 | # color = (0, 0, 255) if int(label["visibility"][i]) else (255, 0, 0) 107 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 108 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 109 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 110 | 111 | # cv2.imshow("Square cropped", draw) 112 | # # cv2.waitKey(0) 113 | 114 | # # Test random occlusion 115 | # cropped_image, visibility = random_occlusion(cropped_image, keypoints, visibility=None, rect_ratio=((0.2, 0.5), (0.2, 0.5))) 116 | # draw = cropped_image.copy() 117 | # for i, p in enumerate(keypoints): 118 | # print(i) 119 | # print(visibility[i]) 120 | # x, y = p 121 | # color = (0, 0, 255) if int(visibility[i]) else (255, 0, 0) 122 | # draw = cv2.circle(draw, center=(int(x), int(y)), color=color, radius=1, thickness=2) 123 | # draw = cv2.putText(draw, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 124 | # 0.5, (0, 255, 0), 1, cv2.LINE_AA) 125 | 126 | # cv2.imshow("Random occlusion", draw) 127 | # cv2.waitKey(0) 128 | 129 | self.save_label("train.json", labels[:n_train]) 130 | self.save_label("val.json", labels[n_train:n_train+n_val]) 131 | self.save_label("test.json", labels[n_train+n_val:]) 132 | 133 | dataset = DatasetCreator("data/lsp_lspet") 134 | dataset.add_set("data/lsp_dataset/images", "data/lsp_dataset/labels.json", 1800, 100, 100) 135 | dataset.add_set("data/lspet_dataset/images", "data/lspet_dataset/labels.json", 3739, 100, 100) 136 | 
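# --- Illustrative sketch (not part of the original script) -------------------
# The crop pipeline used above in one place: estimate a body bounding box from
# keypoints with calculate_bbox_from_keypoints(), then square-crop the image
# while shifting the keypoints into the crop with square_crop_with_keypoints().
# The image path and keypoint values below are made-up examples.
#
# example_image = cv2.imread("data/lsp_dataset/images/im0001.jpg")
# example_kps = [[120, 80], [140, 150], [160, 220]]
# bbox = calculate_bbox_from_keypoints(example_kps, padding=0.1)
# bbox = np.array(bbox).astype(int).tolist()
# crop, new_kps = square_crop_with_keypoints(example_image, bbox, example_kps, "random")
# print(crop.shape, new_kps)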
-------------------------------------------------------------------------------- /tools/visualize_data.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | import json 4 | import cv2 5 | 6 | parser = argparse.ArgumentParser() 7 | parser.add_argument( 8 | '-i', 9 | '--image_folder', default="data/lsp_dataset/images", 10 | help='Image folder') 11 | parser.add_argument( 12 | '-l', 13 | '--labels', default="data/lsp_dataset/labels.json", 14 | help='Label/Annotation file') 15 | args = parser.parse_args() 16 | 17 | with open(args.labels, "r") as fp: 18 | labels = json.load(fp) 19 | 20 | cv2.namedWindow("Image", cv2.WINDOW_NORMAL) 21 | for label in labels: 22 | image_name = label["image"] 23 | points = label["points"] 24 | visibility = label["visibility"] 25 | image = cv2.imread(os.path.join(args.image_folder, image_name)) 26 | for i, p in enumerate(points): 27 | x, y = p 28 | color = (0, 0, 255) if int(visibility[i]) else (255, 0, 0) 29 | cv2.circle(image, center=(int(x), int(y)), color=(255, 0, 0), radius=1, thickness=2) 30 | image = cv2.putText(image, str(i), (int(x), int(y)), cv2.FONT_HERSHEY_SIMPLEX, 31 | 0.5, (0, 255, 0), 1, cv2.LINE_AA) 32 | cv2.imshow("Image", image) 33 | cv2.waitKey(0) 34 | 35 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pathlib 3 | import shutil 4 | import argparse 5 | import importlib 6 | import json 7 | import tensorflow as tf 8 | 9 | for gpu in tf.config.experimental.list_physical_devices('GPU'): 10 | tf.compat.v2.config.experimental.set_memory_growth(gpu, True) 11 | 12 | parser = argparse.ArgumentParser() 13 | parser.add_argument( 14 | '-c', 15 | '--conf_file', default="config.json", 16 | help='Configuration file') 17 | args = parser.parse_args() 18 | 19 | # Open and load the config json 20 | with open(args.conf_file) as config_buffer: 21 | config = json.loads(config_buffer.read()) 22 | 23 | # Create experiment folder and copy configuration file 24 | exp_folder = os.path.join("experiments", config["experiment_name"]) 25 | pathlib.Path(exp_folder).mkdir(parents=True, exist_ok=True) 26 | shutil.copy(args.conf_file, exp_folder) 27 | 28 | # Train model 29 | trainer = importlib.import_module("src.trainers.{}".format(config["trainer"])) 30 | trainer.train(config) 31 | --------------------------------------------------------------------------------
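For reference, a minimal configuration sketch for `train.py`, assembled only from the keys the code above actually reads (`experiment_name`, `trainer`, `model`, `data`, and the training options consumed by the trainer). The name of the training-options block, the `model_type` value, the paths, and all numbers are placeholders and assumptions; the authoritative key names and values are in the JSON files under `configs/`.

```json
{
  "experiment_name": "example_experiment",
  "trainer": "blazepose_trainer",
  "model": {
    "model_type": "<see configs/ for valid values>",
    "im_width": 256,
    "im_height": 256
  },
  "data": {
    "train_images": "data/pushup/images/",
    "train_labels": "data/pushup/train.json",
    "val_images": "data/pushup/images/",
    "val_labels": "data/pushup/val.json"
  },
  "train": {
    "train_batch_size": 32,
    "val_batch_size": 32,
    "nb_epochs": 100,
    "initial_epoch": 0
  }
}
```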