├── .gitignore ├── README.md ├── assets └── holistic.gif ├── configs ├── baxter │ ├── depthnet.yaml │ └── full.yaml ├── kuka │ ├── depthnet.yaml │ └── full.yaml └── panda │ ├── depthnet.yaml │ ├── full.yaml │ └── self_supervised │ ├── azure.yaml │ ├── kinect.yaml │ ├── orb.yaml │ ├── realsense.yaml │ └── synth.yaml ├── lib ├── config.py ├── core │ ├── config.py │ └── function.py ├── dataset │ ├── augmentations.py │ ├── const.py │ ├── dream.py │ ├── multiepoch_dataloader.py │ ├── roboutils.py │ └── samplers.py ├── models │ ├── backbones │ │ ├── HRnet.py │ │ ├── Resnet.py │ │ └── configs │ │ │ ├── hrnet_w32.yaml │ │ │ └── hrnet_w48.yaml │ ├── ctrnet │ │ ├── CtRNet.py │ │ ├── __init__.py │ │ ├── keypoint_seg_resnet.py │ │ └── mask_inference.py │ ├── depth_net.py │ └── full_net.py └── utils │ ├── BPnP.py │ ├── geometries.py │ ├── integral.py │ ├── mesh_renderer.py │ ├── metrics.py │ ├── transforms.py │ ├── urdf_robot.py │ ├── urdfpytorch │ ├── __init__.py │ ├── urdf.py │ ├── utils.py │ └── version.py │ ├── utils.py │ └── vis.py ├── requirements.txt └── scripts ├── test.py ├── train.py ├── train_depthnet.py ├── train_full.py └── train_sim2real.py /.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__ 2 | *.pyc 3 | /.vscode 4 | 5 | /data 6 | /experiments 7 | /models 8 | /unit_test 9 | 10 | run.sh -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

HoRoPose: Real-time Holistic Robot Pose Estimation with Unknown States
(ECCV 2024)

2 | 3 |
4 | 5 | PyTorch 6 | [![arXiv](https://img.shields.io/badge/arXiv-2402.05655-b31b1b.svg)](https://arxiv.org/abs/2402.05655.pdf) 7 | Project 8 | Video 9 | 10 | [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/real-time-holistic-robot-pose-estimation-with/robot-pose-estimation-on-dream-dataset)](https://paperswithcode.com/sota/robot-pose-estimation-on-dream-dataset?p=real-time-holistic-robot-pose-estimation-with) 11 | 12 |
13 | 14 | 15 | 16 | This is the official PyTorch implementation of the paper "Real-time Holistic Robot Pose Estimation with Unknown States". It provides an efficient framework for real-time robot pose estimation from RGB images without requiring known robot states. 17 | 18 | ## Installation 19 | This project depends on Python 3.9, PyTorch 1.13, PyTorch3D 0.7.4 and CUDA 11.7. 20 | The code is developed and tested on Ubuntu 20.04. 21 | 22 | ```bash 23 | pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 24 | pip install -r requirements.txt 25 | conda install pytorch3d=0.7.4 # from https://anaconda.org/pytorch3d/pytorch3d/files 26 | ``` 27 | 28 | ## Data and Model Preparation 29 | 30 | In our work, we use the following data and pretrained models: 31 | * The [DREAM datasets](https://drive.google.com/drive/folders/1uNK2n9wU4tRE07sM_r640wDhwmOwuxx6), consisting of both real and synthetic subsets, placed under `${ROOT}/data/dream/`. 32 | * The [URDF](https://drive.google.com/drive/folders/17KNhy28pypheYfDCxgOjJf4IyUnOI3gW?) (Unified Robot Description Format) files of the Panda, Kuka and Baxter robots, placed under `${ROOT}/data/deps/`. 33 | * The [pretrained HRnet backbone](https://drive.google.com/file/d/1eqIftq1T_oIGhmCfkVYSM245Wj5xZaUo/view?) for pose estimation, placed under `${ROOT}/models/`. 34 | * The openly available [foreground segmentation models](https://drive.google.com/drive/folders/1PpXe3p5dJt9EOM-fwvJ9TNStTWTQFDNK?) for the 4 real Panda datasets, from [CtRNet](https://github.com/ucsdarclab/CtRNet-robot-pose-estimation), placed under `${ROOT}/models/panda_segmentation/`. 35 | 36 | You can download the data and models through the links provided above. 37 | When finished, the directory tree should look like this: 38 | ``` 39 | ${ROOT} 40 | |-- data 41 | |-- dream 42 | | |-- real 43 | | | |-- panda-3cam_azure 44 | | | |-- panda-3cam_kinect360 45 | | | |-- panda-3cam_realsense 46 | | | |-- panda-orb 47 | | |-- synthetic 48 | | | |-- baxter_synth_test_dr 49 | | | |-- baxter_synth_train_dr 50 | | | |-- kuka_synth_test_dr 51 | | | |-- kuka_synth_test_photo 52 | | | |-- kuka_synth_train_dr 53 | | | |-- panda_synth_test_dr 54 | | | |-- panda_synth_test_photo 55 | | | |-- panda_synth_train_dr 56 | |-- deps 57 | | |-- baxter-description 58 | | |-- kuka-description 59 | | |-- panda-description 60 | |-- models 61 | |-- panda_segmentation 62 | | |-- azure.pth 63 | | |-- kinect.pth 64 | | |-- orb.pth 65 | | |-- realsense.pth 66 | |-- hrnet_w32-36af842e_roc.pth 67 | ``` 68 | 69 | ## Train 70 | We train our final model in a multi-stage fashion. All models are trained on a single NVIDIA V100 GPU with 32GB memory. Distributed training is also supported. 71 | 72 | We use config files in `configs/` to specify the training process. We recommend filling in the `exp_name` field in the config files with a unique name, as the checkpoints and event logs produced during training will be saved under `experiments/{exp_name}`. The corresponding config file will be automatically copied into this directory.
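For reference, only a handful of fields at the top of each config usually need editing before a run. A minimal excerpt based on `configs/panda/full.yaml` (the `exp_name` value below is an illustrative placeholder; see the full files under `configs/` for all options):

```yaml
# Excerpt (illustrative): the fields typically edited before training
exp_name : "my_panda_full_run"        # placeholder; outputs are saved to experiments/{exp_name}/
urdf_robot_name : "panda"             # which robot's URDF to use
train_ds_names : "dream/synthetic/panda_synth_train_dr"   # dataset path, resolved relative to data/
```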
73 | 74 | ### Synthetic Datasets 75 | 76 | First, pretrain the depthnet (root depth estimator) for 100 epochs for each robot arm: 77 | ```bash 78 | python scripts/train.py --config configs/panda/depthnet.yaml 79 | python scripts/train.py --config configs/kuka/depthnet.yaml 80 | python scripts/train.py --config configs/baxter/depthnet.yaml 81 | ``` 82 | 83 | With the depthnet pretrained, we can train the full network for 100 epochs: 84 | ```bash 85 | python scripts/train.py --config configs/panda/full.yaml 86 | python scripts/train.py --config configs/kuka/full.yaml 87 | python scripts/train.py --config configs/baxter/full.yaml 88 | ``` 89 | To save time when reproducing the results in our paper, we provide pretrained [depthnet model weights](https://drive.google.com/drive/folders/1rWC2bbA3U0IiZ7oDoKIVsWK_m4JkVarA?) for full-network training. To use them, modify the `configs/{robot}/full.yaml` file by filling in the `pretrained_rootnet` field with the path of the downloaded `.pk` file (see the config excerpt after the Model Zoo section for an example). 90 | 91 | ### Real Datasets of Panda 92 | 93 | We employ self-supervised training for the 4 real Panda datasets. 94 | 95 | First, train the model on the synthetic dataset using `configs/panda/self_supervised/synth.yaml` for 100 epochs. Be sure to fill in the `pretrained_rootnet` field with the path of the pretrained Panda depthnet weights beforehand. 96 | 97 | ```bash 98 | python scripts/train.py --config configs/panda/self_supervised/synth.yaml 99 | ``` 100 | The training process above saves checkpoints for the 4 real datasets for further self-supervised training (e.g. `experiments/{exp_name}/ckpt/curr_best_auc(add)_azure_model.pk`). 101 | 102 | When finished training on synthetic data, modify the `configs/panda/self_supervised/{real_dataset}.yaml` file by filling in the `pretrained_weight_on_synth` field with the path of the corresponding checkpoint (also illustrated in the excerpt after the Model Zoo section). Then start self-supervised training with: 103 | 104 | ```bash 105 | python scripts/train.py --config configs/panda/self_supervised/azure.yaml 106 | python scripts/train.py --config configs/panda/self_supervised/kinect.yaml 107 | python scripts/train.py --config configs/panda/self_supervised/realsense.yaml 108 | python scripts/train.py --config configs/panda/self_supervised/orb.yaml 109 | ``` 110 | 111 | ## Test 112 | To evaluate models, simply run: 113 | ```bash 114 | python scripts/test.py --exp_path {path of the experiment folder} --dataset {dataset name} 115 | # e.g. python scripts/test.py -e experiments/panda_full --dataset panda_synth_test_dr 116 | # You can add '--vis_skeleton' to visualize the robot keypoint skeleton 117 | ``` 118 | Note that each model is stored in a folder containing ckpt/, log/ and config.yaml. After running the test script, result/ will be generated inside that folder. 119 | 120 | ## Model Zoo 121 | You can download our final models from [Google Drive](https://drive.google.com/drive/folders/10Gz0NP39YyuvAlrhTa-XssWTDlyh9v80?usp=sharing) and evaluate them yourself.
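As noted in the Train section above, reproducing the full pipeline only requires filling in two checkpoint paths in the corresponding config files. A sketch of the two fields (the values below mirror the shipped configs; replace them with the paths of the checkpoints you actually downloaded or trained):

```yaml
# configs/{robot}/full.yaml -- path to the pretrained depthnet (.pk) used by the full network
pretrained_rootnet : "models/pretrained_depthnet/panda_pretrained_depthnet.pk"

# configs/panda/self_supervised/{real_dataset}.yaml -- checkpoint saved by the synthetic pretraining stage
pretrained_weight_on_synth : "panda_synth_pretrain/ckpt/curr_best_auc(add)_azure_model.pk"
```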
122 | 123 | 124 | ## Citation 125 | If you use our code or models in your research, please cite: 126 | ```bibtex 127 | @inproceedings{holisticrobotpose, 128 | author={Ban, Shikun and Fan, Juling and Ma, Xiaoxuan and Zhu, Wentao and Qiao, Yu and Wang, Yizhou}, 129 | title={Real-time Holistic Robot Pose Estimation with Unknown States}, 130 | booktitle = {European Conference on Computer Vision (ECCV)}, 131 | year = {2024} 132 | } 133 | ``` 134 | 135 | ## Acknowledgment 136 | This repo is built on the excellent work of [RoboPose](https://github.com/ylabbe/robopose) and [CtRNet](https://github.com/ucsdarclab/CtRNet-robot-pose-estimation). We thank the authors for releasing their code. 137 | -------------------------------------------------------------------------------- /assets/holistic.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Oliverbansk/Holistic-Robot-Pose-Estimation/77cd316a36ff8de0c736a66c692ca9276fdc2eae/assets/holistic.gif -------------------------------------------------------------------------------- /configs/baxter/depthnet.yaml: -------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "baxter_depthnet" 9 | 10 | # Data 11 | urdf_robot_name : "baxter" 12 | train_ds_names : "dream/synthetic/baxter_synth_train_dr" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "hrnet32" 18 | split_reg_head : False 19 | split_type : "2-first" 20 | use_rpmg: False 21 | 22 | # Optimizer 23 | lr : 1e-4 24 | weight_decay : 0. 25 | use_schedule : False 26 | schedule_type : "linear" 27 | n_epochs_warmup : 15 28 | start_decay : 100 29 | end_decay: 300 30 | final_decay : 0.01 31 | exponent : 0.96 32 | clip_gradient : 1.0 33 | 34 | # Training 35 | batch_size : 64 36 | epoch_size : 104975 37 | n_epochs : 700 38 | n_dataloader_workers : 6 39 | save_epoch_interval : None 40 | 41 | # Method 42 | use_direct_reg_branch : False 43 | n_iter : 4 44 | pose_loss_func : "smoothl1" 45 | rot_loss_func : "smoothl1" 46 | trans_loss_func : "smoothl1" 47 | kp3d_loss_func : "l2norm" 48 | kp2d_loss_func : "l2norm" 49 | rot_loss_weight : 1.0 50 | trans_loss_weight : 1.0 51 | use_2d_reprojection_loss : False 52 | use_3d_loss : True 53 | error2d_loss_weight : 1e-5 54 | error3d_loss_weight : 10.0 55 | joint_individual_weights : None 56 | 57 | use_integral_3d_branch : False 58 | use_limb_loss : False 59 | limb_loss_func : "l1" 60 | limb_loss_weight : 1.0 61 | use_uvd_3d_loss : True 62 | integral_3d_loss_func : "l2norm" 63 | integral_3d_loss_weight : 1.0 64 | use_xyz_3d_loss : False 65 | integral_xyz_3d_loss_func : "l2norm" 66 | integral_xyz_3d_loss_weight : 1.0 67 | bbox_3d_shape : 68 | - 1300 69 | - 1300 70 | - 1300 71 | reference_keypoint_id : 0 # 0:base 72 | 73 | use_pretrained_direct_reg_weights: False 74 | pretrained_direct_reg_weights_path: None 75 | 76 | # rootnet 77 | use_rootnet: True 78 | depth_loss_func : "l1" 79 | use_rootnet_xy_branch : False 80 | xy_loss_func : "mse" 81 | use_origin_bbox : False 82 | use_extended_bbox : True 83 | extend_ratio : [0.2, 0.13] 84 | use_rootnet_with_angle: False 85 | 86 | # Resume 87 | resume_run : False 88 | resume_experiment_name : "" 89 | -------------------------------------------------------------------------------- /configs/baxter/full.yaml:
-------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "baxter_full" 9 | 10 | # Data 11 | urdf_robot_name : "baxter" 12 | train_ds_names : "dream/synthetic/baxter_synth_train_dr" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "resnet50" 18 | # integral_backbone_name : "resnet34" 19 | rootnet_backbone_name : "hrnet32" 20 | rootnet_image_size : 256.0 21 | other_image_size : 256.0 22 | use_rpmg: False 23 | 24 | # Optimizer 25 | lr : 1e-4 26 | weight_decay : 0. 27 | use_schedule : True 28 | schedule_type : "exponential" 29 | n_epochs_warmup : 0 30 | start_decay : 23 31 | end_decay: 90 32 | final_decay : 0.01 33 | exponent : 0.95 34 | clip_gradient : 5.0 35 | 36 | 37 | # Training 38 | batch_size : 64 39 | epoch_size : 104950 40 | n_epochs : 700 41 | n_dataloader_workers : 6 42 | save_epoch_interval : None 43 | 44 | # Method 45 | use_direct_reg_branch : True 46 | n_iter : 4 47 | pose_loss_func : "mse" 48 | rot_loss_func : "mse" 49 | trans_loss_func : "l2norm" 50 | depth_loss_func : "l1" 51 | uv_loss_func : "l2norm" 52 | kp2d_loss_func : "l2norm" 53 | kp3d_loss_func : "l2norm" 54 | kp2d_int_loss_func : "l2norm" 55 | kp3d_int_loss_func : "l2norm" 56 | align_3d_loss_func : "l2norm" 57 | pose_loss_weight : 1.0 58 | rot_loss_weight : 1.0 59 | trans_loss_weight : 1.0 60 | depth_loss_weight : 1.0 61 | uv_loss_weight : 1.0 62 | kp2d_loss_weight : 10.0 63 | kp3d_loss_weight : 10.0 64 | kp2d_int_loss_weight : 10.0 65 | kp3d_int_loss_weight : 10.0 66 | align_3d_loss_weight : 0.0 67 | joint_individual_weights : None 68 | use_joint_valid_mask : False 69 | rot_iterative_matmul : False 70 | fix_root : True 71 | bbox_3d_shape : 72 | - 1300 73 | - 1300 74 | - 1300 75 | reference_keypoint_id : 0 # 0:base 76 | fix_truncation : False 77 | 78 | use_pretrained_direct_reg_weights: False 79 | pretrained_direct_reg_weights_path: None 80 | 81 | use_pretrained_integral : False 82 | pretrained_integral_weights_path: None 83 | 84 | 85 | # rootnet (+ integral/regression) 86 | use_rootnet: True 87 | resample : False 88 | rootnet_depth_loss_weight : 1.0 89 | depth_loss_func : "l1" 90 | use_rootnet_xy_branch : False 91 | xy_loss_func : "mse" 92 | pretrained_rootnet: "experiments/baxter_rootnet_ref0_1028/ckpt/curr_best_root_depth_model.pk" 93 | use_origin_bbox : False 94 | use_extended_bbox : True 95 | 96 | use_rootnet_with_reg_int_shared_backbone : True 97 | use_rootnet_with_reg_with_int_separate_backbone : False 98 | 99 | # Resume 100 | resume_run : False 101 | resume_experiment_name : "panda_rootnetwithreguv_pretrainedrootnet_extendedbbox_transl2norm_3dw5_usejointmask_notruncate_ref3_lr1e-4con_0911" 102 | -------------------------------------------------------------------------------- /configs/kuka/depthnet.yaml: -------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "kuka_depthnet" 9 | 10 | # Data 11 | urdf_robot_name : "kuka" 12 | train_ds_names : "dream/synthetic/kuka_synth_train_dr" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "hrnet32" 18 | split_reg_head : False 19 | split_type : 
"2-first" 20 | use_rpmg: False 21 | 22 | # Optimizer 23 | lr : 1e-4 24 | weight_decay : 0. 25 | use_schedule : False 26 | schedule_type : "linear" 27 | n_epochs_warmup : 15 28 | start_decay : 100 29 | end_decay: 300 30 | final_decay : 0.01 31 | exponent : 0.96 32 | clip_gradient : 1.0 33 | 34 | # Training 35 | batch_size : 64 36 | epoch_size : 104975 37 | n_epochs : 700 38 | n_dataloader_workers : 6 39 | save_epoch_interval : None 40 | 41 | # Method 42 | use_direct_reg_branch : False 43 | n_iter : 4 44 | pose_loss_func : "smoothl1" 45 | rot_loss_func : "smoothl1" 46 | trans_loss_func : "smoothl1" 47 | kp3d_loss_func : "l2norm" 48 | kp2d_loss_func : "l2norm" 49 | rot_loss_weight : 1.0 50 | trans_loss_weight : 1.0 51 | use_2d_reprojection_loss : False 52 | use_3d_loss : True 53 | error2d_loss_weight : 1e-5 54 | error3d_loss_weight : 10.0 55 | joint_individual_weights : None 56 | 57 | use_integral_3d_branch : False 58 | use_limb_loss : False 59 | limb_loss_func : "l1" 60 | limb_loss_weight : 1.0 61 | use_uvd_3d_loss : True 62 | integral_3d_loss_func : "l2norm" 63 | integral_3d_loss_weight : 1.0 64 | use_xyz_3d_loss : False 65 | integral_xyz_3d_loss_func : "l2norm" 66 | integral_xyz_3d_loss_weight : 1.0 67 | bbox_3d_shape : 68 | - 1300 69 | - 1300 70 | - 1300 71 | reference_keypoint_id : 3 # 0:base 72 | 73 | use_pretrained_direct_reg_weights: False 74 | pretrained_direct_reg_weights_path: None 75 | 76 | # rootnet 77 | use_rootnet: True 78 | depth_loss_func : "l1" 79 | use_rootnet_xy_branch : False 80 | xy_loss_func : "mse" 81 | use_origin_bbox : False 82 | use_extended_bbox : True 83 | extend_ratio : [0.2, 0.13] 84 | use_rootnet_with_angle: False 85 | 86 | # Resume 87 | resume_run : False 88 | resume_experiment_name : "" 89 | -------------------------------------------------------------------------------- /configs/kuka/full.yaml: -------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "kuka_full" 9 | 10 | # Data 11 | urdf_robot_name : "kuka" 12 | train_ds_names : "dream/synthetic/kuka_synth_train_dr" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "resnet50" 18 | rootnet_backbone_name : "hrnet32" 19 | rootnet_image_size : 256.0 20 | other_image_size : 256.0 21 | use_rpmg: False 22 | jitter: True 23 | occlusion : True 24 | other_aug : True 25 | 26 | # Optimizer 27 | lr : 1e-4 28 | weight_decay : 0. 
29 | use_schedule : True 30 | schedule_type : "exponential" 31 | n_epochs_warmup : 0 32 | start_decay : 25 33 | end_decay: 90 34 | final_decay : 0.01 35 | exponent : 0.95 36 | 37 | # Training 38 | batch_size : 64 39 | epoch_size : 104950 40 | n_epochs : 700 41 | n_dataloader_workers : 6 42 | save_epoch_interval : None 43 | clip_gradient : 5.0 44 | 45 | 46 | # Method 47 | use_direct_reg_branch : True 48 | n_iter : 4 49 | pose_loss_func : "mse" 50 | rot_loss_func : "mse" 51 | trans_loss_func : "l2norm" 52 | depth_loss_func : "l1" 53 | uv_loss_func : "l2norm" 54 | kp2d_loss_func : "l2norm" 55 | kp3d_loss_func : "l2norm" 56 | kp2d_int_loss_func : "l2norm" 57 | kp3d_int_loss_func : "l2norm" 58 | align_3d_loss_func : "l2norm" 59 | pose_loss_weight : 1.0 60 | rot_loss_weight : 1.0 61 | trans_loss_weight : 1.0 62 | depth_loss_weight : 1.0 63 | uv_loss_weight : 1.0 64 | kp2d_loss_weight : 10.0 65 | kp3d_loss_weight : 10.0 66 | kp2d_int_loss_weight : 10.0 67 | kp3d_int_loss_weight : 10.0 68 | align_3d_loss_weight : 0.0 69 | joint_individual_weights : None 70 | use_joint_valid_mask : False 71 | rot_iterative_matmul : False 72 | fix_root : True 73 | bbox_3d_shape : 74 | - 1300 75 | - 1300 76 | - 1300 77 | reference_keypoint_id : 3 # 0:base 78 | fix_truncation : False 79 | 80 | use_pretrained_direct_reg_weights: False 81 | pretrained_direct_reg_weights_path: None 82 | 83 | use_pretrained_integral : False 84 | pretrained_integral_weights_path: None 85 | 86 | 87 | # rootnet (+ integral/regression) 88 | use_rootnet: True 89 | resample : False 90 | rootnet_depth_loss_weight : 1.0 91 | depth_loss_func : "l1" 92 | use_rootnet_xy_branch : False 93 | xy_loss_func : "mse" 94 | pretrained_rootnet: "experiments/kuka_rootnet_ref3/ckpt/curr_best_root_depth_model.pk" 95 | use_origin_bbox : False 96 | use_extended_bbox : True 97 | 98 | use_rootnet_with_reg_int_shared_backbone : True 99 | use_rootnet_with_reg_with_int_separate_backbone : False 100 | 101 | # Resume 102 | resume_run : False 103 | resume_experiment_name : "panda_rootnetwithreguv_pretrainedrootnet_extendedbbox_transl2norm_3dw5_usejointmask_notruncate_ref3_lr1e-4con_0911" 104 | -------------------------------------------------------------------------------- /configs/panda/depthnet.yaml: -------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "panda_depthnet" 9 | 10 | # Data 11 | urdf_robot_name : "panda" 12 | train_ds_names : "dream/synthetic/panda_synth_train_dr" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "hrnet32" 18 | split_reg_head : False 19 | split_type : "2-first" 20 | use_rpmg: False 21 | 22 | # Optimizer 23 | lr : 1e-4 24 | weight_decay : 0. 
25 | use_schedule : False 26 | schedule_type : "linear" 27 | n_epochs_warmup : 15 28 | start_decay : 100 29 | end_decay: 300 30 | final_decay : 0.01 31 | exponent : 0.96 32 | clip_gradient : 1.0 33 | 34 | # Training 35 | batch_size : 64 36 | epoch_size : 104950 37 | n_epochs : 700 38 | n_dataloader_workers : 6 39 | save_epoch_interval : None 40 | 41 | # Method 42 | use_direct_reg_branch : False 43 | n_iter : 4 44 | pose_loss_func : "smoothl1" 45 | rot_loss_func : "smoothl1" 46 | trans_loss_func : "smoothl1" 47 | kp3d_loss_func : "l2norm" 48 | kp2d_loss_func : "l2norm" 49 | rot_loss_weight : 1.0 50 | trans_loss_weight : 1.0 51 | use_2d_reprojection_loss : False 52 | use_3d_loss : True 53 | error2d_loss_weight : 1e-5 54 | error3d_loss_weight : 10.0 55 | joint_individual_weights : None 56 | 57 | use_integral_3d_branch : False 58 | use_limb_loss : False 59 | limb_loss_func : "l1" 60 | limb_loss_weight : 1.0 61 | use_uvd_3d_loss : True 62 | integral_3d_loss_func : "l2norm" 63 | integral_3d_loss_weight : 1.0 64 | use_xyz_3d_loss : False 65 | integral_xyz_3d_loss_func : "l2norm" 66 | integral_xyz_3d_loss_weight : 1.0 67 | bbox_3d_shape : 68 | - 1300 69 | - 1300 70 | - 1300 71 | reference_keypoint_id : 3 # 0:base 72 | 73 | use_pretrained_direct_reg_weights: False 74 | pretrained_direct_reg_weights_path: None 75 | 76 | # rootnet 77 | use_rootnet: True 78 | depth_loss_func : "l1" 79 | use_rootnet_xy_branch : False 80 | xy_loss_func : "mse" 81 | use_origin_bbox : False 82 | use_extended_bbox : True 83 | extend_ratio : [0.2, 0.13] 84 | use_rootnet_with_angle: False 85 | 86 | # Resume 87 | resume_run : False 88 | resume_experiment_name : "resume_name" 89 | -------------------------------------------------------------------------------- /configs/panda/full.yaml: -------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "panda_full2" 9 | 10 | # Data 11 | urdf_robot_name : "panda" 12 | train_ds_names : "dream/synthetic/panda_synth_train_dr" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "resnet50" 18 | rootnet_backbone_name : "hrnet32" 19 | rootnet_image_size : 256.0 20 | other_image_size : 256.0 21 | use_rpmg: False 22 | 23 | # Optimizer 24 | lr : 1e-4 25 | weight_decay : 0. 
26 | use_schedule : True 27 | schedule_type : "exponential" 28 | n_epochs_warmup : 0 29 | start_decay : 45 30 | end_decay: 100 31 | final_decay : 0.01 32 | exponent : 0.95 33 | 34 | # Training 35 | batch_size : 64 36 | epoch_size : 104950 37 | n_epochs : 700 38 | n_dataloader_workers : 6 39 | save_epoch_interval : None 40 | clip_gradient : 5.0 41 | 42 | # Method 43 | use_direct_reg_branch : True 44 | n_iter : 4 45 | pose_loss_func : "mse" 46 | rot_loss_func : "mse" 47 | trans_loss_func : "l2norm" 48 | depth_loss_func : "l1" 49 | uv_loss_func : "l2norm" 50 | kp2d_loss_func : "l2norm" 51 | kp3d_loss_func : "l2norm" 52 | kp2d_int_loss_func : "l2norm" 53 | kp3d_int_loss_func : "l2norm" 54 | align_3d_loss_func : "l2norm" 55 | pose_loss_weight : 1.0 56 | rot_loss_weight : 1.0 57 | trans_loss_weight : 1.0 58 | depth_loss_weight : 10.0 59 | uv_loss_weight : 1.0 60 | kp2d_loss_weight : 10.0 61 | kp3d_loss_weight : 10.0 62 | kp2d_int_loss_weight : 10.0 63 | kp3d_int_loss_weight : 10.0 64 | align_3d_loss_weight : 0.0 65 | joint_individual_weights : None 66 | use_joint_valid_mask : False 67 | fix_root : True 68 | bbox_3d_shape : 69 | - 1300 70 | - 1300 71 | - 1300 72 | reference_keypoint_id : 3 # 0:base 73 | fix_truncation : False 74 | 75 | use_pretrained_direct_reg_weights: False 76 | pretrained_direct_reg_weights_path: None 77 | 78 | use_pretrained_integral : False 79 | pretrained_integral_weights_path: None 80 | 81 | 82 | # rootnet (+ integral/regression) 83 | use_rootnet: True 84 | resample : False 85 | rootnet_depth_loss_weight : 1.0 86 | depth_loss_func : "l1" 87 | use_rootnet_xy_branch : False 88 | xy_loss_func : "mse" 89 | pretrained_rootnet: "models/pretrained_depthnet/panda_pretrained_depthnet.pk" 90 | use_origin_bbox : False 91 | use_extended_bbox : True 92 | 93 | use_rootnet_with_reg_int_shared_backbone : True 94 | use_rootnet_with_reg_with_int_separate_backbone : False 95 | 96 | # Resume 97 | resume_run : False 98 | resume_experiment_name : "resume_experiment_name" 99 | -------------------------------------------------------------------------------- /configs/panda/self_supervised/azure.yaml: -------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "panda_azure_self_supervised" 9 | 10 | # Data 11 | urdf_robot_name : "panda" 12 | train_ds_names : "dream/real/panda-3cam_azure" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "resnet50" 18 | rootnet_backbone_name : "hrnet32" 19 | rootnet_image_size : 256.0 20 | other_image_size : 256.0 21 | use_rpmg: False 22 | 23 | # Optimizer 24 | lr : 1e-8 25 | weight_decay : 0. 
26 | use_schedule : False 27 | schedule_type : "exponential" 28 | n_epochs_warmup : 0 29 | start_decay : 20 30 | end_decay: 300 31 | final_decay : 0.01 32 | exponent : 0.78 33 | 34 | # Training 35 | batch_size : 32 36 | epoch_size : 104950 37 | n_epochs : 700 38 | n_dataloader_workers : 6 39 | save_epoch_interval : None 40 | clip_gradient : 10.0 41 | 42 | # Method 43 | use_direct_reg_branch : True 44 | n_iter : 4 45 | pose_loss_func : "mse" 46 | rot_loss_func : "mse" 47 | trans_loss_func : "l2norm" 48 | depth_loss_func : "l1" 49 | uv_loss_func : "l2norm" 50 | kp2d_loss_func : "l2norm" 51 | kp3d_loss_func : "l2norm" 52 | pose_loss_weight : 1.0 53 | rot_loss_weight : 1.0 54 | trans_loss_weight : 1.0 55 | depth_loss_weight : 1.0 56 | uv_loss_weight : 0.0 57 | kp2d_loss_weight : 10.0 58 | kp3d_loss_weight : 10.0 59 | reg_joint_map : False 60 | joint_conv_dim : [256,256,256] 61 | joint_individual_weights : None 62 | use_joint_valid_mask : True 63 | 64 | 65 | use_integral_3d_branch : False 66 | use_limb_loss : False 67 | limb_loss_func : "l1" 68 | limb_loss_weight : 1.0 69 | use_uvd_3d_loss : False 70 | integral_3d_loss_func : "l2norm" 71 | integral_3d_loss_weight : 1.0 72 | use_xyz_3d_loss : True 73 | integral_xyz_3d_loss_func : "l2norm" 74 | integral_xyz_3d_loss_weight : 1.0 75 | bbox_3d_shape : 76 | - 1300 77 | - 1300 78 | - 1300 79 | 80 | reference_keypoint_id : 3 # 0:base 81 | fix_truncation : False 82 | 83 | use_pretrained_direct_reg_weights: False 84 | pretrained_direct_reg_weights_path: None 85 | 86 | use_pretrained_integral : False 87 | pretrained_integral_weights_path: None 88 | 89 | 90 | # rootnet (+ integral/regression) 91 | use_rootnet: False 92 | resample : False 93 | rootnet_depth_loss_weight : 1.0 94 | depth_loss_func : "l1" 95 | use_rootnet_xy_branch : False 96 | xy_loss_func : "mse" 97 | pretrained_rootnet: None 98 | use_origin_bbox : False 99 | use_extended_bbox : True 100 | 101 | use_rootnet_with_regression_uv : False 102 | use_rootnet_with_reg_int_shared_backbone : True 103 | use_rootnet_with_reg_with_int_separate_backbone : False 104 | 105 | use_sim2real : True 106 | use_view : False 107 | pretrained_weight_on_synth : "panda_synth_pretrain/ckpt/curr_best_auc(add)_azure_model.pk" 108 | 109 | mask_loss_weight : 0.0 110 | iou_loss_weight : 1.0 111 | scale_loss_weight : 0.0 112 | align_3d_loss_weight : 1.0 113 | 114 | # Resume 115 | resume_run : False 116 | resume_experiment_name : "panda_sim2real_az_rri1026_lr1e-8con_iouloss+alignloss_fixbnrun_preepoch82_1031" 117 | -------------------------------------------------------------------------------- /configs/panda/self_supervised/kinect.yaml: -------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "panda_kinect_self_supervised" 9 | 10 | # Data 11 | urdf_robot_name : "panda" 12 | train_ds_names : "dream/real/panda-3cam_kinect360" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "resnet50" 18 | rootnet_backbone_name : "hrnet32" 19 | rootnet_image_size : 256.0 20 | other_image_size : 256.0 21 | use_rpmg: False 22 | 23 | # Optimizer 24 | lr : 3e-9 25 | weight_decay : 0. 
26 | use_schedule : False 27 | schedule_type : "exponential" 28 | n_epochs_warmup : 0 29 | start_decay : 1 30 | end_decay: 300 31 | final_decay : 0.01 32 | exponent : 0.85 33 | 34 | # Training 35 | batch_size : 32 36 | epoch_size : 104950 37 | n_epochs : 700 38 | n_dataloader_workers : 6 39 | save_epoch_interval : None 40 | clip_gradient : 10.0 41 | 42 | # Method 43 | use_direct_reg_branch : True 44 | n_iter : 4 45 | pose_loss_func : "mse" 46 | rot_loss_func : "mse" 47 | trans_loss_func : "l2norm" 48 | depth_loss_func : "l1" 49 | uv_loss_func : "l2norm" 50 | kp2d_loss_func : "l2norm" 51 | kp3d_loss_func : "l2norm" 52 | pose_loss_weight : 1.0 53 | rot_loss_weight : 1.0 54 | trans_loss_weight : 1.0 55 | depth_loss_weight : 1.0 56 | uv_loss_weight : 0.0 57 | kp2d_loss_weight : 10.0 58 | kp3d_loss_weight : 10.0 59 | reg_joint_map : False 60 | joint_conv_dim : [256,256,256] 61 | joint_individual_weights : None 62 | use_joint_valid_mask : True 63 | 64 | 65 | use_integral_3d_branch : False 66 | use_limb_loss : False 67 | limb_loss_func : "l1" 68 | limb_loss_weight : 1.0 69 | use_uvd_3d_loss : False 70 | integral_3d_loss_func : "l2norm" 71 | integral_3d_loss_weight : 1.0 72 | use_xyz_3d_loss : True 73 | integral_xyz_3d_loss_func : "l2norm" 74 | integral_xyz_3d_loss_weight : 1.0 75 | bbox_3d_shape : 76 | - 1300 77 | - 1300 78 | - 1300 79 | 80 | reference_keypoint_id : 3 # 0:base 81 | fix_truncation : False 82 | 83 | use_pretrained_direct_reg_weights: False 84 | pretrained_direct_reg_weights_path: None 85 | 86 | use_pretrained_integral : False 87 | pretrained_integral_weights_path: None 88 | 89 | 90 | # rootnet (+ integral/regression) 91 | use_rootnet: False 92 | resample : False 93 | rootnet_depth_loss_weight : 1.0 94 | depth_loss_func : "l1" 95 | use_rootnet_xy_branch : False 96 | xy_loss_func : "mse" 97 | pretrained_rootnet: None 98 | use_origin_bbox : False 99 | use_extended_bbox : True 100 | 101 | use_rootnet_with_regression_uv : False 102 | use_rootnet_with_reg_int_shared_backbone : True 103 | use_rootnet_with_reg_with_int_separate_backbone : False 104 | 105 | use_sim2real : True 106 | use_view : False 107 | pretrained_weight_on_synth : "panda_synth_pretrain/ckpt/curr_best_auc(add)_kinect_model.pk" 108 | 109 | mask_loss_weight : 0.0 110 | iou_loss_weight : 1.0 111 | scale_loss_weight : 0.0 112 | align_3d_loss_weight : 1.0 113 | 114 | # Resume 115 | resume_run : False 116 | resume_experiment_name : "panda_sim2real_rootnet+reg1008_lr1e-4con_1011" 117 | -------------------------------------------------------------------------------- /configs/panda/self_supervised/orb.yaml: -------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "panda_orb_self_supervised" 9 | 10 | # Data 11 | urdf_robot_name : "panda" 12 | train_ds_names : "dream/real/panda-orb" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "resnet50" 18 | rootnet_backbone_name : "hrnet32" 19 | rootnet_image_size : 256.0 20 | other_image_size : 256.0 21 | use_rpmg: False 22 | 23 | # Optimizer 24 | lr : 1e-7 25 | weight_decay : 0. 
26 | use_schedule : False 27 | schedule_type : "exponential" 28 | n_epochs_warmup : 0 29 | start_decay : 20 30 | end_decay: 300 31 | final_decay : 0.01 32 | exponent : 0.78 33 | 34 | # Training 35 | batch_size : 32 36 | epoch_size : 104950 37 | n_epochs : 700 38 | n_dataloader_workers : 6 39 | save_epoch_interval : None 40 | clip_gradient : 10.0 41 | 42 | # Method 43 | use_direct_reg_branch : True 44 | n_iter : 4 45 | pose_loss_func : "mse" 46 | rot_loss_func : "mse" 47 | trans_loss_func : "l2norm" 48 | depth_loss_func : "l1" 49 | uv_loss_func : "l2norm" 50 | kp2d_loss_func : "l2norm" 51 | kp3d_loss_func : "l2norm" 52 | pose_loss_weight : 1.0 53 | rot_loss_weight : 1.0 54 | trans_loss_weight : 1.0 55 | depth_loss_weight : 1.0 56 | uv_loss_weight : 0.0 57 | kp2d_loss_weight : 10.0 58 | kp3d_loss_weight : 10.0 59 | reg_joint_map : False 60 | joint_conv_dim : [256,256,256] 61 | joint_individual_weights : None 62 | use_joint_valid_mask : True 63 | 64 | 65 | use_integral_3d_branch : False 66 | use_limb_loss : False 67 | limb_loss_func : "l1" 68 | limb_loss_weight : 1.0 69 | use_uvd_3d_loss : False 70 | integral_3d_loss_func : "l2norm" 71 | integral_3d_loss_weight : 1.0 72 | use_xyz_3d_loss : True 73 | integral_xyz_3d_loss_func : "l2norm" 74 | integral_xyz_3d_loss_weight : 1.0 75 | bbox_3d_shape : 76 | - 1300 77 | - 1300 78 | - 1300 79 | 80 | reference_keypoint_id : 3 # 0:base 81 | fix_truncation : False 82 | 83 | use_pretrained_direct_reg_weights: False 84 | pretrained_direct_reg_weights_path: None 85 | 86 | use_pretrained_integral : False 87 | pretrained_integral_weights_path: None 88 | 89 | 90 | # rootnet (+ integral/regression) 91 | use_rootnet: False 92 | resample : False 93 | rootnet_depth_loss_weight : 1.0 94 | depth_loss_func : "l1" 95 | use_rootnet_xy_branch : False 96 | xy_loss_func : "mse" 97 | pretrained_rootnet: None 98 | use_origin_bbox : False 99 | use_extended_bbox : True 100 | 101 | use_rootnet_with_regression_uv : False 102 | use_rootnet_with_reg_int_shared_backbone : True 103 | use_rootnet_with_reg_with_int_separate_backbone : False 104 | 105 | use_sim2real : True 106 | use_view : False 107 | pretrained_weight_on_synth : "panda_synth_pretrain/ckpt/curr_best_auc(add)_orb_model.pk" 108 | 109 | mask_loss_weight : 0.0 110 | iou_loss_weight : 1.0 111 | scale_loss_weight : 0.0 112 | align_3d_loss_weight : 1.0 113 | 114 | # Resume 115 | resume_run : False 116 | resume_experiment_name : "resume_name" 117 | -------------------------------------------------------------------------------- /configs/panda/self_supervised/realsense.yaml: -------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "panda_realsense_self_supervised" 9 | 10 | # Data 11 | urdf_robot_name : "panda" 12 | train_ds_names : "dream/real/panda-3cam_realsense" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "resnet50" 18 | rootnet_backbone_name : "hrnet32" 19 | rootnet_image_size : 256.0 20 | other_image_size : 256.0 21 | use_rpmg: False 22 | 23 | # Optimizer 24 | lr : 1e-7 25 | weight_decay : 0. 
26 | use_schedule : False 27 | schedule_type : "exponential" 28 | n_epochs_warmup : 0 29 | start_decay : 20 30 | end_decay: 300 31 | final_decay : 0.01 32 | exponent : 0.78 33 | 34 | # Training 35 | batch_size : 32 36 | epoch_size : 104950 37 | n_epochs : 700 38 | n_dataloader_workers : 6 39 | save_epoch_interval : None 40 | clip_gradient : 10.0 41 | 42 | # Method 43 | use_direct_reg_branch : True 44 | n_iter : 4 45 | pose_loss_func : "mse" 46 | rot_loss_func : "mse" 47 | trans_loss_func : "l2norm" 48 | depth_loss_func : "l1" 49 | uv_loss_func : "l2norm" 50 | kp2d_loss_func : "l2norm" 51 | kp3d_loss_func : "l2norm" 52 | pose_loss_weight : 1.0 53 | rot_loss_weight : 1.0 54 | trans_loss_weight : 1.0 55 | depth_loss_weight : 1.0 56 | uv_loss_weight : 1.0 57 | kp2d_loss_weight : 10.0 58 | kp3d_loss_weight : 10.0 59 | reg_joint_map : False 60 | joint_conv_dim : [256,256,256] 61 | joint_individual_weights : None 62 | use_joint_valid_mask : False 63 | 64 | 65 | use_integral_3d_branch : False 66 | use_limb_loss : False 67 | limb_loss_func : "l1" 68 | limb_loss_weight : 1.0 69 | use_uvd_3d_loss : False 70 | integral_3d_loss_func : "l2norm" 71 | integral_3d_loss_weight : 1.0 72 | use_xyz_3d_loss : True 73 | integral_xyz_3d_loss_func : "l2norm" 74 | integral_xyz_3d_loss_weight : 1.0 75 | bbox_3d_shape : 76 | - 1300 77 | - 1300 78 | - 1300 79 | 80 | reference_keypoint_id : 3 # 0:base 81 | fix_truncation : False 82 | 83 | use_pretrained_direct_reg_weights: False 84 | pretrained_direct_reg_weights_path: None 85 | 86 | use_pretrained_integral : False 87 | pretrained_integral_weights_path: None 88 | 89 | 90 | # rootnet (+ integral/regression) 91 | use_rootnet: False 92 | resample : False 93 | rootnet_depth_loss_weight : 1.0 94 | depth_loss_func : "l1" 95 | use_rootnet_xy_branch : False 96 | xy_loss_func : "mse" 97 | pretrained_rootnet: None 98 | use_origin_bbox : False 99 | use_extended_bbox : True 100 | 101 | use_rootnet_with_regression_uv : False 102 | use_rootnet_with_reg_int_shared_backbone : True 103 | use_rootnet_with_reg_with_int_separate_backbone : False 104 | 105 | use_sim2real : True 106 | use_view : False 107 | pretrained_weight_on_synth : "panda_synth_pretrain/ckpt/curr_best_auc(add)_realsense_model.pk" 108 | 109 | mask_loss_weight : 0.0 110 | iou_loss_weight : 1.0 111 | scale_loss_weight : 0.0 112 | align_3d_loss_weight : 1.0 113 | 114 | # Resume 115 | resume_run : False 116 | resume_experiment_name : "resume_name" 117 | -------------------------------------------------------------------------------- /configs/panda/self_supervised/synth.yaml: -------------------------------------------------------------------------------- 1 | 2 | # basic training 3 | no_cuda : False 4 | device_id : [0] 5 | 6 | # experiment name (also name of the saving directory) 7 | # model and log directory : {ROOT}/experiment/{exp_name}/ 8 | exp_name : "panda_synth_pretrain" 9 | 10 | # Data 11 | urdf_robot_name : "panda" 12 | train_ds_names : "dream/synthetic/panda_synth_train_dr" 13 | val_ds_names : None 14 | image_size : 256.0 15 | 16 | # Model 17 | backbone_name : "resnet50" 18 | # integral_backbone_name : "resnet34" 19 | rootnet_backbone_name : "hrnet32" 20 | rootnet_image_size : 256.0 21 | other_image_size : 256.0 22 | use_rpmg: False 23 | 24 | # Optimizer 25 | lr : 1e-4 26 | weight_decay : 0. 
27 | use_schedule : False 28 | schedule_type : "linear" 29 | n_epochs_warmup : 15 30 | start_decay : 100 31 | end_decay: 300 32 | final_decay : 0.01 33 | exponent : 0.96 34 | 35 | # Training 36 | batch_size : 64 37 | epoch_size : 104950 38 | n_epochs : 700 39 | n_dataloader_workers : 6 40 | save_epoch_interval : None 41 | clip_gradient : 5.0 42 | 43 | # Method 44 | use_direct_reg_branch : True 45 | n_iter : 4 46 | pose_loss_func : "mse" 47 | rot_loss_func : "mse" 48 | trans_loss_func : "l2norm" 49 | depth_loss_func : "l1" 50 | uv_loss_func : "l2norm" 51 | kp2d_loss_func : "l2norm" 52 | kp3d_loss_func : "l2norm" 53 | kp2d_int_loss_func : "l2norm" 54 | kp3d_int_loss_func : "l2norm" 55 | align_3d_loss_func : "l2norm" 56 | pose_loss_weight : 1.0 57 | rot_loss_weight : 1.0 58 | trans_loss_weight : 1.0 59 | depth_loss_weight : 1.0 60 | uv_loss_weight : 1.0 61 | kp2d_loss_weight : 10.0 62 | kp3d_loss_weight : 10.0 63 | kp2d_int_loss_weight : 10.0 64 | kp3d_int_loss_weight : 10.0 65 | align_3d_loss_weight : 0.0 66 | joint_individual_weights : None 67 | use_joint_valid_mask : False 68 | fix_root : True 69 | bbox_3d_shape : 70 | - 1300 71 | - 1300 72 | - 1300 73 | reference_keypoint_id : 3 # 0:base 74 | fix_truncation : False 75 | rotation_dim : 6 76 | 77 | use_pretrained_direct_reg_weights: False 78 | pretrained_direct_reg_weights_path: None 79 | 80 | use_pretrained_integral : False 81 | pretrained_integral_weights_path: None 82 | 83 | 84 | # rootnet (+ integral/regression) 85 | use_rootnet: True 86 | resample : False 87 | rootnet_depth_loss_weight : 1.0 88 | depth_loss_func : "l1" 89 | use_rootnet_xy_branch : False 90 | xy_loss_func : "mse" 91 | pretrained_rootnet: "" 92 | use_origin_bbox : False 93 | use_extended_bbox : True 94 | 95 | use_rootnet_with_reg_int_shared_backbone : True 96 | use_rootnet_with_reg_with_int_separate_backbone : False 97 | 98 | # Resume 99 | resume_run : False 100 | resume_experiment_name : "resume_name" 101 | -------------------------------------------------------------------------------- /lib/config.py: -------------------------------------------------------------------------------- 1 | import os 2 | from joblib import Memory 3 | from pathlib import Path 4 | import getpass 5 | import socket 6 | 7 | hostname = socket.gethostname() 8 | username = getpass.getuser() 9 | 10 | PROJECT_ROOT = Path(__file__).parent 11 | PROJECT_DIR = PROJECT_ROOT 12 | DATA_DIR = PROJECT_DIR / 'data' 13 | LOCAL_DATA_DIR = Path('data') 14 | TEST_DATA_DIR = LOCAL_DATA_DIR 15 | 16 | EXP_DIR = LOCAL_DATA_DIR / 'models' 17 | RESULTS_DIR = LOCAL_DATA_DIR / 'results' 18 | DEBUG_DATA_DIR = LOCAL_DATA_DIR / 'debug_data' 19 | DEPS_DIR = LOCAL_DATA_DIR / 'deps' 20 | CACHE_DIR = LOCAL_DATA_DIR / 'joblib_cache' 21 | assert LOCAL_DATA_DIR.exists() 22 | CACHE_DIR.mkdir(exist_ok=True) 23 | TEST_DATA_DIR.mkdir(exist_ok=True) 24 | RESULTS_DIR.mkdir(exist_ok=True) 25 | DEBUG_DATA_DIR.mkdir(exist_ok=True) 26 | 27 | ASSET_DIR = DATA_DIR / 'assets' 28 | MEMORY = Memory(CACHE_DIR, verbose=2) 29 | 30 | # ROBOTS URDF 31 | DREAM_DS_DIR = LOCAL_DATA_DIR / 'dream' 32 | 33 | PANDA_DESCRIPTION_PATH = os.path.abspath(DEPS_DIR / "panda-description/panda.urdf") 34 | PANDA_DESCRIPTION_PATH_VISUAL = os.path.abspath(DEPS_DIR / "panda-description/patched_urdf/panda.urdf") 35 | KUKA_DESCRIPTION_PATH = os.path.abspath(DEPS_DIR / "kuka-description/iiwa_description/urdf/iiwa7.urdf") 36 | BAXTER_DESCRIPTION_PATH = os.path.abspath("/DATA/disk1/cvda_share/robopose_data/deps/baxter-description/baxter_description/urdf/baxter.urdf") 37 | 38 | 
OWI_DESCRIPTION = os.path.abspath(DEPS_DIR / 'owi-description' / 'owi535_description' / 'owi535.urdf') 39 | OWI_KEYPOINTS_PATH = os.path.abspath(DEPS_DIR / 'owi-description' / 'keypoints.json') 40 | -------------------------------------------------------------------------------- /lib/core/config.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | import yaml 5 | from lib.config import LOCAL_DATA_DIR 6 | from easydict import EasyDict 7 | 8 | def make_default_cfg(): 9 | cfg = EasyDict() 10 | 11 | # basic experiment info (must be overwritten) 12 | cfg.exp_name = "default" 13 | cfg.config_path = "default" 14 | 15 | # training 16 | cfg.no_cuda = False 17 | cfg.device_id = 0 18 | cfg.batch_size = 64 19 | cfg.epoch_size = 104950 # will get rid of this eventually, but right now let it be 20 | cfg.n_epochs = 700 21 | cfg.n_dataloader_workers = int(os.environ.get('N_CPUS', 10)) - 2 22 | cfg.clip_gradient = 10.0 23 | 24 | # data 25 | cfg.urdf_robot_name = "panda" 26 | cfg.train_ds_names = os.path.abspath(LOCAL_DATA_DIR / "dream/real/panda_synth_train_dr") 27 | cfg.image_size = 256.0 28 | 29 | # augmentation during training 30 | cfg.jitter = True 31 | cfg.other_aug = True 32 | cfg.occlusion = True 33 | cfg.occlu_p = 0.5 34 | cfg.padding = False 35 | cfg.fix_truncation = False 36 | cfg.truncation_padding = [120,120,120,120] 37 | cfg.rootnet_flip = False 38 | 39 | # pipeline 40 | cfg.use_rootnet = False 41 | cfg.use_rootnet_with_reg_int_shared_backbone = False 42 | cfg.use_sim2real = False 43 | cfg.use_sim2real_real = False 44 | cfg.pretrained_rootnet = None 45 | cfg.pretrained_weight_on_synth = None 46 | cfg.use_view = False 47 | cfg.known_joint = False 48 | 49 | # optimizer and scheduler 50 | cfg.lr = 1e-4 51 | cfg.weight_decay = 0.0 52 | cfg.use_schedule = False 53 | cfg.schedule_type = "" 54 | cfg.n_epochs_warmup = 0 55 | cfg.start_decay = 100 56 | cfg.end_decay = 200 57 | cfg.final_decay = 0.01 58 | cfg.exponent = 1.0 59 | cfg.step_decay = 0.1 60 | cfg.step = 5 61 | 62 | # model 63 | ## basic setting 64 | cfg.backbone_name = "resnet50" 65 | cfg.rootnet_backbone_name = "hrnet32" 66 | cfg.rootnet_image_size = (cfg.image_size, cfg.image_size) 67 | cfg.other_image_size = (cfg.image_size, cfg.image_size) 68 | ## Jointnet/RotationNet 69 | cfg.n_iter = 4 70 | cfg.p_dropout = 0.5 71 | cfg.use_rpmg = False 72 | cfg.reg_joint_map = False 73 | cfg.joint_conv_dim = [] 74 | cfg.rotation_dim = 6 75 | cfg.direct_reg_rot = False 76 | cfg.rot_iterative_matmul = False 77 | cfg.fix_root = True 78 | cfg.reg_from_bb_out = False 79 | cfg.depth_from_bb_out = False 80 | ## KeypointNet 81 | cfg.bbox_3d_shape = [1300, 1300, 1300] 82 | cfg.reference_keypoint_id = 3 83 | ## DepthNet 84 | cfg.resample = False 85 | cfg.use_origin_bbox = False 86 | cfg.use_extended_bbox = True 87 | cfg.extend_ratio = [0.2, 0.13] 88 | cfg.use_offset = False 89 | cfg.use_rootnet_xy_branch = False 90 | cfg.add_fc = False 91 | cfg.multi_kp = False 92 | cfg.kps_need_depth = None 93 | 94 | # loss 95 | ## for full network training 96 | cfg.pose_loss_func = "mse" 97 | cfg.rot_loss_func = "mse" 98 | cfg.trans_loss_func = "l2norm" 99 | cfg.uv_loss_func = "l2norm" 100 | cfg.depth_loss_func = "l1" 101 | cfg.kp3d_loss_func = "l2norm" 102 | cfg.kp2d_loss_func = "l2norm" 103 | cfg.kp3d_int_loss_func = "l2norm" 104 | cfg.kp2d_int_loss_func = "l2norm" 105 | cfg.align_3d_loss_func = "l2norm" 106 | cfg.pose_loss_weight 
= 0.0 107 | cfg.rot_loss_weight = 0.0 108 | cfg.trans_loss_weight = 0.0 109 | cfg.uv_loss_weight = 0.0 110 | cfg.depth_loss_weight = 0.0 111 | cfg.kp2d_loss_weight = 0.0 112 | cfg.kp3d_loss_weight = 0.0 113 | cfg.kp2d_int_loss_weight = 0.0 114 | cfg.kp3d_int_loss_weight = 0.0 115 | cfg.align_3d_loss_weight = 0.0 116 | cfg.joint_individual_weights = None 117 | cfg.use_joint_valid_mask = False 118 | cfg.fix_mask = False 119 | ## for depthnet training 120 | cfg.rootnet_depth_loss_weight = 1.0 121 | cfg.depth_loss_func = "l1" 122 | cfg.xy_loss_func = "l1" 123 | ## for self-supervised training 124 | cfg.mask_loss_func = "mse_mean" 125 | cfg.mask_loss_weight = 0.0 126 | cfg.scale_loss_weight = 0.0 127 | cfg.iou_loss_weight = 0.0 128 | 129 | # resume 130 | cfg.resume_run = False 131 | cfg.resume_experiment_name = "resume_name" 132 | 133 | return cfg 134 | 135 | 136 | def make_cfg(args): 137 | 138 | cfg = make_default_cfg() 139 | cfg.config_path = args.config 140 | 141 | with open(args.config, encoding="utf-8") as f: 142 | config = yaml.load(f.read(), Loader=yaml.FullLoader) 143 | 144 | for k,v in config.items(): 145 | if k in cfg: 146 | if k == "n_dataloader_workers": 147 | cfg[k] = min(cfg[k], v) 148 | elif k == "train_ds_names": 149 | cfg[k] = os.path.abspath(LOCAL_DATA_DIR / v) 150 | if "move" in v: 151 | cfg[k] = v 152 | elif k in ["lr", "exponent"] or k.endswith("loss_weight"): 153 | cfg[k] = float(v) 154 | elif k in ["joint_individual_weights", "pretrained_rootnet", "pretrained_weight_on_synth"]: 155 | cfg[k] = None if v == "None" else v 156 | elif k == "extend_ratio": 157 | cfg[k] = list(v) 158 | else: 159 | cfg[k] = v 160 | 161 | f.close() 162 | 163 | return cfg -------------------------------------------------------------------------------- /lib/dataset/augmentations.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | import math 5 | import random 6 | from copy import deepcopy 7 | import numpy as np 8 | import PIL 9 | import torch 10 | import torch.nn.functional as F 11 | from dataset.roboutils import hnormalized, make_detections_from_segmentation 12 | from PIL import ImageEnhance, ImageFilter 13 | from utils.geometries import get_K_crop_resize 14 | 15 | 16 | def to_pil(im): 17 | if isinstance(im, PIL.Image.Image): 18 | return im 19 | elif isinstance(im, torch.Tensor): 20 | return PIL.Image.fromarray(np.asarray(im)) 21 | elif isinstance(im, np.ndarray): 22 | return PIL.Image.fromarray(im) 23 | else: 24 | raise ValueError('Type not supported', type(im)) 25 | 26 | 27 | def to_torch_uint8(im): 28 | if isinstance(im, PIL.Image.Image): 29 | im = torch.as_tensor(np.asarray(im).astype(np.uint8)) 30 | elif isinstance(im, torch.Tensor): 31 | assert im.dtype == torch.uint8 32 | elif isinstance(im, np.ndarray): 33 | assert im.dtype == np.uint8 34 | im = torch.as_tensor(im) 35 | else: 36 | raise ValueError('Type not supported', type(im)) 37 | if im.dim() == 3: 38 | assert im.shape[-1] in {1, 3},f"{im.shape}" 39 | return im 40 | 41 | def occlusion_aug(bbox, img_shape, min_area=0.0, max_area=0.3, max_try_times=5): 42 | xmin, ymin, _, _ = bbox 43 | xmax = bbox[2] 44 | ymax = bbox[3] 45 | imght, imgwidth = img_shape 46 | counter = 0 47 | while True: 48 | # force to break if no suitable occlusion 49 | if counter > max_try_times: # 5 50 | return 0, 0, 0, 0 51 | counter += 1 52 | 53 | area_min = min_area # 0.0 54 | area_max = max_area # 0.3 55 | synth_area = 
(random.random() * (area_max - area_min) + area_min) * (xmax - xmin) * (ymax - ymin) 56 | 57 | ratio_min = 0.5 58 | ratio_max = 1 / 0.5 59 | synth_ratio = (random.random() * (ratio_max - ratio_min) + ratio_min) 60 | 61 | if(synth_ratio*synth_area<=0): 62 | print(synth_area,xmax,xmin,ymax,ymin) 63 | print(synth_ratio,ratio_max,ratio_min) 64 | synth_h = math.sqrt(synth_area * synth_ratio) 65 | synth_w = math.sqrt(synth_area / synth_ratio) 66 | synth_xmin = random.random() * ((xmax - xmin) - synth_w - 1) + xmin 67 | synth_ymin = random.random() * ((ymax - ymin) - synth_h - 1) + ymin 68 | 69 | if synth_xmin >= 0 and synth_ymin >= 0 and synth_xmin + synth_w < imgwidth and synth_ymin + synth_h < imght: 70 | synth_xmin = int(synth_xmin) 71 | synth_ymin = int(synth_ymin) 72 | synth_w = int(synth_w) 73 | synth_h = int(synth_h) 74 | break 75 | return synth_ymin, synth_h, synth_xmin, synth_w 76 | 77 | class PillowBlur: 78 | def __init__(self, p=0.4, factor_interval=(1, 3)): 79 | self.p = p 80 | self.factor_interval = factor_interval 81 | 82 | def __call__(self, im, mask, obs): 83 | im = to_pil(im) 84 | k = random.randint(*self.factor_interval) 85 | im = im.filter(ImageFilter.GaussianBlur(k)) 86 | return im, mask, obs 87 | 88 | 89 | class PillowRGBAugmentation: 90 | def __init__(self, pillow_fn, p, factor_interval): 91 | self._pillow_fn = pillow_fn 92 | self.p = p 93 | self.factor_interval = factor_interval 94 | 95 | def __call__(self, im, mask, obs): 96 | im = to_pil(im) 97 | if random.random() <= self.p: 98 | im = self._pillow_fn(im).enhance(factor=random.uniform(*self.factor_interval)) 99 | #im.save('./BRIGHT.png') 100 | return im, mask, obs 101 | 102 | 103 | class PillowSharpness(PillowRGBAugmentation): 104 | def __init__(self, p=0.3, factor_interval=(0., 50.)): 105 | super().__init__(pillow_fn=ImageEnhance.Sharpness, 106 | p=p, 107 | factor_interval=factor_interval) 108 | 109 | 110 | class PillowContrast(PillowRGBAugmentation): 111 | def __init__(self, p=0.3, factor_interval=(0.2, 50.)): 112 | super().__init__(pillow_fn=ImageEnhance.Contrast, 113 | p=p, 114 | factor_interval=factor_interval) 115 | 116 | 117 | class PillowBrightness(PillowRGBAugmentation): 118 | def __init__(self, p=0.5, factor_interval=(0.1, 6.0)): 119 | super().__init__(pillow_fn=ImageEnhance.Brightness, 120 | p=p, 121 | factor_interval=factor_interval) 122 | 123 | 124 | class PillowColor(PillowRGBAugmentation): 125 | def __init__(self, p=0.3, factor_interval=(0.0, 20.0)): 126 | super().__init__(pillow_fn=ImageEnhance.Color, 127 | p=p, 128 | factor_interval=factor_interval) 129 | 130 | 131 | class GrayScale(PillowRGBAugmentation): 132 | def __init__(self, p=0.3): 133 | self.p = p 134 | 135 | def __call__(self, im, mask, obs): 136 | im = to_pil(im) 137 | if random.random() <= self.p: 138 | im = to_torch_uint8(im).float() 139 | gray = 0.2989 * im[..., 0] + 0.5870 * im[..., 1] + 0.1140 * im[..., 2] 140 | gray = gray.to(torch.uint8) 141 | im = gray.unsqueeze(-1).repeat(1, 1, 3) 142 | return im, mask, obs 143 | 144 | 145 | class BackgroundAugmentation: 146 | def __init__(self, image_dataset, p): 147 | self.image_dataset = image_dataset 148 | self.p = p 149 | 150 | def get_bg_image(self, idx): 151 | return self.image_dataset[idx] 152 | 153 | def __call__(self, im, mask, obs): 154 | if random.random() <= self.p: 155 | im = to_torch_uint8(im) 156 | mask = to_torch_uint8(mask) 157 | h, w, c = im.shape 158 | im_bg = self.get_bg_image(random.randint(0, len(self.image_dataset) - 1)) 159 | im_bg = to_pil(im_bg) 160 | im_bg = 
torch.as_tensor(np.asarray(im_bg.resize((w, h)))) 161 | mask_bg = mask == 0 162 | im[mask_bg] = im_bg[mask_bg] 163 | return im, mask, obs 164 | 165 | class CropResizeToAspectAugmentation: 166 | def __init__(self, resize=(640, 480)): 167 | self.resize = (min(resize), max(resize)) 168 | self.aspect = max(resize) / min(resize) 169 | 170 | def __call__(self, im, mask, obs, use_3d=True): 171 | im = to_torch_uint8(im) 172 | mask = to_torch_uint8(mask) 173 | obs['orig_camera'] = deepcopy(obs['camera']) 174 | assert im.shape[-1] == 3 175 | h, w = im.shape[:2] 176 | if (h, w) == self.resize: 177 | obs['orig_camera']['crop_resize_bbox'] = (0, 0, w-1, h-1) 178 | return im, mask, obs 179 | 180 | ratio = float(self.resize[0])/h 181 | 182 | images = (torch.as_tensor(im).float() / 255).unsqueeze(0).permute(0, 3, 1, 2) 183 | masks = torch.as_tensor(mask).unsqueeze(0).unsqueeze(0).float() 184 | K = torch.tensor(obs['camera']['K']).unsqueeze(0) 185 | 186 | # Match the width on input image with an image of target aspect ratio. 187 | # if not np.isclose(w/h, self.aspect): 188 | # x0, y0 = images.shape[-1] / 2, images.shape[-2] / 2 189 | # w = images.shape[-1] 190 | # r = self.aspect 191 | # h = w * 1/r 192 | # box_size = (h, w) 193 | # h, w = min(box_size), max(box_size) 194 | # x1, y1, x2, y2 = x0-w/2, y0-h/2, x0+w/2, y0+h/2 195 | # box = torch.tensor([x1, y1, x2, y2]) 196 | # images, masks, K = crop_to_aspect_ratio(images, box, masks=masks, K=K) 197 | 198 | # Resize to target size 199 | x0, y0 = images.shape[-1] / 2, images.shape[-2] / 2 200 | h_input, w_input = images.shape[-2], images.shape[-1] 201 | h_output, w_output = min(self.resize), max(self.resize) 202 | box_size = (h_input, w_input) 203 | h, w = min(box_size), max(box_size) 204 | x1, y1, x2, y2 = x0-w/2, y0-h/2, x0+w/2, y0+h/2 205 | box = torch.tensor([x1, y1, x2, y2]) 206 | images = F.interpolate(images, size=(h_output, w_output), mode='bilinear', align_corners=False) 207 | masks = F.interpolate(masks, size=(h_output, w_output), mode='nearest') 208 | obs['orig_camera']['crop_resize_bbox'] = tuple(box.tolist()) 209 | K = get_K_crop_resize(K, box.unsqueeze(0), orig_size=(h_input, w_input), crop_resize=(h_output, w_output)) 210 | # Update the bounding box annotations 211 | keypoints=[] 212 | keypoints_3d=obs['objects'][0]['TCO_keypoints_3d'] 213 | K_tmp=K.cpu().clone().detach().numpy()[0] 214 | 215 | if use_3d: 216 | for location3d in keypoints_3d: 217 | location3d.reshape(3,1) 218 | p_unflattened = np.matmul(K_tmp, location3d) 219 | #print(p_unflattened) 220 | projection = hnormalized(p_unflattened) 221 | #print(p_unflattened) 222 | #print(projection) 223 | keypoints.append(list(projection)) 224 | obs['objects'][0]['keypoints_2d']=keypoints 225 | else: 226 | obs['objects'][0]['keypoints_2d']=obs['objects'][0]['keypoints_2d']*ratio 227 | 228 | dets_gt = make_detections_from_segmentation(masks)[0] 229 | for n, obj in enumerate(obs['objects']): 230 | if 'bbox' in obj: 231 | #assert 'id_in_segm' in obj 232 | #print(dets_gt) 233 | try: 234 | obj['bbox'] = dets_gt[1] 235 | except: 236 | print("bbox",obj['bbox'],"dets_gt",dets_gt) 237 | 238 | im = (images[0].permute(1, 2, 0) * 255).to(torch.uint8) 239 | mask = masks[0, 0].to(torch.uint8) 240 | obs['camera']['K'] = K.squeeze(0).numpy() 241 | obs['camera']['resolution'] = (w_output, h_output) 242 | return im, mask, obs 243 | 244 | def flip_img(x): 245 | assert (x.ndim == 3) 246 | dim = 1 247 | return np.flip(x,axis=dim).copy() 248 | 249 | def flip_joints_2d(joints_2d, width, flip_pairs): 250 | joints = 
joints_2d.copy() 251 | joints[:, 0] = width - joints[:, 0] - 1 # flip horizontally 252 | 253 | if flip_pairs is not None: # change left-right parts 254 | for lr in flip_pairs: 255 | joints[lr[0]], joints[lr[1]] = joints[lr[1]].copy(), joints[lr[0]].copy() 256 | return joints 257 | 258 | def flip_xyz_joints_3d(joints_3d, flip_pairs): 259 | assert joints_3d.ndim in (2, 3) 260 | joints = joints_3d.copy() 261 | # flip horizontally 262 | joints[:, 0] = -1 * joints[:, 0] 263 | # change left-right parts 264 | if flip_pairs is not None: 265 | print(joints) 266 | for pair in flip_pairs: 267 | print(pair) 268 | print(joints[pair[0]], joints[pair[1]]) 269 | joints[pair[0]], joints[pair[1]] = joints[pair[1]], joints[pair[0]].copy() 270 | return joints 271 | 272 | def flip_joints_3d(joints_3d, width, flip_pairs): 273 | joints = joints_3d.copy() 274 | # flip horizontally 275 | joints[:, 0, 0] = width - joints[:, 0, 0] - 1 276 | # change left-right parts 277 | if flip_pairs is not None: 278 | for pair in flip_pairs: 279 | joints[pair[0], :, 0], joints[pair[1], :, 0] = \ 280 | joints[pair[1], :, 0], joints[pair[0], :, 0].copy() 281 | joints[pair[0], :, 1], joints[pair[1], :, 1] = \ 282 | joints[pair[1], :, 1], joints[pair[0], :, 1].copy() 283 | joints[:, :, 0] *= joints[:, :, 1] 284 | return joints 285 | 286 | class FlipAugmentation: 287 | def __init__(self, p, flip_pairs=None): 288 | self.p = p 289 | self.flip_pairs = flip_pairs 290 | 291 | def __call__(self, im, mask, obs): 292 | if random.random() <= self.p: 293 | im = flip_img(im.numpy()) 294 | # mask = flip_img(mask) 295 | obs['objects'][0]['keypoints_2d'] = flip_joints_2d(np.array(obs['objects'][0]['keypoints_2d']), im.shape[1], self.flip_pairs) 296 | obs['camera']['K'][0,0] = - obs['camera']['K'][0,0] 297 | obs['camera']['K'][0,2] = im.shape[1] - 1 - obs['camera']['K'][0,2] 298 | return im, mask, obs 299 | 300 | def rotate_joints_2d(joints_2d, width): 301 | joints = joints_2d.copy() 302 | joints[:, 1] = joints_2d[:, 0] 303 | joints[:, 0] = width - joints_2d[:, 1] + 1 304 | return joints 305 | 306 | class RotationAugmentation: 307 | def __init__(self, p): 308 | self.p = p 309 | 310 | def __call__(self, im, mask, obs): 311 | if random.random() <= self.p: 312 | h,w = im.shape[0],im.shape[1] 313 | im_copy = np.zeros((w,h,3), dtype=np.uint8) 314 | for i in range(h): 315 | for j in range(w): 316 | im_copy[j][h-i-1]=im[i][j].astype(np.uint8) 317 | rgb = PIL.fromarray(im_copy) 318 | obs['objects'][0]['keypoints_2d'] = rotate_joints_2d(obs['objects'][0]['keypoints_2d'], im_copy.shape[1]) 319 | kp3d = obs['objects'][0]['TCO_keypoints_3d'] 320 | K = obs['camera']['K'] 321 | # original_fx,original_fy,original_cx,original_cy = K[0][0],K[1][1],K[0][2],K[1][2] 322 | # angle in radians; rotate 90 degrees clockwise 323 | angle = np.pi / 2 # 90 degrees 324 | K[0][2],K[1][2] = K[1][2],K[0][2] 325 | # set up rotation matrix 326 | rotation_matrix = np.array([[np.cos(angle), -np.sin(angle), 0], 327 | [np.sin(angle), np.cos(angle), 0], 328 | [0, 0, 1]]) 329 | # new camera intrinsic matrix 330 | # obs['camera']['K'] = new_intrinsic_matrix 331 | for i in range(kp3d.shape[0]): 332 | kp3d[i] = np.dot(rotation_matrix, kp3d[i]) 333 | 334 | return rgb, mask, obs 335 | else: 336 | return im, mask, obs -------------------------------------------------------------------------------- /lib/dataset/const.py: -------------------------------------------------------------------------------- 1 | from .augmentations import (CropResizeToAspectAugmentation, PillowBlur, 2 | PillowBrightness, PillowColor, PillowContrast, 3 | 
PillowSharpness, occlusion_aug, to_torch_uint8) 4 | 5 | rgb_augmentations=[ 6 | PillowSharpness(p=0.3, factor_interval=(0., 50.)), 7 | PillowContrast(p=0.3, factor_interval=(0.7, 1.8)), 8 | PillowBrightness(p=0.3, factor_interval=(0.7, 1.8)), 9 | PillowColor(p=0.3, factor_interval=(0., 4.)) 10 | ] 11 | 12 | KEYPOINT_NAMES={ 13 | 'panda' : [ 14 | 'panda_link0', 'panda_link2', 'panda_link3', 15 | 'panda_link4', 'panda_link6', 'panda_link7', 16 | 'panda_hand' 17 | ], 18 | 'baxter': [ 19 | 'torso_t0', 'right_s0','left_s0', 'right_s1', 'left_s1', 20 | 'right_e0','left_e0', 'right_e1','left_e1','right_w0', 'left_w0', 21 | 'right_w1','left_w1','right_w2', 'left_w2','right_hand','left_hand' 22 | ], 23 | 'kuka' : [ 24 | 'iiwa7_link_0', 'iiwa7_link_1', 25 | 'iiwa7_link_2', 'iiwa7_link_3', 26 | 'iiwa7_link_4', 'iiwa7_link_5', 27 | 'iiwa7_link_6', 'iiwa7_link_7' 28 | ], 29 | 'owi535' :[ 30 | 'Rotation', 'Base', 'Elbow', 'Wrist' 31 | ] 32 | } 33 | 34 | KEYPOINT_NAMES_TO_LINK_NAMES = { 35 | "panda" : dict(zip(KEYPOINT_NAMES['panda'],KEYPOINT_NAMES['panda'])), 36 | "kuka" : { 37 | 'iiwa7_link_0':'iiwa_link_0', 'iiwa7_link_1':'iiwa_link_1', 38 | 'iiwa7_link_2':'iiwa_link_2', 'iiwa7_link_3':'iiwa_link_3', 39 | 'iiwa7_link_4':'iiwa_link_4', 'iiwa7_link_5':'iiwa_link_5', 40 | 'iiwa7_link_6':'iiwa_link_6', 'iiwa7_link_7':'iiwa_link_7' 41 | }, 42 | "baxter" : { 43 | 'torso_t0':'torso', 44 | 'right_s0':'right_upper_shoulder', 'left_s0':'left_upper_shoulder', 45 | 'right_s1':'right_lower_shoulder', 'left_s1':'left_lower_shoulder', 46 | 'right_e0':'right_upper_elbow','left_e0':'left_upper_elbow', 47 | 'right_e1':'right_lower_elbow','left_e1':'left_lower_elbow', 48 | 'right_w0':'right_upper_forearm', 'left_w0':'left_upper_forearm', 49 | 'right_w1':'right_lower_forearm', 'left_w1':'left_lower_forearm', 50 | 'right_w2':'right_wrist', 'left_w2':'left_wrist', 51 | 'right_hand':'right_hand','left_hand':'left_hand' 52 | }, 53 | "owi535" : { 54 | 'Rotation':'Rotation', 'Base':'Base', 'Elbow':'Elbow', 'Wrist':'Wrist' 55 | } 56 | } 57 | 58 | LINK_NAMES = { 59 | 'panda': ['panda_link0', 'panda_link2', 'panda_link3', 'panda_link4', 60 | 'panda_link6', 'panda_link7', 'panda_hand'], 61 | 'kuka': ['iiwa_link_0', 'iiwa_link_1', 'iiwa_link_2', 'iiwa_link_3', 62 | 'iiwa_link_4', 'iiwa_link_5', 'iiwa_link_6', 'iiwa_link_7'], 63 | 'baxter': ['torso', 'right_upper_shoulder', 'left_upper_shoulder', 'right_lower_shoulder', 64 | 'left_lower_shoulder', 'right_upper_elbow', 'left_upper_elbow', 'right_lower_elbow', 65 | 'left_lower_elbow', 'right_upper_forearm', 'left_upper_forearm', 'right_lower_forearm', 66 | 'left_lower_forearm', 'right_wrist', 'left_wrist', 'right_hand', 'left_hand'], 67 | #'owi535': ["Base","Elbow","Wrist","Model","Model","Model","Model","Base","Base","Base","Base","Elbow","Elbow","Elbow","Elbow","Wrist","Wrist"], 68 | 'owi535' :[ 69 | 'Rotation', 'Base', 'Elbow', 'Wrist' 70 | ] 71 | } 72 | 73 | JOINT_NAMES={ 74 | 'panda': ['panda_joint1', 'panda_joint2', 'panda_joint3', 'panda_joint4', 75 | 'panda_joint5', 'panda_joint6', 'panda_joint7', 'panda_finger_joint1'], 76 | 'kuka': ['iiwa_joint_1', 'iiwa_joint_2', 'iiwa_joint_3', 'iiwa_joint_4', 77 | 'iiwa_joint_5', 'iiwa_joint_6', 'iiwa_joint_7'], 78 | 'baxter': ['head_pan', 'right_s0', 'left_s0', 'right_s1', 'left_s1', 79 | 'right_e0', 'left_e0', 'right_e1', 'left_e1', 'right_w0', 80 | 'left_w0', 'right_w1', 'left_w1', 'right_w2', 'left_w2'], 81 | 'owi535' :[ 82 | 'Rotation', 'Base', 'Elbow', 'Wrist' 83 | ] 84 | } 85 | 86 | JOINT_TO_KP = { 87 | 'panda': [1, 1, 2, 3, 4, 4, 
5, 6], 88 | 'kuka':[1,2,3,4,5,6,7], 89 | 'baxter':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], 90 | 'owi535':[0,1,2,3] 91 | } 92 | 93 | # flip_pairs=[ 94 | # ["right_s0","left_s0"],["right_s1","left_s1"],["right_e0","left_e0"], 95 | # ["right_e1","left_e1"],["right_w0","left_w0"],["right_w1","left_w1"], 96 | # ["right_w2","left_w2"],["right_hand","left_hand"] 97 | # ] 98 | flip_pairs = [ [1,2],[3,4],[5,6],[7,8],[9,10],[11,12],[13,14],[15,16] ] 99 | 100 | PANDA_LIMB_LENGTH ={ 101 | "link0-link2" : 0.3330, 102 | "link2-link3" : 0.3160, 103 | "link3-link4" : 0.0825, 104 | "link4-link6" : 0.39276, 105 | "link6-link7" : 0.0880, 106 | "link7-hand" : 0.1070 107 | } 108 | KUKA_LIMB_LENGTH ={ 109 | "link0-link1" : 0.1500, 110 | "link1-link2" : 0.1900, 111 | "link2-link3" : 0.2100, 112 | "link3-link4" : 0.1900, 113 | "link4-link5" : 0.2100, 114 | "link5-link6" : 0.19946, 115 | "link6-link7" : 0.10122 116 | } 117 | 118 | LIMB_LENGTH = { 119 | "panda": list(PANDA_LIMB_LENGTH.values()), 120 | "kuka": list(KUKA_LIMB_LENGTH.values()) 121 | } 122 | 123 | INITIAL_JOINT_ANGLE = { 124 | "zero": { 125 | "panda": { 126 | "panda_joint1": 0.0, 127 | "panda_joint2": 0.0, 128 | "panda_joint3": 0.0, 129 | "panda_joint4": 0.0, 130 | "panda_joint5": 0.0, 131 | "panda_joint6": 0.0, 132 | "panda_joint7": 0.0, 133 | "panda_finger_joint1": 0.0 134 | }, 135 | "kuka": { 136 | "iiwa_joint_1": 0.0, 137 | "iiwa_joint_2": 0.0, 138 | "iiwa_joint_3": 0.0, 139 | "iiwa_joint_4": 0.0, 140 | "iiwa_joint_5": 0.0, 141 | "iiwa_joint_6": 0.0, 142 | "iiwa_joint_7": 0.0 143 | }, 144 | "baxter": { 145 | "head_pan": 0.0, 146 | "right_s0": 0.0, 147 | "left_s0": 0.0, 148 | "right_s1": 0.0, 149 | "left_s1": 0.0, 150 | "right_e0": 0.0, 151 | "left_e0": 0.0, 152 | "right_e1": 0.0, 153 | "left_e1": 0.0, 154 | "right_w0": 0.0, 155 | "left_w0": 0.0, 156 | "right_w1": 0.0, 157 | "left_w1": 0.0, 158 | "right_w2": 0.0, 159 | "left_w2": 0.0 160 | }, 161 | "owi535":{ 162 | "Rotation":0.0, 163 | "Base":0.0, 164 | "Elbow":0.0, 165 | "Wrist":0.0 166 | } 167 | }, 168 | "mean": { 169 | "panda": { 170 | "panda_joint1": 0.0, 171 | "panda_joint2": 0.0, 172 | "panda_joint3": 0.0, 173 | "panda_joint4": -1.52715, 174 | "panda_joint5": 0.0, 175 | "panda_joint6": 1.8675, 176 | "panda_joint7": 0.0, 177 | "panda_finger_joint1": 0.02 178 | }, 179 | "kuka": { 180 | "iiwa_joint_1": 0.0, 181 | "iiwa_joint_2": 0.0, 182 | "iiwa_joint_3": 0.0, 183 | "iiwa_joint_4": 0.0, 184 | "iiwa_joint_5": 0.0, 185 | "iiwa_joint_6": 0.0, 186 | "iiwa_joint_7": 0.0 187 | }, 188 | "baxter": { 189 | "head_pan": 0.0, 190 | "right_s0": 0.0, 191 | "left_s0": 0.0, 192 | "right_s1": -0.5499999999999999, 193 | "left_s1": -0.5499999999999999, 194 | "right_e0": 0.0, 195 | "left_e0": 0.0, 196 | "right_e1": 1.284, 197 | "left_e1": 1.284, 198 | "right_w0": 0.0, 199 | "left_w0": 0.0, 200 | "right_w1": 0.2616018366049999, 201 | "left_w1": 0.2616018366049999, 202 | "right_w2": 0.0, 203 | "left_w2": 0.0 204 | }, 205 | "owi535":{ 206 | "Rotation":0.0, 207 | "Base":-0.523598, 208 | "Elbow":0.523598, 209 | "Wrist":0.0 210 | } 211 | } 212 | } 213 | 214 | JOINT_BOUNDS = { 215 | "panda": [[-2.9671, 2.9671], 216 | [-1.8326, 1.8326], 217 | [-2.9671, 2.9671], 218 | [-3.1416, 0.0873], 219 | [-2.9671, 2.9671], 220 | [-0.0873, 3.8223], 221 | [-2.9671, 2.9671], 222 | [ 0.0000, 0.0400]], 223 | 224 | "kuka": [[-2.9671, 2.9671], 225 | [-2.0944, 2.0944], 226 | [-2.9671, 2.9671], 227 | [-2.0944, 2.0944], 228 | [-2.9671, 2.9671], 229 | [-2.0944, 2.0944], 230 | [-3.0543, 3.0543]], 231 | 232 | "baxter": [[-1.5708, 1.5708], 233 | 
[-1.7017, 1.7017], 234 | [-1.7017, 1.7017], 235 | [-2.1470, 1.0470], 236 | [-2.1470, 1.0470], 237 | [-3.0542, 3.0542], 238 | [-3.0542, 3.0542], 239 | [-0.0500, 2.6180], 240 | [-0.0500, 2.6180], 241 | [-3.0590, 3.0590], 242 | [-3.0590, 3.0590], 243 | [-1.5708, 2.0940], 244 | [-1.5708, 2.0940], 245 | [-3.0590, 3.0590], 246 | [-3.0590, 3.0590]], 247 | "owi535":[ 248 | [-2.268928,2.268928], 249 | [-1.570796,1.047198], 250 | [-1.047198, 1.570796], 251 | [-0.785398,0.785398] 252 | ] 253 | } 254 | 255 | 256 | INTRINSICS_DICT = { 257 | "azure": (399.6578776041667, 399.4959309895833, 319.8955891927083, 244.0602823893229), 258 | "kinect": (525.0, 525.0, 319.5, 239.5), 259 | "realsense": (615.52392578125, 615.2191772460938, 328.2606506347656, 251.7917022705078), 260 | "orb": (615.52392578125, 615.2191772460938, 328.2606506347656, 251.7917022705078), 261 | 262 | } 263 | -------------------------------------------------------------------------------- /lib/dataset/multiepoch_dataloader.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | from itertools import chain 5 | 6 | 7 | class MultiEpochDataLoader: 8 | def __init__(self, dataloader): 9 | self.dataloader = dataloader 10 | self.dataloader_iter = None 11 | self.epoch_id = -1 12 | self.batch_id = 0 13 | self.n_repeats_sampler = 1 14 | self.sampler_length = None 15 | self.id_in_sampler = None 16 | 17 | def __iter__(self): 18 | if self.dataloader_iter is None: 19 | self.dataloader_iter = iter(self.dataloader) 20 | 21 | self.sampler_length = len(self.dataloader) 22 | self.id_in_sampler = 0 23 | while self.sampler_length <= 2 * self.dataloader.num_workers: 24 | self.sampler_length += len(self.dataloader) 25 | next_index_sampler = iter(self.dataloader_iter._index_sampler) 26 | self.dataloader_iter._sampler_iter = chain( 27 | self.dataloader_iter._sampler_iter, next_index_sampler) 28 | 29 | self.epoch_id += 1 30 | self.batch_id = 0 31 | self.epoch_size = len(self.dataloader_iter) 32 | 33 | return self 34 | 35 | def __len__(self): 36 | return len(self.dataloader) 37 | 38 | def __next__(self): 39 | if self.batch_id == self.epoch_size: 40 | raise StopIteration 41 | 42 | elif self.id_in_sampler == self.sampler_length - 2 * self.dataloader.num_workers: 43 | next_index_sampler = iter(self.dataloader_iter._index_sampler) 44 | self.dataloader_iter._sampler_iter = next_index_sampler 45 | self.id_in_sampler = 0 46 | 47 | idx, batch = self.dataloader_iter._get_data() 48 | self.dataloader_iter._tasks_outstanding -= 1 49 | self.dataloader_iter._process_data(batch) 50 | 51 | self.batch_id += 1 52 | self.id_in_sampler += 1 53 | return batch 54 | 55 | def get_infos(self): 56 | return dict() 57 | 58 | def __del__(self): 59 | del self.dataloader_iter 60 | -------------------------------------------------------------------------------- /lib/dataset/roboutils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torchvision 3 | import numpy as np 4 | import torchvision.transforms as transforms 5 | from PIL import Image 6 | import sys 7 | sys.path.append("..") 8 | from utils.geometries import get_K_crop_resize 9 | import random 10 | 11 | def hnormalized(vector): 12 | hnormalized_vector = (vector / vector[-1])[:-1] 13 | return hnormalized_vector 14 | 15 | def crop_to_aspect_ratio(images, box, masks=None, K=None): 16 | assert images.dim() == 4 17 | bsz, _, h, w = images.shape 18 
| assert box.dim() == 1 19 | assert box.shape[0] == 4 20 | w_output, h_output = box[[2, 3]] - box[[0, 1]] 21 | boxes = torch.cat( 22 | (torch.arange(bsz).unsqueeze(1).to(box.device).float(), box.unsqueeze(0).repeat(bsz, 1).float()), 23 | dim=1).to(images.device) 24 | images = torchvision.ops.roi_pool(images, boxes, output_size=(h_output, w_output)) 25 | if masks is not None: 26 | assert masks.dim() == 4 27 | masks = torchvision.ops.roi_pool(masks, boxes, output_size=(h_output, w_output)) 28 | if K is not None: 29 | assert K.dim() == 3 30 | assert K.shape[0] == bsz 31 | K = get_K_crop_resize(K, boxes[:, 1:], orig_size=(h, w), crop_resize=(h_output, w_output)) 32 | return images, masks, K 33 | 34 | 35 | def make_detections_from_segmentation(masks): 36 | detections = [] 37 | if masks.dim() == 4: 38 | assert masks.shape[0] == 1 39 | masks = masks.squeeze(0) 40 | 41 | for mask_n in masks: 42 | dets_n = dict() 43 | for uniq in torch.unique(mask_n, sorted=True): 44 | ids = np.where((mask_n == uniq).cpu().numpy()) 45 | x1, y1, x2, y2 = np.min(ids[1]), np.min(ids[0]), np.max(ids[1]), np.max(ids[0]) 46 | dets_n[int(uniq.item())] = torch.tensor([x1, y1, x2, y2]).to(mask_n.device) 47 | detections.append(dets_n) 48 | return detections 49 | 50 | 51 | def make_masks_from_det(detections, h, w): 52 | n_ids = len(detections) 53 | detections = torch.as_tensor(detections) 54 | masks = torch.zeros((n_ids, h, w)).byte() 55 | for mask_n, det_n in zip(masks, detections): 56 | x1, y1, x2, y2 = det_n.cpu().int().tolist() 57 | mask_n[y1:y2, x1:x2] = True 58 | return masks 59 | 60 | def get_bbox(bbox,w,h, strict=True): 61 | assert len(bbox)==4 62 | wmin, hmin, wmax, hmax = bbox 63 | if wmax<0 or hmax <0 or wmin > w or hmin > h: 64 | print("wmax",wmax,"hmax",hmax,"wmin",wmin,"hmin",hmin) 65 | wmin,hmin,wmax,hmax=max(0,wmin),max(0,hmin),min(w,wmax),min(h,hmax) 66 | wnew=wmax-wmin 67 | hnew=hmax-hmin 68 | wmin=int(max(0,wmin-0.3*wnew)) 69 | wmax=int(min(w,wmax+0.3*wnew)) 70 | hmin=int(max(0,hmin-0.3*hnew)) 71 | hmax=int(min(h,hmax+0.3*hnew)) 72 | wnew=wmax-wmin 73 | hnew=hmax-hmin 74 | 75 | if not strict: 76 | randomw = (random.random()-0.2)/2 77 | randomh = (random.random()-0.2)/2 78 | 79 | dwnew=randomw*wnew 80 | wmax+=dwnew/2 81 | wmin-=dwnew/2 82 | 83 | dhnew=randomh*hnew 84 | hmax+=dhnew/2 85 | hmin-=dhnew/2 86 | 87 | wmin=int(max(0,wmin)) 88 | wmax=int(min(w,wmax)) 89 | hmin=int(max(0,hmin)) 90 | hmax=int(min(h,hmax)) 91 | wnew=wmax-wmin 92 | hnew=hmax-hmin 93 | 94 | if wnew < 150: 95 | wmax+=75 96 | wmin-=75 97 | if hnew < 120: 98 | hmax+=60 99 | hmin-=60 100 | 101 | wmin,hmin,wmax,hmax=max(0,wmin),max(0,hmin),min(w,wmax),min(h,hmax) 102 | wmin,hmin,wmax,hmax=min(w,wmin),min(h,hmin),max(0,wmax),max(0,hmax) 103 | new_bbox = np.array([wmin,hmin,wmax,hmax]) 104 | return new_bbox 105 | 106 | def get_bbox_raw(bbox): 107 | assert len(bbox)==4 108 | wmin, hmin, wmax, hmax = bbox 109 | wnew=wmax-wmin 110 | hnew=hmax-hmin 111 | wmin=int(wmin-0.3*wnew) 112 | wmax=int(wmax+0.3*wnew) 113 | hmin=int(hmin-0.3*hnew) 114 | hmax=int(hmax+0.3*hnew) 115 | wnew=wmax-wmin 116 | hnew=hmax-hmin 117 | 118 | if wnew < 150: 119 | wmax+=75 120 | wmin-=75 121 | if hnew < 120: 122 | hmax+=60 123 | hmin-=60 124 | 125 | new_bbox = np.array([wmin,hmin,wmax,hmax]) 126 | return new_bbox 127 | 128 | def resize_image(image, bbox, mask, state, bbox_strict_bounded=None): 129 | #image as np.array 130 | wmin, hmin, wmax, hmax = bbox 131 | square_size =int(max(wmax - wmin, hmax - hmin)) 132 | square_image = np.zeros((square_size, square_size, 3), 
dtype=np.uint8) 133 | 134 | x_offset = int((square_size - (wmax-wmin)) // 2) 135 | y_offset = int((square_size- (hmax-hmin)) // 2) 136 | 137 | square_image[y_offset:y_offset+(hmax-hmin), x_offset:x_offset+(wmax-wmin)] = image[hmin:hmax, wmin:wmax] 138 | 139 | keypoints=state['objects'][0]['keypoints_2d'] 140 | 141 | for k in keypoints: 142 | k[1]-=hmin 143 | k[1]+=y_offset 144 | k[0]+=x_offset 145 | k[0]-=wmin 146 | if bbox_strict_bounded is not None: 147 | bbox_strict_bounded_new = bbox_strict_bounded[0]-wmin+x_offset, bbox_strict_bounded[1]-hmin+y_offset, \ 148 | bbox_strict_bounded[2]-wmin+x_offset, bbox_strict_bounded[3]-hmin+y_offset 149 | 150 | K = state['camera']['K'] 151 | K[0, 2] -= (wmin-x_offset) 152 | K[1, 2] -= (hmin-y_offset) 153 | if bbox_strict_bounded is None: 154 | return square_image, mask, state 155 | else: 156 | return square_image, mask, state, bbox_strict_bounded_new 157 | 158 | def tensor_to_image(tensor): 159 | image = tensor.cpu().clone().detach().numpy() 160 | image = Image.fromarray(image) 161 | return image 162 | 163 | def process_truncation(image, bbox, mask, state, max_pad=[120, 120, 120, 120]): 164 | #image as np.array 165 | wmin, hmin, wmax, hmax = bbox 166 | if wmin > 0 and hmin > 0 and hmax<480 and wmax <640: 167 | return image, bbox, mask, state 168 | d_wmin, d_hmin, d_wmax, d_hmax = int(-wmin), int(-hmin), int(wmax-640), int(hmax-480) 169 | d_wmin, d_hmin, d_wmax, d_hmax = int(max(0,d_wmin)), int(max(0,d_hmin)), int(max(0,d_wmax)), int(max(0,d_hmax)) 170 | #print(d_wmin, d_hmin, d_wmax, d_hmax) 171 | d_wmin, d_hmin, d_wmax, d_hmax = min(max_pad[0],d_wmin), min(max_pad[1],d_hmin),min(max_pad[2],d_wmax),min(max_pad[3],d_hmax) 172 | wmax, hmax = 640 + d_wmax, 480+ d_hmax 173 | wnew, hnew = 640+d_wmax+d_wmin,480+d_hmax+d_hmin 174 | 175 | #print(wnew,hnew) 176 | new_image = np.zeros((hnew, wnew, 3), dtype=np.uint8) 177 | 178 | #print("d_hmin:",d_hmin,d_hmax, d_wmin, d_wmax,wnew, hnew,"hmax:",hmax) 179 | new_image[d_hmin:d_hmin+480, d_wmin:d_wmin+640] = image[0:480, 0:640] 180 | 181 | 182 | keypoints=state['objects'][0]['keypoints_2d'] 183 | 184 | for k in keypoints: 185 | k[1]+=d_hmin 186 | k[0]+=d_wmin 187 | 188 | K = state['camera']['K'] 189 | K[0, 2] += (d_wmin) 190 | K[1, 2] += (d_hmin) 191 | 192 | # new_bbox = np.array([max(0,int(wmin + d_wmin)),max(0,int(hmin + d_hmin)),int(wmax + d_wmin),int(hmax + d_hmin)]) 193 | bbox_raw = np.concatenate([np.min(keypoints, axis=0)[0:2], np.max(keypoints, axis=0)[0:2]]) 194 | new_bbox = get_bbox(bbox_raw,wnew,hnew) 195 | return new_image, new_bbox, mask, state 196 | 197 | def process_padding(image, bbox, mask, state, padding_pixel=25): 198 | #image as np.array 199 | keypoints=state['objects'][0]['keypoints_2d'] 200 | # in_frame = 0 201 | # for k in keypoints: 202 | # if k[0]>0 and k[0]<256 and k[1]>0 and k[1]<256: 203 | # in_frame +=1 204 | # if in_frame ==7: 205 | # return image, bbox, mask, state 206 | # d_pad = 30 - 3*in_frame 207 | d_pad = padding_pixel 208 | d_wmin, d_hmin, d_wmax, d_hmax = d_pad,d_pad,d_pad,d_pad 209 | 210 | wnew, hnew = 320+d_wmax+d_wmin,320+d_hmax+d_hmin 211 | 212 | #print(wnew,hnew) 213 | new_image = np.zeros((hnew, wnew, 3), dtype=np.uint8) 214 | 215 | #print("d_hmin:",d_hmin,d_hmax, d_wmin, d_wmax,wnew, hnew,"hmax:",hmax) 216 | new_image[d_hmin:d_hmin+320, d_wmin:d_wmin+320] = image[0:320, 0:320] 217 | 218 | for k in keypoints: 219 | k[1]+=d_hmin 220 | k[0]+=d_wmin 221 | 222 | K = state['camera']['K'] 223 | K[0, 2] += (d_wmin) 224 | K[1, 2] += (d_hmin) 225 | 226 | # new_bbox = 
np.array([max(0,int(wmin + d_wmin)),max(0,int(hmin + d_hmin)),int(wmax + d_wmin),int(hmax + d_hmin)]) 227 | bbox_raw = np.concatenate([np.min(keypoints, axis=0)[0:2], np.max(keypoints, axis=0)[0:2]]) 228 | new_bbox = get_bbox(bbox_raw,wnew,hnew) 229 | return new_image, new_bbox, mask, state 230 | 231 | def bbox_transform(bbox, K_original_inv, K, resize_hw): 232 | wmin, hmin, wmax, hmax = bbox 233 | corners = np.array([[wmin, hmin, 1.0], 234 | [wmax, hmin, 1.0], 235 | [wmax, hmax, 1.0], 236 | [wmin, hmax, 1.0]]) 237 | corners3d_ill = np.matmul(K_original_inv, corners.T) 238 | new_corners = np.matmul(K, corners3d_ill).T 239 | assert all(new_corners[:,2] == 1.0), new_corners 240 | new_bbox = np.array([ 241 | np.clip(new_corners[0,0], 0, resize_hw[0]), 242 | np.clip(new_corners[0,1], 0, resize_hw[1]), 243 | np.clip(new_corners[1,0], 0, resize_hw[0]), 244 | np.clip(new_corners[2,1], 0, resize_hw[1]), 245 | ]) 246 | return new_bbox 247 | 248 | def get_extended_bbox(bbox, dwmin, dhmin, dwmax, dhmax, bounded=True, image_size=None): 249 | wmin, hmin, wmax, hmax = bbox 250 | extended_bbox = np.array([wmin-dwmin, hmin-dhmin, wmax+dwmax, hmax+dhmax]) 251 | wmin, hmin, wmax, hmax = extended_bbox 252 | if bounded: 253 | assert image_size 254 | extended_bbox = np.array([max(0,wmin),max(0,hmin),min(image_size[0],wmax),min(image_size[1],hmax)]) 255 | else: 256 | pass 257 | return extended_bbox 258 | -------------------------------------------------------------------------------- /lib/dataset/samplers.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | import torch 5 | from torch.utils.data import Sampler 6 | 7 | class PartialSampler(Sampler): 8 | def __init__(self, ds, epoch_size): 9 | self.n_items = len(ds) 10 | if epoch_size is not None: 11 | self.epoch_size = min(epoch_size, len(ds)) 12 | else: 13 | self.epoch_size = len(ds) 14 | super().__init__(None) 15 | 16 | def __len__(self): 17 | return self.epoch_size 18 | 19 | def __iter__(self): 20 | return (i.item() for i in torch.randperm(self.n_items)[:len(self)]) 21 | 22 | 23 | class ListSampler(Sampler): 24 | def __init__(self, ids): 25 | self.ids = ids 26 | 27 | def __len__(self): 28 | return len(self.ids) 29 | 30 | def __iter__(self): 31 | return iter(self.ids) 32 | -------------------------------------------------------------------------------- /lib/models/backbones/Resnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | class ResNet(nn.Module): 6 | 7 | def __init__(self, resnet_type): 8 | 9 | resnet_spec = {"resnet18": (BasicBlock, [2, 2, 2, 2], [64, 64, 128, 256, 512]), 10 | "resnet34": (BasicBlock, [3, 4, 6, 3], [64, 64, 128, 256, 512]), 11 | "resnet50": (Bottleneck, [3, 4, 6, 3], [64, 256, 512, 1024, 2048]), 12 | "resnet101": (Bottleneck, [3, 4, 23, 3], [64, 256, 512, 1024, 2048]), 13 | "resnet152": (Bottleneck, [3, 8, 36, 3], [64, 256, 512, 1024, 2048])} 14 | block, layers, channels = resnet_spec[resnet_type] 15 | 16 | self.block = block 17 | 18 | self.name = resnet_type 19 | self.inplanes = 64 20 | super(ResNet, self).__init__() 21 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 22 | bias=False) 23 | self.bn1 = nn.BatchNorm2d(64) 24 | self.relu = nn.ReLU(inplace=True) 25 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 26 | self.layer1 = self._make_layer(block, 64, layers[0]) 
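        # Descriptive note: the stages follow the standard torchvision ResNet layout --
        # layer1 keeps the post-maxpool resolution, while layer2-4 below each halve it.
        # For a 256x256 input the backbone therefore ends at an 8x8 feature map with
        # channels[4] channels (512 for resnet18/34, 2048 for resnet50/101/152), which
        # RootNet later global-average-pools into a single feature vector.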
27 | self.layer2 = self._make_layer(block, 128, layers[1], stride=2) 28 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 29 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 30 | 31 | for m in self.modules(): 32 | if isinstance(m, nn.Conv2d): 33 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 34 | nn.init.normal_(m.weight, mean=0, std=0.001) 35 | elif isinstance(m, nn.BatchNorm2d): 36 | nn.init.constant_(m.weight, 1) 37 | nn.init.constant_(m.bias, 0) 38 | 39 | def _make_layer(self, block, planes, blocks, stride=1): 40 | downsample = None 41 | if stride != 1 or self.inplanes != planes * block.expansion: 42 | downsample = nn.Sequential( 43 | nn.Conv2d(self.inplanes, planes * block.expansion, 44 | kernel_size=1, stride=stride, bias=False), 45 | nn.BatchNorm2d(planes * block.expansion), 46 | ) 47 | 48 | layers = [] 49 | layers.append(block(self.inplanes, planes, stride, downsample)) 50 | self.inplanes = planes * block.expansion 51 | for i in range(1, blocks): 52 | layers.append(block(self.inplanes, planes)) 53 | 54 | return nn.Sequential(*layers) 55 | 56 | def forward(self, x): 57 | x = self.conv1(x) 58 | x = self.bn1(x) 59 | x = self.relu(x) 60 | x = self.maxpool(x) 61 | 62 | x = self.layer1(x) 63 | x = self.layer2(x) 64 | x = self.layer3(x) 65 | x = self.layer4(x) 66 | 67 | return x 68 | 69 | def init_weights(self, backbone_name): 70 | # org_resnet = torch.utils.model_zoo.load_url(model_urls[self.name]) 71 | # drop orginal resnet fc layer, add 'None' in case of no fc layer, that will raise error 72 | 73 | import torchvision.models.resnet as resnet_ 74 | if backbone_name == "resnet34": 75 | resnet_imagenet = resnet_.resnet34(pretrained=True) 76 | org_resnet = resnet_imagenet.state_dict() 77 | elif backbone_name in ["resnet", "resnet50"]: 78 | backbone_name = "resnet50" 79 | resnet_imagenet = resnet_.resnet50(pretrained=True) 80 | org_resnet = resnet_imagenet.state_dict() 81 | elif backbone_name == "resnet101": 82 | resnet_imagenet = resnet_.resnet101(pretrained=True) 83 | org_resnet = resnet_imagenet.state_dict() 84 | else: 85 | raise NotImplementedError 86 | 87 | org_resnet.pop('fc.weight', None) 88 | org_resnet.pop('fc.bias', None) 89 | 90 | self.load_state_dict(org_resnet, strict=True) 91 | 92 | print(f"Initialized {backbone_name} from model zoo") 93 | 94 | 95 | 96 | class Bottleneck(nn.Module): 97 | """ Redefinition of Bottleneck residual block 98 | Adapted from the official PyTorch implementation 99 | """ 100 | expansion = 4 101 | 102 | def __init__(self, inplanes, planes, stride=1, downsample=None): 103 | super(Bottleneck, self).__init__() 104 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 105 | self.bn1 = nn.BatchNorm2d(planes) 106 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 107 | padding=1, bias=False) 108 | self.bn2 = nn.BatchNorm2d(planes) 109 | self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False) 110 | self.bn3 = nn.BatchNorm2d(planes * 4) 111 | self.relu = nn.ReLU(inplace=True) 112 | self.downsample = downsample 113 | self.stride = stride 114 | 115 | def forward(self, x): 116 | residual = x 117 | 118 | out = self.conv1(x) 119 | out = self.bn1(out) 120 | out = self.relu(out) 121 | 122 | out = self.conv2(out) 123 | out = self.bn2(out) 124 | out = self.relu(out) 125 | 126 | out = self.conv3(out) 127 | out = self.bn3(out) 128 | 129 | if self.downsample is not None: 130 | residual = self.downsample(x) 131 | 132 | out += residual 133 | out = self.relu(out) 134 | 
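        # At this point `out` carries planes * 4 channels (Bottleneck.expansion = 4), and
        # the identity / downsampled shortcut has already been summed in before the final
        # ReLU above, preserving the residual structure of the original ResNet block.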
135 | return out 136 | 137 | def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1): 138 | """3x3 convolution with padding""" 139 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 140 | padding=dilation, groups=groups, bias=False, dilation=dilation) 141 | 142 | class BasicBlock(nn.Module): 143 | expansion = 1 144 | 145 | def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1, 146 | base_width=64, dilation=1, norm_layer=None, dcn=None): 147 | super(BasicBlock, self).__init__() 148 | if norm_layer is None: 149 | norm_layer = nn.BatchNorm2d 150 | if groups != 1 or base_width != 64: 151 | raise ValueError('BasicBlock only supports groups=1 and base_width=64') 152 | if dilation > 1: 153 | raise NotImplementedError("Dilation > 1 not supported in BasicBlock") 154 | # Both self.conv1 and self.downsample layers downsample the input when stride != 1 155 | self.conv1 = conv3x3(inplanes, planes, stride) 156 | self.bn1 = norm_layer(planes) 157 | self.relu = nn.ReLU(inplace=True) 158 | self.conv2 = conv3x3(planes, planes) 159 | self.bn2 = norm_layer(planes) 160 | self.downsample = downsample 161 | self.stride = stride 162 | 163 | def forward(self, x): 164 | identity = x 165 | 166 | out = self.conv1(x) 167 | out = self.bn1(out) 168 | out = self.relu(out) 169 | 170 | out = self.conv2(out) 171 | out = self.bn2(out) 172 | 173 | if self.downsample is not None: 174 | identity = self.downsample(x) 175 | 176 | out += identity 177 | out = self.relu(out) 178 | 179 | return out 180 | 181 | 182 | 183 | def get_resnet(backbone_name, pretrain=True): 184 | 185 | if backbone_name == "resnet": 186 | backbone = "resnet50" 187 | else: 188 | backbone = backbone_name 189 | 190 | model = ResNet(backbone) 191 | 192 | if pretrain: 193 | model.init_weights(backbone_name) 194 | return model -------------------------------------------------------------------------------- /lib/models/backbones/configs/hrnet_w32.yaml: -------------------------------------------------------------------------------- 1 | AUTO_RESUME: true 2 | CUDNN: 3 | BENCHMARK: true 4 | DETERMINISTIC: false 5 | ENABLED: true 6 | DATA_DIR: '' 7 | GPUS: (0,1,2,3) 8 | OUTPUT_DIR: 'output' 9 | LOG_DIR: 'log' 10 | WORKERS: 24 11 | PRINT_FREQ: 100 12 | 13 | DATASET: 14 | COLOR_RGB: true 15 | DATASET: 'coco' 16 | DATA_FORMAT: jpg 17 | FLIP: true 18 | NUM_JOINTS_HALF_BODY: 8 19 | PROB_HALF_BODY: 0.3 20 | ROOT: 'data/coco/' 21 | ROT_FACTOR: 45 22 | SCALE_FACTOR: 0.35 23 | TEST_SET: 'val2017' 24 | TRAIN_SET: 'train2017' 25 | 26 | MODEL: 27 | INIT_WEIGHTS: true 28 | NAME: pose_hrnet 29 | NUM_JOINTS: 7 30 | PRETRAINED: './models/hrnet_w32-36af842e_roc.pth' 31 | TARGET_TYPE: gaussian 32 | IMAGE_SIZE: 33 | - 256 34 | - 256 35 | HEATMAP_SIZE: 36 | - 64 37 | - 64 38 | SIGMA: 2 39 | EXTRA: 40 | PRETRAINED_LAYERS: 41 | - 'conv1' 42 | - 'bn1' 43 | - 'conv2' 44 | - 'bn2' 45 | - 'layer1' 46 | - 'transition1' 47 | - 'stage2' 48 | - 'transition2' 49 | - 'stage3' 50 | - 'transition3' 51 | - 'stage4' 52 | - 'incre_modules' 53 | 54 | FINAL_CONV_KERNEL: 1 55 | STAGE2: 56 | NUM_MODULES: 1 57 | NUM_BRANCHES: 2 58 | BLOCK: BASIC 59 | NUM_BLOCKS: 60 | - 4 61 | - 4 62 | NUM_CHANNELS: 63 | - 32 64 | - 64 65 | FUSE_METHOD: SUM 66 | STAGE3: 67 | NUM_MODULES: 4 68 | NUM_BRANCHES: 3 69 | BLOCK: BASIC 70 | NUM_BLOCKS: 71 | - 4 72 | - 4 73 | - 4 74 | NUM_CHANNELS: 75 | - 32 76 | - 64 77 | - 128 78 | FUSE_METHOD: SUM 79 | STAGE4: 80 | NUM_MODULES: 3 81 | NUM_BRANCHES: 4 82 | BLOCK: BASIC 83 | NUM_BLOCKS: 84 | - 4 85 | - 4 86 | - 4 87 | - 4 88 | 
NUM_CHANNELS: 89 | - 32 90 | - 64 91 | - 128 92 | - 256 93 | FUSE_METHOD: SUM 94 | 95 | # LOSS: 96 | # USE_TARGET_WEIGHT: true 97 | # TRAIN: 98 | # BATCH_SIZE_PER_GPU: 32 99 | # SHUFFLE: true 100 | # BEGIN_EPOCH: 0 101 | # END_EPOCH: 210 102 | # OPTIMIZER: adam 103 | # LR: 0.001 104 | # LR_FACTOR: 0.1 105 | # LR_STEP: 106 | # - 170 107 | # - 200 108 | # WD: 0.0001 109 | # GAMMA1: 0.99 110 | # GAMMA2: 0.0 111 | # MOMENTUM: 0.9 112 | # NESTEROV: false 113 | # TEST: 114 | # BATCH_SIZE_PER_GPU: 32 115 | # COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json' 116 | # BBOX_THRE: 1.0 117 | # IMAGE_THRE: 0.0 118 | # IN_VIS_THRE: 0.2 119 | # MODEL_FILE: '' 120 | # NMS_THRE: 1.0 121 | # OKS_THRE: 0.9 122 | # USE_GT_BBOX: true 123 | # FLIP_TEST: true 124 | # POST_PROCESS: true 125 | # SHIFT_HEATMAP: true 126 | # DEBUG: 127 | # DEBUG: true 128 | # SAVE_BATCH_IMAGES_GT: true 129 | # SAVE_BATCH_IMAGES_PRED: true 130 | # SAVE_HEATMAPS_GT: true 131 | # SAVE_HEATMAPS_PRED: true -------------------------------------------------------------------------------- /lib/models/backbones/configs/hrnet_w48.yaml: -------------------------------------------------------------------------------- 1 | AUTO_RESUME: true 2 | CUDNN: 3 | BENCHMARK: true 4 | DETERMINISTIC: false 5 | ENABLED: true 6 | DATA_DIR: '' 7 | GPUS: (0,1,2,3) 8 | OUTPUT_DIR: 'output' 9 | LOG_DIR: 'log' 10 | WORKERS: 24 11 | PRINT_FREQ: 100 12 | 13 | DATASET: 14 | COLOR_RGB: true 15 | DATASET: 'coco' 16 | DATA_FORMAT: jpg 17 | FLIP: true 18 | NUM_JOINTS_HALF_BODY: 8 19 | PROB_HALF_BODY: 0.3 20 | ROOT: 'data/coco/' 21 | ROT_FACTOR: 45 22 | SCALE_FACTOR: 0.35 23 | TEST_SET: 'val2017' 24 | TRAIN_SET: 'train2017' 25 | 26 | MODEL: 27 | INIT_WEIGHTS: true 28 | NAME: pose_hrnet 29 | NUM_JOINTS: 7 30 | PRETRAINED: './models/hrnet_w48-8ef0771d.pth' 31 | TARGET_TYPE: gaussian 32 | IMAGE_SIZE: 33 | - 256 34 | - 256 35 | HEATMAP_SIZE: 36 | - 64 37 | - 64 38 | SIGMA: 2 39 | EXTRA: 40 | PRETRAINED_LAYERS: 41 | - 'conv1' 42 | - 'bn1' 43 | - 'conv2' 44 | - 'bn2' 45 | - 'layer1' 46 | - 'transition1' 47 | - 'stage2' 48 | - 'transition2' 49 | - 'stage3' 50 | - 'transition3' 51 | - 'stage4' 52 | FINAL_CONV_KERNEL: 1 53 | STAGE2: 54 | NUM_MODULES: 1 55 | NUM_BRANCHES: 2 56 | BLOCK: BASIC 57 | NUM_BLOCKS: 58 | - 4 59 | - 4 60 | NUM_CHANNELS: 61 | - 48 62 | - 96 63 | FUSE_METHOD: SUM 64 | STAGE3: 65 | NUM_MODULES: 4 66 | NUM_BRANCHES: 3 67 | BLOCK: BASIC 68 | NUM_BLOCKS: 69 | - 4 70 | - 4 71 | - 4 72 | NUM_CHANNELS: 73 | - 48 74 | - 96 75 | - 192 76 | FUSE_METHOD: SUM 77 | STAGE4: 78 | NUM_MODULES: 3 79 | NUM_BRANCHES: 4 80 | BLOCK: BASIC 81 | NUM_BLOCKS: 82 | - 4 83 | - 4 84 | - 4 85 | - 4 86 | NUM_CHANNELS: 87 | - 48 88 | - 96 89 | - 192 90 | - 384 91 | FUSE_METHOD: SUM 92 | 93 | # LOSS: 94 | # USE_TARGET_WEIGHT: true 95 | # TRAIN: 96 | # BATCH_SIZE_PER_GPU: 32 97 | # SHUFFLE: true 98 | # BEGIN_EPOCH: 0 99 | # END_EPOCH: 210 100 | # OPTIMIZER: adam 101 | # LR: 0.001 102 | # LR_FACTOR: 0.1 103 | # LR_STEP: 104 | # - 170 105 | # - 200 106 | # WD: 0.0001 107 | # GAMMA1: 0.99 108 | # GAMMA2: 0.0 109 | # MOMENTUM: 0.9 110 | # NESTEROV: false 111 | # DEBUG: 112 | # DEBUG: true 113 | # SAVE_BATCH_IMAGES_GT: true 114 | # SAVE_BATCH_IMAGES_PRED: true 115 | # SAVE_HEATMAPS_GT: true 116 | # SAVE_HEATMAPS_PRED: true 117 | -------------------------------------------------------------------------------- /lib/models/ctrnet/CtRNet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import 
torch.nn.functional as F 3 | import kornia 4 | import numpy as np 5 | 6 | from .keypoint_seg_resnet import KeyPointSegNet 7 | from utils.BPnP import BPnP, BPnP_m3d, batch_project 8 | 9 | 10 | class CtRNet(torch.nn.Module): 11 | def __init__(self, args): 12 | super(CtRNet, self).__init__() 13 | 14 | self.args = args 15 | 16 | if args.use_gpu: 17 | self.device = "cuda" 18 | else: 19 | self.device = "cpu" 20 | 21 | # load keypoint segmentation model 22 | self.keypoint_seg_predictor = KeyPointSegNet(args, use_gpu=args.use_gpu) 23 | 24 | if args.use_gpu: 25 | self.keypoint_seg_predictor = self.keypoint_seg_predictor.cuda() 26 | 27 | self.keypoint_seg_predictor = torch.nn.DataParallel(self.keypoint_seg_predictor, device_ids=[0]) 28 | 29 | if args.keypoint_seg_model_path is not None: 30 | print("Loading keypoint segmentation model from {}".format(args.keypoint_seg_model_path)) 31 | self.keypoint_seg_predictor.load_state_dict(torch.load(args.keypoint_seg_model_path)) 32 | 33 | self.keypoint_seg_predictor.eval() 34 | 35 | # load BPnP 36 | self.bpnp = BPnP.apply 37 | self.bpnp_m3d = BPnP_m3d.apply 38 | 39 | # set up camera intrinsics 40 | 41 | self.intrinsics = np.array([[ args.fx, 0. , args.px ], 42 | [ 0. , args.fy, args.py ], 43 | [ 0. , 0. , 1. ]]) 44 | print("Camera intrinsics: {}".format(self.intrinsics)) 45 | 46 | self.K = torch.tensor(self.intrinsics, device=self.device, dtype=torch.float) 47 | 48 | 49 | def inference_single_image(self, img, joint_angles): 50 | # img: (3, H, W) 51 | # joint_angles: (7) 52 | # robot: robot model 53 | 54 | # detect 2d keypoints and segmentation masks 55 | points_2d, segmentation = self.keypoint_seg_predictor(img[None]) 56 | foreground_mask = torch.sigmoid(segmentation) 57 | _,t_list = self.robot.get_joint_RT(joint_angles) 58 | points_3d = torch.from_numpy(np.array(t_list)).float().to(self.device) 59 | if self.args.robot_name == "Panda": 60 | points_3d = points_3d[[0,2,3,4,6,7,8]] # remove 1 and 5 links as they are overlapping with 2 and 6 61 | 62 | #init_pose = torch.tensor([[ 1.5497, 0.5420, -0.3909, -0.4698, -0.0211, 1.3243]]) 63 | #cTr = bpnp(points_2d_pred, points_3d, K, init_pose) 64 | cTr = self.bpnp(points_2d, points_3d, self.K) 65 | 66 | return cTr, points_2d, foreground_mask 67 | 68 | def inference_batch_images(self, img, joint_angles): 69 | # img: (B, 3, H, W) 70 | # joint_angles: (B, 7) 71 | # robot: robot model 72 | 73 | # detect 2d keypoints and segmentation masks 74 | points_2d, segmentation = self.keypoint_seg_predictor(img) 75 | foreground_mask = torch.sigmoid(segmentation) 76 | 77 | points_3d_batch = [] 78 | for b in range(joint_angles.shape[0]): 79 | _,t_list = self.robot.get_joint_RT(joint_angles[b]) 80 | points_3d = torch.from_numpy(np.array(t_list)).float().to(self.device) 81 | if self.args.robot_name == "Panda": 82 | points_3d = points_3d[:,[0,2,3,4,6,7,8]] 83 | points_3d_batch.append(points_3d[None]) 84 | 85 | points_3d_batch = torch.cat(points_3d_batch, dim=0) 86 | 87 | cTr = self.bpnp_m3d(points_2d, points_3d_batch, self.K) 88 | 89 | return cTr, points_2d, foreground_mask 90 | 91 | def inference_batch_images_seg_kp(self, img): 92 | # img: (B, 3, H, W) 93 | # joint_angles: (B, 7) 94 | # robot: robot model 95 | 96 | # detect 2d keypoints and segmentation masks 97 | points_2d, segmentation = self.keypoint_seg_predictor(img) 98 | foreground_mask = torch.sigmoid(segmentation) 99 | 100 | return points_2d, foreground_mask 101 | 102 | def inference_batch_images_onlyseg(self, img): 103 | # img: (B, 3, H, W) 104 | # joint_angles: (B, 7) 105 | # 
robot: robot model 106 | 107 | # detect 2d keypoints and segmentation masks 108 | points_2d, segmentation = self.keypoint_seg_predictor(img) 109 | foreground_mask = torch.sigmoid(segmentation) 110 | 111 | return foreground_mask 112 | 113 | 114 | def cTr_to_pose_matrix(self, cTr): 115 | """ 116 | cTr: (batch_size, 6) 117 | pose_matrix: (batch_size, 4, 4) 118 | """ 119 | batch_size = cTr.shape[0] 120 | pose_matrix = torch.zeros((batch_size, 4, 4), device=self.device) 121 | pose_matrix[:, :3, :3] = kornia.geometry.conversions.axis_angle_to_rotation_matrix(cTr[:, :3]) 122 | pose_matrix[:, :3, 3] = cTr[:, 3:] 123 | pose_matrix[:, 3, 3] = 1 124 | return pose_matrix 125 | 126 | def to_valid_R_batch(self, R): 127 | # R is a batch of 3x3 rotation matrices 128 | U, S, V = torch.svd(R) 129 | return torch.bmm(U, V.transpose(1,2)) 130 | 131 | def render_single_robot_mask(self, cTr, robot_mesh, robot_renderer): 132 | # cTr: (6) 133 | # img: (1, H, W) 134 | 135 | R = kornia.geometry.conversions.angle_axis_to_rotation_matrix(cTr[:3][None]) # (1, 3, 3) 136 | R = torch.transpose(R,1,2) 137 | #R = to_valid_R_batch(R) 138 | T = cTr[3:][None] # (1, 3) 139 | 140 | if T[0,-1] < 0: 141 | rendered_image = robot_renderer.silhouette_renderer(meshes_world=robot_mesh, R = -R, T = -T) 142 | else: 143 | rendered_image = robot_renderer.silhouette_renderer(meshes_world=robot_mesh, R = R, T = T) 144 | 145 | if torch.isnan(rendered_image).any(): 146 | rendered_image = torch.nan_to_num(rendered_image) 147 | 148 | return rendered_image[..., 3] 149 | 150 | 151 | def train_on_batch(self, img, joint_angles, robot_renderer, criterions, phase='train'): 152 | # img: (B, 3, H, W) 153 | # joint_angles: (B, 7) 154 | with torch.set_grad_enabled(phase == 'train'): 155 | # detect 2d keypoints 156 | points_2d, segmentation = self.keypoint_seg_predictor(img) 157 | 158 | mask_list = list() 159 | seg_weight_list = list() 160 | 161 | for b in range(img.shape[0]): 162 | # get 3d points 163 | _,t_list = self.robot.get_joint_RT(joint_angles[b]) 164 | points_3d = torch.from_numpy(np.array(t_list)).float().to(self.device) 165 | if self.args.robot_name == "Panda": 166 | points_3d = points_3d[:,[0,2,3,4,6,7,8]] 167 | 168 | # get camera pose 169 | cTr = self.bpnp(points_2d[b][None], points_3d, self.K) 170 | 171 | # config robot mesh 172 | robot_mesh = robot_renderer.get_robot_mesh(joint_angles[b]) 173 | 174 | # render robot mask 175 | rendered_image = self.render_single_robot_mask(cTr.squeeze(), robot_mesh, robot_renderer) 176 | 177 | mask_list.append(rendered_image) 178 | points_2d_proj = batch_project(cTr, points_3d, self.K) 179 | reproject_error = criterions["mse_mean"](points_2d[b], points_2d_proj.squeeze()) 180 | seg_weight = torch.exp(-reproject_error * self.args.reproj_err_scale) 181 | seg_weight_list.append(seg_weight) 182 | 183 | mask_batch = torch.cat(mask_list,0) 184 | 185 | loss_bce = 0 186 | for b in range(segmentation.shape[0]): 187 | loss_bce = loss_bce + seg_weight_list[b] * criterions["bce"](segmentation[b].squeeze(), mask_batch[b].detach()) 188 | 189 | img_ref = torch.sigmoid(segmentation).detach() 190 | #loss_reproj = 0.0005 * criterionMSE_mean(points_2d, points_2d_proj_batch) 191 | loss_mse = 0.001 * criterions["mse_sum"](mask_batch, img_ref.squeeze()) 192 | loss = loss_mse + loss_bce 193 | 194 | return loss 195 | 196 | 197 | 198 | 199 | 200 | -------------------------------------------------------------------------------- /lib/models/ctrnet/__init__.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Oliverbansk/Holistic-Robot-Pose-Estimation/77cd316a36ff8de0c736a66c692ca9276fdc2eae/lib/models/ctrnet/__init__.py -------------------------------------------------------------------------------- /lib/models/ctrnet/keypoint_seg_resnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | import torch.nn.functional as F 4 | from torch.autograd import Variable 5 | import numpy as np 6 | import torchvision.models as models 7 | 8 | 9 | 10 | class KeypointUpSample(nn.Module): 11 | def __init__(self, in_channels, num_keypoints): 12 | super().__init__() 13 | input_features = in_channels 14 | deconv_kernel = 4 15 | self.kps_score_lowres = nn.ConvTranspose2d( 16 | input_features, 17 | num_keypoints, 18 | deconv_kernel, 19 | stride=2, 20 | padding=deconv_kernel // 2 - 1, 21 | ) 22 | nn.init.kaiming_normal_(self.kps_score_lowres.weight, mode="fan_out", nonlinearity="relu") 23 | nn.init.constant_(self.kps_score_lowres.bias, 0) 24 | #nn.init.uniform_(self.kps_score_lowres.weight) 25 | #nn.init.uniform_(self.kps_score_lowres.bias) 26 | self.up_scale = 1 27 | self.out_channels = num_keypoints 28 | 29 | def forward(self, x): 30 | x = self.kps_score_lowres(x) 31 | return torch.nn.functional.interpolate( 32 | x, scale_factor=float(self.up_scale), mode="bilinear", align_corners=False, recompute_scale_factor=False 33 | ) 34 | 35 | 36 | 37 | 38 | class SpatialSoftArgmax(nn.Module): 39 | """ 40 | The spatial softmax of each feature 41 | map is used to compute a weighted mean of the pixel 42 | locations, effectively performing a soft arg-max 43 | over the feature dimension. 44 | 45 | """ 46 | 47 | def __init__(self, normalize=True): 48 | """Constructor. 49 | Args: 50 | normalize (bool): Whether to use normalized 51 | image coordinates, i.e. coordinates in 52 | the range `[-1, 1]`. 53 | """ 54 | super().__init__() 55 | 56 | self.normalize = normalize 57 | 58 | def _coord_grid(self, h, w, device): 59 | if self.normalize: 60 | return torch.stack( 61 | torch.meshgrid( 62 | torch.linspace(-1, 1, h, device=device), 63 | torch.linspace(-1, 1, w, device=device), 64 | indexing='ij', 65 | ) 66 | ) 67 | return torch.stack( 68 | torch.meshgrid( 69 | torch.arange(0, h, device=device), 70 | torch.arange(0, w, device=device), 71 | indexing='ij', 72 | ) 73 | ) 74 | 75 | def forward(self, x): 76 | assert x.ndim == 4, "Expecting a tensor of shape (B, C, H, W)." 
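        # Shape walk-through (illustrative numbers): for heatmaps of shape
        # (B, C, H, W) = (2, 7, 64, 64), the softmax below runs over the
        # 64*64 = 4096 spatial positions of each of the 2*7 maps; the weighted
        # coordinate sums then collapse every map to one (x, y) estimate, so the
        # final reshape returns keypoint coordinates of shape (2, 7, 2).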
77 | 78 | # compute a spatial softmax over the input: 79 | # given an input of shape (B, C, H, W), 80 | # reshape it to (B*C, H*W) then apply 81 | # the softmax operator over the last dimension 82 | b, c, h, w = x.shape 83 | softmax = F.softmax(x.view(-1, h * w), dim=-1) 84 | 85 | # create a meshgrid of pixel coordinates 86 | # both in the x and y axes 87 | yc, xc = self._coord_grid(h, w, x.device) 88 | 89 | # element-wise multiply the x and y coordinates 90 | # with the softmax, then sum over the h*w dimension 91 | # this effectively computes the weighted mean of x 92 | # and y locations 93 | x_mean = (softmax * xc.flatten()).sum(dim=1, keepdims=True) 94 | y_mean = (softmax * yc.flatten()).sum(dim=1, keepdims=True) 95 | 96 | # concatenate and reshape the result 97 | # to (B, C, 2) where for every feature 98 | # we have the expected x and y pixel 99 | # locations 100 | return torch.cat([x_mean, y_mean], dim=1).view(-1, c, 2) 101 | 102 | 103 | class KeyPointSegNet(nn.Module): 104 | def __init__(self, args, lim=[-1., 1., -1., 1.], use_gpu=True): 105 | super(KeyPointSegNet, self).__init__() 106 | 107 | self.args = args 108 | self.lim = lim 109 | 110 | k = args.n_kp 111 | 112 | if use_gpu: 113 | self.device = "cuda" 114 | else: 115 | self.device = "cpu" 116 | 117 | 118 | deeplabv3_resnet50 = models.segmentation.deeplabv3_resnet50(pretrained=True) 119 | deeplabv3_resnet50.classifier[4] = torch.nn.Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1)) # Change final layer to 2 classes 120 | 121 | self.backbone = torch.nn.Sequential(list(deeplabv3_resnet50.children())[0]) 122 | 123 | self.read_out = KeypointUpSample(2048, k) 124 | 125 | self.spatialsoftargmax = SpatialSoftArgmax() 126 | 127 | self.classifer = torch.nn.Sequential((list(deeplabv3_resnet50.children())[1])) 128 | 129 | 130 | 131 | def forward(self, img): 132 | input_shape = img.shape[-2:] 133 | 134 | resnet_out = self.backbone(img)['out'] # (B, 2048, H//8, W//8) 135 | 136 | # keypoint prediction branch 137 | heatmap = self.read_out(resnet_out) # (B, k, H//4, W//4) 138 | keypoints = self.spatialsoftargmax(heatmap) 139 | # mapping back to original resolution from [-1,1] 140 | offset = torch.tensor([self.lim[0], self.lim[2]], device = resnet_out.device) 141 | scale = torch.tensor([self.args.width // 2, self.args.height // 2], device = resnet_out.device) 142 | keypoints = keypoints - offset 143 | keypoints = keypoints * scale 144 | 145 | # segmentation branch 146 | x = self.classifer(resnet_out) 147 | segout = F.interpolate(x, size=input_shape, mode='bilinear', align_corners=False) 148 | 149 | return keypoints, segout 150 | 151 | 152 | class KeyPointSegNet_x(nn.Module): 153 | def __init__(self, args=None, lim=[-1., 1., -1., 1.], use_gpu=True): 154 | super(KeyPointSegNet_x, self).__init__() 155 | 156 | self.args = args 157 | self.lim = lim 158 | 159 | k = 7 160 | self.width = 640 161 | self.height = 480 162 | 163 | if use_gpu: 164 | self.device = "cuda" 165 | else: 166 | self.device = "cpu" 167 | 168 | 169 | deeplabv3_resnet50 = models.segmentation.deeplabv3_resnet50(pretrained=True) 170 | deeplabv3_resnet50.classifier[4] = torch.nn.Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1)) # Change final layer to 2 classes 171 | 172 | self.backbone = torch.nn.Sequential(list(deeplabv3_resnet50.children())[0]) 173 | 174 | self.read_out = KeypointUpSample(2048, k) 175 | 176 | self.spatialsoftargmax = SpatialSoftArgmax() 177 | 178 | self.classifer = torch.nn.Sequential((list(deeplabv3_resnet50.children())[1])) 179 | 180 | 181 | 182 | def forward(self, 
img): 183 | input_shape = img.shape[-2:] 184 | 185 | resnet_out = self.backbone(img)['out'] # (B, 2048, H//8, W//8) 186 | 187 | # keypoint prediction branch 188 | heatmap = self.read_out(resnet_out) # (B, k, H//4, W//4) 189 | keypoints = self.spatialsoftargmax(heatmap) 190 | # mapping back to original resolution from [-1,1] 191 | offset = torch.tensor([self.lim[0], self.lim[2]], device = resnet_out.device) 192 | scale = torch.tensor([self.width // 2, self.height // 2], device = resnet_out.device) 193 | keypoints = keypoints - offset 194 | keypoints = keypoints * scale 195 | 196 | # segmentation branch 197 | x = self.classifer(resnet_out) 198 | segout = F.interpolate(x, size=input_shape, mode='bilinear', align_corners=False) 199 | 200 | return keypoints, segout 201 | -------------------------------------------------------------------------------- /lib/models/ctrnet/mask_inference.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | base_dir = os.path.abspath(".") 4 | sys.path.append(base_dir) 5 | import argparse 6 | import numpy as np 7 | import torch 8 | import torchvision.transforms as transforms 9 | from PIL import Image as PILImage 10 | from .CtRNet import CtRNet 11 | 12 | 13 | class seg_mask_inference(torch.nn.Module): 14 | def __init__(self, intrinsics, dataset, image_hw=(480, 640), scale=0.5): 15 | super(seg_mask_inference, self).__init__() 16 | self.args = self.set_args(intrinsics, dataset, image_hw, scale) 17 | self.net = CtRNet(self.args) 18 | self.trans_to_tensor = transforms.Compose([ 19 | transforms.ToTensor(), 20 | transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), 21 | ]) 22 | 23 | def set_args(self, intrinsics, dataset, image_hw=(480, 640), scale=0.5): 24 | parser = argparse.ArgumentParser() 25 | args = parser.parse_args("") 26 | args.use_gpu = True 27 | args.robot_name = 'Panda' 28 | args.n_kp = 7 29 | args.scale = scale 30 | args.height, args.width = image_hw 31 | args.fx, args.fy, args.px, args.py = intrinsics 32 | args.width, args.height = int(args.width * args.scale), int(args.height * args.scale) 33 | args.fx, args.fy, args.px, args.py = args.fx * args.scale, args.fy * args.scale, args.px * args.scale, args.py * args.scale 34 | 35 | if "realsense" in dataset: 36 | args.keypoint_seg_model_path = "models/panda_segmentation/realsense.pth" 37 | elif "azure" in dataset: 38 | args.keypoint_seg_model_path = "models/panda_segmentation/azure.pth" 39 | elif "kinect" in dataset: 40 | args.keypoint_seg_model_path = "models/panda_segmentation/kinect.pth" 41 | elif "orb" in dataset: 42 | args.keypoint_seg_model_path = "models/panda_segmentation/orb.pth" 43 | else: 44 | args.keypoint_seg_model_path = "models/panda_segmentation/azure.pth" 45 | 46 | return args 47 | 48 | def preprocess_img_tensor(self, img_tensor): 49 | width, height = img_tensor.shape[3], img_tensor.shape[2] 50 | img_array = np.uint8(img_tensor.detach().cpu().numpy()).transpose(0, 2, 3, 1) 51 | new_size = (int(width*self.args.scale),int(height*self.args.scale)) 52 | pil_image = [self.trans_to_tensor(PILImage.fromarray(img).resize(new_size)) for img in img_array] 53 | return torch.stack(pil_image) 54 | 55 | def forward(self, img_tensor): 56 | 57 | image = self.preprocess_img_tensor(img_tensor).cuda() 58 | segmentation = self.net.inference_batch_images_onlyseg(image) 59 | 60 | return segmentation 61 | 62 | class seg_keypoint_inference(torch.nn.Module): 63 | def __init__(self, image_hw=(480, 640), scale=0.5): 64 | 
super(seg_keypoint_inference, self).__init__() 65 | self.args = self.set_args(image_hw, scale) 66 | self.net = CtRNet(self.args) 67 | self.trans_to_tensor = transforms.Compose([ 68 | transforms.ToTensor(), 69 | transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), 70 | ]) 71 | 72 | def set_args(self, image_hw=(480, 640), scale=0.5): 73 | parser = argparse.ArgumentParser() 74 | args = parser.parse_args("") 75 | args.use_gpu = True 76 | args.robot_name = 'Panda' 77 | args.n_kp = 7 78 | args.scale = scale 79 | args.height, args.width = image_hw 80 | args.fx, args.fy, args.px, args.py = 320,320,320,240 81 | args.width, args.height = int(args.width * args.scale), int(args.height * args.scale) 82 | args.fx, args.fy, args.px, args.py = args.fx * args.scale, args.fy * args.scale, args.px * args.scale, args.py * args.scale 83 | args.keypoint_seg_model_path = "models/panda_segmentation/azure.pth" 84 | 85 | return args 86 | 87 | def preprocess_img_tensor(self, img_tensor): 88 | width, height = img_tensor.shape[3], img_tensor.shape[2] 89 | img_array = np.uint8(img_tensor.detach().cpu().numpy()).transpose(0, 2, 3, 1) 90 | new_size = (int(width*self.args.scale),int(height*self.args.scale)) 91 | pil_image = [self.trans_to_tensor(PILImage.fromarray(img).resize(new_size)) for img in img_array] 92 | return torch.stack(pil_image) 93 | 94 | def forward(self, img_tensor): 95 | 96 | image = self.preprocess_img_tensor(img_tensor).cuda() 97 | keypoints, segmentation = self.net.inference_batch_images_seg_kp(image) 98 | 99 | return keypoints, segmentation -------------------------------------------------------------------------------- /lib/models/depth_net.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | import torch 5 | import torch.nn as nn 6 | from .backbones.HRnet import get_hrnet 7 | from .backbones.Resnet import get_resnet 8 | from torch.nn import functional as F 9 | 10 | 11 | class RootNet(nn.Module): 12 | 13 | def __init__(self, backbone, pred_xy=False, use_offset=False, add_fc=False, input_shape=(256,256), **kwargs): 14 | 15 | super(RootNet, self).__init__() 16 | self.backbone_name = backbone 17 | if backbone in ["resnet34", "resnet50", "resnet"]: 18 | self.backbone = get_resnet(backbone) 19 | self.inplanes = self.backbone.block.expansion * 512 20 | elif backbone in ["hrnet", "hrnet32"]: 21 | self.backbone = get_hrnet(type_name=32, num_joints=7, depth_dim=1, 22 | pretrain=True, generate_feat=True, generate_hm=False) 23 | self.inplanes = 2048 24 | else: 25 | raise NotImplementedError 26 | 27 | self.pred_xy = pred_xy 28 | self.add_fc = add_fc 29 | self.use_offset = use_offset 30 | self.input_shape = input_shape 31 | self.output_shape = (input_shape[0]//4, input_shape[1]//4) 32 | self.outplanes = 256 33 | 34 | if self.pred_xy: 35 | self.deconv_layers = self._make_deconv_layer(3) 36 | self.xy_layer = nn.Conv2d( 37 | in_channels=self.outplanes, 38 | out_channels=1, 39 | kernel_size=1, 40 | stride=1, 41 | padding=0 42 | ) 43 | 44 | if self.add_fc: 45 | self.depth_relu = nn.ReLU() 46 | self.depth_fc1 = nn.Linear(self.inplanes, self.inplanes//2) 47 | self.depth_bn1 = nn.BatchNorm1d(self.inplanes//2) 48 | self.depth_fc2 = nn.Linear(self.inplanes//2, self.inplanes//4) 49 | self.depth_bn2 = nn.BatchNorm1d(self.inplanes//4) 50 | self.depth_fc3 = nn.Linear(self.inplanes//4, self.inplanes//4) 51 | self.depth_bn3 = nn.BatchNorm1d(self.inplanes//4) 52 
| self.depth_fc4 = nn.Linear(self.inplanes//4, self.inplanes//2) 53 | self.depth_bn4 = nn.BatchNorm1d(self.inplanes//2) 54 | self.depth_fc5 = nn.Linear(self.inplanes//2, self.inplanes) 55 | 56 | self.depth_layer = nn.Conv2d( 57 | in_channels=self.inplanes, 58 | out_channels=1, 59 | kernel_size=1, 60 | stride=1, 61 | padding=0 62 | ) 63 | if self.use_offset: 64 | self.offset_layer = nn.Conv2d( 65 | in_channels=self.inplanes, 66 | out_channels=1, 67 | kernel_size=1, 68 | stride=1, 69 | padding=0 70 | ) 71 | 72 | def _make_deconv_layer(self, num_layers): 73 | layers = [] 74 | inplanes = self.inplanes 75 | outplanes = self.outplanes 76 | for i in range(num_layers): 77 | layers.append( 78 | nn.ConvTranspose2d( 79 | in_channels=inplanes, 80 | out_channels=outplanes, 81 | kernel_size=4, 82 | stride=2, 83 | padding=1, 84 | output_padding=0, 85 | bias=False)) 86 | layers.append(nn.BatchNorm2d(outplanes)) 87 | layers.append(nn.ReLU(inplace=True)) 88 | inplanes = outplanes 89 | 90 | return nn.Sequential(*layers) 91 | 92 | def forward(self, x, k_value): 93 | if self.backbone_name in ["resnet34", "resnet50", "resnet"]: 94 | fm = self.backbone(x) 95 | img_feat = torch.mean(fm.view(fm.size(0), fm.size(1), fm.size(2)*fm.size(3)), dim=2) # global average pooling 96 | elif self.backbone_name in ["hrnet", "hrnet32"]: 97 | img_feat = self.backbone(x) 98 | 99 | # x,y 100 | if self.pred_xy: 101 | xy = self.deconv_layers(fm) 102 | xy = self.xy_layer(xy) 103 | xy = xy.view(-1,1,self.output_shape[0]*self.output_shape[1]) 104 | xy = F.softmax(xy,2) 105 | xy = xy.view(-1,1,self.output_shape[0],self.output_shape[1]) 106 | hm_x = xy.sum(dim=(2)) 107 | hm_y = xy.sum(dim=(3)) 108 | coord_x = hm_x * torch.arange(self.output_shape[1]).float().cuda() 109 | coord_y = hm_y * torch.arange(self.output_shape[0]).float().cuda() 110 | coord_x = coord_x.sum(dim=2) 111 | coord_y = coord_y.sum(dim=2) 112 | 113 | # z 114 | if self.add_fc: 115 | img_feat1 = self.depth_relu(self.depth_bn1(self.depth_fc1(img_feat))) 116 | img_feat2 = self.depth_relu(self.depth_bn2(self.depth_fc2(img_feat1))) 117 | img_feat3 = self.depth_relu(self.depth_bn3(self.depth_fc3(img_feat2))) 118 | img_feat4 = self.depth_relu(self.depth_bn4(self.depth_fc4(img_feat3))) 119 | img_feat5 = self.depth_fc5(img_feat4) 120 | img_feat = img_feat + img_feat5 121 | img_feat = torch.unsqueeze(img_feat,2) 122 | img_feat = torch.unsqueeze(img_feat,3) 123 | gamma = self.depth_layer(img_feat) 124 | gamma = gamma.view(-1,1) 125 | depth = gamma * k_value.view(-1,1) 126 | 127 | if self.use_offset: 128 | offset = self.offset_layer(img_feat) 129 | offset = offset.view(-1,1) # unit: m 130 | offset *= 1000.0 131 | depth += offset 132 | 133 | if self.pred_xy: 134 | coord = torch.cat((coord_x, coord_y, depth), dim=1) 135 | return coord 136 | else: 137 | return depth 138 | 139 | def init_weights(self): 140 | if self.pred_xy: 141 | for name, m in self.deconv_layers.named_modules(): 142 | if isinstance(m, nn.ConvTranspose2d): 143 | nn.init.normal_(m.weight, std=0.001) 144 | elif isinstance(m, nn.BatchNorm2d): 145 | nn.init.constant_(m.weight, 1) 146 | nn.init.constant_(m.bias, 0) 147 | for m in self.xy_layer.modules(): 148 | if isinstance(m, nn.Conv2d): 149 | nn.init.normal_(m.weight, std=0.001) 150 | nn.init.constant_(m.bias, 0) 151 | print("Initialized deconv and xy layer of RootNet.") 152 | for m in self.depth_layer.modules(): 153 | if isinstance(m, nn.Conv2d): 154 | nn.init.normal_(m.weight, std=0.001) 155 | nn.init.constant_(m.bias, 0) 156 | print("Initialized depth layer of 
RootNet.") 157 | if self.use_offset: 158 | for m in self.offset_layer.modules(): 159 | if isinstance(m, nn.Conv2d): 160 | nn.init.normal_(m.weight, std=0.001) 161 | nn.init.constant_(m.bias, 0) 162 | print("Initialized offset layer of RootNet.") 163 | 164 | 165 | def get_rootnet(backbone, pred_xy=False, use_offset=False, add_fc=False, input_shape=(256,256), **kwargs): 166 | model = RootNet(backbone, pred_xy, use_offset, add_fc, input_shape=(256,256), **kwargs) 167 | model.init_weights() 168 | return model 169 | 170 | 171 | -------------------------------------------------------------------------------- /lib/utils/geometries.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from torch.nn import functional as F 4 | 5 | def batch_rodrigues(theta): 6 | """Convert axis-angle representation to rotation matrix. 7 | Args: 8 | theta: size = [B, 3] 9 | Returns: 10 | Rotation matrix corresponding to the quaternion -- size = [B, 3, 3] 11 | """ 12 | l1norm = torch.norm(theta + 1e-8, p = 2, dim = 1) 13 | angle = torch.unsqueeze(l1norm, -1) 14 | normalized = torch.div(theta, angle) 15 | angle = angle * 0.5 16 | v_cos = torch.cos(angle) 17 | v_sin = torch.sin(angle) 18 | quat = torch.cat([v_cos, v_sin * normalized], dim = 1) 19 | return quat_to_rotmat(quat) 20 | 21 | def quat_to_rotmat(quat): 22 | """Convert quaternion coefficients to rotation matrix. 23 | Args: 24 | quat: size = [B, 4] 4 <===>(w, x, y, z) 25 | Returns: 26 | Rotation matrix corresponding to the quaternion -- size = [B, 3, 3] 27 | """ 28 | norm_quat = quat 29 | norm_quat = norm_quat/(norm_quat.norm(p=2, dim=1, keepdim=True)+1e-9) 30 | w, x, y, z = norm_quat[:,0], norm_quat[:,1], norm_quat[:,2], norm_quat[:,3] 31 | 32 | B = quat.size(0) 33 | 34 | w2, x2, y2, z2 = w.pow(2), x.pow(2), y.pow(2), z.pow(2) 35 | wx, wy, wz = w*x, w*y, w*z 36 | xy, xz, yz = x*y, x*z, y*z 37 | 38 | rotMat = torch.stack([w2 + x2 - y2 - z2, 2*xy - 2*wz, 2*wy + 2*xz, 39 | 2*wz + 2*xy, w2 - x2 + y2 - z2, 2*yz - 2*wx, 40 | 2*xz - 2*wy, 2*wx + 2*yz, w2 - x2 - y2 + z2], dim=1).view(B, 3, 3) 41 | return rotMat 42 | 43 | def quat_to_rotmat_np(quat): 44 | """Convert quaternion coefficients to rotation matrix. 45 | Without batch, in numpy. 
46 | Args: 47 | quat: size = [4] <===>(w, x, y, z) 48 | Returns: 49 | Rotation matrix corresponding to the quaternion -- size = [3, 3] 50 | """ 51 | norm_quat = quat 52 | norm_quat = norm_quat / np.linalg.norm(norm_quat, ord=2, axis=0, keepdims=True) 53 | w, x, y, z = norm_quat[0], norm_quat[1], norm_quat[2], norm_quat[3] 54 | w2, x2, y2, z2 = w*w, x*x, y*y, z*z 55 | wx, wy, wz = w*x, w*y, w*z 56 | xy, xz, yz = x*y, x*z, y*z 57 | 58 | rotMat = np.array([[w2 - x2 - y2 + z2, -2*yz + 2*wx, 2*wy + 2*xz], 59 | [2*wx + 2*yz, -(w2 - x2 + y2 - z2), 2*xy - 2*wz], 60 | [-2*xz + 2*wy, 2*wz + 2*xy, -(w2 + x2 - y2 - z2)]]) 61 | return rotMat 62 | 63 | def rotmat_to_quat(matrices): 64 | batch = matrices.shape[0] 65 | this_device = matrices.device 66 | w = torch.sqrt(torch.max(1.0 + matrices[:,0,0] + matrices[:,1,1] + matrices[:,2,2], torch.zeros(1).to(this_device))) / 2.0 67 | w = torch.max (w , torch.autograd.Variable(torch.zeros(batch)).to(this_device) + 1e-8) #batch 68 | w4 = 4.0 * w 69 | x = (matrices[:,2,1] - matrices[:,1,2]) / w4 70 | y = (matrices[:,0,2] - matrices[:,2,0]) / w4 71 | z = (matrices[:,1,0] - matrices[:,0,1]) / w4 72 | quats = torch.cat( (w.view(batch,1), x.view(batch, 1),y.view(batch, 1), z.view(batch, 1) ), 1 ) 73 | quats = normalize_vector(quats) 74 | return quats 75 | 76 | def normalize_vector(v): 77 | batch = v.shape[0] 78 | v_mag = torch.sqrt(v.pow(2).sum(1))# batch 79 | v_mag = torch.max(v_mag, torch.autograd.Variable(torch.FloatTensor([1e-8]).to(v.device))) 80 | v_mag = v_mag.view(batch,1).expand(batch,v.shape[1]) 81 | v = v/v_mag 82 | return v 83 | 84 | # def rot6d_to_rotmat(x): 85 | # """Convert 6D rotation representation to 3x3 rotation matrix. 86 | # Based on Zhou et al., "On the Continuity of Rotation Representations in Neural Networks", CVPR 2019 87 | # Input: 88 | # (B,6) Batch of 6-D rotation representations 89 | # Output: 90 | # (B,3,3) Batch of corresponding rotation matrices 91 | # """ 92 | # x = x.view(-1,3,2) 93 | # a1 = x[:, :, 0] 94 | # a2 = x[:, :, 1] 95 | # b1 = F.normalize(a1) 96 | # b2 = F.normalize(a2 - torch.einsum('bi,bi->b', b1, a2).unsqueeze(-1) * b1) 97 | # b3 = torch.cross(b1, b2) 98 | # return torch.stack((b1, b2, b3), dim=-1) 99 | 100 | def rot6d_to_rotmat(poses): 101 | """ 102 | Code from https://github.com/papagina/RotationContinuity 103 | On the Continuity of Rotation Representations in Neural Networks 104 | Zhou et al. CVPR19 105 | https://zhouyisjtu.github.io/project_rotation/rotation.html 106 | """ 107 | assert poses.shape[-1] == 6 108 | x_raw = poses[..., 0:3] 109 | y_raw = poses[..., 3:6] 110 | x = x_raw / torch.norm(x_raw, p=2, dim=-1, keepdim=True) 111 | z = torch.cross(x, y_raw, dim=-1) 112 | z = z / torch.norm(z, p=2, dim=-1, keepdim=True) 113 | y = torch.cross(z, x, dim=-1) 114 | matrix = torch.stack((x, y, z), -1) 115 | return torch.transpose(matrix,dim0=-2,dim1=-1) 116 | 117 | def rotmat_to_rot6d(matrix): 118 | """ 119 | Converts rotation matrices to 6D rotation representation by Zhou et al. [1] 120 | by dropping the last row. Note that 6D representation is not unique. 121 | Args: 122 | matrix: batch of rotation matrices of size (*, 3, 3) 123 | 124 | Returns: 125 | 6D rotation representation, of size (*, 6) 126 | 127 | [1] Zhou, Y., Barnes, C., Lu, J., Yang, J., & Li, H. 128 | On the Continuity of Rotation Representations in Neural Networks. 129 | IEEE Conference on Computer Vision and Pattern Recognition, 2019. 
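# An illustrative round-trip sketch (assumed inputs, relying on rot6d_to_rotmat above
# and the rotmat_to_rot6d body below): mapping a rotation matrix to its 6D form and
# back should reproduce the matrix up to float precision.
import math
import torch

c, s = math.cos(0.3), math.sin(0.3)
R = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]).unsqueeze(0)  # (1, 3, 3)
d6 = rotmat_to_rot6d(R)                          # (1, 6): the first two rows of R
R_back = rot6d_to_rotmat(d6)                     # (1, 3, 3)
assert torch.allclose(R, R_back, atol=1e-6)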
130 | Retrieved from http://arxiv.org/abs/1812.07035 131 | """ 132 | return matrix[..., :2, :].clone().reshape(*matrix.size()[:-2], 6) 133 | 134 | def rot9d_to_rotmat(x): 135 | """ 136 | Maps 9D input vectors onto SO(3) via symmetric orthogonalization. 137 | x: should have size [batch_size, 9] 138 | Output has size [batch_size, 3, 3], where each inner 3x3 matrix is in SO(3). 139 | """ 140 | m = x.view(-1, 3, 3) 141 | d = m.device 142 | u, s, v = torch.svd(m.cpu()) 143 | u, v = u.to(d), v.to(d) 144 | vt = torch.transpose(v, 1, 2) 145 | det = torch.det(torch.bmm(u, vt)) 146 | det = det.view(-1, 1, 1) 147 | vt = torch.cat((vt[:, :2, :], vt[:, -1:, :] * det), 1) 148 | r = torch.bmm(u, vt) 149 | return r.cuda() 150 | 151 | #matrices batch*3*3 152 | #both matrix are orthogonal rotation matrices 153 | #out theta between 0 to 180 degree batch 154 | def compute_geodesic_distance_from_two_matrices(m1, m2): 155 | batch=m1.shape[0] 156 | m = torch.bmm(m1, m2.transpose(1,2)) #batch*3*3 157 | cos = ( m[:,0,0] + m[:,1,1] + m[:,2,2] - 1 )/2 158 | cos = torch.min(cos, torch.autograd.Variable(torch.ones(batch).cuda()) ) 159 | cos = torch.max(cos, torch.autograd.Variable(torch.ones(batch).cuda())*-1 ) 160 | theta = torch.acos(cos) 161 | # theta = torch.min(theta, 2*np.pi - theta) 162 | return theta 163 | 164 | def angle_axis_to_rotation_matrix(angle_axis): 165 | """Convert 3d vector of axis-angle rotation to 4x4 rotation matrix 166 | 167 | Args: 168 | angle_axis (Tensor): tensor of 3d vector of axis-angle rotations. 169 | 170 | Returns: 171 | Tensor: tensor of 4x4 rotation matrices. 172 | 173 | Shape: 174 | - Input: :math:`(N, 3)` 175 | - Output: :math:`(N, 4, 4)` 176 | 177 | Example: 178 | >>> input = torch.rand(1, 3) # Nx3 179 | >>> output = tgm.angle_axis_to_rotation_matrix(input) # Nx4x4 180 | """ 181 | def _compute_rotation_matrix(angle_axis, theta2, eps=1e-6): 182 | # We want to be careful to only evaluate the square root if the 183 | # norm of the angle_axis vector is greater than zero. Otherwise 184 | # we get a division by zero. 
185 | k_one = 1.0 186 | theta = torch.sqrt(theta2) 187 | wxyz = angle_axis / (theta + eps) 188 | wx, wy, wz = torch.chunk(wxyz, 3, dim=1) 189 | cos_theta = torch.cos(theta) 190 | sin_theta = torch.sin(theta) 191 | 192 | r00 = cos_theta + wx * wx * (k_one - cos_theta) 193 | r10 = wz * sin_theta + wx * wy * (k_one - cos_theta) 194 | r20 = -wy * sin_theta + wx * wz * (k_one - cos_theta) 195 | r01 = wx * wy * (k_one - cos_theta) - wz * sin_theta 196 | r11 = cos_theta + wy * wy * (k_one - cos_theta) 197 | r21 = wx * sin_theta + wy * wz * (k_one - cos_theta) 198 | r02 = wy * sin_theta + wx * wz * (k_one - cos_theta) 199 | r12 = -wx * sin_theta + wy * wz * (k_one - cos_theta) 200 | r22 = cos_theta + wz * wz * (k_one - cos_theta) 201 | rotation_matrix = torch.cat( 202 | [r00, r01, r02, r10, r11, r12, r20, r21, r22], dim=1) 203 | return rotation_matrix.view(-1, 3, 3) 204 | 205 | def _compute_rotation_matrix_taylor(angle_axis): 206 | rx, ry, rz = torch.chunk(angle_axis, 3, dim=1) 207 | k_one = torch.ones_like(rx) 208 | rotation_matrix = torch.cat( 209 | [k_one, -rz, ry, rz, k_one, -rx, -ry, rx, k_one], dim=1) 210 | return rotation_matrix.view(-1, 3, 3) 211 | 212 | # stolen from ceres/rotation.h 213 | 214 | _angle_axis = torch.unsqueeze(angle_axis, dim=1) 215 | theta2 = torch.matmul(_angle_axis, _angle_axis.transpose(1, 2)) 216 | theta2 = torch.squeeze(theta2, dim=1) 217 | 218 | # compute rotation matrices 219 | rotation_matrix_normal = _compute_rotation_matrix(angle_axis, theta2) 220 | rotation_matrix_taylor = _compute_rotation_matrix_taylor(angle_axis) 221 | 222 | # create mask to handle both cases 223 | eps = 1e-6 224 | mask = (theta2 > eps).view(-1, 1, 1).to(theta2.device) 225 | mask_pos = (mask).type_as(theta2) 226 | mask_neg = (mask == False).type_as(theta2) # noqa 227 | 228 | # create output pose matrix 229 | batch_size = angle_axis.shape[0] 230 | rotation_matrix = torch.eye(4).to(angle_axis.device).type_as(angle_axis) 231 | rotation_matrix = rotation_matrix.view(1, 4, 4).repeat(batch_size, 1, 1) 232 | # fill output matrix with masked values 233 | rotation_matrix[..., :3, :3] = \ 234 | mask_pos * rotation_matrix_normal + mask_neg * rotation_matrix_taylor 235 | return rotation_matrix # Nx4x4 236 | 237 | 238 | def perspective_projection(points, rotation, translation, 239 | focal_length, camera_center): 240 | """ 241 | This function computes the perspective projection of a set of points. 242 | Input: 243 | points (bs, N, 3): 3D points 244 | rotation (bs, 3, 3): Camera rotation 245 | translation (bs, 3): Camera translation 246 | focal_length (bs,) or scalar: Focal length 247 | camera_center (bs, 2): Camera center 248 | """ 249 | batch_size = points.shape[0] 250 | K = torch.zeros([batch_size, 3, 3], device=points.device) 251 | K[:,0,0] = focal_length 252 | K[:,1,1] = focal_length 253 | K[:,2,2] = 1. 254 | K[:,:-1, -1] = camera_center 255 | 256 | # Transform points 257 | points = torch.einsum('bij,bkj->bki', rotation, points) 258 | points = points + translation.unsqueeze(1) 259 | 260 | # Apply perspective distortion 261 | projected_points = points / points[:,:,-1].unsqueeze(-1) 262 | 263 | # Apply camera intrinsics 264 | projected_points = torch.einsum('bij,bkj->bki', K, projected_points) 265 | 266 | return projected_points[:, :, :-1] 267 | 268 | 269 | def estimate_translation_np(S, joints_2d, joints_conf, focal_length=5000, img_size=224): 270 | """Find camera translation that brings 3D joints S closest to 2D the corresponding joints_2d. 
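# A small usage sketch for perspective_projection above (illustrative camera values):
# a point on the optical axis lands on the principal point, and an off-axis point is
# shifted by focal_length * X / Z pixels.
import torch

pts = torch.tensor([[[0.0, 0.0, 2.0], [0.5, 0.0, 2.0]]])   # (1, 2, 3) dummy points
R = torch.eye(3).unsqueeze(0)                               # identity camera rotation
t = torch.zeros(1, 3)                                       # zero camera translation
uv = perspective_projection(pts, R, t,
                            focal_length=torch.tensor([320.0]),
                            camera_center=torch.tensor([[320.0, 240.0]]))
# uv[0, 0] is approx (320, 240); uv[0, 1] is approx (400, 240)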
271 | Input: 272 | S: (25, 3) 3D joint locations 273 | joints: (25, 3) 2D joint locations and confidence 274 | Returns: 275 | (3,) camera translation vector 276 | """ 277 | 278 | num_joints = S.shape[0] 279 | # focal length 280 | f = np.array([focal_length,focal_length]) 281 | # optical center 282 | center = np.array([img_size/2., img_size/2.]) 283 | 284 | # transformations 285 | Z = np.reshape(np.tile(S[:,2],(2,1)).T,-1) 286 | XY = np.reshape(S[:,0:2],-1) 287 | O = np.tile(center,num_joints) 288 | F = np.tile(f,num_joints) 289 | weight2 = np.reshape(np.tile(np.sqrt(joints_conf),(2,1)).T,-1) 290 | 291 | # least squares 292 | Q = np.array([F*np.tile(np.array([1,0]),num_joints), F*np.tile(np.array([0,1]),num_joints), O-np.reshape(joints_2d,-1)]).T 293 | c = (np.reshape(joints_2d,-1)-O)*Z - F*XY 294 | 295 | # weighted least squares 296 | W = np.diagflat(weight2) 297 | Q = np.dot(W,Q) 298 | c = np.dot(W,c) 299 | 300 | # square matrix 301 | A = np.dot(Q.T,Q) 302 | b = np.dot(Q.T,c) 303 | 304 | # solution 305 | trans = np.linalg.solve(A, b) 306 | 307 | return trans 308 | 309 | 310 | def estimate_translation(S, joints_2d, focal_length=5000., img_size=224.): 311 | """Find camera translation that brings 3D joints S closest to 2D the corresponding joints_2d. 312 | Input: 313 | S: (B, 49, 3) 3D joint locations 314 | joints: (B, 49, 3) 2D joint locations and confidence 315 | Returns: 316 | (B, 3) camera translation vectors 317 | """ 318 | 319 | device = S.device 320 | # Use only joints 25:49 (GT joints) 321 | S = S[:, 25:, :].cpu().numpy() 322 | joints_2d = joints_2d[:, 25:, :].cpu().numpy() 323 | joints_conf = joints_2d[:, :, -1] 324 | joints_2d = joints_2d[:, :, :-1] 325 | trans = np.zeros((S.shape[0], 3), dtype=np.float32) 326 | # Find the translation for each example in the batch 327 | for i in range(S.shape[0]): 328 | S_i = S[i] 329 | joints_i = joints_2d[i] 330 | conf_i = joints_conf[i] 331 | trans[i] = estimate_translation_np(S_i, joints_i, conf_i, focal_length=focal_length, img_size=img_size) 332 | return torch.from_numpy(trans).to(device) 333 | 334 | #input batch*4*4 or batch*3*3 335 | #output torch batch*3 x, y, z in radiant 336 | #the rotation is in the sequence of x,y,z 337 | def compute_euler_angles_from_rotation_matrices(rotation_matrices): 338 | batch=rotation_matrices.shape[0] 339 | R=rotation_matrices 340 | sy = torch.sqrt(R[:,0,0]*R[:,0,0]+R[:,1,0]*R[:,1,0]) 341 | singular= sy<1e-6 342 | singular=singular.float() 343 | 344 | x=torch.atan2(R[:,2,1], R[:,2,2]) 345 | y=torch.atan2(-R[:,2,0], sy) 346 | z=torch.atan2(R[:,1,0],R[:,0,0]) 347 | 348 | xs=torch.atan2(-R[:,1,2], R[:,1,1]) 349 | ys=torch.atan2(-R[:,2,0], sy) 350 | zs=R[:,1,0]*0 351 | 352 | out_euler=torch.autograd.Variable(torch.zeros(batch,3).cuda()) 353 | out_euler[:,0]=x*(1-singular)+xs*singular 354 | out_euler[:,1]=y*(1-singular)+ys*singular 355 | out_euler[:,2]=z*(1-singular)+zs*singular 356 | 357 | return out_euler 358 | 359 | 360 | def get_K_crop_resize(K, boxes, orig_size, crop_resize): 361 | """ 362 | Adapted from https://github.com/BerkeleyAutomation/perception/blob/master/perception/camera_intrinsics.py 363 | Skew is not handled ! 
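# A hedged usage sketch (made-up intrinsics and crop box): cropping a 320x240 box
# centred on the principal point of a 640x480 image and resizing it to 256x256 should
# scale fx, fy by 256/320 and 256/240 and recentre the principal point at 127.5.
import torch

K = torch.tensor([[[320.0, 0.0, 320.0],
                   [0.0, 320.0, 240.0],
                   [0.0, 0.0, 1.0]]])
boxes = torch.tensor([[160.0, 120.0, 480.0, 360.0]])        # (x1, y1, x2, y2)
new_K = get_K_crop_resize(K, boxes, orig_size=(480, 640), crop_resize=(256, 256))
# new_K is approx [[256.0, 0, 127.5], [0, 341.3, 127.5], [0, 0, 1]]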
364 | """ 365 | assert K.shape[1:] == (3, 3) 366 | assert boxes.shape[1:] == (4, ) 367 | K = K.float() 368 | boxes = boxes.float() 369 | new_K = K.clone() 370 | 371 | orig_size = torch.tensor(orig_size, dtype=torch.float) 372 | crop_resize = torch.tensor(crop_resize, dtype=torch.float) 373 | 374 | final_width, final_height = max(crop_resize), min(crop_resize) 375 | crop_width = boxes[:, 2] - boxes[:, 0] 376 | crop_height = boxes[:, 3] - boxes[:, 1] 377 | crop_cj = (boxes[:, 0] + boxes[:, 2]) / 2 378 | crop_ci = (boxes[:, 1] + boxes[:, 3]) / 2 379 | 380 | # Crop 381 | cx = K[:, 0, 2] + (crop_width - 1) / 2 - crop_cj 382 | cy = K[:, 1, 2] + (crop_height - 1) / 2 - crop_ci 383 | 384 | # # Resize (upsample) 385 | center_x = (crop_width - 1) / 2 386 | center_y = (crop_height - 1) / 2 387 | orig_cx_diff = cx - center_x 388 | orig_cy_diff = cy - center_y 389 | scale_x = final_width / crop_width 390 | scale_y = final_height / crop_height 391 | scaled_center_x = (final_width - 1) / 2 392 | scaled_center_y = (final_height - 1) / 2 393 | fx = scale_x * K[:, 0, 0] 394 | fy = scale_y * K[:, 1, 1] 395 | cx = scaled_center_x + scale_x * orig_cx_diff 396 | cy = scaled_center_y + scale_y * orig_cy_diff 397 | 398 | new_K[:, 0, 0] = fx 399 | new_K[:, 1, 1] = fy 400 | new_K[:, 0, 2] = cx 401 | new_K[:, 1, 2] = cy 402 | return new_K 403 | 404 | 405 | def cropresize_backtransform_points2d(input_wh, boxes_2d_crop, 406 | output_wh, points_2d_in_output): 407 | bsz = input_wh.shape[0] 408 | assert output_wh.shape == (bsz, 2) 409 | assert input_wh.shape == (bsz, 2) 410 | assert points_2d_in_output.dim() == 3 411 | 412 | points_2d_normalized = points_2d_in_output / output_wh.unsqueeze(1) 413 | points_2d = boxes_2d_crop[:, [0, 1]].unsqueeze(1) + points_2d_normalized * input_wh.unsqueeze(1) 414 | return points_2d 415 | -------------------------------------------------------------------------------- /lib/utils/integral.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from torch.nn import functional as F 4 | from utils.transforms import uvd_to_xyz 5 | 6 | 7 | def flip(x): 8 | assert (x.dim() == 3 or x.dim() == 4) 9 | dim = x.dim() - 1 10 | 11 | return x.flip(dims=(dim,)) 12 | 13 | def norm_heatmap_hrnet(norm_type, heatmap, tau=5, sample_num=1): 14 | # Input tensor shape: [N,C,...] 15 | shape = heatmap.shape 16 | if norm_type == 'softmax': 17 | heatmap = heatmap.reshape(*shape[:2], -1) 18 | # global soft max 19 | heatmap = F.softmax(heatmap, 2) 20 | return heatmap.reshape(*shape) 21 | elif norm_type == 'sampling': 22 | heatmap = heatmap.reshape(*shape[:2], -1) 23 | 24 | eps = torch.rand_like(heatmap) 25 | log_eps = torch.log(-torch.log(eps)) 26 | gumbel_heatmap = heatmap - log_eps / tau 27 | 28 | gumbel_heatmap = F.softmax(gumbel_heatmap, 2) 29 | return gumbel_heatmap.reshape(*shape) 30 | elif norm_type == 'multiple_sampling': 31 | 32 | heatmap = heatmap.reshape(*shape[:2], 1, -1) 33 | 34 | eps = torch.rand(*heatmap.shape[:2], sample_num, heatmap.shape[3], device=heatmap.device) 35 | log_eps = torch.log(-torch.log(eps)) 36 | gumbel_heatmap = heatmap - log_eps / tau 37 | gumbel_heatmap = F.softmax(gumbel_heatmap, 3) 38 | gumbel_heatmap = gumbel_heatmap.reshape(shape[0], shape[1], sample_num, shape[2]) 39 | 40 | # [B, S, K, -1] 41 | return gumbel_heatmap.transpose(1, 2) 42 | else: 43 | raise NotImplementedError 44 | 45 | def norm_heatmap_resnet(norm_type, heatmap): 46 | # Input tensor shape: [N,C,...] 
47 | shape = heatmap.shape 48 | if norm_type == 'softmax': 49 | heatmap = heatmap.reshape(*shape[:2], -1) 50 | # global soft max 51 | heatmap = F.softmax(heatmap, 2) 52 | return heatmap.reshape(*shape) 53 | else: 54 | raise NotImplementedError 55 | 56 | def get_intrinsic_matrix_batch(f, c, bsz, inv=False): 57 | 58 | intrinsic_matrix = torch.zeros((bsz, 3, 3)).to(torch.float) 59 | 60 | if inv: 61 | intrinsic_matrix[:, 0, 0] = 1.0 / f[0].to(float) 62 | intrinsic_matrix[:, 0, 2] = - c[0].to(float) / f[0].to(float) 63 | intrinsic_matrix[:, 1, 1] = 1.0 / f[1].to(float) 64 | intrinsic_matrix[:, 1, 2] = - c[1].to(float) / f[1].to(float) 65 | intrinsic_matrix[:, 2, 2] = 1 66 | else: 67 | intrinsic_matrix[:, 0, 0] = f[0] 68 | intrinsic_matrix[:, 0, 2] = c[0] 69 | intrinsic_matrix[:, 1, 1] = f[1] 70 | intrinsic_matrix[:, 1, 2] = c[1] 71 | intrinsic_matrix[:, 2, 2] = 1 72 | 73 | return intrinsic_matrix.cuda(device=0) 74 | 75 | class HeatmapIntegralPose(nn.Module): 76 | """ 77 | This module takes in heatmap output and performs soft-argmax(integral operation). 78 | """ 79 | def __init__(self, backbone, **kwargs): 80 | super(HeatmapIntegralPose, self).__init__() 81 | self.backbone_name = backbone 82 | self.norm_type = kwargs["norm_type"] 83 | self.num_joints = kwargs["num_joints"] 84 | self.depth_dim = kwargs["depth_dim"] 85 | self.height_dim = kwargs["height_dim"] 86 | self.width_dim = kwargs["width_dim"] 87 | self.rootid = kwargs["rootid"] if "rootid" in kwargs else 0 88 | self.fixroot = kwargs["fixroot"] if "fixroot" in kwargs else False 89 | 90 | # self.focal_length = kwargs['FOCAL_LENGTH'] if 'FOCAL_LENGTH' in kwargs else 320 91 | bbox_3d_shape = kwargs['bbox_3d_shape'] if 'bbox_3d_shape' in kwargs else (2300, 2300, 2300) 92 | self.bbox_3d_shape = torch.tensor(bbox_3d_shape).float() 93 | self.depth_factor = self.bbox_3d_shape[2] * 1e-3 94 | self.image_size = kwargs["image_size"] 95 | 96 | 97 | def forward(self, out, flip_test=False, **kwargs): 98 | """ 99 | Adapted from https://github.com/Jeff-sjtu/HybrIK/tree/main/hybrik/models 100 | """ 101 | 102 | K = kwargs["K"] 103 | root_trans = kwargs["root_trans"] 104 | batch_size = out.shape[0] 105 | inv_k = get_intrinsic_matrix_batch((K[:,0,0],K[:,1,1]), (K[:,0,2],K[:,1,2]), bsz=batch_size, inv=True) 106 | 107 | if self.backbone_name in ["resnet", "resnet34", "resnet50"]: 108 | # out = out.reshape(batch_size, self.num_joints, self.depth_dim, self.height_dim, self.width_dim) 109 | out = out.reshape((out.shape[0], self.num_joints, -1)) 110 | out = norm_heatmap_resnet(self.norm_type, out) 111 | assert out.dim() == 3, out.shape 112 | heatmaps = out / out.sum(dim=2, keepdim=True) 113 | heatmaps = heatmaps.reshape((heatmaps.shape[0], self.num_joints, self.depth_dim, self.height_dim, self.width_dim)) 114 | hm_x0 = heatmaps.sum((2, 3)) # (B, K, W) 115 | hm_y0 = heatmaps.sum((2, 4)) # (B, K, H) 116 | hm_z0 = heatmaps.sum((3, 4)) # (B, K, D) 117 | 118 | range_tensor = torch.arange(hm_x0.shape[-1], dtype=torch.float32, device=hm_x0.device) 119 | 120 | hm_x = hm_x0 * range_tensor 121 | hm_y = hm_y0 * range_tensor 122 | hm_z = hm_z0 * range_tensor 123 | 124 | coord_x = hm_x.sum(dim=2, keepdim=True) 125 | coord_y = hm_y.sum(dim=2, keepdim=True) 126 | coord_z = hm_z.sum(dim=2, keepdim=True) 127 | 128 | coord_x = coord_x / float(self.width_dim) - 0.5 129 | coord_y = coord_y / float(self.height_dim) - 0.5 130 | coord_z = coord_z / float(self.depth_dim) - 0.5 131 | 132 | # -0.5 ~ 0.5 133 | pred_uvd_jts = torch.cat((coord_x, coord_y, coord_z), dim=2) 134 | if self.fixroot: 
135 | pred_uvd_jts[:,self.rootid,2] = 0.0 136 | pred_uvd_jts_flat = pred_uvd_jts.reshape(batch_size, -1) 137 | 138 | pred_xyz_jts = uvd_to_xyz(uvd_jts=pred_uvd_jts, image_size=self.image_size, intrinsic_matrix_inverse=inv_k, 139 | root_trans=root_trans, depth_factor=self.depth_factor, return_relative=False) 140 | 141 | # pred_uvd_jts_back = xyz_to_uvd(xyz_jts=pred_xyz_jts, image_size=self.image_size, intrinsic_matrix=K, 142 | # root_trans=root_trans, depth_factor=self.depth_factor, return_relative=False) 143 | # print("(pred_uvd_jts-pred_uvd_jts_back).sum()",(pred_uvd_jts.cuda()-pred_uvd_jts_back.cuda()).sum()) 144 | 145 | return pred_uvd_jts, pred_xyz_jts 146 | 147 | elif self.backbone_name == "hrnet" or self.backbone_name == "hrnet32" or self.backbone_name == "hrnet48": 148 | out = out.reshape((out.shape[0], self.num_joints, -1)) 149 | heatmaps = norm_heatmap_hrnet(self.norm_type, out) 150 | assert heatmaps.dim() == 3, heatmaps.shape 151 | heatmaps = heatmaps.reshape((heatmaps.shape[0], self.num_joints, self.depth_dim, self.height_dim, self.width_dim)) 152 | 153 | hm_x0 = heatmaps.sum((2, 3)) # (B, K, W) 154 | hm_y0 = heatmaps.sum((2, 4)) # (B, K, H) 155 | hm_z0 = heatmaps.sum((3, 4)) # (B, K, D) 156 | 157 | range_tensor = torch.arange(hm_x0.shape[-1], dtype=torch.float32, device=hm_x0.device).unsqueeze(-1) 158 | # hm_x = hm_x0 * range_tensor 159 | # hm_y = hm_y0 * range_tensor 160 | # hm_z = hm_z0 * range_tensor 161 | 162 | # coord_x = hm_x.sum(dim=2, keepdim=True) 163 | # coord_y = hm_y.sum(dim=2, keepdim=True) 164 | # coord_z = hm_z.sum(dim=2, keepdim=True) 165 | coord_x = hm_x0.matmul(range_tensor) 166 | coord_y = hm_y0.matmul(range_tensor) 167 | coord_z = hm_z0.matmul(range_tensor) 168 | 169 | coord_x = coord_x / float(self.width_dim) - 0.5 170 | coord_y = coord_y / float(self.height_dim) - 0.5 171 | coord_z = coord_z / float(self.depth_dim) - 0.5 172 | 173 | # -0.5 ~ 0.5 174 | pred_uvd_jts = torch.cat((coord_x, coord_y, coord_z), dim=2) 175 | if self.fixroot: 176 | pred_uvd_jts[:,self.rootid,2] = 0.0 177 | pred_uvd_jts_flat = pred_uvd_jts.reshape(batch_size, -1) 178 | 179 | pred_xyz_jts = uvd_to_xyz(uvd_jts=pred_uvd_jts, image_size=self.image_size, intrinsic_matrix_inverse=inv_k, 180 | root_trans=root_trans, depth_factor=self.depth_factor, return_relative=False) 181 | 182 | # pred_uvd_jts_back = xyz_to_uvd(xyz_jts=pred_xyz_jts, image_size=self.image_size, intrinsic_matrix=K, 183 | # root_trans=root_trans, depth_factor=self.depth_factor, return_relative=False) 184 | # print("(pred_uvd_jts-pred_uvd_jts_back).sum()",(pred_uvd_jts.cuda()-pred_uvd_jts_back.cuda()).sum()) 185 | 186 | return pred_uvd_jts, pred_xyz_jts 187 | 188 | else: 189 | raise(NotImplementedError) 190 | 191 | 192 | class HeatmapIntegralJoint(nn.Module): 193 | """ 194 | This module takes in heatmap output and performs soft-argmax(integral operation). 
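# A simplified, self-contained illustration of the soft-argmax (integral) step that
# the two heatmap-integral modules in this file rely on; this is a sketch with a
# synthetic 1D response, not their exact code path.
import torch
import torch.nn.functional as F

logits = torch.zeros(1, 64)
logits[0, 20] = 8.0                               # a sharp peak at bin 20
probs = F.softmax(logits, dim=1)                  # normalised 1D "heatmap"
bins = torch.arange(64, dtype=torch.float32)
coord = (probs * bins).sum(dim=1)                 # expectation, close to 20
coord_norm = coord / 64.0 - 0.5                   # mapped to [-0.5, 0.5) as above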
195 | """ 196 | def __init__(self, backbone, **kwargs): 197 | super(HeatmapIntegralJoint, self).__init__() 198 | self.backbone_name = backbone 199 | self.norm_type = kwargs["norm_type"] 200 | self.dof = kwargs["dof"] 201 | self.joint_bounds = kwargs["joint_bounds"] 202 | assert self.joint_bounds.shape == (self.dof, 2), self.joint_bounds.shape 203 | 204 | 205 | def forward(self, out, **kwargs): 206 | """ 207 | Adapted from https://github.com/Jeff-sjtu/HybrIK/tree/main/hybrik/models 208 | """ 209 | 210 | batch_size = out.shape[0] 211 | 212 | if self.backbone_name in ["resnet34", "resnet50"]: 213 | out = out.reshape(batch_size, self.dof, -1) 214 | out = norm_heatmap_resnet(self.norm_type, out) 215 | assert out.dim() == 3, out.shape 216 | heatmaps = out / out.sum(dim=2, keepdim=True) 217 | heatmaps = heatmaps.reshape((heatmaps.shape[0], self.dof, -1)) # no depth dimension 218 | 219 | resolution = heatmaps.shape[-1] 220 | range_tensor = torch.arange(resolution, dtype=torch.float32, device=heatmaps.device).reshape(1,1,resolution) 221 | hm_int = heatmaps * range_tensor 222 | coord_joint_raw = hm_int.sum(dim=2, keepdim=True) 223 | coord_joint = coord_joint_raw / float(resolution) # 0~1 224 | 225 | bounds = self.joint_bounds.reshape(1,self.dof,2).cuda() 226 | jointrange = bounds[:,:,[1]] - bounds[:,:,[0]] 227 | joints = coord_joint * jointrange + bounds[:,:,[0]] 228 | 229 | return joints.squeeze(-1) 230 | 231 | else: 232 | raise(NotImplementedError) 233 | -------------------------------------------------------------------------------- /lib/utils/mesh_renderer.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | 4 | # io utils 5 | from pytorch3d.io import load_obj 6 | 7 | # datastructures 8 | from pytorch3d.structures import Meshes 9 | 10 | # rendering components 11 | from pytorch3d.renderer import ( 12 | RasterizationSettings, MeshRenderer, MeshRasterizer, BlendParams, 13 | SoftSilhouetteShader, HardPhongShader, PointLights, TexturesVertex, 14 | PerspectiveCameras,Textures 15 | ) 16 | 17 | from os.path import exists 18 | from roboticstoolbox.robot.ERobot import ERobot 19 | 20 | 21 | class PandaArm(): 22 | def __init__(self, urdf_file): 23 | 24 | self.robot = self.Panda(urdf_file) 25 | 26 | def get_joint_RT(self, joint_angle): 27 | 28 | assert joint_angle.shape[0] == 7 29 | 30 | link_idx_list = [0,1,2,3,4,5,6,7,9] 31 | # link 0,1,2,3,4,5,6,7, and hand 32 | R_list = [] 33 | t_list = [] 34 | 35 | for i in range(len(link_idx_list)): 36 | link_idx = link_idx_list[i] 37 | T = self.robot.fkine(joint_angle, end = self.robot.links[link_idx], start = self.robot.links[0]) 38 | R_list.append(T.R) 39 | t_list.append(T.t) 40 | 41 | return np.array(R_list),np.array(t_list) 42 | 43 | class Panda(ERobot): 44 | """ 45 | Class that imports a URDF model 46 | """ 47 | 48 | def __init__(self, urdf_file): 49 | 50 | links, name, urdf_string, urdf_filepath = self.URDF_read(urdf_file) 51 | 52 | super().__init__( 53 | links, 54 | name=name, 55 | manufacturer="Franka", 56 | urdf_string=urdf_string, 57 | urdf_filepath=urdf_filepath, 58 | ) 59 | 60 | 61 | class RobotMeshRenderer(): 62 | """ 63 | Class that render robot mesh with differentiable renderer 64 | """ 65 | def __init__(self, focal_length, principal_point, image_size, robot, mesh_files, device): 66 | 67 | self.focal_length = focal_length 68 | self.principal_point = principal_point 69 | self.image_size = image_size 70 | self.device = device 71 | self.robot = robot 72 | self.mesh_files = 
mesh_files 73 | self.preload_verts = [] 74 | self.preload_faces = [] 75 | 76 | 77 | # preload the mesh to save loading time 78 | for m_file in mesh_files: 79 | assert exists(m_file) 80 | preload_verts_i, preload_faces_idx_i, _ = load_obj(m_file) 81 | preload_faces_i = preload_faces_idx_i.verts_idx 82 | self.preload_verts.append(preload_verts_i) 83 | self.preload_faces.append(preload_faces_i) 84 | 85 | 86 | # set up differentiable renderer with given camera parameters 87 | self.cameras = PerspectiveCameras( 88 | focal_length = [focal_length], 89 | principal_point = [principal_point], 90 | device=device, 91 | in_ndc=False, image_size = [image_size] 92 | ) # (height, width) !!!!! 93 | 94 | blend_params = BlendParams(sigma=1e-8, gamma=1e-8) 95 | raster_settings = RasterizationSettings( 96 | image_size=image_size, 97 | blur_radius=np.log(1. / 1e-4 - 1.) * blend_params.sigma, 98 | faces_per_pixel=100, 99 | max_faces_per_bin=100000, # max_faces_per_bin=1000000, 100 | ) 101 | 102 | # Create a silhouette mesh renderer by composing a rasterizer and a shader. 103 | self.silhouette_renderer = MeshRenderer( 104 | rasterizer=MeshRasterizer( 105 | cameras=self.cameras, 106 | raster_settings=raster_settings 107 | ), 108 | shader=SoftSilhouetteShader(blend_params=blend_params) 109 | ) 110 | 111 | 112 | # We will also create a Phong renderer. This is simpler and only needs to render one face per pixel. 113 | raster_settings = RasterizationSettings( 114 | image_size=image_size, 115 | blur_radius=0.0, 116 | faces_per_pixel=1, 117 | max_faces_per_bin=100000, 118 | ) 119 | # We can add a point light in front of the object. 120 | lights = PointLights(device=device, location=((2.0, 2.0, -2.0),)) 121 | self.phong_renderer = MeshRenderer( 122 | rasterizer=MeshRasterizer( 123 | cameras=self.cameras, 124 | raster_settings=raster_settings 125 | ), 126 | shader=HardPhongShader(device=device, cameras=self.cameras, lights=lights) 127 | ) 128 | 129 | def get_robot_mesh(self, joint_angle): 130 | 131 | R_list, t_list = self.robot.get_joint_RT(joint_angle) 132 | assert len(self.mesh_files) == R_list.shape[0] and len(self.mesh_files) == t_list.shape[0] 133 | 134 | verts_list = [] 135 | faces_list = [] 136 | verts_rgb_list = [] 137 | verts_count = 0 138 | for i in range(len(self.mesh_files)): 139 | verts_i = self.preload_verts[i] 140 | faces_i = self.preload_faces[i] 141 | 142 | R = torch.tensor(R_list[i],dtype=torch.float32) 143 | t = torch.tensor(t_list[i],dtype=torch.float32) 144 | verts_i = verts_i @ R.T + t 145 | #verts_i = (R @ verts_i.T).T + t 146 | faces_i = faces_i + verts_count 147 | 148 | verts_count+=verts_i.shape[0] 149 | 150 | verts_list.append(verts_i.to(self.device)) 151 | faces_list.append(faces_i.to(self.device)) 152 | 153 | # Initialize each vertex to be white in color. 
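# (Note on the block below: each link's vertices are tinted with a random colour
#  sampled once per mesh rather than plain white, presumably so the individual links
#  remain distinguishable in the textured rendering.)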
154 | color = torch.rand(3) 155 | verts_rgb_i = torch.ones_like(verts_i) * color # (V, 3) 156 | verts_rgb_list.append(verts_rgb_i.to(self.device)) 157 | 158 | 159 | 160 | verts = torch.concat(verts_list, dim=0) 161 | faces = torch.concat(faces_list, dim=0) 162 | 163 | verts_rgb = torch.concat(verts_rgb_list,dim=0)[None] 164 | textures = Textures(verts_rgb=verts_rgb) 165 | 166 | # Create a Meshes object 167 | robot_mesh = Meshes( 168 | verts=[verts.to(self.device)], 169 | faces=[faces.to(self.device)], 170 | textures=textures 171 | ) 172 | 173 | return robot_mesh 174 | 175 | 176 | def get_robot_verts_and_faces(self, joint_angle): 177 | 178 | R_list, t_list = self.robot.get_joint_RT(joint_angle) 179 | assert len(self.mesh_files) == R_list.shape[0] and len(self.mesh_files) == t_list.shape[0] 180 | 181 | verts_list = [] 182 | faces_list = [] 183 | verts_rgb_list = [] 184 | verts_count = 0 185 | for i in range(len(self.mesh_files)): 186 | verts_i = self.preload_verts[i] 187 | faces_i = self.preload_faces[i] 188 | 189 | R = torch.tensor(R_list[i],dtype=torch.float32) 190 | t = torch.tensor(t_list[i],dtype=torch.float32) 191 | verts_i = verts_i @ R.T + t 192 | #verts_i = (R @ verts_i.T).T + t 193 | faces_i = faces_i + verts_count 194 | 195 | verts_count+=verts_i.shape[0] 196 | 197 | verts_list.append(verts_i.to(self.device)) 198 | faces_list.append(faces_i.to(self.device)) 199 | 200 | # Initialize each vertex to be white in color. 201 | #color = torch.rand(3) 202 | #verts_rgb_i = torch.ones_like(verts_i) * color # (V, 3) 203 | #verts_rgb_list.append(verts_rgb_i.to(self.device)) 204 | 205 | verts = torch.concat(verts_list, dim=0) 206 | faces = torch.concat(faces_list, dim=0) 207 | 208 | 209 | return verts, faces -------------------------------------------------------------------------------- /lib/utils/metrics.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | from utils.transforms import point_projection_from_3d 4 | import matplotlib.pyplot as plt 5 | import seaborn as sns 6 | import os 7 | 8 | def compute_metrics_batch(robot,gt_keypoints3d,gt_keypoints2d,K_original,gt_joint,**pred_kwargs): 9 | 10 | # compute 3d keypoints locations 11 | # output shape: (batch_size, keypoints_num, 3) 12 | pred_joint = pred_kwargs["pred_joint"] 13 | pred_rot = pred_kwargs["pred_rot"] 14 | pred_trans = pred_kwargs["pred_trans"] 15 | if "pred_xy" in pred_kwargs and "pred_depth" in pred_kwargs and pred_kwargs["pred_xy"] is not None and pred_kwargs["pred_depth"] is not None: 16 | pred_xy = pred_kwargs["pred_xy"] 17 | pred_depth = pred_kwargs["pred_depth"] 18 | pred_trans = torch.cat((pred_xy,pred_depth),dim=-1) 19 | pred_xyz_integral = pred_kwargs["pred_xyz_integral"] 20 | reference_keypoint_id = pred_kwargs["reference_keypoint_id"] 21 | 22 | if pred_joint is None or pred_rot is None or pred_trans is None: 23 | assert pred_xyz_integral is not None 24 | pred_keypoints3d = pred_xyz_integral 25 | batch_size = pred_xyz_integral.shape[0] 26 | else: 27 | if reference_keypoint_id == 0: 28 | pred_keypoints3d = robot.get_keypoints(pred_joint,pred_rot,pred_trans) 29 | batch_size = pred_joint.shape[0] 30 | pred_joint = pred_joint.detach().cpu().numpy() 31 | else: 32 | pred_keypoints3d = robot.get_keypoints_root(pred_joint,pred_rot,pred_trans,root=reference_keypoint_id) 33 | batch_size = pred_joint.shape[0] 34 | pred_joint = pred_joint.detach().cpu().numpy() 35 | 36 | keypoints_num = len(robot.link_names) 37 | dof = robot.dof 38 | pred_keypoints3d = 
pred_keypoints3d.detach().cpu().numpy() 39 | gt_keypoints3d = gt_keypoints3d.detach().cpu().numpy() 40 | gt_keypoints2d = gt_keypoints2d.detach().cpu().numpy() 41 | K_original = K_original.detach().cpu().numpy() 42 | gt_joint = gt_joint.detach().cpu().numpy() 43 | pred_keypoints2d = point_projection_from_3d(K_original,pred_keypoints3d) 44 | assert(pred_keypoints3d.shape == (batch_size,keypoints_num,3)),f"{pred_keypoints3d.shape}" 45 | assert(gt_keypoints3d.shape == (batch_size,keypoints_num,3)),f"{gt_keypoints3d.shape}" 46 | assert(pred_keypoints2d.shape == (batch_size,keypoints_num,2)),f"{pred_keypoints2d.shape}" 47 | assert(gt_keypoints2d.shape == (batch_size,keypoints_num,2)),f"{gt_keypoints2d.shape}" 48 | 49 | 50 | # Thresholds (ADD:mm, PCK:pixel) 51 | add_thresholds = [1,5,10,20,40,60,80,100] 52 | pck_thresholds = [2.5,5.0,7.5,10.0,12.5,15.0,17.5,20.0] 53 | 54 | # ADD Average distance of detected keypoints within threshold 55 | error3d_batch = np.linalg.norm(pred_keypoints3d - gt_keypoints3d, ord = 2, axis = 2) 56 | assert(error3d_batch.shape == (batch_size,keypoints_num)) 57 | error3d = np.mean(error3d_batch, axis = 1) 58 | # pcts3d = [len(np.where(error3d < th_mm/1000.0)[0])/float(error3d.shape[0]*error3d.shape[1]) for th_mm in add_thresholds] 59 | 60 | # PCK percentage of correct keypoints (only keypoints within the camera frame) 61 | error2d_batch = np.linalg.norm(pred_keypoints2d - gt_keypoints2d, ord = 2, axis = 2) 62 | assert(error2d_batch.shape == (batch_size,keypoints_num)) 63 | valid = (gt_keypoints2d[:,:,0] <= 640.0) & (gt_keypoints2d[:,:,0] >= 0) & (gt_keypoints2d[:,:,1] <= 480.0) & (gt_keypoints2d[:,:,1] >= 0) 64 | error2d_all = error2d_batch * valid 65 | error2d_sum = np.sum(error2d_all, axis = 1) 66 | valid_sum = np.sum(valid, axis = 1) 67 | error2d = error2d_sum / valid_sum 68 | # pcts2d = [len(np.where(error2d < th_p)[0])/float(error2d.shape[0]*error2d.shape[1]) for th_p in pck_thresholds] 69 | 70 | # 3D/2D mean distance with gt of each keypoints 71 | dis3d = list(np.mean(error3d_batch, axis = 0)) 72 | error2d_sum_batch = np.sum(error2d_all, axis = 0) 73 | valid_sum_batch = np.sum(valid, axis = 0) 74 | dis2d = error2d_sum_batch / valid_sum_batch 75 | # dis2d = list(np.mean(error2d_batch, axis = 0)) 76 | 77 | # mean joint angle L1 error (per joint) 78 | # mean joint angle L1 error (per image) 79 | if pred_joint is not None: 80 | # pred_joint = pred_joint.detach().cpu().numpy() 81 | assert(gt_joint.shape == pred_joint.shape and gt_joint.shape == (batch_size, dof)), f"{pred_joint.shape},{gt_joint.shape}" 82 | error_joint = np.abs(gt_joint - pred_joint) 83 | l1_jointerror = list(np.mean(error_joint, axis = 0)) 84 | if robot.robot_type == "panda": 85 | mean_jointerror = list(np.mean(error_joint[:,:-1], axis = 1)) 86 | else: 87 | mean_jointerror = list(np.mean(error_joint, axis = 1)) 88 | assert(len(mean_jointerror) == batch_size), len(mean_jointerror) 89 | else: 90 | l1_jointerror = [0] * dof 91 | mean_jointerror = [0] * batch_size 92 | 93 | # depth l1 error 94 | reference_keypoint_id = pred_kwargs["reference_keypoint_id"] 95 | error_depth = np.abs(pred_keypoints3d[:,reference_keypoint_id,2] - gt_keypoints3d[:,reference_keypoint_id,2]) 96 | 97 | # root relative error 98 | pred_relatives = pred_keypoints3d[:,:,2] - pred_keypoints3d[:,reference_keypoint_id:reference_keypoint_id+1,2] 99 | gt_relatives = gt_keypoints3d[:,:,2] - gt_keypoints3d[:,reference_keypoint_id:reference_keypoint_id+1,2] 100 | error_relative = np.abs(pred_relatives - gt_relatives) 101 | 
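# A toy numpy sketch of the ADD-style accounting in this function (made-up values):
# the per-image error is the mean keypoint distance, and e.g. ADD@10mm is the share
# of images whose mean error stays under 0.01 m.
import numpy as np

pred = np.zeros((4, 7, 3))                        # (batch, keypoints, xyz), dummy
gt = pred + np.array([0.005, 0.0, 0.0])           # constant 5 mm offset
err = np.linalg.norm(pred - gt, ord=2, axis=2)    # (4, 7) per-keypoint distance
per_image = err.mean(axis=1)                      # (4,) mean error per image
add_at_10mm = float((per_image < 0.010).mean())   # 1.0 for this toy data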
batch_error_relative = np.mean(error_relative, axis=1) 102 | 103 | # root relative auc 104 | pred_keypoints3d_relative = pred_keypoints3d.copy() 105 | pred_keypoints3d_relative[:,:,2] = pred_relatives 106 | gt_keypoints3d_relative = gt_keypoints3d.copy() 107 | gt_keypoints3d_relative[:,:,2] = gt_relatives 108 | error3d_relative_batch = np.linalg.norm(pred_keypoints3d_relative - gt_keypoints3d_relative, ord = 2, axis = 2) 109 | assert(error3d_relative_batch.shape == (batch_size,keypoints_num)) 110 | error3d_relative = np.mean(error3d_relative_batch, axis = 1) 111 | 112 | 113 | 114 | return error3d, error2d, dis3d, dis2d, l1_jointerror, mean_jointerror, error_depth, batch_error_relative, error3d_relative 115 | 116 | 117 | def summary_add_pck(alldis): 118 | 119 | dis3d = np.array(alldis['dis3d']) 120 | dis2d = np.array(alldis['dis2d']) 121 | assert(dis3d.shape[0] == dis2d.shape[0]) 122 | 123 | add_threshold_ontb = [1,5,10,20,40,60,80,100] 124 | pck_threshold_ontb = [2.5,5.0,7.5,10.0,12.5,15.0,17.5,20.0] 125 | 126 | # for ADD 127 | auc_threshold = 0.1 128 | delta_threshold = 0.00001 129 | add_threshold_values = np.arange(0.0, auc_threshold, delta_threshold) 130 | counts_3d = [] 131 | for value in add_threshold_values: 132 | under_threshold = ( 133 | np.mean(dis3d <= value) 134 | ) 135 | counts_3d.append(under_threshold) 136 | auc_add = np.trapz(counts_3d, dx=delta_threshold) / auc_threshold 137 | 138 | # for PCK 139 | auc_pixel_threshold = 20.0 140 | delta_pixel = 0.01 141 | pck_threshold_values = np.arange(0, auc_pixel_threshold, delta_pixel) 142 | counts_2d = [] 143 | for value in pck_threshold_values: 144 | under_threshold = ( 145 | np.mean(dis2d <= value) 146 | ) 147 | counts_2d.append(under_threshold) 148 | auc_pck = np.trapz(counts_2d, dx=delta_pixel) / auc_pixel_threshold 149 | 150 | summary = { 151 | 'ADD/mean': np.mean(dis3d), 152 | 'ADD/median': np.median(dis3d), 153 | 'ADD/AUC': auc_add.item(), 154 | 'ADD_2D/mean': np.mean(dis2d), 155 | 'ADD_2D/median': np.median(dis2d), 156 | 'PCK/AUC': auc_pck.item() 157 | } 158 | for th_mm in add_threshold_ontb: 159 | summary[f'ADD_{th_mm}_mm'] = np.mean(dis3d <= th_mm * 1e-3) 160 | for th_p in pck_threshold_ontb: 161 | summary[f'PCK_{th_p}_pixel'] = np.mean(dis2d <= th_p) 162 | return summary 163 | 164 | 165 | def draw_add_curve(alldis, savename, testdsname, auc): 166 | 167 | dis3d = np.array(alldis['dis3d']) 168 | auc_threshold = 0.1 169 | delta_threshold = 0.00001 170 | add_threshold_values = np.arange(0.0, auc_threshold, delta_threshold) 171 | counts_3d = [] 172 | for value in add_threshold_values: 173 | under_threshold = ( 174 | np.mean(dis3d <= value) 175 | ) 176 | counts_3d.append(under_threshold) 177 | plt.figure(figsize=(25,18)) 178 | grid = plt.GridSpec(2,2, wspace=0.1, hspace=0.2) 179 | plt.subplot(grid[0,0]) 180 | plt.grid() 181 | plt.plot(add_threshold_values, counts_3d) 182 | plt.xlim(0,auc_threshold) 183 | plt.ylim(0,1.0) 184 | plt.xlabel("add threshold values (unit: m)") 185 | plt.ylabel("percentages") 186 | plt.axvline(x=np.mean(dis3d), color='red', linestyle='--', label='mean distance') 187 | plt.axvline(x=np.median(dis3d), color='green', linestyle='--', label='median distance') 188 | plt.title("ADD curve") 189 | plt.text(x=0.001, y=0.9, s="auc="+str(round(auc*100, ndigits=2))) 190 | plt.legend() 191 | 192 | plt.subplot(grid[0,1]) 193 | sns.histplot(dis3d, kde=True) 194 | plt.title("3d distance distribution, whole range") 195 | 196 | plt.subplot(grid[1,0]) 197 | sns.histplot(dis3d, kde=True) 198 | plt.xlim(0, 0.5) 199 | 
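# A quick sketch of the AUC computation used in summary_add_pck (synthetic errors):
# for 3D errors spread uniformly over 0 to 10 cm, the normalised area under the
# accuracy-versus-threshold curve comes out near 0.5.
import numpy as np

dis3d = np.linspace(0.0, 0.1, 1001)               # synthetic 3D errors in metres
thresholds = np.arange(0.0, 0.1, 1e-5)
counts = [np.mean(dis3d <= t) for t in thresholds]
auc_add = np.trapz(counts, dx=1e-5) / 0.1         # approx 0.5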
plt.title("3d distance distribution, range: 0~0.5m") 200 | 201 | plt.subplot(grid[1,1]) 202 | sns.histplot(dis3d, kde=True) 203 | plt.xlim(0, 0.1) 204 | plt.xticks(np.arange(0.0,0.101,0.01)) 205 | plt.title("3d distance distribution, range: 0~0.1m") 206 | plt.axvline(x=np.mean(dis3d), color='red', linestyle='--', label='mean distance') 207 | plt.axvline(x=np.median(dis3d), color='green', linestyle='--', label='median distance') 208 | 209 | dataset_name = testdsname.split("/")[-1] 210 | 211 | plt.savefig(os.path.join(savename, f"add_distribution_curve_{dataset_name}.jpg")) 212 | print("drawn add curve in folder vis") 213 | plt.close() 214 | 215 | 216 | def draw_depth_figure(alldis, savename, testdsname): 217 | if "dr" in testdsname.split("/")[-1]: 218 | ds = "dr" 219 | elif "photo" in testdsname.split("/")[-1]: 220 | ds = "photo" 221 | else: 222 | ds = testdsname.split("/")[-1] 223 | assert len(alldis["deptherror"]) == len(alldis["gt_root_depth"]), (len(alldis["deptherror"]), len(alldis["gt_root_depth"])) 224 | deptherror = np.array(alldis["deptherror"]) 225 | gtrootdepth = np.array(alldis["gt_root_depth"]) 226 | plt.figure(figsize=(15,15)) 227 | plt.scatter(gtrootdepth, deptherror) 228 | plt.xlim(0, 2.0) 229 | plt.ylim(0, 0.2) 230 | plt.title("root depth error -- gt root depth scatterplot") 231 | plt.savefig("unit_test/depth_curve/"+savename+"_"+ds+".jpg") 232 | plt.close() 233 | 234 | plt.close() 235 | -------------------------------------------------------------------------------- /lib/utils/transforms.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | import numpy as np 5 | import torch 6 | 7 | def hnormalized(vector): 8 | hnormalized_vector = (vector / vector[-1])[:-1] 9 | return hnormalized_vector 10 | 11 | def point_projection_from_3d(camera_K, points): 12 | corr = zip(camera_K, points) 13 | projections = [hnormalized(np.matmul(K, loc.T)).T for K,loc in corr] 14 | projections = np.array(projections) 15 | return projections 16 | 17 | def point_projection_from_3d_tensor(camera_K, points): 18 | corr = zip(camera_K, points) 19 | projections = [hnormalized(torch.matmul(K, loc.T)).T for K,loc in corr] 20 | projections = torch.stack(projections) 21 | return projections 22 | 23 | def invert_T(T): 24 | R = T[..., :3, :3] 25 | t = T[..., :3, [-1]] 26 | R_inv = R.transpose(-2, -1) 27 | t_inv = - R_inv @ t 28 | T_inv = T.clone() 29 | T_inv[..., :3, :3] = R_inv 30 | T_inv[..., :3, [-1]] = t_inv 31 | return T_inv 32 | 33 | def uvd_to_xyz(uvd_jts, image_size, intrinsic_matrix_inverse, root_trans, depth_factor, return_relative=False): 34 | 35 | """ 36 | Adapted from https://github.com/Jeff-sjtu/HybrIK/tree/main/hybrik/models 37 | """ 38 | 39 | # intrinsic_param is of the inverse version (inv=True) 40 | assert uvd_jts.dim() == 3 and uvd_jts.shape[2] == 3, uvd_jts.shape 41 | uvd_jts_new = uvd_jts.clone() 42 | assert torch.sum(torch.isnan(uvd_jts)) == 0, ('uvd_jts', uvd_jts) 43 | 44 | # remap uv coordinate to input (256x256) space 45 | uvd_jts_new[:, :, 0] = (uvd_jts[:, :, 0] + 0.5) * image_size 46 | uvd_jts_new[:, :, 1] = (uvd_jts[:, :, 1] + 0.5) * image_size 47 | # remap d to m (depth_factor unit: m) 48 | uvd_jts_new[:, :, 2] = uvd_jts[:, :, 2] * depth_factor 49 | assert torch.sum(torch.isnan(uvd_jts_new)) == 0, ('uvd_jts_new', uvd_jts_new) 50 | 51 | dz = uvd_jts_new[:, :, 2].cuda() 52 | 53 | # transform uv coordinate to x/z y/z coordinate 54 | 
uv_homo_jts = torch.cat((uvd_jts_new[:, :, :2], torch.ones_like(uvd_jts_new)[:, :, 2:]), dim=2).cuda() 55 | device = intrinsic_matrix_inverse.device 56 | uv_homo_jts = uv_homo_jts.to(device) 57 | # batch-wise matrix multipy : (B,1,3,3) * (B,K,3,1) -> (B,K,3,1) 58 | xyz_jts = torch.matmul(intrinsic_matrix_inverse.unsqueeze(1), uv_homo_jts.unsqueeze(-1)) 59 | xyz_jts = xyz_jts.squeeze(dim=3).cuda() 60 | 61 | # recover absolute z : (B,K) + (B,1) 62 | abs_z = dz + root_trans[:, 2].unsqueeze(-1).cuda() 63 | # multipy absolute z : (B,K,3) * (B,K,1) 64 | xyz_jts = xyz_jts * abs_z.unsqueeze(-1) 65 | 66 | if return_relative: 67 | # (B,K,3) - (B,1,3) 68 | xyz_jts = xyz_jts - root_trans.unsqueeze(1).cuda() 69 | 70 | # xyz_jts = xyz_jts / depth_factor.unsqueeze(-1) 71 | # output xyz unit: m 72 | 73 | return xyz_jts.cuda() 74 | 75 | 76 | def xyz_to_uvd(xyz_jts, image_size, intrinsic_matrix, root_trans, depth_factor, return_relative=False): 77 | 78 | """ 79 | Adapted from https://github.com/Jeff-sjtu/HybrIK/tree/main/hybrik/models 80 | """ 81 | 82 | assert xyz_jts.dim() == 3 and xyz_jts.shape[2] == 3, xyz_jts.shape 83 | xyz_jts = xyz_jts.cuda() 84 | intrinsic_matrix = intrinsic_matrix.cuda() 85 | root_trans = root_trans.cuda() 86 | uvd_jts = torch.empty_like(xyz_jts).cuda() 87 | if return_relative: 88 | # (B,K,3) - (B,1,3) 89 | xyz_jts = xyz_jts + root_trans.unsqueeze(1) 90 | assert torch.sum(torch.isnan(xyz_jts)) == 0, ('xyz_jts', xyz_jts) 91 | 92 | # batch-wise matrix multipy : (B,1,3,3) * (B,K,3,1) -> (B,K,3,1) 93 | uvz_jts = torch.matmul(intrinsic_matrix.unsqueeze(1), xyz_jts.unsqueeze(-1)) 94 | uvz_jts = uvz_jts.squeeze(dim=3) 95 | 96 | uv_homo = uvz_jts / uvz_jts[:, :, 2].unsqueeze(-1) 97 | 98 | abs_z = xyz_jts[:, :, 2] 99 | dz = abs_z - root_trans[:, 2].unsqueeze(-1) 100 | 101 | uvd_jts[:, :, 2] = dz / depth_factor 102 | uvd_jts[:, :, 0] = uv_homo[:, :, 0] / float(image_size) - 0.5 103 | uvd_jts[:, :, 1] = uv_homo[:, :, 1] / float(image_size) - 0.5 104 | 105 | assert torch.sum(torch.isnan(uvd_jts)) == 0, ('uvd_jts', uvd_jts) 106 | 107 | return uvd_jts 108 | 109 | 110 | def xyz_to_uvd_from_gt2d(xyz_jts, gt_uv_2d, image_size, root_trans, depth_factor, return_relative=False): 111 | 112 | assert xyz_jts.dim() == 3 and xyz_jts.shape[2] == 3, xyz_jts.shape 113 | assert gt_uv_2d.dim() == 3 and gt_uv_2d.shape[2] == 2, gt_uv_2d.shape 114 | xyz_jts = xyz_jts.cuda() 115 | root_trans = root_trans.cuda() 116 | uvd_jts = torch.empty_like(xyz_jts).cuda() 117 | if return_relative: 118 | # (B,K,3) - (B,1,3) 119 | xyz_jts = xyz_jts + root_trans.unsqueeze(1) 120 | assert torch.sum(torch.isnan(xyz_jts)) == 0, ('xyz_jts', xyz_jts) 121 | 122 | abs_z = xyz_jts[:, :, 2] 123 | dz = abs_z - root_trans[:, 2].unsqueeze(-1) 124 | 125 | uvd_jts[:, :, 2] = dz / depth_factor 126 | uvd_jts[:, :, 0] = gt_uv_2d[:, :, 0] / float(image_size) - 0.5 127 | uvd_jts[:, :, 1] = gt_uv_2d[:, :, 1] / float(image_size) - 0.5 128 | 129 | assert torch.sum(torch.isnan(uvd_jts)) == 0, ('uvd_jts', uvd_jts) 130 | 131 | return uvd_jts 132 | 133 | def uvz2xyz_singlepoint(uv, z, K): 134 | batch_size = uv.shape[0] 135 | assert uv.shape == (batch_size, 2) and z.shape == (batch_size,1) and K.shape == (batch_size,3,3), (uv.shape, z.shape, K.shape) 136 | inv_k = get_intrinsic_matrix_batch((K[:,0,0],K[:,1,1]), (K[:,0,2],K[:,1,2]), bsz=batch_size, inv=True) 137 | device = inv_k.device 138 | xy_unnormalized = uv * z 139 | xyz_transformed = torch.cat([xy_unnormalized, z], dim=1) 140 | xyz_transformed = xyz_transformed.to(device) 141 | assert 
xyz_transformed.shape == (batch_size, 3) and inv_k.shape == (batch_size, 3, 3) 142 | xyz = torch.matmul(inv_k, xyz_transformed.unsqueeze(-1)).squeeze(-1).cuda() 143 | return xyz 144 | 145 | def get_intrinsic_matrix_batch(f, c, bsz, inv=False): 146 | 147 | intrinsic_matrix = torch.zeros((bsz, 3, 3)).to(torch.float) 148 | 149 | if inv: 150 | intrinsic_matrix[:, 0, 0] = 1.0 / f[0].to(float) 151 | intrinsic_matrix[:, 0, 2] = - c[0].to(float) / f[0].to(float) 152 | intrinsic_matrix[:, 1, 1] = 1.0 / f[1].to(float) 153 | intrinsic_matrix[:, 1, 2] = - c[1].to(float) / f[1].to(float) 154 | intrinsic_matrix[:, 2, 2] = 1 155 | else: 156 | intrinsic_matrix[:, 0, 0] = f[0] 157 | intrinsic_matrix[:, 0, 2] = c[0] 158 | intrinsic_matrix[:, 1, 1] = f[1] 159 | intrinsic_matrix[:, 1, 2] = c[1] 160 | intrinsic_matrix[:, 2, 2] = 1 161 | 162 | return intrinsic_matrix.cuda(device=0) -------------------------------------------------------------------------------- /lib/utils/urdf_robot.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | import platform 5 | import numpy as np 6 | import pandas as pd 7 | import pyrender 8 | import torch 9 | from config import (BAXTER_DESCRIPTION_PATH, KUKA_DESCRIPTION_PATH, 10 | OWI_DESCRIPTION, OWI_KEYPOINTS_PATH, 11 | PANDA_DESCRIPTION_PATH, PANDA_DESCRIPTION_PATH_VISUAL) 12 | from dataset.const import JOINT_NAMES, LINK_NAMES 13 | from PIL import Image 14 | from utils.geometries import (quat_to_rotmat, rot6d_to_rotmat, rot9d_to_rotmat, 15 | rotmat_to_quat, rotmat_to_rot6d) 16 | from utils.mesh_renderer import RobotMeshRenderer, PandaArm 17 | from utils.urdfpytorch import URDF 18 | 19 | if platform.system() == "Linux": 20 | os.environ['PYOPENGL_PLATFORM'] = 'egl' 21 | 22 | class URDFRobot: 23 | def __init__(self,robot_type): 24 | self.robot_type = robot_type 25 | if self.robot_type == "panda": 26 | self.urdf_path = PANDA_DESCRIPTION_PATH 27 | self.urdf_path_visual = PANDA_DESCRIPTION_PATH_VISUAL 28 | self.dof = 8 29 | self.robot_for_render = PandaArm(self.urdf_path) 30 | elif self.robot_type == "kuka": 31 | self.urdf_path = KUKA_DESCRIPTION_PATH 32 | self.urdf_path_visual = KUKA_DESCRIPTION_PATH 33 | self.dof = 7 34 | self.robot_for_render = PandaArm(self.urdf_path) 35 | elif self.robot_type == "baxter": 36 | self.urdf_path = BAXTER_DESCRIPTION_PATH 37 | self.urdf_path_visual = BAXTER_DESCRIPTION_PATH 38 | self.dof = 15 39 | self.robot_for_render = PandaArm(self.urdf_path) 40 | elif self.robot_type == "owi": 41 | self.urdf_path = OWI_DESCRIPTION 42 | self.urdf_path_visual = OWI_DESCRIPTION 43 | self.dof = 4 44 | self.robot_for_render = None 45 | self.robot = URDF.load(self.urdf_path) 46 | self.robot_visual = URDF.load(self.urdf_path_visual) 47 | self.actuated_joint_names = JOINT_NAMES[self.robot_type] 48 | self.global_scale = 1.0 49 | self.device = None 50 | self.link_names, self.offsets = self.get_link_names_and_offsets() 51 | 52 | def get_link_names_and_offsets(self): 53 | if self.robot_type == "panda" or self.robot_type == "kuka": 54 | kp_offsets = torch.zeros((len(LINK_NAMES[self.robot_type]),3),dtype=torch.float).unsqueeze(0).unsqueeze(-1) * self.global_scale 55 | kp_offsets = kp_offsets.to(torch.float) 56 | return LINK_NAMES[self.robot_type], kp_offsets 57 | elif self.robot_type == "baxter": 58 | joint_name_to_joint = {joint.name: joint for joint in self.robot.joints} 59 | offsets = [] 60 | link_names = [] 61 | 
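# (Note on the Baxter branch below: each keypoint is defined through a named joint;
#  the joint's origin translation is used as the keypoint offset and the joint's
#  parent link provides the corresponding link name.)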
joint_names_for_links = [ 62 | 'torso_t0', 'right_s0','left_s0', 'right_s1', 'left_s1', 63 | 'right_e0','left_e0', 'right_e1','left_e1','right_w0', 'left_w0', 64 | 'right_w1','left_w1','right_w2', 'left_w2','right_hand','left_hand' 65 | ] 66 | for joint_name in joint_names_for_links: 67 | joint = joint_name_to_joint[joint_name] 68 | offset = joint.origin[:3, -1] 69 | link_name = joint.parent 70 | link_names.append(link_name) 71 | offsets.append(offset) 72 | kp_offsets = torch.as_tensor(np.stack(offsets)).unsqueeze(0).unsqueeze(-1) * self.global_scale 73 | kp_offsets = kp_offsets.to(torch.float) 74 | return link_names, kp_offsets 75 | elif self.robot_type == "owi": 76 | keypoint_infos = pd.read_json(OWI_KEYPOINTS_PATH) 77 | kp_offsets = torch.as_tensor(np.stack(keypoint_infos['offset'])).unsqueeze(0).unsqueeze(-1).to(torch.float) 78 | return LINK_NAMES[self.robot_type], kp_offsets 79 | else: 80 | raise(NotImplementedError) 81 | 82 | def get_keypoints(self, jointcfgs, b2c_rot, b2c_trans): 83 | 84 | # jointcfgs, b2c_rot, b2c_trans all comes in batch (as model outputs) 85 | # b2c means base to camera 86 | 87 | batch_size = b2c_rot.shape[0] 88 | if b2c_rot.shape[1] == 6: 89 | rotmat = rot6d_to_rotmat(b2c_rot) 90 | elif b2c_rot.shape[1] == 4: 91 | rotmat = quat_to_rotmat(b2c_rot) 92 | elif b2c_rot.shape[1] == 9: 93 | rotmat = rot9d_to_rotmat(b2c_rot) 94 | else: 95 | raise NotImplementedError 96 | trans = b2c_trans.unsqueeze(dim=2) 97 | pad = torch.zeros((batch_size,1,4),dtype=torch.float).cuda() 98 | base2cam = torch.cat([rotmat,trans],dim=2).cuda() 99 | base2cam = torch.cat([base2cam,pad],dim=1).cuda() 100 | base2cam[:,3,3] = 1.0 101 | base2cam = base2cam.unsqueeze(1) 102 | TWL_base = self.get_TWL(jointcfgs).cuda() 103 | TWL = base2cam @ TWL_base 104 | pts = TWL[:, :, :3, :3] @ self.offsets.cuda() + TWL[:, :, :3, [-1]] 105 | return pts.squeeze(-1) 106 | 107 | def get_TWL(self, cfgs): 108 | fk = self.robot.link_fk_batch(cfgs, use_names=True) 109 | TWL = torch.stack([fk[link] for link in self.link_names]).permute(1, 0, 2, 3) 110 | TWL[..., :3, -1] *= self.global_scale 111 | return TWL 112 | 113 | def get_rotation_at_specific_root(self, jointcfgs, b2c_rot, b2c_trans, root = 0): 114 | if root == 0: 115 | return b2c_rot 116 | batch_size = b2c_rot.shape[0] 117 | if b2c_rot.shape[1] == 6: 118 | rotmat = rot6d_to_rotmat(b2c_rot) 119 | elif b2c_rot.shape[1] == 4: 120 | rotmat = quat_to_rotmat(b2c_rot) 121 | elif b2c_rot.shape[1] == 9: 122 | rotmat = rot9d_to_rotmat(b2c_rot) 123 | else: 124 | raise NotImplementedError 125 | trans = b2c_trans.unsqueeze(dim=2) 126 | pad = torch.zeros((batch_size,1,4),dtype=torch.float).cuda() 127 | base2cam = torch.cat([rotmat,trans],dim=2).cuda() 128 | base2cam = torch.cat([base2cam,pad],dim=1).cuda() 129 | base2cam[:,3,3] = 1.0 130 | base2cam = base2cam.unsqueeze(1) 131 | TWL_base = self.get_TWL(jointcfgs).cuda() 132 | TWL = base2cam @ TWL_base 133 | assert root < TWL.shape[1], (root, TWL.shape[1]) 134 | if b2c_rot.shape[1] == 6: 135 | rotation = rotmat_to_rot6d(TWL[:, root, :3, :3]).cuda() 136 | elif b2c_rot.shape[1] == 4: 137 | rotation = rotmat_to_quat(TWL[:, root, :3, :3]).cuda() 138 | return rotation 139 | 140 | 141 | def get_keypoints_only_fk(self, jointcfgs): 142 | 143 | # only using joint angles to perform forward kinematics 144 | # the fk process is used when assuming the world frame is at the robot base, so rotation is identity and translation is origin/zeros 145 | # the output from this fk function is used for pnp process 146 | 147 | TWL = 
self.get_TWL(jointcfgs).cuda() 148 | pts = TWL[:, :, :3, :3] @ self.offsets.cuda() + TWL[:, :, :3, [-1]] 149 | return pts.squeeze(-1) 150 | 151 | def get_keypoints_only_fk_at_specific_root(self, jointcfgs, root=0): 152 | 153 | # only using joint angles to perform forward kinematics 154 | # the fk process is used when assuming the world frame is at the robot base, so rotation is identity and translation is origin/zeros 155 | # the output from this fk function is used for pnp process 156 | 157 | if root == 0: 158 | return self.get_keypoints_only_fk(jointcfgs) 159 | else: 160 | assert root > 0 and root < len(self.link_names) 161 | 162 | TWL_base = self.get_TWL(jointcfgs).cuda() 163 | TWL_root_inv = torch.linalg.inv(TWL_base[:,root:root+1,:,:]) 164 | TWL = TWL_root_inv @ TWL_base 165 | pts = TWL[:, :, :3, :3] @ self.offsets.cuda() + TWL[:, :, :3, [-1]] 166 | return pts.squeeze(-1) 167 | 168 | 169 | def get_keypoints_root(self, jointcfgs, b2c_rot, b2c_trans, root = 0): 170 | 171 | # jointcfgs, b2c_rot, b2c_trans all comes in batch (as model outputs) 172 | # b2c here means *** root *** to camera 173 | 174 | if root == 0: 175 | return self.get_keypoints(jointcfgs, b2c_rot, b2c_trans) 176 | else: 177 | assert root > 0 and root < len(self.link_names) 178 | 179 | batch_size = b2c_rot.shape[0] 180 | if b2c_rot.shape[1] == 6: 181 | rotmat = rot6d_to_rotmat(b2c_rot) 182 | elif b2c_rot.shape[1] == 4: 183 | rotmat = quat_to_rotmat(b2c_rot) 184 | elif b2c_rot.shape[1] == 9: 185 | rotmat = rot9d_to_rotmat(b2c_rot) 186 | else: 187 | raise NotImplementedError 188 | trans = b2c_trans.unsqueeze(dim=2) 189 | pad = torch.zeros((batch_size,1,4),dtype=torch.float).cuda() 190 | base2cam = torch.cat([rotmat,trans],dim=2).cuda() 191 | base2cam = torch.cat([base2cam,pad],dim=1).cuda() 192 | base2cam[:,3,3] = 1.0 193 | base2cam = base2cam.unsqueeze(1) 194 | TWL_base = self.get_TWL(jointcfgs).cuda() 195 | TWL_root_inv = torch.linalg.inv(TWL_base[:,root:root+1,:,:]) 196 | TWL_base = TWL_root_inv @ TWL_base 197 | TWL = base2cam @ TWL_base 198 | pts = TWL[:, :, :3, :3] @ self.offsets.cuda() + TWL[:, :, :3, [-1]] 199 | return pts.squeeze(-1) 200 | 201 | def set_robot_renderer(self, K_original, original_image_size=(480, 640), scale=0.5, device="cpu"): 202 | 203 | fx, fy, cx, cy = K_original[0,0]*scale, K_original[1,1]*scale, K_original[0,2]*scale, K_original[1,2]*scale 204 | image_size = (int(original_image_size[0]*scale), int(original_image_size[1]*scale)) 205 | 206 | base_dir = os.path.dirname(self.urdf_path) 207 | 208 | mesh_files = [ 209 | base_dir + "/meshes/visual/link0/link0.obj", 210 | base_dir + "/meshes/visual/link1/link1.obj", 211 | base_dir + "/meshes/visual/link2/link2.obj", 212 | base_dir + "/meshes/visual/link3/link3.obj", 213 | base_dir + "/meshes/visual/link4/link4.obj", 214 | base_dir + "/meshes/visual/link5/link5.obj", 215 | base_dir + "/meshes/visual/link6/link6.obj", 216 | base_dir + "/meshes/visual/link7/link7.obj", 217 | base_dir + "/meshes/visual/hand/hand.obj", 218 | ] 219 | 220 | focal_length = [-fx,-fy] 221 | principal_point = [cx, cy] 222 | 223 | robot_renderer = RobotMeshRenderer( 224 | focal_length=focal_length, principal_point=principal_point, image_size=image_size, 225 | robot=self.robot_for_render, mesh_files=mesh_files, device=device) 226 | 227 | return robot_renderer 228 | 229 | def get_robot_mesh_list(self, joint_angles, renderer): 230 | 231 | robot_meshes = [] 232 | for joint_angle in joint_angles: 233 | if self.robot_type == "panda": 234 | joints = joint_angle[:-1].detach().cpu() 235 | 
else: 236 | joints = joint_angle.detach().cpu() 237 | robot_mesh = renderer.get_robot_mesh(joints).cuda() 238 | robot_meshes.append(robot_mesh) 239 | 240 | return robot_meshes 241 | 242 | def get_rendered_mask_single_image(self, rot, trans, robot_mesh, robot_renderer_gpu): 243 | 244 | R = rot6d_to_rotmat(rot).reshape(1,3,3) 245 | R = torch.transpose(R,1,2).cuda() 246 | 247 | T = trans.reshape(1,3).cuda() 248 | 249 | if T[0,-1] < 0: 250 | rendered_image = robot_renderer_gpu.silhouette_renderer(meshes_world=robot_mesh, R = -R, T = -T) 251 | else: 252 | rendered_image = robot_renderer_gpu.silhouette_renderer(meshes_world=robot_mesh, R = R, T = T) 253 | 254 | if torch.isnan(rendered_image).any(): 255 | rendered_image = torch.nan_to_num(rendered_image) 256 | 257 | return rendered_image[..., 3] 258 | 259 | def get_rendered_mask_single_image_at_specific_root(self, joint_angles, rot, trans, robot_mesh, robot_renderer_gpu, root=0): 260 | 261 | if root == 0: 262 | return self.get_rendered_mask_single_image(rot, trans, robot_mesh, robot_renderer_gpu) 263 | else: 264 | rotmat = rot6d_to_rotmat(rot).cuda() 265 | trans = trans.unsqueeze(dim=1).cuda() 266 | pad = torch.zeros((1,4),dtype=torch.float).cuda() 267 | base2cam = torch.cat([rotmat,trans],dim=1).cuda() 268 | base2cam = torch.cat([base2cam,pad],dim=0).cuda() 269 | base2cam[3,3] = 1.0 270 | TWL_base = self.get_TWL(joint_angles.unsqueeze(0)).cuda().detach() # detach joint with rot/trans 271 | TWL_root_inv = torch.linalg.inv(TWL_base[:,root:root+1,:,:]).squeeze() 272 | new_base2cam = base2cam @ TWL_root_inv 273 | new_rot = rotmat_to_rot6d(new_base2cam[:3,:3]) 274 | new_trans = new_base2cam[:3,3] 275 | return self.get_rendered_mask_single_image(new_rot, new_trans, robot_mesh, robot_renderer_gpu) 276 | 277 | def get_textured_rendering(self, joint, rot, trans, intrinsics=(320, 320, 320, 240), save_path=(None,None,None), original_image=None, root=0): 278 | 279 | if root != 0: 280 | rotmat = rot6d_to_rotmat(rot) 281 | trans = trans.unsqueeze(dim=1) 282 | pad = torch.zeros((1,4),dtype=torch.float) 283 | base2cam = torch.cat([rotmat,trans],dim=1) 284 | base2cam = torch.cat([base2cam,pad],dim=0) 285 | base2cam[3,3] = 1.0 286 | TWL_base = self.get_TWL(joint.unsqueeze(0)) 287 | TWL_root_inv = torch.linalg.inv(TWL_base[:,root:root+1,:,:]).squeeze() 288 | new_base2cam = base2cam @ TWL_root_inv 289 | rot = rotmat_to_rot6d(new_base2cam[:3,:3]) 290 | trans = new_base2cam[:3,3] 291 | 292 | save_path1, save_path2, save_path3 = save_path 293 | rotmat = rot6d_to_rotmat(rot) 294 | trans = trans.unsqueeze(dim=1) 295 | pad = torch.zeros((1,4),dtype=torch.float) 296 | camera_pose = torch.cat([rotmat,trans],dim=1) 297 | camera_pose = torch.cat([camera_pose,pad],dim=0) 298 | camera_pose[3,3] = 1.0 299 | joint = joint.numpy() 300 | camera_pose = camera_pose.numpy() 301 | rotation = np.array([[1,0,0,0], 302 | [0,-1,0,0], 303 | [0,0,-1,0], 304 | [0,0,0,1]]) 305 | camera_pose = np.matmul(rotation,camera_pose) 306 | camera_pose = np.linalg.inv(camera_pose) 307 | fk = self.robot_visual.visual_trimesh_fk(cfg=joint) 308 | scene = pyrender.Scene() 309 | camera = pyrender.IntrinsicsCamera(*intrinsics) 310 | light = pyrender.DirectionalLight(color=[1.0, 1.0, 1.0], intensity=3) 311 | light_pose = np.eye(4) 312 | light_pose[:3, 3] = np.array([0, -1, 1]) 313 | scene.add(light, pose=light_pose) 314 | light_pose[:3, 3] = np.array([0, 1, 1]) 315 | scene.add(light, pose=light_pose) 316 | light_pose[:3, 3] = np.array([1, 1, 2]) 317 | scene.add(light, pose=light_pose) 318 | for tm in fk: 319 | 
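
`get_textured_rendering` converts the predicted OpenCV-style base-to-camera extrinsic into the OpenGL-style camera-to-world pose that pyrender expects, via the `diag(1, -1, -1, 1)` flip followed by a matrix inverse. A minimal sketch of just that conversion (the function and variable names below are ours):

```python
import numpy as np

# OpenCV camera: +x right, +y down, +z forward.
# OpenGL / pyrender camera: +x right, +y up, looks down -z.
CV_TO_GL = np.diag([1.0, -1.0, -1.0, 1.0])

def pyrender_camera_pose(T_cam_from_world: np.ndarray) -> np.ndarray:
    """Turn an OpenCV-style world-to-camera extrinsic into the camera-to-world
    pose expected by scene.add(camera, pose=...)."""
    return np.linalg.inv(CV_TO_GL @ T_cam_from_world)

T = np.eye(4)
T[2, 3] = 2.0                      # world origin sits 2 m in front of the camera
pose = pyrender_camera_pose(T)
print(pose[:3, 3])                 # camera centre in world coordinates: [0, 0, -2]
```
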
pose = fk[tm] 320 | mesh = pyrender.Mesh.from_trimesh(tm, smooth=False) 321 | scene.add(mesh, pose=pose) 322 | scene.add(camera, pose=camera_pose) 323 | scene.add(light, pose=camera_pose) 324 | renderer = pyrender.OffscreenRenderer(viewport_width=640, viewport_height=480) 325 | color, depth = renderer.render(scene) 326 | rendered_img = Image.fromarray(np.uint8(color)).convert("RGBA") 327 | rendered_img.save(save_path1) 328 | original_img = Image.fromarray(np.transpose(np.uint8(original_image),(1,2,0))).convert("RGBA") 329 | original_img.save(save_path2) 330 | blend_ratio = 0.7 331 | blended_image = Image.blend(original_img, rendered_img, blend_ratio) 332 | blended_image.save(save_path3) 333 | 334 | def get_textured_rendering_individual(self, joint, rot, trans, intrinsics=(320, 320, 320, 240), root=0): 335 | 336 | if root != 0: 337 | rotmat = rot6d_to_rotmat(rot) 338 | trans = trans.unsqueeze(dim=1) 339 | pad = torch.zeros((1,4),dtype=torch.float) 340 | base2cam = torch.cat([rotmat,trans],dim=1) 341 | base2cam = torch.cat([base2cam,pad],dim=0) 342 | base2cam[3,3] = 1.0 343 | TWL_base = self.get_TWL(joint.unsqueeze(0)) 344 | TWL_root_inv = torch.linalg.inv(TWL_base[:,root:root+1,:,:]).squeeze() 345 | new_base2cam = base2cam @ TWL_root_inv 346 | rot = rotmat_to_rot6d(new_base2cam[:3,:3]) 347 | trans = new_base2cam[:3,3] 348 | 349 | rotmat = rot6d_to_rotmat(rot) 350 | trans = trans.unsqueeze(dim=1) 351 | pad = torch.zeros((1,4),dtype=torch.float) 352 | camera_pose = torch.cat([rotmat,trans],dim=1) 353 | camera_pose = torch.cat([camera_pose,pad],dim=0) 354 | camera_pose[3,3] = 1.0 355 | joint = joint.numpy() 356 | camera_pose = camera_pose.numpy() 357 | rotation = np.array([[1,0,0,0], 358 | [0,-1,0,0], 359 | [0,0,-1,0], 360 | [0,0,0,1]]) 361 | camera_pose = np.matmul(rotation,camera_pose) 362 | camera_pose = np.linalg.inv(camera_pose) 363 | fk = self.robot_visual.visual_trimesh_fk(cfg=joint) 364 | scene = pyrender.Scene() 365 | camera = pyrender.IntrinsicsCamera(*intrinsics) 366 | # azure 367 | light = pyrender.PointLight(color=[1.5, 1.5, 1.5], intensity=2.6) 368 | # realsense, kinect 369 | # light = pyrender.PointLight(color=[1.4, 1.4, 1.4], intensity=2.4) 370 | # light_pose = np.eye(4) 371 | # light_pose[:3, 3] = np.array([0, 1, 0]) 372 | # scene.add(light, pose=light_pose) 373 | # orb 374 | # light = pyrender.PointLight(color=[1.4, 1.4, 1.4], intensity=2.4) 375 | # light_pose = np.eye(4) 376 | # light_pose[:3, 3] = np.array([0, -1, 0]) 377 | # scene.add(light, pose=light_pose) 378 | 379 | for tm in fk: 380 | pose = fk[tm] 381 | mesh = pyrender.Mesh.from_trimesh(tm, smooth=False) 382 | scene.add(mesh, pose=pose) 383 | scene.add(camera, pose=camera_pose) 384 | scene.add(light, pose=camera_pose) 385 | renderer = pyrender.OffscreenRenderer(viewport_width=640, viewport_height=480) 386 | color, depth = renderer.render(scene) 387 | rendered_img = Image.fromarray(np.uint8(color)) 388 | return rendered_img -------------------------------------------------------------------------------- /lib/utils/urdfpytorch/__init__.py: -------------------------------------------------------------------------------- 1 | from .urdf import (URDFType, 2 | Box, Cylinder, Sphere, Mesh, Geometry, 3 | Texture, Material, 4 | Collision, Visual, Inertial, 5 | JointCalibration, JointDynamics, JointLimit, JointMimic, 6 | SafetyController, Actuator, TransmissionJoint, 7 | Transmission, Joint, Link, URDF) 8 | from .utils import (rpy_to_matrix, matrix_to_rpy, xyz_rpy_to_matrix, 9 | matrix_to_xyz_rpy) 10 | from .version import 
__version__ 11 | 12 | __all__ = [ 13 | 'URDFType', 'Box', 'Cylinder', 'Sphere', 'Mesh', 'Geometry', 14 | 'Texture', 'Material', 'Collision', 'Visual', 'Inertial', 15 | 'JointCalibration', 'JointDynamics', 'JointLimit', 'JointMimic', 16 | 'SafetyController', 'Actuator', 'TransmissionJoint', 17 | 'Transmission', 'Joint', 'Link', 'URDF', 18 | 'rpy_to_matrix', 'matrix_to_rpy', 'xyz_rpy_to_matrix', 'matrix_to_xyz_rpy', 19 | '__version__' 20 | ] 21 | -------------------------------------------------------------------------------- /lib/utils/urdfpytorch/utils.py: -------------------------------------------------------------------------------- 1 | """Utilities for URDF parsing. 2 | """ 3 | import os 4 | from pathlib import Path 5 | 6 | from lxml import etree as ET 7 | import numpy as np 8 | import trimesh 9 | 10 | 11 | def resolve_package_path(urdf_path, mesh_path): 12 | urdf_path = Path(urdf_path) 13 | search_dir = urdf_path 14 | relative_path = Path(str(mesh_path).replace('package://', '')) 15 | while True: 16 | absolute_path = (search_dir / relative_path) 17 | if absolute_path.exists(): 18 | return absolute_path 19 | search_dir = search_dir.parent 20 | 21 | 22 | def rpy_to_matrix(coords): 23 | """Convert roll-pitch-yaw coordinates to a 3x3 homogenous rotation matrix. 24 | 25 | The roll-pitch-yaw axes in a typical URDF are defined as a 26 | rotation of ``r`` radians around the x-axis followed by a rotation of 27 | ``p`` radians around the y-axis followed by a rotation of ``y`` radians 28 | around the z-axis. These are the Z1-Y2-X3 Tait-Bryan angles. See 29 | Wikipedia_ for more information. 30 | 31 | .. _Wikipedia: https://en.wikipedia.org/wiki/Euler_angles#Rotation_matrix 32 | 33 | Parameters 34 | ---------- 35 | coords : (3,) float 36 | The roll-pitch-yaw coordinates in order (x-rot, y-rot, z-rot). 37 | 38 | Returns 39 | ------- 40 | R : (3,3) float 41 | The corresponding homogenous 3x3 rotation matrix. 42 | """ 43 | coords = np.asanyarray(coords, dtype=np.float64) 44 | c3, c2, c1 = np.cos(coords) 45 | s3, s2, s1 = np.sin(coords) 46 | 47 | return np.array([ 48 | [c1 * c2, (c1 * s2 * s3) - (c3 * s1), (s1 * s3) + (c1 * c3 * s2)], 49 | [c2 * s1, (c1 * c3) + (s1 * s2 * s3), (c3 * s1 * s2) - (c1 * s3)], 50 | [-s2, c2 * s3, c2 * c3] 51 | ], dtype=np.float64) 52 | 53 | 54 | def matrix_to_rpy(R, solution=1): 55 | """Convert a 3x3 transform matrix to roll-pitch-yaw coordinates. 56 | 57 | The roll-pitchRyaw axes in a typical URDF are defined as a 58 | rotation of ``r`` radians around the x-axis followed by a rotation of 59 | ``p`` radians around the y-axis followed by a rotation of ``y`` radians 60 | around the z-axis. These are the Z1-Y2-X3 Tait-Bryan angles. See 61 | Wikipedia_ for more information. 62 | 63 | .. _Wikipedia: https://en.wikipedia.org/wiki/Euler_angles#Rotation_matrix 64 | 65 | There are typically two possible roll-pitch-yaw coordinates that could have 66 | created a given rotation matrix. Specify ``solution=1`` for the first one 67 | and ``solution=2`` for the second one. 68 | 69 | Parameters 70 | ---------- 71 | R : (3,3) float 72 | A 3x3 homogenous rotation matrix. 73 | solution : int 74 | Either 1 or 2, indicating which solution to return. 75 | 76 | Returns 77 | ------- 78 | coords : (3,) float 79 | The roll-pitch-yaw coordinates in order (x-rot, y-rot, z-rot). 
80 | """ 81 | R = np.asanyarray(R, dtype=np.float64) 82 | r = 0.0 83 | p = 0.0 84 | y = 0.0 85 | 86 | if np.abs(R[2,0]) >= 1.0 - 1e-12: 87 | y = 0.0 88 | if R[2,0] < 0: 89 | p = np.pi / 2 90 | r = np.arctan2(R[0,1], R[0,2]) 91 | else: 92 | p = -np.pi / 2 93 | r = np.arctan2(-R[0,1], -R[0,2]) 94 | else: 95 | if solution == 1: 96 | p = -np.arcsin(R[2,0]) 97 | else: 98 | p = np.pi + np.arcsin(R[2,0]) 99 | r = np.arctan2(R[2,1] / np.cos(p), R[2,2] / np.cos(p)) 100 | y = np.arctan2(R[1,0] / np.cos(p), R[0,0] / np.cos(p)) 101 | 102 | return np.array([r, p, y], dtype=np.float64) 103 | 104 | 105 | def matrix_to_xyz_rpy(matrix): 106 | """Convert a 4x4 homogenous matrix to xyzrpy coordinates. 107 | 108 | Parameters 109 | ---------- 110 | matrix : (4,4) float 111 | The homogenous transform matrix. 112 | 113 | Returns 114 | ------- 115 | xyz_rpy : (6,) float 116 | The xyz_rpy vector. 117 | """ 118 | xyz = matrix[:3,3] 119 | rpy = matrix_to_rpy(matrix[:3,:3]) 120 | return np.hstack((xyz, rpy)) 121 | 122 | 123 | def xyz_rpy_to_matrix(xyz_rpy): 124 | """Convert xyz_rpy coordinates to a 4x4 homogenous matrix. 125 | 126 | Parameters 127 | ---------- 128 | xyz_rpy : (6,) float 129 | The xyz_rpy vector. 130 | 131 | Returns 132 | ------- 133 | matrix : (4,4) float 134 | The homogenous transform matrix. 135 | """ 136 | matrix = np.eye(4, dtype=np.float64) 137 | matrix[:3,3] = xyz_rpy[:3] 138 | matrix[:3,:3] = rpy_to_matrix(xyz_rpy[3:]) 139 | return matrix 140 | 141 | 142 | def parse_origin(node): 143 | """Find the ``origin`` subelement of an XML node and convert it 144 | into a 4x4 homogenous transformation matrix. 145 | 146 | Parameters 147 | ---------- 148 | node : :class`lxml.etree.Element` 149 | An XML node which (optionally) has a child node with the ``origin`` 150 | tag. 151 | 152 | Returns 153 | ------- 154 | matrix : (4,4) float 155 | The 4x4 homogneous transform matrix that corresponds to this node's 156 | ``origin`` child. Defaults to the identity matrix if no ``origin`` 157 | child was found. 158 | """ 159 | matrix = np.eye(4, dtype=np.float64) 160 | origin_node = node.find('origin') 161 | if origin_node is not None: 162 | if 'xyz' in origin_node.attrib: 163 | matrix[:3,3] = np.fromstring(origin_node.attrib['xyz'], sep=' ') 164 | if 'rpy' in origin_node.attrib: 165 | rpy = np.fromstring(origin_node.attrib['rpy'], sep=' ') 166 | matrix[:3,:3] = rpy_to_matrix(rpy) 167 | return matrix 168 | 169 | 170 | def unparse_origin(matrix): 171 | """Turn a 4x4 homogenous matrix into an ``origin`` XML node. 172 | 173 | Parameters 174 | ---------- 175 | matrix : (4,4) float 176 | The 4x4 homogneous transform matrix to convert into an ``origin`` 177 | XML node. 178 | 179 | Returns 180 | ------- 181 | node : :class`lxml.etree.Element` 182 | An XML node whose tag is ``origin``. The node has two attributes: 183 | 184 | - ``xyz`` - A string with three space-delimited floats representing 185 | the translation of the origin. 186 | - ``rpy`` - A string with three space-delimited floats representing 187 | the rotation of the origin. 188 | """ 189 | node = ET.Element('origin') 190 | node.attrib['xyz'] = '{} {} {}'.format(*matrix[:3,3]) 191 | node.attrib['rpy'] = '{} {} {}'.format(*matrix_to_rpy(matrix[:3,:3])) 192 | return node 193 | 194 | 195 | def get_filename(base_path, file_path, makedirs=False): 196 | """Formats a file path correctly for URDF loading. 197 | 198 | Parameters 199 | ---------- 200 | base_path : str 201 | The base path to the URDF's folder. 202 | file_path : str 203 | The path to the file. 
204 | makedirs : bool, optional 205 | If ``True``, the directories leading to the file will be created 206 | if needed. 207 | 208 | Returns 209 | ------- 210 | resolved : str 211 | The resolved filepath -- just the normal ``file_path`` if it was an 212 | absolute path, otherwise that path joined to ``base_path``. 213 | """ 214 | # print(base_path) 215 | # print(file_path) 216 | fn = file_path 217 | if not os.path.isabs(file_path): 218 | fn = os.path.join(base_path, file_path) 219 | if makedirs: 220 | d, _ = os.path.split(fn) 221 | if not os.path.exists(d): 222 | os.makedirs(d) 223 | if not Path(fn).exists(): 224 | fn = str(resolve_package_path(base_path, file_path)) 225 | return fn 226 | 227 | 228 | def load_meshes(filename): 229 | """Loads triangular meshes from a file. 230 | 231 | Parameters 232 | ---------- 233 | filename : str 234 | Path to the mesh file. 235 | 236 | Returns 237 | ------- 238 | meshes : list of :class:`~trimesh.base.Trimesh` 239 | The meshes loaded from the file. 240 | """ 241 | meshes = trimesh.load(filename) 242 | 243 | # If we got a scene, dump the meshes 244 | if isinstance(meshes, trimesh.Scene): 245 | meshes = list(meshes.dump()) 246 | meshes = [g for g in meshes if isinstance(g, trimesh.Trimesh)] 247 | 248 | if isinstance(meshes, (list, tuple, set)): 249 | meshes = list(meshes) 250 | if len(meshes) == 0: 251 | raise ValueError('At least one mesh must be pmeshesent in file') 252 | for r in meshes: 253 | if not isinstance(r, trimesh.Trimesh): 254 | raise TypeError('Could not load meshes from file') 255 | elif isinstance(meshes, trimesh.Trimesh): 256 | meshes = [meshes] 257 | else: 258 | raise ValueError('Unable to load mesh from file') 259 | 260 | return meshes 261 | 262 | 263 | def configure_origin(value): 264 | """Convert a value into a 4x4 transform matrix. 265 | 266 | Parameters 267 | ---------- 268 | value : None, (6,) float, or (4,4) float 269 | The value to turn into the matrix. 270 | If (6,), interpreted as xyzrpy coordinates. 271 | 272 | Returns 273 | ------- 274 | matrix : (4,4) float or None 275 | The created matrix. 
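
`load_meshes` always returns a plain list of `trimesh.Trimesh` objects, flattening a multi-geometry `trimesh.Scene` when necessary. A tiny usage sketch; the mesh path is illustrative only and assumes the Panda description package has been placed under `data/deps/`:

```python
# Assumes the repository root is on sys.path; the path below is illustrative.
from lib.utils.urdfpytorch.utils import load_meshes

meshes = load_meshes("data/deps/panda-description/meshes/visual/link0/link0.obj")
print(len(meshes), type(meshes[0]).__name__)   # always a list of Trimesh objects
```
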
276 | """ 277 | if value is None: 278 | value = np.eye(4, dtype=np.float64) 279 | elif isinstance(value, (list, tuple, np.ndarray)): 280 | value = np.asanyarray(value, dtype=np.float64) 281 | if value.shape == (6,): 282 | value = xyz_rpy_to_matrix(value) 283 | elif value.shape != (4,4): 284 | raise ValueError('Origin must be specified as a 4x4 ' 285 | 'homogenous transformation matrix') 286 | else: 287 | raise TypeError('Invalid type for origin, expect 4x4 matrix') 288 | return value 289 | -------------------------------------------------------------------------------- /lib/utils/urdfpytorch/version.py: -------------------------------------------------------------------------------- 1 | __version__ = '0.0.19' 2 | -------------------------------------------------------------------------------- /lib/utils/utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | import random 5 | import shutil 6 | from collections import OrderedDict, defaultdict 7 | from lib.dataset.multiepoch_dataloader import MultiEpochDataLoader 8 | from lib.dataset.samplers import PartialSampler 9 | from pathlib import Path 10 | import numpy as np 11 | import torch 12 | from lib.dataset.dream import DreamDataset 13 | from torch.utils.data import DataLoader 14 | from torch.utils.tensorboard import SummaryWriter 15 | from torchnet.meter import AverageValueMeter 16 | from tqdm import tqdm 17 | 18 | def cast(obj, device, dtype=None): 19 | 20 | if isinstance(obj, (dict, OrderedDict)): 21 | for k, v in obj.items(): 22 | if v is None: 23 | continue 24 | obj[k] = cast(torch.as_tensor(v),device) 25 | if dtype is not None: 26 | obj[k] = obj[k].to(dtype) 27 | return obj 28 | else: 29 | return obj.to(device) 30 | 31 | 32 | def set_random_seed(seed): 33 | 34 | random.seed(seed) 35 | np.random.seed(seed) 36 | torch.manual_seed(seed) 37 | torch.cuda.manual_seed(seed) 38 | 39 | 40 | def copy_and_rename(src_path, dest_path, new_filename): 41 | 42 | src_path = Path(src_path) 43 | dest_path = Path(dest_path) 44 | shutil.copy(src_path, dest_path) 45 | src_filename = src_path.name 46 | dest_filepath = dest_path / new_filename 47 | (dest_path / src_filename).replace(dest_filepath) 48 | 49 | 50 | def create_logger(args): 51 | 52 | save_folder = os.path.join('experiments', args.exp_name) 53 | ckpt_folder = os.path.join(save_folder, 'ckpt') 54 | log_folder = os.path.join(save_folder, 'log') 55 | os.makedirs(ckpt_folder, exist_ok=True) 56 | os.makedirs(log_folder, exist_ok=True) 57 | writer = SummaryWriter(log_dir=log_folder) 58 | copy_and_rename(args.config_path, save_folder, "config.yaml") 59 | 60 | return save_folder, ckpt_folder, log_folder, writer 61 | 62 | 63 | def get_dataloaders(args): 64 | 65 | urdf_robot_name = args.urdf_robot_name 66 | train_ds_names = args.train_ds_names 67 | test_ds_name_dr = train_ds_names.replace("train_dr","test_dr") 68 | if urdf_robot_name != "baxter": 69 | test_ds_name_photo = train_ds_names.replace("train_dr","test_photo") 70 | if urdf_robot_name == "panda": 71 | 72 | test_ds_name_real = [train_ds_names.replace("synthetic/panda_synth_train_dr","real/panda-3cam_azure"), 73 | train_ds_names.replace("synthetic/panda_synth_train_dr","real/panda-3cam_kinect360"), 74 | train_ds_names.replace("synthetic/panda_synth_train_dr","real/panda-3cam_realsense"), 75 | train_ds_names.replace("synthetic/panda_synth_train_dr","real/panda-orb")] 76 | 77 | rootnet_hw = 
(int(args.rootnet_image_size),int(args.rootnet_image_size)) 78 | other_hw = (int(args.other_image_size),int(args.other_image_size)) 79 | ds_train = DreamDataset(train_ds_names, 80 | rootnet_resize_hw=rootnet_hw, 81 | other_resize_hw=other_hw, 82 | color_jitter=args.jitter, rgb_augmentation=args.other_aug, 83 | occlusion_augmentation=args.occlusion, occlu_p=args.occlu_p) 84 | ds_test_dr = DreamDataset(test_ds_name_dr, 85 | rootnet_resize_hw=rootnet_hw, 86 | other_resize_hw=other_hw, 87 | color_jitter=False, rgb_augmentation=False, occlusion_augmentation=False) 88 | if urdf_robot_name != "baxter": 89 | ds_test_photo = DreamDataset(test_ds_name_photo, 90 | rootnet_resize_hw=rootnet_hw, 91 | other_resize_hw=other_hw, 92 | color_jitter=False, rgb_augmentation=False, occlusion_augmentation=False) 93 | 94 | train_sampler = PartialSampler(ds_train, epoch_size=args.epoch_size) 95 | ds_iter_train = DataLoader( 96 | ds_train, 97 | sampler=train_sampler, 98 | batch_size=args.batch_size, 99 | num_workers=args.n_dataloader_workers, 100 | drop_last=False, 101 | pin_memory=True 102 | ) 103 | ds_iter_train = MultiEpochDataLoader(ds_iter_train) 104 | 105 | test_loader_dict = {} 106 | ds_iter_test_dr = DataLoader( 107 | ds_test_dr, 108 | batch_size=args.batch_size, 109 | num_workers=args.n_dataloader_workers 110 | ) 111 | test_loader_dict["dr"] = ds_iter_test_dr 112 | 113 | if urdf_robot_name != "baxter": 114 | ds_iter_test_photo = DataLoader( 115 | ds_test_photo, 116 | batch_size=args.batch_size, 117 | num_workers=args.n_dataloader_workers 118 | ) 119 | test_loader_dict["photo"] = ds_iter_test_photo 120 | 121 | if urdf_robot_name == "panda": 122 | ds_shorts = ["azure", "kinect", "realsense", "orb"] 123 | for ds_name, ds_short in zip(test_ds_name_real, ds_shorts): 124 | ds_test_real = DreamDataset(ds_name, 125 | rootnet_resize_hw=rootnet_hw, 126 | other_resize_hw=other_hw, 127 | color_jitter=False, rgb_augmentation=False, occlusion_augmentation=False, 128 | process_truncation=args.fix_truncation) 129 | ds_iter_test_real = DataLoader( 130 | ds_test_real, 131 | batch_size=args.batch_size, 132 | num_workers=args.n_dataloader_workers 133 | ) 134 | test_loader_dict[ds_short] = ds_iter_test_real 135 | 136 | print("len(ds_iter_train): ", len(ds_iter_train)) 137 | print("len(ds_iter_test_dr): ", len(ds_iter_test_dr)) 138 | if urdf_robot_name != "baxter": 139 | print("len(ds_iter_test_photo): ", len(ds_iter_test_photo)) 140 | if urdf_robot_name == "panda": 141 | for ds_short in ds_shorts: 142 | print(f"len(ds_iter_test_{ds_short}): ", len(test_loader_dict[ds_short])) 143 | 144 | return ds_iter_train, test_loader_dict 145 | 146 | 147 | def get_scheduler(args, optimizer, last_epoch): 148 | 149 | def lr_lambda_linear(epoch): 150 | if epoch < args.n_epochs_warmup: 151 | ratio = float(epoch+1)/float(args.n_epochs_warmup) 152 | elif epoch <= args.start_decay: 153 | ratio = 1.0 154 | elif epoch <= args.end_decay: 155 | ratio = (float(args.end_decay - args.final_decay * args.start_decay) - (float(1-args.final_decay) * epoch)) / float(args.end_decay - args.start_decay) 156 | else: 157 | ratio = args.final_decay 158 | return ratio 159 | 160 | def lr_lambda_exponential(epoch): 161 | base_ratio = 1.0 162 | ratio = base_ratio 163 | if epoch < args.n_epochs_warmup: 164 | ratio = float(epoch+1)/float(args.n_epochs_warmup) 165 | elif epoch <= args.start_decay: 166 | ratio = base_ratio 167 | elif epoch <= args.end_decay: 168 | ratio = (args.exponent)**(epoch-args.start_decay) 169 | else: 170 | ratio = 
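
To make the piecewise learning-rate multipliers above concrete, here is the linear warmup/plateau/decay variant evaluated at a few epochs, restated so it can be run standalone. The hyperparameters are illustrative and not the values shipped in `configs/`; the multiplier is what `LambdaLR` applies on top of `args.lr`.

```python
# Illustrative hyperparameters (not the shipped config values):
warmup, start_decay, end_decay, final_decay = 2, 10, 20, 0.1

def linear_multiplier(epoch):
    if epoch < warmup:
        return (epoch + 1) / warmup
    if epoch <= start_decay:
        return 1.0
    if epoch <= end_decay:
        return ((end_decay - final_decay * start_decay)
                - (1 - final_decay) * epoch) / (end_decay - start_decay)
    return final_decay

print([round(linear_multiplier(e), 2) for e in (0, 1, 5, 10, 15, 20, 25)])
# -> [0.5, 1.0, 1.0, 1.0, 0.55, 0.1, 0.1]
```
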
(args.exponent)**(args.end_decay-args.start_decay) 171 | return ratio 172 | 173 | def lr_lambda_everyXepoch(epoch): 174 | ratio = (args.step_decay)**(epoch // args.step) 175 | if epoch >= args.end_decay: 176 | ratio = (args.step_decay)**(args.end_decay // args.step) 177 | return ratio 178 | 179 | if args.use_schedule: 180 | if args.schedule_type == "linear": 181 | lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer=optimizer, lr_lambda=lr_lambda_linear, last_epoch=last_epoch) 182 | elif args.schedule_type == "exponential": 183 | lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer=optimizer, lr_lambda=lr_lambda_exponential, last_epoch=last_epoch) 184 | elif args.schedule_type == "everyXepoch": 185 | lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer=optimizer, lr_lambda=lr_lambda_everyXepoch, last_epoch=last_epoch) 186 | else: 187 | lr_scheduler = None 188 | 189 | return lr_scheduler 190 | 191 | 192 | def resume_run(args, model, optimizer, device): 193 | 194 | curr_max_auc_4real = { "azure": 0.0, "kinect": 0.0, "realsense": 0.0, "orb": 0.0 } 195 | template = 'ckpt/curr_best_auc(add)_DATASET_model.pk' 196 | ckpt_paths = [template.replace("DATASET", name) for name in curr_max_auc_4real.keys()] 197 | 198 | resume_dir = os.path.join("experiments" , args.resume_experiment_name) 199 | path = os.path.join(resume_dir, 'ckpt/curr_best_auc(add)_model.pk') 200 | checkpoint = torch.load(path) 201 | state_dict = checkpoint['model_state_dict'] 202 | model.load_state_dict(state_dict) 203 | model.to(device) 204 | optimizer_dict = checkpoint['optimizer_state_dict'] 205 | optimizer.load_state_dict(optimizer_dict) 206 | for state in optimizer.state.values(): 207 | for k, v in state.items(): 208 | if isinstance(v, torch.Tensor): 209 | state[k] = v.to(device) 210 | 211 | start_epoch = checkpoint['epoch']+1 212 | last_epoch = checkpoint['lr_scheduler_last_epoch'] 213 | curr_max_auc = checkpoint["auc_add"] 214 | 215 | for postfix, dsname in zip(ckpt_paths, curr_max_auc_4real.keys()): 216 | model_path = os.path.join(resume_dir, postfix) 217 | ckpt = torch.load(model_path) 218 | curr_max_auc_onreal = ckpt["auc_add"] 219 | curr_max_auc_4real[dsname] = curr_max_auc_onreal 220 | 221 | return start_epoch, last_epoch, curr_max_auc, curr_max_auc_4real 222 | 223 | 224 | def save_checkpoint(args, auc_adds, model, optimizer, ckpt_folder, epoch, lr_scheduler, curr_max_auc, curr_max_auc_4real): 225 | 226 | save_path_dr = os.path.join(ckpt_folder, 'curr_best_auc(add)_model.pk') 227 | save_path_azure = os.path.join(ckpt_folder, 'curr_best_auc(add)_azure_model.pk') 228 | save_path_kinect = os.path.join(ckpt_folder, 'curr_best_auc(add)_kinect_model.pk') 229 | save_path_realsense = os.path.join(ckpt_folder, 'curr_best_auc(add)_realsense_model.pk') 230 | save_path_orb = os.path.join(ckpt_folder, 'curr_best_auc(add)_orb_model.pk') 231 | save_path = {"azure":save_path_azure, "kinect":save_path_kinect, "realsense":save_path_realsense, "orb":save_path_orb} 232 | saves = {"dr":True, "azure":True, "kinect":True, "realsense":True, "orb":True } 233 | if os.path.exists(save_path_dr): 234 | ckpt = torch.load(save_path_dr) 235 | if epoch <= ckpt["epoch"]: # prevent better model got covered during cluster rebooting 236 | saves["dr"] = False 237 | for real_name in ["azure", "kinect", "realsense", "orb"]: 238 | if os.path.exists(save_path[real_name]): 239 | ckpt_real = torch.load(save_path[real_name]) 240 | if epoch <= ckpt_real["epoch"]: # prevent better model got covered during cluster rebooting 241 | saves[real_name] = 
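
`resume_run` above calls `torch.load(path)` directly, which restores tensors onto the devices they were saved from. When inspecting or resuming a checkpoint on a machine without the original GPU layout, passing `map_location` is the usual workaround. A hedged sketch: the experiment name is a placeholder, and `model`/`optimizer` are assumed to have been constructed as in `train_full.py`; the keys match what `save_checkpoint` writes.

```python
import torch

# "my_exp" is a placeholder experiment name.
ckpt = torch.load("experiments/my_exp/ckpt/curr_best_auc(add)_model.pk",
                  map_location="cpu")
print(sorted(ckpt.keys()))
# ['auc_add', 'epoch', 'lr_scheduler_last_epoch', 'model_state_dict', 'optimizer_state_dict']
model.load_state_dict(ckpt["model_state_dict"])
optimizer.load_state_dict(ckpt["optimizer_state_dict"])
start_epoch = ckpt["epoch"] + 1
```
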
False 242 | 243 | if saves["dr"]: 244 | if auc_adds["dr"] > curr_max_auc: 245 | curr_max_auc = auc_adds["dr"] 246 | last_epoch = lr_scheduler.last_epoch if args.use_schedule else -1 247 | torch.save({ 248 | 'epoch': epoch, 249 | 'auc_add': curr_max_auc, 250 | 'model_state_dict': model.state_dict(), 251 | 'optimizer_state_dict': optimizer.state_dict(), 252 | 'lr_scheduler_last_epoch':last_epoch, 253 | }, save_path_dr) 254 | 255 | if args.urdf_robot_name == "panda": 256 | for real_name in ["azure", "kinect", "realsense", "orb"]: 257 | if saves[real_name]: 258 | if auc_adds[real_name] > curr_max_auc_4real[real_name]: 259 | curr_max_auc_4real[real_name] = auc_adds[real_name] 260 | last_epoch = lr_scheduler.last_epoch if args.use_schedule else -1 261 | torch.save({ 262 | 'epoch': epoch, 263 | 'auc_add': curr_max_auc_4real[real_name], 264 | 'model_state_dict': model.state_dict(), 265 | 'optimizer_state_dict': optimizer.state_dict(), 266 | 'lr_scheduler_last_epoch':last_epoch, 267 | }, save_path[real_name]) -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==1.4.0 2 | aiofiles==22.1.0 3 | aiosqlite==0.18.0 4 | ansitable==0.9.7 5 | antlr4-python3-runtime==4.9.3 6 | anyio==3.6.2 7 | asttokens==2.2.1 8 | async-generator==1.10 9 | attrs==23.1.0 10 | backcall==0.2.0 11 | cachetools==5.3.0 12 | certifi==2023.5.7 13 | charset-normalizer==3.1.0 14 | cmake==3.26.3 15 | colorama==0.4.6 16 | colored==2.2.3 17 | comm==0.1.3 18 | contourpy==1.0.7 19 | cycler==0.11.0 20 | debugpy==1.6.7 21 | decorator==5.1.1 22 | easydict==1.10 23 | exceptiongroup==1.1.1 24 | executing==1.2.0 25 | filelock==3.12.0 26 | fonttools==4.39.4 27 | freetype-py==2.4.0 28 | google-auth==2.18.0 29 | google-auth-oauthlib==0.4.6 30 | grpcio==1.54.0 31 | h11==0.14.0 32 | HeapDict==1.0.1 33 | idna==3.4 34 | imageio==2.28.1 35 | importlib-metadata==6.6.0 36 | importlib-resources==5.12.0 37 | iopath==0.1.10 38 | ipykernel==6.23.0 39 | ipython==8.13.2 40 | jedi==0.18.2 41 | Jinja2==3.1.2 42 | joblib==1.2.0 43 | json-tricks==3.17.3 44 | jsonpatch==1.32 45 | jsonpointer==2.3 46 | jupyter_client==8.2.0 47 | jupyter_core==5.3.0 48 | kiwisolver==1.4.4 49 | kornia==0.7.0 50 | lit==16.0.3 51 | lxml==4.9.2 52 | Markdown==3.4.3 53 | MarkupSafe==2.1.2 54 | matplotlib==3.7.1 55 | matplotlib-inline==0.1.6 56 | mpmath==1.3.0 57 | nest-asyncio==1.5.6 58 | networkx==2.5 59 | numpy==1.22.4 60 | nvidia-cublas-cu11==11.10.3.66 61 | nvidia-cuda-cupti-cu11==11.7.101 62 | nvidia-cuda-nvrtc-cu11==11.7.99 63 | nvidia-cuda-runtime-cu11==11.7.99 64 | nvidia-cudnn-cu11==8.5.0.96 65 | nvidia-cufft-cu11==10.9.0.58 66 | nvidia-curand-cu11==10.2.10.91 67 | nvidia-cusolver-cu11==11.4.0.1 68 | nvidia-cusparse-cu11==11.7.4.91 69 | nvidia-nccl-cu11==2.14.3 70 | nvidia-nvtx-cu11==11.7.91 71 | oauthlib==3.2.2 72 | opencv-python==4.7.0.72 73 | outcome==1.2.0 74 | packaging==23.1 75 | pandas==1.5.3 76 | parso==0.8.3 77 | pexpect==4.8.0 78 | pgraph-python==0.6.2 79 | pickleshare==0.7.5 80 | Pillow==9.5.0 81 | pinocchio==0.3 82 | platformdirs==3.5.1 83 | portalocker==2.8.2 84 | progress==1.6 85 | prompt-toolkit==3.0.38 86 | protobuf==3.20.3 87 | psutil==5.9.5 88 | ptyprocess==0.7.0 89 | pure-eval==0.2.2 90 | pyarrow==12.0.0 91 | pyasn1==0.5.0 92 | pyasn1-modules==0.3.0 93 | pybullet==3.2.5 94 | pycocotools==2.0.7 95 | pycollada==0.6 96 | pyglet==2.0.7 97 | Pygments==2.15.1 98 | PyOpenGL==3.1.0 99 | pyparsing==3.0.9 100 | pyrender==0.1.45 101 | 
python-dateutil==2.8.2 102 | 103 | pytz==2023.3 104 | PyYAML==5.1 105 | pyzmq==25.0.2 106 | requests==2.30.0 107 | requests-oauthlib==1.3.1 108 | roboticstoolbox-python==1.0.1 109 | rsa==4.9 110 | rtb-data==1.0.1 111 | scikit-learn==1.2.2 112 | scipy==1.10.1 113 | seaborn==0.12.2 114 | shapely==2.0.1 115 | simplejson==3.17.0 116 | six==1.16.0 117 | smplx==0.1.28 118 | sniffio==1.3.0 119 | sortedcontainers==2.4.0 120 | soupsieve==2.4 121 | spatialgeometry==1.0.3 122 | spatialmath-python==1.0.5 123 | stack-data==0.6.2 124 | swift-sim==1.0.1 125 | sympy==1.12 126 | tblib==1.7.0 127 | tensorboard==2.11.2 128 | tensorboard-data-server==0.6.1 129 | tensorboard-plugin-wit==1.8.1 130 | tensorboardX==2.6.2 131 | termcolor==2.2.0 132 | terminado==0.17.1 133 | thop==0.1.1.post2209072238 134 | threadpoolctl==3.1.0 135 | tinycss2==1.2.1 136 | tomli==2.0.1 137 | toolz==0.12.0 138 | torch==1.13.1+cu117 139 | torch-summary==1.4.5 140 | torch-utils==0.1.2 141 | torchgeometry==0.1.2 142 | torchnet==0.0.4 143 | torchvision==0.14.1+cu117 144 | tornado==6.3.1 145 | tqdm==4.41.1 146 | traitlets==5.9.0 147 | transform3d==0.0.4 148 | transforms3d==0.3.1 149 | trimesh==3.18.1 150 | trio==0.22.0 151 | trio-websocket==0.9.2 152 | triton==2.0.0 153 | typing_extensions==4.5.0 154 | tzdata==2023.3 155 | 156 | uri-template==1.2.0 157 | urllib3==1.26.14 158 | visdom==0.2.3 159 | wcwidth==0.2.6 160 | webcolors==1.12 161 | webencodings==0.5.1 162 | websocket-client==1.5.0 163 | websockets==11.0.3 164 | Werkzeug==2.2.3 165 | wget==3.2 166 | wsproto==1.2.0 167 | xarray==0.14.1 168 | y-py==0.5.9 169 | ypy-websocket==0.8.2 170 | zict==2.2.0 171 | zipp==3.15.0 172 | -------------------------------------------------------------------------------- /scripts/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | import argparse 5 | import yaml 6 | from lib.config import LOCAL_DATA_DIR 7 | from lib.core.config import make_cfg 8 | from scripts.train_depthnet import train_depthnet 9 | from scripts.train_sim2real import train_sim2real 10 | from scripts.train_full import train_full 11 | 12 | 13 | if __name__ == '__main__': 14 | parser = argparse.ArgumentParser('Training') 15 | parser.add_argument('--config', '-c', type=str, required=True, default='configs/cfg.yaml', help="hyperparameters path") 16 | args = parser.parse_args() 17 | cfg = make_cfg(args) 18 | 19 | print("------------------- config for this experiment -------------------") 20 | print(cfg) 21 | print("----------------------------------------------------------------------") 22 | 23 | if cfg.use_rootnet_with_reg_int_shared_backbone: 24 | print(f"\n pipeline: full network training (JointNet/RotationNet/KeypoinNet/DepthNet) \n") 25 | train_full(cfg) 26 | 27 | elif cfg.use_rootnet: 28 | print("\n pipeline: training DepthNet only \n") 29 | train_depthnet(cfg) 30 | 31 | elif cfg.use_sim2real: 32 | print("\n pipeline: self-supervised training on real datasets \n") 33 | train_sim2real(cfg) 34 | 35 | elif cfg.use_sim2real_real: 36 | print("\n pipeline: self-supervised training on my real datasets \n") 37 | # train_sim2real_real(cfg) 38 | 39 | 40 | 41 | -------------------------------------------------------------------------------- /scripts/train_full.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | 
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) 4 | import numpy as np 5 | import torch 6 | from lib.core.function import farward_loss, validate 7 | from lib.dataset.const import INITIAL_JOINT_ANGLE 8 | from lib.models.full_net import get_rootNetwithRegInt_model 9 | from lib.utils.urdf_robot import URDFRobot 10 | from lib.utils.utils import set_random_seed, create_logger, get_dataloaders, get_scheduler, resume_run, save_checkpoint 11 | from torchnet.meter import AverageValueMeter 12 | from tqdm import tqdm 13 | 14 | 15 | def train_full(args): 16 | 17 | torch.autograd.set_detect_anomaly(True) 18 | set_random_seed(808) 19 | 20 | save_folder, ckpt_folder, log_folder, writer = create_logger(args) 21 | 22 | urdf_robot_name = args.urdf_robot_name 23 | robot = URDFRobot(urdf_robot_name) 24 | 25 | device_id = args.device_id 26 | device = torch.device("cuda" if torch.cuda.is_available() and not args.no_cuda else "cpu") 27 | 28 | ds_iter_train, test_loader_dict = get_dataloaders(args) 29 | 30 | init_param_dict = { 31 | "robot_type" : urdf_robot_name, 32 | "pose_params": INITIAL_JOINT_ANGLE, 33 | "cam_params": np.eye(4,dtype=float), 34 | "init_pose_from_mean": True 35 | } 36 | if args.use_rootnet_with_reg_int_shared_backbone: 37 | print("regression and integral shared backbone, with rootnet 2 backbones in total") 38 | model = get_rootNetwithRegInt_model(init_param_dict, args) 39 | else: 40 | assert 0 41 | 42 | optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay) 43 | 44 | curr_max_auc = 0.0 45 | curr_max_auc_4real = { "azure": 0.0, "kinect": 0.0, "realsense": 0.0, "orb": 0.0 } 46 | start_epoch, last_epoch, end_epoch = 0, -1, args.n_epochs 47 | if args.resume_run: 48 | start_epoch, last_epoch, curr_max_auc, curr_max_auc_4real = resume_run(args, model, optimizer, device) 49 | 50 | lr_scheduler = get_scheduler(args, optimizer, last_epoch) 51 | 52 | 53 | for epoch in range(start_epoch, end_epoch + 1): 54 | print('In epoch {}, script: full network training (JointNet/RotationNet/KeypoinNet/DepthNet)'.format(epoch + 1)) 55 | model.train() 56 | iterator = tqdm(ds_iter_train, dynamic_ncols=True) 57 | losses = AverageValueMeter() 58 | losses_pose, losses_rot, losses_trans, losses_uv, losses_depth, losses_error2d, losses_error3d, losses_error2d_int, losses_error3d_int, losses_error3d_align = \ 59 | AverageValueMeter(),AverageValueMeter(),AverageValueMeter(),AverageValueMeter(),AverageValueMeter(),AverageValueMeter(),AverageValueMeter(),AverageValueMeter(),AverageValueMeter(),AverageValueMeter() 60 | for batchid, sample in enumerate(iterator): 61 | optimizer.zero_grad() 62 | loss, loss_dict = farward_loss(args=args, input_batch=sample, model=model, robot=robot, device=device, device_id=device_id, train=True) 63 | loss.backward() 64 | if args.clip_gradient is not None: 65 | clipping_value = args.clip_gradient 66 | torch.nn.utils.clip_grad_norm_(model.parameters(), clipping_value) 67 | optimizer.step() 68 | losses.add(loss.detach().cpu().numpy()) 69 | losses_pose.add(loss_dict["loss_joint"].detach().cpu().numpy()) 70 | losses_rot.add(loss_dict["loss_rot"].detach().cpu().numpy()) 71 | losses_trans.add(loss_dict["loss_trans"].detach().cpu().numpy()) 72 | losses_uv.add(loss_dict["loss_uv"].detach().cpu().numpy()) 73 | losses_depth.add(loss_dict["loss_depth"].detach().cpu().numpy()) 74 | losses_error2d.add(loss_dict["loss_error2d"].detach().cpu().numpy()) 75 | losses_error3d.add(loss_dict["loss_error3d"].detach().cpu().numpy()) 76 | 
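
The training loop relies on `torchnet`'s `AverageValueMeter` to accumulate per-iteration losses between TensorBoard writes. Its minimal API, as used above:

```python
from torchnet.meter import AverageValueMeter

meter = AverageValueMeter()
for value in (1.0, 2.0, 3.0):
    meter.add(value)
print(meter.mean)   # 2.0 -- the running mean written to TensorBoard every 100 iterations
meter.reset()       # cleared after each logging window, as in the loop above
```
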
losses_error2d_int.add(loss_dict["loss_error2d_int"].detach().cpu().numpy()) 77 | losses_error3d_int.add(loss_dict["loss_error3d_int"].detach().cpu().numpy()) 78 | losses_error3d_align.add(loss_dict["loss_error3d_align"].detach().cpu().numpy()) 79 | 80 | if (batchid+1) % 100 == 0: # Every 100 mini-batches/iterations 81 | writer.add_scalar('Train/loss', losses.mean , epoch * len(ds_iter_train) + batchid + 1) 82 | writer.add_scalar('Train/pose_loss', losses_pose.mean , epoch * len(ds_iter_train) + batchid + 1) 83 | writer.add_scalar('Train/rot_loss', losses_rot.mean , epoch * len(ds_iter_train) + batchid + 1) 84 | writer.add_scalar('Train/trans_loss', losses_trans.mean , epoch * len(ds_iter_train) + batchid + 1) 85 | writer.add_scalar('Train/uv_loss', losses_uv.mean , epoch * len(ds_iter_train) + batchid + 1) 86 | writer.add_scalar('Train/depth_loss', losses_depth.mean , epoch * len(ds_iter_train) + batchid + 1) 87 | writer.add_scalar('Train/error2d_loss', losses_error2d.mean, epoch * len(ds_iter_train) + batchid + 1) 88 | writer.add_scalar('Train/error3d_loss', losses_error3d.mean, epoch * len(ds_iter_train) + batchid + 1) 89 | writer.add_scalar('Train/error2d_int_loss', losses_error2d_int.mean, epoch * len(ds_iter_train) + batchid + 1) 90 | writer.add_scalar('Train/error3d_int_loss', losses_error3d_int.mean, epoch * len(ds_iter_train) + batchid + 1) 91 | writer.add_scalar('Train/error3d_align_loss', losses_error3d_align.mean, epoch * len(ds_iter_train) + batchid + 1) 92 | losses.reset() 93 | losses_pose.reset() 94 | losses_rot.reset() 95 | losses_trans.reset() 96 | losses_uv.reset() 97 | losses_depth.reset() 98 | losses_error2d.reset() 99 | losses_error3d.reset() 100 | losses_error2d_int.reset() 101 | losses_error3d_int.reset() 102 | losses_error3d_align.reset() 103 | writer.add_scalar('LR/learning_rate_opti', optimizer.param_groups[0]['lr'], epoch * len(ds_iter_train) + batchid + 1) 104 | if len(optimizer.param_groups) > 1: 105 | for pgid in range(1,len(optimizer.param_groups)): 106 | writer.add_scalar(f'LR/learning_rate_opti_{pgid}', optimizer.param_groups[pgid]['lr'], epoch * len(ds_iter_train) + batchid + 1) 107 | if args.use_schedule: 108 | lr_scheduler.step() 109 | 110 | auc_adds = {} 111 | for dsname, loader in test_loader_dict.items(): 112 | auc_add = validate(args=args, epoch=epoch, dsname=dsname, loader=loader, model=model, 113 | robot=robot, writer=writer, device=device, device_id=device_id) 114 | auc_adds[dsname] = auc_add 115 | 116 | save_checkpoint(args=args, auc_adds=auc_adds, 117 | model=model, optimizer=optimizer, 118 | ckpt_folder=ckpt_folder, 119 | epoch=epoch, lr_scheduler=lr_scheduler, 120 | curr_max_auc=curr_max_auc, 121 | curr_max_auc_4real=curr_max_auc_4real) 122 | 123 | print("Training Finished !") 124 | writer.flush() 125 | --------------------------------------------------------------------------------
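
`validate` returns an AUC of the ADD metric per test set, which then drives checkpoint selection in `save_checkpoint`. The exact implementation lives in the repository's metric utilities and is not shown in this section; the sketch below only illustrates the standard definition (area under the ADD accuracy-vs-threshold curve for thresholds up to 0.1 m) for intuition, and is not necessarily the code used here.

```python
import numpy as np

def add_auc(add_errors_m, max_threshold=0.1, num_steps=1000):
    """Area under the ADD accuracy-vs-threshold curve for thresholds in
    [0, max_threshold] metres, normalised to [0, 1]."""
    errors = np.asarray(add_errors_m)
    thresholds = np.linspace(0.0, max_threshold, num_steps)
    accuracy = np.array([(errors <= t).mean() for t in thresholds])
    return float(np.trapz(accuracy, thresholds) / max_threshold)

print(add_auc([0.005, 0.02, 0.05, 0.2]))   # ~0.56: the sample beyond 0.1 m caps the score
```
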