├── LICENSE ├── README.md ├── ckpt └── README.md ├── common ├── camera.py ├── generator.py ├── h36m_dataset.py ├── load_data_hm36.py ├── mocap_dataset.py ├── opt.py ├── skeleton.py └── utils.py ├── demo ├── figure │ ├── lindan.jpg │ └── messi.jpg ├── lib │ ├── hrnet │ │ ├── experiments │ │ │ └── w48_384x288_adam_lr1e-3.yaml │ │ ├── gen_kpts.py │ │ └── lib │ │ │ ├── config │ │ │ ├── __init__.py │ │ │ ├── __pycache__ │ │ │ │ ├── __init__.cpython-38.pyc │ │ │ │ ├── __init__.cpython-39.pyc │ │ │ │ ├── default.cpython-38.pyc │ │ │ │ ├── default.cpython-39.pyc │ │ │ │ ├── models.cpython-38.pyc │ │ │ │ └── models.cpython-39.pyc │ │ │ ├── default.py │ │ │ └── models.py │ │ │ ├── models │ │ │ ├── __pycache__ │ │ │ │ ├── pose_hrnet.cpython-38.pyc │ │ │ │ └── pose_hrnet.cpython-39.pyc │ │ │ └── pose_hrnet.py │ │ │ └── utils │ │ │ ├── __pycache__ │ │ │ ├── coco_h36m.cpython-39.pyc │ │ │ ├── inference.cpython-39.pyc │ │ │ ├── transforms.cpython-39.pyc │ │ │ └── utilitys.cpython-39.pyc │ │ │ ├── coco_h36m.py │ │ │ ├── inference.py │ │ │ ├── transforms.py │ │ │ └── utilitys.py │ ├── preprocess.py │ ├── sort │ │ └── sort.py │ └── yolov3 │ │ ├── bbox.py │ │ ├── cfg │ │ ├── tiny-yolo-voc.cfg │ │ ├── yolo-voc.cfg │ │ ├── yolo.cfg │ │ └── yolov3.cfg │ │ ├── darknet.py │ │ ├── data │ │ ├── coco.names │ │ ├── pallete │ │ └── voc.names │ │ ├── human_detector.py │ │ ├── preprocess.py │ │ └── util.py └── vis.py ├── figure ├── README.md ├── messi_pose.png ├── structure.png └── wild.png ├── main.py ├── model ├── Block.py ├── GCN_conv.py ├── Transformer.py ├── __pycache__ │ ├── Block.cpython-39.pyc │ ├── GCN_conv.cpython-39.pyc │ ├── Transformer.cpython-39.pyc │ └── trans.cpython-39.pyc ├── post_refine.py ├── refine.py └── trans.py ├── requirement.txt └── runs └── README.md /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 vefalun 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # HTNet: Human Topology Aware Network for 3D Human Pose Estimation 2 | 3 |

4 | 
5 | > [**HTNet: Human Topology Aware Network for 3D Human Pose Estimation**](https://arxiv.org/pdf/2302.09790), 
6 | > Jialun Cai, Hong Liu, Runwei Ding, Wenhao Li, Jianbing Wu, Miaoju Ban 
7 | > *In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023* 
8 | 
9 | 
10 | ## Results on Human3.6M 
11 | 
12 | Protocol #1 (mean per-joint position error), using 2D keypoints detected by CPN and ground-truth 2D poses as input. 
13 | 
14 | | Method | Train Epochs | MPJPE (CPN) | MPJPE (GT) | 
15 | |:-------|:-------:|:-------:|:-------:| 
16 | | GraFormer | 50 | 51.8 mm | 35.2 mm | 
17 | | MGCN (w/refine) | 50 | 49.4 mm | 37.4 mm | 
18 | | HTNet | 15 | 48.9 mm | 34.0 mm | 
19 | | HTNet (w/refine) | **15** | **47.6 mm** | **31.9 mm** | 
20 | 
21 | 
22 | ## Get started directly 
23 | Special thanks to [MHFormer](https://github.com/Vegetebird/MHFormer); building on it, we provide a **beginner's guide** for image-based pose estimation. 
24 | Only three steps are needed to generate poses for your own images: (1) download the pretrained 2D models (YOLOv3 and HRNet) [here](https://drive.google.com/drive/folders/1LX5zhZGlZjckgfpNroWsuu84xyyFYE5X) and put them in './demo/lib/checkpoint'; (2) download the [pretrained HTNet model](https://drive.google.com/drive/folders/134lqqu-0I6aOYr7lRufa6fMTdqm7K9Qk) and put it in the './ckpt' directory; (3) 
25 | put your own images in './demo/figure' and run: 
26 | ``` 
27 | python demo/vis.py 
28 | ``` 
29 | Then you can obtain the visualized poses in './demo/output', like: 
30 |

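For reference, after steps (1) and (2) the checkpoint folders should look roughly like the sketch below. The HRNet filename matches the default in `demo/lib/hrnet/gen_kpts.py` and the `ckpt/cpn` folder matches the evaluation command further down; the YOLOv3 weight name and the exact HTNet checkpoint name are assumptions (HTNet checkpoints are saved as `model_<epoch>_<mpjpe>.pth`), so keep whatever names the downloaded files have.

```
${POSE_ROOT}/
|-- ckpt/
|   |-- cpn/
|   |   |-- model_xx_xxxx.pth          # pretrained HTNet weights (actual filename may differ)
|-- demo/
|   |-- lib/checkpoint/
|   |   |-- yolov3.weights             # assumed YOLOv3 weight name; keep the downloaded name
|   |   |-- pose_hrnet_w48_384x288.pth # HRNet 2D detector weights
|   |-- figure/
|   |   |-- your_image.jpg             # images you want to estimate poses for
```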
31 | 
32 | 
33 | ## Quick start 
34 | To get started as quickly as possible, follow the instructions in this section. This should allow you to train a model from scratch and test our pretrained models. 
35 | 
36 | 
37 | ### Dependencies 
38 | Make sure you have the following dependencies installed before proceeding: 
39 | - Python 3.7+ 
40 | - PyTorch >= 1.10.0 
41 | To set up the environment: 
42 | ```sh 
43 | pip install -r requirement.txt 
44 | ``` 
45 | 
46 | 
47 | ### Dataset setup 
48 | Please download the dataset [here](https://drive.google.com/drive/folders/1gNs5PrcaZ6gar7IiNZPNh39T7y6aPY3g) and refer to [VideoPose3D](https://github.com/facebookresearch/VideoPose3D) to set up the Human3.6M dataset (in the './dataset' directory). 
49 | 
50 | ```bash 
51 | ${POSE_ROOT}/ 
52 | |-- dataset 
53 | | |-- data_3d_h36m.npz 
54 | | |-- data_2d_h36m_gt.npz 
55 | | |-- data_2d_h36m_cpn_ft_h36m_dbb.npz 
56 | ``` 
57 | 
58 | ### Evaluating our pre-trained models 
59 | The pretrained model is available [here](https://drive.google.com/drive/folders/134lqqu-0I6aOYr7lRufa6fMTdqm7K9Qk); please download it and put it in the './ckpt' directory. To reproduce the performance reported in the paper, run: 
60 | ``` 
61 | python main.py --reload --previous_dir "ckpt/cpn" 
62 | ``` 
63 | 
64 | ### Training your models 
65 | If you want to train your own model, run: 
66 | ``` 
67 | python main.py --train -n "your_model_name" 
68 | ``` 
69 | 
70 | 
71 | ## Acknowledgement 
72 | 
73 | Our code is extended from the following repositories. We thank the authors for releasing their code. 
74 | - [MHFormer](https://github.com/Vegetebird/MHFormer) 
75 | - [MGCN](https://github.com/ZhimingZo/Modulated-GCN) 
76 | - [VideoPose3D](https://github.com/facebookresearch/VideoPose3D) 
77 | - [3d-pose-baseline](https://github.com/una-dinosauria/3d-pose-baseline) 
78 | - [3d_pose_baseline_pytorch](https://github.com/weigq/3d_pose_baseline_pytorch) 
79 | - [StridedTransformer-Pose3D](https://github.com/Vegetebird/StridedTransformer-Pose3D) 
80 | ## License 
81 | 
82 | This project is licensed under the terms of the MIT license. 
83 | -------------------------------------------------------------------------------- /ckpt/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /common/camera.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import numpy as np 3 | import torch 4 | def normalize_screen_coordinates(X, w, h): 5 | assert X.shape[-1] == 2 6 | return X / w * 2 - [1, h / w] 7 | 8 | 9 | def world_to_camera(X, R, t): 10 | Rt = wrap(qinverse, R) 11 | return wrap(qrot, np.tile(Rt, (*X.shape[:-1], 1)), X - t) 12 | 13 | def camera_to_world(X, R, t): 14 | return wrap(qrot, np.tile(R, (*X.shape[:-1], 1)), X) + t 15 | 16 | 17 | def wrap(func, *args, unsqueeze=False): 18 | args = list(args) 19 | for i, arg in enumerate(args): 20 | if type(arg) == np.ndarray: 21 | args[i] = torch.from_numpy(arg) 22 | if unsqueeze: 23 | args[i] = args[i].unsqueeze(0) 24 | 25 | result = func(*args) 26 | 27 | if isinstance(result, tuple): 28 | result = list(result) 29 | for i, res in enumerate(result): 30 | if type(res) == torch.Tensor: 31 | if unsqueeze: 32 | res = res.squeeze(0) 33 | result[i] = res.numpy() 34 | return tuple(result) 35 | elif type(result) == torch.Tensor: 36 | if unsqueeze: 37 | result = result.squeeze(0) 38 | return result.numpy() 39 | else: 40 | return result 41 | 42 | def qrot(q, v): 43 | assert q.shape[-1] == 4 44 | assert v.shape[-1] == 3 45 | assert q.shape[:-1] == v.shape[:-1] 46 | 47 | qvec = q[..., 1:] 48 | uv = torch.cross(qvec, v, dim=len(q.shape) - 1) 49 | uuv = torch.cross(qvec, uv, dim=len(q.shape) - 1) 50 | return (v + 2 * (q[..., :1] * uv + uuv)) 51 | 52 | 53 | 54 | 55 | def qinverse(q, inplace=False): 56 | if inplace: 57 | q[..., 1:] *= -1 58 | return q 59 | else: 60 | w = q[..., :1] 61 | xyz = q[..., 1:] 62 | return torch.cat((w, -xyz), dim=len(q.shape) - 1) 63 | 64 | 65 | 66 | def get_uvd2xyz(uvd, gt_3D, cam): 67 | N, T, V,_ = uvd.size() 68 | 69 | dec_out_all = uvd.view(-1, T, V, 3).clone() 70 | root = gt_3D[:, :, 0, :].unsqueeze(-2).repeat(1, 1, V, 1).clone() 71 | enc_in_all = uvd[:, :, :, :2].view(-1, T, V, 2).clone() 72 | 73 | cam_f_all = cam[..., :2].view(-1,1,1,2).repeat(1,T,V,1) 74 | cam_c_all = cam[..., 2:4].view(-1,1,1,2).repeat(1,T,V,1) 75 | 76 | z_global = dec_out_all[:, :, :, 2] 77 | z_global[:, :, 0] = root[:, :, 0, 2] 78 | z_global[:, :, 1:] = dec_out_all[:, :, 1:, 2] + root[:, :, 1:, 2] 79 | z_global = z_global.unsqueeze(-1) 80 | 81 | uv = enc_in_all - cam_c_all 82 | xy = uv * z_global.repeat(1, 1, 1, 2) / cam_f_all 83 | xyz_global = torch.cat((xy, z_global), -1) 84 | xyz_offset = (xyz_global - xyz_global[:, :, 0, :].unsqueeze(-2).repeat(1, 1, V, 1)) 85 | 86 | return xyz_offset -------------------------------------------------------------------------------- /common/h36m_dataset.py: -------------------------------------------------------------------------------- 1 | 2 | import numpy as np 3 | import copy 4 | from common.skeleton import Skeleton 5 | from common.mocap_dataset import MocapDataset 6 | from common.camera import normalize_screen_coordinates 7 | 8 | h36m_skeleton = Skeleton(parents=[-1, 0, 1, 2, 3, 4, 0, 6, 7, 8, 9, 0, 11, 12, 13, 14, 12, 9 | 16, 17, 18, 19, 20, 19, 22, 12, 24, 25, 26, 27, 28, 27, 30], 10 | joints_left=[6, 7, 8, 9, 10, 16, 17, 18, 19, 20, 21, 22, 23], 11 | joints_right=[1, 2, 3, 4, 5, 24, 25, 26, 27, 28, 29, 30, 31]) 12 | 13 | h36m_cameras_intrinsic_params = [ 14 | { 15 | 'id': 
'54138969', 16 | 'center': [512.54150390625, 515.4514770507812], 17 | 'focal_length': [1145.0494384765625, 1143.7811279296875], 18 | 'radial_distortion': [-0.20709891617298126, 0.24777518212795258, -0.0030751503072679043], 19 | 'tangential_distortion': [-0.0009756988729350269, -0.00142447161488235], 20 | 'res_w': 1000, 21 | 'res_h': 1002, 22 | 'azimuth': 70, 23 | }, 24 | { 25 | 'id': '55011271', 26 | 'center': [508.8486328125, 508.0649108886719], 27 | 'focal_length': [1149.6756591796875, 1147.5916748046875], 28 | 'radial_distortion': [-0.1942136287689209, 0.2404085397720337, 0.006819975562393665], 29 | 'tangential_distortion': [-0.0016190266469493508, -0.0027408944442868233], 30 | 'res_w': 1000, 31 | 'res_h': 1000, 32 | 'azimuth': -70, 33 | }, 34 | { 35 | 'id': '58860488', 36 | 'center': [519.8158569335938, 501.40264892578125], 37 | 'focal_length': [1149.1407470703125, 1148.7989501953125], 38 | 'radial_distortion': [-0.2083381861448288, 0.25548800826072693, -0.0024604974314570427], 39 | 'tangential_distortion': [0.0014843869721516967, -0.0007599993259645998], 40 | 'res_w': 1000, 41 | 'res_h': 1000, 42 | 'azimuth': 110, 43 | }, 44 | { 45 | 'id': '60457274', 46 | 'center': [514.9682006835938, 501.88201904296875], 47 | 'focal_length': [1145.5113525390625, 1144.77392578125], 48 | 'radial_distortion': [-0.198384091258049, 0.21832367777824402, -0.008947807364165783], 49 | 'tangential_distortion': [-0.0005872055771760643, -0.0018133620033040643], 50 | 'res_w': 1000, 51 | 'res_h': 1002, 52 | 'azimuth': -110, 53 | }, 54 | ] 55 | 56 | h36m_cameras_extrinsic_params = { 57 | 'S1': [ 58 | { 59 | 'orientation': [0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088], 60 | 'translation': [1841.1070556640625, 4955.28466796875, 1563.4454345703125], 61 | }, 62 | { 63 | 'orientation': [0.6157187819480896, -0.764836311340332, -0.14833825826644897, 0.11794740706682205], 64 | 'translation': [1761.278564453125, -5078.0068359375, 1606.2650146484375], 65 | }, 66 | { 67 | 'orientation': [0.14651472866535187, -0.14647851884365082, 0.7653023600578308, -0.6094175577163696], 68 | 'translation': [-1846.7777099609375, 5215.04638671875, 1491.972412109375], 69 | }, 70 | { 71 | 'orientation': [0.5834008455276489, -0.7853162288665771, 0.14548823237419128, -0.14749594032764435], 72 | 'translation': [-1794.7896728515625, -3722.698974609375, 1574.8927001953125], 73 | }, 74 | ], 75 | 'S2': [ 76 | {}, 77 | {}, 78 | {}, 79 | {}, 80 | ], 81 | 'S3': [ 82 | {}, 83 | {}, 84 | {}, 85 | {}, 86 | ], 87 | 'S4': [ 88 | {}, 89 | {}, 90 | {}, 91 | {}, 92 | ], 93 | 'S5': [ 94 | { 95 | 'orientation': [0.1467377245426178, -0.162370964884758, -0.7551892995834351, 0.6178938746452332], 96 | 'translation': [2097.3916015625, 4880.94482421875, 1605.732421875], 97 | }, 98 | { 99 | 'orientation': [0.6159758567810059, -0.7626792192459106, -0.15728192031383514, 0.1189815029501915], 100 | 'translation': [2031.7008056640625, -5167.93310546875, 1612.923095703125], 101 | }, 102 | { 103 | 'orientation': [0.14291371405124664, -0.12907841801643372, 0.7678384780883789, -0.6110143065452576], 104 | 'translation': [-1620.5948486328125, 5171.65869140625, 1496.43701171875], 105 | }, 106 | { 107 | 'orientation': [0.5920479893684387, -0.7814217805862427, 0.1274748593568802, -0.15036417543888092], 108 | 'translation': [-1637.1737060546875, -3867.3173828125, 1547.033203125], 109 | }, 110 | ], 111 | 'S6': [ 112 | { 113 | 'orientation': [0.1337897777557373, -0.15692396461963654, -0.7571090459823608, 0.6198879480361938], 114 | 'translation': 
[1935.4517822265625, 4950.24560546875, 1618.0838623046875], 115 | }, 116 | { 117 | 'orientation': [0.6147197484970093, -0.7628812789916992, -0.16174767911434174, 0.11819244921207428], 118 | 'translation': [1969.803955078125, -5128.73876953125, 1632.77880859375], 119 | }, 120 | { 121 | 'orientation': [0.1529948115348816, -0.13529130816459656, 0.7646096348762512, -0.6112781167030334], 122 | 'translation': [-1769.596435546875, 5185.361328125, 1476.993408203125], 123 | }, 124 | { 125 | 'orientation': [0.5916101336479187, -0.7804774045944214, 0.12832270562648773, -0.1561593860387802], 126 | 'translation': [-1721.668701171875, -3884.13134765625, 1540.4879150390625], 127 | }, 128 | ], 129 | 'S7': [ 130 | { 131 | 'orientation': [0.1435241848230362, -0.1631336808204651, -0.7548328638076782, 0.6188824772834778], 132 | 'translation': [1974.512939453125, 4926.3544921875, 1597.8326416015625], 133 | }, 134 | { 135 | 'orientation': [0.6141672730445862, -0.7638262510299683, -0.1596645563840866, 0.1177929937839508], 136 | 'translation': [1937.0584716796875, -5119.7900390625, 1631.5665283203125], 137 | }, 138 | { 139 | 'orientation': [0.14550060033798218, -0.12874816358089447, 0.7660516500473022, -0.6127139329910278], 140 | 'translation': [-1741.8111572265625, 5208.24951171875, 1464.8245849609375], 141 | }, 142 | { 143 | 'orientation': [0.5912848114967346, -0.7821764349937439, 0.12445473670959473, -0.15196487307548523], 144 | 'translation': [-1734.7105712890625, -3832.42138671875, 1548.5830078125], 145 | }, 146 | ], 147 | 'S8': [ 148 | { 149 | 'orientation': [0.14110587537288666, -0.15589867532253265, -0.7561917304992676, 0.619644045829773], 150 | 'translation': [2150.65185546875, 4896.1611328125, 1611.9046630859375], 151 | }, 152 | { 153 | 'orientation': [0.6169601678848267, -0.7647668123245239, -0.14846350252628326, 0.11158157885074615], 154 | 'translation': [2219.965576171875, -5148.453125, 1613.0440673828125], 155 | }, 156 | { 157 | 'orientation': [0.1471444070339203, -0.13377119600772858, 0.7670128345489502, -0.6100369691848755], 158 | 'translation': [-1571.2215576171875, 5137.0185546875, 1498.1761474609375], 159 | }, 160 | { 161 | 'orientation': [0.5927824378013611, -0.7825870513916016, 0.12147816270589828, -0.14631995558738708], 162 | 'translation': [-1476.913330078125, -3896.7412109375, 1547.97216796875], 163 | }, 164 | ], 165 | 'S9': [ 166 | { 167 | 'orientation': [0.15540587902069092, -0.15548215806484222, -0.7532095313072205, 0.6199594736099243], 168 | 'translation': [2044.45849609375, 4935.1171875, 1481.2275390625], 169 | }, 170 | { 171 | 'orientation': [0.618784487247467, -0.7634735107421875, -0.14132238924503326, 0.11933968216180801], 172 | 'translation': [1990.959716796875, -5123.810546875, 1568.8048095703125], 173 | }, 174 | { 175 | 'orientation': [0.13357827067375183, -0.1367100477218628, 0.7689454555511475, -0.6100738644599915], 176 | 'translation': [-1670.9921875, 5211.98583984375, 1528.387939453125], 177 | }, 178 | { 179 | 'orientation': [0.5879399180412292, -0.7823407053947449, 0.1427614390850067, -0.14794869720935822], 180 | 'translation': [-1696.04345703125, -3827.099853515625, 1591.4127197265625], 181 | }, 182 | ], 183 | 'S11': [ 184 | { 185 | 'orientation': [0.15232472121715546, -0.15442320704460144, -0.7547563314437866, 0.6191070079803467], 186 | 'translation': [2098.440185546875, 4926.5546875, 1500.278564453125], 187 | }, 188 | { 189 | 'orientation': [0.6189449429512024, -0.7600917220115662, -0.15300633013248444, 0.1255258321762085], 190 | 'translation': [2083.182373046875, 
-4912.1728515625, 1561.07861328125], 191 | }, 192 | { 193 | 'orientation': [0.14943228662014008, -0.15650227665901184, 0.7681233882904053, -0.6026304364204407], 194 | 'translation': [-1609.8153076171875, 5177.3359375, 1537.896728515625], 195 | }, 196 | { 197 | 'orientation': [0.5894251465797424, -0.7818877100944519, 0.13991211354732513, -0.14715361595153809], 198 | 'translation': [-1590.738037109375, -3854.1689453125, 1578.017578125], 199 | }, 200 | ], 201 | } 202 | 203 | 204 | class Human36mDataset(MocapDataset): 205 | def __init__(self, path, opt, remove_static_joints=True): 206 | super().__init__(fps=50, skeleton=h36m_skeleton) 207 | self.train_list = ['S1', 'S5', 'S6', 'S7', 'S8'] 208 | self.test_list = ['S9', 'S11'] 209 | 210 | self._cameras = copy.deepcopy(h36m_cameras_extrinsic_params) 211 | for cameras in self._cameras.values(): 212 | for i, cam in enumerate(cameras): 213 | cam.update(h36m_cameras_intrinsic_params[i]) 214 | for k, v in cam.items(): 215 | if k not in ['id', 'res_w', 'res_h']: 216 | cam[k] = np.array(v, dtype='float32') 217 | 218 | if opt.crop_uv == 0: 219 | cam['center'] = normalize_screen_coordinates(cam['center'], w=cam['res_w'], h=cam['res_h']).astype( 220 | 'float32') 221 | cam['focal_length'] = cam['focal_length'] / cam['res_w'] * 2 222 | 223 | if 'translation' in cam: 224 | cam['translation'] = cam['translation'] / 1000 225 | 226 | cam['intrinsic'] = np.concatenate((cam['focal_length'], 227 | cam['center'], 228 | cam['radial_distortion'], 229 | cam['tangential_distortion'])) 230 | 231 | data = np.load(path,allow_pickle=True)['positions_3d'].item() 232 | 233 | self._data = {} 234 | for subject, actions in data.items(): 235 | self._data[subject] = {} 236 | for action_name, positions in actions.items(): 237 | self._data[subject][action_name] = { 238 | 'positions': positions, 239 | 'cameras': self._cameras[subject], 240 | } 241 | 242 | if remove_static_joints: 243 | self.remove_joints([4, 5, 9, 10, 11, 16, 20, 21, 22, 23, 24, 28, 29, 30, 31]) 244 | 245 | self._skeleton._parents[11] = 8 246 | self._skeleton._parents[14] = 8 247 | 248 | def supports_semi_supervised(self): 249 | return True 250 | 251 | 252 | 253 | -------------------------------------------------------------------------------- /common/load_data_hm36.py: -------------------------------------------------------------------------------- 1 | 2 | import torch.utils.data as data 3 | import numpy as np 4 | 5 | from common.utils import deterministic_random 6 | from common.camera import world_to_camera, normalize_screen_coordinates 7 | from common.generator import ChunkedGenerator, ChunkedGenerator_Seq 8 | 9 | class Fusion(data.Dataset): #crop:0, downsample:0 pad:0 stride:1 10 | def __init__(self, opt, dataset, root_path, train=True): 11 | self.data_type = opt.dataset 12 | self.train = train 13 | self.keypoints_name = opt.keypoints 14 | self.root_path = root_path 15 | 16 | self.train_list = opt.subjects_train.split(',') 17 | self.test_list = opt.subjects_test.split(',') 18 | self.action_filter = None if opt.actions == '*' else opt.actions.split(',') 19 | self.downsample = opt.downsample 20 | self.subset = opt.subset 21 | self.stride = opt.stride 22 | self.crop_uv = opt.crop_uv 23 | self.test_aug = opt.test_augmentation 24 | self.pad = opt.pad 25 | causal_shift = 0 26 | if self.train: 27 | self.keypoints = self.prepare_data(dataset, self.train_list) 28 | self.cameras_train, self.poses_train, self.poses_train_2d = self.fetch(dataset, self.train_list, 29 | subset=self.subset) 30 | self.generator = 
ChunkedGenerator(opt.batch_size, self.cameras_train, self.poses_train, 31 | self.poses_train_2d, self.stride, pad=self.pad, 32 | augment=opt.data_augmentation, reverse_aug=opt.reverse_augmentation, 33 | kps_left=self.kps_left, kps_right=self.kps_right, 34 | joints_left=self.joints_left, 35 | joints_right=self.joints_right, out_all=opt.out_all) 36 | 37 | print('INFO: Training on {} frames'.format(self.generator.num_frames())) 38 | else: 39 | self.keypoints = self.prepare_data(dataset, self.test_list) 40 | self.cameras_test, self.poses_test, self.poses_test_2d = self.fetch(dataset, self.test_list, 41 | subset=self.subset) 42 | self.generator = ChunkedGenerator(opt.batch_size, self.cameras_test, self.poses_test, 43 | self.poses_test_2d, 44 | pad=self.pad, augment=False, kps_left=self.kps_left, 45 | kps_right=self.kps_right, joints_left=self.joints_left, 46 | joints_right=self.joints_right) 47 | self.key_index = self.generator.saved_index 48 | print('INFO: Testing on {} frames'.format(self.generator.num_frames())) 49 | 50 | def prepare_data(self, dataset, folder_list): 51 | for subject in folder_list: 52 | for action in dataset[subject].keys(): 53 | anim = dataset[subject][action] 54 | 55 | positions_3d = [] 56 | for cam in anim['cameras']: 57 | pos_3d = world_to_camera(anim['positions'], R=cam['orientation'], t=cam['translation']) 58 | pos_3d[:, 1:] -= pos_3d[:, :1]#(1265, 17, 3) 59 | positions_3d.append(pos_3d) 60 | anim['positions_3d'] = positions_3d 61 | 62 | keypoints = np.load(self.root_path + 'data_2d_' + self.data_type + '_' + self.keypoints_name + '.npz',allow_pickle=True) 63 | keypoints_symmetry = keypoints['metadata'].item()['keypoints_symmetry'] 64 | 65 | self.kps_left, self.kps_right = list(keypoints_symmetry[0]), list(keypoints_symmetry[1]) 66 | self.joints_left, self.joints_right = list(dataset.skeleton().joints_left()), list(dataset.skeleton().joints_right()) 67 | keypoints = keypoints['positions_2d'].item() 68 | 69 | for subject in folder_list: 70 | assert subject in keypoints, 'Subject {} is missing from the 2D detections dataset'.format(subject) 71 | for action in dataset[subject].keys(): 72 | assert action in keypoints[ 73 | subject], 'Action {} of subject {} is missing from the 2D detections dataset'.format(action, 74 | subject) 75 | for cam_idx in range(len(keypoints[subject][action])): 76 | 77 | mocap_length = dataset[subject][action]['positions_3d'][cam_idx].shape[0] 78 | assert keypoints[subject][action][cam_idx].shape[0] >= mocap_length 79 | 80 | if keypoints[subject][action][cam_idx].shape[0] > mocap_length: 81 | keypoints[subject][action][cam_idx] = keypoints[subject][action][cam_idx][:mocap_length] 82 | 83 | for subject in keypoints.keys(): 84 | for action in keypoints[subject]: 85 | for cam_idx, kps in enumerate(keypoints[subject][action]): 86 | cam = dataset.cameras()[subject][cam_idx] 87 | if self.crop_uv == 0: 88 | kps[..., :2] = normalize_screen_coordinates(kps[..., :2], w=cam['res_w'], h=cam['res_h']) 89 | keypoints[subject][action][cam_idx] = kps 90 | 91 | return keypoints 92 | 93 | def fetch(self, dataset, subjects, subset=1, parse_3d_poses=True): #self.cameras_train, self.poses_train, self.poses_train_2d 94 | out_poses_3d = {} 95 | out_poses_2d = {} 96 | out_camera_params = {} 97 | 98 | for subject in subjects: 99 | for action in self.keypoints[subject].keys(): 100 | if self.action_filter is not None: 101 | found = False 102 | for a in self.action_filter: 103 | if action.startswith(a): 104 | found = True 105 | break 106 | if not found: 107 | continue 108 | 
109 | poses_2d = self.keypoints[subject][action] 110 | 111 | for i in range(len(poses_2d)): 112 | out_poses_2d[(subject, action, i)] = poses_2d[i] 113 | 114 | if subject in dataset.cameras(): 115 | cams = dataset.cameras()[subject] 116 | assert len(cams) == len(poses_2d), 'Camera count mismatch' 117 | for i, cam in enumerate(cams): 118 | if 'intrinsic' in cam: 119 | out_camera_params[(subject, action, i)] = cam['intrinsic'] 120 | 121 | if parse_3d_poses and 'positions_3d' in dataset[subject][action]: 122 | poses_3d = dataset[subject][action]['positions_3d'] 123 | assert len(poses_3d) == len(poses_2d), 'Camera count mismatch' 124 | for i in range(len(poses_3d)): 125 | out_poses_3d[(subject, action, i)] = poses_3d[i] 126 | 127 | if len(out_camera_params) == 0: 128 | out_camera_params = None 129 | if len(out_poses_3d) == 0: 130 | out_poses_3d = None 131 | 132 | stride = self.downsample 133 | if subset < 1: 134 | for key in out_poses_2d.keys(): 135 | n_frames = int(round(len(out_poses_2d[key]) // stride * subset) * stride) 136 | start = deterministic_random(0, len(out_poses_2d[key]) - n_frames + 1, str(len(out_poses_2d[key]))) 137 | out_poses_2d[key] = out_poses_2d[key][start:start + n_frames:stride] 138 | if out_poses_3d is not None: 139 | out_poses_3d[key] = out_poses_3d[key][start:start + n_frames:stride] 140 | elif stride > 1: #这一步 141 | for key in out_poses_2d.keys(): 142 | out_poses_2d[key] = out_poses_2d[key][::stride] 143 | if out_poses_3d is not None: 144 | out_poses_3d[key] = out_poses_3d[key][::stride] 145 | 146 | return out_camera_params, out_poses_3d, out_poses_2d 147 | 148 | def __len__(self): 149 | return len(self.generator.pairs) 150 | 151 | def __getitem__(self, index): 152 | seq_name, start_3d, end_3d, flip, reverse = self.generator.pairs[index] 153 | 154 | cam, gt_3D, input_2D, action, subject, cam_ind = self.generator.get_batch(seq_name, start_3d, end_3d, flip, reverse) 155 | 156 | if self.train == False and self.test_aug: 157 | _, _, input_2D_aug, _, _,_ = self.generator.get_batch(seq_name, start_3d, end_3d, flip=True, reverse=reverse) 158 | input_2D = np.concatenate((np.expand_dims(input_2D,axis=0),np.expand_dims(input_2D_aug,axis=0)),0) 159 | 160 | bb_box = np.array([0, 0, 1, 1]) 161 | input_2D_update = input_2D 162 | 163 | scale = np.float(1.0) 164 | 165 | return cam, gt_3D, input_2D_update, action, subject, scale, bb_box, cam_ind 166 | 167 | 168 | 169 | -------------------------------------------------------------------------------- /common/mocap_dataset.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | class MocapDataset: 4 | def __init__(self, fps, skeleton): 5 | self._skeleton = skeleton 6 | self._fps = fps 7 | self._data = None 8 | self._cameras = None 9 | 10 | def remove_joints(self, joints_to_remove): 11 | kept_joints = self._skeleton.remove_joints(joints_to_remove) 12 | for subject in self._data.keys(): 13 | for action in self._data[subject].keys(): 14 | s = self._data[subject][action] 15 | s['positions'] = s['positions'][:, kept_joints] 16 | 17 | def __getitem__(self, key): 18 | return self._data[key] 19 | 20 | def subjects(self): 21 | return self._data.keys() 22 | 23 | def fps(self): 24 | return self._fps 25 | 26 | def skeleton(self): 27 | return self._skeleton 28 | 29 | def cameras(self): 30 | return self._cameras 31 | 32 | def supports_semi_supervised(self): 33 | return False 34 | 35 | 36 | -------------------------------------------------------------------------------- /common/opt.py: 
-------------------------------------------------------------------------------- 1 | import argparse 2 | from email.policy import default 3 | import os 4 | import math 5 | import time 6 | import torch 7 | 8 | class opts(): 9 | def __init__(self): 10 | self.parser = argparse.ArgumentParser() 11 | 12 | def init(self): 13 | #model args 14 | self.parser.add_argument('--layers', default=3, type=int) 15 | self.parser.add_argument('--channel', default=240, type=int,help="Must be a multiple of 24") 16 | self.parser.add_argument('--frames', type=int, default=1) 17 | self.parser.add_argument('--pad', type=int, default=0) 18 | self.parser.add_argument('-n','--model_name', type=str, default='your_model', help='Name of your model') 19 | self.parser.add_argument('--d_hid', default=1024, type=int) 20 | self.parser.add_argument('--n_joints', type=int, default=17) 21 | self.parser.add_argument('--out_joints', type=int, default=17) 22 | self.parser.add_argument('--in_channels', type=int, default=2) 23 | self.parser.add_argument('--out_channels', type=int, default=3) 24 | 25 | 26 | 27 | #train args 28 | self.parser.add_argument('--gpu', default='0', type=str, help='') 29 | self.parser.add_argument('--train', action='store_true') 30 | self.parser.add_argument('--nepoch', type=int, default=300) 31 | self.parser.add_argument('--batch_size', type=int, default=512) 32 | self.parser.add_argument('--dataset', type=str, default='h36m') 33 | self.parser.add_argument('--lr', type=float, default=0.0005) 34 | self.parser.add_argument('--large_decay_epoch', type=int, default=5) 35 | self.parser.add_argument('-lrd', '--lr_decay', default=0.95, type=float) 36 | self.parser.add_argument('--lr_decay_large', type=float, default=0.5) 37 | self.parser.add_argument('--min_lr', type=float, default=1e-6, metavar='LR', 38 | help='lower lr bound for cyclic schedulers that hit 0') 39 | self.parser.add_argument('--workers', type=int, default=4) 40 | self.parser.add_argument('--out_all', type=int, default=1) 41 | self.parser.add_argument('--drop',default=0.2, type=float) 42 | self.parser.add_argument('--seed',default=1, type=int) 43 | self.parser.add_argument('-k', '--keypoints', default='cpn_ft_h36m_dbb', type=str) 44 | self.parser.add_argument('--data_augmentation', type=bool, default=True) 45 | self.parser.add_argument('--test_augmentation', type=bool, default=True) 46 | self.parser.add_argument('--reverse_augmentation', type=bool, default=False) 47 | self.parser.add_argument('--root_path', type=str, default='./dataset/',help='Put the dataset into this file') 48 | self.parser.add_argument('-a', '--actions', default='*', type=str) 49 | self.parser.add_argument('--downsample', default=1, type=int) 50 | self.parser.add_argument('--subset', default=1, type=float) 51 | self.parser.add_argument('--stride', default=1, type=float) 52 | self.parser.add_argument('--lr_min',type=float,default=0,help='Min learn rate') 53 | 54 | 55 | # test args 56 | self.parser.add_argument('--test', type=int, default=1) 57 | self.parser.add_argument('--reload', action='store_true') 58 | self.parser.add_argument('--previous_dir', type=str, default='./ckpt/your_model') 59 | self.parser.add_argument('--previous',type=str,default='ckpt') 60 | self.parser.add_argument('-previous_best_threshold', type=float, default= math.inf) 61 | self.parser.add_argument('-previous_name', type=str, default='') 62 | self.parser.add_argument('--viz', type=str, default='try') 63 | 64 | #refine 65 | self.parser.add_argument('--refine', action='store_true') 66 | 
self.parser.add_argument('--crop_uv', type=int, default=0) 67 | self.parser.add_argument('--lr_refine', type=float, default=1e-5) 68 | self.parser.add_argument('--refine_train_reload', action='store_true') 69 | self.parser.add_argument('--refine_test_reload', action='store_true') 70 | self.parser.add_argument('--previous_refine_name', type=str, default='') 71 | 72 | #vis 73 | self.parser.add_argument('--figure', type=str, default='demo.jpg', help='input figure') 74 | self.parser.add_argument('--video', type=str, default='demo.jpg', help='input figure') 75 | 76 | 77 | 78 | def parse(self): 79 | self.init() 80 | self.opt = self.parser.parse_args() 81 | self.opt.pad = (self.opt.frames-1) // 2 82 | self.opt.subjects_train = 'S1,S5,S6,S7,S8' 83 | self.opt.subjects_test = 'S9,S11' 84 | 85 | if self.opt.train: 86 | self.opt.checkpoint = 'ckpt/' + self.opt.model_name 87 | if not os.path.exists(self.opt.checkpoint): 88 | os.makedirs(self.opt.checkpoint) 89 | 90 | 91 | args = dict((name, getattr(self.opt, name)) for name in dir(self.opt) 92 | if not name.startswith('_')) 93 | file_name = os.path.join(self.opt.checkpoint, 'opt.txt') 94 | with open(file_name, 'wt') as opt_file: 95 | opt_file.write('==> Args:\n') 96 | for k, v in sorted(args.items()): 97 | opt_file.write(' %s: %s\n' % (str(k), str(v))) 98 | opt_file.write('==> Args:\n') 99 | 100 | return self.opt 101 | 102 | 103 | 104 | 105 | 106 | 107 | -------------------------------------------------------------------------------- /common/skeleton.py: -------------------------------------------------------------------------------- 1 | 2 | import numpy as np 3 | 4 | class Skeleton: 5 | def __init__(self, parents, joints_left, joints_right): 6 | assert len(joints_left) == len(joints_right) 7 | 8 | self._parents = np.array(parents) 9 | self._joints_left = joints_left 10 | self._joints_right = joints_right 11 | self._compute_metadata() 12 | 13 | def num_joints(self): 14 | return len(self._parents) 15 | 16 | def parents(self): 17 | return self._parents 18 | 19 | def has_children(self): 20 | return self._has_children 21 | 22 | def children(self): 23 | return self._children 24 | 25 | def remove_joints(self, joints_to_remove): 26 | 27 | valid_joints = [] 28 | for joint in range(len(self._parents)): 29 | if joint not in joints_to_remove: 30 | valid_joints.append(joint) 31 | 32 | for i in range(len(self._parents)): 33 | while self._parents[i] in joints_to_remove: 34 | self._parents[i] = self._parents[self._parents[i]] 35 | 36 | index_offsets = np.zeros(len(self._parents), dtype=int) 37 | new_parents = [] 38 | for i, parent in enumerate(self._parents): 39 | if i not in joints_to_remove: 40 | new_parents.append(parent - index_offsets[parent]) 41 | else: 42 | index_offsets[i:] += 1 43 | self._parents = np.array(new_parents) 44 | 45 | if self._joints_left is not None: 46 | new_joints_left = [] 47 | for joint in self._joints_left: 48 | if joint in valid_joints: 49 | new_joints_left.append(joint - index_offsets[joint]) 50 | self._joints_left = new_joints_left 51 | if self._joints_right is not None: 52 | new_joints_right = [] 53 | for joint in self._joints_right: 54 | if joint in valid_joints: 55 | new_joints_right.append(joint - index_offsets[joint]) 56 | self._joints_right = new_joints_right 57 | 58 | self._compute_metadata() 59 | 60 | return valid_joints 61 | 62 | def joints_left(self): 63 | return self._joints_left 64 | 65 | def joints_right(self): 66 | return self._joints_right 67 | 68 | def _compute_metadata(self): 69 | self._has_children = 
np.zeros(len(self._parents)).astype(bool) 70 | for i, parent in enumerate(self._parents): 71 | if parent != -1: 72 | self._has_children[parent] = True 73 | 74 | self._children = [] 75 | for i, parent in enumerate(self._parents): 76 | self._children.append([]) 77 | for i, parent in enumerate(self._parents): 78 | if parent != -1: 79 | self._children[parent].append(i) 80 | 81 | 82 | -------------------------------------------------------------------------------- /common/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import hashlib 4 | from torch.autograd import Variable 5 | import os 6 | 7 | def deterministic_random(min_value, max_value, data): 8 | digest = hashlib.sha256(data.encode()).digest() 9 | raw_value = int.from_bytes(digest[:4], byteorder='little', signed=False) 10 | return int(raw_value / (2 ** 32 - 1) * (max_value - min_value)) + min_value 11 | 12 | 13 | def mpjpe_cal_mask(predicted, target, mask): 14 | assert predicted.shape == target.shape 15 | # index = [i for i in range(17) if i in mask] 16 | predicted = predicted[:,:,mask,:] 17 | target = target[:,:,mask,:] 18 | return torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1)).contiguous() 19 | 20 | def mpjpe_cal(predicted, target): 21 | assert predicted.shape == target.shape 22 | return torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1)).contiguous() 23 | 24 | def skeloss(predicted, target): 25 | assert predicted.shape == target.shape 26 | start = [0,1,2, 0,4,5, 0,7,8,9, 8,14,15, 8,11,12] 27 | end = [1,2,3, 4,5,6, 7,8,9,10, 14,15,16, 11,12,13] 28 | ske_predicted = torch.zeros(len(start)) 29 | ske_target = torch.zeros(len(start)) 30 | for i in range(len(start)): 31 | ske_predicted[i] = torch.mean(torch.norm(predicted[:,:,start[i],:] - predicted[:,:,end[i],:], dim=2)).contiguous() 32 | ske_target[i] = torch.mean(torch.norm(target[:,:,start[i],:] - target[:,:,end[i],:], dim=2)).contiguous() 33 | 34 | 35 | 36 | return torch.mean(torch.norm(ske_predicted[i] - ske_target[i])) 37 | 38 | 39 | 40 | 41 | 42 | def frame_loss(predicted):#256,9,17,2 43 | loss = 0 44 | for k in range(predicted.size(0)-1): 45 | for i in range(predicted.size(1)-1): 46 | for j in range(predicted.size(2)-1): 47 | loss += (predicted[k+1,i+1,j+1,0] - predicted[k,i,j,0])**2 48 | loss += (predicted[k+1,i+1,j+1,1] - predicted[k,i,j,1])**2 49 | return loss 50 | 51 | return torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1)).contiguous() 52 | 53 | ## viz loss 54 | def p_mpjpe(predicted, target): 55 | """ 56 | Pose error: MPJPE after rigid alignment (scale, rotation, and translation), 57 | often referred to as "Protocol #2" in many papers. 58 | """ 59 | assert predicted.shape == target.shape 60 | 61 | muX = np.mean(target, axis=1, keepdims=True) 62 | muY = np.mean(predicted, axis=1, keepdims=True) 63 | 64 | X0 = target - muX 65 | Y0 = predicted - muY 66 | 67 | normX = np.sqrt(np.sum(X0**2, axis=(1, 2), keepdims=True)) 68 | normY = np.sqrt(np.sum(Y0**2, axis=(1, 2), keepdims=True)) 69 | 70 | X0 /= normX 71 | Y0 /= normY 72 | 73 | H = np.matmul(X0.transpose(0, 2, 1), Y0) 74 | U, s, Vt = np.linalg.svd(H) 75 | V = Vt.transpose(0, 2, 1) 76 | R = np.matmul(V, U.transpose(0, 2, 1)) 77 | 78 | # Avoid improper rotations (reflections), i.e. 
rotations with det(R) = -1 79 | sign_detR = np.sign(np.expand_dims(np.linalg.det(R), axis=1)) 80 | V[:, :, -1] *= sign_detR 81 | s[:, -1] *= sign_detR.flatten() 82 | R = np.matmul(V, U.transpose(0, 2, 1)) # Rotation 83 | 84 | tr = np.expand_dims(np.sum(s, axis=1, keepdims=True), axis=2) 85 | 86 | a = tr * normX / normY # Scale 87 | t = muX - a*np.matmul(muY, R) # Translation 88 | 89 | # Perform rigid transformation on the input 90 | predicted_aligned = a*np.matmul(predicted, R) + t 91 | 92 | # Return MPJPE 93 | return np.mean(np.linalg.norm(predicted_aligned - target, axis=len(target.shape)-1)) 94 | 95 | 96 | 97 | 98 | def compute_PCK(gts, preds, scales=1000, eval_joints=None, threshold=150): 99 | PCK_THRESHOLD = threshold 100 | sample_num = len(gts) 101 | total = 0 102 | true_positive = 0 103 | if eval_joints is None: 104 | eval_joints = list(range(gts.shape[1])) 105 | 106 | for n in range(sample_num): 107 | gt = gts[n] 108 | pred = preds[n] 109 | # scale = scales[n] 110 | scale = 1000 111 | per_joint_error = np.take(np.sqrt(np.sum(np.power(pred - gt, 2), 1)) * scale, eval_joints, axis=0) 112 | true_positive += (per_joint_error < PCK_THRESHOLD).sum() 113 | total += per_joint_error.size 114 | 115 | pck = float(true_positive / total) * 100 116 | return pck 117 | 118 | 119 | def compute_AUC(gts, preds, scales=1000, eval_joints=None): 120 | # This range of thresholds mimics 'mpii_compute_3d_pck.m', which is provided as part of the 121 | # MPI-INF-3DHP test data release. 122 | thresholds = np.linspace(0, 150, 31) 123 | pck_list = [] 124 | for threshold in thresholds: 125 | pck_list.append(compute_PCK(gts, preds, scales, eval_joints, threshold)) 126 | 127 | auc = np.mean(pck_list) 128 | 129 | return auc 130 | 131 | 132 | def mean_velocity_error(predicted, target): 133 | """ 134 | Mean per-joint velocity error (i.e. mean Euclidean distance of the 1st derivative) 135 | """ 136 | assert predicted.shape == target.shape 137 | 138 | velocity_predicted = np.diff(predicted, axis=0) 139 | velocity_target = np.diff(target, axis=0) 140 | 141 | return np.mean(np.linalg.norm(velocity_predicted - velocity_target, axis=len(target.shape)-1)) 142 | 143 | def weighted_mpjpe(predicted, target, w): 144 | """ 145 | Weighted mean per-joint position error (i.e. 
mean Euclidean distance) 146 | """ 147 | assert predicted.shape == target.shape 148 | assert w.shape[0] == predicted.shape[0] 149 | return torch.mean(w * torch.norm(predicted - target, dim=len(target.shape)-1)) 150 | 151 | 152 | 153 | # def test_calculation(predicted, target, action, error_sum, data_type, subject): 154 | # error_sum = mpjpe_by_action(predicted, target, action, error_sum) 155 | 156 | # return error_sum 157 | 158 | def test_calculation(predicted, target, action, error_sum, data_type, subject): 159 | error_sum = mpjpe_by_action_p1(predicted, target, action, error_sum) 160 | error_sum = mpjpe_by_action_p2(predicted, target, action, error_sum) 161 | 162 | return error_sum 163 | 164 | 165 | 166 | 167 | def mpjpe_by_action(predicted, target, action, action_error_sum): 168 | assert predicted.shape == target.shape 169 | num = predicted.size(0) 170 | dist = torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1), dim=len(target.shape) - 2) 171 | 172 | if len(set(list(action))) == 1: 173 | end_index = action[0].find(' ') 174 | if end_index != -1: 175 | action_name = action[0][:end_index] 176 | else: 177 | action_name = action[0] 178 | 179 | action_error_sum[action_name].update(torch.mean(dist).item()*num, num) 180 | else: 181 | for i in range(num): 182 | end_index = action[i].find(' ') 183 | if end_index != -1: 184 | action_name = action[i][:end_index] 185 | else: 186 | action_name = action[i] 187 | 188 | action_error_sum[action_name].update(dist[i].item(), 1) 189 | 190 | return action_error_sum 191 | 192 | 193 | def mpjpe_by_action_p1(predicted, target, action, action_error_sum): 194 | assert predicted.shape == target.shape 195 | num = predicted.size(0) 196 | dist = torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1), dim=len(target.shape) - 2) 197 | 198 | if len(set(list(action))) == 1: 199 | end_index = action[0].find(' ') 200 | if end_index != -1: 201 | action_name = action[0][:end_index] 202 | else: 203 | action_name = action[0] 204 | 205 | action_error_sum[action_name]['p1'].update(torch.mean(dist).item()*num, num) 206 | else: 207 | for i in range(num): 208 | end_index = action[i].find(' ') 209 | if end_index != -1: 210 | action_name = action[i][:end_index] 211 | else: 212 | action_name = action[i] 213 | 214 | action_error_sum[action_name]['p1'].update(dist[i].item(), 1) 215 | 216 | return action_error_sum 217 | 218 | def mpjpe_by_action_p2(predicted, target, action, action_error_sum): 219 | assert predicted.shape == target.shape 220 | num = predicted.size(0) 221 | pred = predicted.detach().cpu().numpy().reshape(-1, predicted.shape[-2], predicted.shape[-1]) 222 | gt = target.detach().cpu().numpy().reshape(-1, target.shape[-2], target.shape[-1]) 223 | dist = p_mpjpe(pred, gt) 224 | 225 | if len(set(list(action))) == 1: 226 | end_index = action[0].find(' ') 227 | if end_index != -1: 228 | action_name = action[0][:end_index] 229 | else: 230 | action_name = action[0] 231 | action_error_sum[action_name]['p2'].update(np.mean(dist) * num, num) 232 | else: 233 | for i in range(num): 234 | end_index = action[i].find(' ') 235 | if end_index != -1: 236 | action_name = action[i][:end_index] 237 | else: 238 | action_name = action[i] 239 | action_error_sum[action_name]['p2'].update(np.mean(dist), 1) 240 | 241 | return action_error_sum 242 | 243 | 244 | 245 | def mpjpe_by_joint_mae(predicted, target,num): 246 | assert predicted.shape == target.shape 247 | # this is the joint 248 | mpjpe_joint = torch.mean(torch.mean(torch.norm(predicted - target, 
dim=len(target.shape) - 1), dim=len(target.shape) - 3),dim=len(target.shape)-4) 249 | print("\nthe mpjpe/joint",mpjpe_joint) 250 | # this is the order of joint from big to small 251 | index = torch.flip(mpjpe_joint.sort(-1).indices,dims=[0]) 252 | index = index.split(num,-1)[0] 253 | print("\nerror joint",index) 254 | return index 255 | 256 | 257 | 258 | 259 | 260 | def define_actions( action ): 261 | 262 | actions = ["Directions","Discussion","Eating","Greeting", 263 | "Phoning","Photo","Posing","Purchases", 264 | "Sitting","SittingDown","Smoking","Waiting", 265 | "WalkDog","Walking","WalkTogether"] 266 | 267 | if action == "All" or action == "all" or action == '*': 268 | return actions 269 | 270 | if not action in actions: 271 | raise( ValueError, "Unrecognized action: %s" % action ) 272 | 273 | return [action] 274 | 275 | 276 | def define_error_list(actions): 277 | error_sum = {} 278 | error_sum.update({actions[i]: 279 | {'p1':AccumLoss(), 'p2':AccumLoss()} 280 | for i in range(len(actions))}) 281 | return error_sum 282 | 283 | # def define_error_list(actions): 284 | # error_sum = {} 285 | # error_sum.update({actions[i]: AccumLoss() for i in range(len(actions))}) 286 | # return error_sum 287 | 288 | 289 | class AccumLoss(object): 290 | def __init__(self): 291 | self.val = 0 292 | self.avg = 0 293 | self.sum = 0 294 | self.count = 0 295 | 296 | def update(self, val, n=1): 297 | self.val = val 298 | self.sum += val 299 | self.count += n 300 | self.avg = self.sum / self.count 301 | 302 | 303 | def get_varialbe(split, target): 304 | num = len(target) 305 | var = [] 306 | if split == 'train': 307 | for i in range(num): 308 | temp = Variable(target[i], requires_grad=False).contiguous().type(torch.cuda.FloatTensor) 309 | var.append(temp) 310 | else: 311 | for i in range(num): 312 | temp = Variable(target[i]).contiguous().cuda().type(torch.cuda.FloatTensor) 313 | var.append(temp) 314 | 315 | return var 316 | 317 | 318 | 319 | 320 | def print_error(data_type, action_error_sum, is_train): 321 | mean_error_p1, mean_error_p2 = print_error_action(action_error_sum, is_train) 322 | 323 | return mean_error_p1, mean_error_p2 324 | 325 | 326 | 327 | 328 | def print_error_action(action_error_sum, is_train): 329 | mean_error_each = {'p1': 0.0, 'p2': 0.0} 330 | mean_error_all = {'p1': AccumLoss(), 'p2': AccumLoss()} 331 | 332 | if is_train == 0: 333 | print("{0:=^12} {1:=^10} {2:=^8}".format("Action", "p#1 mm", "p#2 mm")) 334 | 335 | 336 | for action, value in action_error_sum.items(): 337 | if is_train == 0: 338 | print("{0:<12} ".format(action), end="") 339 | 340 | mean_error_each['p1'] = action_error_sum[action]['p1'].avg * 1000.0 341 | mean_error_all['p1'].update(mean_error_each['p1'], 1) 342 | 343 | mean_error_each['p2'] = action_error_sum[action]['p2'].avg * 1000.0 344 | mean_error_all['p2'].update(mean_error_each['p2'], 1) 345 | 346 | if is_train == 0: 347 | print("{0:>6.2f} {1:>10.2f}".format(mean_error_each['p1'], mean_error_each['p2'])) 348 | 349 | if is_train == 0: 350 | print("{0:<12} {1:>6.2f} {2:>10.2f}".format("Average", mean_error_all['p1'].avg, \ 351 | mean_error_all['p2'].avg)) 352 | 353 | return mean_error_all['p1'].avg, mean_error_all['p2'].avg 354 | 355 | 356 | 357 | def save_model_refine(previous_name, save_dir,epoch, data_threshold, model, model_name):# 358 | if os.path.exists(previous_name): 359 | os.remove(previous_name) 360 | 361 | torch.save(model.state_dict(), 362 | '%s/%s_%d_%d.pth' % (save_dir, model_name, epoch, data_threshold * 100)) 363 | previous_name = '%s/%s_%d_%d.pth' % 
(save_dir, model_name, epoch, data_threshold * 100) 364 | 365 | return previous_name 366 | 367 | 368 | def save_model(previous_name, save_dir, epoch, data_threshold, model): 369 | if os.path.exists(previous_name): 370 | os.remove(previous_name) 371 | 372 | torch.save(model.state_dict(), 373 | '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100)) 374 | previous_name = '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100) 375 | return previous_name 376 | 377 | 378 | 379 | def save_model_epoch(previous_name, save_dir, epoch, data_threshold, model): 380 | # if os.path.exists(previous_name): 381 | # os.remove(previous_name) 382 | 383 | torch.save(model.state_dict(), 384 | '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100)) 385 | previous_name = '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100) 386 | return previous_name 387 | 388 | 389 | 390 | 391 | 392 | 393 | -------------------------------------------------------------------------------- /demo/figure/lindan.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/figure/lindan.jpg -------------------------------------------------------------------------------- /demo/figure/messi.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/figure/messi.jpg -------------------------------------------------------------------------------- /demo/lib/hrnet/experiments/w48_384x288_adam_lr1e-3.yaml: -------------------------------------------------------------------------------- 1 | AUTO_RESUME: true 2 | CUDNN: 3 | BENCHMARK: true 4 | DETERMINISTIC: false 5 | ENABLED: true 6 | DATA_DIR: '' 7 | GPUS: (0,1,2,3) 8 | OUTPUT_DIR: 'output' 9 | LOG_DIR: 'log' 10 | WORKERS: 24 11 | PRINT_FREQ: 100 12 | 13 | DATASET: 14 | COLOR_RGB: true 15 | DATASET: 'coco' 16 | DATA_FORMAT: jpg 17 | FLIP: true 18 | NUM_JOINTS_HALF_BODY: 8 19 | PROB_HALF_BODY: 0.3 20 | ROOT: 'data/coco/' 21 | ROT_FACTOR: 45 22 | SCALE_FACTOR: 0.35 23 | TEST_SET: 'val2017' 24 | TRAIN_SET: 'train2017' 25 | MODEL: 26 | INIT_WEIGHTS: true 27 | NAME: pose_hrnet 28 | NUM_JOINTS: 17 29 | PRETRAINED: 'models/pytorch/imagenet/hrnet_w48-8ef0771d.pth' 30 | TARGET_TYPE: gaussian 31 | IMAGE_SIZE: 32 | - 288 33 | - 384 34 | HEATMAP_SIZE: 35 | - 72 36 | - 96 37 | SIGMA: 3 38 | EXTRA: 39 | PRETRAINED_LAYERS: 40 | - 'conv1' 41 | - 'bn1' 42 | - 'conv2' 43 | - 'bn2' 44 | - 'layer1' 45 | - 'transition1' 46 | - 'stage2' 47 | - 'transition2' 48 | - 'stage3' 49 | - 'transition3' 50 | - 'stage4' 51 | FINAL_CONV_KERNEL: 1 52 | STAGE2: 53 | NUM_MODULES: 1 54 | NUM_BRANCHES: 2 55 | BLOCK: BASIC 56 | NUM_BLOCKS: 57 | - 4 58 | - 4 59 | NUM_CHANNELS: 60 | - 48 61 | - 96 62 | FUSE_METHOD: SUM 63 | STAGE3: 64 | NUM_MODULES: 4 65 | NUM_BRANCHES: 3 66 | BLOCK: BASIC 67 | NUM_BLOCKS: 68 | - 4 69 | - 4 70 | - 4 71 | NUM_CHANNELS: 72 | - 48 73 | - 96 74 | - 192 75 | FUSE_METHOD: SUM 76 | STAGE4: 77 | NUM_MODULES: 3 78 | NUM_BRANCHES: 4 79 | BLOCK: BASIC 80 | NUM_BLOCKS: 81 | - 4 82 | - 4 83 | - 4 84 | - 4 85 | NUM_CHANNELS: 86 | - 48 87 | - 96 88 | - 192 89 | - 384 90 | FUSE_METHOD: SUM 91 | LOSS: 92 | USE_TARGET_WEIGHT: true 93 | TRAIN: 94 | BATCH_SIZE_PER_GPU: 24 95 | SHUFFLE: true 96 | BEGIN_EPOCH: 0 97 | END_EPOCH: 210 98 | OPTIMIZER: adam 99 | LR: 0.001 100 | LR_FACTOR: 0.1 101 | LR_STEP: 102 | - 170 103 | - 200 104 | WD: 0.0001 105 | 
GAMMA1: 0.99 106 | GAMMA2: 0.0 107 | MOMENTUM: 0.9 108 | NESTEROV: false 109 | TEST: 110 | BATCH_SIZE_PER_GPU: 24 111 | COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json' 112 | BBOX_THRE: 1.0 113 | IMAGE_THRE: 0.0 114 | IN_VIS_THRE: 0.2 115 | MODEL_FILE: '' 116 | NMS_THRE: 1.0 117 | OKS_THRE: 0.9 118 | USE_GT_BBOX: true 119 | FLIP_TEST: true 120 | POST_PROCESS: true 121 | SHIFT_HEATMAP: true 122 | DEBUG: 123 | DEBUG: true 124 | SAVE_BATCH_IMAGES_GT: true 125 | SAVE_BATCH_IMAGES_PRED: true 126 | SAVE_HEATMAPS_GT: true 127 | SAVE_HEATMAPS_PRED: true 128 | -------------------------------------------------------------------------------- /demo/lib/hrnet/gen_kpts.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import sys 6 | import os 7 | import os.path as osp 8 | import argparse 9 | import time 10 | import numpy as np 11 | from tqdm import tqdm 12 | import json 13 | import torch 14 | import torch.backends.cudnn as cudnn 15 | import cv2 16 | import copy 17 | 18 | from lib.hrnet.lib.utils.utilitys import plot_keypoint, PreProcess, write, load_json 19 | from lib.hrnet.lib.config import cfg, update_config 20 | from lib.hrnet.lib.utils.transforms import * 21 | from lib.hrnet.lib.utils.inference import get_final_preds 22 | from lib.hrnet.lib.models import pose_hrnet 23 | 24 | cfg_dir = 'demo/lib/hrnet/experiments/' 25 | model_dir = 'demo/lib/checkpoint/' 26 | 27 | # Loading human detector model 28 | from lib.yolov3.human_detector import load_model as yolo_model 29 | from lib.yolov3.human_detector import yolo_human_det as yolo_det 30 | from lib.sort.sort import Sort 31 | 32 | 33 | def parse_args(): 34 | parser = argparse.ArgumentParser(description='Train keypoints network') 35 | # general 36 | parser.add_argument('--cfg', type=str, default=cfg_dir + 'w48_384x288_adam_lr1e-3.yaml', 37 | help='experiment configure file name') 38 | parser.add_argument('opts', nargs=argparse.REMAINDER, default=None, 39 | help="Modify config options using the command-line") 40 | parser.add_argument('--modelDir', type=str, default=model_dir + 'pose_hrnet_w48_384x288.pth', 41 | help='The model directory') 42 | parser.add_argument('--det-dim', type=int, default=416, 43 | help='The input dimension of the detected image') 44 | parser.add_argument('--thred-score', type=float, default=0.30, 45 | help='The threshold of object Confidence') 46 | parser.add_argument('-a', '--animation', action='store_true', 47 | help='output animation') 48 | parser.add_argument('-np', '--num-person', type=int, default=1, 49 | help='The maximum number of estimated poses') 50 | parser.add_argument("-v", "--video", type=str, default='camera', 51 | help="input video file name") 52 | parser.add_argument("-f", "--figure", type=str, default='demo.jpg', 53 | help="input figure file name") 54 | parser.add_argument('--gpu', type=str, default='0', help='input video') 55 | args = parser.parse_args() 56 | 57 | return args 58 | 59 | 60 | def reset_config(args): 61 | update_config(cfg, args) 62 | 63 | # cudnn related setting 64 | cudnn.benchmark = cfg.CUDNN.BENCHMARK 65 | torch.backends.cudnn.deterministic = cfg.CUDNN.DETERMINISTIC 66 | torch.backends.cudnn.enabled = cfg.CUDNN.ENABLED 67 | 68 | 69 | # load model 70 | def model_load(config): 71 | model = pose_hrnet.get_pose_net(config, is_train=False) 72 | if torch.cuda.is_available(): 73 | model = 
model.cuda() 74 | 75 | state_dict = torch.load(config.OUTPUT_DIR) 76 | from collections import OrderedDict 77 | new_state_dict = OrderedDict() 78 | for k, v in state_dict.items(): 79 | name = k # remove module. 80 | # print(name,'\t') 81 | new_state_dict[name] = v 82 | model.load_state_dict(new_state_dict) 83 | model.eval() 84 | # print('HRNet network successfully loaded') 85 | 86 | return model 87 | 88 | 89 | def gen_video_kpts(video, det_dim=416, num_peroson=1, gen_output=False): 90 | # Updating configuration 91 | args = parse_args() 92 | reset_config(args) 93 | 94 | cap = cv2.VideoCapture(video) 95 | 96 | # Loading detector and pose model, initialize sort for track 97 | human_model = yolo_model(inp_dim=det_dim) 98 | pose_model = model_load(cfg) 99 | people_sort = Sort(min_hits=0) 100 | 101 | video_length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) 102 | 103 | kpts_result = [] 104 | scores_result = [] 105 | for ii in tqdm(range(video_length)): 106 | ret, frame = cap.read() 107 | 108 | if not ret: 109 | continue 110 | 111 | bboxs, scores = yolo_det(frame, human_model, reso=det_dim, confidence=args.thred_score) 112 | 113 | if bboxs is None or not bboxs.any(): 114 | print('No person detected!') 115 | bboxs = bboxs_pre 116 | scores = scores_pre 117 | else: 118 | bboxs_pre = copy.deepcopy(bboxs) 119 | scores_pre = copy.deepcopy(scores) 120 | 121 | # Using Sort to track people 122 | people_track = people_sort.update(bboxs) 123 | 124 | # Track the first two people in the video and remove the ID 125 | if people_track.shape[0] == 1: 126 | people_track_ = people_track[-1, :-1].reshape(1, 4) 127 | elif people_track.shape[0] >= 2: 128 | people_track_ = people_track[-num_peroson:, :-1].reshape(num_peroson, 4) 129 | people_track_ = people_track_[::-1] 130 | else: 131 | continue 132 | 133 | track_bboxs = [] 134 | for bbox in people_track_: 135 | bbox = [round(i, 2) for i in list(bbox)] 136 | track_bboxs.append(bbox) 137 | 138 | with torch.no_grad(): 139 | # bbox is coordinate location 140 | inputs, origin_img, center, scale = PreProcess(frame, track_bboxs, cfg, num_peroson) 141 | 142 | inputs = inputs[:, [2, 1, 0]] 143 | 144 | if torch.cuda.is_available(): 145 | inputs = inputs.cuda() 146 | output = pose_model(inputs) 147 | 148 | # compute coordinate 149 | preds, maxvals = get_final_preds(cfg, output.clone().cpu().numpy(), np.asarray(center), np.asarray(scale)) 150 | 151 | kpts = np.zeros((num_peroson, 17, 2), dtype=np.float32) 152 | scores = np.zeros((num_peroson, 17), dtype=np.float32) 153 | for i, kpt in enumerate(preds): 154 | kpts[i] = kpt 155 | 156 | for i, score in enumerate(maxvals): 157 | scores[i] = score.squeeze() 158 | 159 | kpts_result.append(kpts) 160 | scores_result.append(scores) 161 | 162 | keypoints = np.array(kpts_result) 163 | scores = np.array(scores_result) 164 | 165 | keypoints = keypoints.transpose(1, 0, 2, 3) # (T, M, N, 2) --> (M, T, N, 2) 166 | scores = scores.transpose(1, 0, 2) # (T, M, N) --> (M, T, N) 167 | 168 | return keypoints, scores 169 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__init__.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from .default import _C as cfg 8 | from .default import update_config 9 | from .models import MODEL_EXTRAS 10 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-38.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/default.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/default.cpython-38.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/default.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/default.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/models.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/models.cpython-38.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/models.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/models.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/default.py: -------------------------------------------------------------------------------- 1 | 2 | # ------------------------------------------------------------------------------ 3 | # Copyright (c) Microsoft 4 | # Licensed under the MIT License. 
5 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 6 | # ------------------------------------------------------------------------------ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | 12 | import os 13 | 14 | from yacs.config import CfgNode as CN 15 | 16 | 17 | _C = CN() 18 | 19 | _C.OUTPUT_DIR = '' 20 | _C.LOG_DIR = '' 21 | _C.DATA_DIR = '' 22 | _C.GPUS = (0,) 23 | _C.WORKERS = 4 24 | _C.PRINT_FREQ = 20 25 | _C.AUTO_RESUME = False 26 | _C.PIN_MEMORY = True 27 | _C.RANK = 0 28 | 29 | # Cudnn related params 30 | _C.CUDNN = CN() 31 | _C.CUDNN.BENCHMARK = True 32 | _C.CUDNN.DETERMINISTIC = False 33 | _C.CUDNN.ENABLED = True 34 | 35 | # common params for NETWORK 36 | _C.MODEL = CN() 37 | _C.MODEL.NAME = 'pose_hrnet' 38 | _C.MODEL.INIT_WEIGHTS = True 39 | _C.MODEL.PRETRAINED = '' 40 | _C.MODEL.NUM_JOINTS = 17 41 | _C.MODEL.TAG_PER_JOINT = True 42 | _C.MODEL.TARGET_TYPE = 'gaussian' 43 | _C.MODEL.IMAGE_SIZE = [256, 256] # width * height, ex: 192 * 256 44 | _C.MODEL.HEATMAP_SIZE = [64, 64] # width * height, ex: 24 * 32 45 | _C.MODEL.SIGMA = 2 46 | _C.MODEL.EXTRA = CN(new_allowed=True) 47 | 48 | _C.LOSS = CN() 49 | _C.LOSS.USE_OHKM = False 50 | _C.LOSS.TOPK = 8 51 | _C.LOSS.USE_TARGET_WEIGHT = True 52 | _C.LOSS.USE_DIFFERENT_JOINTS_WEIGHT = False 53 | 54 | # DATASET related params 55 | _C.DATASET = CN() 56 | _C.DATASET.ROOT = '' 57 | _C.DATASET.DATASET = 'mpii' 58 | _C.DATASET.TRAIN_SET = 'train' 59 | _C.DATASET.TEST_SET = 'valid' 60 | _C.DATASET.DATA_FORMAT = 'jpg' 61 | _C.DATASET.HYBRID_JOINTS_TYPE = '' 62 | _C.DATASET.SELECT_DATA = False 63 | 64 | # training data augmentation 65 | _C.DATASET.FLIP = True 66 | _C.DATASET.SCALE_FACTOR = 0.25 67 | _C.DATASET.ROT_FACTOR = 30 68 | _C.DATASET.PROB_HALF_BODY = 0.0 69 | _C.DATASET.NUM_JOINTS_HALF_BODY = 8 70 | _C.DATASET.COLOR_RGB = False 71 | 72 | # train 73 | _C.TRAIN = CN() 74 | 75 | _C.TRAIN.LR_FACTOR = 0.1 76 | _C.TRAIN.LR_STEP = [90, 110] 77 | _C.TRAIN.LR = 0.001 78 | 79 | _C.TRAIN.OPTIMIZER = 'adam' 80 | _C.TRAIN.MOMENTUM = 0.9 81 | _C.TRAIN.WD = 0.0001 82 | _C.TRAIN.NESTEROV = False 83 | _C.TRAIN.GAMMA1 = 0.99 84 | _C.TRAIN.GAMMA2 = 0.0 85 | 86 | _C.TRAIN.BEGIN_EPOCH = 0 87 | _C.TRAIN.END_EPOCH = 140 88 | 89 | _C.TRAIN.RESUME = False 90 | _C.TRAIN.CHECKPOINT = '' 91 | 92 | _C.TRAIN.BATCH_SIZE_PER_GPU = 32 93 | _C.TRAIN.SHUFFLE = True 94 | 95 | # testing 96 | _C.TEST = CN() 97 | 98 | # size of images for each device 99 | _C.TEST.BATCH_SIZE_PER_GPU = 32 100 | # Test Model Epoch 101 | _C.TEST.FLIP_TEST = False 102 | _C.TEST.POST_PROCESS = False 103 | _C.TEST.SHIFT_HEATMAP = False 104 | 105 | _C.TEST.USE_GT_BBOX = False 106 | 107 | # nms 108 | _C.TEST.IMAGE_THRE = 0.1 109 | _C.TEST.NMS_THRE = 0.6 110 | _C.TEST.SOFT_NMS = False 111 | _C.TEST.OKS_THRE = 0.5 112 | _C.TEST.IN_VIS_THRE = 0.0 113 | _C.TEST.COCO_BBOX_FILE = '' 114 | _C.TEST.BBOX_THRE = 1.0 115 | _C.TEST.MODEL_FILE = '' 116 | 117 | # debug 118 | _C.DEBUG = CN() 119 | _C.DEBUG.DEBUG = False 120 | _C.DEBUG.SAVE_BATCH_IMAGES_GT = False 121 | _C.DEBUG.SAVE_BATCH_IMAGES_PRED = False 122 | _C.DEBUG.SAVE_HEATMAPS_GT = False 123 | _C.DEBUG.SAVE_HEATMAPS_PRED = False 124 | 125 | 126 | def update_config(cfg, args): 127 | cfg.defrost() 128 | cfg.merge_from_file(args.cfg) 129 | cfg.merge_from_list(args.opts) 130 | 131 | if args.modelDir: 132 | cfg.OUTPUT_DIR = args.modelDir 133 | 134 | # if args.logDir: 135 | # cfg.LOG_DIR = args.logDir 136 | # 137 | # if args.dataDir: 138 | # cfg.DATA_DIR = args.dataDir 139 | # 140 | # 
cfg.DATASET.ROOT = os.path.join( 141 | # cfg.DATA_DIR, cfg.DATASET.ROOT 142 | # ) 143 | # 144 | # cfg.MODEL.PRETRAINED = os.path.join( 145 | # cfg.DATA_DIR, cfg.MODEL.PRETRAINED 146 | # ) 147 | # 148 | # if cfg.TEST.MODEL_FILE: 149 | # cfg.TEST.MODEL_FILE = os.path.join( 150 | # cfg.DATA_DIR, cfg.TEST.MODEL_FILE 151 | # ) 152 | 153 | cfg.freeze() 154 | 155 | 156 | if __name__ == '__main__': 157 | import sys 158 | with open(sys.argv[1], 'w') as f: 159 | print(_C, file=f) 160 | 161 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/models.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from yacs.config import CfgNode as CN 12 | 13 | 14 | # pose_resnet related params 15 | POSE_RESNET = CN() 16 | POSE_RESNET.NUM_LAYERS = 50 17 | POSE_RESNET.DECONV_WITH_BIAS = False 18 | POSE_RESNET.NUM_DECONV_LAYERS = 3 19 | POSE_RESNET.NUM_DECONV_FILTERS = [256, 256, 256] 20 | POSE_RESNET.NUM_DECONV_KERNELS = [4, 4, 4] 21 | POSE_RESNET.FINAL_CONV_KERNEL = 1 22 | POSE_RESNET.PRETRAINED_LAYERS = ['*'] 23 | 24 | # pose_multi_resoluton_net related params 25 | POSE_HIGH_RESOLUTION_NET = CN() 26 | POSE_HIGH_RESOLUTION_NET.PRETRAINED_LAYERS = ['*'] 27 | POSE_HIGH_RESOLUTION_NET.STEM_INPLANES = 64 28 | POSE_HIGH_RESOLUTION_NET.FINAL_CONV_KERNEL = 1 29 | 30 | POSE_HIGH_RESOLUTION_NET.STAGE2 = CN() 31 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_MODULES = 1 32 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_BRANCHES = 2 33 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_BLOCKS = [4, 4] 34 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_CHANNELS = [32, 64] 35 | POSE_HIGH_RESOLUTION_NET.STAGE2.BLOCK = 'BASIC' 36 | POSE_HIGH_RESOLUTION_NET.STAGE2.FUSE_METHOD = 'SUM' 37 | 38 | POSE_HIGH_RESOLUTION_NET.STAGE3 = CN() 39 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_MODULES = 1 40 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_BRANCHES = 3 41 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_BLOCKS = [4, 4, 4] 42 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_CHANNELS = [32, 64, 128] 43 | POSE_HIGH_RESOLUTION_NET.STAGE3.BLOCK = 'BASIC' 44 | POSE_HIGH_RESOLUTION_NET.STAGE3.FUSE_METHOD = 'SUM' 45 | 46 | POSE_HIGH_RESOLUTION_NET.STAGE4 = CN() 47 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_MODULES = 1 48 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_BRANCHES = 4 49 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_BLOCKS = [4, 4, 4, 4] 50 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_CHANNELS = [32, 64, 128, 256] 51 | POSE_HIGH_RESOLUTION_NET.STAGE4.BLOCK = 'BASIC' 52 | POSE_HIGH_RESOLUTION_NET.STAGE4.FUSE_METHOD = 'SUM' 53 | 54 | 55 | MODEL_EXTRAS = { 56 | 'pose_resnet': POSE_RESNET, 57 | 'pose_high_resolution_net': POSE_HIGH_RESOLUTION_NET, 58 | } 59 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-38.pyc -------------------------------------------------------------------------------- 
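The yacs defaults above are resolved at runtime by `update_config`, which merges the experiment YAML plus any `KEY VALUE` overrides and stores the checkpoint path in `cfg.OUTPUT_DIR`. A minimal sketch of that flow (not part of the repository; it simply mirrors what `parse_args()`/`reset_config()` in `demo/lib/hrnet/gen_kpts.py` do, and assumes the `demo/` directory is on `sys.path`):

```python
# Minimal sketch (assumption: run with the demo/ directory on sys.path, as gen_kpts.py expects).
from argparse import Namespace

from lib.hrnet.lib.config import cfg, update_config

args = Namespace(
    cfg='demo/lib/hrnet/experiments/w48_384x288_adam_lr1e-3.yaml',   # experiment YAML
    opts=[],                                                         # optional "KEY VALUE" overrides
    modelDir='demo/lib/checkpoint/pose_hrnet_w48_384x288.pth',       # weights path, stored in cfg.OUTPUT_DIR
)
update_config(cfg, args)     # defrost -> merge YAML -> merge opts -> set OUTPUT_DIR -> freeze

print(cfg.MODEL.NAME)        # 'pose_hrnet' unless the YAML overrides it
print(cfg.MODEL.NUM_JOINTS)  # 17 unless the YAML overrides it
```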
/demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/models/pose_hrnet.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import os 12 | import logging 13 | 14 | import torch 15 | import torch.nn as nn 16 | 17 | 18 | BN_MOMENTUM = 0.1 19 | logger = logging.getLogger(__name__) 20 | 21 | 22 | def conv3x3(in_planes, out_planes, stride=1): 23 | """3x3 convolution with padding""" 24 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 25 | padding=1, bias=False) 26 | 27 | 28 | class BasicBlock(nn.Module): 29 | expansion = 1 30 | 31 | def __init__(self, inplanes, planes, stride=1, downsample=None): 32 | super(BasicBlock, self).__init__() 33 | self.conv1 = conv3x3(inplanes, planes, stride) 34 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 35 | self.relu = nn.ReLU(inplace=True) 36 | self.conv2 = conv3x3(planes, planes) 37 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 38 | self.downsample = downsample 39 | self.stride = stride 40 | 41 | def forward(self, x): 42 | residual = x 43 | 44 | out = self.conv1(x) 45 | out = self.bn1(out) 46 | out = self.relu(out) 47 | 48 | out = self.conv2(out) 49 | out = self.bn2(out) 50 | 51 | if self.downsample is not None: 52 | residual = self.downsample(x) 53 | 54 | out += residual 55 | out = self.relu(out) 56 | 57 | return out 58 | 59 | 60 | class Bottleneck(nn.Module): 61 | expansion = 4 62 | 63 | def __init__(self, inplanes, planes, stride=1, downsample=None): 64 | super(Bottleneck, self).__init__() 65 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 66 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 67 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 68 | padding=1, bias=False) 69 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 70 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, 71 | bias=False) 72 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, 73 | momentum=BN_MOMENTUM) 74 | self.relu = nn.ReLU(inplace=True) 75 | self.downsample = downsample 76 | self.stride = stride 77 | 78 | def forward(self, x): 79 | residual = x 80 | 81 | out = self.conv1(x) 82 | out = self.bn1(out) 83 | out = self.relu(out) 84 | 85 | out = self.conv2(out) 86 | out = self.bn2(out) 87 | out = self.relu(out) 88 | 89 | out = self.conv3(out) 90 | out = self.bn3(out) 91 | 92 | if self.downsample is not None: 93 | residual = self.downsample(x) 94 | 95 | out += residual 96 | out = self.relu(out) 97 | 98 | return out 99 | 100 | 101 | class HighResolutionModule(nn.Module): 102 | def __init__(self, num_branches, blocks, num_blocks, num_inchannels, 103 | num_channels, fuse_method, multi_scale_output=True): 104 | super(HighResolutionModule, self).__init__() 105 | 
self._check_branches( 106 | num_branches, blocks, num_blocks, num_inchannels, num_channels) 107 | 108 | self.num_inchannels = num_inchannels 109 | self.fuse_method = fuse_method 110 | self.num_branches = num_branches 111 | 112 | self.multi_scale_output = multi_scale_output 113 | 114 | self.branches = self._make_branches( 115 | num_branches, blocks, num_blocks, num_channels) 116 | self.fuse_layers = self._make_fuse_layers() 117 | self.relu = nn.ReLU(True) 118 | 119 | def _check_branches(self, num_branches, blocks, num_blocks, 120 | num_inchannels, num_channels): 121 | if num_branches != len(num_blocks): 122 | error_msg = 'NUM_BRANCHES({}) <> NUM_BLOCKS({})'.format( 123 | num_branches, len(num_blocks)) 124 | logger.error(error_msg) 125 | raise ValueError(error_msg) 126 | 127 | if num_branches != len(num_channels): 128 | error_msg = 'NUM_BRANCHES({}) <> NUM_CHANNELS({})'.format( 129 | num_branches, len(num_channels)) 130 | logger.error(error_msg) 131 | raise ValueError(error_msg) 132 | 133 | if num_branches != len(num_inchannels): 134 | error_msg = 'NUM_BRANCHES({}) <> NUM_INCHANNELS({})'.format( 135 | num_branches, len(num_inchannels)) 136 | logger.error(error_msg) 137 | raise ValueError(error_msg) 138 | 139 | def _make_one_branch(self, branch_index, block, num_blocks, num_channels, 140 | stride=1): 141 | downsample = None 142 | if stride != 1 or \ 143 | self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion: 144 | downsample = nn.Sequential( 145 | nn.Conv2d( 146 | self.num_inchannels[branch_index], 147 | num_channels[branch_index] * block.expansion, 148 | kernel_size=1, stride=stride, bias=False 149 | ), 150 | nn.BatchNorm2d( 151 | num_channels[branch_index] * block.expansion, 152 | momentum=BN_MOMENTUM 153 | ), 154 | ) 155 | 156 | layers = [] 157 | layers.append( 158 | block( 159 | self.num_inchannels[branch_index], 160 | num_channels[branch_index], 161 | stride, 162 | downsample 163 | ) 164 | ) 165 | self.num_inchannels[branch_index] = \ 166 | num_channels[branch_index] * block.expansion 167 | for i in range(1, num_blocks[branch_index]): 168 | layers.append( 169 | block( 170 | self.num_inchannels[branch_index], 171 | num_channels[branch_index] 172 | ) 173 | ) 174 | 175 | return nn.Sequential(*layers) 176 | 177 | def _make_branches(self, num_branches, block, num_blocks, num_channels): 178 | branches = [] 179 | 180 | for i in range(num_branches): 181 | branches.append( 182 | self._make_one_branch(i, block, num_blocks, num_channels) 183 | ) 184 | 185 | return nn.ModuleList(branches) 186 | 187 | def _make_fuse_layers(self): 188 | if self.num_branches == 1: 189 | return None 190 | 191 | num_branches = self.num_branches 192 | num_inchannels = self.num_inchannels 193 | fuse_layers = [] 194 | for i in range(num_branches if self.multi_scale_output else 1): 195 | fuse_layer = [] 196 | for j in range(num_branches): 197 | if j > i: 198 | fuse_layer.append( 199 | nn.Sequential( 200 | nn.Conv2d( 201 | num_inchannels[j], 202 | num_inchannels[i], 203 | 1, 1, 0, bias=False 204 | ), 205 | nn.BatchNorm2d(num_inchannels[i]), 206 | nn.Upsample(scale_factor=2**(j-i), mode='nearest') 207 | ) 208 | ) 209 | elif j == i: 210 | fuse_layer.append(None) 211 | else: 212 | conv3x3s = [] 213 | for k in range(i-j): 214 | if k == i - j - 1: 215 | num_outchannels_conv3x3 = num_inchannels[i] 216 | conv3x3s.append( 217 | nn.Sequential( 218 | nn.Conv2d( 219 | num_inchannels[j], 220 | num_outchannels_conv3x3, 221 | 3, 2, 1, bias=False 222 | ), 223 | nn.BatchNorm2d(num_outchannels_conv3x3) 224 | ) 225 
| ) 226 | else: 227 | num_outchannels_conv3x3 = num_inchannels[j] 228 | conv3x3s.append( 229 | nn.Sequential( 230 | nn.Conv2d( 231 | num_inchannels[j], 232 | num_outchannels_conv3x3, 233 | 3, 2, 1, bias=False 234 | ), 235 | nn.BatchNorm2d(num_outchannels_conv3x3), 236 | nn.ReLU(True) 237 | ) 238 | ) 239 | fuse_layer.append(nn.Sequential(*conv3x3s)) 240 | fuse_layers.append(nn.ModuleList(fuse_layer)) 241 | 242 | return nn.ModuleList(fuse_layers) 243 | 244 | def get_num_inchannels(self): 245 | return self.num_inchannels 246 | 247 | def forward(self, x): 248 | if self.num_branches == 1: 249 | return [self.branches[0](x[0])] 250 | 251 | for i in range(self.num_branches): 252 | x[i] = self.branches[i](x[i]) 253 | 254 | x_fuse = [] 255 | 256 | for i in range(len(self.fuse_layers)): 257 | y = x[0] if i == 0 else self.fuse_layers[i][0](x[0]) 258 | for j in range(1, self.num_branches): 259 | if i == j: 260 | y = y + x[j] 261 | else: 262 | y = y + self.fuse_layers[i][j](x[j]) 263 | x_fuse.append(self.relu(y)) 264 | 265 | return x_fuse 266 | 267 | 268 | blocks_dict = { 269 | 'BASIC': BasicBlock, 270 | 'BOTTLENECK': Bottleneck 271 | } 272 | 273 | 274 | class PoseHighResolutionNet(nn.Module): 275 | 276 | def __init__(self, cfg, **kwargs): 277 | self.inplanes = 64 278 | extra = cfg['MODEL']['EXTRA'] 279 | super(PoseHighResolutionNet, self).__init__() 280 | 281 | # stem net 282 | self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, 283 | bias=False) 284 | self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM) 285 | self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, 286 | bias=False) 287 | self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM) 288 | self.relu = nn.ReLU(inplace=True) 289 | self.layer1 = self._make_layer(Bottleneck, 64, 4) 290 | 291 | self.stage2_cfg = extra['STAGE2'] 292 | num_channels = self.stage2_cfg['NUM_CHANNELS'] 293 | block = blocks_dict[self.stage2_cfg['BLOCK']] 294 | num_channels = [ 295 | num_channels[i] * block.expansion for i in range(len(num_channels)) 296 | ] 297 | self.transition1 = self._make_transition_layer([256], num_channels) 298 | self.stage2, pre_stage_channels = self._make_stage( 299 | self.stage2_cfg, num_channels) 300 | 301 | self.stage3_cfg = extra['STAGE3'] 302 | num_channels = self.stage3_cfg['NUM_CHANNELS'] 303 | block = blocks_dict[self.stage3_cfg['BLOCK']] 304 | num_channels = [ 305 | num_channels[i] * block.expansion for i in range(len(num_channels)) 306 | ] 307 | self.transition2 = self._make_transition_layer( 308 | pre_stage_channels, num_channels) 309 | self.stage3, pre_stage_channels = self._make_stage( 310 | self.stage3_cfg, num_channels) 311 | 312 | self.stage4_cfg = extra['STAGE4'] 313 | num_channels = self.stage4_cfg['NUM_CHANNELS'] 314 | block = blocks_dict[self.stage4_cfg['BLOCK']] 315 | num_channels = [ 316 | num_channels[i] * block.expansion for i in range(len(num_channels)) 317 | ] 318 | self.transition3 = self._make_transition_layer( 319 | pre_stage_channels, num_channels) 320 | self.stage4, pre_stage_channels = self._make_stage( 321 | self.stage4_cfg, num_channels, multi_scale_output=False) 322 | 323 | self.final_layer = nn.Conv2d( 324 | in_channels=pre_stage_channels[0], 325 | out_channels=cfg['MODEL']['NUM_JOINTS'], 326 | kernel_size=extra['FINAL_CONV_KERNEL'], 327 | stride=1, 328 | padding=1 if extra['FINAL_CONV_KERNEL'] == 3 else 0 329 | ) 330 | 331 | self.pretrained_layers = extra['PRETRAINED_LAYERS'] 332 | 333 | def _make_transition_layer( 334 | self, num_channels_pre_layer, num_channels_cur_layer): 335 | 
num_branches_cur = len(num_channels_cur_layer) 336 | num_branches_pre = len(num_channels_pre_layer) 337 | 338 | transition_layers = [] 339 | for i in range(num_branches_cur): 340 | if i < num_branches_pre: 341 | if num_channels_cur_layer[i] != num_channels_pre_layer[i]: 342 | transition_layers.append( 343 | nn.Sequential( 344 | nn.Conv2d( 345 | num_channels_pre_layer[i], 346 | num_channels_cur_layer[i], 347 | 3, 1, 1, bias=False 348 | ), 349 | nn.BatchNorm2d(num_channels_cur_layer[i]), 350 | nn.ReLU(inplace=True) 351 | ) 352 | ) 353 | else: 354 | transition_layers.append(None) 355 | else: 356 | conv3x3s = [] 357 | for j in range(i+1-num_branches_pre): 358 | inchannels = num_channels_pre_layer[-1] 359 | outchannels = num_channels_cur_layer[i] \ 360 | if j == i-num_branches_pre else inchannels 361 | conv3x3s.append( 362 | nn.Sequential( 363 | nn.Conv2d( 364 | inchannels, outchannels, 3, 2, 1, bias=False 365 | ), 366 | nn.BatchNorm2d(outchannels), 367 | nn.ReLU(inplace=True) 368 | ) 369 | ) 370 | transition_layers.append(nn.Sequential(*conv3x3s)) 371 | 372 | return nn.ModuleList(transition_layers) 373 | 374 | def _make_layer(self, block, planes, blocks, stride=1): 375 | downsample = None 376 | if stride != 1 or self.inplanes != planes * block.expansion: 377 | downsample = nn.Sequential( 378 | nn.Conv2d( 379 | self.inplanes, planes * block.expansion, 380 | kernel_size=1, stride=stride, bias=False 381 | ), 382 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM), 383 | ) 384 | 385 | layers = [] 386 | layers.append(block(self.inplanes, planes, stride, downsample)) 387 | self.inplanes = planes * block.expansion 388 | for i in range(1, blocks): 389 | layers.append(block(self.inplanes, planes)) 390 | 391 | return nn.Sequential(*layers) 392 | 393 | def _make_stage(self, layer_config, num_inchannels, 394 | multi_scale_output=True): 395 | num_modules = layer_config['NUM_MODULES'] 396 | num_branches = layer_config['NUM_BRANCHES'] 397 | num_blocks = layer_config['NUM_BLOCKS'] 398 | num_channels = layer_config['NUM_CHANNELS'] 399 | block = blocks_dict[layer_config['BLOCK']] 400 | fuse_method = layer_config['FUSE_METHOD'] 401 | 402 | modules = [] 403 | for i in range(num_modules): 404 | # multi_scale_output is only used last module 405 | if not multi_scale_output and i == num_modules - 1: 406 | reset_multi_scale_output = False 407 | else: 408 | reset_multi_scale_output = True 409 | 410 | modules.append( 411 | HighResolutionModule( 412 | num_branches, 413 | block, 414 | num_blocks, 415 | num_inchannels, 416 | num_channels, 417 | fuse_method, 418 | reset_multi_scale_output 419 | ) 420 | ) 421 | num_inchannels = modules[-1].get_num_inchannels() 422 | 423 | return nn.Sequential(*modules), num_inchannels 424 | 425 | def forward(self, x): 426 | x = self.conv1(x) 427 | x = self.bn1(x) 428 | x = self.relu(x) 429 | x = self.conv2(x) 430 | x = self.bn2(x) 431 | x = self.relu(x) 432 | x = self.layer1(x) 433 | 434 | x_list = [] 435 | for i in range(self.stage2_cfg['NUM_BRANCHES']): 436 | if self.transition1[i] is not None: 437 | x_list.append(self.transition1[i](x)) 438 | else: 439 | x_list.append(x) 440 | y_list = self.stage2(x_list) 441 | 442 | x_list = [] 443 | for i in range(self.stage3_cfg['NUM_BRANCHES']): 444 | if self.transition2[i] is not None: 445 | x_list.append(self.transition2[i](y_list[-1])) 446 | else: 447 | x_list.append(y_list[i]) 448 | y_list = self.stage3(x_list) 449 | 450 | x_list = [] 451 | for i in range(self.stage4_cfg['NUM_BRANCHES']): 452 | if self.transition3[i] is not None: 453 
| x_list.append(self.transition3[i](y_list[-1])) 454 | else: 455 | x_list.append(y_list[i]) 456 | y_list = self.stage4(x_list) 457 | 458 | x = self.final_layer(y_list[0]) 459 | 460 | return x 461 | 462 | def init_weights(self, pretrained=''): 463 | logger.info('=> init weights from normal distribution') 464 | for m in self.modules(): 465 | if isinstance(m, nn.Conv2d): 466 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 467 | nn.init.normal_(m.weight, std=0.001) 468 | for name, _ in m.named_parameters(): 469 | if name in ['bias']: 470 | nn.init.constant_(m.bias, 0) 471 | elif isinstance(m, nn.BatchNorm2d): 472 | nn.init.constant_(m.weight, 1) 473 | nn.init.constant_(m.bias, 0) 474 | elif isinstance(m, nn.ConvTranspose2d): 475 | nn.init.normal_(m.weight, std=0.001) 476 | for name, _ in m.named_parameters(): 477 | if name in ['bias']: 478 | nn.init.constant_(m.bias, 0) 479 | 480 | if os.path.isfile(pretrained): 481 | pretrained_state_dict = torch.load(pretrained) 482 | logger.info('=> loading pretrained model {}'.format(pretrained)) 483 | 484 | need_init_state_dict = {} 485 | for name, m in pretrained_state_dict.items(): 486 | if name.split('.')[0] in self.pretrained_layers \ 487 | or self.pretrained_layers[0] is '*': 488 | need_init_state_dict[name] = m 489 | self.load_state_dict(need_init_state_dict, strict=False) 490 | elif pretrained: 491 | logger.error('=> please download pre-trained models first!') 492 | raise ValueError('{} is not exist!'.format(pretrained)) 493 | 494 | 495 | def get_pose_net(cfg, is_train, **kwargs): 496 | model = PoseHighResolutionNet(cfg, **kwargs) 497 | 498 | if is_train and cfg['MODEL']['INIT_WEIGHTS']: 499 | model.init_weights(cfg['MODEL']['PRETRAINED']) 500 | 501 | return model 502 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/__pycache__/coco_h36m.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/coco_h36m.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/__pycache__/inference.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/inference.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/__pycache__/transforms.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/transforms.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/__pycache__/utilitys.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/utilitys.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/coco_h36m.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | h36m_coco_order = [9, 11, 14, 12, 15, 13, 
16, 4, 1, 5, 2, 6, 3] 5 | coco_order = [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] 6 | spple_keypoints = [10, 8, 0, 7] 7 | 8 | 9 | def coco_h36m(keypoints): 10 | # keypoints: (T, N, 2) or (M, N, 2) 11 | 12 | temporal = keypoints.shape[0] 13 | keypoints_h36m = np.zeros_like(keypoints, dtype=np.float32) 14 | htps_keypoints = np.zeros((temporal, 4, 2), dtype=np.float32) 15 | 16 | # htps_keypoints: head, thorax, pelvis, spine 17 | htps_keypoints[:, 0, 0] = np.mean(keypoints[:, 1:5, 0], axis=1, dtype=np.float32) 18 | htps_keypoints[:, 0, 1] = np.sum(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1] 19 | htps_keypoints[:, 1, :] = np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32) 20 | htps_keypoints[:, 1, :] += (keypoints[:, 0, :] - htps_keypoints[:, 1, :]) / 3 21 | 22 | htps_keypoints[:, 2, :] = np.mean(keypoints[:, 11:13, :], axis=1, dtype=np.float32) 23 | htps_keypoints[:, 3, :] = np.mean(keypoints[:, [5, 6, 11, 12], :], axis=1, dtype=np.float32) 24 | 25 | keypoints_h36m[:, spple_keypoints, :] = htps_keypoints 26 | keypoints_h36m[:, h36m_coco_order, :] = keypoints[:, coco_order, :] 27 | 28 | keypoints_h36m[:, 9, :] -= (keypoints_h36m[:, 9, :] - np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32)) / 4 29 | keypoints_h36m[:, 7, 0] += 0.3*(keypoints_h36m[:, 7, 0] - np.mean(keypoints_h36m[:, [0, 8], 0], axis=1, dtype=np.float32)) 30 | keypoints_h36m[:, 8, 1] -= (np.mean(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1])*2/3 31 | 32 | # half body: the joint of ankle and knee equal to hip 33 | # keypoints_h36m[:, [2, 3]] = keypoints_h36m[:, [1, 1]] 34 | # keypoints_h36m[:, [5, 6]] = keypoints_h36m[:, [4, 4]] 35 | return keypoints_h36m 36 | 37 | 38 | h36m_mpii_order = [3, 2, 1, 4, 5, 6, 0, 8, 9, 10, 16, 15, 14, 11, 12, 13] 39 | mpii_order = [i for i in range(16)] 40 | lr_hip_shouler = [2, 3, 12, 13] 41 | 42 | 43 | def mpii_h36m(keypoints): 44 | temporal = keypoints.shape[0] 45 | keypoints_h36m = np.zeros((temporal, 17, 2), dtype=np.float32) 46 | keypoints_h36m[:, h36m_mpii_order] = keypoints 47 | # keypoints_h36m[:, 7] = np.mean(keypoints[:, 6:8], axis=1, dtype=np.float32) 48 | keypoints_h36m[:, 7] = np.mean(keypoints[:, lr_hip_shouler], axis=1, dtype=np.float32) 49 | return keypoints_h36m 50 | 51 | 52 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/inference.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import math 12 | import sys 13 | import os.path as osp 14 | import numpy as np 15 | 16 | sys.path.insert(0, osp.join(osp.dirname(osp.realpath(__file__)), '..')) 17 | from utils.transforms import transform_preds 18 | sys.path.pop(0) 19 | 20 | 21 | def get_max_preds(batch_heatmaps): 22 | ''' 23 | get predictions from score maps 24 | heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) 25 | ''' 26 | assert isinstance(batch_heatmaps, np.ndarray), \ 27 | 'batch_heatmaps should be numpy.ndarray' 28 | assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim' 29 | 30 | batch_size = batch_heatmaps.shape[0] 31 | num_joints = batch_heatmaps.shape[1] 32 | width = batch_heatmaps.shape[3] 33 | heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1)) 34 | idx = np.argmax(heatmaps_reshaped, 2) 35 | maxvals = np.amax(heatmaps_reshaped, 2) 36 | 37 | maxvals = maxvals.reshape((batch_size, num_joints, 1)) 38 | idx = idx.reshape((batch_size, num_joints, 1)) 39 | 40 | preds = np.tile(idx, (1, 1, 2)).astype(np.float32) 41 | 42 | preds[:, :, 0] = (preds[:, :, 0]) % width 43 | preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) 44 | 45 | pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) 46 | pred_mask = pred_mask.astype(np.float32) 47 | 48 | preds *= pred_mask 49 | return preds, maxvals 50 | 51 | 52 | def get_final_preds(config, batch_heatmaps, center, scale): 53 | coords, maxvals = get_max_preds(batch_heatmaps) 54 | 55 | heatmap_height = batch_heatmaps.shape[2] 56 | heatmap_width = batch_heatmaps.shape[3] 57 | 58 | # post-processing 59 | if config.TEST.POST_PROCESS: 60 | for n in range(coords.shape[0]): 61 | for p in range(coords.shape[1]): 62 | hm = batch_heatmaps[n][p] 63 | px = int(math.floor(coords[n][p][0] + 0.5)) 64 | py = int(math.floor(coords[n][p][1] + 0.5)) 65 | if 1 < px < heatmap_width-1 and 1 < py < heatmap_height-1: 66 | diff = np.array( 67 | [ 68 | hm[py][px+1] - hm[py][px-1], 69 | hm[py+1][px]-hm[py-1][px] 70 | ] 71 | ) 72 | coords[n][p] += np.sign(diff) * .25 73 | 74 | preds = coords.copy() 75 | 76 | # Transform back 77 | for i in range(coords.shape[0]): 78 | preds[i] = transform_preds( 79 | coords[i], center[i], scale[i], [heatmap_width, heatmap_height] 80 | ) 81 | 82 | return preds, maxvals 83 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/transforms.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import numpy as np 12 | import cv2 13 | 14 | 15 | def flip_back(output_flipped, matched_parts): 16 | ''' 17 | ouput_flipped: numpy.ndarray(batch_size, num_joints, height, width) 18 | ''' 19 | assert output_flipped.ndim == 4,\ 20 | 'output_flipped should be [batch_size, num_joints, height, width]' 21 | 22 | output_flipped = output_flipped[:, :, :, ::-1] 23 | 24 | # 因为你输入的是翻转后的图像,所以输出的热图他们对应的左右关节也是相反的(训练的时候,输入的是翻转后的图像,target对应的左右关节也是对调过来的)。 25 | for pair in matched_parts: 26 | tmp = output_flipped[:, pair[0], :, :].copy() 27 | output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :] 28 | output_flipped[:, pair[1], :, :] = tmp 29 | 30 | return output_flipped 31 | 32 | 33 | def fliplr_joints(joints, joints_vis, width, matched_parts): 34 | """ 35 | flip coords 36 | """ 37 | # Flip horizontal 38 | joints[:, 0] = width - joints[:, 0] - 1 39 | 40 | # Change left-right parts 41 | for pair in matched_parts: 42 | joints[pair[0], :], joints[pair[1], :] = \ 43 | joints[pair[1], :], joints[pair[0], :].copy() 44 | joints_vis[pair[0], :], joints_vis[pair[1], :] = \ 45 | joints_vis[pair[1], :], joints_vis[pair[0], :].copy() 46 | 47 | return joints*joints_vis, joints_vis 48 | 49 | 50 | def transform_preds(coords, center, scale, output_size): 51 | target_coords = np.zeros(coords.shape) 52 | trans = get_affine_transform(center, scale, 0, output_size, inv=1) 53 | for p in range(coords.shape[0]): 54 | target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans) 55 | return target_coords 56 | 57 | 58 | def get_affine_transform( 59 | center, scale, rot, output_size, 60 | shift=np.array([0, 0], dtype=np.float32), inv=0 61 | ): 62 | if not isinstance(scale, np.ndarray) and not isinstance(scale, list): 63 | print(scale) 64 | scale = np.array([scale, scale]) 65 | 66 | scale_tmp = scale * 200.0 67 | src_w = scale_tmp[0] 68 | dst_w = output_size[0] 69 | dst_h = output_size[1] 70 | 71 | rot_rad = np.pi * rot / 180 72 | src_dir = get_dir([0, src_w * -0.5], rot_rad) 73 | dst_dir = np.array([0, dst_w * -0.5], np.float32) 74 | 75 | src = np.zeros((3, 2), dtype=np.float32) 76 | dst = np.zeros((3, 2), dtype=np.float32) 77 | src[0, :] = center + scale_tmp * shift 78 | src[1, :] = center + src_dir + scale_tmp * shift 79 | dst[0, :] = [dst_w * 0.5, dst_h * 0.5] 80 | dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir 81 | 82 | src[2:, :] = get_3rd_point(src[0, :], src[1, :]) 83 | dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :]) 84 | 85 | if inv: 86 | trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) 87 | else: 88 | trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) 89 | 90 | return trans 91 | 92 | 93 | def affine_transform(pt, t): 94 | new_pt = np.array([pt[0], pt[1], 1.]).T 95 | new_pt = np.dot(t, new_pt) 96 | return new_pt[:2] 97 | 98 | 99 | def get_3rd_point(a, b): 100 | direct = a - b 101 | return b + np.array([-direct[1], direct[0]], dtype=np.float32) 102 | 103 | 104 | def get_dir(src_point, rot_rad): 105 | sn, cs = np.sin(rot_rad), np.cos(rot_rad) 106 | 107 | src_result = [0, 0] 108 | src_result[0] = src_point[0] * cs - src_point[1] * sn 109 | src_result[1] = src_point[0] * sn + src_point[1] * cs 110 | 111 | return src_result 112 | 113 | 114 | def crop(img, center, scale, output_size, rot=0): 115 | trans = 
get_affine_transform(center, scale, rot, output_size) 116 | 117 | dst_img = cv2.warpAffine( 118 | img, trans, (int(output_size[0]), int(output_size[1])), 119 | flags=cv2.INTER_LINEAR 120 | ) 121 | 122 | return dst_img 123 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/utilitys.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import sys 3 | import torch 4 | import json 5 | import torchvision.transforms as transforms 6 | from lib.hrnet.lib.utils.transforms import * 7 | 8 | from lib.hrnet.lib.utils.coco_h36m import coco_h36m 9 | import numpy as np 10 | 11 | joint_pairs = [[0, 1], [1, 3], [0, 2], [2, 4], 12 | [5, 6], [5, 7], [7, 9], [6, 8], [8, 10], 13 | [5, 11], [6, 12], [11, 12], 14 | [11, 13], [12, 14], [13, 15], [14, 16]] 15 | 16 | h36m_pairs = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), 17 | (12, 13), (8, 14), (14, 15), (15, 16)] 18 | 19 | colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \ 20 | [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \ 21 | [170, 0, 255], [255, 0, 255]] 22 | 23 | 24 | def plot_keypoint(image, coordinates, confidence, keypoint_thresh=0.3): 25 | # USE cv2 26 | joint_visible = confidence[:, :, 0] > keypoint_thresh 27 | coordinates = coco_h36m(coordinates) 28 | for i in range(coordinates.shape[0]): 29 | pts = coordinates[i] 30 | 31 | for joint in pts: 32 | cv2.circle(image, (int(joint[0]), int(joint[1])), 8, (255, 255, 255), 1) 33 | 34 | for color_i, jp in zip(colors, h36m_pairs): 35 | if joint_visible[i, jp[0]] and joint_visible[i, jp[1]]: 36 | pt0 = pts[jp, 0] 37 | pt1 = pts[jp, 1] 38 | pt0_0, pt0_1, pt1_0, pt1_1 = int(pt0[0]), int(pt0[1]), int(pt1[0]), int(pt1[1]) 39 | 40 | cv2.line(image, (pt0_0, pt1_0), (pt0_1, pt1_1), color_i, 6) 41 | # cv2.circle(image,(pt0_0, pt0_1), 2, color_i, thickness=-1) 42 | # cv2.circle(image,(pt1_0, pt1_1), 2, color_i, thickness=-1) 43 | return image 44 | 45 | 46 | def write(x, img): 47 | x = [int(i) for i in x] 48 | c1 = tuple(x[0:2]) 49 | c2 = tuple(x[2:4]) 50 | 51 | color = [0, 97, 255] 52 | label = 'People {}'.format(x[-1]) 53 | cv2.rectangle(img, c1, c2, color, 2) 54 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1, 1)[0] 55 | c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4 56 | cv2.rectangle(img, c1, c2, [0, 128, 255], -1) 57 | cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225, 255, 255], 1) 58 | return img 59 | 60 | 61 | def load_json(file_path): 62 | with open(file_path, 'r') as fr: 63 | video_info = json.load(fr) 64 | 65 | label = video_info['label'] 66 | label_index = video_info['label_index'] 67 | 68 | num_frames = video_info['data'][-1]['frame_index'] 69 | keypoints = np.zeros((2, num_frames, 17, 2), dtype=np.float32) # (M, T, N, 2) 70 | scores = np.zeros((2, num_frames, 17), dtype=np.float32) # (M, T, N) 71 | 72 | for frame_info in video_info['data']: 73 | frame_index = frame_info['frame_index'] 74 | 75 | for index, skeleton_info in enumerate(frame_info['skeleton']): 76 | pose = skeleton_info['pose'] 77 | score = skeleton_info['score'] 78 | bbox = skeleton_info['bbox'] 79 | 80 | if len(bbox) == 0 or index+1 > 2: 81 | continue 82 | 83 | pose = np.asarray(pose, dtype=np.float32) 84 | score = np.asarray(score, dtype=np.float32) 85 | score = score.reshape(-1) 86 | 87 | keypoints[index, 
frame_index-1] = pose 88 | scores[index, frame_index-1] = score 89 | 90 | new_kpts = [] 91 | for i in range(keypoints.shape[0]): 92 | kps = keypoints[i] 93 | if np.sum(kps) != 0.: 94 | new_kpts.append(kps) 95 | 96 | new_kpts = np.asarray(new_kpts, dtype=np.float32) 97 | scores = np.asarray(scores, dtype=np.float32) 98 | scores = scores[:, :, :, np.newaxis] 99 | return new_kpts, scores, label, label_index 100 | 101 | 102 | def box_to_center_scale(box, model_image_width, model_image_height): 103 | """convert a box to center,scale information required for pose transformation 104 | Parameters 105 | ---------- 106 | box : (x1, y1, x2, y2) 107 | model_image_width : int 108 | model_image_height : int 109 | 110 | Returns 111 | ------- 112 | (numpy array, numpy array) 113 | Two numpy arrays, coordinates for the center of the box and the scale of the box 114 | """ 115 | center = np.zeros((2), dtype=np.float32) 116 | x1, y1, x2, y2 = box[:4] 117 | box_width, box_height = x2 - x1, y2 - y1 118 | 119 | center[0] = x1 + box_width * 0.5 120 | center[1] = y1 + box_height * 0.5 121 | 122 | aspect_ratio = model_image_width * 1.0 / model_image_height 123 | pixel_std = 200 124 | 125 | if box_width > aspect_ratio * box_height: 126 | box_height = box_width * 1.0 / aspect_ratio 127 | elif box_width < aspect_ratio * box_height: 128 | box_width = box_height * aspect_ratio 129 | scale = np.array( 130 | [box_width * 1.0 / pixel_std, box_height * 1.0 / pixel_std], 131 | dtype=np.float32) 132 | if center[0] != -1: 133 | scale = scale * 1.25 134 | 135 | return center, scale 136 | 137 | 138 | # Pre-process 139 | def PreProcess(image, bboxs, cfg, num_pos=2): 140 | if type(image) == str: 141 | data_numpy = cv2.imread(image, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) 142 | # data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB) 143 | else: 144 | data_numpy = image 145 | 146 | inputs = [] 147 | centers = [] 148 | scales = [] 149 | 150 | for bbox in bboxs[:num_pos]: 151 | c, s = box_to_center_scale(bbox, data_numpy.shape[0], data_numpy.shape[1]) 152 | centers.append(c) 153 | scales.append(s) 154 | r = 0 155 | 156 | trans = get_affine_transform(c, s, r, cfg.MODEL.IMAGE_SIZE) 157 | input = cv2.warpAffine( 158 | data_numpy, 159 | trans, 160 | (int(cfg.MODEL.IMAGE_SIZE[0]), int(cfg.MODEL.IMAGE_SIZE[1])), 161 | flags=cv2.INTER_LINEAR) 162 | 163 | transform = transforms.Compose([transforms.ToTensor(), 164 | transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]) 165 | input = transform(input).unsqueeze(0) 166 | inputs.append(input) 167 | 168 | inputs = torch.cat(inputs) 169 | return inputs, data_numpy, centers, scales 170 | -------------------------------------------------------------------------------- /demo/lib/preprocess.py: -------------------------------------------------------------------------------- 1 | import json 2 | import numpy as np 3 | import os 4 | 5 | h36m_coco_order = [9, 11, 14, 12, 15, 13, 16, 4, 1, 5, 2, 6, 3] 6 | coco_order = [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] 7 | spple_keypoints = [10, 8, 0, 7] 8 | 9 | 10 | def coco_h36m(keypoints): 11 | temporal = keypoints.shape[0] 12 | keypoints_h36m = np.zeros_like(keypoints, dtype=np.float32) 13 | htps_keypoints = np.zeros((temporal, 4, 2), dtype=np.float32) 14 | 15 | # htps_keypoints: head, thorax, pelvis, spine 16 | htps_keypoints[:, 0, 0] = np.mean(keypoints[:, 1:5, 0], axis=1, dtype=np.float32) 17 | htps_keypoints[:, 0, 1] = np.sum(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1] 18 | htps_keypoints[:, 1, 
:] = np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32) 19 | htps_keypoints[:, 1, :] += (keypoints[:, 0, :] - htps_keypoints[:, 1, :]) / 3 20 | 21 | htps_keypoints[:, 2, :] = np.mean(keypoints[:, 11:13, :], axis=1, dtype=np.float32) 22 | htps_keypoints[:, 3, :] = np.mean(keypoints[:, [5, 6, 11, 12], :], axis=1, dtype=np.float32) 23 | 24 | keypoints_h36m[:, spple_keypoints, :] = htps_keypoints 25 | keypoints_h36m[:, h36m_coco_order, :] = keypoints[:, coco_order, :] 26 | 27 | keypoints_h36m[:, 9, :] -= (keypoints_h36m[:, 9, :] - np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32)) / 4 28 | keypoints_h36m[:, 7, 0] += 2*(keypoints_h36m[:, 7, 0] - np.mean(keypoints_h36m[:, [0, 8], 0], axis=1, dtype=np.float32)) 29 | keypoints_h36m[:, 8, 1] -= (np.mean(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1])*2/3 30 | 31 | # half body: the joint of ankle and knee equal to hip 32 | # keypoints_h36m[:, [2, 3]] = keypoints_h36m[:, [1, 1]] 33 | # keypoints_h36m[:, [5, 6]] = keypoints_h36m[:, [4, 4]] 34 | 35 | valid_frames = np.where(np.sum(keypoints_h36m.reshape(-1, 34), axis=1) != 0)[0] 36 | 37 | return keypoints_h36m, valid_frames 38 | 39 | 40 | def h36m_coco_format(keypoints, scores): 41 | assert len(keypoints.shape) == 4 and len(scores.shape) == 3 42 | 43 | h36m_kpts = [] 44 | h36m_scores = [] 45 | valid_frames = [] 46 | 47 | for i in range(keypoints.shape[0]): 48 | kpts = keypoints[i] 49 | score = scores[i] 50 | 51 | new_score = np.zeros_like(score, dtype=np.float32) 52 | 53 | if np.sum(kpts) != 0.: 54 | kpts, valid_frame = coco_h36m(kpts) 55 | h36m_kpts.append(kpts) 56 | valid_frames.append(valid_frame) 57 | 58 | new_score[:, h36m_coco_order] = score[:, coco_order] 59 | new_score[:, 0] = np.mean(score[:, [11, 12]], axis=1, dtype=np.float32) 60 | new_score[:, 8] = np.mean(score[:, [5, 6]], axis=1, dtype=np.float32) 61 | new_score[:, 7] = np.mean(new_score[:, [0, 8]], axis=1, dtype=np.float32) 62 | new_score[:, 10] = np.mean(score[:, [1, 2, 3, 4]], axis=1, dtype=np.float32) 63 | 64 | h36m_scores.append(new_score) 65 | 66 | h36m_kpts = np.asarray(h36m_kpts, dtype=np.float32) 67 | h36m_scores = np.asarray(h36m_scores, dtype=np.float32) 68 | 69 | return h36m_kpts, h36m_scores, valid_frames 70 | 71 | 72 | def revise_kpts(h36m_kpts, h36m_scores, valid_frames): 73 | 74 | new_h36m_kpts = np.zeros_like(h36m_kpts) 75 | for index, frames in enumerate(valid_frames): 76 | kpts = h36m_kpts[index, frames] 77 | score = h36m_scores[index, frames] 78 | 79 | index_frame = np.where(np.sum(score < 0.3, axis=1) > 0)[0] 80 | 81 | for frame in index_frame: 82 | less_threshold_joints = np.where(score[frame] < 0.3)[0] 83 | 84 | intersect = [i for i in [2, 3, 5, 6] if i in less_threshold_joints] 85 | 86 | if [2, 3, 5, 6] == intersect: 87 | kpts[frame, [2, 3, 5, 6]] = kpts[frame, [1, 1, 4, 4]] 88 | elif [2, 3, 6] == intersect: 89 | kpts[frame, [2, 3, 6]] = kpts[frame, [1, 1, 5]] 90 | elif [3, 5, 6] == intersect: 91 | kpts[frame, [3, 5, 6]] = kpts[frame, [2, 4, 4]] 92 | elif [3, 6] == intersect: 93 | kpts[frame, [3, 6]] = kpts[frame, [2, 5]] 94 | elif [3] == intersect: 95 | kpts[frame, 3] = kpts[frame, 2] 96 | elif [6] == intersect: 97 | kpts[frame, 6] = kpts[frame, 5] 98 | else: 99 | continue 100 | 101 | new_h36m_kpts[index, frames] = kpts 102 | 103 | return new_h36m_kpts 104 | 105 | 106 | -------------------------------------------------------------------------------- /demo/lib/sort/sort.py: -------------------------------------------------------------------------------- 1 | """ 2 | 
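SORT: Simple Online and Realtime Tracking (Bewley et al., 2016).
A Kalman-filter motion model combined with per-frame IoU-based assignment.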
https://arxiv.org/abs/1602.00763 3 | """ 4 | from __future__ import print_function 5 | 6 | from numba import jit 7 | import os.path 8 | import numpy as np 9 | from skimage import io 10 | from scipy.optimize import linear_sum_assignment 11 | import argparse 12 | from filterpy.kalman import KalmanFilter 13 | 14 | 15 | @jit 16 | def iou(bb_test, bb_gt): 17 | """ 18 | Computes IUO between two bboxes in the form [x1,y1,x2,y2] 19 | """ 20 | xx1 = np.maximum(bb_test[0], bb_gt[0]) 21 | yy1 = np.maximum(bb_test[1], bb_gt[1]) 22 | xx2 = np.minimum(bb_test[2], bb_gt[2]) 23 | yy2 = np.minimum(bb_test[3], bb_gt[3]) 24 | w = np.maximum(0., xx2 - xx1) 25 | h = np.maximum(0., yy2 - yy1) 26 | wh = w * h 27 | o = wh / ((bb_test[2] - bb_test[0]) * (bb_test[3] - bb_test[1]) 28 | + (bb_gt[2] - bb_gt[0]) * (bb_gt[3] - bb_gt[1]) - wh) 29 | 30 | return o 31 | 32 | 33 | def convert_bbox_to_z(bbox): 34 | """ 35 | Takes a bounding box in the form [x1,y1,x2,y2] and returns z in the form 36 | [x,y,s,r] where x,y is the centre of the box and s is the scale/area and r is 37 | the aspect ratio 38 | """ 39 | w = bbox[2] - bbox[0] 40 | h = bbox[3] - bbox[1] 41 | x = bbox[0] + w / 2. 42 | y = bbox[1] + h / 2. 43 | s = w * h # scale is just area 44 | r = w / float(h) 45 | return np.array([x, y, s, r]).reshape((4, 1)) 46 | 47 | 48 | def convert_x_to_bbox(x, score=None): 49 | """ 50 | Takes a bounding box in the centre form [x,y,s,r] and returns it in the form 51 | [x1,y1,x2,y2] where x1,y1 is the top left and x2,y2 is the bottom right 52 | """ 53 | w = np.sqrt(x[2] * x[3]) 54 | h = x[2] / w 55 | if (score == None): 56 | return np.array([x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., x[1] + h / 2.]).reshape((1, 4)) 57 | else: 58 | return np.array([x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., x[1] + h / 2., score]).reshape((1, 5)) 59 | 60 | 61 | class KalmanBoxTracker(object): 62 | """ 63 | This class represents the internel state of individual tracked objects observed as bbox. 64 | """ 65 | count = 0 66 | 67 | def __init__(self, bbox): 68 | """ 69 | Initialises a tracker using initial bounding box. 70 | """ 71 | # define constant velocity model 72 | self.kf = KalmanFilter(dim_x=7, dim_z=4) 73 | self.kf.F = np.array( 74 | [[1, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0], [0, 0, 1, 0, 0, 0, 1], [0, 0, 0, 1, 0, 0, 0], 75 | [0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 1]]) 76 | self.kf.H = np.array( 77 | [[1, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0]]) 78 | 79 | self.kf.R[2:, 2:] *= 10. 80 | self.kf.P[4:, 4:] *= 1000. # give high uncertainty to the unobservable initial velocities 81 | self.kf.P *= 10. 82 | self.kf.Q[-1, -1] *= 0.01 83 | self.kf.Q[4:, 4:] *= 0.01 84 | 85 | self.kf.x[:4] = convert_bbox_to_z(bbox) 86 | self.time_since_update = 0 87 | self.id = KalmanBoxTracker.count 88 | KalmanBoxTracker.count += 1 89 | self.history = [] 90 | self.hits = 0 91 | self.hit_streak = 0 92 | self.age = 0 93 | 94 | def update(self, bbox): 95 | """ 96 | Updates the state vector with observed bbox. 97 | """ 98 | self.time_since_update = 0 99 | self.history = [] 100 | self.hits += 1 101 | self.hit_streak += 1 102 | self.kf.update(convert_bbox_to_z(bbox)) 103 | 104 | def predict(self): 105 | """ 106 | Advances the state vector and returns the predicted bounding box estimate. 
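If the predicted area would become non-positive, the area-velocity term is zeroed before the Kalman predict step.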
107 | """ 108 | if ((self.kf.x[6] + self.kf.x[2]) <= 0): 109 | self.kf.x[6] *= 0.0 110 | self.kf.predict() 111 | self.age += 1 112 | if (self.time_since_update > 0): 113 | self.hit_streak = 0 114 | self.time_since_update += 1 115 | self.history.append(convert_x_to_bbox(self.kf.x)) 116 | return self.history[-1] 117 | 118 | def get_state(self): 119 | """ 120 | Returns the current bounding box estimate. 121 | """ 122 | return convert_x_to_bbox(self.kf.x) 123 | 124 | 125 | def associate_detections_to_trackers(detections, trackers, iou_threshold=0.3): 126 | """ 127 | Assigns detections to tracked object (both represented as bounding boxes) 128 | 129 | Returns 3 lists of matches, unmatched_detections and unmatched_trackers 130 | """ 131 | if (len(trackers) == 0): 132 | return np.empty((0, 2), dtype=int), np.arange(len(detections)), np.empty((0, 5), dtype=int) 133 | iou_matrix = np.zeros((len(detections), len(trackers)), dtype=np.float32) 134 | 135 | for d, det in enumerate(detections): 136 | for t, trk in enumerate(trackers): 137 | iou_matrix[d, t] = iou(det, trk) 138 | matched_indices = linear_sum_assignment(-iou_matrix) 139 | matched_indices = np.asarray(matched_indices) 140 | matched_indices = matched_indices.transpose() 141 | 142 | unmatched_detections = [] 143 | for d, det in enumerate(detections): 144 | if (d not in matched_indices[:, 0]): 145 | unmatched_detections.append(d) 146 | unmatched_trackers = [] 147 | for t, trk in enumerate(trackers): 148 | if (t not in matched_indices[:, 1]): 149 | unmatched_trackers.append(t) 150 | 151 | # filter out matched with low IOU 152 | matches = [] 153 | for m in matched_indices: 154 | if (iou_matrix[m[0], m[1]] < iou_threshold): 155 | unmatched_detections.append(m[0]) 156 | unmatched_trackers.append(m[1]) 157 | else: 158 | matches.append(m.reshape(1, 2)) 159 | if (len(matches) == 0): 160 | matches = np.empty((0, 2), dtype=int) 161 | else: 162 | matches = np.concatenate(matches, axis=0) 163 | 164 | return matches, np.array(unmatched_detections), np.array(unmatched_trackers) 165 | 166 | 167 | class Sort(object): 168 | def __init__(self, max_age=1, min_hits=3): 169 | """ 170 | Sets key parameters for SORT 171 | """ 172 | self.max_age = max_age 173 | self.min_hits = min_hits 174 | self.trackers = [] 175 | self.frame_count = 0 176 | 177 | def update(self, dets): 178 | """ 179 | Params: 180 | dets - a numpy array of detections in the format [[x1,y1,x2,y2,score],[x1,y1,x2,y2,score],...] 181 | Requires: this method must be called once for each frame even with empty detections. 182 | Returns the a similar array, where the last column is the object ID. 183 | 184 | NOTE: The number of objects returned may differ from the number of detections provided. 185 | """ 186 | self.frame_count += 1 187 | # get predicted locations from existing trackers. 
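# Each tracker's Kalman filter is advanced one frame; predictions containing NaNs are dropped.
# Detections are then matched to the surviving predictions on IoU via linear_sum_assignment
# (see associate_detections_to_trackers above); matched trackers are corrected with their
# detection, unmatched detections spawn new KalmanBoxTracker instances, and trackers that
# have not been updated for more than max_age frames are removed.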
188 | trks = np.zeros((len(self.trackers), 5)) 189 | to_del = [] 190 | ret = [] 191 | for t, trk in enumerate(trks): 192 | pos = self.trackers[t].predict()[0] 193 | trk[:] = [pos[0], pos[1], pos[2], pos[3], 0] 194 | if np.any(np.isnan(pos)): 195 | to_del.append(t) 196 | trks = np.ma.compress_rows(np.ma.masked_invalid(trks)) 197 | for t in reversed(to_del): 198 | self.trackers.pop(t) 199 | matched, unmatched_dets, unmatched_trks = associate_detections_to_trackers(dets, trks) 200 | 201 | # update matched trackers with assigned detections 202 | for t, trk in enumerate(self.trackers): 203 | if t not in unmatched_trks: 204 | d = matched[np.where(matched[:, 1] == t)[0], 0] # d: [n] 205 | trk.update(dets[d, :][0]) 206 | 207 | # create and initialise new trackers for unmatched detections 208 | for i in unmatched_dets: 209 | trk = KalmanBoxTracker(dets[i, :]) 210 | self.trackers.append(trk) 211 | i = len(self.trackers) 212 | for trk in reversed(self.trackers): 213 | d = trk.get_state()[0] 214 | if ((trk.time_since_update < 1) and (trk.hit_streak >= self.min_hits or self.frame_count <= self.min_hits)): 215 | ret.append(np.concatenate((d, [trk.id + 1])).reshape(1, -1)) # +1 as MOT benchmark requires positive 216 | i -= 1 217 | # remove dead tracklet 218 | if (trk.time_since_update > self.max_age): 219 | self.trackers.pop(i) 220 | if (len(ret) > 0): 221 | return np.concatenate(ret) 222 | return np.empty((0, 5)) 223 | 224 | 225 | def parse_args(): 226 | """Parse input arguments.""" 227 | parser = argparse.ArgumentParser(description='SORT demo') 228 | parser.add_argument('--display', dest='display', help='Display online tracker output (slow) [False]', 229 | action='store_true') 230 | args = parser.parse_args() 231 | return args 232 | -------------------------------------------------------------------------------- /demo/lib/yolov3/bbox.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import random 5 | import numpy as np 6 | import cv2 7 | 8 | 9 | def confidence_filter(result, confidence): 10 | conf_mask = (result[:,:,4] > confidence).float().unsqueeze(2) 11 | result = result*conf_mask 12 | 13 | return result 14 | 15 | 16 | def confidence_filter_cls(result, confidence): 17 | max_scores = torch.max(result[:,:,5:25], 2)[0] 18 | res = torch.cat((result, max_scores),2) 19 | print(res.shape) 20 | 21 | 22 | cond_1 = (res[:,:,4] > confidence).float() 23 | cond_2 = (res[:,:,25] > 0.995).float() 24 | 25 | conf = cond_1 + cond_2 26 | conf = torch.clamp(conf, 0.0, 1.0) 27 | conf = conf.unsqueeze(2) 28 | result = result*conf 29 | return result 30 | 31 | 32 | def get_abs_coord(box): 33 | box[2], box[3] = abs(box[2]), abs(box[3]) 34 | x1 = (box[0] - box[2]/2) - 1 35 | y1 = (box[1] - box[3]/2) - 1 36 | x2 = (box[0] + box[2]/2) - 1 37 | y2 = (box[1] + box[3]/2) - 1 38 | return x1, y1, x2, y2 39 | 40 | 41 | def sanity_fix(box): 42 | if (box[0] > box[2]): 43 | box[0], box[2] = box[2], box[0] 44 | 45 | if (box[1] > box[3]): 46 | box[1], box[3] = box[3], box[1] 47 | 48 | return box 49 | 50 | 51 | def bbox_iou(box1, box2): 52 | """ 53 | Returns the IoU of two bounding boxes 54 | 55 | """ 56 | # Get the coordinates of bounding boxes 57 | b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3] 58 | b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3] 59 | 60 | # get the corrdinates of the intersection rectangle 61 | inter_rect_x1 = torch.max(b1_x1, b2_x1) 62 | inter_rect_y1 = 
torch.max(b1_y1, b2_y1) 63 | inter_rect_x2 = torch.min(b1_x2, b2_x2) 64 | inter_rect_y2 = torch.min(b1_y2, b2_y2) 65 | 66 | # Intersection area 67 | if torch.cuda.is_available(): 68 | inter_area = torch.max(inter_rect_x2 - inter_rect_x1 + 1, torch.zeros(inter_rect_x2.shape).cuda())*torch.max(inter_rect_y2 - inter_rect_y1 + 1, torch.zeros(inter_rect_x2.shape).cuda()) 69 | else: 70 | inter_area = torch.max(inter_rect_x2 - inter_rect_x1 + 1, torch.zeros(inter_rect_x2.shape))*torch.max(inter_rect_y2 - inter_rect_y1 + 1, torch.zeros(inter_rect_x2.shape)) 71 | 72 | # Union Area 73 | b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1) 74 | b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1) 75 | 76 | iou = inter_area / (b1_area + b2_area - inter_area) 77 | 78 | return iou 79 | 80 | 81 | def pred_corner_coord(prediction): 82 | #Get indices of non-zero confidence bboxes 83 | ind_nz = torch.nonzero(prediction[:,:,4]).transpose(0,1).contiguous() 84 | 85 | box = prediction[ind_nz[0], ind_nz[1]] 86 | 87 | box_a = box.new(box.shape) 88 | box_a[:,0] = (box[:,0] - box[:,2]/2) 89 | box_a[:,1] = (box[:,1] - box[:,3]/2) 90 | box_a[:,2] = (box[:,0] + box[:,2]/2) 91 | box_a[:,3] = (box[:,1] + box[:,3]/2) 92 | box[:,:4] = box_a[:,:4] 93 | 94 | prediction[ind_nz[0], ind_nz[1]] = box 95 | 96 | return prediction 97 | 98 | 99 | def write(x, batches, results, colors, classes): 100 | c1 = tuple(x[1:3].int()) 101 | c2 = tuple(x[3:5].int()) 102 | img = results[int(x[0])] 103 | cls = int(x[-1]) 104 | label = "{0}".format(classes[cls]) 105 | color = random.choice(colors) 106 | cv2.rectangle(img, c1, c2,color, 1) 107 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0] 108 | c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4 109 | cv2.rectangle(img, c1, c2,color, -1) 110 | cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1); 111 | return img 112 | -------------------------------------------------------------------------------- /demo/lib/yolov3/cfg/tiny-yolo-voc.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | batch=64 3 | subdivisions=8 4 | width=416 5 | height=416 6 | channels=3 7 | momentum=0.9 8 | decay=0.0005 9 | angle=0 10 | saturation = 1.5 11 | exposure = 1.5 12 | hue=.1 13 | 14 | learning_rate=0.001 15 | max_batches = 40200 16 | policy=steps 17 | steps=-1,100,20000,30000 18 | scales=.1,10,.1,.1 19 | 20 | [convolutional] 21 | batch_normalize=1 22 | filters=16 23 | size=3 24 | stride=1 25 | pad=1 26 | activation=leaky 27 | 28 | [maxpool] 29 | size=2 30 | stride=2 31 | 32 | [convolutional] 33 | batch_normalize=1 34 | filters=32 35 | size=3 36 | stride=1 37 | pad=1 38 | activation=leaky 39 | 40 | [maxpool] 41 | size=2 42 | stride=2 43 | 44 | [convolutional] 45 | batch_normalize=1 46 | filters=64 47 | size=3 48 | stride=1 49 | pad=1 50 | activation=leaky 51 | 52 | [maxpool] 53 | size=2 54 | stride=2 55 | 56 | [convolutional] 57 | batch_normalize=1 58 | filters=128 59 | size=3 60 | stride=1 61 | pad=1 62 | activation=leaky 63 | 64 | [maxpool] 65 | size=2 66 | stride=2 67 | 68 | [convolutional] 69 | batch_normalize=1 70 | filters=256 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | [maxpool] 77 | size=2 78 | stride=2 79 | 80 | [convolutional] 81 | batch_normalize=1 82 | filters=512 83 | size=3 84 | stride=1 85 | pad=1 86 | activation=leaky 87 | 88 | [maxpool] 89 | size=2 90 | stride=1 91 | 92 | [convolutional] 93 | batch_normalize=1 94 | filters=1024 95 | size=3 96 | stride=1 97 | pad=1 98 | 
activation=leaky 99 | 100 | ########### 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | size=3 105 | stride=1 106 | pad=1 107 | filters=1024 108 | activation=leaky 109 | 110 | [convolutional] 111 | size=1 112 | stride=1 113 | pad=1 114 | filters=125 115 | activation=linear 116 | 117 | [region] 118 | anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52 119 | bias_match=1 120 | classes=20 121 | coords=4 122 | num=5 123 | softmax=1 124 | jitter=.2 125 | rescore=1 126 | 127 | object_scale=5 128 | noobject_scale=1 129 | class_scale=1 130 | coord_scale=1 131 | 132 | absolute=1 133 | thresh = .6 134 | random=1 135 | -------------------------------------------------------------------------------- /demo/lib/yolov3/cfg/yolo-voc.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=64 4 | subdivisions=8 5 | # Training 6 | # batch=64 7 | # subdivisions=8 8 | height=416 9 | width=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 80200 21 | policy=steps 22 | steps=-1,500,40000,60000 23 | scales=0.1,10,.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | [maxpool] 34 | size=2 35 | stride=2 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=64 40 | size=3 41 | stride=1 42 | pad=1 43 | activation=leaky 44 | 45 | [maxpool] 46 | size=2 47 | stride=2 48 | 49 | [convolutional] 50 | batch_normalize=1 51 | filters=128 52 | size=3 53 | stride=1 54 | pad=1 55 | activation=leaky 56 | 57 | [convolutional] 58 | batch_normalize=1 59 | filters=64 60 | size=1 61 | stride=1 62 | pad=1 63 | activation=leaky 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=1 70 | pad=1 71 | activation=leaky 72 | 73 | [maxpool] 74 | size=2 75 | stride=2 76 | 77 | [convolutional] 78 | batch_normalize=1 79 | filters=256 80 | size=3 81 | stride=1 82 | pad=1 83 | activation=leaky 84 | 85 | [convolutional] 86 | batch_normalize=1 87 | filters=128 88 | size=1 89 | stride=1 90 | pad=1 91 | activation=leaky 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=256 96 | size=3 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [maxpool] 102 | size=2 103 | stride=2 104 | 105 | [convolutional] 106 | batch_normalize=1 107 | filters=512 108 | size=3 109 | stride=1 110 | pad=1 111 | activation=leaky 112 | 113 | [convolutional] 114 | batch_normalize=1 115 | filters=256 116 | size=1 117 | stride=1 118 | pad=1 119 | activation=leaky 120 | 121 | [convolutional] 122 | batch_normalize=1 123 | filters=512 124 | size=3 125 | stride=1 126 | pad=1 127 | activation=leaky 128 | 129 | [convolutional] 130 | batch_normalize=1 131 | filters=256 132 | size=1 133 | stride=1 134 | pad=1 135 | activation=leaky 136 | 137 | [convolutional] 138 | batch_normalize=1 139 | filters=512 140 | size=3 141 | stride=1 142 | pad=1 143 | activation=leaky 144 | 145 | [maxpool] 146 | size=2 147 | stride=2 148 | 149 | [convolutional] 150 | batch_normalize=1 151 | filters=1024 152 | size=3 153 | stride=1 154 | pad=1 155 | activation=leaky 156 | 157 | [convolutional] 158 | batch_normalize=1 159 | filters=512 160 | size=1 161 | stride=1 162 | pad=1 163 | activation=leaky 164 | 165 | [convolutional] 166 | batch_normalize=1 167 | filters=1024 168 | size=3 169 | stride=1 170 | pad=1 171 | activation=leaky 172 | 173 | 
[convolutional] 174 | batch_normalize=1 175 | filters=512 176 | size=1 177 | stride=1 178 | pad=1 179 | activation=leaky 180 | 181 | [convolutional] 182 | batch_normalize=1 183 | filters=1024 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | 190 | ####### 191 | 192 | [convolutional] 193 | batch_normalize=1 194 | size=3 195 | stride=1 196 | pad=1 197 | filters=1024 198 | activation=leaky 199 | 200 | [convolutional] 201 | batch_normalize=1 202 | size=3 203 | stride=1 204 | pad=1 205 | filters=1024 206 | activation=leaky 207 | 208 | [route] 209 | layers=-9 210 | 211 | [convolutional] 212 | batch_normalize=1 213 | size=1 214 | stride=1 215 | pad=1 216 | filters=64 217 | activation=leaky 218 | 219 | [reorg] 220 | stride=2 221 | 222 | [route] 223 | layers=-1,-4 224 | 225 | [convolutional] 226 | batch_normalize=1 227 | size=3 228 | stride=1 229 | pad=1 230 | filters=1024 231 | activation=leaky 232 | 233 | [convolutional] 234 | size=1 235 | stride=1 236 | pad=1 237 | filters=125 238 | activation=linear 239 | 240 | 241 | [region] 242 | anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071 243 | bias_match=1 244 | classes=20 245 | coords=4 246 | num=5 247 | softmax=1 248 | jitter=.3 249 | rescore=1 250 | 251 | object_scale=5 252 | noobject_scale=1 253 | class_scale=1 254 | coord_scale=1 255 | 256 | absolute=1 257 | thresh = .6 258 | random=1 259 | -------------------------------------------------------------------------------- /demo/lib/yolov3/cfg/yolo.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=8 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | [maxpool] 34 | size=2 35 | stride=2 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=64 40 | size=3 41 | stride=1 42 | pad=1 43 | activation=leaky 44 | 45 | [maxpool] 46 | size=2 47 | stride=2 48 | 49 | [convolutional] 50 | batch_normalize=1 51 | filters=128 52 | size=3 53 | stride=1 54 | pad=1 55 | activation=leaky 56 | 57 | [convolutional] 58 | batch_normalize=1 59 | filters=64 60 | size=1 61 | stride=1 62 | pad=1 63 | activation=leaky 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=1 70 | pad=1 71 | activation=leaky 72 | 73 | [maxpool] 74 | size=2 75 | stride=2 76 | 77 | [convolutional] 78 | batch_normalize=1 79 | filters=256 80 | size=3 81 | stride=1 82 | pad=1 83 | activation=leaky 84 | 85 | [convolutional] 86 | batch_normalize=1 87 | filters=128 88 | size=1 89 | stride=1 90 | pad=1 91 | activation=leaky 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=256 96 | size=3 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [maxpool] 102 | size=2 103 | stride=2 104 | 105 | [convolutional] 106 | batch_normalize=1 107 | filters=512 108 | size=3 109 | stride=1 110 | pad=1 111 | activation=leaky 112 | 113 | [convolutional] 114 | batch_normalize=1 115 | filters=256 116 | size=1 117 | stride=1 118 | pad=1 119 | activation=leaky 120 | 121 | [convolutional] 122 | batch_normalize=1 123 | filters=512 124 | size=3 125 | 
stride=1 126 | pad=1 127 | activation=leaky 128 | 129 | [convolutional] 130 | batch_normalize=1 131 | filters=256 132 | size=1 133 | stride=1 134 | pad=1 135 | activation=leaky 136 | 137 | [convolutional] 138 | batch_normalize=1 139 | filters=512 140 | size=3 141 | stride=1 142 | pad=1 143 | activation=leaky 144 | 145 | [maxpool] 146 | size=2 147 | stride=2 148 | 149 | [convolutional] 150 | batch_normalize=1 151 | filters=1024 152 | size=3 153 | stride=1 154 | pad=1 155 | activation=leaky 156 | 157 | [convolutional] 158 | batch_normalize=1 159 | filters=512 160 | size=1 161 | stride=1 162 | pad=1 163 | activation=leaky 164 | 165 | [convolutional] 166 | batch_normalize=1 167 | filters=1024 168 | size=3 169 | stride=1 170 | pad=1 171 | activation=leaky 172 | 173 | [convolutional] 174 | batch_normalize=1 175 | filters=512 176 | size=1 177 | stride=1 178 | pad=1 179 | activation=leaky 180 | 181 | [convolutional] 182 | batch_normalize=1 183 | filters=1024 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | 190 | ####### 191 | 192 | [convolutional] 193 | batch_normalize=1 194 | size=3 195 | stride=1 196 | pad=1 197 | filters=1024 198 | activation=leaky 199 | 200 | [convolutional] 201 | batch_normalize=1 202 | size=3 203 | stride=1 204 | pad=1 205 | filters=1024 206 | activation=leaky 207 | 208 | [route] 209 | layers=-9 210 | 211 | [convolutional] 212 | batch_normalize=1 213 | size=1 214 | stride=1 215 | pad=1 216 | filters=64 217 | activation=leaky 218 | 219 | [reorg] 220 | stride=2 221 | 222 | [route] 223 | layers=-1,-4 224 | 225 | [convolutional] 226 | batch_normalize=1 227 | size=3 228 | stride=1 229 | pad=1 230 | filters=1024 231 | activation=leaky 232 | 233 | [convolutional] 234 | size=1 235 | stride=1 236 | pad=1 237 | filters=425 238 | activation=linear 239 | 240 | 241 | [region] 242 | anchors = 0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828 243 | bias_match=1 244 | classes=80 245 | coords=4 246 | num=5 247 | softmax=1 248 | jitter=.3 249 | rescore=1 250 | 251 | object_scale=5 252 | noobject_scale=1 253 | class_scale=1 254 | coord_scale=1 255 | 256 | absolute=1 257 | thresh = .6 258 | random=1 259 | -------------------------------------------------------------------------------- /demo/lib/yolov3/cfg/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width= 320 9 | height = 320 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | 
batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | 
[convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | 
pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .5 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .5 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 
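Each `[convolutional]` block in these cfg files lists the filters, kernel size, stride, padding flag, and activation of one layer; `create_modules()` in `demo/lib/yolov3/darknet.py` (further below) turns such a block into a `Conv2d`, an optional `BatchNorm2d`, and a `LeakyReLU`. A minimal sketch of that mapping, using a hand-written example block rather than one parsed from this file:

```python
# Sketch: turning one [convolutional] cfg block into PyTorch layers,
# mirroring create_modules() in demo/lib/yolov3/darknet.py.
# The block dict and prev_filters value below are assumed examples.
import torch
import torch.nn as nn

block = {"type": "convolutional", "batch_normalize": "1",
         "filters": "256", "size": "3", "stride": "1", "pad": "1",
         "activation": "leaky"}
prev_filters = 128  # channels produced by the previous layer (assumed)

filters = int(block["filters"])
kernel_size = int(block["size"])
stride = int(block["stride"])
pad = (kernel_size - 1) // 2 if int(block["pad"]) else 0
use_bn = block.get("batch_normalize") == "1"

layers = [nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=not use_bn)]
if use_bn:
    layers.append(nn.BatchNorm2d(filters))
if block["activation"] == "leaky":
    layers.append(nn.LeakyReLU(0.1, inplace=True))
module = nn.Sequential(*layers)

x = torch.randn(1, prev_filters, 52, 52)  # dummy feature map
print(module(x).shape)                    # torch.Size([1, 256, 52, 52])
```

When `batch_normalize=1` the convolution is built without a bias, since the batch-norm layer supplies the offset; `load_weights()` below relies on the same convention.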
731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .5 787 | truth_thresh = 1 788 | random=1 789 | 790 | -------------------------------------------------------------------------------- /demo/lib/yolov3/darknet.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import numpy as np 7 | import cv2 8 | import os 9 | import sys 10 | 11 | from lib.yolov3.util import convert2cpu as cpu 12 | from lib.yolov3.util import predict_transform 13 | 14 | 15 | class test_net(nn.Module): 16 | def __init__(self, num_layers, input_size): 17 | super(test_net, self).__init__() 18 | self.num_layers= num_layers 19 | self.linear_1 = nn.Linear(input_size, 5) 20 | self.middle = nn.ModuleList([nn.Linear(5,5) for x in range(num_layers)]) 21 | self.output = nn.Linear(5,2) 22 | 23 | def forward(self, x): 24 | x = x.view(-1) 25 | fwd = nn.Sequential(self.linear_1, *self.middle, self.output) 26 | return fwd(x) 27 | 28 | 29 | def get_test_input(): 30 | img = cv2.imread("dog-cycle-car.png") 31 | img = cv2.resize(img, (416, 416)) 32 | img_ = img[:, :, ::-1].transpose((2, 0, 1)) 33 | img_ = img_[np.newaxis, :, :, :]/255.0 34 | img_ = torch.from_numpy(img_).float() 35 | return img_ 36 | 37 | 38 | def parse_cfg(cfgfile): 39 | """ 40 | Takes a configuration file 41 | 42 | Returns a list of blocks. Each blocks describes a block in the neural 43 | network to be built. 
Block is represented as a dictionary in the list 44 | 45 | """ 46 | # cfgfile = os.path.join(sys.path[-1], cfgfile) 47 | file = open(cfgfile, 'r') 48 | lines = file.read().split('\n') # store the lines in a list 49 | lines = [x for x in lines if len(x) > 0] # get read of the empty lines 50 | lines = [x for x in lines if x[0] != '#'] 51 | lines = [x.rstrip().lstrip() for x in lines] 52 | 53 | block = {} 54 | blocks = [] 55 | 56 | for line in lines: 57 | if line[0] == "[": # This marks the start of a new block 58 | if len(block) != 0: 59 | blocks.append(block) 60 | block = {} 61 | block["type"] = line[1:-1].rstrip() 62 | else: 63 | key,value = line.split("=") 64 | block[key.rstrip()] = value.lstrip() 65 | blocks.append(block) 66 | 67 | return blocks 68 | 69 | 70 | class MaxPoolStride1(nn.Module): 71 | def __init__(self, kernel_size): 72 | super(MaxPoolStride1, self).__init__() 73 | self.kernel_size = kernel_size 74 | self.pad = kernel_size - 1 75 | 76 | def forward(self, x): 77 | padded_x = F.pad(x, (0, self.pad, 0, self.pad), mode="replicate") 78 | pooled_x = nn.MaxPool2d(self.kernel_size, self.pad)(padded_x) 79 | return pooled_x 80 | 81 | 82 | class EmptyLayer(nn.Module): 83 | def __init__(self): 84 | super(EmptyLayer, self).__init__() 85 | 86 | 87 | class DetectionLayer(nn.Module): 88 | def __init__(self, anchors): 89 | super(DetectionLayer, self).__init__() 90 | self.anchors = anchors 91 | 92 | def forward(self, x, inp_dim, num_classes, confidence): 93 | x = x.data 94 | global CUDA 95 | prediction = x 96 | prediction = predict_transform(prediction, inp_dim, self.anchors, num_classes, confidence, CUDA) 97 | return prediction 98 | 99 | 100 | class Upsample(nn.Module): 101 | def __init__(self, stride=2): 102 | super(Upsample, self).__init__() 103 | self.stride = stride 104 | 105 | def forward(self, x): 106 | stride = self.stride 107 | assert(x.data.dim() == 4) 108 | B = x.data.size(0) 109 | C = x.data.size(1) 110 | H = x.data.size(2) 111 | W = x.data.size(3) 112 | ws = stride 113 | hs = stride 114 | x = x.view(B, C, H, 1, W, 1).expand(B, C, H, stride, W, stride).contiguous().view(B, C, H*stride, W*stride) 115 | return x 116 | 117 | 118 | class ReOrgLayer(nn.Module): 119 | def __init__(self, stride=2): 120 | super(ReOrgLayer, self).__init__() 121 | self.stride= stride 122 | 123 | def forward(self, x): 124 | assert(x.data.dim() == 4) 125 | B, C, H, W = x.data.shape 126 | hs = self.stride 127 | ws = self.stride 128 | assert(H % hs == 0), "The stride " + str(self.stride) + " is not a proper divisor of height " + str(H) 129 | assert(W % ws == 0), "The stride " + str(self.stride) + " is not a proper divisor of height " + str(W) 130 | x = x.view(B, C, H // hs, hs, W // ws, ws).transpose(-2, -3).contiguous() 131 | x = x.view(B, C, H // hs * W // ws, hs, ws) 132 | x = x.view(B, C, H // hs * W // ws, hs*ws).transpose(-1, -2).contiguous() 133 | x = x.view(B, C, ws*hs, H // ws, W // ws).transpose(1, 2).contiguous() 134 | x = x.view(B, C*ws*hs, H // ws, W // ws) 135 | return x 136 | 137 | 138 | def create_modules(blocks): 139 | net_info = blocks[0] # Captures the information about the input and pre-processing 140 | 141 | module_list = nn.ModuleList() 142 | 143 | index = 0 # indexing blocks helps with implementing route layers (skip connections) 144 | prev_filters = 3 145 | output_filters = [] 146 | 147 | for x in blocks: 148 | module = nn.Sequential() 149 | if x["type"] == "net": 150 | continue 151 | 152 | # If it's a convolutional layer 153 | if x["type"] == "convolutional": 154 | # Get the info about 
the layer 155 | activation = x["activation"] 156 | try: 157 | batch_normalize = int(x["batch_normalize"]) 158 | bias = False 159 | except: 160 | batch_normalize = 0 161 | bias = True 162 | 163 | filters= int(x["filters"]) 164 | padding = int(x["pad"]) 165 | kernel_size = int(x["size"]) 166 | stride = int(x["stride"]) 167 | 168 | if padding: 169 | pad = (kernel_size - 1) // 2 170 | else: 171 | pad = 0 172 | 173 | # Add the convolutional layer 174 | conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias) 175 | module.add_module("conv_{0}".format(index), conv) 176 | 177 | # Add the Batch Norm Layer 178 | if batch_normalize: 179 | bn = nn.BatchNorm2d(filters) 180 | module.add_module("batch_norm_{0}".format(index), bn) 181 | 182 | # Check the activation. 183 | # It is either Linear or a Leaky ReLU for YOLO 184 | if activation == "leaky": 185 | activn = nn.LeakyReLU(0.1, inplace = True) 186 | module.add_module("leaky_{0}".format(index), activn) 187 | 188 | # If it's an upsampling layer 189 | # We use Bilinear2dUpsampling 190 | 191 | elif x["type"] == "upsample": 192 | stride = int(x["stride"]) 193 | # upsample = Upsample(stride) 194 | upsample = nn.Upsample(scale_factor=2, mode="nearest") 195 | module.add_module("upsample_{}".format(index), upsample) 196 | 197 | # If it is a route layer 198 | elif (x["type"] == "route"): 199 | x["layers"] = x["layers"].split(',') 200 | 201 | # Start of a route 202 | start = int(x["layers"][0]) 203 | 204 | # end, if there exists one. 205 | try: 206 | end = int(x["layers"][1]) 207 | except: 208 | end = 0 209 | 210 | # Positive anotation 211 | if start > 0: 212 | start = start - index 213 | 214 | if end > 0: 215 | end = end - index 216 | 217 | route = EmptyLayer() 218 | module.add_module("route_{0}".format(index), route) 219 | 220 | if end < 0: 221 | filters = output_filters[index + start] + output_filters[index + end] 222 | else: 223 | filters = output_filters[index + start] 224 | 225 | # shortcut corresponds to skip connection 226 | elif x["type"] == "shortcut": 227 | from_ = int(x["from"]) 228 | shortcut = EmptyLayer() 229 | module.add_module("shortcut_{}".format(index), shortcut) 230 | 231 | elif x["type"] == "maxpool": 232 | stride = int(x["stride"]) 233 | size = int(x["size"]) 234 | if stride != 1: 235 | maxpool = nn.MaxPool2d(size, stride) 236 | else: 237 | maxpool = MaxPoolStride1(size) 238 | 239 | module.add_module("maxpool_{}".format(index), maxpool) 240 | 241 | # Yolo is the detection layer 242 | elif x["type"] == "yolo": 243 | mask = x["mask"].split(",") 244 | mask = [int(x) for x in mask] 245 | 246 | anchors = x["anchors"].split(",") 247 | anchors = [int(a) for a in anchors] 248 | anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)] 249 | anchors = [anchors[i] for i in mask] 250 | 251 | detection = DetectionLayer(anchors) 252 | module.add_module("Detection_{}".format(index), detection) 253 | 254 | else: 255 | print("Something I dunno") 256 | assert False 257 | 258 | module_list.append(module) 259 | prev_filters = filters 260 | output_filters.append(filters) 261 | index += 1 262 | 263 | return (net_info, module_list) 264 | 265 | 266 | class Darknet(nn.Module): 267 | def __init__(self, cfgfile): 268 | super(Darknet, self).__init__() 269 | self.blocks = parse_cfg(cfgfile) 270 | self.net_info, self.module_list = create_modules(self.blocks) 271 | self.header = torch.IntTensor([0, 0, 0, 0]) 272 | self.seen = 0 273 | 274 | def get_blocks(self): 275 | return self.blocks 276 | 277 | def get_module_list(self): 278 | 
return self.module_list 279 | 280 | def forward(self, x, CUDA): 281 | detections = [] 282 | modules = self.blocks[1:] 283 | outputs = {} # We cache the outputs for the route layer 284 | 285 | write = 0 286 | for i in range(len(modules)): 287 | 288 | module_type = (modules[i]["type"]) 289 | if module_type == "convolutional" or module_type == "upsample" or module_type == "maxpool": 290 | 291 | x = self.module_list[i](x) 292 | outputs[i] = x 293 | 294 | elif module_type == "route": 295 | layers = modules[i]["layers"] 296 | layers = [int(a) for a in layers] 297 | 298 | if (layers[0]) > 0: 299 | layers[0] = layers[0] - i 300 | 301 | if len(layers) == 1: 302 | x = outputs[i + (layers[0])] 303 | 304 | else: 305 | if (layers[1]) > 0: 306 | layers[1] = layers[1] - i 307 | 308 | map1 = outputs[i + layers[0]] 309 | map2 = outputs[i + layers[1]] 310 | 311 | x = torch.cat((map1, map2), 1) 312 | outputs[i] = x 313 | 314 | elif module_type == "shortcut": 315 | from_ = int(modules[i]["from"]) 316 | x = outputs[i-1] + outputs[i+from_] 317 | outputs[i] = x 318 | 319 | elif module_type == 'yolo': 320 | 321 | anchors = self.module_list[i][0].anchors 322 | # Get the input dimensions 323 | inp_dim = int(self.net_info["height"]) 324 | 325 | # Get the number of classes 326 | num_classes = int(modules[i]["classes"]) 327 | 328 | # Output the result 329 | x = x.data 330 | x = predict_transform(x, inp_dim, anchors, num_classes, CUDA) 331 | 332 | if type(x) == int: 333 | continue 334 | 335 | if not write: 336 | detections = x 337 | write = 1 338 | else: 339 | detections = torch.cat((detections, x), 1) 340 | 341 | outputs[i] = outputs[i-1] 342 | 343 | try: 344 | return detections 345 | except: 346 | return 0 347 | 348 | def load_weights(self, weightfile): 349 | # Introduction: https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-3/ 350 | # Open the weights file 351 | # weightfile = os.path.join(sys.path[-1], weightfile) 352 | fp = open(weightfile, "rb") 353 | 354 | # The first 5 values are header information 355 | # 1. Major version number 356 | # 2. Minor Version Number 357 | # 3. Subversion number 358 | # 4.5 Images seen by the network (during training) 359 | header = np.fromfile(fp, dtype = np.int32, count = 5) 360 | self.header = torch.from_numpy(header) 361 | self.seen = self.header[3] 362 | 363 | # The rest of the values are the weights 364 | # Let's load them up 365 | weights = np.fromfile(fp, dtype = np.float32) 366 | 367 | ptr = 0 368 | for i in range(len(self.module_list)): 369 | module_type = self.blocks[i + 1]["type"] 370 | 371 | if module_type == "convolutional": 372 | model = self.module_list[i] 373 | try: 374 | batch_normalize = int(self.blocks[i+1]["batch_normalize"]) 375 | except: 376 | batch_normalize = 0 377 | 378 | conv = model[0] 379 | 380 | if (batch_normalize): 381 | bn = model[1] 382 | 383 | # Get the number of weights of Batch Norm Layer 384 | num_bn_biases = bn.bias.numel() 385 | 386 | # Load the weights 387 | bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases]) 388 | ptr += num_bn_biases 389 | 390 | bn_weights = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 391 | ptr += num_bn_biases 392 | 393 | bn_running_mean = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 394 | ptr += num_bn_biases 395 | 396 | bn_running_var = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 397 | ptr += num_bn_biases 398 | 399 | # Cast the loaded weights into dims of model weights. 
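For reference, the `.weights` file consumed by `load_weights()` is a 5-value `int32` header followed by one flat `float32` array; for each convolutional layer with batch norm the serialized order is BN biases, BN weights, running mean, running variance, then the convolution weights, while plain conv layers store conv biases followed by conv weights. A short sketch of that pointer walk, kept separate from the class above — the weight path is the demo's default and the `take` helper is illustrative only:

```python
# Layout of the darknet .weights file that load_weights() walks through.
# The path is the demo's default (see human_detector.py); `take` is an
# illustrative helper, not part of darknet.py.
import numpy as np
import torch


def take(flat, ptr, tensor):
    """Copy the next tensor.numel() floats from `flat` into `tensor`."""
    n = tensor.numel()
    tensor.data.copy_(torch.from_numpy(flat[ptr:ptr + n]).view_as(tensor))
    return ptr + n


with open("demo/lib/checkpoint/yolov3.weights", "rb") as fp:
    header = np.fromfile(fp, dtype=np.int32, count=5)   # version numbers + images seen
    weights = np.fromfile(fp, dtype=np.float32)         # every parameter, flattened

ptr = 0
# For each convolutional layer with batch_normalize=1 the order is:
#   ptr = take(weights, ptr, bn.bias)
#   ptr = take(weights, ptr, bn.weight)
#   ptr = take(weights, ptr, bn.running_mean)
#   ptr = take(weights, ptr, bn.running_var)
#   ptr = take(weights, ptr, conv.weight)
# and for layers without batch norm:
#   ptr = take(weights, ptr, conv.bias)
#   ptr = take(weights, ptr, conv.weight)
```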
400 | bn_biases = bn_biases.view_as(bn.bias.data) 401 | bn_weights = bn_weights.view_as(bn.weight.data) 402 | bn_running_mean = bn_running_mean.view_as(bn.running_mean) 403 | bn_running_var = bn_running_var.view_as(bn.running_var) 404 | 405 | # Copy the data to model 406 | bn.bias.data.copy_(bn_biases) 407 | bn.weight.data.copy_(bn_weights) 408 | bn.running_mean.copy_(bn_running_mean) 409 | bn.running_var.copy_(bn_running_var) 410 | 411 | else: 412 | # Number of biases 413 | num_biases = conv.bias.numel() 414 | 415 | # Load the weights 416 | conv_biases = torch.from_numpy(weights[ptr: ptr + num_biases]) 417 | ptr = ptr + num_biases 418 | 419 | # reshape the loaded weights according to the dims of the model weights 420 | conv_biases = conv_biases.view_as(conv.bias.data) 421 | 422 | # Finally copy the data 423 | conv.bias.data.copy_(conv_biases) 424 | 425 | # Let us load the weights for the Convolutional layers 426 | num_weights = conv.weight.numel() 427 | 428 | # Do the same as above for weights 429 | conv_weights = torch.from_numpy(weights[ptr:ptr+num_weights]) 430 | ptr = ptr + num_weights 431 | 432 | conv_weights = conv_weights.view_as(conv.weight.data) 433 | conv.weight.data.copy_(conv_weights) 434 | -------------------------------------------------------------------------------- /demo/lib/yolov3/data/coco.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /demo/lib/yolov3/data/pallete: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/yolov3/data/pallete -------------------------------------------------------------------------------- /demo/lib/yolov3/data/voc.names: -------------------------------------------------------------------------------- 1 | aeroplane 2 | bicycle 3 | bird 4 | boat 5 | bottle 6 | bus 7 | car 8 | cat 9 | chair 10 | cow 11 | diningtable 12 | dog 13 | horse 14 | motorbike 15 | person 16 | pottedplant 17 | sheep 18 | sofa 19 | train 20 | tvmonitor 21 | -------------------------------------------------------------------------------- /demo/lib/yolov3/human_detector.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import time 3 | import torch 4 | import numpy as np 5 | import cv2 6 | import os 7 | import sys 8 | import random 9 | import pickle as pkl 
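A side note on the class list: `data/coco.names` (shown a few files above) stores one class name per line and `person` is its first entry, so the YOLO class index for people is 0 — exactly the index kept by `write_results(..., det_hm=True)` in `lib/yolov3/util.py`, which is how this detector returns only human boxes. A small sketch, assuming the repository root as the working directory:

```python
# Why the human detector can filter on class index 0: 'person' is the first
# line of coco.names, so its YOLO class index is 0.
names_path = "demo/lib/yolov3/data/coco.names"  # path assumes the repo root

with open(names_path) as fp:
    classes = [line.strip() for line in fp if line.strip()]

print(classes[0])               # 'person'
print(classes.index("person"))  # 0 -> the index compared against in det_hm filtering
```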
10 | import argparse 11 | 12 | from lib.yolov3.util import * 13 | from lib.yolov3.darknet import Darknet 14 | from lib.yolov3 import preprocess 15 | 16 | cur_dir = os.path.dirname(os.path.realpath(__file__)) 17 | project_root = os.path.join(cur_dir, '../../../') 18 | chk_root = os.path.join(project_root, 'checkpoint/') 19 | data_root = os.path.join(project_root, 'data/') 20 | 21 | 22 | sys.path.insert(0, project_root) 23 | sys.path.pop(0) 24 | 25 | 26 | def prep_image(img, inp_dim): 27 | """ 28 | Prepare image for inputting to the neural network. 29 | 30 | Returns a Variable 31 | """ 32 | ori_img = img 33 | dim = ori_img.shape[1], ori_img.shape[0] 34 | img = cv2.resize(ori_img, (inp_dim, inp_dim)) 35 | img_ = img[:, :, ::-1].transpose((2, 0, 1)).copy() 36 | img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0) 37 | return img_, ori_img, dim 38 | 39 | 40 | def write(x, img, colors): 41 | x = [int(i) for i in x] 42 | c1 = tuple(x[0:2]) 43 | c2 = tuple(x[2:4]) 44 | 45 | label = 'People {}'.format(0) 46 | color = (0, 0, 255) 47 | cv2.rectangle(img, c1, c2, color, 2) 48 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1, 1)[0] 49 | c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4 50 | cv2.rectangle(img, c1, c2, color, -1) 51 | cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225, 255, 255], 1) 52 | return img 53 | 54 | 55 | def arg_parse(): 56 | """" 57 | Parse arguements to the detect module 58 | 59 | """ 60 | parser = argparse.ArgumentParser(description='YOLO v3 Cam Demo') 61 | parser.add_argument('--confidence', dest='confidence', type=float, default=0.70, 62 | help='Object Confidence to filter predictions') 63 | parser.add_argument('--nms-thresh', dest='nms_thresh', type=float, default=0.4, help='NMS Threshold') 64 | parser.add_argument('--reso', dest='reso', default=416, type=int, help='Input resolution of the network. ' 65 | 'Increase to increase accuracy. Decrease to increase speed. (160, 416)') 66 | parser.add_argument('-wf', '--weight-file', type=str, default= 'demo/lib/checkpoint/yolov3.weights', help='The path' 67 | 'of model weight file') 68 | parser.add_argument('-cf', '--cfg-file', type=str, default=cur_dir + '/cfg/yolov3.cfg', help='weight file') 69 | parser.add_argument('-a', '--animation', action='store_true', help='output animation') 70 | parser.add_argument('-v', '--video', type=str, default='camera', help='The input video path') 71 | parser.add_argument("-f", "--figure", type=str, default='demo.jpg',help="input figure file name") 72 | parser.add_argument('-i', '--image', type=str, default=cur_dir + '/data/dog-cycle-car.png', 73 | help='The input video path') 74 | parser.add_argument('-np', '--num-person', type=int, default=1, help='number of estimated human poses. 
[1, 2]') 75 | parser.add_argument('--gpu', type=str, default='0', help='input video') 76 | 77 | return parser.parse_args() 78 | 79 | 80 | def load_model(args=None, CUDA=None, inp_dim=416): 81 | if args is None: 82 | args = arg_parse() 83 | 84 | if CUDA is None: 85 | CUDA = torch.cuda.is_available() 86 | 87 | # Set up the neural network 88 | model = Darknet(args.cfg_file) 89 | model.load_weights(args.weight_file) 90 | # print("YOLOv3 network successfully loaded") 91 | 92 | model.net_info["height"] = inp_dim 93 | assert inp_dim % 32 == 0 94 | assert inp_dim > 32 95 | 96 | # If there's a GPU availible, put the model on GPU 97 | if CUDA: 98 | model.cuda() 99 | 100 | # Set the model in evaluation mode 101 | model.eval() 102 | 103 | return model 104 | 105 | 106 | def yolo_human_det(img, model=None, reso=416, confidence=0.70): 107 | args = arg_parse() 108 | # args.reso = reso 109 | inp_dim = reso 110 | num_classes = 80 111 | 112 | CUDA = torch.cuda.is_available() 113 | if model is None: 114 | model = load_model(args, CUDA, inp_dim) 115 | 116 | if type(img) == str: 117 | assert os.path.isfile(img), 'The image path does not exist' 118 | img = cv2.imread(img) 119 | 120 | img, ori_img, img_dim = preprocess.prep_image(img, inp_dim) 121 | img_dim = torch.FloatTensor(img_dim).repeat(1, 2) 122 | 123 | with torch.no_grad(): 124 | if CUDA: 125 | img_dim = img_dim.cuda() 126 | img = img.cuda() 127 | output = model(img, CUDA) 128 | output = write_results(output, confidence, num_classes, nms=True, nms_conf=args.nms_thresh, det_hm=True) 129 | 130 | if len(output) == 0: 131 | return None, None 132 | 133 | img_dim = img_dim.repeat(output.size(0), 1) 134 | scaling_factor = torch.min(inp_dim / img_dim, 1)[0].view(-1, 1) 135 | 136 | output[:, [1, 3]] -= (inp_dim - scaling_factor * img_dim[:, 0].view(-1, 1)) / 2 137 | output[:, [2, 4]] -= (inp_dim - scaling_factor * img_dim[:, 1].view(-1, 1)) / 2 138 | output[:, 1:5] /= scaling_factor 139 | 140 | for i in range(output.shape[0]): 141 | output[i, [1, 3]] = torch.clamp(output[i, [1, 3]], 0.0, img_dim[i, 0]) 142 | output[i, [2, 4]] = torch.clamp(output[i, [2, 4]], 0.0, img_dim[i, 1]) 143 | 144 | bboxs = [] 145 | scores = [] 146 | for i in range(len(output)): 147 | item = output[i] 148 | bbox = item[1:5].cpu().numpy() 149 | # conver float32 to .2f data 150 | bbox = [round(i, 2) for i in list(bbox)] 151 | score = item[5].cpu().numpy() 152 | bboxs.append(bbox) 153 | scores.append(score) 154 | scores = np.expand_dims(np.array(scores), 1) 155 | bboxs = np.array(bboxs) 156 | 157 | return bboxs, scores 158 | -------------------------------------------------------------------------------- /demo/lib/yolov3/preprocess.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import numpy as np 5 | import cv2 6 | from PIL import Image 7 | 8 | 9 | def letterbox_image(img, inp_dim): 10 | '''resize image with unchanged aspect ratio using padding''' 11 | img_w, img_h = img.shape[1], img.shape[0] 12 | w, h = inp_dim 13 | new_w = int(img_w * min(w/img_w, h/img_h)) 14 | new_h = int(img_h * min(w/img_w, h/img_h)) 15 | resized_image = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_CUBIC) 16 | 17 | canvas = np.full((inp_dim[1], inp_dim[0], 3), 128) 18 | 19 | canvas[(h - new_h) // 2:(h - new_h) // 2 + new_h, (w - new_w) // 2:(w - new_w) // 2 + new_w, :] = resized_image 20 | 21 | return canvas 22 | 23 | 24 | def prep_image(img, inp_dim): 25 | """ 26 | Prepare image for inputting to the neural 
network. 27 | 28 | Returns a Variable 29 | """ 30 | if type(img) == str: 31 | orig_im = cv2.imread(img) 32 | else: 33 | orig_im = img 34 | dim = orig_im.shape[1], orig_im.shape[0] 35 | img = (letterbox_image(orig_im, (inp_dim, inp_dim))) 36 | img_ = img[:, :, ::-1].transpose((2, 0, 1)).copy() 37 | img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0) 38 | return img_, orig_im, dim 39 | 40 | 41 | def prep_image_pil(img, network_dim): 42 | orig_im = Image.open(img) 43 | img = orig_im.convert('RGB') 44 | dim = img.size 45 | img = img.resize(network_dim) 46 | img = torch.ByteTensor(torch.ByteStorage.from_buffer(img.tobytes())) 47 | img = img.view(*network_dim, 3).transpose(0, 1).transpose(0, 2).contiguous() 48 | img = img.view(1, 3, *network_dim) 49 | img = img.float().div(255.0) 50 | return img, orig_im, dim 51 | 52 | 53 | def inp_to_image(inp): 54 | inp = inp.cpu().squeeze() 55 | inp = inp * 255 56 | try: 57 | inp = inp.data.numpy() 58 | except RuntimeError: 59 | inp = inp.numpy() 60 | inp = inp.transpose(1, 2, 0) 61 | 62 | inp = inp[:, :, ::-1] 63 | return inp 64 | -------------------------------------------------------------------------------- /demo/lib/yolov3/util.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import numpy as np 5 | import cv2 6 | import os.path as osp 7 | from lib.yolov3.bbox import bbox_iou 8 | 9 | 10 | def get_path(cur_file): 11 | cur_dir = osp.dirname(osp.realpath(cur_file)) 12 | project_root = osp.join(cur_dir, '../../../') 13 | chk_root = osp.join(project_root, 'checkpoint/') 14 | data_root = osp.join(project_root, 'data/') 15 | 16 | return project_root, chk_root, data_root, cur_dir 17 | 18 | 19 | def count_parameters(model): 20 | return sum(p.numel() for p in model.parameters()) 21 | 22 | 23 | def count_learnable_parameters(model): 24 | return sum(p.numel() for p in model.parameters() if p.requires_grad) 25 | 26 | 27 | def convert2cpu(matrix): 28 | if matrix.is_cuda: 29 | return torch.FloatTensor(matrix.size()).copy_(matrix) 30 | else: 31 | return matrix 32 | 33 | 34 | def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True): 35 | batch_size = prediction.size(0) 36 | stride = inp_dim // prediction.size(2) 37 | grid_size = inp_dim // stride 38 | bbox_attrs = 5 + num_classes 39 | num_anchors = len(anchors) 40 | 41 | anchors = [(a[0]/stride, a[1]/stride) for a in anchors] 42 | 43 | prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size) 44 | prediction = prediction.transpose(1, 2).contiguous() 45 | prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs) 46 | 47 | # Sigmoid the centre_X, centre_Y. 
and object confidencce 48 | prediction[:, :, 0] = torch.sigmoid(prediction[:, :, 0]) 49 | prediction[:, :, 1] = torch.sigmoid(prediction[:, :, 1]) 50 | prediction[:, :, 4] = torch.sigmoid(prediction[:, :, 4]) 51 | 52 | # Add the center offsets 53 | grid_len = np.arange(grid_size) 54 | a, b = np.meshgrid(grid_len, grid_len) 55 | 56 | x_offset = torch.FloatTensor(a).view(-1, 1) 57 | y_offset = torch.FloatTensor(b).view(-1, 1) 58 | 59 | if CUDA: 60 | x_offset = x_offset.cuda() 61 | y_offset = y_offset.cuda() 62 | 63 | x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1, 2).unsqueeze(0) 64 | 65 | prediction[:, :, :2] += x_y_offset 66 | 67 | # log space transform height and the width 68 | anchors = torch.FloatTensor(anchors) 69 | 70 | if CUDA: 71 | anchors = anchors.cuda() 72 | 73 | anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0) 74 | prediction[:, :, 2:4] = torch.exp(prediction[:, :, 2:4])*anchors 75 | 76 | # Softmax the class scores 77 | prediction[:, :, 5: 5 + num_classes] = torch.sigmoid((prediction[:, :, 5: 5 + num_classes])) 78 | 79 | prediction[:, :, :4] *= stride 80 | 81 | return prediction 82 | 83 | 84 | def load_classes(namesfile): 85 | fp = open(namesfile, "r") 86 | names = fp.read().split("\n")[:-1] 87 | return names 88 | 89 | 90 | def get_im_dim(im): 91 | im = cv2.imread(im) 92 | w, h = im.shape[1], im.shape[0] 93 | return w, h 94 | 95 | 96 | def unique(tensor): 97 | tensor_np = tensor.cpu().numpy() 98 | unique_np = np.unique(tensor_np) 99 | unique_tensor = torch.from_numpy(unique_np) 100 | 101 | tensor_res = tensor.new(unique_tensor.shape) 102 | tensor_res.copy_(unique_tensor) 103 | return tensor_res 104 | 105 | 106 | # ADD SOFT NMS 107 | def write_results(prediction, confidence, num_classes, nms=True, nms_conf=0.4, det_hm=False): 108 | """ 109 | https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-4/ 110 | prediction: (B x 10647 x 85) 111 | B: the number of images in a batch, 112 | 10647: the number of bounding boxes predicted per image. (52×52+26×26+13×13)×3=10647 113 | 85: the number of bounding box attributes. 
(c_x, c_y, w, h, object confidence, and 80 class scores) 114 | 115 | output: Num_obj × [img_index, x_1, y_1, x_2, y_2, object confidence, class_score, label_index] 116 | """ 117 | 118 | conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2) 119 | prediction = prediction*conf_mask 120 | 121 | box_a = prediction.new(prediction.shape) 122 | box_a[:, :, 0] = (prediction[:, :, 0] - prediction[:, :, 2]/2) 123 | box_a[:, :, 1] = (prediction[:, :, 1] - prediction[:, :, 3]/2) 124 | box_a[:, :, 2] = (prediction[:, :, 0] + prediction[:, :, 2]/2) 125 | box_a[:, :, 3] = (prediction[:, :, 1] + prediction[:, :, 3]/2) 126 | prediction[:, :, :4] = box_a[:, :, :4] 127 | 128 | batch_size = prediction.size(0) 129 | 130 | output = prediction.new(1, prediction.size(2) + 1) 131 | write = False 132 | 133 | for ind in range(batch_size): 134 | # select the image from the batch 135 | image_pred = prediction[ind] 136 | 137 | # Get the class having maximum score, and the index of that class 138 | # Get rid of num_classes softmax scores 139 | # Add the class index and the class score of class having maximum score 140 | max_conf, max_conf_index = torch.max(image_pred[:, 5:5 + num_classes], 1) 141 | max_conf = max_conf.float().unsqueeze(1) 142 | max_conf_index = max_conf_index.float().unsqueeze(1) 143 | seq = (image_pred[:, :5], max_conf, max_conf_index) 144 | image_pred = torch.cat(seq, 1) # image_pred:(10647, 7) 7:[x1, y1, x2, y2, obj_score, max_conf, max_conf_index] 145 | 146 | # Get rid of the zero entries 147 | non_zero_ind = (torch.nonzero(image_pred[:, 4])) 148 | image_pred__ = image_pred[non_zero_ind.squeeze(), :].view(-1, 7) 149 | 150 | # filters out people id 151 | if det_hm: 152 | cls_mask = (image_pred__[:, -1] == 0).float() 153 | class_mask_ind = torch.nonzero(cls_mask).squeeze() 154 | image_pred_ = image_pred__[class_mask_ind].view(-1, 7) 155 | 156 | if torch.sum(cls_mask) == 0: 157 | return image_pred_ 158 | else: 159 | image_pred_ = image_pred__ 160 | 161 | # Get the various classes detected in the image 162 | try: 163 | # img_classes = unique(image_pred_[:, -1]) 164 | img_classes = torch.unique(image_pred_[:, -1], sorted=True).float() 165 | except: 166 | continue 167 | 168 | # We will do NMS classwise 169 | # import ipdb;ipdb.set_trace() 170 | for cls in img_classes: 171 | # get the detections with one particular class 172 | cls_mask = image_pred_*(image_pred_[:, -1] == cls).float().unsqueeze(1) 173 | class_mask_ind = torch.nonzero(cls_mask[:, -2]).squeeze() 174 | image_pred_class = image_pred_[class_mask_ind].view(-1, 7) 175 | 176 | # sort the detections such that the entry with the maximum objectness 177 | # confidence is at the top 178 | conf_sort_index = torch.sort(image_pred_class[:, 4], descending=True)[1] 179 | image_pred_class = image_pred_class[conf_sort_index] 180 | idx = image_pred_class.size(0) 181 | 182 | # from soft_NMS import soft_nms 183 | # boxes = image_pred_class[:,:4] 184 | # scores = image_pred_class[:, 4] 185 | # k, N = soft_nms(boxes, scores, method=2) 186 | # image_pred_class = image_pred_class[k] 187 | 188 | # if nms has to be done 189 | if nms: 190 | # For each detection 191 | for i in range(idx): 192 | # Get the IOUs of all boxes that come after the one we are looking at 193 | # in the loop 194 | try: 195 | ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:]) 196 | except ValueError: 197 | break 198 | 199 | except IndexError: 200 | break 201 | 202 | # Zero out all the detections that have IoU > threshold 203 | iou_mask = (ious < 
nms_conf).float().unsqueeze(1) 204 | image_pred_class[i+1:] *= iou_mask 205 | 206 | # Remove the zero entries 207 | non_zero_ind = torch.nonzero(image_pred_class[:, 4]).squeeze() 208 | image_pred_class = image_pred_class[non_zero_ind].view(-1, 7) 209 | 210 | # Concatenate the batch_id of the image to the detection 211 | # this helps us identify which image does the detection correspond to 212 | # We use a linear structure to hold ALL the detections from the batch 213 | # the batch_dim is flattened 214 | # batch is identified by extra batch column 215 | 216 | batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind) 217 | seq = batch_ind, image_pred_class 218 | if not write: 219 | output = torch.cat(seq, 1) 220 | write = True 221 | else: 222 | out = torch.cat(seq, 1) 223 | output = torch.cat((output, out)) 224 | 225 | return output 226 | -------------------------------------------------------------------------------- /demo/vis.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import cv2 3 | 4 | from lib.preprocess import h36m_coco_format, revise_kpts 5 | from lib.hrnet.gen_kpts import gen_video_kpts as hrnet_pose 6 | import os 7 | import numpy as np 8 | import torch 9 | import glob 10 | from tqdm import tqdm 11 | import copy 12 | import shutil 13 | from IPython import embed 14 | 15 | 16 | sys.path.append(os.getcwd()) 17 | from model.GCN_conv import adj_mx_from_skeleton 18 | from model.trans import HTNet 19 | from common.camera import * 20 | from common.h36m_dataset import Human36mDataset 21 | from common.camera import camera_to_world 22 | from common.opt import opts 23 | opt = opts().parse() 24 | 25 | import matplotlib 26 | import matplotlib.pyplot as plt 27 | import matplotlib.gridspec as gridspec 28 | 29 | os.environ["CUDA_VISIBLE_DEVICES"] = opt.gpu 30 | 31 | plt.switch_backend('agg') 32 | matplotlib.rcParams['pdf.fonttype'] = 42 33 | matplotlib.rcParams['ps.fonttype'] = 42 34 | 35 | dataset_path = './dataset/data_3d_h36m.npz' 36 | dataset = Human36mDataset(dataset_path, opt) 37 | adj = adj_mx_from_skeleton(dataset.skeleton()) 38 | 39 | def show2Dpose(kps, img): 40 | connections = [[0, 1], [1, 2], [2, 3], [0, 4], [4, 5], 41 | [5, 6], [0, 7], [7, 8], [8, 9], [9, 10], 42 | [8, 11], [11, 12], [12, 13], [8, 14], [14, 15], [15, 16]] 43 | 44 | LR = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0], dtype=bool) 45 | 46 | lcolor = (255, 0, 0) 47 | rcolor = (0, 0, 255) 48 | thickness = 3 49 | 50 | for j,c in enumerate(connections): 51 | start = map(int, kps[c[0]]) 52 | end = map(int, kps[c[1]]) 53 | start = list(start) 54 | end = list(end) 55 | cv2.line(img, (start[0], start[1]), (end[0], end[1]), lcolor if LR[j] else rcolor, thickness) 56 | cv2.circle(img, (start[0], start[1]), thickness=-1, color=(0, 255, 0), radius=3) 57 | cv2.circle(img, (end[0], end[1]), thickness=-1, color=(0, 255, 0), radius=3) 58 | 59 | return img 60 | 61 | 62 | 63 | def show3Dpose(vals, ax): 64 | ax.view_init(elev=15., azim=70) 65 | 66 | lcolor=(0,0,1) 67 | rcolor=(1,0,0) 68 | 69 | I = np.array( [0, 0, 1, 4, 2, 5, 0, 7, 8, 8, 14, 15, 11, 12, 8, 9]) 70 | J = np.array( [1, 4, 2, 5, 3, 6, 7, 8, 14, 11, 15, 16, 12, 13, 9, 10]) 71 | 72 | LR = np.array([0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0], dtype=bool) 73 | 74 | for i in np.arange( len(I) ): 75 | x, y, z = [np.array( [vals[I[i], j], vals[J[i], j]] ) for j in range(3)] 76 | ax.plot(x, y, z, lw=2, color = lcolor if LR[i] else rcolor) 77 | 78 | RADIUS = 0.72 79 | RADIUS_Z = 0.7 80 | 81 | 
xroot, yroot, zroot = vals[0,0], vals[0,1], vals[0,2] 82 | ax.set_xlim3d([-RADIUS+xroot, RADIUS+xroot]) 83 | ax.set_ylim3d([-RADIUS+yroot, RADIUS+yroot]) 84 | ax.set_zlim3d([-RADIUS_Z+zroot, RADIUS_Z+zroot]) 85 | ax.set_aspect('auto') # works fine in matplotlib==2.2.2 86 | 87 | white = (1.0, 1.0, 1.0, 0.0) 88 | ax.xaxis.set_pane_color(white) 89 | ax.yaxis.set_pane_color(white) 90 | ax.zaxis.set_pane_color(white) 91 | 92 | ax.tick_params('x', labelbottom = False) 93 | ax.tick_params('y', labelleft = False) 94 | ax.tick_params('z', labelleft = False) 95 | 96 | 97 | 98 | 99 | def showimage(ax, img): 100 | ax.set_xticks([]) 101 | ax.set_yticks([]) 102 | plt.axis('off') 103 | ax.imshow(img) 104 | 105 | 106 | def get_pose3D(figure_path, output_dir, file_name): 107 | # Genarate 2D pose 108 | keypoints, scores = hrnet_pose(figure_path, det_dim=416, num_peroson=1, gen_output=True) 109 | keypoints, scores, valid_frames = h36m_coco_format(keypoints, scores) 110 | 111 | ## Reload 112 | previous_dir = './ckpt/cpn' 113 | model = HTNet(opt, adj).cuda() 114 | model_dict = model.state_dict() 115 | model_path = sorted(glob.glob(os.path.join(previous_dir, '*.pth')))[0] 116 | pre_dict = torch.load(model_path) 117 | for name, key in model_dict.items(): 118 | model_dict[name] = pre_dict[name] 119 | model.load_state_dict(model_dict) 120 | model.eval() 121 | 122 | ## 3D 123 | img = cv2.imread(figure_path) 124 | img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 125 | img_size = img.shape 126 | input_2D_no = keypoints[:,0,:,:] 127 | 128 | joints_left = [4, 5, 6, 11, 12, 13] 129 | joints_right = [1, 2, 3, 14, 15, 16] 130 | 131 | input_2D = normalize_screen_coordinates(input_2D_no, w=img_size[1], h=img_size[0]) 132 | 133 | input_2D_aug = copy.deepcopy(input_2D) 134 | input_2D_aug[ :, :, 0] *= -1 135 | input_2D_aug[ :, joints_left + joints_right] = input_2D_aug[ :, joints_right + joints_left] 136 | input_2D = np.concatenate((np.expand_dims(input_2D, axis=0), np.expand_dims(input_2D_aug, axis=0)), 0) 137 | 138 | input_2D = input_2D[np.newaxis, :, :, :, :] 139 | 140 | input_2D = torch.from_numpy(input_2D.astype('float32')).cuda() 141 | 142 | N = input_2D.size(0) 143 | 144 | ## estimation 145 | output_3D_non_flip = model(input_2D[:, 0]) 146 | output_3D_flip = model(input_2D[:, 1]) 147 | 148 | output_3D_flip[:, :, :, 0] *= -1 149 | output_3D_flip[:, :, joints_left + joints_right, :] = output_3D_flip[:, :, joints_right + joints_left, :] 150 | 151 | output_3D = (output_3D_non_flip + output_3D_flip) / 2 152 | 153 | output_3D = output_3D[0:, opt.pad].unsqueeze(1) 154 | output_3D[:, :, 0, :] = 0 155 | post_out = output_3D[0, 0].cpu().detach().numpy() 156 | 157 | rot = [0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088] 158 | rot = np.array(rot, dtype='float32') 159 | post_out = camera_to_world(post_out, R=rot, t=0) 160 | post_out[:, 2] -= np.min(post_out[:, 2]) 161 | 162 | input_2D_no = input_2D_no[opt.pad] 163 | 164 | ## 2D 165 | image = show2Dpose(input_2D_no, copy.deepcopy(img)) 166 | 167 | 168 | ## 3D 169 | fig = plt.figure( figsize=(9.6, 5.4)) 170 | gs = gridspec.GridSpec(1, 1) 171 | gs.update(wspace=-0.00, hspace=0.05) 172 | ax = plt.subplot(gs[0], projection='3d') 173 | show3Dpose( post_out, ax) 174 | 175 | 176 | output_dir_3D = output_dir +'pose3D/' 177 | os.makedirs(output_dir_3D, exist_ok=True) 178 | plt.savefig(output_dir_3D + '_3D.png', dpi=200, format='png', bbox_inches = 'tight') 179 | 180 | 181 | 182 | 183 | ## all 184 | image_3d_dir = sorted(glob.glob(os.path.join(output_dir_3D, 
'*.png'))) 185 | 186 | for i in range(len(image_3d_dir)): 187 | image_2d = image 188 | image_3d = plt.imread(image_3d_dir[i]) 189 | ## crop 190 | edge = (image_2d.shape[1] - image_2d.shape[0]) // 2 191 | image_2d = image_2d[:, edge:image_2d.shape[1] - edge] 192 | 193 | edge = 130 194 | image_3d = image_3d[edge:image_3d.shape[0] - edge, edge:image_3d.shape[1] - edge] 195 | ## show 196 | font_size = 12 197 | ax = plt.subplot(121) 198 | showimage(ax, image_2d) 199 | ax.set_title("Input", fontsize = font_size) 200 | 201 | ax = plt.subplot(122) 202 | showimage(ax, image_3d) 203 | ax.set_title("Pose", fontsize = font_size) 204 | 205 | ## save 206 | output_dir_pose = output_dir 207 | plt.savefig(output_dir + file_name + '_pose.png', dpi=200, bbox_inches = 'tight') 208 | 209 | shutil.rmtree("./demo/output/pose3D") 210 | 211 | 212 | 213 | if __name__ == "__main__": 214 | items = os.listdir('./demo/figure/') 215 | print(items) 216 | for i, file_name in enumerate(items): 217 | print("Generate Pose For " + file_name) 218 | figure_path = './demo/figure/' + file_name 219 | output_dir = './demo/output/' 220 | get_pose3D(figure_path, output_dir, file_name[:-4]) 221 | 222 | 223 | 224 | -------------------------------------------------------------------------------- /figure/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /figure/messi_pose.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/figure/messi_pose.png -------------------------------------------------------------------------------- /figure/structure.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/figure/structure.png -------------------------------------------------------------------------------- /figure/wild.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/figure/wild.png -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import glob 3 | import torch 4 | import random 5 | import logging 6 | import numpy as np 7 | from tqdm import tqdm 8 | import torch.utils.data 9 | import torch.optim as optim 10 | 11 | from common.opt import opts 12 | from common.utils import * 13 | from common.load_data_hm36 import Fusion 14 | from common.h36m_dataset import Human36mDataset 15 | from model.GCN_conv import adj_mx_from_skeleton 16 | from model.trans import HTNet 17 | 18 | opt = opts().parse() 19 | os.environ["CUDA_VISIBLE_DEVICES"] = opt.gpu 20 | from tensorboardX import SummaryWriter 21 | 22 | writer = SummaryWriter(log_dir='./runs/' + opt.model_name) 23 | 24 | 25 | def train(opt, actions, train_loader, model, optimizer, epoch): 26 | return step('train', opt, actions, train_loader, model, optimizer, epoch) 27 | 28 | def val(opt, actions, val_loader, model): 29 | with torch.no_grad(): 30 | return step('test', opt, actions, val_loader, model) 31 | 32 | def step(split, opt, actions, dataLoader, model, optimizer=None, epoch=None): 33 | loss_all = {'loss': AccumLoss()} 34 | action_error_sum = define_error_list(actions) 35 | if 
split == 'train': 36 | model.train() 37 | else: 38 | model.eval() 39 | for i, data in enumerate(tqdm(dataLoader, 0)): 40 | batch_cam, gt_3D, input_2D, action, subject, scale, bb_box, cam_ind = data 41 | [input_2D, gt_3D, batch_cam, scale, bb_box] = get_varialbe(split, [input_2D, gt_3D, batch_cam, scale, bb_box]) 42 | if split =='train': 43 | output_3D = model(input_2D) 44 | else: 45 | input_2D, output_3D = input_augmentation(input_2D, model) 46 | out_target = gt_3D.clone() 47 | out_target[:, :, 0] = 0 48 | if split == 'train': 49 | loss = mpjpe_cal(output_3D, out_target) 50 | N = input_2D.size(0) 51 | loss_all['loss'].update(loss.detach().cpu().numpy() * N, N) 52 | optimizer.zero_grad() 53 | loss.backward() 54 | optimizer.step() 55 | elif split == 'test': 56 | output_3D = output_3D[:, opt.pad].unsqueeze(1) 57 | output_3D[:, :, 0, :] = 0 58 | action_error_sum = test_calculation(output_3D, out_target, action, action_error_sum, opt.dataset, subject) 59 | if split == 'train': 60 | return loss_all['loss'].avg 61 | elif split == 'test': 62 | p1, p2 = print_error(opt.dataset, action_error_sum, opt.train) 63 | return p1, p2 64 | 65 | def input_augmentation(input_2D, model): 66 | joints_left = [4, 5, 6, 11, 12, 13] 67 | joints_right = [1, 2, 3, 14, 15, 16] 68 | input_2D_non_flip = input_2D[:, 0] 69 | input_2D_flip = input_2D[:, 1] 70 | output_3D_non_flip = model(input_2D_non_flip) 71 | output_3D_flip = model(input_2D_flip) 72 | output_3D_flip[:, :, :, 0] *= -1 73 | output_3D_flip[:, :, joints_left + joints_right, :] = output_3D_flip[:, :, joints_right + joints_left, :] 74 | output_3D = (output_3D_non_flip + output_3D_flip) / 2 75 | input_2D = input_2D_non_flip 76 | return input_2D, output_3D 77 | 78 | if __name__ == '__main__': 79 | manualSeed = opt.seed 80 | random.seed(manualSeed) 81 | torch.manual_seed(manualSeed) 82 | np.random.seed(manualSeed) 83 | torch.cuda.manual_seed_all(manualSeed) 84 | torch.backends.cudnn.benchmark = False 85 | torch.backends.cudnn.deterministic = True 86 | 87 | print("lr: ", opt.lr) 88 | print("batch_size: ", opt.batch_size) 89 | print("channel: ", opt.channel) 90 | print("GPU: ", opt.gpu) 91 | 92 | if opt.train: 93 | logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%Y/%m/%d %H:%M:%S', \ 94 | filename=os.path.join(opt.checkpoint, 'train.log'), level=logging.INFO) 95 | 96 | root_path = opt.root_path 97 | dataset_path = root_path + 'data_3d_' + opt.dataset + '.npz' 98 | 99 | dataset = Human36mDataset(dataset_path, opt) 100 | actions = define_actions(opt.actions) 101 | adj = adj_mx_from_skeleton(dataset.skeleton()) 102 | 103 | 104 | if opt.train: 105 | train_data = Fusion(opt=opt, train=True, dataset=dataset, root_path=root_path) 106 | train_dataloader = torch.utils.data.DataLoader(train_data, batch_size=opt.batch_size, 107 | shuffle=True, num_workers=int(opt.workers), pin_memory=True) 108 | 109 | test_data = Fusion(opt=opt, train=False, dataset=dataset, root_path =root_path) 110 | test_dataloader = torch.utils.data.DataLoader(test_data, batch_size=opt.batch_size, 111 | shuffle=False, num_workers=int(opt.workers), pin_memory=True) 112 | 113 | model = HTNet(opt,adj).cuda() 114 | 115 | if opt.reload: 116 | model_dict = model.state_dict() 117 | model_path = sorted(glob.glob(os.path.join(opt.previous_dir, '*.pth')))[0] 118 | print(model_path) 119 | pre_dict = torch.load(model_path) 120 | pre_key = pre_dict.keys() 121 | for name, key in model_dict.items(): 122 | model_dict[name] = pre_dict[name] 123 | model.load_state_dict(model_dict) 124 | 125 | model_params = 0 
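# Sum up the model's parameter count (printed below in millions):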
126 | for parameter in model.parameters(): 127 | model_params += parameter.numel() 128 | print('INFO: Trainable parameter count:', model_params / 1000000) 129 | 130 | 131 | all_param = [] 132 | lr = opt.lr 133 | all_param += list(model.parameters()) 134 | optimizer = optim.Adam(all_param, lr=opt.lr, amsgrad=True) 135 | scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.317, patience=5, verbose=True) 136 | 137 | for epoch in range(1, opt.nepoch): 138 | if opt.train: 139 | loss = train(opt, actions, train_dataloader, model, optimizer, epoch) 140 | p1, p2 = val(opt, actions, test_dataloader, model) 141 | writer.add_scalar('mpjpe',p1,epoch) 142 | writer.add_scalar('p2',p2,epoch) 143 | 144 | if opt.train and p1 < opt.previous_best_threshold: 145 | opt.previous_name = save_model(opt.previous_name, opt.checkpoint, epoch, p1, model) 146 | opt.previous_best_threshold = p1 147 | if opt.train == 0: 148 | print('p1: %.2f, p2: %.2f' % (p1, p2)) 149 | break 150 | else: 151 | logging.info('epoch: %d, lr: %.7f, loss: %.4f, p1: %.2f, p2: %.2f' % (epoch, lr, loss, p1, p2)) 152 | print('e: %d, lr: %.7f, loss: %.4f, p1: %.2f, p2: %.2f' % (epoch, lr, loss, p1, p2)) 153 | if epoch % opt.large_decay_epoch == 0: 154 | for param_group in optimizer.param_groups: 155 | param_group['lr'] *= opt.lr_decay_large 156 | lr *= opt.lr_decay_large 157 | else: 158 | for param_group in optimizer.param_groups: 159 | param_group['lr'] *= opt.lr_decay 160 | lr *= opt.lr_decay 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | -------------------------------------------------------------------------------- /model/Block.py: -------------------------------------------------------------------------------- 1 | from functools import partial 2 | import torch 3 | import torch.nn as nn 4 | 5 | from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD 6 | from timm.models.layers import DropPath 7 | from model.GCN_conv import ModulatedGraphConv 8 | from model.Transformer import Attention, Mlp 9 | 10 | #X_1 11 | rl_2joints = [2,3] 12 | ll_2joints = [5,6] 13 | la_2joints = [12,13] 14 | ra_2joints = [15,16] 15 | part_2joints = [rl_2joints,ll_2joints,la_2joints,ra_2joints] 16 | # X_2 17 | rl_3joints = [1,2,3] 18 | ll_3joints = [4,5,6] 19 | ra_3joints = [14,15,16] 20 | la_3joints = [11,12,13] 21 | part_3joints = [rl_3joints,ll_3joints,la_3joints,ra_3joints] 22 | 23 | class LJC(nn.Module): 24 | def __init__(self, adj, dim, drop_path=0., norm_layer=nn.LayerNorm): 25 | super().__init__() 26 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 27 | self.adj = adj 28 | self.norm_gcn1 = norm_layer(dim) 29 | self.gcn1 = ModulatedGraphConv(dim,384,self.adj) 30 | self.gelu = nn.GELU() 31 | self.gcn2 = ModulatedGraphConv(384,dim,self.adj) 32 | self.norm_gcn2 = norm_layer(dim) 33 | 34 | def forward(self, x_gcn): 35 | x_gcn = x_gcn + self.drop_path(self.norm_gcn2(self.gcn2(self.gelu(self.gcn1(self.norm_gcn1(x_gcn)))))) 36 | return x_gcn 37 | 38 | 39 | class IPC(nn.Module): 40 | def __init__(self, dim, mlp_hidden_dim, drop=0., drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm): 41 | super().__init__() 42 | self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity() 43 | self.index_1 = [1,2,3, 4,5,6, 11,12,13, 14,15,16] # 6parts 44 | self.index_2 = [2,3, 5,6, 12,13, 15,16] 45 | self.gelu = nn.GELU() 46 | self.norm_conv1 = norm_layer(dim) 47 | self.conv1 = nn.Conv1d(dim,dim, kernel_size=3, padding=0, stride=3) 48 | self.norm_conv1_mlp = norm_layer(dim) 49 | self.mlp_down_1 = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 50 | self.norm_conv2 = norm_layer(dim) 51 | self.conv2 = nn.Conv1d(dim,dim, kernel_size=2, padding=0, stride=2) 52 | self.norm_conv2_mlp = norm_layer(dim) 53 | self.mlp_down_2 = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 54 | 55 | 56 | def forward(self, x_gcn, x_conv): 57 | x_conv = x_conv + x_gcn 58 | 59 | #NOTE:Conv_1 3 joints per limb 60 | x_conv_1 = self.norm_conv1(x_conv) 61 | x_conv_1 = x_conv_1.permute(0,2,1) 62 | x_pooling_1 = x_conv_1[:, :, self.index_1] 63 | x_pooling_1 = self.drop_path(self.gelu(self.conv1(x_pooling_1))) 64 | 65 | x_pooling_1 = x_pooling_1.permute(0,2,1) 66 | x_pooling_1 = x_pooling_1 + self.drop_path(self.mlp_down_1(self.norm_conv1_mlp(x_pooling_1))) 67 | x_pooling_1 = x_pooling_1.permute(0,2,1) 68 | for i in range(len(part_3joints)): 69 | num_joints = len(part_3joints[i]) - 1 70 | x_conv_1[:,:,part_3joints[i][1:]] = x_pooling_1[:,:,i].unsqueeze(-1).repeat(1,1,num_joints) 71 | x_conv_1 = x_conv_1.permute(0,2,1) 72 | 73 | #NOTE:Conv_2 2 joints per limb 74 | x_conv_2 = self.norm_conv2(x_conv) 75 | x_conv_2 = x_conv_2.permute(0,2,1) 76 | x_pooling_2 = x_conv_2[:, :, self.index_2] 77 | x_pooling_2 = self.drop_path(self.gelu(self.conv2(x_pooling_2))) 78 | 79 | x_pooling_2 = x_pooling_2.permute(0,2,1) 80 | x_pooling_2 = x_pooling_2 + self.drop_path(self.mlp_down_2(self.norm_conv2_mlp(x_pooling_2))) 81 | x_pooling_2 = x_pooling_2.permute(0,2,1) 82 | for i in range(len(part_2joints)): 83 | num_joints = len(part_2joints[i]) - 1 84 | x_conv_2[:,:,part_2joints[i][1:]] = x_pooling_2[:,:,i].unsqueeze(-1).repeat(1,1,num_joints) 85 | x_conv_2 = x_conv_2.permute(0,2,1) 86 | 87 | x_conv = x_conv_1 + x_conv_2 + x_conv 88 | return x_conv 89 | 90 | 91 | class GBI(nn.Module): 92 | def __init__(self, dim, num_heads, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 93 | drop_path=0., norm_layer=nn.LayerNorm, length=1): 94 | super().__init__() 95 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 96 | self.norm_attn = norm_layer(dim) 97 | self.attn = Attention(dim, num_heads=num_heads, qkv_bias=qkv_bias, \ 98 | qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop, length=length) 99 | 100 | def forward(self, x_conv, x_attn): 101 | x_attn = x_attn + x_conv 102 | x_attn = x_attn + self.drop_path(self.attn(self.norm_attn(x_attn))) 103 | return x_attn 104 | 105 | 106 | 107 | 108 | 109 | class Hiremixer(nn.Module): 110 | def __init__(self, adj, depth=8, embed_dim=512, mlp_hidden_dim=1024, h=8, drop_rate=0.1, length=9): 111 | super().__init__() 112 | drop_path_rate = 0.3 113 | attn_drop_rate = 0. 
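# Attention settings shared by every block in this stack; per-block drop-path rates come from dpr below: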
114 | qkv_bias = True 115 | qk_scale = None 116 | norm_layer = partial(nn.LayerNorm, eps=1e-6) 117 | # Stochastic depth decay rule 118 | dpr = [x.item() for x in torch.linspace(0.1, drop_path_rate, depth)] 119 | self.blocks = nn.ModuleList([ 120 | Block( 121 | adj, dim=embed_dim, num_heads=h, mlp_hidden_dim=mlp_hidden_dim, qkv_bias=qkv_bias, qk_scale=qk_scale, 122 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer, length=length) 123 | for i in range(depth)]) 124 | self.Temporal_norm = norm_layer(embed_dim) 125 | 126 | def forward(self, x): 127 | for blk in self.blocks: 128 | x = blk(x) 129 | x = self.Temporal_norm(x) 130 | return x 131 | 132 | 133 | class Block(nn.Module): 134 | def __init__(self, adj, dim, num_heads, mlp_hidden_dim, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 135 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, length=1): 136 | super().__init__() 137 | 138 | dim = int(dim/3) 139 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 140 | 141 | # Three sub-modules 142 | self.lgc = LJC(adj, dim, drop_path=drop_path, norm_layer=nn.LayerNorm) 143 | self.ipc = IPC(dim, mlp_hidden_dim, drop=drop, drop_path=drop_path, act_layer=nn.GELU, norm_layer=nn.LayerNorm) 144 | self.gbi = GBI(dim, num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, drop=0.1, attn_drop=attn_drop, 145 | drop_path=drop_path, norm_layer=nn.LayerNorm, length=length) 146 | 147 | self.norm_mlp = norm_layer(dim*3) 148 | self.mlp = Mlp(in_features=dim*3, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 149 | 150 | 151 | 152 | def forward(self, x): 153 | x_split = torch.chunk(x,3,-1) 154 | x_lgc, x_ipc, x_gbi = x_split 155 | # Local Joint-level Connection (LJC) 156 | x_lgc = self.lgc(x_lgc) 157 | # Inter-Part Constraint (IPC) 158 | x_ipc = self.ipc(x_lgc, x_ipc) 159 | # Global body-level Interaction (GBI) 160 | x_gbi = self.gbi(x_ipc, x_gbi) 161 | x_cat = torch.cat([x_lgc,x_ipc,x_gbi], -1) 162 | x = x_cat + self.drop_path(self.mlp(self.norm_mlp(x_cat))) 163 | return x 164 | 165 | 166 | 167 | class Hiremixer_frame(nn.Module): 168 | def __init__(self, adj, depth=8, embed_dim=512, mlp_hidden_dim=1024, h=8, drop_rate=0.1, length=9): 169 | super().__init__() 170 | drop_path_rate = 0.3 171 | attn_drop_rate = 0. 172 | qkv_bias = True 173 | qk_scale = None 174 | norm_layer = partial(nn.LayerNorm, eps=1e-6) 175 | # Stochastic depth decay rule 176 | dpr = [x.item() for x in torch.linspace(0.1, drop_path_rate, depth)] 177 | self.blocks = nn.ModuleList([ 178 | Block_frame( 179 | adj, dim=embed_dim, num_heads=h, mlp_hidden_dim=mlp_hidden_dim, qkv_bias=qkv_bias, qk_scale=qk_scale, 180 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer, length=length) 181 | for i in range(depth)]) 182 | self.Temporal_norm = norm_layer(embed_dim) 183 | 184 | def forward(self, x): 185 | for blk in self.blocks: 186 | x = blk(x) 187 | x = self.Temporal_norm(x) 188 | return x 189 | 190 | 191 | class Block_frame(nn.Module): 192 | def __init__(self, adj, dim, num_heads, mlp_hidden_dim, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 193 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, length=1): 194 | super().__init__() 195 | 196 | dim = int(dim/2) 197 | self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity() 198 | 199 | # Three sub-modules 200 | self.lgc = LJC(adj, dim, drop_path=drop_path, norm_layer=nn.LayerNorm) 201 | self.gbi = GBI(dim, num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, drop=0.1, attn_drop=attn_drop, 202 | drop_path=drop_path, norm_layer=nn.LayerNorm, length=length) 203 | 204 | self.norm_mlp = norm_layer(dim*2) 205 | self.mlp = Mlp(in_features=dim*2, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 206 | 207 | 208 | 209 | def forward(self, x): 210 | x_split = torch.chunk(x,2,-1) 211 | x_lgc, x_gbi = x_split 212 | # Local Joint-level Connection (LJC) 213 | x_lgc = self.lgc(x_lgc) 214 | # Global body-level Interaction (GBI) 215 | x_gbi = self.gbi(x_lgc, x_gbi) 216 | x_cat = torch.cat([x_lgc,x_gbi], -1) 217 | x = x_cat + self.drop_path(self.mlp(self.norm_mlp(x_cat))) 218 | return x 219 | 220 | 221 | 222 | class Block_ipc(nn.Module): 223 | def __init__(self, adj, dim, num_heads, mlp_hidden_dim, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 224 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, length=1): 225 | super().__init__() 226 | 227 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 228 | 229 | # Three sub-modules 230 | self.ipc = IPC(dim, mlp_hidden_dim, drop=drop, drop_path=drop_path, act_layer=nn.GELU, norm_layer=nn.LayerNorm) 231 | 232 | self.norm_mlp = norm_layer(dim) 233 | self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 234 | 235 | 236 | 237 | def forward(self, x): 238 | x = x + self.ipc(x) 239 | x = x + self.drop_path(self.mlp(self.norm_mlp(x))) 240 | return x -------------------------------------------------------------------------------- /model/GCN_conv.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import, division 2 | 3 | import math 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | import numpy as np 8 | 9 | import scipy.sparse as sp 10 | 11 | 12 | class _NonLocalBlockND(nn.Module): 13 | def __init__(self, in_channels, inter_channels=None, dimension=3, sub_sample=True, bn_layer=True): 14 | super(_NonLocalBlockND, self).__init__() 15 | 16 | assert dimension in [1, 2, 3] 17 | 18 | self.dimension = dimension 19 | self.sub_sample = sub_sample 20 | 21 | self.in_channels = in_channels 22 | self.inter_channels = inter_channels 23 | 24 | if self.inter_channels is None: 25 | self.inter_channels = in_channels // 2 26 | if self.inter_channels == 0: 27 | self.inter_channels = 1 28 | 29 | if dimension == 3: 30 | conv_nd = nn.Conv3d 31 | max_pool_layer = nn.MaxPool3d(kernel_size=(1, 2, 2)) 32 | bn = nn.BatchNorm3d 33 | elif dimension == 2: 34 | conv_nd = nn.Conv2d 35 | max_pool_layer = nn.MaxPool2d(kernel_size=(2, 2)) 36 | bn = nn.BatchNorm2d 37 | else: 38 | conv_nd = nn.Conv1d 39 | max_pool_layer = nn.MaxPool1d(kernel_size=(2)) 40 | bn = nn.BatchNorm1d 41 | 42 | self.g = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels, 43 | kernel_size=1, stride=1, padding=0) 44 | 45 | if bn_layer: 46 | self.W = nn.Sequential( 47 | conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels, 48 | kernel_size=1, stride=1, padding=0), 49 | bn(self.in_channels) 50 | ) 51 | nn.init.constant_(self.W[1].weight, 0) 52 | nn.init.constant_(self.W[1].bias, 0) 53 | else: 54 | self.W = conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels, 55 | kernel_size=1, stride=1, padding=0) 56 | nn.init.constant_(self.W.weight, 0) 57 | 
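# zero-initialising W makes the non-local block start as an identity mapping (z = W_y + x = x at step 0)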
nn.init.constant_(self.W.bias, 0) 58 | 59 | self.theta = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels, 60 | kernel_size=1, stride=1, padding=0) 61 | self.phi = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels, 62 | kernel_size=1, stride=1, padding=0) 63 | 64 | if sub_sample: 65 | self.g = nn.Sequential(self.g, max_pool_layer) # 384 -> 192 66 | self.phi = nn.Sequential(self.phi, max_pool_layer) 67 | 68 | def forward(self, x): 69 | ''' 70 | :param x: (b, c, t, h, w) 71 | :return: 72 | ''' 73 | 74 | batch_size = x.size(0)#torch.Size([256, 384, 1, 17]) 75 | 76 | g_x = self.g(x).view(batch_size, self.inter_channels, -1)#256,192,17 77 | g_x = g_x.permute(0, 2, 1)#torch.Size([256, 17, 192]) 78 | 79 | theta_x = self.theta(x).view(batch_size, self.inter_channels, -1) 80 | theta_x = theta_x.permute(0, 2, 1)#torch.Size([256, 17, 192]) 81 | phi_x = self.phi(x).view(batch_size, self.inter_channels, -1)#torch.Size([256, 192, 17]) 82 | f = torch.matmul(theta_x, phi_x)#torch.Size([256, 17, 17]) 83 | f_div_C = F.softmax(f, dim=-1)#torch.Size([256, 17, 17]), softmax over the last dimension 84 | 85 | y = torch.matmul(f_div_C, g_x) # this step is essentially the AMIX aggregation 86 | y = y.permute(0, 2, 1).contiguous()#torch.Size([256, 17, 192]) 87 | y = y.view(batch_size, self.inter_channels, *x.size()[2:])#torch.Size([256, 192, 1, 17]) 88 | W_y = self.W(y) 89 | z = W_y + x #torch.Size([256, 384, 1, 17]) Amix 90 | 91 | return z 92 | 93 | 94 | 95 | 96 | class NONLocalBlock2D(_NonLocalBlockND): 97 | def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True): 98 | super(NONLocalBlock2D, self).__init__(in_channels, 99 | inter_channels=inter_channels, 100 | dimension=2, sub_sample=sub_sample, 101 | bn_layer=bn_layer) 102 | 103 | 104 | class ModulatedGraphConv(nn.Module): 105 | """ 106 | Semantic graph convolution layer 107 | """ 108 | 109 | def __init__(self, in_features, out_features, adj, bias=True): 110 | super(ModulatedGraphConv, self).__init__() 111 | self.in_features = in_features 112 | self.out_features = out_features 113 | 114 | self.W = nn.Parameter(torch.zeros(size=(2, in_features, out_features), dtype=torch.float)) #torch.Size([2,2, 384]) 115 | nn.init.xavier_uniform_(self.W.data, gain=1.414) 116 | 117 | self.M = nn.Parameter(torch.zeros(size=(adj.size(0), out_features), dtype=torch.float))#17,384 118 | nn.init.xavier_uniform_(self.M.data, gain=1.414) 119 | 120 | self.adj = adj 121 | 122 | self.adj2 = nn.Parameter(torch.ones_like(adj)) 123 | nn.init.constant_(self.adj2, 1e-6) 124 | 125 | if bias: 126 | self.bias = nn.Parameter(torch.zeros(out_features, dtype=torch.float)) 127 | stdv = 1. 
/ math.sqrt(self.W.size(2)) 128 | self.bias.data.uniform_(-stdv, stdv) 129 | else: 130 | self.register_parameter('bias', None) 131 | 132 | def forward(self, input): 133 | h0 = torch.matmul(input, self.W[0]) #input 256,17,2 -> 256,17,384 134 | h1 = torch.matmul(input, self.W[1]) 135 | 136 | adj = self.adj.to(input.device) + self.adj2.to(input.device) 137 | adj = (adj.T + adj)/2 138 | E = torch.eye(adj.size(0), dtype=torch.float).to(input.device) #17x17 identity matrix I 139 | 140 | output = torch.matmul(adj * E, self.M*h0) + torch.matmul(adj * (1 - E), self.M*h1) # first term covers the self-connections (I), second term the connections to other joints 141 | if self.bias is not None: 142 | return output + self.bias.view(1, 1, -1) #torch.Size([256, 17, 384]), bias added to all joints 143 | else: 144 | return output 145 | 146 | def __repr__(self): 147 | return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')' 148 | 149 | 150 | 151 | def normalize(mx): 152 | """Row-normalize sparse matrix""" 153 | rowsum = np.array(mx.sum(1)) 154 | r_inv = np.power(rowsum, -1).flatten() 155 | r_inv[np.isinf(r_inv)] = 0. 156 | r_mat_inv = sp.diags(r_inv) 157 | mx = r_mat_inv.dot(mx) 158 | return mx 159 | 160 | def sparse_mx_to_torch_sparse_tensor(sparse_mx): 161 | """Convert a scipy sparse matrix to a torch sparse tensor.""" 162 | sparse_mx = sparse_mx.tocoo().astype(np.float32) 163 | indices = torch.from_numpy(np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64)) 164 | values = torch.from_numpy(sparse_mx.data) 165 | shape = torch.Size(sparse_mx.shape) 166 | return torch.sparse.FloatTensor(indices, values, shape) 167 | 168 | 169 | def adj_mx_from_edges(num_pts, edges, sparse=True): 170 | edges = np.array(edges, dtype=np.int32) 171 | data, i, j = np.ones(edges.shape[0]), edges[:, 0], edges[:, 1] 172 | adj_mx = sp.coo_matrix((data, (i, j)), shape=(num_pts, num_pts), dtype=np.float32) 173 | 174 | # build symmetric adjacency matrix 175 | adj_mx = adj_mx + adj_mx.T.multiply(adj_mx.T > adj_mx) - adj_mx.multiply(adj_mx.T > adj_mx) 176 | adj_mx = normalize(adj_mx) #+ sp.eye(adj_mx.shape[0])) 177 | if sparse: 178 | adj_mx = sparse_mx_to_torch_sparse_tensor(adj_mx) 179 | else: 180 | adj_mx = torch.tensor(adj_mx.todense(), dtype=torch.float) 181 | 182 | adj_mx = adj_mx * (1-torch.eye(adj_mx.shape[0])) + torch.eye(adj_mx.shape[0]) 183 | return adj_mx 184 | 185 | 186 | def adj_mx_from_skeleton(skeleton): 187 | num_joints = skeleton.num_joints() 188 | edges = list(filter(lambda x: x[1] >= 0, zip(list(range(0, num_joints)), skeleton.parents()))) 189 | return adj_mx_from_edges(num_joints, edges, sparse=False) 190 | -------------------------------------------------------------------------------- /model/Transformer.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD 3 | import torch 4 | 5 | 6 | class Mlp(nn.Module): 7 | def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.): 8 | super().__init__() 9 | out_features = out_features or in_features 10 | hidden_features = hidden_features or in_features 11 | self.fc1 = nn.Linear(in_features, hidden_features) 12 | self.act = act_layer() 13 | self.fc2 = nn.Linear(hidden_features, out_features) 14 | self.drop = nn.Dropout(drop) 15 | 16 | def forward(self, x): 17 | x = self.fc1(x) 18 | x = self.act(x) 19 | x = self.drop(x) 20 | x = self.fc2(x) 21 | x = self.drop(x) 22 | return x 23 | 24 | class Attention(nn.Module): 25 | def __init__(self, dim, num_heads=8, 
qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., length=27): 26 | super().__init__() 27 | 28 | self.num_heads = num_heads 29 | head_dim = torch.div(dim, num_heads) 30 | self.scale = qk_scale or head_dim ** -0.5 31 | self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias) 32 | self.attn_drop = nn.Dropout(attn_drop) 33 | self.proj = nn.Linear(dim, dim) 34 | self.proj_drop = nn.Dropout(proj_drop) 35 | 36 | def forward(self, x): 37 | B, N, C = x.shape 38 | qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, torch.div(C, self.num_heads, rounding_mode='floor')).permute(2, 0, 3, 1, 4) 39 | q, k, v = qkv[0], qkv[1], qkv[2] 40 | 41 | attn = (q @ k.transpose(-2, -1)) * self.scale 42 | attn = attn.softmax(dim=-1) 43 | 44 | attn = self.attn_drop(attn) 45 | 46 | x = (attn @ v).transpose(1, 2).reshape(B, N, C) 47 | x = self.proj(x) 48 | x = self.proj_drop(x) 49 | return x -------------------------------------------------------------------------------- /model/__pycache__/Block.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/Block.cpython-39.pyc -------------------------------------------------------------------------------- /model/__pycache__/GCN_conv.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/GCN_conv.cpython-39.pyc -------------------------------------------------------------------------------- /model/__pycache__/Transformer.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/Transformer.cpython-39.pyc -------------------------------------------------------------------------------- /model/__pycache__/trans.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/trans.cpython-39.pyc -------------------------------------------------------------------------------- /model/post_refine.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from torch.autograd import Variable 5 | 6 | 7 | 8 | inter_channels = [128, 256] 9 | fc_out = inter_channels[1] 10 | fc_unit = 1024 11 | class post_refine(nn.Module): 12 | 13 | 14 | def __init__(self, opt): 15 | super().__init__() 16 | 17 | out_seqlen = 1 18 | fc_in = opt.out_channels*2*out_seqlen*opt.n_joints 19 | 20 | fc_out = opt.in_channels * opt.n_joints 21 | self.post_refine = nn.Sequential( 22 | nn.Linear(fc_in, fc_unit), 23 | nn.ReLU(), 24 | nn.Dropout(0.5,inplace=True), 25 | nn.Linear(fc_unit, fc_out), 26 | nn.Sigmoid() 27 | 28 | ) 29 | 30 | 31 | def forward(self, x, x_1): 32 | """ 33 | 34 | :param x: N*T*V*3 35 | :param x_1: N*T*V*2 36 | :return: 37 | """ 38 | # data normalization 39 | N, T, V,_ = x.size() 40 | x_in = torch.cat((x, x_1), -1) #N*T*V*5 41 | x_in = x_in.view(N, -1) 42 | 43 | 44 | 45 | score = self.post_refine(x_in).view(N,T,V,2) 46 | score_cm = Variable(torch.ones(score.size()), requires_grad=False).cuda() - score 47 | x_out = x.clone() 48 | x_out[:, :, :, :2] = score * x[:, :, :, :2] + score_cm * x_1[:, :, :, :2] 49 | 50 | return x_out 
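# Minimal shape sketch (assumed values: opt.in_channels=2, opt.out_channels=3, opt.n_joints=17, T=1):
# the first Linear expects out_channels*2*n_joints = 102 features per sample, so x_1 must also carry
# 3 channels per joint even though only its first two (the 2D x/y) are blended into the output.
#   refine_net = post_refine(opt).cuda()
#   pred_3d  = torch.rand(8, 1, 17, 3).cuda()   # N*T*V*3 predicted 3D pose
#   input_2d = torch.rand(8, 1, 17, 3).cuda()   # N*T*V*3 input keypoints (x, y, extra channel)
#   refined  = refine_net(pred_3d, input_2d)    # x/y re-weighted between prediction and 2D input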
-------------------------------------------------------------------------------- /model/refine.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from torch.autograd import Variable 4 | 5 | fc_out = 256 6 | fc_unit = 1024 7 | 8 | class refine(nn.Module): 9 | def __init__(self, opt): 10 | super().__init__() 11 | 12 | out_seqlen = 1 13 | fc_in = opt.out_channels*2*out_seqlen*opt.n_joints 14 | fc_out = opt.in_channels * opt.n_joints 15 | 16 | self.post_refine = nn.Sequential( 17 | nn.Linear(fc_in, fc_unit), 18 | nn.ReLU(inplace =False), 19 | nn.Dropout(0.5,inplace =False), 20 | nn.Linear(fc_unit, fc_out), 21 | nn.Sigmoid() 22 | ) 23 | 24 | def forward(self, x, x_1): 25 | N, T, V,_ = x.size()#256,1,17,3 26 | x_in = torch.cat((x, x_1), -1) #torch.Size([256, 1, 17, 6]) 27 | x_in = x_in.view(N, -1) #torch.Size([256, 102]) 28 | 29 | score = self.post_refine(x_in).view(N,T,V,2) #torch.Size([256, 1, 17, 2]) 30 | score_cm = Variable(torch.ones(score.size()), requires_grad=False).cuda() - score 31 | x_out = x.clone() 32 | x_out[:, :, :, :2] = score * x[:, :, :, :2] + score_cm * x_1[:, :, :, :2]#torch.Size([256, 1, 17, 3]) 33 | 34 | return x_out 35 | 36 | 37 | -------------------------------------------------------------------------------- /model/trans.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from einops.einops import rearrange 3 | sys.path.append("..") 4 | import torch 5 | import torch.nn as nn 6 | from model.Block import Hiremixer 7 | from common.opt import opts 8 | opt = opts().parse() 9 | 10 | 11 | 12 | class HTNet(nn.Module): 13 | def __init__(self, args, adj): 14 | super().__init__() 15 | 16 | if args == -1: 17 | layers, channel, d_hid, length = 3, 512, 1024, 27 18 | self.num_joints_in, self.num_joints_out = 17, 17 19 | else: 20 | layers, channel, d_hid, length = args.layers, args.channel, args.d_hid, args.frames 21 | self.num_joints_in, self.num_joints_out = args.n_joints, args.out_joints 22 | 23 | self.patch_embed = nn.Linear(2, channel) 24 | self.pos_embed = nn.Parameter(torch.zeros(1, self.num_joints_in, channel)) 25 | self.Hiremixer = Hiremixer(adj, layers, channel, d_hid, length=length) 26 | self.fcn = nn.Linear(args.channel, 3) 27 | 28 | def forward(self, x): 29 | x = rearrange(x, 'b f j c -> (b f) j c').contiguous() 30 | x = self.patch_embed(x) 31 | x = x + self.pos_embed 32 | x = self.Hiremixer(x) 33 | x = self.fcn(x) 34 | x = x.view(x.shape[0], -1, self.num_joints_out, x.shape[2]) 35 | return x 36 | 37 | 38 | -------------------------------------------------------------------------------- /requirement.txt: -------------------------------------------------------------------------------- 1 | einops 2 | timm 3 | tensorboardX 4 | scipy 5 | filterpy 6 | tqdm -------------------------------------------------------------------------------- /runs/README.md: -------------------------------------------------------------------------------- 1 | 2 | --------------------------------------------------------------------------------