├── LICENSE ├── README.md ├── ckpt └── README.md ├── common ├── camera.py ├── generator.py ├── h36m_dataset.py ├── load_data_hm36.py ├── mocap_dataset.py ├── opt.py ├── skeleton.py └── utils.py ├── demo ├── figure │ ├── lindan.jpg │ └── messi.jpg ├── lib │ ├── hrnet │ │ ├── experiments │ │ │ └── w48_384x288_adam_lr1e-3.yaml │ │ ├── gen_kpts.py │ │ └── lib │ │ │ ├── config │ │ │ ├── __init__.py │ │ │ ├── __pycache__ │ │ │ │ ├── __init__.cpython-38.pyc │ │ │ │ ├── __init__.cpython-39.pyc │ │ │ │ ├── default.cpython-38.pyc │ │ │ │ ├── default.cpython-39.pyc │ │ │ │ ├── models.cpython-38.pyc │ │ │ │ └── models.cpython-39.pyc │ │ │ ├── default.py │ │ │ └── models.py │ │ │ ├── models │ │ │ ├── __pycache__ │ │ │ │ ├── pose_hrnet.cpython-38.pyc │ │ │ │ └── pose_hrnet.cpython-39.pyc │ │ │ └── pose_hrnet.py │ │ │ └── utils │ │ │ ├── __pycache__ │ │ │ ├── coco_h36m.cpython-39.pyc │ │ │ ├── inference.cpython-39.pyc │ │ │ ├── transforms.cpython-39.pyc │ │ │ └── utilitys.cpython-39.pyc │ │ │ ├── coco_h36m.py │ │ │ ├── inference.py │ │ │ ├── transforms.py │ │ │ └── utilitys.py │ ├── preprocess.py │ ├── sort │ │ └── sort.py │ └── yolov3 │ │ ├── bbox.py │ │ ├── cfg │ │ ├── tiny-yolo-voc.cfg │ │ ├── yolo-voc.cfg │ │ ├── yolo.cfg │ │ └── yolov3.cfg │ │ ├── darknet.py │ │ ├── data │ │ ├── coco.names │ │ ├── pallete │ │ └── voc.names │ │ ├── human_detector.py │ │ ├── preprocess.py │ │ └── util.py └── vis.py ├── figure ├── README.md ├── messi_pose.png ├── structure.png └── wild.png ├── main.py ├── model ├── Block.py ├── GCN_conv.py ├── Transformer.py ├── __pycache__ │ ├── Block.cpython-39.pyc │ ├── GCN_conv.cpython-39.pyc │ ├── Transformer.cpython-39.pyc │ └── trans.cpython-39.pyc ├── post_refine.py ├── refine.py └── trans.py ├── requirement.txt └── runs └── README.md /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 vefalun 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # HTNet: Human Topology Aware Network for 3D Human Pose Estimation 2 | 3 |

4 | 
5 | > [**HTNet: Human Topology Aware Network for 3D Human Pose Estimation**](https://arxiv.org/pdf/2302.09790), 
6 | > Jialun Cai, Hong Liu, Runwei Ding, Wenhao Li, Jianbing Wu, Miaoju Ban 
7 | > *In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023* 
8 | 
9 | 
10 | ## Results on Human3.6M 
11 | 
12 | Protocol #1 (mean per-joint position error), using 2D keypoints detected by CPN and ground-truth 2D poses as input. 
13 | 
14 | | Method | Train Epochs | MPJPE (CPN) | MPJPE (GT) | 
15 | |:-------|:-------:|:-------:|:-------:| 
16 | | GraFormer | 50 | 51.8 mm | 35.2 mm | 
17 | | MGCN (w/refine) | 50 | 49.4 mm | 37.4 mm | 
18 | | HTNet | 15 | 48.9 mm | 34.0 mm | 
19 | | HTNet (w/refine) | **15** | **47.6 mm** | **31.9 mm** | 
20 | 
21 | 
22 | ## Get started directly 
23 | Special thanks to [MHFormer](https://github.com/Vegetebird/MHFormer); building on it, we provide a **beginner's guide** for image-based pose estimation. 
24 | Only three steps are needed to generate poses for your own images: (1) download the pretrained 2D models (YOLOv3 and HRNet) [here](https://drive.google.com/drive/folders/1LX5zhZGlZjckgfpNroWsuu84xyyFYE5X) and put them in './demo/lib/checkpoint'; (2) download the [pretrained HTNet model](https://drive.google.com/drive/folders/134lqqu-0I6aOYr7lRufa6fMTdqm7K9Qk) and put it in the './ckpt' directory; (3) 
25 | put your own images in './demo/figure' and run: 
26 | ``` 
27 | python demo/vis.py 
28 | ``` 
29 | Then you can obtain the visualized poses in './demo/output', like: 
30 |

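For reference, after steps (1) and (2) the checkpoint folders should look roughly like the sketch below. The HRNet filename matches the default in `demo/lib/hrnet/gen_kpts.py` and the `ckpt/cpn` folder matches the evaluation command further down; the YOLOv3 weight name and the exact HTNet checkpoint name are assumptions (HTNet checkpoints are saved as `model_<epoch>_<mpjpe>.pth`), so keep whatever names the downloaded files have.

```
${POSE_ROOT}/
|-- ckpt/
|   |-- cpn/
|   |   |-- model_xx_xxxx.pth          # pretrained HTNet weights (actual filename may differ)
|-- demo/
|   |-- lib/checkpoint/
|   |   |-- yolov3.weights             # assumed YOLOv3 weight name; keep the downloaded name
|   |   |-- pose_hrnet_w48_384x288.pth # HRNet 2D detector weights
|   |-- figure/
|   |   |-- your_image.jpg             # images you want to estimate poses for
```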
31 | 
32 | 
33 | ## Quick start 
34 | To get started as quickly as possible, follow the instructions in this section. This should allow you to train a model from scratch and test our pretrained models. 
35 | 
36 | 
37 | ### Dependencies 
38 | Make sure you have the following dependencies installed before proceeding: 
39 | - Python 3.7+ 
40 | - PyTorch >= 1.10.0 
41 | To set up the environment: 
42 | ```sh 
43 | pip install -r requirement.txt 
44 | ``` 
45 | 
46 | 
47 | ### Dataset setup 
48 | Please download the dataset [here](https://drive.google.com/drive/folders/1gNs5PrcaZ6gar7IiNZPNh39T7y6aPY3g) and refer to [VideoPose3D](https://github.com/facebookresearch/VideoPose3D) to set up the Human3.6M dataset (in the './dataset' directory). 
49 | 
50 | ```bash 
51 | ${POSE_ROOT}/ 
52 | |-- dataset 
53 | | |-- data_3d_h36m.npz 
54 | | |-- data_2d_h36m_gt.npz 
55 | | |-- data_2d_h36m_cpn_ft_h36m_dbb.npz 
56 | ``` 
57 | 
58 | ### Evaluating our pre-trained models 
59 | The pretrained model is available [here](https://drive.google.com/drive/folders/134lqqu-0I6aOYr7lRufa6fMTdqm7K9Qk); please download it and put it in the './ckpt' directory. To reproduce the performance reported in the paper, run: 
60 | ``` 
61 | python main.py --reload --previous_dir "ckpt/cpn" 
62 | ``` 
63 | 
64 | ### Training your models 
65 | If you want to train your own model, run: 
66 | ``` 
67 | python main.py --train -n "your_model_name" 
68 | ``` 
69 | 
70 | 
71 | ## Acknowledgement 
72 | 
73 | Our code is extended from the following repositories. We thank the authors for releasing their code. 
74 | - [MHFormer](https://github.com/Vegetebird/MHFormer) 
75 | - [MGCN](https://github.com/ZhimingZo/Modulated-GCN) 
76 | - [VideoPose3D](https://github.com/facebookresearch/VideoPose3D) 
77 | - [3d-pose-baseline](https://github.com/una-dinosauria/3d-pose-baseline) 
78 | - [3d_pose_baseline_pytorch](https://github.com/weigq/3d_pose_baseline_pytorch) 
79 | - [StridedTransformer-Pose3D](https://github.com/Vegetebird/StridedTransformer-Pose3D) 
80 | ## License 
81 | 
82 | This project is licensed under the terms of the MIT license. 
83 | -------------------------------------------------------------------------------- /ckpt/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /common/camera.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import numpy as np 3 | import torch 4 | def normalize_screen_coordinates(X, w, h): 5 | assert X.shape[-1] == 2 6 | return X / w * 2 - [1, h / w] 7 | 8 | 9 | def world_to_camera(X, R, t): 10 | Rt = wrap(qinverse, R) 11 | return wrap(qrot, np.tile(Rt, (*X.shape[:-1], 1)), X - t) 12 | 13 | def camera_to_world(X, R, t): 14 | return wrap(qrot, np.tile(R, (*X.shape[:-1], 1)), X) + t 15 | 16 | 17 | def wrap(func, *args, unsqueeze=False): 18 | args = list(args) 19 | for i, arg in enumerate(args): 20 | if type(arg) == np.ndarray: 21 | args[i] = torch.from_numpy(arg) 22 | if unsqueeze: 23 | args[i] = args[i].unsqueeze(0) 24 | 25 | result = func(*args) 26 | 27 | if isinstance(result, tuple): 28 | result = list(result) 29 | for i, res in enumerate(result): 30 | if type(res) == torch.Tensor: 31 | if unsqueeze: 32 | res = res.squeeze(0) 33 | result[i] = res.numpy() 34 | return tuple(result) 35 | elif type(result) == torch.Tensor: 36 | if unsqueeze: 37 | result = result.squeeze(0) 38 | return result.numpy() 39 | else: 40 | return result 41 | 42 | def qrot(q, v): 43 | assert q.shape[-1] == 4 44 | assert v.shape[-1] == 3 45 | assert q.shape[:-1] == v.shape[:-1] 46 | 47 | qvec = q[..., 1:] 48 | uv = torch.cross(qvec, v, dim=len(q.shape) - 1) 49 | uuv = torch.cross(qvec, uv, dim=len(q.shape) - 1) 50 | return (v + 2 * (q[..., :1] * uv + uuv)) 51 | 52 | 53 | 54 | 55 | def qinverse(q, inplace=False): 56 | if inplace: 57 | q[..., 1:] *= -1 58 | return q 59 | else: 60 | w = q[..., :1] 61 | xyz = q[..., 1:] 62 | return torch.cat((w, -xyz), dim=len(q.shape) - 1) 63 | 64 | 65 | 66 | def get_uvd2xyz(uvd, gt_3D, cam): 67 | N, T, V,_ = uvd.size() 68 | 69 | dec_out_all = uvd.view(-1, T, V, 3).clone() 70 | root = gt_3D[:, :, 0, :].unsqueeze(-2).repeat(1, 1, V, 1).clone() 71 | enc_in_all = uvd[:, :, :, :2].view(-1, T, V, 2).clone() 72 | 73 | cam_f_all = cam[..., :2].view(-1,1,1,2).repeat(1,T,V,1) 74 | cam_c_all = cam[..., 2:4].view(-1,1,1,2).repeat(1,T,V,1) 75 | 76 | z_global = dec_out_all[:, :, :, 2] 77 | z_global[:, :, 0] = root[:, :, 0, 2] 78 | z_global[:, :, 1:] = dec_out_all[:, :, 1:, 2] + root[:, :, 1:, 2] 79 | z_global = z_global.unsqueeze(-1) 80 | 81 | uv = enc_in_all - cam_c_all 82 | xy = uv * z_global.repeat(1, 1, 1, 2) / cam_f_all 83 | xyz_global = torch.cat((xy, z_global), -1) 84 | xyz_offset = (xyz_global - xyz_global[:, :, 0, :].unsqueeze(-2).repeat(1, 1, V, 1)) 85 | 86 | return xyz_offset -------------------------------------------------------------------------------- /common/h36m_dataset.py: -------------------------------------------------------------------------------- 1 | 2 | import numpy as np 3 | import copy 4 | from common.skeleton import Skeleton 5 | from common.mocap_dataset import MocapDataset 6 | from common.camera import normalize_screen_coordinates 7 | 8 | h36m_skeleton = Skeleton(parents=[-1, 0, 1, 2, 3, 4, 0, 6, 7, 8, 9, 0, 11, 12, 13, 14, 12, 9 | 16, 17, 18, 19, 20, 19, 22, 12, 24, 25, 26, 27, 28, 27, 30], 10 | joints_left=[6, 7, 8, 9, 10, 16, 17, 18, 19, 20, 21, 22, 23], 11 | joints_right=[1, 2, 3, 4, 5, 24, 25, 26, 27, 28, 29, 30, 31]) 12 | 13 | h36m_cameras_intrinsic_params = [ 14 | { 15 | 'id': 
'54138969', 16 | 'center': [512.54150390625, 515.4514770507812], 17 | 'focal_length': [1145.0494384765625, 1143.7811279296875], 18 | 'radial_distortion': [-0.20709891617298126, 0.24777518212795258, -0.0030751503072679043], 19 | 'tangential_distortion': [-0.0009756988729350269, -0.00142447161488235], 20 | 'res_w': 1000, 21 | 'res_h': 1002, 22 | 'azimuth': 70, 23 | }, 24 | { 25 | 'id': '55011271', 26 | 'center': [508.8486328125, 508.0649108886719], 27 | 'focal_length': [1149.6756591796875, 1147.5916748046875], 28 | 'radial_distortion': [-0.1942136287689209, 0.2404085397720337, 0.006819975562393665], 29 | 'tangential_distortion': [-0.0016190266469493508, -0.0027408944442868233], 30 | 'res_w': 1000, 31 | 'res_h': 1000, 32 | 'azimuth': -70, 33 | }, 34 | { 35 | 'id': '58860488', 36 | 'center': [519.8158569335938, 501.40264892578125], 37 | 'focal_length': [1149.1407470703125, 1148.7989501953125], 38 | 'radial_distortion': [-0.2083381861448288, 0.25548800826072693, -0.0024604974314570427], 39 | 'tangential_distortion': [0.0014843869721516967, -0.0007599993259645998], 40 | 'res_w': 1000, 41 | 'res_h': 1000, 42 | 'azimuth': 110, 43 | }, 44 | { 45 | 'id': '60457274', 46 | 'center': [514.9682006835938, 501.88201904296875], 47 | 'focal_length': [1145.5113525390625, 1144.77392578125], 48 | 'radial_distortion': [-0.198384091258049, 0.21832367777824402, -0.008947807364165783], 49 | 'tangential_distortion': [-0.0005872055771760643, -0.0018133620033040643], 50 | 'res_w': 1000, 51 | 'res_h': 1002, 52 | 'azimuth': -110, 53 | }, 54 | ] 55 | 56 | h36m_cameras_extrinsic_params = { 57 | 'S1': [ 58 | { 59 | 'orientation': [0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088], 60 | 'translation': [1841.1070556640625, 4955.28466796875, 1563.4454345703125], 61 | }, 62 | { 63 | 'orientation': [0.6157187819480896, -0.764836311340332, -0.14833825826644897, 0.11794740706682205], 64 | 'translation': [1761.278564453125, -5078.0068359375, 1606.2650146484375], 65 | }, 66 | { 67 | 'orientation': [0.14651472866535187, -0.14647851884365082, 0.7653023600578308, -0.6094175577163696], 68 | 'translation': [-1846.7777099609375, 5215.04638671875, 1491.972412109375], 69 | }, 70 | { 71 | 'orientation': [0.5834008455276489, -0.7853162288665771, 0.14548823237419128, -0.14749594032764435], 72 | 'translation': [-1794.7896728515625, -3722.698974609375, 1574.8927001953125], 73 | }, 74 | ], 75 | 'S2': [ 76 | {}, 77 | {}, 78 | {}, 79 | {}, 80 | ], 81 | 'S3': [ 82 | {}, 83 | {}, 84 | {}, 85 | {}, 86 | ], 87 | 'S4': [ 88 | {}, 89 | {}, 90 | {}, 91 | {}, 92 | ], 93 | 'S5': [ 94 | { 95 | 'orientation': [0.1467377245426178, -0.162370964884758, -0.7551892995834351, 0.6178938746452332], 96 | 'translation': [2097.3916015625, 4880.94482421875, 1605.732421875], 97 | }, 98 | { 99 | 'orientation': [0.6159758567810059, -0.7626792192459106, -0.15728192031383514, 0.1189815029501915], 100 | 'translation': [2031.7008056640625, -5167.93310546875, 1612.923095703125], 101 | }, 102 | { 103 | 'orientation': [0.14291371405124664, -0.12907841801643372, 0.7678384780883789, -0.6110143065452576], 104 | 'translation': [-1620.5948486328125, 5171.65869140625, 1496.43701171875], 105 | }, 106 | { 107 | 'orientation': [0.5920479893684387, -0.7814217805862427, 0.1274748593568802, -0.15036417543888092], 108 | 'translation': [-1637.1737060546875, -3867.3173828125, 1547.033203125], 109 | }, 110 | ], 111 | 'S6': [ 112 | { 113 | 'orientation': [0.1337897777557373, -0.15692396461963654, -0.7571090459823608, 0.6198879480361938], 114 | 'translation': 
[1935.4517822265625, 4950.24560546875, 1618.0838623046875], 115 | }, 116 | { 117 | 'orientation': [0.6147197484970093, -0.7628812789916992, -0.16174767911434174, 0.11819244921207428], 118 | 'translation': [1969.803955078125, -5128.73876953125, 1632.77880859375], 119 | }, 120 | { 121 | 'orientation': [0.1529948115348816, -0.13529130816459656, 0.7646096348762512, -0.6112781167030334], 122 | 'translation': [-1769.596435546875, 5185.361328125, 1476.993408203125], 123 | }, 124 | { 125 | 'orientation': [0.5916101336479187, -0.7804774045944214, 0.12832270562648773, -0.1561593860387802], 126 | 'translation': [-1721.668701171875, -3884.13134765625, 1540.4879150390625], 127 | }, 128 | ], 129 | 'S7': [ 130 | { 131 | 'orientation': [0.1435241848230362, -0.1631336808204651, -0.7548328638076782, 0.6188824772834778], 132 | 'translation': [1974.512939453125, 4926.3544921875, 1597.8326416015625], 133 | }, 134 | { 135 | 'orientation': [0.6141672730445862, -0.7638262510299683, -0.1596645563840866, 0.1177929937839508], 136 | 'translation': [1937.0584716796875, -5119.7900390625, 1631.5665283203125], 137 | }, 138 | { 139 | 'orientation': [0.14550060033798218, -0.12874816358089447, 0.7660516500473022, -0.6127139329910278], 140 | 'translation': [-1741.8111572265625, 5208.24951171875, 1464.8245849609375], 141 | }, 142 | { 143 | 'orientation': [0.5912848114967346, -0.7821764349937439, 0.12445473670959473, -0.15196487307548523], 144 | 'translation': [-1734.7105712890625, -3832.42138671875, 1548.5830078125], 145 | }, 146 | ], 147 | 'S8': [ 148 | { 149 | 'orientation': [0.14110587537288666, -0.15589867532253265, -0.7561917304992676, 0.619644045829773], 150 | 'translation': [2150.65185546875, 4896.1611328125, 1611.9046630859375], 151 | }, 152 | { 153 | 'orientation': [0.6169601678848267, -0.7647668123245239, -0.14846350252628326, 0.11158157885074615], 154 | 'translation': [2219.965576171875, -5148.453125, 1613.0440673828125], 155 | }, 156 | { 157 | 'orientation': [0.1471444070339203, -0.13377119600772858, 0.7670128345489502, -0.6100369691848755], 158 | 'translation': [-1571.2215576171875, 5137.0185546875, 1498.1761474609375], 159 | }, 160 | { 161 | 'orientation': [0.5927824378013611, -0.7825870513916016, 0.12147816270589828, -0.14631995558738708], 162 | 'translation': [-1476.913330078125, -3896.7412109375, 1547.97216796875], 163 | }, 164 | ], 165 | 'S9': [ 166 | { 167 | 'orientation': [0.15540587902069092, -0.15548215806484222, -0.7532095313072205, 0.6199594736099243], 168 | 'translation': [2044.45849609375, 4935.1171875, 1481.2275390625], 169 | }, 170 | { 171 | 'orientation': [0.618784487247467, -0.7634735107421875, -0.14132238924503326, 0.11933968216180801], 172 | 'translation': [1990.959716796875, -5123.810546875, 1568.8048095703125], 173 | }, 174 | { 175 | 'orientation': [0.13357827067375183, -0.1367100477218628, 0.7689454555511475, -0.6100738644599915], 176 | 'translation': [-1670.9921875, 5211.98583984375, 1528.387939453125], 177 | }, 178 | { 179 | 'orientation': [0.5879399180412292, -0.7823407053947449, 0.1427614390850067, -0.14794869720935822], 180 | 'translation': [-1696.04345703125, -3827.099853515625, 1591.4127197265625], 181 | }, 182 | ], 183 | 'S11': [ 184 | { 185 | 'orientation': [0.15232472121715546, -0.15442320704460144, -0.7547563314437866, 0.6191070079803467], 186 | 'translation': [2098.440185546875, 4926.5546875, 1500.278564453125], 187 | }, 188 | { 189 | 'orientation': [0.6189449429512024, -0.7600917220115662, -0.15300633013248444, 0.1255258321762085], 190 | 'translation': [2083.182373046875, 
-4912.1728515625, 1561.07861328125], 191 | }, 192 | { 193 | 'orientation': [0.14943228662014008, -0.15650227665901184, 0.7681233882904053, -0.6026304364204407], 194 | 'translation': [-1609.8153076171875, 5177.3359375, 1537.896728515625], 195 | }, 196 | { 197 | 'orientation': [0.5894251465797424, -0.7818877100944519, 0.13991211354732513, -0.14715361595153809], 198 | 'translation': [-1590.738037109375, -3854.1689453125, 1578.017578125], 199 | }, 200 | ], 201 | } 202 | 203 | 204 | class Human36mDataset(MocapDataset): 205 | def __init__(self, path, opt, remove_static_joints=True): 206 | super().__init__(fps=50, skeleton=h36m_skeleton) 207 | self.train_list = ['S1', 'S5', 'S6', 'S7', 'S8'] 208 | self.test_list = ['S9', 'S11'] 209 | 210 | self._cameras = copy.deepcopy(h36m_cameras_extrinsic_params) 211 | for cameras in self._cameras.values(): 212 | for i, cam in enumerate(cameras): 213 | cam.update(h36m_cameras_intrinsic_params[i]) 214 | for k, v in cam.items(): 215 | if k not in ['id', 'res_w', 'res_h']: 216 | cam[k] = np.array(v, dtype='float32') 217 | 218 | if opt.crop_uv == 0: 219 | cam['center'] = normalize_screen_coordinates(cam['center'], w=cam['res_w'], h=cam['res_h']).astype( 220 | 'float32') 221 | cam['focal_length'] = cam['focal_length'] / cam['res_w'] * 2 222 | 223 | if 'translation' in cam: 224 | cam['translation'] = cam['translation'] / 1000 225 | 226 | cam['intrinsic'] = np.concatenate((cam['focal_length'], 227 | cam['center'], 228 | cam['radial_distortion'], 229 | cam['tangential_distortion'])) 230 | 231 | data = np.load(path,allow_pickle=True)['positions_3d'].item() 232 | 233 | self._data = {} 234 | for subject, actions in data.items(): 235 | self._data[subject] = {} 236 | for action_name, positions in actions.items(): 237 | self._data[subject][action_name] = { 238 | 'positions': positions, 239 | 'cameras': self._cameras[subject], 240 | } 241 | 242 | if remove_static_joints: 243 | self.remove_joints([4, 5, 9, 10, 11, 16, 20, 21, 22, 23, 24, 28, 29, 30, 31]) 244 | 245 | self._skeleton._parents[11] = 8 246 | self._skeleton._parents[14] = 8 247 | 248 | def supports_semi_supervised(self): 249 | return True 250 | 251 | 252 | 253 | -------------------------------------------------------------------------------- /common/load_data_hm36.py: -------------------------------------------------------------------------------- 1 | 2 | import torch.utils.data as data 3 | import numpy as np 4 | 5 | from common.utils import deterministic_random 6 | from common.camera import world_to_camera, normalize_screen_coordinates 7 | from common.generator import ChunkedGenerator, ChunkedGenerator_Seq 8 | 9 | class Fusion(data.Dataset): #crop:0, downsample:0 pad:0 stride:1 10 | def __init__(self, opt, dataset, root_path, train=True): 11 | self.data_type = opt.dataset 12 | self.train = train 13 | self.keypoints_name = opt.keypoints 14 | self.root_path = root_path 15 | 16 | self.train_list = opt.subjects_train.split(',') 17 | self.test_list = opt.subjects_test.split(',') 18 | self.action_filter = None if opt.actions == '*' else opt.actions.split(',') 19 | self.downsample = opt.downsample 20 | self.subset = opt.subset 21 | self.stride = opt.stride 22 | self.crop_uv = opt.crop_uv 23 | self.test_aug = opt.test_augmentation 24 | self.pad = opt.pad 25 | causal_shift = 0 26 | if self.train: 27 | self.keypoints = self.prepare_data(dataset, self.train_list) 28 | self.cameras_train, self.poses_train, self.poses_train_2d = self.fetch(dataset, self.train_list, 29 | subset=self.subset) 30 | self.generator = 
ChunkedGenerator(opt.batch_size, self.cameras_train, self.poses_train, 31 | self.poses_train_2d, self.stride, pad=self.pad, 32 | augment=opt.data_augmentation, reverse_aug=opt.reverse_augmentation, 33 | kps_left=self.kps_left, kps_right=self.kps_right, 34 | joints_left=self.joints_left, 35 | joints_right=self.joints_right, out_all=opt.out_all) 36 | 37 | print('INFO: Training on {} frames'.format(self.generator.num_frames())) 38 | else: 39 | self.keypoints = self.prepare_data(dataset, self.test_list) 40 | self.cameras_test, self.poses_test, self.poses_test_2d = self.fetch(dataset, self.test_list, 41 | subset=self.subset) 42 | self.generator = ChunkedGenerator(opt.batch_size, self.cameras_test, self.poses_test, 43 | self.poses_test_2d, 44 | pad=self.pad, augment=False, kps_left=self.kps_left, 45 | kps_right=self.kps_right, joints_left=self.joints_left, 46 | joints_right=self.joints_right) 47 | self.key_index = self.generator.saved_index 48 | print('INFO: Testing on {} frames'.format(self.generator.num_frames())) 49 | 50 | def prepare_data(self, dataset, folder_list): 51 | for subject in folder_list: 52 | for action in dataset[subject].keys(): 53 | anim = dataset[subject][action] 54 | 55 | positions_3d = [] 56 | for cam in anim['cameras']: 57 | pos_3d = world_to_camera(anim['positions'], R=cam['orientation'], t=cam['translation']) 58 | pos_3d[:, 1:] -= pos_3d[:, :1]#(1265, 17, 3) 59 | positions_3d.append(pos_3d) 60 | anim['positions_3d'] = positions_3d 61 | 62 | keypoints = np.load(self.root_path + 'data_2d_' + self.data_type + '_' + self.keypoints_name + '.npz',allow_pickle=True) 63 | keypoints_symmetry = keypoints['metadata'].item()['keypoints_symmetry'] 64 | 65 | self.kps_left, self.kps_right = list(keypoints_symmetry[0]), list(keypoints_symmetry[1]) 66 | self.joints_left, self.joints_right = list(dataset.skeleton().joints_left()), list(dataset.skeleton().joints_right()) 67 | keypoints = keypoints['positions_2d'].item() 68 | 69 | for subject in folder_list: 70 | assert subject in keypoints, 'Subject {} is missing from the 2D detections dataset'.format(subject) 71 | for action in dataset[subject].keys(): 72 | assert action in keypoints[ 73 | subject], 'Action {} of subject {} is missing from the 2D detections dataset'.format(action, 74 | subject) 75 | for cam_idx in range(len(keypoints[subject][action])): 76 | 77 | mocap_length = dataset[subject][action]['positions_3d'][cam_idx].shape[0] 78 | assert keypoints[subject][action][cam_idx].shape[0] >= mocap_length 79 | 80 | if keypoints[subject][action][cam_idx].shape[0] > mocap_length: 81 | keypoints[subject][action][cam_idx] = keypoints[subject][action][cam_idx][:mocap_length] 82 | 83 | for subject in keypoints.keys(): 84 | for action in keypoints[subject]: 85 | for cam_idx, kps in enumerate(keypoints[subject][action]): 86 | cam = dataset.cameras()[subject][cam_idx] 87 | if self.crop_uv == 0: 88 | kps[..., :2] = normalize_screen_coordinates(kps[..., :2], w=cam['res_w'], h=cam['res_h']) 89 | keypoints[subject][action][cam_idx] = kps 90 | 91 | return keypoints 92 | 93 | def fetch(self, dataset, subjects, subset=1, parse_3d_poses=True): #self.cameras_train, self.poses_train, self.poses_train_2d 94 | out_poses_3d = {} 95 | out_poses_2d = {} 96 | out_camera_params = {} 97 | 98 | for subject in subjects: 99 | for action in self.keypoints[subject].keys(): 100 | if self.action_filter is not None: 101 | found = False 102 | for a in self.action_filter: 103 | if action.startswith(a): 104 | found = True 105 | break 106 | if not found: 107 | continue 108 | 
109 | poses_2d = self.keypoints[subject][action] 110 | 111 | for i in range(len(poses_2d)): 112 | out_poses_2d[(subject, action, i)] = poses_2d[i] 113 | 114 | if subject in dataset.cameras(): 115 | cams = dataset.cameras()[subject] 116 | assert len(cams) == len(poses_2d), 'Camera count mismatch' 117 | for i, cam in enumerate(cams): 118 | if 'intrinsic' in cam: 119 | out_camera_params[(subject, action, i)] = cam['intrinsic'] 120 | 121 | if parse_3d_poses and 'positions_3d' in dataset[subject][action]: 122 | poses_3d = dataset[subject][action]['positions_3d'] 123 | assert len(poses_3d) == len(poses_2d), 'Camera count mismatch' 124 | for i in range(len(poses_3d)): 125 | out_poses_3d[(subject, action, i)] = poses_3d[i] 126 | 127 | if len(out_camera_params) == 0: 128 | out_camera_params = None 129 | if len(out_poses_3d) == 0: 130 | out_poses_3d = None 131 | 132 | stride = self.downsample 133 | if subset < 1: 134 | for key in out_poses_2d.keys(): 135 | n_frames = int(round(len(out_poses_2d[key]) // stride * subset) * stride) 136 | start = deterministic_random(0, len(out_poses_2d[key]) - n_frames + 1, str(len(out_poses_2d[key]))) 137 | out_poses_2d[key] = out_poses_2d[key][start:start + n_frames:stride] 138 | if out_poses_3d is not None: 139 | out_poses_3d[key] = out_poses_3d[key][start:start + n_frames:stride] 140 | elif stride > 1: #这一步 141 | for key in out_poses_2d.keys(): 142 | out_poses_2d[key] = out_poses_2d[key][::stride] 143 | if out_poses_3d is not None: 144 | out_poses_3d[key] = out_poses_3d[key][::stride] 145 | 146 | return out_camera_params, out_poses_3d, out_poses_2d 147 | 148 | def __len__(self): 149 | return len(self.generator.pairs) 150 | 151 | def __getitem__(self, index): 152 | seq_name, start_3d, end_3d, flip, reverse = self.generator.pairs[index] 153 | 154 | cam, gt_3D, input_2D, action, subject, cam_ind = self.generator.get_batch(seq_name, start_3d, end_3d, flip, reverse) 155 | 156 | if self.train == False and self.test_aug: 157 | _, _, input_2D_aug, _, _,_ = self.generator.get_batch(seq_name, start_3d, end_3d, flip=True, reverse=reverse) 158 | input_2D = np.concatenate((np.expand_dims(input_2D,axis=0),np.expand_dims(input_2D_aug,axis=0)),0) 159 | 160 | bb_box = np.array([0, 0, 1, 1]) 161 | input_2D_update = input_2D 162 | 163 | scale = np.float(1.0) 164 | 165 | return cam, gt_3D, input_2D_update, action, subject, scale, bb_box, cam_ind 166 | 167 | 168 | 169 | -------------------------------------------------------------------------------- /common/mocap_dataset.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | class MocapDataset: 4 | def __init__(self, fps, skeleton): 5 | self._skeleton = skeleton 6 | self._fps = fps 7 | self._data = None 8 | self._cameras = None 9 | 10 | def remove_joints(self, joints_to_remove): 11 | kept_joints = self._skeleton.remove_joints(joints_to_remove) 12 | for subject in self._data.keys(): 13 | for action in self._data[subject].keys(): 14 | s = self._data[subject][action] 15 | s['positions'] = s['positions'][:, kept_joints] 16 | 17 | def __getitem__(self, key): 18 | return self._data[key] 19 | 20 | def subjects(self): 21 | return self._data.keys() 22 | 23 | def fps(self): 24 | return self._fps 25 | 26 | def skeleton(self): 27 | return self._skeleton 28 | 29 | def cameras(self): 30 | return self._cameras 31 | 32 | def supports_semi_supervised(self): 33 | return False 34 | 35 | 36 | -------------------------------------------------------------------------------- /common/opt.py: 
-------------------------------------------------------------------------------- 1 | import argparse 2 | from email.policy import default 3 | import os 4 | import math 5 | import time 6 | import torch 7 | 8 | class opts(): 9 | def __init__(self): 10 | self.parser = argparse.ArgumentParser() 11 | 12 | def init(self): 13 | #model args 14 | self.parser.add_argument('--layers', default=3, type=int) 15 | self.parser.add_argument('--channel', default=240, type=int,help="Must be a multiple of 24") 16 | self.parser.add_argument('--frames', type=int, default=1) 17 | self.parser.add_argument('--pad', type=int, default=0) 18 | self.parser.add_argument('-n','--model_name', type=str, default='your_model', help='Name of your model') 19 | self.parser.add_argument('--d_hid', default=1024, type=int) 20 | self.parser.add_argument('--n_joints', type=int, default=17) 21 | self.parser.add_argument('--out_joints', type=int, default=17) 22 | self.parser.add_argument('--in_channels', type=int, default=2) 23 | self.parser.add_argument('--out_channels', type=int, default=3) 24 | 25 | 26 | 27 | #train args 28 | self.parser.add_argument('--gpu', default='0', type=str, help='') 29 | self.parser.add_argument('--train', action='store_true') 30 | self.parser.add_argument('--nepoch', type=int, default=300) 31 | self.parser.add_argument('--batch_size', type=int, default=512) 32 | self.parser.add_argument('--dataset', type=str, default='h36m') 33 | self.parser.add_argument('--lr', type=float, default=0.0005) 34 | self.parser.add_argument('--large_decay_epoch', type=int, default=5) 35 | self.parser.add_argument('-lrd', '--lr_decay', default=0.95, type=float) 36 | self.parser.add_argument('--lr_decay_large', type=float, default=0.5) 37 | self.parser.add_argument('--min_lr', type=float, default=1e-6, metavar='LR', 38 | help='lower lr bound for cyclic schedulers that hit 0') 39 | self.parser.add_argument('--workers', type=int, default=4) 40 | self.parser.add_argument('--out_all', type=int, default=1) 41 | self.parser.add_argument('--drop',default=0.2, type=float) 42 | self.parser.add_argument('--seed',default=1, type=int) 43 | self.parser.add_argument('-k', '--keypoints', default='cpn_ft_h36m_dbb', type=str) 44 | self.parser.add_argument('--data_augmentation', type=bool, default=True) 45 | self.parser.add_argument('--test_augmentation', type=bool, default=True) 46 | self.parser.add_argument('--reverse_augmentation', type=bool, default=False) 47 | self.parser.add_argument('--root_path', type=str, default='./dataset/',help='Put the dataset into this file') 48 | self.parser.add_argument('-a', '--actions', default='*', type=str) 49 | self.parser.add_argument('--downsample', default=1, type=int) 50 | self.parser.add_argument('--subset', default=1, type=float) 51 | self.parser.add_argument('--stride', default=1, type=float) 52 | self.parser.add_argument('--lr_min',type=float,default=0,help='Min learn rate') 53 | 54 | 55 | # test args 56 | self.parser.add_argument('--test', type=int, default=1) 57 | self.parser.add_argument('--reload', action='store_true') 58 | self.parser.add_argument('--previous_dir', type=str, default='./ckpt/your_model') 59 | self.parser.add_argument('--previous',type=str,default='ckpt') 60 | self.parser.add_argument('-previous_best_threshold', type=float, default= math.inf) 61 | self.parser.add_argument('-previous_name', type=str, default='') 62 | self.parser.add_argument('--viz', type=str, default='try') 63 | 64 | #refine 65 | self.parser.add_argument('--refine', action='store_true') 66 | 
self.parser.add_argument('--crop_uv', type=int, default=0) 67 | self.parser.add_argument('--lr_refine', type=float, default=1e-5) 68 | self.parser.add_argument('--refine_train_reload', action='store_true') 69 | self.parser.add_argument('--refine_test_reload', action='store_true') 70 | self.parser.add_argument('--previous_refine_name', type=str, default='') 71 | 72 | #vis 73 | self.parser.add_argument('--figure', type=str, default='demo.jpg', help='input figure') 74 | self.parser.add_argument('--video', type=str, default='demo.jpg', help='input figure') 75 | 76 | 77 | 78 | def parse(self): 79 | self.init() 80 | self.opt = self.parser.parse_args() 81 | self.opt.pad = (self.opt.frames-1) // 2 82 | self.opt.subjects_train = 'S1,S5,S6,S7,S8' 83 | self.opt.subjects_test = 'S9,S11' 84 | 85 | if self.opt.train: 86 | self.opt.checkpoint = 'ckpt/' + self.opt.model_name 87 | if not os.path.exists(self.opt.checkpoint): 88 | os.makedirs(self.opt.checkpoint) 89 | 90 | 91 | args = dict((name, getattr(self.opt, name)) for name in dir(self.opt) 92 | if not name.startswith('_')) 93 | file_name = os.path.join(self.opt.checkpoint, 'opt.txt') 94 | with open(file_name, 'wt') as opt_file: 95 | opt_file.write('==> Args:\n') 96 | for k, v in sorted(args.items()): 97 | opt_file.write(' %s: %s\n' % (str(k), str(v))) 98 | opt_file.write('==> Args:\n') 99 | 100 | return self.opt 101 | 102 | 103 | 104 | 105 | 106 | 107 | -------------------------------------------------------------------------------- /common/skeleton.py: -------------------------------------------------------------------------------- 1 | 2 | import numpy as np 3 | 4 | class Skeleton: 5 | def __init__(self, parents, joints_left, joints_right): 6 | assert len(joints_left) == len(joints_right) 7 | 8 | self._parents = np.array(parents) 9 | self._joints_left = joints_left 10 | self._joints_right = joints_right 11 | self._compute_metadata() 12 | 13 | def num_joints(self): 14 | return len(self._parents) 15 | 16 | def parents(self): 17 | return self._parents 18 | 19 | def has_children(self): 20 | return self._has_children 21 | 22 | def children(self): 23 | return self._children 24 | 25 | def remove_joints(self, joints_to_remove): 26 | 27 | valid_joints = [] 28 | for joint in range(len(self._parents)): 29 | if joint not in joints_to_remove: 30 | valid_joints.append(joint) 31 | 32 | for i in range(len(self._parents)): 33 | while self._parents[i] in joints_to_remove: 34 | self._parents[i] = self._parents[self._parents[i]] 35 | 36 | index_offsets = np.zeros(len(self._parents), dtype=int) 37 | new_parents = [] 38 | for i, parent in enumerate(self._parents): 39 | if i not in joints_to_remove: 40 | new_parents.append(parent - index_offsets[parent]) 41 | else: 42 | index_offsets[i:] += 1 43 | self._parents = np.array(new_parents) 44 | 45 | if self._joints_left is not None: 46 | new_joints_left = [] 47 | for joint in self._joints_left: 48 | if joint in valid_joints: 49 | new_joints_left.append(joint - index_offsets[joint]) 50 | self._joints_left = new_joints_left 51 | if self._joints_right is not None: 52 | new_joints_right = [] 53 | for joint in self._joints_right: 54 | if joint in valid_joints: 55 | new_joints_right.append(joint - index_offsets[joint]) 56 | self._joints_right = new_joints_right 57 | 58 | self._compute_metadata() 59 | 60 | return valid_joints 61 | 62 | def joints_left(self): 63 | return self._joints_left 64 | 65 | def joints_right(self): 66 | return self._joints_right 67 | 68 | def _compute_metadata(self): 69 | self._has_children = 
np.zeros(len(self._parents)).astype(bool) 70 | for i, parent in enumerate(self._parents): 71 | if parent != -1: 72 | self._has_children[parent] = True 73 | 74 | self._children = [] 75 | for i, parent in enumerate(self._parents): 76 | self._children.append([]) 77 | for i, parent in enumerate(self._parents): 78 | if parent != -1: 79 | self._children[parent].append(i) 80 | 81 | 82 | -------------------------------------------------------------------------------- /common/utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import hashlib 4 | from torch.autograd import Variable 5 | import os 6 | 7 | def deterministic_random(min_value, max_value, data): 8 | digest = hashlib.sha256(data.encode()).digest() 9 | raw_value = int.from_bytes(digest[:4], byteorder='little', signed=False) 10 | return int(raw_value / (2 ** 32 - 1) * (max_value - min_value)) + min_value 11 | 12 | 13 | def mpjpe_cal_mask(predicted, target, mask): 14 | assert predicted.shape == target.shape 15 | # index = [i for i in range(17) if i in mask] 16 | predicted = predicted[:,:,mask,:] 17 | target = target[:,:,mask,:] 18 | return torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1)).contiguous() 19 | 20 | def mpjpe_cal(predicted, target): 21 | assert predicted.shape == target.shape 22 | return torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1)).contiguous() 23 | 24 | def skeloss(predicted, target): 25 | assert predicted.shape == target.shape 26 | start = [0,1,2, 0,4,5, 0,7,8,9, 8,14,15, 8,11,12] 27 | end = [1,2,3, 4,5,6, 7,8,9,10, 14,15,16, 11,12,13] 28 | ske_predicted = torch.zeros(len(start)) 29 | ske_target = torch.zeros(len(start)) 30 | for i in range(len(start)): 31 | ske_predicted[i] = torch.mean(torch.norm(predicted[:,:,start[i],:] - predicted[:,:,end[i],:], dim=2)).contiguous() 32 | ske_target[i] = torch.mean(torch.norm(target[:,:,start[i],:] - target[:,:,end[i],:], dim=2)).contiguous() 33 | 34 | 35 | 36 | return torch.mean(torch.norm(ske_predicted[i] - ske_target[i])) 37 | 38 | 39 | 40 | 41 | 42 | def frame_loss(predicted):#256,9,17,2 43 | loss = 0 44 | for k in range(predicted.size(0)-1): 45 | for i in range(predicted.size(1)-1): 46 | for j in range(predicted.size(2)-1): 47 | loss += (predicted[k+1,i+1,j+1,0] - predicted[k,i,j,0])**2 48 | loss += (predicted[k+1,i+1,j+1,1] - predicted[k,i,j,1])**2 49 | return loss 50 | 51 | return torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1)).contiguous() 52 | 53 | ## viz loss 54 | def p_mpjpe(predicted, target): 55 | """ 56 | Pose error: MPJPE after rigid alignment (scale, rotation, and translation), 57 | often referred to as "Protocol #2" in many papers. 58 | """ 59 | assert predicted.shape == target.shape 60 | 61 | muX = np.mean(target, axis=1, keepdims=True) 62 | muY = np.mean(predicted, axis=1, keepdims=True) 63 | 64 | X0 = target - muX 65 | Y0 = predicted - muY 66 | 67 | normX = np.sqrt(np.sum(X0**2, axis=(1, 2), keepdims=True)) 68 | normY = np.sqrt(np.sum(Y0**2, axis=(1, 2), keepdims=True)) 69 | 70 | X0 /= normX 71 | Y0 /= normY 72 | 73 | H = np.matmul(X0.transpose(0, 2, 1), Y0) 74 | U, s, Vt = np.linalg.svd(H) 75 | V = Vt.transpose(0, 2, 1) 76 | R = np.matmul(V, U.transpose(0, 2, 1)) 77 | 78 | # Avoid improper rotations (reflections), i.e. 
rotations with det(R) = -1 79 | sign_detR = np.sign(np.expand_dims(np.linalg.det(R), axis=1)) 80 | V[:, :, -1] *= sign_detR 81 | s[:, -1] *= sign_detR.flatten() 82 | R = np.matmul(V, U.transpose(0, 2, 1)) # Rotation 83 | 84 | tr = np.expand_dims(np.sum(s, axis=1, keepdims=True), axis=2) 85 | 86 | a = tr * normX / normY # Scale 87 | t = muX - a*np.matmul(muY, R) # Translation 88 | 89 | # Perform rigid transformation on the input 90 | predicted_aligned = a*np.matmul(predicted, R) + t 91 | 92 | # Return MPJPE 93 | return np.mean(np.linalg.norm(predicted_aligned - target, axis=len(target.shape)-1)) 94 | 95 | 96 | 97 | 98 | def compute_PCK(gts, preds, scales=1000, eval_joints=None, threshold=150): 99 | PCK_THRESHOLD = threshold 100 | sample_num = len(gts) 101 | total = 0 102 | true_positive = 0 103 | if eval_joints is None: 104 | eval_joints = list(range(gts.shape[1])) 105 | 106 | for n in range(sample_num): 107 | gt = gts[n] 108 | pred = preds[n] 109 | # scale = scales[n] 110 | scale = 1000 111 | per_joint_error = np.take(np.sqrt(np.sum(np.power(pred - gt, 2), 1)) * scale, eval_joints, axis=0) 112 | true_positive += (per_joint_error < PCK_THRESHOLD).sum() 113 | total += per_joint_error.size 114 | 115 | pck = float(true_positive / total) * 100 116 | return pck 117 | 118 | 119 | def compute_AUC(gts, preds, scales=1000, eval_joints=None): 120 | # This range of thresholds mimics 'mpii_compute_3d_pck.m', which is provided as part of the 121 | # MPI-INF-3DHP test data release. 122 | thresholds = np.linspace(0, 150, 31) 123 | pck_list = [] 124 | for threshold in thresholds: 125 | pck_list.append(compute_PCK(gts, preds, scales, eval_joints, threshold)) 126 | 127 | auc = np.mean(pck_list) 128 | 129 | return auc 130 | 131 | 132 | def mean_velocity_error(predicted, target): 133 | """ 134 | Mean per-joint velocity error (i.e. mean Euclidean distance of the 1st derivative) 135 | """ 136 | assert predicted.shape == target.shape 137 | 138 | velocity_predicted = np.diff(predicted, axis=0) 139 | velocity_target = np.diff(target, axis=0) 140 | 141 | return np.mean(np.linalg.norm(velocity_predicted - velocity_target, axis=len(target.shape)-1)) 142 | 143 | def weighted_mpjpe(predicted, target, w): 144 | """ 145 | Weighted mean per-joint position error (i.e. 
mean Euclidean distance) 146 | """ 147 | assert predicted.shape == target.shape 148 | assert w.shape[0] == predicted.shape[0] 149 | return torch.mean(w * torch.norm(predicted - target, dim=len(target.shape)-1)) 150 | 151 | 152 | 153 | # def test_calculation(predicted, target, action, error_sum, data_type, subject): 154 | # error_sum = mpjpe_by_action(predicted, target, action, error_sum) 155 | 156 | # return error_sum 157 | 158 | def test_calculation(predicted, target, action, error_sum, data_type, subject): 159 | error_sum = mpjpe_by_action_p1(predicted, target, action, error_sum) 160 | error_sum = mpjpe_by_action_p2(predicted, target, action, error_sum) 161 | 162 | return error_sum 163 | 164 | 165 | 166 | 167 | def mpjpe_by_action(predicted, target, action, action_error_sum): 168 | assert predicted.shape == target.shape 169 | num = predicted.size(0) 170 | dist = torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1), dim=len(target.shape) - 2) 171 | 172 | if len(set(list(action))) == 1: 173 | end_index = action[0].find(' ') 174 | if end_index != -1: 175 | action_name = action[0][:end_index] 176 | else: 177 | action_name = action[0] 178 | 179 | action_error_sum[action_name].update(torch.mean(dist).item()*num, num) 180 | else: 181 | for i in range(num): 182 | end_index = action[i].find(' ') 183 | if end_index != -1: 184 | action_name = action[i][:end_index] 185 | else: 186 | action_name = action[i] 187 | 188 | action_error_sum[action_name].update(dist[i].item(), 1) 189 | 190 | return action_error_sum 191 | 192 | 193 | def mpjpe_by_action_p1(predicted, target, action, action_error_sum): 194 | assert predicted.shape == target.shape 195 | num = predicted.size(0) 196 | dist = torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1), dim=len(target.shape) - 2) 197 | 198 | if len(set(list(action))) == 1: 199 | end_index = action[0].find(' ') 200 | if end_index != -1: 201 | action_name = action[0][:end_index] 202 | else: 203 | action_name = action[0] 204 | 205 | action_error_sum[action_name]['p1'].update(torch.mean(dist).item()*num, num) 206 | else: 207 | for i in range(num): 208 | end_index = action[i].find(' ') 209 | if end_index != -1: 210 | action_name = action[i][:end_index] 211 | else: 212 | action_name = action[i] 213 | 214 | action_error_sum[action_name]['p1'].update(dist[i].item(), 1) 215 | 216 | return action_error_sum 217 | 218 | def mpjpe_by_action_p2(predicted, target, action, action_error_sum): 219 | assert predicted.shape == target.shape 220 | num = predicted.size(0) 221 | pred = predicted.detach().cpu().numpy().reshape(-1, predicted.shape[-2], predicted.shape[-1]) 222 | gt = target.detach().cpu().numpy().reshape(-1, target.shape[-2], target.shape[-1]) 223 | dist = p_mpjpe(pred, gt) 224 | 225 | if len(set(list(action))) == 1: 226 | end_index = action[0].find(' ') 227 | if end_index != -1: 228 | action_name = action[0][:end_index] 229 | else: 230 | action_name = action[0] 231 | action_error_sum[action_name]['p2'].update(np.mean(dist) * num, num) 232 | else: 233 | for i in range(num): 234 | end_index = action[i].find(' ') 235 | if end_index != -1: 236 | action_name = action[i][:end_index] 237 | else: 238 | action_name = action[i] 239 | action_error_sum[action_name]['p2'].update(np.mean(dist), 1) 240 | 241 | return action_error_sum 242 | 243 | 244 | 245 | def mpjpe_by_joint_mae(predicted, target,num): 246 | assert predicted.shape == target.shape 247 | # this is the joint 248 | mpjpe_joint = torch.mean(torch.mean(torch.norm(predicted - target, 
dim=len(target.shape) - 1), dim=len(target.shape) - 3),dim=len(target.shape)-4) 249 | print("\nthe mpjpe/joint",mpjpe_joint) 250 | # this is the order of joint from big to small 251 | index = torch.flip(mpjpe_joint.sort(-1).indices,dims=[0]) 252 | index = index.split(num,-1)[0] 253 | print("\nerror joint",index) 254 | return index 255 | 256 | 257 | 258 | 259 | 260 | def define_actions( action ): 261 | 262 | actions = ["Directions","Discussion","Eating","Greeting", 263 | "Phoning","Photo","Posing","Purchases", 264 | "Sitting","SittingDown","Smoking","Waiting", 265 | "WalkDog","Walking","WalkTogether"] 266 | 267 | if action == "All" or action == "all" or action == '*': 268 | return actions 269 | 270 | if not action in actions: 271 | raise( ValueError, "Unrecognized action: %s" % action ) 272 | 273 | return [action] 274 | 275 | 276 | def define_error_list(actions): 277 | error_sum = {} 278 | error_sum.update({actions[i]: 279 | {'p1':AccumLoss(), 'p2':AccumLoss()} 280 | for i in range(len(actions))}) 281 | return error_sum 282 | 283 | # def define_error_list(actions): 284 | # error_sum = {} 285 | # error_sum.update({actions[i]: AccumLoss() for i in range(len(actions))}) 286 | # return error_sum 287 | 288 | 289 | class AccumLoss(object): 290 | def __init__(self): 291 | self.val = 0 292 | self.avg = 0 293 | self.sum = 0 294 | self.count = 0 295 | 296 | def update(self, val, n=1): 297 | self.val = val 298 | self.sum += val 299 | self.count += n 300 | self.avg = self.sum / self.count 301 | 302 | 303 | def get_varialbe(split, target): 304 | num = len(target) 305 | var = [] 306 | if split == 'train': 307 | for i in range(num): 308 | temp = Variable(target[i], requires_grad=False).contiguous().type(torch.cuda.FloatTensor) 309 | var.append(temp) 310 | else: 311 | for i in range(num): 312 | temp = Variable(target[i]).contiguous().cuda().type(torch.cuda.FloatTensor) 313 | var.append(temp) 314 | 315 | return var 316 | 317 | 318 | 319 | 320 | def print_error(data_type, action_error_sum, is_train): 321 | mean_error_p1, mean_error_p2 = print_error_action(action_error_sum, is_train) 322 | 323 | return mean_error_p1, mean_error_p2 324 | 325 | 326 | 327 | 328 | def print_error_action(action_error_sum, is_train): 329 | mean_error_each = {'p1': 0.0, 'p2': 0.0} 330 | mean_error_all = {'p1': AccumLoss(), 'p2': AccumLoss()} 331 | 332 | if is_train == 0: 333 | print("{0:=^12} {1:=^10} {2:=^8}".format("Action", "p#1 mm", "p#2 mm")) 334 | 335 | 336 | for action, value in action_error_sum.items(): 337 | if is_train == 0: 338 | print("{0:<12} ".format(action), end="") 339 | 340 | mean_error_each['p1'] = action_error_sum[action]['p1'].avg * 1000.0 341 | mean_error_all['p1'].update(mean_error_each['p1'], 1) 342 | 343 | mean_error_each['p2'] = action_error_sum[action]['p2'].avg * 1000.0 344 | mean_error_all['p2'].update(mean_error_each['p2'], 1) 345 | 346 | if is_train == 0: 347 | print("{0:>6.2f} {1:>10.2f}".format(mean_error_each['p1'], mean_error_each['p2'])) 348 | 349 | if is_train == 0: 350 | print("{0:<12} {1:>6.2f} {2:>10.2f}".format("Average", mean_error_all['p1'].avg, \ 351 | mean_error_all['p2'].avg)) 352 | 353 | return mean_error_all['p1'].avg, mean_error_all['p2'].avg 354 | 355 | 356 | 357 | def save_model_refine(previous_name, save_dir,epoch, data_threshold, model, model_name):# 358 | if os.path.exists(previous_name): 359 | os.remove(previous_name) 360 | 361 | torch.save(model.state_dict(), 362 | '%s/%s_%d_%d.pth' % (save_dir, model_name, epoch, data_threshold * 100)) 363 | previous_name = '%s/%s_%d_%d.pth' % 
(save_dir, model_name, epoch, data_threshold * 100) 364 | 365 | return previous_name 366 | 367 | 368 | def save_model(previous_name, save_dir, epoch, data_threshold, model): 369 | if os.path.exists(previous_name): 370 | os.remove(previous_name) 371 | 372 | torch.save(model.state_dict(), 373 | '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100)) 374 | previous_name = '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100) 375 | return previous_name 376 | 377 | 378 | 379 | def save_model_epoch(previous_name, save_dir, epoch, data_threshold, model): 380 | # if os.path.exists(previous_name): 381 | # os.remove(previous_name) 382 | 383 | torch.save(model.state_dict(), 384 | '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100)) 385 | previous_name = '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100) 386 | return previous_name 387 | 388 | 389 | 390 | 391 | 392 | 393 | -------------------------------------------------------------------------------- /demo/figure/lindan.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/figure/lindan.jpg -------------------------------------------------------------------------------- /demo/figure/messi.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/figure/messi.jpg -------------------------------------------------------------------------------- /demo/lib/hrnet/experiments/w48_384x288_adam_lr1e-3.yaml: -------------------------------------------------------------------------------- 1 | AUTO_RESUME: true 2 | CUDNN: 3 | BENCHMARK: true 4 | DETERMINISTIC: false 5 | ENABLED: true 6 | DATA_DIR: '' 7 | GPUS: (0,1,2,3) 8 | OUTPUT_DIR: 'output' 9 | LOG_DIR: 'log' 10 | WORKERS: 24 11 | PRINT_FREQ: 100 12 | 13 | DATASET: 14 | COLOR_RGB: true 15 | DATASET: 'coco' 16 | DATA_FORMAT: jpg 17 | FLIP: true 18 | NUM_JOINTS_HALF_BODY: 8 19 | PROB_HALF_BODY: 0.3 20 | ROOT: 'data/coco/' 21 | ROT_FACTOR: 45 22 | SCALE_FACTOR: 0.35 23 | TEST_SET: 'val2017' 24 | TRAIN_SET: 'train2017' 25 | MODEL: 26 | INIT_WEIGHTS: true 27 | NAME: pose_hrnet 28 | NUM_JOINTS: 17 29 | PRETRAINED: 'models/pytorch/imagenet/hrnet_w48-8ef0771d.pth' 30 | TARGET_TYPE: gaussian 31 | IMAGE_SIZE: 32 | - 288 33 | - 384 34 | HEATMAP_SIZE: 35 | - 72 36 | - 96 37 | SIGMA: 3 38 | EXTRA: 39 | PRETRAINED_LAYERS: 40 | - 'conv1' 41 | - 'bn1' 42 | - 'conv2' 43 | - 'bn2' 44 | - 'layer1' 45 | - 'transition1' 46 | - 'stage2' 47 | - 'transition2' 48 | - 'stage3' 49 | - 'transition3' 50 | - 'stage4' 51 | FINAL_CONV_KERNEL: 1 52 | STAGE2: 53 | NUM_MODULES: 1 54 | NUM_BRANCHES: 2 55 | BLOCK: BASIC 56 | NUM_BLOCKS: 57 | - 4 58 | - 4 59 | NUM_CHANNELS: 60 | - 48 61 | - 96 62 | FUSE_METHOD: SUM 63 | STAGE3: 64 | NUM_MODULES: 4 65 | NUM_BRANCHES: 3 66 | BLOCK: BASIC 67 | NUM_BLOCKS: 68 | - 4 69 | - 4 70 | - 4 71 | NUM_CHANNELS: 72 | - 48 73 | - 96 74 | - 192 75 | FUSE_METHOD: SUM 76 | STAGE4: 77 | NUM_MODULES: 3 78 | NUM_BRANCHES: 4 79 | BLOCK: BASIC 80 | NUM_BLOCKS: 81 | - 4 82 | - 4 83 | - 4 84 | - 4 85 | NUM_CHANNELS: 86 | - 48 87 | - 96 88 | - 192 89 | - 384 90 | FUSE_METHOD: SUM 91 | LOSS: 92 | USE_TARGET_WEIGHT: true 93 | TRAIN: 94 | BATCH_SIZE_PER_GPU: 24 95 | SHUFFLE: true 96 | BEGIN_EPOCH: 0 97 | END_EPOCH: 210 98 | OPTIMIZER: adam 99 | LR: 0.001 100 | LR_FACTOR: 0.1 101 | LR_STEP: 102 | - 170 103 | - 200 104 | WD: 0.0001 105 | 
GAMMA1: 0.99 106 | GAMMA2: 0.0 107 | MOMENTUM: 0.9 108 | NESTEROV: false 109 | TEST: 110 | BATCH_SIZE_PER_GPU: 24 111 | COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json' 112 | BBOX_THRE: 1.0 113 | IMAGE_THRE: 0.0 114 | IN_VIS_THRE: 0.2 115 | MODEL_FILE: '' 116 | NMS_THRE: 1.0 117 | OKS_THRE: 0.9 118 | USE_GT_BBOX: true 119 | FLIP_TEST: true 120 | POST_PROCESS: true 121 | SHIFT_HEATMAP: true 122 | DEBUG: 123 | DEBUG: true 124 | SAVE_BATCH_IMAGES_GT: true 125 | SAVE_BATCH_IMAGES_PRED: true 126 | SAVE_HEATMAPS_GT: true 127 | SAVE_HEATMAPS_PRED: true 128 | -------------------------------------------------------------------------------- /demo/lib/hrnet/gen_kpts.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import sys 6 | import os 7 | import os.path as osp 8 | import argparse 9 | import time 10 | import numpy as np 11 | from tqdm import tqdm 12 | import json 13 | import torch 14 | import torch.backends.cudnn as cudnn 15 | import cv2 16 | import copy 17 | 18 | from lib.hrnet.lib.utils.utilitys import plot_keypoint, PreProcess, write, load_json 19 | from lib.hrnet.lib.config import cfg, update_config 20 | from lib.hrnet.lib.utils.transforms import * 21 | from lib.hrnet.lib.utils.inference import get_final_preds 22 | from lib.hrnet.lib.models import pose_hrnet 23 | 24 | cfg_dir = 'demo/lib/hrnet/experiments/' 25 | model_dir = 'demo/lib/checkpoint/' 26 | 27 | # Loading human detector model 28 | from lib.yolov3.human_detector import load_model as yolo_model 29 | from lib.yolov3.human_detector import yolo_human_det as yolo_det 30 | from lib.sort.sort import Sort 31 | 32 | 33 | def parse_args(): 34 | parser = argparse.ArgumentParser(description='Train keypoints network') 35 | # general 36 | parser.add_argument('--cfg', type=str, default=cfg_dir + 'w48_384x288_adam_lr1e-3.yaml', 37 | help='experiment configure file name') 38 | parser.add_argument('opts', nargs=argparse.REMAINDER, default=None, 39 | help="Modify config options using the command-line") 40 | parser.add_argument('--modelDir', type=str, default=model_dir + 'pose_hrnet_w48_384x288.pth', 41 | help='The model directory') 42 | parser.add_argument('--det-dim', type=int, default=416, 43 | help='The input dimension of the detected image') 44 | parser.add_argument('--thred-score', type=float, default=0.30, 45 | help='The threshold of object Confidence') 46 | parser.add_argument('-a', '--animation', action='store_true', 47 | help='output animation') 48 | parser.add_argument('-np', '--num-person', type=int, default=1, 49 | help='The maximum number of estimated poses') 50 | parser.add_argument("-v", "--video", type=str, default='camera', 51 | help="input video file name") 52 | parser.add_argument("-f", "--figure", type=str, default='demo.jpg', 53 | help="input figure file name") 54 | parser.add_argument('--gpu', type=str, default='0', help='input video') 55 | args = parser.parse_args() 56 | 57 | return args 58 | 59 | 60 | def reset_config(args): 61 | update_config(cfg, args) 62 | 63 | # cudnn related setting 64 | cudnn.benchmark = cfg.CUDNN.BENCHMARK 65 | torch.backends.cudnn.deterministic = cfg.CUDNN.DETERMINISTIC 66 | torch.backends.cudnn.enabled = cfg.CUDNN.ENABLED 67 | 68 | 69 | # load model 70 | def model_load(config): 71 | model = pose_hrnet.get_pose_net(config, is_train=False) 72 | if torch.cuda.is_available(): 73 | model = 
model.cuda() 74 | 75 | state_dict = torch.load(config.OUTPUT_DIR) 76 | from collections import OrderedDict 77 | new_state_dict = OrderedDict() 78 | for k, v in state_dict.items(): 79 | name = k # remove module. 80 | # print(name,'\t') 81 | new_state_dict[name] = v 82 | model.load_state_dict(new_state_dict) 83 | model.eval() 84 | # print('HRNet network successfully loaded') 85 | 86 | return model 87 | 88 | 89 | def gen_video_kpts(video, det_dim=416, num_peroson=1, gen_output=False): 90 | # Updating configuration 91 | args = parse_args() 92 | reset_config(args) 93 | 94 | cap = cv2.VideoCapture(video) 95 | 96 | # Loading detector and pose model, initialize sort for track 97 | human_model = yolo_model(inp_dim=det_dim) 98 | pose_model = model_load(cfg) 99 | people_sort = Sort(min_hits=0) 100 | 101 | video_length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) 102 | 103 | kpts_result = [] 104 | scores_result = [] 105 | for ii in tqdm(range(video_length)): 106 | ret, frame = cap.read() 107 | 108 | if not ret: 109 | continue 110 | 111 | bboxs, scores = yolo_det(frame, human_model, reso=det_dim, confidence=args.thred_score) 112 | 113 | if bboxs is None or not bboxs.any(): 114 | print('No person detected!') 115 | bboxs = bboxs_pre 116 | scores = scores_pre 117 | else: 118 | bboxs_pre = copy.deepcopy(bboxs) 119 | scores_pre = copy.deepcopy(scores) 120 | 121 | # Using Sort to track people 122 | people_track = people_sort.update(bboxs) 123 | 124 | # Track the first two people in the video and remove the ID 125 | if people_track.shape[0] == 1: 126 | people_track_ = people_track[-1, :-1].reshape(1, 4) 127 | elif people_track.shape[0] >= 2: 128 | people_track_ = people_track[-num_peroson:, :-1].reshape(num_peroson, 4) 129 | people_track_ = people_track_[::-1] 130 | else: 131 | continue 132 | 133 | track_bboxs = [] 134 | for bbox in people_track_: 135 | bbox = [round(i, 2) for i in list(bbox)] 136 | track_bboxs.append(bbox) 137 | 138 | with torch.no_grad(): 139 | # bbox is coordinate location 140 | inputs, origin_img, center, scale = PreProcess(frame, track_bboxs, cfg, num_peroson) 141 | 142 | inputs = inputs[:, [2, 1, 0]] 143 | 144 | if torch.cuda.is_available(): 145 | inputs = inputs.cuda() 146 | output = pose_model(inputs) 147 | 148 | # compute coordinate 149 | preds, maxvals = get_final_preds(cfg, output.clone().cpu().numpy(), np.asarray(center), np.asarray(scale)) 150 | 151 | kpts = np.zeros((num_peroson, 17, 2), dtype=np.float32) 152 | scores = np.zeros((num_peroson, 17), dtype=np.float32) 153 | for i, kpt in enumerate(preds): 154 | kpts[i] = kpt 155 | 156 | for i, score in enumerate(maxvals): 157 | scores[i] = score.squeeze() 158 | 159 | kpts_result.append(kpts) 160 | scores_result.append(scores) 161 | 162 | keypoints = np.array(kpts_result) 163 | scores = np.array(scores_result) 164 | 165 | keypoints = keypoints.transpose(1, 0, 2, 3) # (T, M, N, 2) --> (M, T, N, 2) 166 | scores = scores.transpose(1, 0, 2) # (T, M, N) --> (M, T, N) 167 | 168 | return keypoints, scores 169 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__init__.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from .default import _C as cfg 8 | from .default import update_config 9 | from .models import MODEL_EXTRAS 10 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-38.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/default.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/default.cpython-38.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/default.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/default.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/models.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/models.cpython-38.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/__pycache__/models.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/models.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/default.py: -------------------------------------------------------------------------------- 1 | 2 | # ------------------------------------------------------------------------------ 3 | # Copyright (c) Microsoft 4 | # Licensed under the MIT License. 
5 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 6 | # ------------------------------------------------------------------------------ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | 12 | import os 13 | 14 | from yacs.config import CfgNode as CN 15 | 16 | 17 | _C = CN() 18 | 19 | _C.OUTPUT_DIR = '' 20 | _C.LOG_DIR = '' 21 | _C.DATA_DIR = '' 22 | _C.GPUS = (0,) 23 | _C.WORKERS = 4 24 | _C.PRINT_FREQ = 20 25 | _C.AUTO_RESUME = False 26 | _C.PIN_MEMORY = True 27 | _C.RANK = 0 28 | 29 | # Cudnn related params 30 | _C.CUDNN = CN() 31 | _C.CUDNN.BENCHMARK = True 32 | _C.CUDNN.DETERMINISTIC = False 33 | _C.CUDNN.ENABLED = True 34 | 35 | # common params for NETWORK 36 | _C.MODEL = CN() 37 | _C.MODEL.NAME = 'pose_hrnet' 38 | _C.MODEL.INIT_WEIGHTS = True 39 | _C.MODEL.PRETRAINED = '' 40 | _C.MODEL.NUM_JOINTS = 17 41 | _C.MODEL.TAG_PER_JOINT = True 42 | _C.MODEL.TARGET_TYPE = 'gaussian' 43 | _C.MODEL.IMAGE_SIZE = [256, 256] # width * height, ex: 192 * 256 44 | _C.MODEL.HEATMAP_SIZE = [64, 64] # width * height, ex: 24 * 32 45 | _C.MODEL.SIGMA = 2 46 | _C.MODEL.EXTRA = CN(new_allowed=True) 47 | 48 | _C.LOSS = CN() 49 | _C.LOSS.USE_OHKM = False 50 | _C.LOSS.TOPK = 8 51 | _C.LOSS.USE_TARGET_WEIGHT = True 52 | _C.LOSS.USE_DIFFERENT_JOINTS_WEIGHT = False 53 | 54 | # DATASET related params 55 | _C.DATASET = CN() 56 | _C.DATASET.ROOT = '' 57 | _C.DATASET.DATASET = 'mpii' 58 | _C.DATASET.TRAIN_SET = 'train' 59 | _C.DATASET.TEST_SET = 'valid' 60 | _C.DATASET.DATA_FORMAT = 'jpg' 61 | _C.DATASET.HYBRID_JOINTS_TYPE = '' 62 | _C.DATASET.SELECT_DATA = False 63 | 64 | # training data augmentation 65 | _C.DATASET.FLIP = True 66 | _C.DATASET.SCALE_FACTOR = 0.25 67 | _C.DATASET.ROT_FACTOR = 30 68 | _C.DATASET.PROB_HALF_BODY = 0.0 69 | _C.DATASET.NUM_JOINTS_HALF_BODY = 8 70 | _C.DATASET.COLOR_RGB = False 71 | 72 | # train 73 | _C.TRAIN = CN() 74 | 75 | _C.TRAIN.LR_FACTOR = 0.1 76 | _C.TRAIN.LR_STEP = [90, 110] 77 | _C.TRAIN.LR = 0.001 78 | 79 | _C.TRAIN.OPTIMIZER = 'adam' 80 | _C.TRAIN.MOMENTUM = 0.9 81 | _C.TRAIN.WD = 0.0001 82 | _C.TRAIN.NESTEROV = False 83 | _C.TRAIN.GAMMA1 = 0.99 84 | _C.TRAIN.GAMMA2 = 0.0 85 | 86 | _C.TRAIN.BEGIN_EPOCH = 0 87 | _C.TRAIN.END_EPOCH = 140 88 | 89 | _C.TRAIN.RESUME = False 90 | _C.TRAIN.CHECKPOINT = '' 91 | 92 | _C.TRAIN.BATCH_SIZE_PER_GPU = 32 93 | _C.TRAIN.SHUFFLE = True 94 | 95 | # testing 96 | _C.TEST = CN() 97 | 98 | # size of images for each device 99 | _C.TEST.BATCH_SIZE_PER_GPU = 32 100 | # Test Model Epoch 101 | _C.TEST.FLIP_TEST = False 102 | _C.TEST.POST_PROCESS = False 103 | _C.TEST.SHIFT_HEATMAP = False 104 | 105 | _C.TEST.USE_GT_BBOX = False 106 | 107 | # nms 108 | _C.TEST.IMAGE_THRE = 0.1 109 | _C.TEST.NMS_THRE = 0.6 110 | _C.TEST.SOFT_NMS = False 111 | _C.TEST.OKS_THRE = 0.5 112 | _C.TEST.IN_VIS_THRE = 0.0 113 | _C.TEST.COCO_BBOX_FILE = '' 114 | _C.TEST.BBOX_THRE = 1.0 115 | _C.TEST.MODEL_FILE = '' 116 | 117 | # debug 118 | _C.DEBUG = CN() 119 | _C.DEBUG.DEBUG = False 120 | _C.DEBUG.SAVE_BATCH_IMAGES_GT = False 121 | _C.DEBUG.SAVE_BATCH_IMAGES_PRED = False 122 | _C.DEBUG.SAVE_HEATMAPS_GT = False 123 | _C.DEBUG.SAVE_HEATMAPS_PRED = False 124 | 125 | 126 | def update_config(cfg, args): 127 | cfg.defrost() 128 | cfg.merge_from_file(args.cfg) 129 | cfg.merge_from_list(args.opts) 130 | 131 | if args.modelDir: 132 | cfg.OUTPUT_DIR = args.modelDir 133 | 134 | # if args.logDir: 135 | # cfg.LOG_DIR = args.logDir 136 | # 137 | # if args.dataDir: 138 | # cfg.DATA_DIR = args.dataDir 139 | # 140 | # 
cfg.DATASET.ROOT = os.path.join( 141 | # cfg.DATA_DIR, cfg.DATASET.ROOT 142 | # ) 143 | # 144 | # cfg.MODEL.PRETRAINED = os.path.join( 145 | # cfg.DATA_DIR, cfg.MODEL.PRETRAINED 146 | # ) 147 | # 148 | # if cfg.TEST.MODEL_FILE: 149 | # cfg.TEST.MODEL_FILE = os.path.join( 150 | # cfg.DATA_DIR, cfg.TEST.MODEL_FILE 151 | # ) 152 | 153 | cfg.freeze() 154 | 155 | 156 | if __name__ == '__main__': 157 | import sys 158 | with open(sys.argv[1], 'w') as f: 159 | print(_C, file=f) 160 | 161 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/config/models.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | from yacs.config import CfgNode as CN 12 | 13 | 14 | # pose_resnet related params 15 | POSE_RESNET = CN() 16 | POSE_RESNET.NUM_LAYERS = 50 17 | POSE_RESNET.DECONV_WITH_BIAS = False 18 | POSE_RESNET.NUM_DECONV_LAYERS = 3 19 | POSE_RESNET.NUM_DECONV_FILTERS = [256, 256, 256] 20 | POSE_RESNET.NUM_DECONV_KERNELS = [4, 4, 4] 21 | POSE_RESNET.FINAL_CONV_KERNEL = 1 22 | POSE_RESNET.PRETRAINED_LAYERS = ['*'] 23 | 24 | # pose_multi_resoluton_net related params 25 | POSE_HIGH_RESOLUTION_NET = CN() 26 | POSE_HIGH_RESOLUTION_NET.PRETRAINED_LAYERS = ['*'] 27 | POSE_HIGH_RESOLUTION_NET.STEM_INPLANES = 64 28 | POSE_HIGH_RESOLUTION_NET.FINAL_CONV_KERNEL = 1 29 | 30 | POSE_HIGH_RESOLUTION_NET.STAGE2 = CN() 31 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_MODULES = 1 32 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_BRANCHES = 2 33 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_BLOCKS = [4, 4] 34 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_CHANNELS = [32, 64] 35 | POSE_HIGH_RESOLUTION_NET.STAGE2.BLOCK = 'BASIC' 36 | POSE_HIGH_RESOLUTION_NET.STAGE2.FUSE_METHOD = 'SUM' 37 | 38 | POSE_HIGH_RESOLUTION_NET.STAGE3 = CN() 39 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_MODULES = 1 40 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_BRANCHES = 3 41 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_BLOCKS = [4, 4, 4] 42 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_CHANNELS = [32, 64, 128] 43 | POSE_HIGH_RESOLUTION_NET.STAGE3.BLOCK = 'BASIC' 44 | POSE_HIGH_RESOLUTION_NET.STAGE3.FUSE_METHOD = 'SUM' 45 | 46 | POSE_HIGH_RESOLUTION_NET.STAGE4 = CN() 47 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_MODULES = 1 48 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_BRANCHES = 4 49 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_BLOCKS = [4, 4, 4, 4] 50 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_CHANNELS = [32, 64, 128, 256] 51 | POSE_HIGH_RESOLUTION_NET.STAGE4.BLOCK = 'BASIC' 52 | POSE_HIGH_RESOLUTION_NET.STAGE4.FUSE_METHOD = 'SUM' 53 | 54 | 55 | MODEL_EXTRAS = { 56 | 'pose_resnet': POSE_RESNET, 57 | 'pose_high_resolution_net': POSE_HIGH_RESOLUTION_NET, 58 | } 59 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-38.pyc -------------------------------------------------------------------------------- 
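The yacs defaults above are resolved at runtime by `update_config`, which merges the experiment YAML plus any `KEY VALUE` overrides and stores the checkpoint path in `cfg.OUTPUT_DIR`. A minimal sketch of that flow (not part of the repository; it simply mirrors what `parse_args()`/`reset_config()` in `demo/lib/hrnet/gen_kpts.py` do, and assumes the `demo/` directory is on `sys.path`):

```python
# Minimal sketch (assumption: run with the demo/ directory on sys.path, as gen_kpts.py expects).
from argparse import Namespace

from lib.hrnet.lib.config import cfg, update_config

args = Namespace(
    cfg='demo/lib/hrnet/experiments/w48_384x288_adam_lr1e-3.yaml',   # experiment YAML
    opts=[],                                                         # optional "KEY VALUE" overrides
    modelDir='demo/lib/checkpoint/pose_hrnet_w48_384x288.pth',       # weights path, stored in cfg.OUTPUT_DIR
)
update_config(cfg, args)     # defrost -> merge YAML -> merge opts -> set OUTPUT_DIR -> freeze

print(cfg.MODEL.NAME)        # 'pose_hrnet' unless the YAML overrides it
print(cfg.MODEL.NUM_JOINTS)  # 17 unless the YAML overrides it
```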
/demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/models/pose_hrnet.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import os 12 | import logging 13 | 14 | import torch 15 | import torch.nn as nn 16 | 17 | 18 | BN_MOMENTUM = 0.1 19 | logger = logging.getLogger(__name__) 20 | 21 | 22 | def conv3x3(in_planes, out_planes, stride=1): 23 | """3x3 convolution with padding""" 24 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 25 | padding=1, bias=False) 26 | 27 | 28 | class BasicBlock(nn.Module): 29 | expansion = 1 30 | 31 | def __init__(self, inplanes, planes, stride=1, downsample=None): 32 | super(BasicBlock, self).__init__() 33 | self.conv1 = conv3x3(inplanes, planes, stride) 34 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 35 | self.relu = nn.ReLU(inplace=True) 36 | self.conv2 = conv3x3(planes, planes) 37 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 38 | self.downsample = downsample 39 | self.stride = stride 40 | 41 | def forward(self, x): 42 | residual = x 43 | 44 | out = self.conv1(x) 45 | out = self.bn1(out) 46 | out = self.relu(out) 47 | 48 | out = self.conv2(out) 49 | out = self.bn2(out) 50 | 51 | if self.downsample is not None: 52 | residual = self.downsample(x) 53 | 54 | out += residual 55 | out = self.relu(out) 56 | 57 | return out 58 | 59 | 60 | class Bottleneck(nn.Module): 61 | expansion = 4 62 | 63 | def __init__(self, inplanes, planes, stride=1, downsample=None): 64 | super(Bottleneck, self).__init__() 65 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 66 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 67 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 68 | padding=1, bias=False) 69 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 70 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, 71 | bias=False) 72 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, 73 | momentum=BN_MOMENTUM) 74 | self.relu = nn.ReLU(inplace=True) 75 | self.downsample = downsample 76 | self.stride = stride 77 | 78 | def forward(self, x): 79 | residual = x 80 | 81 | out = self.conv1(x) 82 | out = self.bn1(out) 83 | out = self.relu(out) 84 | 85 | out = self.conv2(out) 86 | out = self.bn2(out) 87 | out = self.relu(out) 88 | 89 | out = self.conv3(out) 90 | out = self.bn3(out) 91 | 92 | if self.downsample is not None: 93 | residual = self.downsample(x) 94 | 95 | out += residual 96 | out = self.relu(out) 97 | 98 | return out 99 | 100 | 101 | class HighResolutionModule(nn.Module): 102 | def __init__(self, num_branches, blocks, num_blocks, num_inchannels, 103 | num_channels, fuse_method, multi_scale_output=True): 104 | super(HighResolutionModule, self).__init__() 105 | 
self._check_branches( 106 | num_branches, blocks, num_blocks, num_inchannels, num_channels) 107 | 108 | self.num_inchannels = num_inchannels 109 | self.fuse_method = fuse_method 110 | self.num_branches = num_branches 111 | 112 | self.multi_scale_output = multi_scale_output 113 | 114 | self.branches = self._make_branches( 115 | num_branches, blocks, num_blocks, num_channels) 116 | self.fuse_layers = self._make_fuse_layers() 117 | self.relu = nn.ReLU(True) 118 | 119 | def _check_branches(self, num_branches, blocks, num_blocks, 120 | num_inchannels, num_channels): 121 | if num_branches != len(num_blocks): 122 | error_msg = 'NUM_BRANCHES({}) <> NUM_BLOCKS({})'.format( 123 | num_branches, len(num_blocks)) 124 | logger.error(error_msg) 125 | raise ValueError(error_msg) 126 | 127 | if num_branches != len(num_channels): 128 | error_msg = 'NUM_BRANCHES({}) <> NUM_CHANNELS({})'.format( 129 | num_branches, len(num_channels)) 130 | logger.error(error_msg) 131 | raise ValueError(error_msg) 132 | 133 | if num_branches != len(num_inchannels): 134 | error_msg = 'NUM_BRANCHES({}) <> NUM_INCHANNELS({})'.format( 135 | num_branches, len(num_inchannels)) 136 | logger.error(error_msg) 137 | raise ValueError(error_msg) 138 | 139 | def _make_one_branch(self, branch_index, block, num_blocks, num_channels, 140 | stride=1): 141 | downsample = None 142 | if stride != 1 or \ 143 | self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion: 144 | downsample = nn.Sequential( 145 | nn.Conv2d( 146 | self.num_inchannels[branch_index], 147 | num_channels[branch_index] * block.expansion, 148 | kernel_size=1, stride=stride, bias=False 149 | ), 150 | nn.BatchNorm2d( 151 | num_channels[branch_index] * block.expansion, 152 | momentum=BN_MOMENTUM 153 | ), 154 | ) 155 | 156 | layers = [] 157 | layers.append( 158 | block( 159 | self.num_inchannels[branch_index], 160 | num_channels[branch_index], 161 | stride, 162 | downsample 163 | ) 164 | ) 165 | self.num_inchannels[branch_index] = \ 166 | num_channels[branch_index] * block.expansion 167 | for i in range(1, num_blocks[branch_index]): 168 | layers.append( 169 | block( 170 | self.num_inchannels[branch_index], 171 | num_channels[branch_index] 172 | ) 173 | ) 174 | 175 | return nn.Sequential(*layers) 176 | 177 | def _make_branches(self, num_branches, block, num_blocks, num_channels): 178 | branches = [] 179 | 180 | for i in range(num_branches): 181 | branches.append( 182 | self._make_one_branch(i, block, num_blocks, num_channels) 183 | ) 184 | 185 | return nn.ModuleList(branches) 186 | 187 | def _make_fuse_layers(self): 188 | if self.num_branches == 1: 189 | return None 190 | 191 | num_branches = self.num_branches 192 | num_inchannels = self.num_inchannels 193 | fuse_layers = [] 194 | for i in range(num_branches if self.multi_scale_output else 1): 195 | fuse_layer = [] 196 | for j in range(num_branches): 197 | if j > i: 198 | fuse_layer.append( 199 | nn.Sequential( 200 | nn.Conv2d( 201 | num_inchannels[j], 202 | num_inchannels[i], 203 | 1, 1, 0, bias=False 204 | ), 205 | nn.BatchNorm2d(num_inchannels[i]), 206 | nn.Upsample(scale_factor=2**(j-i), mode='nearest') 207 | ) 208 | ) 209 | elif j == i: 210 | fuse_layer.append(None) 211 | else: 212 | conv3x3s = [] 213 | for k in range(i-j): 214 | if k == i - j - 1: 215 | num_outchannels_conv3x3 = num_inchannels[i] 216 | conv3x3s.append( 217 | nn.Sequential( 218 | nn.Conv2d( 219 | num_inchannels[j], 220 | num_outchannels_conv3x3, 221 | 3, 2, 1, bias=False 222 | ), 223 | nn.BatchNorm2d(num_outchannels_conv3x3) 224 | ) 225 
| ) 226 | else: 227 | num_outchannels_conv3x3 = num_inchannels[j] 228 | conv3x3s.append( 229 | nn.Sequential( 230 | nn.Conv2d( 231 | num_inchannels[j], 232 | num_outchannels_conv3x3, 233 | 3, 2, 1, bias=False 234 | ), 235 | nn.BatchNorm2d(num_outchannels_conv3x3), 236 | nn.ReLU(True) 237 | ) 238 | ) 239 | fuse_layer.append(nn.Sequential(*conv3x3s)) 240 | fuse_layers.append(nn.ModuleList(fuse_layer)) 241 | 242 | return nn.ModuleList(fuse_layers) 243 | 244 | def get_num_inchannels(self): 245 | return self.num_inchannels 246 | 247 | def forward(self, x): 248 | if self.num_branches == 1: 249 | return [self.branches[0](x[0])] 250 | 251 | for i in range(self.num_branches): 252 | x[i] = self.branches[i](x[i]) 253 | 254 | x_fuse = [] 255 | 256 | for i in range(len(self.fuse_layers)): 257 | y = x[0] if i == 0 else self.fuse_layers[i][0](x[0]) 258 | for j in range(1, self.num_branches): 259 | if i == j: 260 | y = y + x[j] 261 | else: 262 | y = y + self.fuse_layers[i][j](x[j]) 263 | x_fuse.append(self.relu(y)) 264 | 265 | return x_fuse 266 | 267 | 268 | blocks_dict = { 269 | 'BASIC': BasicBlock, 270 | 'BOTTLENECK': Bottleneck 271 | } 272 | 273 | 274 | class PoseHighResolutionNet(nn.Module): 275 | 276 | def __init__(self, cfg, **kwargs): 277 | self.inplanes = 64 278 | extra = cfg['MODEL']['EXTRA'] 279 | super(PoseHighResolutionNet, self).__init__() 280 | 281 | # stem net 282 | self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, 283 | bias=False) 284 | self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM) 285 | self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, 286 | bias=False) 287 | self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM) 288 | self.relu = nn.ReLU(inplace=True) 289 | self.layer1 = self._make_layer(Bottleneck, 64, 4) 290 | 291 | self.stage2_cfg = extra['STAGE2'] 292 | num_channels = self.stage2_cfg['NUM_CHANNELS'] 293 | block = blocks_dict[self.stage2_cfg['BLOCK']] 294 | num_channels = [ 295 | num_channels[i] * block.expansion for i in range(len(num_channels)) 296 | ] 297 | self.transition1 = self._make_transition_layer([256], num_channels) 298 | self.stage2, pre_stage_channels = self._make_stage( 299 | self.stage2_cfg, num_channels) 300 | 301 | self.stage3_cfg = extra['STAGE3'] 302 | num_channels = self.stage3_cfg['NUM_CHANNELS'] 303 | block = blocks_dict[self.stage3_cfg['BLOCK']] 304 | num_channels = [ 305 | num_channels[i] * block.expansion for i in range(len(num_channels)) 306 | ] 307 | self.transition2 = self._make_transition_layer( 308 | pre_stage_channels, num_channels) 309 | self.stage3, pre_stage_channels = self._make_stage( 310 | self.stage3_cfg, num_channels) 311 | 312 | self.stage4_cfg = extra['STAGE4'] 313 | num_channels = self.stage4_cfg['NUM_CHANNELS'] 314 | block = blocks_dict[self.stage4_cfg['BLOCK']] 315 | num_channels = [ 316 | num_channels[i] * block.expansion for i in range(len(num_channels)) 317 | ] 318 | self.transition3 = self._make_transition_layer( 319 | pre_stage_channels, num_channels) 320 | self.stage4, pre_stage_channels = self._make_stage( 321 | self.stage4_cfg, num_channels, multi_scale_output=False) 322 | 323 | self.final_layer = nn.Conv2d( 324 | in_channels=pre_stage_channels[0], 325 | out_channels=cfg['MODEL']['NUM_JOINTS'], 326 | kernel_size=extra['FINAL_CONV_KERNEL'], 327 | stride=1, 328 | padding=1 if extra['FINAL_CONV_KERNEL'] == 3 else 0 329 | ) 330 | 331 | self.pretrained_layers = extra['PRETRAINED_LAYERS'] 332 | 333 | def _make_transition_layer( 334 | self, num_channels_pre_layer, num_channels_cur_layer): 335 | 
num_branches_cur = len(num_channels_cur_layer) 336 | num_branches_pre = len(num_channels_pre_layer) 337 | 338 | transition_layers = [] 339 | for i in range(num_branches_cur): 340 | if i < num_branches_pre: 341 | if num_channels_cur_layer[i] != num_channels_pre_layer[i]: 342 | transition_layers.append( 343 | nn.Sequential( 344 | nn.Conv2d( 345 | num_channels_pre_layer[i], 346 | num_channels_cur_layer[i], 347 | 3, 1, 1, bias=False 348 | ), 349 | nn.BatchNorm2d(num_channels_cur_layer[i]), 350 | nn.ReLU(inplace=True) 351 | ) 352 | ) 353 | else: 354 | transition_layers.append(None) 355 | else: 356 | conv3x3s = [] 357 | for j in range(i+1-num_branches_pre): 358 | inchannels = num_channels_pre_layer[-1] 359 | outchannels = num_channels_cur_layer[i] \ 360 | if j == i-num_branches_pre else inchannels 361 | conv3x3s.append( 362 | nn.Sequential( 363 | nn.Conv2d( 364 | inchannels, outchannels, 3, 2, 1, bias=False 365 | ), 366 | nn.BatchNorm2d(outchannels), 367 | nn.ReLU(inplace=True) 368 | ) 369 | ) 370 | transition_layers.append(nn.Sequential(*conv3x3s)) 371 | 372 | return nn.ModuleList(transition_layers) 373 | 374 | def _make_layer(self, block, planes, blocks, stride=1): 375 | downsample = None 376 | if stride != 1 or self.inplanes != planes * block.expansion: 377 | downsample = nn.Sequential( 378 | nn.Conv2d( 379 | self.inplanes, planes * block.expansion, 380 | kernel_size=1, stride=stride, bias=False 381 | ), 382 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM), 383 | ) 384 | 385 | layers = [] 386 | layers.append(block(self.inplanes, planes, stride, downsample)) 387 | self.inplanes = planes * block.expansion 388 | for i in range(1, blocks): 389 | layers.append(block(self.inplanes, planes)) 390 | 391 | return nn.Sequential(*layers) 392 | 393 | def _make_stage(self, layer_config, num_inchannels, 394 | multi_scale_output=True): 395 | num_modules = layer_config['NUM_MODULES'] 396 | num_branches = layer_config['NUM_BRANCHES'] 397 | num_blocks = layer_config['NUM_BLOCKS'] 398 | num_channels = layer_config['NUM_CHANNELS'] 399 | block = blocks_dict[layer_config['BLOCK']] 400 | fuse_method = layer_config['FUSE_METHOD'] 401 | 402 | modules = [] 403 | for i in range(num_modules): 404 | # multi_scale_output is only used last module 405 | if not multi_scale_output and i == num_modules - 1: 406 | reset_multi_scale_output = False 407 | else: 408 | reset_multi_scale_output = True 409 | 410 | modules.append( 411 | HighResolutionModule( 412 | num_branches, 413 | block, 414 | num_blocks, 415 | num_inchannels, 416 | num_channels, 417 | fuse_method, 418 | reset_multi_scale_output 419 | ) 420 | ) 421 | num_inchannels = modules[-1].get_num_inchannels() 422 | 423 | return nn.Sequential(*modules), num_inchannels 424 | 425 | def forward(self, x): 426 | x = self.conv1(x) 427 | x = self.bn1(x) 428 | x = self.relu(x) 429 | x = self.conv2(x) 430 | x = self.bn2(x) 431 | x = self.relu(x) 432 | x = self.layer1(x) 433 | 434 | x_list = [] 435 | for i in range(self.stage2_cfg['NUM_BRANCHES']): 436 | if self.transition1[i] is not None: 437 | x_list.append(self.transition1[i](x)) 438 | else: 439 | x_list.append(x) 440 | y_list = self.stage2(x_list) 441 | 442 | x_list = [] 443 | for i in range(self.stage3_cfg['NUM_BRANCHES']): 444 | if self.transition2[i] is not None: 445 | x_list.append(self.transition2[i](y_list[-1])) 446 | else: 447 | x_list.append(y_list[i]) 448 | y_list = self.stage3(x_list) 449 | 450 | x_list = [] 451 | for i in range(self.stage4_cfg['NUM_BRANCHES']): 452 | if self.transition3[i] is not None: 453 
| x_list.append(self.transition3[i](y_list[-1])) 454 | else: 455 | x_list.append(y_list[i]) 456 | y_list = self.stage4(x_list) 457 | 458 | x = self.final_layer(y_list[0]) 459 | 460 | return x 461 | 462 | def init_weights(self, pretrained=''): 463 | logger.info('=> init weights from normal distribution') 464 | for m in self.modules(): 465 | if isinstance(m, nn.Conv2d): 466 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 467 | nn.init.normal_(m.weight, std=0.001) 468 | for name, _ in m.named_parameters(): 469 | if name in ['bias']: 470 | nn.init.constant_(m.bias, 0) 471 | elif isinstance(m, nn.BatchNorm2d): 472 | nn.init.constant_(m.weight, 1) 473 | nn.init.constant_(m.bias, 0) 474 | elif isinstance(m, nn.ConvTranspose2d): 475 | nn.init.normal_(m.weight, std=0.001) 476 | for name, _ in m.named_parameters(): 477 | if name in ['bias']: 478 | nn.init.constant_(m.bias, 0) 479 | 480 | if os.path.isfile(pretrained): 481 | pretrained_state_dict = torch.load(pretrained) 482 | logger.info('=> loading pretrained model {}'.format(pretrained)) 483 | 484 | need_init_state_dict = {} 485 | for name, m in pretrained_state_dict.items(): 486 | if name.split('.')[0] in self.pretrained_layers \ 487 | or self.pretrained_layers[0] is '*': 488 | need_init_state_dict[name] = m 489 | self.load_state_dict(need_init_state_dict, strict=False) 490 | elif pretrained: 491 | logger.error('=> please download pre-trained models first!') 492 | raise ValueError('{} is not exist!'.format(pretrained)) 493 | 494 | 495 | def get_pose_net(cfg, is_train, **kwargs): 496 | model = PoseHighResolutionNet(cfg, **kwargs) 497 | 498 | if is_train and cfg['MODEL']['INIT_WEIGHTS']: 499 | model.init_weights(cfg['MODEL']['PRETRAINED']) 500 | 501 | return model 502 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/__pycache__/coco_h36m.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/coco_h36m.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/__pycache__/inference.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/inference.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/__pycache__/transforms.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/transforms.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/__pycache__/utilitys.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/utilitys.cpython-39.pyc -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/coco_h36m.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | h36m_coco_order = [9, 11, 14, 12, 15, 13, 
16, 4, 1, 5, 2, 6, 3] 5 | coco_order = [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] 6 | spple_keypoints = [10, 8, 0, 7] 7 | 8 | 9 | def coco_h36m(keypoints): 10 | # keypoints: (T, N, 2) or (M, N, 2) 11 | 12 | temporal = keypoints.shape[0] 13 | keypoints_h36m = np.zeros_like(keypoints, dtype=np.float32) 14 | htps_keypoints = np.zeros((temporal, 4, 2), dtype=np.float32) 15 | 16 | # htps_keypoints: head, thorax, pelvis, spine 17 | htps_keypoints[:, 0, 0] = np.mean(keypoints[:, 1:5, 0], axis=1, dtype=np.float32) 18 | htps_keypoints[:, 0, 1] = np.sum(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1] 19 | htps_keypoints[:, 1, :] = np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32) 20 | htps_keypoints[:, 1, :] += (keypoints[:, 0, :] - htps_keypoints[:, 1, :]) / 3 21 | 22 | htps_keypoints[:, 2, :] = np.mean(keypoints[:, 11:13, :], axis=1, dtype=np.float32) 23 | htps_keypoints[:, 3, :] = np.mean(keypoints[:, [5, 6, 11, 12], :], axis=1, dtype=np.float32) 24 | 25 | keypoints_h36m[:, spple_keypoints, :] = htps_keypoints 26 | keypoints_h36m[:, h36m_coco_order, :] = keypoints[:, coco_order, :] 27 | 28 | keypoints_h36m[:, 9, :] -= (keypoints_h36m[:, 9, :] - np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32)) / 4 29 | keypoints_h36m[:, 7, 0] += 0.3*(keypoints_h36m[:, 7, 0] - np.mean(keypoints_h36m[:, [0, 8], 0], axis=1, dtype=np.float32)) 30 | keypoints_h36m[:, 8, 1] -= (np.mean(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1])*2/3 31 | 32 | # half body: the joint of ankle and knee equal to hip 33 | # keypoints_h36m[:, [2, 3]] = keypoints_h36m[:, [1, 1]] 34 | # keypoints_h36m[:, [5, 6]] = keypoints_h36m[:, [4, 4]] 35 | return keypoints_h36m 36 | 37 | 38 | h36m_mpii_order = [3, 2, 1, 4, 5, 6, 0, 8, 9, 10, 16, 15, 14, 11, 12, 13] 39 | mpii_order = [i for i in range(16)] 40 | lr_hip_shouler = [2, 3, 12, 13] 41 | 42 | 43 | def mpii_h36m(keypoints): 44 | temporal = keypoints.shape[0] 45 | keypoints_h36m = np.zeros((temporal, 17, 2), dtype=np.float32) 46 | keypoints_h36m[:, h36m_mpii_order] = keypoints 47 | # keypoints_h36m[:, 7] = np.mean(keypoints[:, 6:8], axis=1, dtype=np.float32) 48 | keypoints_h36m[:, 7] = np.mean(keypoints[:, lr_hip_shouler], axis=1, dtype=np.float32) 49 | return keypoints_h36m 50 | 51 | 52 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/inference.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import math 12 | import sys 13 | import os.path as osp 14 | import numpy as np 15 | 16 | sys.path.insert(0, osp.join(osp.dirname(osp.realpath(__file__)), '..')) 17 | from utils.transforms import transform_preds 18 | sys.path.pop(0) 19 | 20 | 21 | def get_max_preds(batch_heatmaps): 22 | ''' 23 | get predictions from score maps 24 | heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) 25 | ''' 26 | assert isinstance(batch_heatmaps, np.ndarray), \ 27 | 'batch_heatmaps should be numpy.ndarray' 28 | assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim' 29 | 30 | batch_size = batch_heatmaps.shape[0] 31 | num_joints = batch_heatmaps.shape[1] 32 | width = batch_heatmaps.shape[3] 33 | heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1)) 34 | idx = np.argmax(heatmaps_reshaped, 2) 35 | maxvals = np.amax(heatmaps_reshaped, 2) 36 | 37 | maxvals = maxvals.reshape((batch_size, num_joints, 1)) 38 | idx = idx.reshape((batch_size, num_joints, 1)) 39 | 40 | preds = np.tile(idx, (1, 1, 2)).astype(np.float32) 41 | 42 | preds[:, :, 0] = (preds[:, :, 0]) % width 43 | preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) 44 | 45 | pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) 46 | pred_mask = pred_mask.astype(np.float32) 47 | 48 | preds *= pred_mask 49 | return preds, maxvals 50 | 51 | 52 | def get_final_preds(config, batch_heatmaps, center, scale): 53 | coords, maxvals = get_max_preds(batch_heatmaps) 54 | 55 | heatmap_height = batch_heatmaps.shape[2] 56 | heatmap_width = batch_heatmaps.shape[3] 57 | 58 | # post-processing 59 | if config.TEST.POST_PROCESS: 60 | for n in range(coords.shape[0]): 61 | for p in range(coords.shape[1]): 62 | hm = batch_heatmaps[n][p] 63 | px = int(math.floor(coords[n][p][0] + 0.5)) 64 | py = int(math.floor(coords[n][p][1] + 0.5)) 65 | if 1 < px < heatmap_width-1 and 1 < py < heatmap_height-1: 66 | diff = np.array( 67 | [ 68 | hm[py][px+1] - hm[py][px-1], 69 | hm[py+1][px]-hm[py-1][px] 70 | ] 71 | ) 72 | coords[n][p] += np.sign(diff) * .25 73 | 74 | preds = coords.copy() 75 | 76 | # Transform back 77 | for i in range(coords.shape[0]): 78 | preds[i] = transform_preds( 79 | coords[i], center[i], scale[i], [heatmap_width, heatmap_height] 80 | ) 81 | 82 | return preds, maxvals 83 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/transforms.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------------------ 2 | # Copyright (c) Microsoft 3 | # Licensed under the MIT License. 
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com) 5 | # ------------------------------------------------------------------------------ 6 | 7 | from __future__ import absolute_import 8 | from __future__ import division 9 | from __future__ import print_function 10 | 11 | import numpy as np 12 | import cv2 13 | 14 | 15 | def flip_back(output_flipped, matched_parts): 16 | ''' 17 | ouput_flipped: numpy.ndarray(batch_size, num_joints, height, width) 18 | ''' 19 | assert output_flipped.ndim == 4,\ 20 | 'output_flipped should be [batch_size, num_joints, height, width]' 21 | 22 | output_flipped = output_flipped[:, :, :, ::-1] 23 | 24 | # 因为你输入的是翻转后的图像,所以输出的热图他们对应的左右关节也是相反的(训练的时候,输入的是翻转后的图像,target对应的左右关节也是对调过来的)。 25 | for pair in matched_parts: 26 | tmp = output_flipped[:, pair[0], :, :].copy() 27 | output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :] 28 | output_flipped[:, pair[1], :, :] = tmp 29 | 30 | return output_flipped 31 | 32 | 33 | def fliplr_joints(joints, joints_vis, width, matched_parts): 34 | """ 35 | flip coords 36 | """ 37 | # Flip horizontal 38 | joints[:, 0] = width - joints[:, 0] - 1 39 | 40 | # Change left-right parts 41 | for pair in matched_parts: 42 | joints[pair[0], :], joints[pair[1], :] = \ 43 | joints[pair[1], :], joints[pair[0], :].copy() 44 | joints_vis[pair[0], :], joints_vis[pair[1], :] = \ 45 | joints_vis[pair[1], :], joints_vis[pair[0], :].copy() 46 | 47 | return joints*joints_vis, joints_vis 48 | 49 | 50 | def transform_preds(coords, center, scale, output_size): 51 | target_coords = np.zeros(coords.shape) 52 | trans = get_affine_transform(center, scale, 0, output_size, inv=1) 53 | for p in range(coords.shape[0]): 54 | target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans) 55 | return target_coords 56 | 57 | 58 | def get_affine_transform( 59 | center, scale, rot, output_size, 60 | shift=np.array([0, 0], dtype=np.float32), inv=0 61 | ): 62 | if not isinstance(scale, np.ndarray) and not isinstance(scale, list): 63 | print(scale) 64 | scale = np.array([scale, scale]) 65 | 66 | scale_tmp = scale * 200.0 67 | src_w = scale_tmp[0] 68 | dst_w = output_size[0] 69 | dst_h = output_size[1] 70 | 71 | rot_rad = np.pi * rot / 180 72 | src_dir = get_dir([0, src_w * -0.5], rot_rad) 73 | dst_dir = np.array([0, dst_w * -0.5], np.float32) 74 | 75 | src = np.zeros((3, 2), dtype=np.float32) 76 | dst = np.zeros((3, 2), dtype=np.float32) 77 | src[0, :] = center + scale_tmp * shift 78 | src[1, :] = center + src_dir + scale_tmp * shift 79 | dst[0, :] = [dst_w * 0.5, dst_h * 0.5] 80 | dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir 81 | 82 | src[2:, :] = get_3rd_point(src[0, :], src[1, :]) 83 | dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :]) 84 | 85 | if inv: 86 | trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) 87 | else: 88 | trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) 89 | 90 | return trans 91 | 92 | 93 | def affine_transform(pt, t): 94 | new_pt = np.array([pt[0], pt[1], 1.]).T 95 | new_pt = np.dot(t, new_pt) 96 | return new_pt[:2] 97 | 98 | 99 | def get_3rd_point(a, b): 100 | direct = a - b 101 | return b + np.array([-direct[1], direct[0]], dtype=np.float32) 102 | 103 | 104 | def get_dir(src_point, rot_rad): 105 | sn, cs = np.sin(rot_rad), np.cos(rot_rad) 106 | 107 | src_result = [0, 0] 108 | src_result[0] = src_point[0] * cs - src_point[1] * sn 109 | src_result[1] = src_point[0] * sn + src_point[1] * cs 110 | 111 | return src_result 112 | 113 | 114 | def crop(img, center, scale, output_size, rot=0): 115 | trans = 
get_affine_transform(center, scale, rot, output_size) 116 | 117 | dst_img = cv2.warpAffine( 118 | img, trans, (int(output_size[0]), int(output_size[1])), 119 | flags=cv2.INTER_LINEAR 120 | ) 121 | 122 | return dst_img 123 | -------------------------------------------------------------------------------- /demo/lib/hrnet/lib/utils/utilitys.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import sys 3 | import torch 4 | import json 5 | import torchvision.transforms as transforms 6 | from lib.hrnet.lib.utils.transforms import * 7 | 8 | from lib.hrnet.lib.utils.coco_h36m import coco_h36m 9 | import numpy as np 10 | 11 | joint_pairs = [[0, 1], [1, 3], [0, 2], [2, 4], 12 | [5, 6], [5, 7], [7, 9], [6, 8], [8, 10], 13 | [5, 11], [6, 12], [11, 12], 14 | [11, 13], [12, 14], [13, 15], [14, 16]] 15 | 16 | h36m_pairs = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), 17 | (12, 13), (8, 14), (14, 15), (15, 16)] 18 | 19 | colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \ 20 | [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \ 21 | [170, 0, 255], [255, 0, 255]] 22 | 23 | 24 | def plot_keypoint(image, coordinates, confidence, keypoint_thresh=0.3): 25 | # USE cv2 26 | joint_visible = confidence[:, :, 0] > keypoint_thresh 27 | coordinates = coco_h36m(coordinates) 28 | for i in range(coordinates.shape[0]): 29 | pts = coordinates[i] 30 | 31 | for joint in pts: 32 | cv2.circle(image, (int(joint[0]), int(joint[1])), 8, (255, 255, 255), 1) 33 | 34 | for color_i, jp in zip(colors, h36m_pairs): 35 | if joint_visible[i, jp[0]] and joint_visible[i, jp[1]]: 36 | pt0 = pts[jp, 0] 37 | pt1 = pts[jp, 1] 38 | pt0_0, pt0_1, pt1_0, pt1_1 = int(pt0[0]), int(pt0[1]), int(pt1[0]), int(pt1[1]) 39 | 40 | cv2.line(image, (pt0_0, pt1_0), (pt0_1, pt1_1), color_i, 6) 41 | # cv2.circle(image,(pt0_0, pt0_1), 2, color_i, thickness=-1) 42 | # cv2.circle(image,(pt1_0, pt1_1), 2, color_i, thickness=-1) 43 | return image 44 | 45 | 46 | def write(x, img): 47 | x = [int(i) for i in x] 48 | c1 = tuple(x[0:2]) 49 | c2 = tuple(x[2:4]) 50 | 51 | color = [0, 97, 255] 52 | label = 'People {}'.format(x[-1]) 53 | cv2.rectangle(img, c1, c2, color, 2) 54 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1, 1)[0] 55 | c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4 56 | cv2.rectangle(img, c1, c2, [0, 128, 255], -1) 57 | cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225, 255, 255], 1) 58 | return img 59 | 60 | 61 | def load_json(file_path): 62 | with open(file_path, 'r') as fr: 63 | video_info = json.load(fr) 64 | 65 | label = video_info['label'] 66 | label_index = video_info['label_index'] 67 | 68 | num_frames = video_info['data'][-1]['frame_index'] 69 | keypoints = np.zeros((2, num_frames, 17, 2), dtype=np.float32) # (M, T, N, 2) 70 | scores = np.zeros((2, num_frames, 17), dtype=np.float32) # (M, T, N) 71 | 72 | for frame_info in video_info['data']: 73 | frame_index = frame_info['frame_index'] 74 | 75 | for index, skeleton_info in enumerate(frame_info['skeleton']): 76 | pose = skeleton_info['pose'] 77 | score = skeleton_info['score'] 78 | bbox = skeleton_info['bbox'] 79 | 80 | if len(bbox) == 0 or index+1 > 2: 81 | continue 82 | 83 | pose = np.asarray(pose, dtype=np.float32) 84 | score = np.asarray(score, dtype=np.float32) 85 | score = score.reshape(-1) 86 | 87 | keypoints[index, 
frame_index-1] = pose 88 | scores[index, frame_index-1] = score 89 | 90 | new_kpts = [] 91 | for i in range(keypoints.shape[0]): 92 | kps = keypoints[i] 93 | if np.sum(kps) != 0.: 94 | new_kpts.append(kps) 95 | 96 | new_kpts = np.asarray(new_kpts, dtype=np.float32) 97 | scores = np.asarray(scores, dtype=np.float32) 98 | scores = scores[:, :, :, np.newaxis] 99 | return new_kpts, scores, label, label_index 100 | 101 | 102 | def box_to_center_scale(box, model_image_width, model_image_height): 103 | """convert a box to center,scale information required for pose transformation 104 | Parameters 105 | ---------- 106 | box : (x1, y1, x2, y2) 107 | model_image_width : int 108 | model_image_height : int 109 | 110 | Returns 111 | ------- 112 | (numpy array, numpy array) 113 | Two numpy arrays, coordinates for the center of the box and the scale of the box 114 | """ 115 | center = np.zeros((2), dtype=np.float32) 116 | x1, y1, x2, y2 = box[:4] 117 | box_width, box_height = x2 - x1, y2 - y1 118 | 119 | center[0] = x1 + box_width * 0.5 120 | center[1] = y1 + box_height * 0.5 121 | 122 | aspect_ratio = model_image_width * 1.0 / model_image_height 123 | pixel_std = 200 124 | 125 | if box_width > aspect_ratio * box_height: 126 | box_height = box_width * 1.0 / aspect_ratio 127 | elif box_width < aspect_ratio * box_height: 128 | box_width = box_height * aspect_ratio 129 | scale = np.array( 130 | [box_width * 1.0 / pixel_std, box_height * 1.0 / pixel_std], 131 | dtype=np.float32) 132 | if center[0] != -1: 133 | scale = scale * 1.25 134 | 135 | return center, scale 136 | 137 | 138 | # Pre-process 139 | def PreProcess(image, bboxs, cfg, num_pos=2): 140 | if type(image) == str: 141 | data_numpy = cv2.imread(image, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) 142 | # data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB) 143 | else: 144 | data_numpy = image 145 | 146 | inputs = [] 147 | centers = [] 148 | scales = [] 149 | 150 | for bbox in bboxs[:num_pos]: 151 | c, s = box_to_center_scale(bbox, data_numpy.shape[0], data_numpy.shape[1]) 152 | centers.append(c) 153 | scales.append(s) 154 | r = 0 155 | 156 | trans = get_affine_transform(c, s, r, cfg.MODEL.IMAGE_SIZE) 157 | input = cv2.warpAffine( 158 | data_numpy, 159 | trans, 160 | (int(cfg.MODEL.IMAGE_SIZE[0]), int(cfg.MODEL.IMAGE_SIZE[1])), 161 | flags=cv2.INTER_LINEAR) 162 | 163 | transform = transforms.Compose([transforms.ToTensor(), 164 | transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]) 165 | input = transform(input).unsqueeze(0) 166 | inputs.append(input) 167 | 168 | inputs = torch.cat(inputs) 169 | return inputs, data_numpy, centers, scales 170 | -------------------------------------------------------------------------------- /demo/lib/preprocess.py: -------------------------------------------------------------------------------- 1 | import json 2 | import numpy as np 3 | import os 4 | 5 | h36m_coco_order = [9, 11, 14, 12, 15, 13, 16, 4, 1, 5, 2, 6, 3] 6 | coco_order = [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16] 7 | spple_keypoints = [10, 8, 0, 7] 8 | 9 | 10 | def coco_h36m(keypoints): 11 | temporal = keypoints.shape[0] 12 | keypoints_h36m = np.zeros_like(keypoints, dtype=np.float32) 13 | htps_keypoints = np.zeros((temporal, 4, 2), dtype=np.float32) 14 | 15 | # htps_keypoints: head, thorax, pelvis, spine 16 | htps_keypoints[:, 0, 0] = np.mean(keypoints[:, 1:5, 0], axis=1, dtype=np.float32) 17 | htps_keypoints[:, 0, 1] = np.sum(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1] 18 | htps_keypoints[:, 1, 
:] = np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32) 19 | htps_keypoints[:, 1, :] += (keypoints[:, 0, :] - htps_keypoints[:, 1, :]) / 3 20 | 21 | htps_keypoints[:, 2, :] = np.mean(keypoints[:, 11:13, :], axis=1, dtype=np.float32) 22 | htps_keypoints[:, 3, :] = np.mean(keypoints[:, [5, 6, 11, 12], :], axis=1, dtype=np.float32) 23 | 24 | keypoints_h36m[:, spple_keypoints, :] = htps_keypoints 25 | keypoints_h36m[:, h36m_coco_order, :] = keypoints[:, coco_order, :] 26 | 27 | keypoints_h36m[:, 9, :] -= (keypoints_h36m[:, 9, :] - np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32)) / 4 28 | keypoints_h36m[:, 7, 0] += 2*(keypoints_h36m[:, 7, 0] - np.mean(keypoints_h36m[:, [0, 8], 0], axis=1, dtype=np.float32)) 29 | keypoints_h36m[:, 8, 1] -= (np.mean(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1])*2/3 30 | 31 | # half body: the joint of ankle and knee equal to hip 32 | # keypoints_h36m[:, [2, 3]] = keypoints_h36m[:, [1, 1]] 33 | # keypoints_h36m[:, [5, 6]] = keypoints_h36m[:, [4, 4]] 34 | 35 | valid_frames = np.where(np.sum(keypoints_h36m.reshape(-1, 34), axis=1) != 0)[0] 36 | 37 | return keypoints_h36m, valid_frames 38 | 39 | 40 | def h36m_coco_format(keypoints, scores): 41 | assert len(keypoints.shape) == 4 and len(scores.shape) == 3 42 | 43 | h36m_kpts = [] 44 | h36m_scores = [] 45 | valid_frames = [] 46 | 47 | for i in range(keypoints.shape[0]): 48 | kpts = keypoints[i] 49 | score = scores[i] 50 | 51 | new_score = np.zeros_like(score, dtype=np.float32) 52 | 53 | if np.sum(kpts) != 0.: 54 | kpts, valid_frame = coco_h36m(kpts) 55 | h36m_kpts.append(kpts) 56 | valid_frames.append(valid_frame) 57 | 58 | new_score[:, h36m_coco_order] = score[:, coco_order] 59 | new_score[:, 0] = np.mean(score[:, [11, 12]], axis=1, dtype=np.float32) 60 | new_score[:, 8] = np.mean(score[:, [5, 6]], axis=1, dtype=np.float32) 61 | new_score[:, 7] = np.mean(new_score[:, [0, 8]], axis=1, dtype=np.float32) 62 | new_score[:, 10] = np.mean(score[:, [1, 2, 3, 4]], axis=1, dtype=np.float32) 63 | 64 | h36m_scores.append(new_score) 65 | 66 | h36m_kpts = np.asarray(h36m_kpts, dtype=np.float32) 67 | h36m_scores = np.asarray(h36m_scores, dtype=np.float32) 68 | 69 | return h36m_kpts, h36m_scores, valid_frames 70 | 71 | 72 | def revise_kpts(h36m_kpts, h36m_scores, valid_frames): 73 | 74 | new_h36m_kpts = np.zeros_like(h36m_kpts) 75 | for index, frames in enumerate(valid_frames): 76 | kpts = h36m_kpts[index, frames] 77 | score = h36m_scores[index, frames] 78 | 79 | index_frame = np.where(np.sum(score < 0.3, axis=1) > 0)[0] 80 | 81 | for frame in index_frame: 82 | less_threshold_joints = np.where(score[frame] < 0.3)[0] 83 | 84 | intersect = [i for i in [2, 3, 5, 6] if i in less_threshold_joints] 85 | 86 | if [2, 3, 5, 6] == intersect: 87 | kpts[frame, [2, 3, 5, 6]] = kpts[frame, [1, 1, 4, 4]] 88 | elif [2, 3, 6] == intersect: 89 | kpts[frame, [2, 3, 6]] = kpts[frame, [1, 1, 5]] 90 | elif [3, 5, 6] == intersect: 91 | kpts[frame, [3, 5, 6]] = kpts[frame, [2, 4, 4]] 92 | elif [3, 6] == intersect: 93 | kpts[frame, [3, 6]] = kpts[frame, [2, 5]] 94 | elif [3] == intersect: 95 | kpts[frame, 3] = kpts[frame, 2] 96 | elif [6] == intersect: 97 | kpts[frame, 6] = kpts[frame, 5] 98 | else: 99 | continue 100 | 101 | new_h36m_kpts[index, frames] = kpts 102 | 103 | return new_h36m_kpts 104 | 105 | 106 | -------------------------------------------------------------------------------- /demo/lib/sort/sort.py: -------------------------------------------------------------------------------- 1 | """ 2 | 
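SORT: Simple Online and Realtime Tracking (Bewley et al., 2016).
A Kalman-filter motion model combined with per-frame IoU-based assignment.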
https://arxiv.org/abs/1602.00763 3 | """ 4 | from __future__ import print_function 5 | 6 | from numba import jit 7 | import os.path 8 | import numpy as np 9 | from skimage import io 10 | from scipy.optimize import linear_sum_assignment 11 | import argparse 12 | from filterpy.kalman import KalmanFilter 13 | 14 | 15 | @jit 16 | def iou(bb_test, bb_gt): 17 | """ 18 | Computes IUO between two bboxes in the form [x1,y1,x2,y2] 19 | """ 20 | xx1 = np.maximum(bb_test[0], bb_gt[0]) 21 | yy1 = np.maximum(bb_test[1], bb_gt[1]) 22 | xx2 = np.minimum(bb_test[2], bb_gt[2]) 23 | yy2 = np.minimum(bb_test[3], bb_gt[3]) 24 | w = np.maximum(0., xx2 - xx1) 25 | h = np.maximum(0., yy2 - yy1) 26 | wh = w * h 27 | o = wh / ((bb_test[2] - bb_test[0]) * (bb_test[3] - bb_test[1]) 28 | + (bb_gt[2] - bb_gt[0]) * (bb_gt[3] - bb_gt[1]) - wh) 29 | 30 | return o 31 | 32 | 33 | def convert_bbox_to_z(bbox): 34 | """ 35 | Takes a bounding box in the form [x1,y1,x2,y2] and returns z in the form 36 | [x,y,s,r] where x,y is the centre of the box and s is the scale/area and r is 37 | the aspect ratio 38 | """ 39 | w = bbox[2] - bbox[0] 40 | h = bbox[3] - bbox[1] 41 | x = bbox[0] + w / 2. 42 | y = bbox[1] + h / 2. 43 | s = w * h # scale is just area 44 | r = w / float(h) 45 | return np.array([x, y, s, r]).reshape((4, 1)) 46 | 47 | 48 | def convert_x_to_bbox(x, score=None): 49 | """ 50 | Takes a bounding box in the centre form [x,y,s,r] and returns it in the form 51 | [x1,y1,x2,y2] where x1,y1 is the top left and x2,y2 is the bottom right 52 | """ 53 | w = np.sqrt(x[2] * x[3]) 54 | h = x[2] / w 55 | if (score == None): 56 | return np.array([x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., x[1] + h / 2.]).reshape((1, 4)) 57 | else: 58 | return np.array([x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., x[1] + h / 2., score]).reshape((1, 5)) 59 | 60 | 61 | class KalmanBoxTracker(object): 62 | """ 63 | This class represents the internel state of individual tracked objects observed as bbox. 64 | """ 65 | count = 0 66 | 67 | def __init__(self, bbox): 68 | """ 69 | Initialises a tracker using initial bounding box. 70 | """ 71 | # define constant velocity model 72 | self.kf = KalmanFilter(dim_x=7, dim_z=4) 73 | self.kf.F = np.array( 74 | [[1, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0], [0, 0, 1, 0, 0, 0, 1], [0, 0, 0, 1, 0, 0, 0], 75 | [0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 1]]) 76 | self.kf.H = np.array( 77 | [[1, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0]]) 78 | 79 | self.kf.R[2:, 2:] *= 10. 80 | self.kf.P[4:, 4:] *= 1000. # give high uncertainty to the unobservable initial velocities 81 | self.kf.P *= 10. 82 | self.kf.Q[-1, -1] *= 0.01 83 | self.kf.Q[4:, 4:] *= 0.01 84 | 85 | self.kf.x[:4] = convert_bbox_to_z(bbox) 86 | self.time_since_update = 0 87 | self.id = KalmanBoxTracker.count 88 | KalmanBoxTracker.count += 1 89 | self.history = [] 90 | self.hits = 0 91 | self.hit_streak = 0 92 | self.age = 0 93 | 94 | def update(self, bbox): 95 | """ 96 | Updates the state vector with observed bbox. 97 | """ 98 | self.time_since_update = 0 99 | self.history = [] 100 | self.hits += 1 101 | self.hit_streak += 1 102 | self.kf.update(convert_bbox_to_z(bbox)) 103 | 104 | def predict(self): 105 | """ 106 | Advances the state vector and returns the predicted bounding box estimate. 
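If the predicted area would become non-positive, the area-velocity term is zeroed before the Kalman predict step.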
107 | """ 108 | if ((self.kf.x[6] + self.kf.x[2]) <= 0): 109 | self.kf.x[6] *= 0.0 110 | self.kf.predict() 111 | self.age += 1 112 | if (self.time_since_update > 0): 113 | self.hit_streak = 0 114 | self.time_since_update += 1 115 | self.history.append(convert_x_to_bbox(self.kf.x)) 116 | return self.history[-1] 117 | 118 | def get_state(self): 119 | """ 120 | Returns the current bounding box estimate. 121 | """ 122 | return convert_x_to_bbox(self.kf.x) 123 | 124 | 125 | def associate_detections_to_trackers(detections, trackers, iou_threshold=0.3): 126 | """ 127 | Assigns detections to tracked object (both represented as bounding boxes) 128 | 129 | Returns 3 lists of matches, unmatched_detections and unmatched_trackers 130 | """ 131 | if (len(trackers) == 0): 132 | return np.empty((0, 2), dtype=int), np.arange(len(detections)), np.empty((0, 5), dtype=int) 133 | iou_matrix = np.zeros((len(detections), len(trackers)), dtype=np.float32) 134 | 135 | for d, det in enumerate(detections): 136 | for t, trk in enumerate(trackers): 137 | iou_matrix[d, t] = iou(det, trk) 138 | matched_indices = linear_sum_assignment(-iou_matrix) 139 | matched_indices = np.asarray(matched_indices) 140 | matched_indices = matched_indices.transpose() 141 | 142 | unmatched_detections = [] 143 | for d, det in enumerate(detections): 144 | if (d not in matched_indices[:, 0]): 145 | unmatched_detections.append(d) 146 | unmatched_trackers = [] 147 | for t, trk in enumerate(trackers): 148 | if (t not in matched_indices[:, 1]): 149 | unmatched_trackers.append(t) 150 | 151 | # filter out matched with low IOU 152 | matches = [] 153 | for m in matched_indices: 154 | if (iou_matrix[m[0], m[1]] < iou_threshold): 155 | unmatched_detections.append(m[0]) 156 | unmatched_trackers.append(m[1]) 157 | else: 158 | matches.append(m.reshape(1, 2)) 159 | if (len(matches) == 0): 160 | matches = np.empty((0, 2), dtype=int) 161 | else: 162 | matches = np.concatenate(matches, axis=0) 163 | 164 | return matches, np.array(unmatched_detections), np.array(unmatched_trackers) 165 | 166 | 167 | class Sort(object): 168 | def __init__(self, max_age=1, min_hits=3): 169 | """ 170 | Sets key parameters for SORT 171 | """ 172 | self.max_age = max_age 173 | self.min_hits = min_hits 174 | self.trackers = [] 175 | self.frame_count = 0 176 | 177 | def update(self, dets): 178 | """ 179 | Params: 180 | dets - a numpy array of detections in the format [[x1,y1,x2,y2,score],[x1,y1,x2,y2,score],...] 181 | Requires: this method must be called once for each frame even with empty detections. 182 | Returns the a similar array, where the last column is the object ID. 183 | 184 | NOTE: The number of objects returned may differ from the number of detections provided. 185 | """ 186 | self.frame_count += 1 187 | # get predicted locations from existing trackers. 
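# Each tracker's Kalman filter is advanced one frame; predictions containing NaNs are dropped.
# Detections are then matched to the surviving predictions on IoU via linear_sum_assignment
# (see associate_detections_to_trackers above); matched trackers are corrected with their
# detection, unmatched detections spawn new KalmanBoxTracker instances, and trackers that
# have not been updated for more than max_age frames are removed.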
188 | trks = np.zeros((len(self.trackers), 5)) 189 | to_del = [] 190 | ret = [] 191 | for t, trk in enumerate(trks): 192 | pos = self.trackers[t].predict()[0] 193 | trk[:] = [pos[0], pos[1], pos[2], pos[3], 0] 194 | if np.any(np.isnan(pos)): 195 | to_del.append(t) 196 | trks = np.ma.compress_rows(np.ma.masked_invalid(trks)) 197 | for t in reversed(to_del): 198 | self.trackers.pop(t) 199 | matched, unmatched_dets, unmatched_trks = associate_detections_to_trackers(dets, trks) 200 | 201 | # update matched trackers with assigned detections 202 | for t, trk in enumerate(self.trackers): 203 | if t not in unmatched_trks: 204 | d = matched[np.where(matched[:, 1] == t)[0], 0] # d: [n] 205 | trk.update(dets[d, :][0]) 206 | 207 | # create and initialise new trackers for unmatched detections 208 | for i in unmatched_dets: 209 | trk = KalmanBoxTracker(dets[i, :]) 210 | self.trackers.append(trk) 211 | i = len(self.trackers) 212 | for trk in reversed(self.trackers): 213 | d = trk.get_state()[0] 214 | if ((trk.time_since_update < 1) and (trk.hit_streak >= self.min_hits or self.frame_count <= self.min_hits)): 215 | ret.append(np.concatenate((d, [trk.id + 1])).reshape(1, -1)) # +1 as MOT benchmark requires positive 216 | i -= 1 217 | # remove dead tracklet 218 | if (trk.time_since_update > self.max_age): 219 | self.trackers.pop(i) 220 | if (len(ret) > 0): 221 | return np.concatenate(ret) 222 | return np.empty((0, 5)) 223 | 224 | 225 | def parse_args(): 226 | """Parse input arguments.""" 227 | parser = argparse.ArgumentParser(description='SORT demo') 228 | parser.add_argument('--display', dest='display', help='Display online tracker output (slow) [False]', 229 | action='store_true') 230 | args = parser.parse_args() 231 | return args 232 | -------------------------------------------------------------------------------- /demo/lib/yolov3/bbox.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import random 5 | import numpy as np 6 | import cv2 7 | 8 | 9 | def confidence_filter(result, confidence): 10 | conf_mask = (result[:,:,4] > confidence).float().unsqueeze(2) 11 | result = result*conf_mask 12 | 13 | return result 14 | 15 | 16 | def confidence_filter_cls(result, confidence): 17 | max_scores = torch.max(result[:,:,5:25], 2)[0] 18 | res = torch.cat((result, max_scores),2) 19 | print(res.shape) 20 | 21 | 22 | cond_1 = (res[:,:,4] > confidence).float() 23 | cond_2 = (res[:,:,25] > 0.995).float() 24 | 25 | conf = cond_1 + cond_2 26 | conf = torch.clamp(conf, 0.0, 1.0) 27 | conf = conf.unsqueeze(2) 28 | result = result*conf 29 | return result 30 | 31 | 32 | def get_abs_coord(box): 33 | box[2], box[3] = abs(box[2]), abs(box[3]) 34 | x1 = (box[0] - box[2]/2) - 1 35 | y1 = (box[1] - box[3]/2) - 1 36 | x2 = (box[0] + box[2]/2) - 1 37 | y2 = (box[1] + box[3]/2) - 1 38 | return x1, y1, x2, y2 39 | 40 | 41 | def sanity_fix(box): 42 | if (box[0] > box[2]): 43 | box[0], box[2] = box[2], box[0] 44 | 45 | if (box[1] > box[3]): 46 | box[1], box[3] = box[3], box[1] 47 | 48 | return box 49 | 50 | 51 | def bbox_iou(box1, box2): 52 | """ 53 | Returns the IoU of two bounding boxes 54 | 55 | """ 56 | # Get the coordinates of bounding boxes 57 | b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3] 58 | b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3] 59 | 60 | # get the corrdinates of the intersection rectangle 61 | inter_rect_x1 = torch.max(b1_x1, b2_x1) 62 | inter_rect_y1 = 
torch.max(b1_y1, b2_y1) 63 | inter_rect_x2 = torch.min(b1_x2, b2_x2) 64 | inter_rect_y2 = torch.min(b1_y2, b2_y2) 65 | 66 | # Intersection area 67 | if torch.cuda.is_available(): 68 | inter_area = torch.max(inter_rect_x2 - inter_rect_x1 + 1, torch.zeros(inter_rect_x2.shape).cuda())*torch.max(inter_rect_y2 - inter_rect_y1 + 1, torch.zeros(inter_rect_x2.shape).cuda()) 69 | else: 70 | inter_area = torch.max(inter_rect_x2 - inter_rect_x1 + 1, torch.zeros(inter_rect_x2.shape))*torch.max(inter_rect_y2 - inter_rect_y1 + 1, torch.zeros(inter_rect_x2.shape)) 71 | 72 | # Union Area 73 | b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1) 74 | b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1) 75 | 76 | iou = inter_area / (b1_area + b2_area - inter_area) 77 | 78 | return iou 79 | 80 | 81 | def pred_corner_coord(prediction): 82 | #Get indices of non-zero confidence bboxes 83 | ind_nz = torch.nonzero(prediction[:,:,4]).transpose(0,1).contiguous() 84 | 85 | box = prediction[ind_nz[0], ind_nz[1]] 86 | 87 | box_a = box.new(box.shape) 88 | box_a[:,0] = (box[:,0] - box[:,2]/2) 89 | box_a[:,1] = (box[:,1] - box[:,3]/2) 90 | box_a[:,2] = (box[:,0] + box[:,2]/2) 91 | box_a[:,3] = (box[:,1] + box[:,3]/2) 92 | box[:,:4] = box_a[:,:4] 93 | 94 | prediction[ind_nz[0], ind_nz[1]] = box 95 | 96 | return prediction 97 | 98 | 99 | def write(x, batches, results, colors, classes): 100 | c1 = tuple(x[1:3].int()) 101 | c2 = tuple(x[3:5].int()) 102 | img = results[int(x[0])] 103 | cls = int(x[-1]) 104 | label = "{0}".format(classes[cls]) 105 | color = random.choice(colors) 106 | cv2.rectangle(img, c1, c2,color, 1) 107 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0] 108 | c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4 109 | cv2.rectangle(img, c1, c2,color, -1) 110 | cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1); 111 | return img 112 | -------------------------------------------------------------------------------- /demo/lib/yolov3/cfg/tiny-yolo-voc.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | batch=64 3 | subdivisions=8 4 | width=416 5 | height=416 6 | channels=3 7 | momentum=0.9 8 | decay=0.0005 9 | angle=0 10 | saturation = 1.5 11 | exposure = 1.5 12 | hue=.1 13 | 14 | learning_rate=0.001 15 | max_batches = 40200 16 | policy=steps 17 | steps=-1,100,20000,30000 18 | scales=.1,10,.1,.1 19 | 20 | [convolutional] 21 | batch_normalize=1 22 | filters=16 23 | size=3 24 | stride=1 25 | pad=1 26 | activation=leaky 27 | 28 | [maxpool] 29 | size=2 30 | stride=2 31 | 32 | [convolutional] 33 | batch_normalize=1 34 | filters=32 35 | size=3 36 | stride=1 37 | pad=1 38 | activation=leaky 39 | 40 | [maxpool] 41 | size=2 42 | stride=2 43 | 44 | [convolutional] 45 | batch_normalize=1 46 | filters=64 47 | size=3 48 | stride=1 49 | pad=1 50 | activation=leaky 51 | 52 | [maxpool] 53 | size=2 54 | stride=2 55 | 56 | [convolutional] 57 | batch_normalize=1 58 | filters=128 59 | size=3 60 | stride=1 61 | pad=1 62 | activation=leaky 63 | 64 | [maxpool] 65 | size=2 66 | stride=2 67 | 68 | [convolutional] 69 | batch_normalize=1 70 | filters=256 71 | size=3 72 | stride=1 73 | pad=1 74 | activation=leaky 75 | 76 | [maxpool] 77 | size=2 78 | stride=2 79 | 80 | [convolutional] 81 | batch_normalize=1 82 | filters=512 83 | size=3 84 | stride=1 85 | pad=1 86 | activation=leaky 87 | 88 | [maxpool] 89 | size=2 90 | stride=1 91 | 92 | [convolutional] 93 | batch_normalize=1 94 | filters=1024 95 | size=3 96 | stride=1 97 | pad=1 98 | 
activation=leaky 99 | 100 | ########### 101 | 102 | [convolutional] 103 | batch_normalize=1 104 | size=3 105 | stride=1 106 | pad=1 107 | filters=1024 108 | activation=leaky 109 | 110 | [convolutional] 111 | size=1 112 | stride=1 113 | pad=1 114 | filters=125 115 | activation=linear 116 | 117 | [region] 118 | anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52 119 | bias_match=1 120 | classes=20 121 | coords=4 122 | num=5 123 | softmax=1 124 | jitter=.2 125 | rescore=1 126 | 127 | object_scale=5 128 | noobject_scale=1 129 | class_scale=1 130 | coord_scale=1 131 | 132 | absolute=1 133 | thresh = .6 134 | random=1 135 | -------------------------------------------------------------------------------- /demo/lib/yolov3/cfg/yolo-voc.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=64 4 | subdivisions=8 5 | # Training 6 | # batch=64 7 | # subdivisions=8 8 | height=416 9 | width=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 80200 21 | policy=steps 22 | steps=-1,500,40000,60000 23 | scales=0.1,10,.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | [maxpool] 34 | size=2 35 | stride=2 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=64 40 | size=3 41 | stride=1 42 | pad=1 43 | activation=leaky 44 | 45 | [maxpool] 46 | size=2 47 | stride=2 48 | 49 | [convolutional] 50 | batch_normalize=1 51 | filters=128 52 | size=3 53 | stride=1 54 | pad=1 55 | activation=leaky 56 | 57 | [convolutional] 58 | batch_normalize=1 59 | filters=64 60 | size=1 61 | stride=1 62 | pad=1 63 | activation=leaky 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=1 70 | pad=1 71 | activation=leaky 72 | 73 | [maxpool] 74 | size=2 75 | stride=2 76 | 77 | [convolutional] 78 | batch_normalize=1 79 | filters=256 80 | size=3 81 | stride=1 82 | pad=1 83 | activation=leaky 84 | 85 | [convolutional] 86 | batch_normalize=1 87 | filters=128 88 | size=1 89 | stride=1 90 | pad=1 91 | activation=leaky 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=256 96 | size=3 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [maxpool] 102 | size=2 103 | stride=2 104 | 105 | [convolutional] 106 | batch_normalize=1 107 | filters=512 108 | size=3 109 | stride=1 110 | pad=1 111 | activation=leaky 112 | 113 | [convolutional] 114 | batch_normalize=1 115 | filters=256 116 | size=1 117 | stride=1 118 | pad=1 119 | activation=leaky 120 | 121 | [convolutional] 122 | batch_normalize=1 123 | filters=512 124 | size=3 125 | stride=1 126 | pad=1 127 | activation=leaky 128 | 129 | [convolutional] 130 | batch_normalize=1 131 | filters=256 132 | size=1 133 | stride=1 134 | pad=1 135 | activation=leaky 136 | 137 | [convolutional] 138 | batch_normalize=1 139 | filters=512 140 | size=3 141 | stride=1 142 | pad=1 143 | activation=leaky 144 | 145 | [maxpool] 146 | size=2 147 | stride=2 148 | 149 | [convolutional] 150 | batch_normalize=1 151 | filters=1024 152 | size=3 153 | stride=1 154 | pad=1 155 | activation=leaky 156 | 157 | [convolutional] 158 | batch_normalize=1 159 | filters=512 160 | size=1 161 | stride=1 162 | pad=1 163 | activation=leaky 164 | 165 | [convolutional] 166 | batch_normalize=1 167 | filters=1024 168 | size=3 169 | stride=1 170 | pad=1 171 | activation=leaky 172 | 173 | 
[convolutional] 174 | batch_normalize=1 175 | filters=512 176 | size=1 177 | stride=1 178 | pad=1 179 | activation=leaky 180 | 181 | [convolutional] 182 | batch_normalize=1 183 | filters=1024 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | 190 | ####### 191 | 192 | [convolutional] 193 | batch_normalize=1 194 | size=3 195 | stride=1 196 | pad=1 197 | filters=1024 198 | activation=leaky 199 | 200 | [convolutional] 201 | batch_normalize=1 202 | size=3 203 | stride=1 204 | pad=1 205 | filters=1024 206 | activation=leaky 207 | 208 | [route] 209 | layers=-9 210 | 211 | [convolutional] 212 | batch_normalize=1 213 | size=1 214 | stride=1 215 | pad=1 216 | filters=64 217 | activation=leaky 218 | 219 | [reorg] 220 | stride=2 221 | 222 | [route] 223 | layers=-1,-4 224 | 225 | [convolutional] 226 | batch_normalize=1 227 | size=3 228 | stride=1 229 | pad=1 230 | filters=1024 231 | activation=leaky 232 | 233 | [convolutional] 234 | size=1 235 | stride=1 236 | pad=1 237 | filters=125 238 | activation=linear 239 | 240 | 241 | [region] 242 | anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071 243 | bias_match=1 244 | classes=20 245 | coords=4 246 | num=5 247 | softmax=1 248 | jitter=.3 249 | rescore=1 250 | 251 | object_scale=5 252 | noobject_scale=1 253 | class_scale=1 254 | coord_scale=1 255 | 256 | absolute=1 257 | thresh = .6 258 | random=1 259 | -------------------------------------------------------------------------------- /demo/lib/yolov3/cfg/yolo.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=8 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | [maxpool] 34 | size=2 35 | stride=2 36 | 37 | [convolutional] 38 | batch_normalize=1 39 | filters=64 40 | size=3 41 | stride=1 42 | pad=1 43 | activation=leaky 44 | 45 | [maxpool] 46 | size=2 47 | stride=2 48 | 49 | [convolutional] 50 | batch_normalize=1 51 | filters=128 52 | size=3 53 | stride=1 54 | pad=1 55 | activation=leaky 56 | 57 | [convolutional] 58 | batch_normalize=1 59 | filters=64 60 | size=1 61 | stride=1 62 | pad=1 63 | activation=leaky 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=1 70 | pad=1 71 | activation=leaky 72 | 73 | [maxpool] 74 | size=2 75 | stride=2 76 | 77 | [convolutional] 78 | batch_normalize=1 79 | filters=256 80 | size=3 81 | stride=1 82 | pad=1 83 | activation=leaky 84 | 85 | [convolutional] 86 | batch_normalize=1 87 | filters=128 88 | size=1 89 | stride=1 90 | pad=1 91 | activation=leaky 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=256 96 | size=3 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [maxpool] 102 | size=2 103 | stride=2 104 | 105 | [convolutional] 106 | batch_normalize=1 107 | filters=512 108 | size=3 109 | stride=1 110 | pad=1 111 | activation=leaky 112 | 113 | [convolutional] 114 | batch_normalize=1 115 | filters=256 116 | size=1 117 | stride=1 118 | pad=1 119 | activation=leaky 120 | 121 | [convolutional] 122 | batch_normalize=1 123 | filters=512 124 | size=3 125 | 
stride=1 126 | pad=1 127 | activation=leaky 128 | 129 | [convolutional] 130 | batch_normalize=1 131 | filters=256 132 | size=1 133 | stride=1 134 | pad=1 135 | activation=leaky 136 | 137 | [convolutional] 138 | batch_normalize=1 139 | filters=512 140 | size=3 141 | stride=1 142 | pad=1 143 | activation=leaky 144 | 145 | [maxpool] 146 | size=2 147 | stride=2 148 | 149 | [convolutional] 150 | batch_normalize=1 151 | filters=1024 152 | size=3 153 | stride=1 154 | pad=1 155 | activation=leaky 156 | 157 | [convolutional] 158 | batch_normalize=1 159 | filters=512 160 | size=1 161 | stride=1 162 | pad=1 163 | activation=leaky 164 | 165 | [convolutional] 166 | batch_normalize=1 167 | filters=1024 168 | size=3 169 | stride=1 170 | pad=1 171 | activation=leaky 172 | 173 | [convolutional] 174 | batch_normalize=1 175 | filters=512 176 | size=1 177 | stride=1 178 | pad=1 179 | activation=leaky 180 | 181 | [convolutional] 182 | batch_normalize=1 183 | filters=1024 184 | size=3 185 | stride=1 186 | pad=1 187 | activation=leaky 188 | 189 | 190 | ####### 191 | 192 | [convolutional] 193 | batch_normalize=1 194 | size=3 195 | stride=1 196 | pad=1 197 | filters=1024 198 | activation=leaky 199 | 200 | [convolutional] 201 | batch_normalize=1 202 | size=3 203 | stride=1 204 | pad=1 205 | filters=1024 206 | activation=leaky 207 | 208 | [route] 209 | layers=-9 210 | 211 | [convolutional] 212 | batch_normalize=1 213 | size=1 214 | stride=1 215 | pad=1 216 | filters=64 217 | activation=leaky 218 | 219 | [reorg] 220 | stride=2 221 | 222 | [route] 223 | layers=-1,-4 224 | 225 | [convolutional] 226 | batch_normalize=1 227 | size=3 228 | stride=1 229 | pad=1 230 | filters=1024 231 | activation=leaky 232 | 233 | [convolutional] 234 | size=1 235 | stride=1 236 | pad=1 237 | filters=425 238 | activation=linear 239 | 240 | 241 | [region] 242 | anchors = 0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828 243 | bias_match=1 244 | classes=80 245 | coords=4 246 | num=5 247 | softmax=1 248 | jitter=.3 249 | rescore=1 250 | 251 | object_scale=5 252 | noobject_scale=1 253 | class_scale=1 254 | coord_scale=1 255 | 256 | absolute=1 257 | thresh = .6 258 | random=1 259 | -------------------------------------------------------------------------------- /demo/lib/yolov3/cfg/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width= 320 9 | height = 320 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | 
batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | 
[convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 | pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | 
pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .5 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .5 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 
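Each `[convolutional]` block in these cfg files lists the filters, kernel size, stride, padding flag, and activation of one layer; `create_modules()` in `demo/lib/yolov3/darknet.py` (further below) turns such a block into a `Conv2d`, an optional `BatchNorm2d`, and a `LeakyReLU`. A minimal sketch of that mapping, using a hand-written example block rather than one parsed from this file:

```python
# Sketch: turning one [convolutional] cfg block into PyTorch layers,
# mirroring create_modules() in demo/lib/yolov3/darknet.py.
# The block dict and prev_filters value below are assumed examples.
import torch
import torch.nn as nn

block = {"type": "convolutional", "batch_normalize": "1",
         "filters": "256", "size": "3", "stride": "1", "pad": "1",
         "activation": "leaky"}
prev_filters = 128  # channels produced by the previous layer (assumed)

filters = int(block["filters"])
kernel_size = int(block["size"])
stride = int(block["stride"])
pad = (kernel_size - 1) // 2 if int(block["pad"]) else 0
use_bn = block.get("batch_normalize") == "1"

layers = [nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=not use_bn)]
if use_bn:
    layers.append(nn.BatchNorm2d(filters))
if block["activation"] == "leaky":
    layers.append(nn.LeakyReLU(0.1, inplace=True))
module = nn.Sequential(*layers)

x = torch.randn(1, prev_filters, 52, 52)  # dummy feature map
print(module(x).shape)                    # torch.Size([1, 256, 52, 52])
```

When `batch_normalize=1` the convolution is built without a bias, since the batch-norm layer supplies the offset; `load_weights()` below relies on the same convention.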
731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .5 787 | truth_thresh = 1 788 | random=1 789 | 790 | -------------------------------------------------------------------------------- /demo/lib/yolov3/darknet.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import numpy as np 7 | import cv2 8 | import os 9 | import sys 10 | 11 | from lib.yolov3.util import convert2cpu as cpu 12 | from lib.yolov3.util import predict_transform 13 | 14 | 15 | class test_net(nn.Module): 16 | def __init__(self, num_layers, input_size): 17 | super(test_net, self).__init__() 18 | self.num_layers= num_layers 19 | self.linear_1 = nn.Linear(input_size, 5) 20 | self.middle = nn.ModuleList([nn.Linear(5,5) for x in range(num_layers)]) 21 | self.output = nn.Linear(5,2) 22 | 23 | def forward(self, x): 24 | x = x.view(-1) 25 | fwd = nn.Sequential(self.linear_1, *self.middle, self.output) 26 | return fwd(x) 27 | 28 | 29 | def get_test_input(): 30 | img = cv2.imread("dog-cycle-car.png") 31 | img = cv2.resize(img, (416, 416)) 32 | img_ = img[:, :, ::-1].transpose((2, 0, 1)) 33 | img_ = img_[np.newaxis, :, :, :]/255.0 34 | img_ = torch.from_numpy(img_).float() 35 | return img_ 36 | 37 | 38 | def parse_cfg(cfgfile): 39 | """ 40 | Takes a configuration file 41 | 42 | Returns a list of blocks. Each blocks describes a block in the neural 43 | network to be built. 
Block is represented as a dictionary in the list 44 | 45 | """ 46 | # cfgfile = os.path.join(sys.path[-1], cfgfile) 47 | file = open(cfgfile, 'r') 48 | lines = file.read().split('\n') # store the lines in a list 49 | lines = [x for x in lines if len(x) > 0] # get read of the empty lines 50 | lines = [x for x in lines if x[0] != '#'] 51 | lines = [x.rstrip().lstrip() for x in lines] 52 | 53 | block = {} 54 | blocks = [] 55 | 56 | for line in lines: 57 | if line[0] == "[": # This marks the start of a new block 58 | if len(block) != 0: 59 | blocks.append(block) 60 | block = {} 61 | block["type"] = line[1:-1].rstrip() 62 | else: 63 | key,value = line.split("=") 64 | block[key.rstrip()] = value.lstrip() 65 | blocks.append(block) 66 | 67 | return blocks 68 | 69 | 70 | class MaxPoolStride1(nn.Module): 71 | def __init__(self, kernel_size): 72 | super(MaxPoolStride1, self).__init__() 73 | self.kernel_size = kernel_size 74 | self.pad = kernel_size - 1 75 | 76 | def forward(self, x): 77 | padded_x = F.pad(x, (0, self.pad, 0, self.pad), mode="replicate") 78 | pooled_x = nn.MaxPool2d(self.kernel_size, self.pad)(padded_x) 79 | return pooled_x 80 | 81 | 82 | class EmptyLayer(nn.Module): 83 | def __init__(self): 84 | super(EmptyLayer, self).__init__() 85 | 86 | 87 | class DetectionLayer(nn.Module): 88 | def __init__(self, anchors): 89 | super(DetectionLayer, self).__init__() 90 | self.anchors = anchors 91 | 92 | def forward(self, x, inp_dim, num_classes, confidence): 93 | x = x.data 94 | global CUDA 95 | prediction = x 96 | prediction = predict_transform(prediction, inp_dim, self.anchors, num_classes, confidence, CUDA) 97 | return prediction 98 | 99 | 100 | class Upsample(nn.Module): 101 | def __init__(self, stride=2): 102 | super(Upsample, self).__init__() 103 | self.stride = stride 104 | 105 | def forward(self, x): 106 | stride = self.stride 107 | assert(x.data.dim() == 4) 108 | B = x.data.size(0) 109 | C = x.data.size(1) 110 | H = x.data.size(2) 111 | W = x.data.size(3) 112 | ws = stride 113 | hs = stride 114 | x = x.view(B, C, H, 1, W, 1).expand(B, C, H, stride, W, stride).contiguous().view(B, C, H*stride, W*stride) 115 | return x 116 | 117 | 118 | class ReOrgLayer(nn.Module): 119 | def __init__(self, stride=2): 120 | super(ReOrgLayer, self).__init__() 121 | self.stride= stride 122 | 123 | def forward(self, x): 124 | assert(x.data.dim() == 4) 125 | B, C, H, W = x.data.shape 126 | hs = self.stride 127 | ws = self.stride 128 | assert(H % hs == 0), "The stride " + str(self.stride) + " is not a proper divisor of height " + str(H) 129 | assert(W % ws == 0), "The stride " + str(self.stride) + " is not a proper divisor of height " + str(W) 130 | x = x.view(B, C, H // hs, hs, W // ws, ws).transpose(-2, -3).contiguous() 131 | x = x.view(B, C, H // hs * W // ws, hs, ws) 132 | x = x.view(B, C, H // hs * W // ws, hs*ws).transpose(-1, -2).contiguous() 133 | x = x.view(B, C, ws*hs, H // ws, W // ws).transpose(1, 2).contiguous() 134 | x = x.view(B, C*ws*hs, H // ws, W // ws) 135 | return x 136 | 137 | 138 | def create_modules(blocks): 139 | net_info = blocks[0] # Captures the information about the input and pre-processing 140 | 141 | module_list = nn.ModuleList() 142 | 143 | index = 0 # indexing blocks helps with implementing route layers (skip connections) 144 | prev_filters = 3 145 | output_filters = [] 146 | 147 | for x in blocks: 148 | module = nn.Sequential() 149 | if x["type"] == "net": 150 | continue 151 | 152 | # If it's a convolutional layer 153 | if x["type"] == "convolutional": 154 | # Get the info about 
the layer 155 | activation = x["activation"] 156 | try: 157 | batch_normalize = int(x["batch_normalize"]) 158 | bias = False 159 | except: 160 | batch_normalize = 0 161 | bias = True 162 | 163 | filters= int(x["filters"]) 164 | padding = int(x["pad"]) 165 | kernel_size = int(x["size"]) 166 | stride = int(x["stride"]) 167 | 168 | if padding: 169 | pad = (kernel_size - 1) // 2 170 | else: 171 | pad = 0 172 | 173 | # Add the convolutional layer 174 | conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias) 175 | module.add_module("conv_{0}".format(index), conv) 176 | 177 | # Add the Batch Norm Layer 178 | if batch_normalize: 179 | bn = nn.BatchNorm2d(filters) 180 | module.add_module("batch_norm_{0}".format(index), bn) 181 | 182 | # Check the activation. 183 | # It is either Linear or a Leaky ReLU for YOLO 184 | if activation == "leaky": 185 | activn = nn.LeakyReLU(0.1, inplace = True) 186 | module.add_module("leaky_{0}".format(index), activn) 187 | 188 | # If it's an upsampling layer 189 | # We use Bilinear2dUpsampling 190 | 191 | elif x["type"] == "upsample": 192 | stride = int(x["stride"]) 193 | # upsample = Upsample(stride) 194 | upsample = nn.Upsample(scale_factor=2, mode="nearest") 195 | module.add_module("upsample_{}".format(index), upsample) 196 | 197 | # If it is a route layer 198 | elif (x["type"] == "route"): 199 | x["layers"] = x["layers"].split(',') 200 | 201 | # Start of a route 202 | start = int(x["layers"][0]) 203 | 204 | # end, if there exists one. 205 | try: 206 | end = int(x["layers"][1]) 207 | except: 208 | end = 0 209 | 210 | # Positive anotation 211 | if start > 0: 212 | start = start - index 213 | 214 | if end > 0: 215 | end = end - index 216 | 217 | route = EmptyLayer() 218 | module.add_module("route_{0}".format(index), route) 219 | 220 | if end < 0: 221 | filters = output_filters[index + start] + output_filters[index + end] 222 | else: 223 | filters = output_filters[index + start] 224 | 225 | # shortcut corresponds to skip connection 226 | elif x["type"] == "shortcut": 227 | from_ = int(x["from"]) 228 | shortcut = EmptyLayer() 229 | module.add_module("shortcut_{}".format(index), shortcut) 230 | 231 | elif x["type"] == "maxpool": 232 | stride = int(x["stride"]) 233 | size = int(x["size"]) 234 | if stride != 1: 235 | maxpool = nn.MaxPool2d(size, stride) 236 | else: 237 | maxpool = MaxPoolStride1(size) 238 | 239 | module.add_module("maxpool_{}".format(index), maxpool) 240 | 241 | # Yolo is the detection layer 242 | elif x["type"] == "yolo": 243 | mask = x["mask"].split(",") 244 | mask = [int(x) for x in mask] 245 | 246 | anchors = x["anchors"].split(",") 247 | anchors = [int(a) for a in anchors] 248 | anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)] 249 | anchors = [anchors[i] for i in mask] 250 | 251 | detection = DetectionLayer(anchors) 252 | module.add_module("Detection_{}".format(index), detection) 253 | 254 | else: 255 | print("Something I dunno") 256 | assert False 257 | 258 | module_list.append(module) 259 | prev_filters = filters 260 | output_filters.append(filters) 261 | index += 1 262 | 263 | return (net_info, module_list) 264 | 265 | 266 | class Darknet(nn.Module): 267 | def __init__(self, cfgfile): 268 | super(Darknet, self).__init__() 269 | self.blocks = parse_cfg(cfgfile) 270 | self.net_info, self.module_list = create_modules(self.blocks) 271 | self.header = torch.IntTensor([0, 0, 0, 0]) 272 | self.seen = 0 273 | 274 | def get_blocks(self): 275 | return self.blocks 276 | 277 | def get_module_list(self): 278 | 
return self.module_list 279 | 280 | def forward(self, x, CUDA): 281 | detections = [] 282 | modules = self.blocks[1:] 283 | outputs = {} # We cache the outputs for the route layer 284 | 285 | write = 0 286 | for i in range(len(modules)): 287 | 288 | module_type = (modules[i]["type"]) 289 | if module_type == "convolutional" or module_type == "upsample" or module_type == "maxpool": 290 | 291 | x = self.module_list[i](x) 292 | outputs[i] = x 293 | 294 | elif module_type == "route": 295 | layers = modules[i]["layers"] 296 | layers = [int(a) for a in layers] 297 | 298 | if (layers[0]) > 0: 299 | layers[0] = layers[0] - i 300 | 301 | if len(layers) == 1: 302 | x = outputs[i + (layers[0])] 303 | 304 | else: 305 | if (layers[1]) > 0: 306 | layers[1] = layers[1] - i 307 | 308 | map1 = outputs[i + layers[0]] 309 | map2 = outputs[i + layers[1]] 310 | 311 | x = torch.cat((map1, map2), 1) 312 | outputs[i] = x 313 | 314 | elif module_type == "shortcut": 315 | from_ = int(modules[i]["from"]) 316 | x = outputs[i-1] + outputs[i+from_] 317 | outputs[i] = x 318 | 319 | elif module_type == 'yolo': 320 | 321 | anchors = self.module_list[i][0].anchors 322 | # Get the input dimensions 323 | inp_dim = int(self.net_info["height"]) 324 | 325 | # Get the number of classes 326 | num_classes = int(modules[i]["classes"]) 327 | 328 | # Output the result 329 | x = x.data 330 | x = predict_transform(x, inp_dim, anchors, num_classes, CUDA) 331 | 332 | if type(x) == int: 333 | continue 334 | 335 | if not write: 336 | detections = x 337 | write = 1 338 | else: 339 | detections = torch.cat((detections, x), 1) 340 | 341 | outputs[i] = outputs[i-1] 342 | 343 | try: 344 | return detections 345 | except: 346 | return 0 347 | 348 | def load_weights(self, weightfile): 349 | # Introduction: https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-3/ 350 | # Open the weights file 351 | # weightfile = os.path.join(sys.path[-1], weightfile) 352 | fp = open(weightfile, "rb") 353 | 354 | # The first 5 values are header information 355 | # 1. Major version number 356 | # 2. Minor Version Number 357 | # 3. Subversion number 358 | # 4.5 Images seen by the network (during training) 359 | header = np.fromfile(fp, dtype = np.int32, count = 5) 360 | self.header = torch.from_numpy(header) 361 | self.seen = self.header[3] 362 | 363 | # The rest of the values are the weights 364 | # Let's load them up 365 | weights = np.fromfile(fp, dtype = np.float32) 366 | 367 | ptr = 0 368 | for i in range(len(self.module_list)): 369 | module_type = self.blocks[i + 1]["type"] 370 | 371 | if module_type == "convolutional": 372 | model = self.module_list[i] 373 | try: 374 | batch_normalize = int(self.blocks[i+1]["batch_normalize"]) 375 | except: 376 | batch_normalize = 0 377 | 378 | conv = model[0] 379 | 380 | if (batch_normalize): 381 | bn = model[1] 382 | 383 | # Get the number of weights of Batch Norm Layer 384 | num_bn_biases = bn.bias.numel() 385 | 386 | # Load the weights 387 | bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases]) 388 | ptr += num_bn_biases 389 | 390 | bn_weights = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 391 | ptr += num_bn_biases 392 | 393 | bn_running_mean = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 394 | ptr += num_bn_biases 395 | 396 | bn_running_var = torch.from_numpy(weights[ptr: ptr + num_bn_biases]) 397 | ptr += num_bn_biases 398 | 399 | # Cast the loaded weights into dims of model weights. 
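For reference, the `.weights` file consumed by `load_weights()` is a 5-value `int32` header followed by one flat `float32` array; for each convolutional layer with batch norm the serialized order is BN biases, BN weights, running mean, running variance, then the convolution weights, while plain conv layers store conv biases followed by conv weights. A short sketch of that pointer walk, kept separate from the class above — the weight path is the demo's default and the `take` helper is illustrative only:

```python
# Layout of the darknet .weights file that load_weights() walks through.
# The path is the demo's default (see human_detector.py); `take` is an
# illustrative helper, not part of darknet.py.
import numpy as np
import torch


def take(flat, ptr, tensor):
    """Copy the next tensor.numel() floats from `flat` into `tensor`."""
    n = tensor.numel()
    tensor.data.copy_(torch.from_numpy(flat[ptr:ptr + n]).view_as(tensor))
    return ptr + n


with open("demo/lib/checkpoint/yolov3.weights", "rb") as fp:
    header = np.fromfile(fp, dtype=np.int32, count=5)   # version numbers + images seen
    weights = np.fromfile(fp, dtype=np.float32)         # every parameter, flattened

ptr = 0
# For each convolutional layer with batch_normalize=1 the order is:
#   ptr = take(weights, ptr, bn.bias)
#   ptr = take(weights, ptr, bn.weight)
#   ptr = take(weights, ptr, bn.running_mean)
#   ptr = take(weights, ptr, bn.running_var)
#   ptr = take(weights, ptr, conv.weight)
# and for layers without batch norm:
#   ptr = take(weights, ptr, conv.bias)
#   ptr = take(weights, ptr, conv.weight)
```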
400 | bn_biases = bn_biases.view_as(bn.bias.data) 401 | bn_weights = bn_weights.view_as(bn.weight.data) 402 | bn_running_mean = bn_running_mean.view_as(bn.running_mean) 403 | bn_running_var = bn_running_var.view_as(bn.running_var) 404 | 405 | # Copy the data to model 406 | bn.bias.data.copy_(bn_biases) 407 | bn.weight.data.copy_(bn_weights) 408 | bn.running_mean.copy_(bn_running_mean) 409 | bn.running_var.copy_(bn_running_var) 410 | 411 | else: 412 | # Number of biases 413 | num_biases = conv.bias.numel() 414 | 415 | # Load the weights 416 | conv_biases = torch.from_numpy(weights[ptr: ptr + num_biases]) 417 | ptr = ptr + num_biases 418 | 419 | # reshape the loaded weights according to the dims of the model weights 420 | conv_biases = conv_biases.view_as(conv.bias.data) 421 | 422 | # Finally copy the data 423 | conv.bias.data.copy_(conv_biases) 424 | 425 | # Let us load the weights for the Convolutional layers 426 | num_weights = conv.weight.numel() 427 | 428 | # Do the same as above for weights 429 | conv_weights = torch.from_numpy(weights[ptr:ptr+num_weights]) 430 | ptr = ptr + num_weights 431 | 432 | conv_weights = conv_weights.view_as(conv.weight.data) 433 | conv.weight.data.copy_(conv_weights) 434 | -------------------------------------------------------------------------------- /demo/lib/yolov3/data/coco.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /demo/lib/yolov3/data/pallete: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/yolov3/data/pallete -------------------------------------------------------------------------------- /demo/lib/yolov3/data/voc.names: -------------------------------------------------------------------------------- 1 | aeroplane 2 | bicycle 3 | bird 4 | boat 5 | bottle 6 | bus 7 | car 8 | cat 9 | chair 10 | cow 11 | diningtable 12 | dog 13 | horse 14 | motorbike 15 | person 16 | pottedplant 17 | sheep 18 | sofa 19 | train 20 | tvmonitor 21 | -------------------------------------------------------------------------------- /demo/lib/yolov3/human_detector.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import time 3 | import torch 4 | import numpy as np 5 | import cv2 6 | import os 7 | import sys 8 | import random 9 | import pickle as pkl 
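A side note on the class list: `data/coco.names` (shown a few files above) stores one class name per line and `person` is its first entry, so the YOLO class index for people is 0 — exactly the index kept by `write_results(..., det_hm=True)` in `lib/yolov3/util.py`, which is how this detector returns only human boxes. A small sketch, assuming the repository root as the working directory:

```python
# Why the human detector can filter on class index 0: 'person' is the first
# line of coco.names, so its YOLO class index is 0.
names_path = "demo/lib/yolov3/data/coco.names"  # path assumes the repo root

with open(names_path) as fp:
    classes = [line.strip() for line in fp if line.strip()]

print(classes[0])               # 'person'
print(classes.index("person"))  # 0 -> the index compared against in det_hm filtering
```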
10 | import argparse 11 | 12 | from lib.yolov3.util import * 13 | from lib.yolov3.darknet import Darknet 14 | from lib.yolov3 import preprocess 15 | 16 | cur_dir = os.path.dirname(os.path.realpath(__file__)) 17 | project_root = os.path.join(cur_dir, '../../../') 18 | chk_root = os.path.join(project_root, 'checkpoint/') 19 | data_root = os.path.join(project_root, 'data/') 20 | 21 | 22 | sys.path.insert(0, project_root) 23 | sys.path.pop(0) 24 | 25 | 26 | def prep_image(img, inp_dim): 27 | """ 28 | Prepare image for inputting to the neural network. 29 | 30 | Returns a Variable 31 | """ 32 | ori_img = img 33 | dim = ori_img.shape[1], ori_img.shape[0] 34 | img = cv2.resize(ori_img, (inp_dim, inp_dim)) 35 | img_ = img[:, :, ::-1].transpose((2, 0, 1)).copy() 36 | img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0) 37 | return img_, ori_img, dim 38 | 39 | 40 | def write(x, img, colors): 41 | x = [int(i) for i in x] 42 | c1 = tuple(x[0:2]) 43 | c2 = tuple(x[2:4]) 44 | 45 | label = 'People {}'.format(0) 46 | color = (0, 0, 255) 47 | cv2.rectangle(img, c1, c2, color, 2) 48 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1, 1)[0] 49 | c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4 50 | cv2.rectangle(img, c1, c2, color, -1) 51 | cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225, 255, 255], 1) 52 | return img 53 | 54 | 55 | def arg_parse(): 56 | """" 57 | Parse arguements to the detect module 58 | 59 | """ 60 | parser = argparse.ArgumentParser(description='YOLO v3 Cam Demo') 61 | parser.add_argument('--confidence', dest='confidence', type=float, default=0.70, 62 | help='Object Confidence to filter predictions') 63 | parser.add_argument('--nms-thresh', dest='nms_thresh', type=float, default=0.4, help='NMS Threshold') 64 | parser.add_argument('--reso', dest='reso', default=416, type=int, help='Input resolution of the network. ' 65 | 'Increase to increase accuracy. Decrease to increase speed. (160, 416)') 66 | parser.add_argument('-wf', '--weight-file', type=str, default= 'demo/lib/checkpoint/yolov3.weights', help='The path' 67 | 'of model weight file') 68 | parser.add_argument('-cf', '--cfg-file', type=str, default=cur_dir + '/cfg/yolov3.cfg', help='weight file') 69 | parser.add_argument('-a', '--animation', action='store_true', help='output animation') 70 | parser.add_argument('-v', '--video', type=str, default='camera', help='The input video path') 71 | parser.add_argument("-f", "--figure", type=str, default='demo.jpg',help="input figure file name") 72 | parser.add_argument('-i', '--image', type=str, default=cur_dir + '/data/dog-cycle-car.png', 73 | help='The input video path') 74 | parser.add_argument('-np', '--num-person', type=int, default=1, help='number of estimated human poses. 
[1, 2]') 75 | parser.add_argument('--gpu', type=str, default='0', help='input video') 76 | 77 | return parser.parse_args() 78 | 79 | 80 | def load_model(args=None, CUDA=None, inp_dim=416): 81 | if args is None: 82 | args = arg_parse() 83 | 84 | if CUDA is None: 85 | CUDA = torch.cuda.is_available() 86 | 87 | # Set up the neural network 88 | model = Darknet(args.cfg_file) 89 | model.load_weights(args.weight_file) 90 | # print("YOLOv3 network successfully loaded") 91 | 92 | model.net_info["height"] = inp_dim 93 | assert inp_dim % 32 == 0 94 | assert inp_dim > 32 95 | 96 | # If there's a GPU availible, put the model on GPU 97 | if CUDA: 98 | model.cuda() 99 | 100 | # Set the model in evaluation mode 101 | model.eval() 102 | 103 | return model 104 | 105 | 106 | def yolo_human_det(img, model=None, reso=416, confidence=0.70): 107 | args = arg_parse() 108 | # args.reso = reso 109 | inp_dim = reso 110 | num_classes = 80 111 | 112 | CUDA = torch.cuda.is_available() 113 | if model is None: 114 | model = load_model(args, CUDA, inp_dim) 115 | 116 | if type(img) == str: 117 | assert os.path.isfile(img), 'The image path does not exist' 118 | img = cv2.imread(img) 119 | 120 | img, ori_img, img_dim = preprocess.prep_image(img, inp_dim) 121 | img_dim = torch.FloatTensor(img_dim).repeat(1, 2) 122 | 123 | with torch.no_grad(): 124 | if CUDA: 125 | img_dim = img_dim.cuda() 126 | img = img.cuda() 127 | output = model(img, CUDA) 128 | output = write_results(output, confidence, num_classes, nms=True, nms_conf=args.nms_thresh, det_hm=True) 129 | 130 | if len(output) == 0: 131 | return None, None 132 | 133 | img_dim = img_dim.repeat(output.size(0), 1) 134 | scaling_factor = torch.min(inp_dim / img_dim, 1)[0].view(-1, 1) 135 | 136 | output[:, [1, 3]] -= (inp_dim - scaling_factor * img_dim[:, 0].view(-1, 1)) / 2 137 | output[:, [2, 4]] -= (inp_dim - scaling_factor * img_dim[:, 1].view(-1, 1)) / 2 138 | output[:, 1:5] /= scaling_factor 139 | 140 | for i in range(output.shape[0]): 141 | output[i, [1, 3]] = torch.clamp(output[i, [1, 3]], 0.0, img_dim[i, 0]) 142 | output[i, [2, 4]] = torch.clamp(output[i, [2, 4]], 0.0, img_dim[i, 1]) 143 | 144 | bboxs = [] 145 | scores = [] 146 | for i in range(len(output)): 147 | item = output[i] 148 | bbox = item[1:5].cpu().numpy() 149 | # conver float32 to .2f data 150 | bbox = [round(i, 2) for i in list(bbox)] 151 | score = item[5].cpu().numpy() 152 | bboxs.append(bbox) 153 | scores.append(score) 154 | scores = np.expand_dims(np.array(scores), 1) 155 | bboxs = np.array(bboxs) 156 | 157 | return bboxs, scores 158 | -------------------------------------------------------------------------------- /demo/lib/yolov3/preprocess.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import numpy as np 5 | import cv2 6 | from PIL import Image 7 | 8 | 9 | def letterbox_image(img, inp_dim): 10 | '''resize image with unchanged aspect ratio using padding''' 11 | img_w, img_h = img.shape[1], img.shape[0] 12 | w, h = inp_dim 13 | new_w = int(img_w * min(w/img_w, h/img_h)) 14 | new_h = int(img_h * min(w/img_w, h/img_h)) 15 | resized_image = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_CUBIC) 16 | 17 | canvas = np.full((inp_dim[1], inp_dim[0], 3), 128) 18 | 19 | canvas[(h - new_h) // 2:(h - new_h) // 2 + new_h, (w - new_w) // 2:(w - new_w) // 2 + new_w, :] = resized_image 20 | 21 | return canvas 22 | 23 | 24 | def prep_image(img, inp_dim): 25 | """ 26 | Prepare image for inputting to the neural 
network. 27 | 28 | Returns a Variable 29 | """ 30 | if type(img) == str: 31 | orig_im = cv2.imread(img) 32 | else: 33 | orig_im = img 34 | dim = orig_im.shape[1], orig_im.shape[0] 35 | img = (letterbox_image(orig_im, (inp_dim, inp_dim))) 36 | img_ = img[:, :, ::-1].transpose((2, 0, 1)).copy() 37 | img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0) 38 | return img_, orig_im, dim 39 | 40 | 41 | def prep_image_pil(img, network_dim): 42 | orig_im = Image.open(img) 43 | img = orig_im.convert('RGB') 44 | dim = img.size 45 | img = img.resize(network_dim) 46 | img = torch.ByteTensor(torch.ByteStorage.from_buffer(img.tobytes())) 47 | img = img.view(*network_dim, 3).transpose(0, 1).transpose(0, 2).contiguous() 48 | img = img.view(1, 3, *network_dim) 49 | img = img.float().div(255.0) 50 | return img, orig_im, dim 51 | 52 | 53 | def inp_to_image(inp): 54 | inp = inp.cpu().squeeze() 55 | inp = inp * 255 56 | try: 57 | inp = inp.data.numpy() 58 | except RuntimeError: 59 | inp = inp.numpy() 60 | inp = inp.transpose(1, 2, 0) 61 | 62 | inp = inp[:, :, ::-1] 63 | return inp 64 | -------------------------------------------------------------------------------- /demo/lib/yolov3/util.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | 3 | import torch 4 | import numpy as np 5 | import cv2 6 | import os.path as osp 7 | from lib.yolov3.bbox import bbox_iou 8 | 9 | 10 | def get_path(cur_file): 11 | cur_dir = osp.dirname(osp.realpath(cur_file)) 12 | project_root = osp.join(cur_dir, '../../../') 13 | chk_root = osp.join(project_root, 'checkpoint/') 14 | data_root = osp.join(project_root, 'data/') 15 | 16 | return project_root, chk_root, data_root, cur_dir 17 | 18 | 19 | def count_parameters(model): 20 | return sum(p.numel() for p in model.parameters()) 21 | 22 | 23 | def count_learnable_parameters(model): 24 | return sum(p.numel() for p in model.parameters() if p.requires_grad) 25 | 26 | 27 | def convert2cpu(matrix): 28 | if matrix.is_cuda: 29 | return torch.FloatTensor(matrix.size()).copy_(matrix) 30 | else: 31 | return matrix 32 | 33 | 34 | def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True): 35 | batch_size = prediction.size(0) 36 | stride = inp_dim // prediction.size(2) 37 | grid_size = inp_dim // stride 38 | bbox_attrs = 5 + num_classes 39 | num_anchors = len(anchors) 40 | 41 | anchors = [(a[0]/stride, a[1]/stride) for a in anchors] 42 | 43 | prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size) 44 | prediction = prediction.transpose(1, 2).contiguous() 45 | prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs) 46 | 47 | # Sigmoid the centre_X, centre_Y. 
and object confidencce 48 | prediction[:, :, 0] = torch.sigmoid(prediction[:, :, 0]) 49 | prediction[:, :, 1] = torch.sigmoid(prediction[:, :, 1]) 50 | prediction[:, :, 4] = torch.sigmoid(prediction[:, :, 4]) 51 | 52 | # Add the center offsets 53 | grid_len = np.arange(grid_size) 54 | a, b = np.meshgrid(grid_len, grid_len) 55 | 56 | x_offset = torch.FloatTensor(a).view(-1, 1) 57 | y_offset = torch.FloatTensor(b).view(-1, 1) 58 | 59 | if CUDA: 60 | x_offset = x_offset.cuda() 61 | y_offset = y_offset.cuda() 62 | 63 | x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1, 2).unsqueeze(0) 64 | 65 | prediction[:, :, :2] += x_y_offset 66 | 67 | # log space transform height and the width 68 | anchors = torch.FloatTensor(anchors) 69 | 70 | if CUDA: 71 | anchors = anchors.cuda() 72 | 73 | anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0) 74 | prediction[:, :, 2:4] = torch.exp(prediction[:, :, 2:4])*anchors 75 | 76 | # Softmax the class scores 77 | prediction[:, :, 5: 5 + num_classes] = torch.sigmoid((prediction[:, :, 5: 5 + num_classes])) 78 | 79 | prediction[:, :, :4] *= stride 80 | 81 | return prediction 82 | 83 | 84 | def load_classes(namesfile): 85 | fp = open(namesfile, "r") 86 | names = fp.read().split("\n")[:-1] 87 | return names 88 | 89 | 90 | def get_im_dim(im): 91 | im = cv2.imread(im) 92 | w, h = im.shape[1], im.shape[0] 93 | return w, h 94 | 95 | 96 | def unique(tensor): 97 | tensor_np = tensor.cpu().numpy() 98 | unique_np = np.unique(tensor_np) 99 | unique_tensor = torch.from_numpy(unique_np) 100 | 101 | tensor_res = tensor.new(unique_tensor.shape) 102 | tensor_res.copy_(unique_tensor) 103 | return tensor_res 104 | 105 | 106 | # ADD SOFT NMS 107 | def write_results(prediction, confidence, num_classes, nms=True, nms_conf=0.4, det_hm=False): 108 | """ 109 | https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-4/ 110 | prediction: (B x 10647 x 85) 111 | B: the number of images in a batch, 112 | 10647: the number of bounding boxes predicted per image. (52×52+26×26+13×13)×3=10647 113 | 85: the number of bounding box attributes. 
(c_x, c_y, w, h, object confidence, and 80 class scores) 114 | 115 | output: Num_obj × [img_index, x_1, y_1, x_2, y_2, object confidence, class_score, label_index] 116 | """ 117 | 118 | conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2) 119 | prediction = prediction*conf_mask 120 | 121 | box_a = prediction.new(prediction.shape) 122 | box_a[:, :, 0] = (prediction[:, :, 0] - prediction[:, :, 2]/2) 123 | box_a[:, :, 1] = (prediction[:, :, 1] - prediction[:, :, 3]/2) 124 | box_a[:, :, 2] = (prediction[:, :, 0] + prediction[:, :, 2]/2) 125 | box_a[:, :, 3] = (prediction[:, :, 1] + prediction[:, :, 3]/2) 126 | prediction[:, :, :4] = box_a[:, :, :4] 127 | 128 | batch_size = prediction.size(0) 129 | 130 | output = prediction.new(1, prediction.size(2) + 1) 131 | write = False 132 | 133 | for ind in range(batch_size): 134 | # select the image from the batch 135 | image_pred = prediction[ind] 136 | 137 | # Get the class having maximum score, and the index of that class 138 | # Get rid of num_classes softmax scores 139 | # Add the class index and the class score of class having maximum score 140 | max_conf, max_conf_index = torch.max(image_pred[:, 5:5 + num_classes], 1) 141 | max_conf = max_conf.float().unsqueeze(1) 142 | max_conf_index = max_conf_index.float().unsqueeze(1) 143 | seq = (image_pred[:, :5], max_conf, max_conf_index) 144 | image_pred = torch.cat(seq, 1) # image_pred:(10647, 7) 7:[x1, y1, x2, y2, obj_score, max_conf, max_conf_index] 145 | 146 | # Get rid of the zero entries 147 | non_zero_ind = (torch.nonzero(image_pred[:, 4])) 148 | image_pred__ = image_pred[non_zero_ind.squeeze(), :].view(-1, 7) 149 | 150 | # filters out people id 151 | if det_hm: 152 | cls_mask = (image_pred__[:, -1] == 0).float() 153 | class_mask_ind = torch.nonzero(cls_mask).squeeze() 154 | image_pred_ = image_pred__[class_mask_ind].view(-1, 7) 155 | 156 | if torch.sum(cls_mask) == 0: 157 | return image_pred_ 158 | else: 159 | image_pred_ = image_pred__ 160 | 161 | # Get the various classes detected in the image 162 | try: 163 | # img_classes = unique(image_pred_[:, -1]) 164 | img_classes = torch.unique(image_pred_[:, -1], sorted=True).float() 165 | except: 166 | continue 167 | 168 | # We will do NMS classwise 169 | # import ipdb;ipdb.set_trace() 170 | for cls in img_classes: 171 | # get the detections with one particular class 172 | cls_mask = image_pred_*(image_pred_[:, -1] == cls).float().unsqueeze(1) 173 | class_mask_ind = torch.nonzero(cls_mask[:, -2]).squeeze() 174 | image_pred_class = image_pred_[class_mask_ind].view(-1, 7) 175 | 176 | # sort the detections such that the entry with the maximum objectness 177 | # confidence is at the top 178 | conf_sort_index = torch.sort(image_pred_class[:, 4], descending=True)[1] 179 | image_pred_class = image_pred_class[conf_sort_index] 180 | idx = image_pred_class.size(0) 181 | 182 | # from soft_NMS import soft_nms 183 | # boxes = image_pred_class[:,:4] 184 | # scores = image_pred_class[:, 4] 185 | # k, N = soft_nms(boxes, scores, method=2) 186 | # image_pred_class = image_pred_class[k] 187 | 188 | # if nms has to be done 189 | if nms: 190 | # For each detection 191 | for i in range(idx): 192 | # Get the IOUs of all boxes that come after the one we are looking at 193 | # in the loop 194 | try: 195 | ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:]) 196 | except ValueError: 197 | break 198 | 199 | except IndexError: 200 | break 201 | 202 | # Zero out all the detections that have IoU > threshold 203 | iou_mask = (ious < 
nms_conf).float().unsqueeze(1) 204 | image_pred_class[i+1:] *= iou_mask 205 | 206 | # Remove the zero entries 207 | non_zero_ind = torch.nonzero(image_pred_class[:, 4]).squeeze() 208 | image_pred_class = image_pred_class[non_zero_ind].view(-1, 7) 209 | 210 | # Concatenate the batch_id of the image to the detection 211 | # this helps us identify which image does the detection correspond to 212 | # We use a linear structure to hold ALL the detections from the batch 213 | # the batch_dim is flattened 214 | # batch is identified by extra batch column 215 | 216 | batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind) 217 | seq = batch_ind, image_pred_class 218 | if not write: 219 | output = torch.cat(seq, 1) 220 | write = True 221 | else: 222 | out = torch.cat(seq, 1) 223 | output = torch.cat((output, out)) 224 | 225 | return output 226 | -------------------------------------------------------------------------------- /demo/vis.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import cv2 3 | 4 | from lib.preprocess import h36m_coco_format, revise_kpts 5 | from lib.hrnet.gen_kpts import gen_video_kpts as hrnet_pose 6 | import os 7 | import numpy as np 8 | import torch 9 | import glob 10 | from tqdm import tqdm 11 | import copy 12 | import shutil 13 | from IPython import embed 14 | 15 | 16 | sys.path.append(os.getcwd()) 17 | from model.GCN_conv import adj_mx_from_skeleton 18 | from model.trans import HTNet 19 | from common.camera import * 20 | from common.h36m_dataset import Human36mDataset 21 | from common.camera import camera_to_world 22 | from common.opt import opts 23 | opt = opts().parse() 24 | 25 | import matplotlib 26 | import matplotlib.pyplot as plt 27 | import matplotlib.gridspec as gridspec 28 | 29 | os.environ["CUDA_VISIBLE_DEVICES"] = opt.gpu 30 | 31 | plt.switch_backend('agg') 32 | matplotlib.rcParams['pdf.fonttype'] = 42 33 | matplotlib.rcParams['ps.fonttype'] = 42 34 | 35 | dataset_path = './dataset/data_3d_h36m.npz' 36 | dataset = Human36mDataset(dataset_path, opt) 37 | adj = adj_mx_from_skeleton(dataset.skeleton()) 38 | 39 | def show2Dpose(kps, img): 40 | connections = [[0, 1], [1, 2], [2, 3], [0, 4], [4, 5], 41 | [5, 6], [0, 7], [7, 8], [8, 9], [9, 10], 42 | [8, 11], [11, 12], [12, 13], [8, 14], [14, 15], [15, 16]] 43 | 44 | LR = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0], dtype=bool) 45 | 46 | lcolor = (255, 0, 0) 47 | rcolor = (0, 0, 255) 48 | thickness = 3 49 | 50 | for j,c in enumerate(connections): 51 | start = map(int, kps[c[0]]) 52 | end = map(int, kps[c[1]]) 53 | start = list(start) 54 | end = list(end) 55 | cv2.line(img, (start[0], start[1]), (end[0], end[1]), lcolor if LR[j] else rcolor, thickness) 56 | cv2.circle(img, (start[0], start[1]), thickness=-1, color=(0, 255, 0), radius=3) 57 | cv2.circle(img, (end[0], end[1]), thickness=-1, color=(0, 255, 0), radius=3) 58 | 59 | return img 60 | 61 | 62 | 63 | def show3Dpose(vals, ax): 64 | ax.view_init(elev=15., azim=70) 65 | 66 | lcolor=(0,0,1) 67 | rcolor=(1,0,0) 68 | 69 | I = np.array( [0, 0, 1, 4, 2, 5, 0, 7, 8, 8, 14, 15, 11, 12, 8, 9]) 70 | J = np.array( [1, 4, 2, 5, 3, 6, 7, 8, 14, 11, 15, 16, 12, 13, 9, 10]) 71 | 72 | LR = np.array([0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0], dtype=bool) 73 | 74 | for i in np.arange( len(I) ): 75 | x, y, z = [np.array( [vals[I[i], j], vals[J[i], j]] ) for j in range(3)] 76 | ax.plot(x, y, z, lw=2, color = lcolor if LR[i] else rcolor) 77 | 78 | RADIUS = 0.72 79 | RADIUS_Z = 0.7 80 | 81 | 
xroot, yroot, zroot = vals[0,0], vals[0,1], vals[0,2] 82 | ax.set_xlim3d([-RADIUS+xroot, RADIUS+xroot]) 83 | ax.set_ylim3d([-RADIUS+yroot, RADIUS+yroot]) 84 | ax.set_zlim3d([-RADIUS_Z+zroot, RADIUS_Z+zroot]) 85 | ax.set_aspect('auto') # works fine in matplotlib==2.2.2 86 | 87 | white = (1.0, 1.0, 1.0, 0.0) 88 | ax.xaxis.set_pane_color(white) 89 | ax.yaxis.set_pane_color(white) 90 | ax.zaxis.set_pane_color(white) 91 | 92 | ax.tick_params('x', labelbottom = False) 93 | ax.tick_params('y', labelleft = False) 94 | ax.tick_params('z', labelleft = False) 95 | 96 | 97 | 98 | 99 | def showimage(ax, img): 100 | ax.set_xticks([]) 101 | ax.set_yticks([]) 102 | plt.axis('off') 103 | ax.imshow(img) 104 | 105 | 106 | def get_pose3D(figure_path, output_dir, file_name): 107 | # Genarate 2D pose 108 | keypoints, scores = hrnet_pose(figure_path, det_dim=416, num_peroson=1, gen_output=True) 109 | keypoints, scores, valid_frames = h36m_coco_format(keypoints, scores) 110 | 111 | ## Reload 112 | previous_dir = './ckpt/cpn' 113 | model = HTNet(opt, adj).cuda() 114 | model_dict = model.state_dict() 115 | model_path = sorted(glob.glob(os.path.join(previous_dir, '*.pth')))[0] 116 | pre_dict = torch.load(model_path) 117 | for name, key in model_dict.items(): 118 | model_dict[name] = pre_dict[name] 119 | model.load_state_dict(model_dict) 120 | model.eval() 121 | 122 | ## 3D 123 | img = cv2.imread(figure_path) 124 | img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) 125 | img_size = img.shape 126 | input_2D_no = keypoints[:,0,:,:] 127 | 128 | joints_left = [4, 5, 6, 11, 12, 13] 129 | joints_right = [1, 2, 3, 14, 15, 16] 130 | 131 | input_2D = normalize_screen_coordinates(input_2D_no, w=img_size[1], h=img_size[0]) 132 | 133 | input_2D_aug = copy.deepcopy(input_2D) 134 | input_2D_aug[ :, :, 0] *= -1 135 | input_2D_aug[ :, joints_left + joints_right] = input_2D_aug[ :, joints_right + joints_left] 136 | input_2D = np.concatenate((np.expand_dims(input_2D, axis=0), np.expand_dims(input_2D_aug, axis=0)), 0) 137 | 138 | input_2D = input_2D[np.newaxis, :, :, :, :] 139 | 140 | input_2D = torch.from_numpy(input_2D.astype('float32')).cuda() 141 | 142 | N = input_2D.size(0) 143 | 144 | ## estimation 145 | output_3D_non_flip = model(input_2D[:, 0]) 146 | output_3D_flip = model(input_2D[:, 1]) 147 | 148 | output_3D_flip[:, :, :, 0] *= -1 149 | output_3D_flip[:, :, joints_left + joints_right, :] = output_3D_flip[:, :, joints_right + joints_left, :] 150 | 151 | output_3D = (output_3D_non_flip + output_3D_flip) / 2 152 | 153 | output_3D = output_3D[0:, opt.pad].unsqueeze(1) 154 | output_3D[:, :, 0, :] = 0 155 | post_out = output_3D[0, 0].cpu().detach().numpy() 156 | 157 | rot = [0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088] 158 | rot = np.array(rot, dtype='float32') 159 | post_out = camera_to_world(post_out, R=rot, t=0) 160 | post_out[:, 2] -= np.min(post_out[:, 2]) 161 | 162 | input_2D_no = input_2D_no[opt.pad] 163 | 164 | ## 2D 165 | image = show2Dpose(input_2D_no, copy.deepcopy(img)) 166 | 167 | 168 | ## 3D 169 | fig = plt.figure( figsize=(9.6, 5.4)) 170 | gs = gridspec.GridSpec(1, 1) 171 | gs.update(wspace=-0.00, hspace=0.05) 172 | ax = plt.subplot(gs[0], projection='3d') 173 | show3Dpose( post_out, ax) 174 | 175 | 176 | output_dir_3D = output_dir +'pose3D/' 177 | os.makedirs(output_dir_3D, exist_ok=True) 178 | plt.savefig(output_dir_3D + '_3D.png', dpi=200, format='png', bbox_inches = 'tight') 179 | 180 | 181 | 182 | 183 | ## all 184 | image_3d_dir = sorted(glob.glob(os.path.join(output_dir_3D, 
'*.png'))) 185 | 186 | for i in range(len(image_3d_dir)): 187 | image_2d = image 188 | image_3d = plt.imread(image_3d_dir[i]) 189 | ## crop 190 | edge = (image_2d.shape[1] - image_2d.shape[0]) // 2 191 | image_2d = image_2d[:, edge:image_2d.shape[1] - edge] 192 | 193 | edge = 130 194 | image_3d = image_3d[edge:image_3d.shape[0] - edge, edge:image_3d.shape[1] - edge] 195 | ## show 196 | font_size = 12 197 | ax = plt.subplot(121) 198 | showimage(ax, image_2d) 199 | ax.set_title("Input", fontsize = font_size) 200 | 201 | ax = plt.subplot(122) 202 | showimage(ax, image_3d) 203 | ax.set_title("Pose", fontsize = font_size) 204 | 205 | ## save 206 | output_dir_pose = output_dir 207 | plt.savefig(output_dir + file_name + '_pose.png', dpi=200, bbox_inches = 'tight') 208 | 209 | shutil.rmtree("./demo/output/pose3D") 210 | 211 | 212 | 213 | if __name__ == "__main__": 214 | items = os.listdir('./demo/figure/') 215 | print(items) 216 | for i, file_name in enumerate(items): 217 | print("Generate Pose For " + file_name) 218 | figure_path = './demo/figure/' + file_name 219 | output_dir = './demo/output/' 220 | get_pose3D(figure_path, output_dir, file_name[:-4]) 221 | 222 | 223 | 224 | -------------------------------------------------------------------------------- /figure/README.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /figure/messi_pose.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/figure/messi_pose.png -------------------------------------------------------------------------------- /figure/structure.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/figure/structure.png -------------------------------------------------------------------------------- /figure/wild.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/figure/wild.png -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import glob 3 | import torch 4 | import random 5 | import logging 6 | import numpy as np 7 | from tqdm import tqdm 8 | import torch.utils.data 9 | import torch.optim as optim 10 | 11 | from common.opt import opts 12 | from common.utils import * 13 | from common.load_data_hm36 import Fusion 14 | from common.h36m_dataset import Human36mDataset 15 | from model.GCN_conv import adj_mx_from_skeleton 16 | from model.trans import HTNet 17 | 18 | opt = opts().parse() 19 | os.environ["CUDA_VISIBLE_DEVICES"] = opt.gpu 20 | from tensorboardX import SummaryWriter 21 | 22 | writer = SummaryWriter(log_dir='./runs/' + opt.model_name) 23 | 24 | 25 | def train(opt, actions, train_loader, model, optimizer, epoch): 26 | return step('train', opt, actions, train_loader, model, optimizer, epoch) 27 | 28 | def val(opt, actions, val_loader, model): 29 | with torch.no_grad(): 30 | return step('test', opt, actions, val_loader, model) 31 | 32 | def step(split, opt, actions, dataLoader, model, optimizer=None, epoch=None): 33 | loss_all = {'loss': AccumLoss()} 34 | action_error_sum = define_error_list(actions) 35 | if 
split == 'train': 36 | model.train() 37 | else: 38 | model.eval() 39 | for i, data in enumerate(tqdm(dataLoader, 0)): 40 | batch_cam, gt_3D, input_2D, action, subject, scale, bb_box, cam_ind = data 41 | [input_2D, gt_3D, batch_cam, scale, bb_box] = get_varialbe(split, [input_2D, gt_3D, batch_cam, scale, bb_box]) 42 | if split =='train': 43 | output_3D = model(input_2D) 44 | else: 45 | input_2D, output_3D = input_augmentation(input_2D, model) 46 | out_target = gt_3D.clone() 47 | out_target[:, :, 0] = 0 48 | if split == 'train': 49 | loss = mpjpe_cal(output_3D, out_target) 50 | N = input_2D.size(0) 51 | loss_all['loss'].update(loss.detach().cpu().numpy() * N, N) 52 | optimizer.zero_grad() 53 | loss.backward() 54 | optimizer.step() 55 | elif split == 'test': 56 | output_3D = output_3D[:, opt.pad].unsqueeze(1) 57 | output_3D[:, :, 0, :] = 0 58 | action_error_sum = test_calculation(output_3D, out_target, action, action_error_sum, opt.dataset, subject) 59 | if split == 'train': 60 | return loss_all['loss'].avg 61 | elif split == 'test': 62 | p1, p2 = print_error(opt.dataset, action_error_sum, opt.train) 63 | return p1, p2 64 | 65 | def input_augmentation(input_2D, model): 66 | joints_left = [4, 5, 6, 11, 12, 13] 67 | joints_right = [1, 2, 3, 14, 15, 16] 68 | input_2D_non_flip = input_2D[:, 0] 69 | input_2D_flip = input_2D[:, 1] 70 | output_3D_non_flip = model(input_2D_non_flip) 71 | output_3D_flip = model(input_2D_flip) 72 | output_3D_flip[:, :, :, 0] *= -1 73 | output_3D_flip[:, :, joints_left + joints_right, :] = output_3D_flip[:, :, joints_right + joints_left, :] 74 | output_3D = (output_3D_non_flip + output_3D_flip) / 2 75 | input_2D = input_2D_non_flip 76 | return input_2D, output_3D 77 | 78 | if __name__ == '__main__': 79 | manualSeed = opt.seed 80 | random.seed(manualSeed) 81 | torch.manual_seed(manualSeed) 82 | np.random.seed(manualSeed) 83 | torch.cuda.manual_seed_all(manualSeed) 84 | torch.backends.cudnn.benchmark = False 85 | torch.backends.cudnn.deterministic = True 86 | 87 | print("lr: ", opt.lr) 88 | print("batch_size: ", opt.batch_size) 89 | print("channel: ", opt.channel) 90 | print("GPU: ", opt.gpu) 91 | 92 | if opt.train: 93 | logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%Y/%m/%d %H:%M:%S', \ 94 | filename=os.path.join(opt.checkpoint, 'train.log'), level=logging.INFO) 95 | 96 | root_path = opt.root_path 97 | dataset_path = root_path + 'data_3d_' + opt.dataset + '.npz' 98 | 99 | dataset = Human36mDataset(dataset_path, opt) 100 | actions = define_actions(opt.actions) 101 | adj = adj_mx_from_skeleton(dataset.skeleton()) 102 | 103 | 104 | if opt.train: 105 | train_data = Fusion(opt=opt, train=True, dataset=dataset, root_path=root_path) 106 | train_dataloader = torch.utils.data.DataLoader(train_data, batch_size=opt.batch_size, 107 | shuffle=True, num_workers=int(opt.workers), pin_memory=True) 108 | 109 | test_data = Fusion(opt=opt, train=False, dataset=dataset, root_path =root_path) 110 | test_dataloader = torch.utils.data.DataLoader(test_data, batch_size=opt.batch_size, 111 | shuffle=False, num_workers=int(opt.workers), pin_memory=True) 112 | 113 | model = HTNet(opt,adj).cuda() 114 | 115 | if opt.reload: 116 | model_dict = model.state_dict() 117 | model_path = sorted(glob.glob(os.path.join(opt.previous_dir, '*.pth')))[0] 118 | print(model_path) 119 | pre_dict = torch.load(model_path) 120 | pre_key = pre_dict.keys() 121 | for name, key in model_dict.items(): 122 | model_dict[name] = pre_dict[name] 123 | model.load_state_dict(model_dict) 124 | 125 | model_params = 0 
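# Sum up the model's parameter count (printed below in millions):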
126 | for parameter in model.parameters(): 127 | model_params += parameter.numel() 128 | print('INFO: Trainable parameter count:', model_params / 1000000) 129 | 130 | 131 | all_param = [] 132 | lr = opt.lr 133 | all_param += list(model.parameters()) 134 | optimizer = optim.Adam(all_param, lr=opt.lr, amsgrad=True) 135 | scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.317, patience=5, verbose=True) 136 | 137 | for epoch in range(1, opt.nepoch): 138 | if opt.train: 139 | loss = train(opt, actions, train_dataloader, model, optimizer, epoch) 140 | p1, p2 = val(opt, actions, test_dataloader, model) 141 | writer.add_scalar('mpjpe',p1,epoch) 142 | writer.add_scalar('p2',p2,epoch) 143 | 144 | if opt.train and p1 < opt.previous_best_threshold: 145 | opt.previous_name = save_model(opt.previous_name, opt.checkpoint, epoch, p1, model) 146 | opt.previous_best_threshold = p1 147 | if opt.train == 0: 148 | print('p1: %.2f, p2: %.2f' % (p1, p2)) 149 | break 150 | else: 151 | logging.info('epoch: %d, lr: %.7f, loss: %.4f, p1: %.2f, p2: %.2f' % (epoch, lr, loss, p1, p2)) 152 | print('e: %d, lr: %.7f, loss: %.4f, p1: %.2f, p2: %.2f' % (epoch, lr, loss, p1, p2)) 153 | if epoch % opt.large_decay_epoch == 0: 154 | for param_group in optimizer.param_groups: 155 | param_group['lr'] *= opt.lr_decay_large 156 | lr *= opt.lr_decay_large 157 | else: 158 | for param_group in optimizer.param_groups: 159 | param_group['lr'] *= opt.lr_decay 160 | lr *= opt.lr_decay 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | -------------------------------------------------------------------------------- /model/Block.py: -------------------------------------------------------------------------------- 1 | from functools import partial 2 | import torch 3 | import torch.nn as nn 4 | 5 | from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD 6 | from timm.models.layers import DropPath 7 | from model.GCN_conv import ModulatedGraphConv 8 | from model.Transformer import Attention, Mlp 9 | 10 | #X_1 11 | rl_2joints = [2,3] 12 | ll_2joints = [5,6] 13 | la_2joints = [12,13] 14 | ra_2joints = [15,16] 15 | part_2joints = [rl_2joints,ll_2joints,la_2joints,ra_2joints] 16 | # X_2 17 | rl_3joints = [1,2,3] 18 | ll_3joints = [4,5,6] 19 | ra_3joints = [14,15,16] 20 | la_3joints = [11,12,13] 21 | part_3joints = [rl_3joints,ll_3joints,la_3joints,ra_3joints] 22 | 23 | class LJC(nn.Module): 24 | def __init__(self, adj, dim, drop_path=0., norm_layer=nn.LayerNorm): 25 | super().__init__() 26 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 27 | self.adj = adj 28 | self.norm_gcn1 = norm_layer(dim) 29 | self.gcn1 = ModulatedGraphConv(dim,384,self.adj) 30 | self.gelu = nn.GELU() 31 | self.gcn2 = ModulatedGraphConv(384,dim,self.adj) 32 | self.norm_gcn2 = norm_layer(dim) 33 | 34 | def forward(self, x_gcn): 35 | x_gcn = x_gcn + self.drop_path(self.norm_gcn2(self.gcn2(self.gelu(self.gcn1(self.norm_gcn1(x_gcn)))))) 36 | return x_gcn 37 | 38 | 39 | class IPC(nn.Module): 40 | def __init__(self, dim, mlp_hidden_dim, drop=0., drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm): 41 | super().__init__() 42 | self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity() 43 | self.index_1 = [1,2,3, 4,5,6, 11,12,13, 14,15,16] # 6parts 44 | self.index_2 = [2,3, 5,6, 12,13, 15,16] 45 | self.gelu = nn.GELU() 46 | self.norm_conv1 = norm_layer(dim) 47 | self.conv1 = nn.Conv1d(dim,dim, kernel_size=3, padding=0, stride=3) 48 | self.norm_conv1_mlp = norm_layer(dim) 49 | self.mlp_down_1 = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 50 | self.norm_conv2 = norm_layer(dim) 51 | self.conv2 = nn.Conv1d(dim,dim, kernel_size=2, padding=0, stride=2) 52 | self.norm_conv2_mlp = norm_layer(dim) 53 | self.mlp_down_2 = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 54 | 55 | 56 | def forward(self, x_gcn, x_conv): 57 | x_conv = x_conv + x_gcn 58 | 59 | #NOTE:Conv_1 3 joints per limb 60 | x_conv_1 = self.norm_conv1(x_conv) 61 | x_conv_1 = x_conv_1.permute(0,2,1) 62 | x_pooling_1 = x_conv_1[:, :, self.index_1] 63 | x_pooling_1 = self.drop_path(self.gelu(self.conv1(x_pooling_1))) 64 | 65 | x_pooling_1 = x_pooling_1.permute(0,2,1) 66 | x_pooling_1 = x_pooling_1 + self.drop_path(self.mlp_down_1(self.norm_conv1_mlp(x_pooling_1))) 67 | x_pooling_1 = x_pooling_1.permute(0,2,1) 68 | for i in range(len(part_3joints)): 69 | num_joints = len(part_3joints[i]) - 1 70 | x_conv_1[:,:,part_3joints[i][1:]] = x_pooling_1[:,:,i].unsqueeze(-1).repeat(1,1,num_joints) 71 | x_conv_1 = x_conv_1.permute(0,2,1) 72 | 73 | #NOTE:Conv_2 2 joints per limb 74 | x_conv_2 = self.norm_conv2(x_conv) 75 | x_conv_2 = x_conv_2.permute(0,2,1) 76 | x_pooling_2 = x_conv_2[:, :, self.index_2] 77 | x_pooling_2 = self.drop_path(self.gelu(self.conv2(x_pooling_2))) 78 | 79 | x_pooling_2 = x_pooling_2.permute(0,2,1) 80 | x_pooling_2 = x_pooling_2 + self.drop_path(self.mlp_down_2(self.norm_conv2_mlp(x_pooling_2))) 81 | x_pooling_2 = x_pooling_2.permute(0,2,1) 82 | for i in range(len(part_2joints)): 83 | num_joints = len(part_2joints[i]) - 1 84 | x_conv_2[:,:,part_2joints[i][1:]] = x_pooling_2[:,:,i].unsqueeze(-1).repeat(1,1,num_joints) 85 | x_conv_2 = x_conv_2.permute(0,2,1) 86 | 87 | x_conv = x_conv_1 + x_conv_2 + x_conv 88 | return x_conv 89 | 90 | 91 | class GBI(nn.Module): 92 | def __init__(self, dim, num_heads, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 93 | drop_path=0., norm_layer=nn.LayerNorm, length=1): 94 | super().__init__() 95 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 96 | self.norm_attn = norm_layer(dim) 97 | self.attn = Attention(dim, num_heads=num_heads, qkv_bias=qkv_bias, \ 98 | qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop, length=length) 99 | 100 | def forward(self, x_conv, x_attn): 101 | x_attn = x_attn + x_conv 102 | x_attn = x_attn + self.drop_path(self.attn(self.norm_attn(x_attn))) 103 | return x_attn 104 | 105 | 106 | 107 | 108 | 109 | class Hiremixer(nn.Module): 110 | def __init__(self, adj, depth=8, embed_dim=512, mlp_hidden_dim=1024, h=8, drop_rate=0.1, length=9): 111 | super().__init__() 112 | drop_path_rate = 0.3 113 | attn_drop_rate = 0. 
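# Attention settings shared by every block in this stack; per-block drop-path rates come from dpr below: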
114 | qkv_bias = True 115 | qk_scale = None 116 | norm_layer = partial(nn.LayerNorm, eps=1e-6) 117 | # Stochastic depth decay rule 118 | dpr = [x.item() for x in torch.linspace(0.1, drop_path_rate, depth)] 119 | self.blocks = nn.ModuleList([ 120 | Block( 121 | adj, dim=embed_dim, num_heads=h, mlp_hidden_dim=mlp_hidden_dim, qkv_bias=qkv_bias, qk_scale=qk_scale, 122 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer, length=length) 123 | for i in range(depth)]) 124 | self.Temporal_norm = norm_layer(embed_dim) 125 | 126 | def forward(self, x): 127 | for blk in self.blocks: 128 | x = blk(x) 129 | x = self.Temporal_norm(x) 130 | return x 131 | 132 | 133 | class Block(nn.Module): 134 | def __init__(self, adj, dim, num_heads, mlp_hidden_dim, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 135 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, length=1): 136 | super().__init__() 137 | 138 | dim = int(dim/3) 139 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 140 | 141 | # Three sub-modules 142 | self.lgc = LJC(adj, dim, drop_path=drop_path, norm_layer=nn.LayerNorm) 143 | self.ipc = IPC(dim, mlp_hidden_dim, drop=drop, drop_path=drop_path, act_layer=nn.GELU, norm_layer=nn.LayerNorm) 144 | self.gbi = GBI(dim, num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, drop=0.1, attn_drop=attn_drop, 145 | drop_path=drop_path, norm_layer=nn.LayerNorm, length=length) 146 | 147 | self.norm_mlp = norm_layer(dim*3) 148 | self.mlp = Mlp(in_features=dim*3, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 149 | 150 | 151 | 152 | def forward(self, x): 153 | x_split = torch.chunk(x,3,-1) 154 | x_lgc, x_ipc, x_gbi = x_split 155 | # Local Joint-level Connection (LJC) 156 | x_lgc = self.lgc(x_lgc) 157 | # Inter-Part Constraint (IPC) 158 | x_ipc = self.ipc(x_lgc, x_ipc) 159 | # Global body-level Interaction (GBI) 160 | x_gbi = self.gbi(x_ipc, x_gbi) 161 | x_cat = torch.cat([x_lgc,x_ipc,x_gbi], -1) 162 | x = x_cat + self.drop_path(self.mlp(self.norm_mlp(x_cat))) 163 | return x 164 | 165 | 166 | 167 | class Hiremixer_frame(nn.Module): 168 | def __init__(self, adj, depth=8, embed_dim=512, mlp_hidden_dim=1024, h=8, drop_rate=0.1, length=9): 169 | super().__init__() 170 | drop_path_rate = 0.3 171 | attn_drop_rate = 0. 172 | qkv_bias = True 173 | qk_scale = None 174 | norm_layer = partial(nn.LayerNorm, eps=1e-6) 175 | # Stochastic depth decay rule 176 | dpr = [x.item() for x in torch.linspace(0.1, drop_path_rate, depth)] 177 | self.blocks = nn.ModuleList([ 178 | Block_frame( 179 | adj, dim=embed_dim, num_heads=h, mlp_hidden_dim=mlp_hidden_dim, qkv_bias=qkv_bias, qk_scale=qk_scale, 180 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer, length=length) 181 | for i in range(depth)]) 182 | self.Temporal_norm = norm_layer(embed_dim) 183 | 184 | def forward(self, x): 185 | for blk in self.blocks: 186 | x = blk(x) 187 | x = self.Temporal_norm(x) 188 | return x 189 | 190 | 191 | class Block_frame(nn.Module): 192 | def __init__(self, adj, dim, num_heads, mlp_hidden_dim, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 193 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, length=1): 194 | super().__init__() 195 | 196 | dim = int(dim/2) 197 | self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity() 198 | 199 | # Three sub-modules 200 | self.lgc = LJC(adj, dim, drop_path=drop_path, norm_layer=nn.LayerNorm) 201 | self.gbi = GBI(dim, num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, drop=0.1, attn_drop=attn_drop, 202 | drop_path=drop_path, norm_layer=nn.LayerNorm, length=length) 203 | 204 | self.norm_mlp = norm_layer(dim*2) 205 | self.mlp = Mlp(in_features=dim*2, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 206 | 207 | 208 | 209 | def forward(self, x): 210 | x_split = torch.chunk(x,2,-1) 211 | x_lgc, x_gbi = x_split 212 | # Local Joint-level Connection (LJC) 213 | x_lgc = self.lgc(x_lgc) 214 | # Global body-level Interaction (GBI) 215 | x_gbi = self.gbi(x_lgc, x_gbi) 216 | x_cat = torch.cat([x_lgc,x_gbi], -1) 217 | x = x_cat + self.drop_path(self.mlp(self.norm_mlp(x_cat))) 218 | return x 219 | 220 | 221 | 222 | class Block_ipc(nn.Module): 223 | def __init__(self, adj, dim, num_heads, mlp_hidden_dim, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 224 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, length=1): 225 | super().__init__() 226 | 227 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 228 | 229 | # Three sub-modules 230 | self.ipc = IPC(dim, mlp_hidden_dim, drop=drop, drop_path=drop_path, act_layer=nn.GELU, norm_layer=nn.LayerNorm) 231 | 232 | self.norm_mlp = norm_layer(dim) 233 | self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 234 | 235 | 236 | 237 | def forward(self, x): 238 | x = x + self.ipc(x) 239 | x = x + self.drop_path(self.mlp(self.norm_mlp(x))) 240 | return x -------------------------------------------------------------------------------- /model/GCN_conv.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import, division 2 | 3 | import math 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | import numpy as np 8 | 9 | import scipy.sparse as sp 10 | 11 | 12 | class _NonLocalBlockND(nn.Module): 13 | def __init__(self, in_channels, inter_channels=None, dimension=3, sub_sample=True, bn_layer=True): 14 | super(_NonLocalBlockND, self).__init__() 15 | 16 | assert dimension in [1, 2, 3] 17 | 18 | self.dimension = dimension 19 | self.sub_sample = sub_sample 20 | 21 | self.in_channels = in_channels 22 | self.inter_channels = inter_channels 23 | 24 | if self.inter_channels is None: 25 | self.inter_channels = in_channels // 2 26 | if self.inter_channels == 0: 27 | self.inter_channels = 1 28 | 29 | if dimension == 3: 30 | conv_nd = nn.Conv3d 31 | max_pool_layer = nn.MaxPool3d(kernel_size=(1, 2, 2)) 32 | bn = nn.BatchNorm3d 33 | elif dimension == 2: 34 | conv_nd = nn.Conv2d 35 | max_pool_layer = nn.MaxPool2d(kernel_size=(2, 2)) 36 | bn = nn.BatchNorm2d 37 | else: 38 | conv_nd = nn.Conv1d 39 | max_pool_layer = nn.MaxPool1d(kernel_size=(2)) 40 | bn = nn.BatchNorm1d 41 | 42 | self.g = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels, 43 | kernel_size=1, stride=1, padding=0) 44 | 45 | if bn_layer: 46 | self.W = nn.Sequential( 47 | conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels, 48 | kernel_size=1, stride=1, padding=0), 49 | bn(self.in_channels) 50 | ) 51 | nn.init.constant_(self.W[1].weight, 0) 52 | nn.init.constant_(self.W[1].bias, 0) 53 | else: 54 | self.W = conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels, 55 | kernel_size=1, stride=1, padding=0) 56 | nn.init.constant_(self.W.weight, 0) 57 | 
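# zero-initialising W makes the non-local block start as an identity mapping (z = W_y + x = x at step 0)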
nn.init.constant_(self.W.bias, 0) 58 | 59 | self.theta = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels, 60 | kernel_size=1, stride=1, padding=0) 61 | self.phi = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels, 62 | kernel_size=1, stride=1, padding=0) 63 | 64 | if sub_sample: 65 | self.g = nn.Sequential(self.g, max_pool_layer) # 384 -> 192 66 | self.phi = nn.Sequential(self.phi, max_pool_layer) 67 | 68 | def forward(self, x): 69 | ''' 70 | :param x: (b, c, t, h, w) 71 | :return: 72 | ''' 73 | 74 | batch_size = x.size(0)#torch.Size([256, 384, 1, 17]) 75 | 76 | g_x = self.g(x).view(batch_size, self.inter_channels, -1)#256,192,17 77 | g_x = g_x.permute(0, 2, 1)#torch.Size([256, 17, 192]) 78 | 79 | theta_x = self.theta(x).view(batch_size, self.inter_channels, -1) 80 | theta_x = theta_x.permute(0, 2, 1)#torch.Size([256, 17, 192]) 81 | phi_x = self.phi(x).view(batch_size, self.inter_channels, -1)#torch.Size([256, 192, 17]) 82 | f = torch.matmul(theta_x, phi_x)#torch.Size([256, 17, 17]) 83 | f_div_C = F.softmax(f, dim=-1)#torch.Size([256, 17, 17]), softmax over the last dimension 84 | 85 | y = torch.matmul(f_div_C, g_x) # this step is essentially the AMIX aggregation 86 | y = y.permute(0, 2, 1).contiguous()#torch.Size([256, 17, 192]) 87 | y = y.view(batch_size, self.inter_channels, *x.size()[2:])#torch.Size([256, 192, 1, 17]) 88 | W_y = self.W(y) 89 | z = W_y + x #torch.Size([256, 384, 1, 17]) Amix 90 | 91 | return z 92 | 93 | 94 | 95 | 96 | class NONLocalBlock2D(_NonLocalBlockND): 97 | def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True): 98 | super(NONLocalBlock2D, self).__init__(in_channels, 99 | inter_channels=inter_channels, 100 | dimension=2, sub_sample=sub_sample, 101 | bn_layer=bn_layer) 102 | 103 | 104 | class ModulatedGraphConv(nn.Module): 105 | """ 106 | Semantic graph convolution layer 107 | """ 108 | 109 | def __init__(self, in_features, out_features, adj, bias=True): 110 | super(ModulatedGraphConv, self).__init__() 111 | self.in_features = in_features 112 | self.out_features = out_features 113 | 114 | self.W = nn.Parameter(torch.zeros(size=(2, in_features, out_features), dtype=torch.float)) #torch.Size([2,2, 384]) 115 | nn.init.xavier_uniform_(self.W.data, gain=1.414) 116 | 117 | self.M = nn.Parameter(torch.zeros(size=(adj.size(0), out_features), dtype=torch.float))#17,384 118 | nn.init.xavier_uniform_(self.M.data, gain=1.414) 119 | 120 | self.adj = adj 121 | 122 | self.adj2 = nn.Parameter(torch.ones_like(adj)) 123 | nn.init.constant_(self.adj2, 1e-6) 124 | 125 | if bias: 126 | self.bias = nn.Parameter(torch.zeros(out_features, dtype=torch.float)) 127 | stdv = 1. 
/ math.sqrt(self.W.size(2)) 128 | self.bias.data.uniform_(-stdv, stdv) 129 | else: 130 | self.register_parameter('bias', None) 131 | 132 | def forward(self, input): 133 | h0 = torch.matmul(input, self.W[0]) #input 256,17,2 -> 256,17,384 134 | h1 = torch.matmul(input, self.W[1]) 135 | 136 | adj = self.adj.to(input.device) + self.adj2.to(input.device) 137 | adj = (adj.T + adj)/2 138 | E = torch.eye(adj.size(0), dtype=torch.float).to(input.device) #17x17 identity matrix I 139 | 140 | output = torch.matmul(adj * E, self.M*h0) + torch.matmul(adj * (1 - E), self.M*h1) # first term covers the self-connections (I), second term the connections to other joints 141 | if self.bias is not None: 142 | return output + self.bias.view(1, 1, -1) #torch.Size([256, 17, 384]), bias added to all joints 143 | else: 144 | return output 145 | 146 | def __repr__(self): 147 | return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')' 148 | 149 | 150 | 151 | def normalize(mx): 152 | """Row-normalize sparse matrix""" 153 | rowsum = np.array(mx.sum(1)) 154 | r_inv = np.power(rowsum, -1).flatten() 155 | r_inv[np.isinf(r_inv)] = 0. 156 | r_mat_inv = sp.diags(r_inv) 157 | mx = r_mat_inv.dot(mx) 158 | return mx 159 | 160 | def sparse_mx_to_torch_sparse_tensor(sparse_mx): 161 | """Convert a scipy sparse matrix to a torch sparse tensor.""" 162 | sparse_mx = sparse_mx.tocoo().astype(np.float32) 163 | indices = torch.from_numpy(np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64)) 164 | values = torch.from_numpy(sparse_mx.data) 165 | shape = torch.Size(sparse_mx.shape) 166 | return torch.sparse.FloatTensor(indices, values, shape) 167 | 168 | 169 | def adj_mx_from_edges(num_pts, edges, sparse=True): 170 | edges = np.array(edges, dtype=np.int32) 171 | data, i, j = np.ones(edges.shape[0]), edges[:, 0], edges[:, 1] 172 | adj_mx = sp.coo_matrix((data, (i, j)), shape=(num_pts, num_pts), dtype=np.float32) 173 | 174 | # build symmetric adjacency matrix 175 | adj_mx = adj_mx + adj_mx.T.multiply(adj_mx.T > adj_mx) - adj_mx.multiply(adj_mx.T > adj_mx) 176 | adj_mx = normalize(adj_mx) #+ sp.eye(adj_mx.shape[0])) 177 | if sparse: 178 | adj_mx = sparse_mx_to_torch_sparse_tensor(adj_mx) 179 | else: 180 | adj_mx = torch.tensor(adj_mx.todense(), dtype=torch.float) 181 | 182 | adj_mx = adj_mx * (1-torch.eye(adj_mx.shape[0])) + torch.eye(adj_mx.shape[0]) 183 | return adj_mx 184 | 185 | 186 | def adj_mx_from_skeleton(skeleton): 187 | num_joints = skeleton.num_joints() 188 | edges = list(filter(lambda x: x[1] >= 0, zip(list(range(0, num_joints)), skeleton.parents()))) 189 | return adj_mx_from_edges(num_joints, edges, sparse=False) 190 | -------------------------------------------------------------------------------- /model/Transformer.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD 3 | import torch 4 | 5 | 6 | class Mlp(nn.Module): 7 | def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.): 8 | super().__init__() 9 | out_features = out_features or in_features 10 | hidden_features = hidden_features or in_features 11 | self.fc1 = nn.Linear(in_features, hidden_features) 12 | self.act = act_layer() 13 | self.fc2 = nn.Linear(hidden_features, out_features) 14 | self.drop = nn.Dropout(drop) 15 | 16 | def forward(self, x): 17 | x = self.fc1(x) 18 | x = self.act(x) 19 | x = self.drop(x) 20 | x = self.fc2(x) 21 | x = self.drop(x) 22 | return x 23 | 24 | class Attention(nn.Module): 25 | def __init__(self, dim, num_heads=8, 
qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., length=27): 26 | super().__init__() 27 | 28 | self.num_heads = num_heads 29 | head_dim = torch.div(dim, num_heads) 30 | self.scale = qk_scale or head_dim ** -0.5 31 | self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias) 32 | self.attn_drop = nn.Dropout(attn_drop) 33 | self.proj = nn.Linear(dim, dim) 34 | self.proj_drop = nn.Dropout(proj_drop) 35 | 36 | def forward(self, x): 37 | B, N, C = x.shape 38 | qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, torch.div(C, self.num_heads, rounding_mode='floor')).permute(2, 0, 3, 1, 4) 39 | q, k, v = qkv[0], qkv[1], qkv[2] 40 | 41 | attn = (q @ k.transpose(-2, -1)) * self.scale 42 | attn = attn.softmax(dim=-1) 43 | 44 | attn = self.attn_drop(attn) 45 | 46 | x = (attn @ v).transpose(1, 2).reshape(B, N, C) 47 | x = self.proj(x) 48 | x = self.proj_drop(x) 49 | return x -------------------------------------------------------------------------------- /model/__pycache__/Block.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/Block.cpython-39.pyc -------------------------------------------------------------------------------- /model/__pycache__/GCN_conv.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/GCN_conv.cpython-39.pyc -------------------------------------------------------------------------------- /model/__pycache__/Transformer.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/Transformer.cpython-39.pyc -------------------------------------------------------------------------------- /model/__pycache__/trans.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/trans.cpython-39.pyc -------------------------------------------------------------------------------- /model/post_refine.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from torch.autograd import Variable 5 | 6 | 7 | 8 | inter_channels = [128, 256] 9 | fc_out = inter_channels[1] 10 | fc_unit = 1024 11 | class post_refine(nn.Module): 12 | 13 | 14 | def __init__(self, opt): 15 | super().__init__() 16 | 17 | out_seqlen = 1 18 | fc_in = opt.out_channels*2*out_seqlen*opt.n_joints 19 | 20 | fc_out = opt.in_channels * opt.n_joints 21 | self.post_refine = nn.Sequential( 22 | nn.Linear(fc_in, fc_unit), 23 | nn.ReLU(), 24 | nn.Dropout(0.5,inplace=True), 25 | nn.Linear(fc_unit, fc_out), 26 | nn.Sigmoid() 27 | 28 | ) 29 | 30 | 31 | def forward(self, x, x_1): 32 | """ 33 | 34 | :param x: N*T*V*3 35 | :param x_1: N*T*V*2 36 | :return: 37 | """ 38 | # data normalization 39 | N, T, V,_ = x.size() 40 | x_in = torch.cat((x, x_1), -1) #N*T*V*5 41 | x_in = x_in.view(N, -1) 42 | 43 | 44 | 45 | score = self.post_refine(x_in).view(N,T,V,2) 46 | score_cm = Variable(torch.ones(score.size()), requires_grad=False).cuda() - score 47 | x_out = x.clone() 48 | x_out[:, :, :, :2] = score * x[:, :, :, :2] + score_cm * x_1[:, :, :, :2] 49 | 50 | return x_out 
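# Minimal shape sketch (assumed values: opt.in_channels=2, opt.out_channels=3, opt.n_joints=17, T=1):
# the first Linear expects out_channels*2*n_joints = 102 features per sample, so x_1 must also carry
# 3 channels per joint even though only its first two (the 2D x/y) are blended into the output.
#   refine_net = post_refine(opt).cuda()
#   pred_3d  = torch.rand(8, 1, 17, 3).cuda()   # N*T*V*3 predicted 3D pose
#   input_2d = torch.rand(8, 1, 17, 3).cuda()   # N*T*V*3 input keypoints (x, y, extra channel)
#   refined  = refine_net(pred_3d, input_2d)    # x/y re-weighted between prediction and 2D input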
-------------------------------------------------------------------------------- /model/refine.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from torch.autograd import Variable 4 | 5 | fc_out = 256 6 | fc_unit = 1024 7 | 8 | class refine(nn.Module): 9 | def __init__(self, opt): 10 | super().__init__() 11 | 12 | out_seqlen = 1 13 | fc_in = opt.out_channels*2*out_seqlen*opt.n_joints 14 | fc_out = opt.in_channels * opt.n_joints 15 | 16 | self.post_refine = nn.Sequential( 17 | nn.Linear(fc_in, fc_unit), 18 | nn.ReLU(inplace =False), 19 | nn.Dropout(0.5,inplace =False), 20 | nn.Linear(fc_unit, fc_out), 21 | nn.Sigmoid() 22 | ) 23 | 24 | def forward(self, x, x_1): 25 | N, T, V,_ = x.size()#256,1,17,3 26 | x_in = torch.cat((x, x_1), -1) #torch.Size([256, 1, 17, 6]) 27 | x_in = x_in.view(N, -1) #torch.Size([256, 102]) 28 | 29 | score = self.post_refine(x_in).view(N,T,V,2) #torch.Size([256, 1, 17, 2]) 30 | score_cm = Variable(torch.ones(score.size()), requires_grad=False).cuda() - score 31 | x_out = x.clone() 32 | x_out[:, :, :, :2] = score * x[:, :, :, :2] + score_cm * x_1[:, :, :, :2]#torch.Size([256, 1, 17, 3]) 33 | 34 | return x_out 35 | 36 | 37 | -------------------------------------------------------------------------------- /model/trans.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from einops.einops import rearrange 3 | sys.path.append("..") 4 | import torch 5 | import torch.nn as nn 6 | from model.Block import Hiremixer 7 | from common.opt import opts 8 | opt = opts().parse() 9 | 10 | 11 | 12 | class HTNet(nn.Module): 13 | def __init__(self, args, adj): 14 | super().__init__() 15 | 16 | if args == -1: 17 | layers, channel, d_hid, length = 3, 512, 1024, 27 18 | self.num_joints_in, self.num_joints_out = 17, 17 19 | else: 20 | layers, channel, d_hid, length = args.layers, args.channel, args.d_hid, args.frames 21 | self.num_joints_in, self.num_joints_out = args.n_joints, args.out_joints 22 | 23 | self.patch_embed = nn.Linear(2, channel) 24 | self.pos_embed = nn.Parameter(torch.zeros(1, self.num_joints_in, channel)) 25 | self.Hiremixer = Hiremixer(adj, layers, channel, d_hid, length=length) 26 | self.fcn = nn.Linear(args.channel, 3) 27 | 28 | def forward(self, x): 29 | x = rearrange(x, 'b f j c -> (b f) j c').contiguous() 30 | x = self.patch_embed(x) 31 | x = x + self.pos_embed 32 | x = self.Hiremixer(x) 33 | x = self.fcn(x) 34 | x = x.view(x.shape[0], -1, self.num_joints_out, x.shape[2]) 35 | return x 36 | 37 | 38 | -------------------------------------------------------------------------------- /requirement.txt: -------------------------------------------------------------------------------- 1 | einops 2 | timm 3 | tensorboardX 4 | scipy 5 | filterpy 6 | tqdm -------------------------------------------------------------------------------- /runs/README.md: -------------------------------------------------------------------------------- 1 | 2 | --------------------------------------------------------------------------------