├── LICENSE
├── README.md
├── ckpt
│   └── README.md
├── common
│   ├── camera.py
│   ├── generator.py
│   ├── h36m_dataset.py
│   ├── load_data_hm36.py
│   ├── mocap_dataset.py
│   ├── opt.py
│   ├── skeleton.py
│   └── utils.py
├── demo
│   ├── figure
│   │   ├── lindan.jpg
│   │   └── messi.jpg
│   ├── lib
│   │   ├── hrnet
│   │   │   ├── experiments
│   │   │   │   └── w48_384x288_adam_lr1e-3.yaml
│   │   │   ├── gen_kpts.py
│   │   │   └── lib
│   │   │       ├── config
│   │   │       │   ├── __init__.py
│   │   │       │   ├── __pycache__
│   │   │       │   │   ├── __init__.cpython-38.pyc
│   │   │       │   │   ├── __init__.cpython-39.pyc
│   │   │       │   │   ├── default.cpython-38.pyc
│   │   │       │   │   ├── default.cpython-39.pyc
│   │   │       │   │   ├── models.cpython-38.pyc
│   │   │       │   │   └── models.cpython-39.pyc
│   │   │       │   ├── default.py
│   │   │       │   └── models.py
│   │   │       ├── models
│   │   │       │   ├── __pycache__
│   │   │       │   │   ├── pose_hrnet.cpython-38.pyc
│   │   │       │   │   └── pose_hrnet.cpython-39.pyc
│   │   │       │   └── pose_hrnet.py
│   │   │       └── utils
│   │   │           ├── __pycache__
│   │   │           │   ├── coco_h36m.cpython-39.pyc
│   │   │           │   ├── inference.cpython-39.pyc
│   │   │           │   ├── transforms.cpython-39.pyc
│   │   │           │   └── utilitys.cpython-39.pyc
│   │   │           ├── coco_h36m.py
│   │   │           ├── inference.py
│   │   │           ├── transforms.py
│   │   │           └── utilitys.py
│   │   ├── preprocess.py
│   │   ├── sort
│   │   │   └── sort.py
│   │   └── yolov3
│   │       ├── bbox.py
│   │       ├── cfg
│   │       │   ├── tiny-yolo-voc.cfg
│   │       │   ├── yolo-voc.cfg
│   │       │   ├── yolo.cfg
│   │       │   └── yolov3.cfg
│   │       ├── darknet.py
│   │       ├── data
│   │       │   ├── coco.names
│   │       │   ├── pallete
│   │       │   └── voc.names
│   │       ├── human_detector.py
│   │       ├── preprocess.py
│   │       └── util.py
│   └── vis.py
├── figure
│   ├── README.md
│   ├── messi_pose.png
│   ├── structure.png
│   └── wild.png
├── main.py
├── model
│   ├── Block.py
│   ├── GCN_conv.py
│   ├── Transformer.py
│   ├── __pycache__
│   │   ├── Block.cpython-39.pyc
│   │   ├── GCN_conv.cpython-39.pyc
│   │   ├── Transformer.cpython-39.pyc
│   │   └── trans.cpython-39.pyc
│   ├── post_refine.py
│   ├── refine.py
│   └── trans.py
├── requirement.txt
└── runs
    └── README.md
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 vefalun
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # HTNet: Human Topology Aware Network for 3D Human Pose Estimation
2 |
3 |
4 |
5 | > [**HTNet: Human Topology Aware Network for 3D Human Pose Estimation**](https://arxiv.org/pdf/2302.09790),
6 | > Jialun Cai, Hong Liu, Runwei Ding, Wenhao Li, Jianbing Wu, Miaoju Ban
7 | > *In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023*
8 |
9 |
10 | ## Results on Human3.6M
11 |
12 | Protocol #1 (mean per-joint position error, MPJPE), using 2D keypoints detected by CPN and ground-truth 2D poses as inputs.
13 |
14 | | Method | Train Epochs | MPJPE (CPN) | MPJPE (GT) |
15 | |:-------|:-------:|:-------:|:-------:|
16 | | GraFormer | 50 | 51.8 mm | 35.2 mm |
17 | | MGCN (w/refine)| 50 | 49.4 mm | 37.4 mm |
18 | | HTNet | 15 | 48.9 mm |34.0 mm|
19 | | HTNet (w/refine) | **15** | **47.6 mm** |**31.9 mm**|
20 |
21 |
22 | ## Get started directly
23 | Special thanks to [MHFormer](https://github.com/Vegetebird/MHFormer); building on it, we provide a **beginner's guide** to image-based pose estimation.
24 | Only three steps are needed to generate poses for your own images: (1) download the pretrained detection models (Yolov3 and HRNet) [here](https://drive.google.com/drive/folders/1LX5zhZGlZjckgfpNroWsuu84xyyFYE5X) and put them in the './demo/lib/checkpoint' directory (the expected layout is sketched at the end of this section); (2) download the [pretrained HTNet model](https://drive.google.com/drive/folders/134lqqu-0I6aOYr7lRufa6fMTdqm7K9Qk) and put it in the './ckpt' directory; (3)
25 | put your own images in './demo/figure' and run:
26 | ```
27 | python demo/vis.py
28 | ```
29 | Then you can obtain the visualized poses in './demo/output', like:
30 | 
31 |
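For reference, after steps (1) and (2) the downloaded weights should end up roughly as below (a sketch: the HRNet file name follows `demo/lib/hrnet/gen_kpts.py`, while the YOLOv3 and HTNet file names are placeholders):

```bash
${POSE_ROOT}/
|-- ckpt
|   |-- cpn
|       |-- <pretrained HTNet weights>.pth
|-- demo
|   |-- lib
|       |-- checkpoint
|           |-- pose_hrnet_w48_384x288.pth
|           |-- <YOLOv3 weights>
```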
32 |
33 | ## Quick start
34 | To get started as quickly as possible, follow the instructions in this section. This should allow you to train a model from scratch and test our pretrained models.
35 |
36 |
37 | ### Dependencies
38 | Make sure you have the following dependencies installed before proceeding:
39 | - Python 3.7+
40 | - PyTorch >= 1.10.0
41 | To set up the environment:
42 | ```sh
43 | pip install -r requirement.txt
44 | ```
45 |
46 |
47 | ### Dataset setup
48 | Please download the dataset [here](https://drive.google.com/drive/folders/1gNs5PrcaZ6gar7IiNZPNh39T7y6aPY3g) and refer to [VideoPose3D](https://github.com/facebookresearch/VideoPose3D) to set up the Human3.6M dataset ('./dataset' directory).
49 |
50 | ```bash
51 | ${POSE_ROOT}/
52 | |-- dataset
53 | | |-- data_3d_h36m.npz
54 | | |-- data_2d_h36m_gt.npz
55 | | |-- data_2d_h36m_cpn_ft_h36m_dbb.npz
56 | ```
57 |
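As a quick sanity check, the snippet below (a minimal sketch based on how `common/h36m_dataset.py` and `common/load_data_hm36.py` read these files) confirms that the archives load as expected:

```python
import numpy as np

root = './dataset/'

# 3D ground truth: a dict of {subject: {action: positions}}
poses_3d = np.load(root + 'data_3d_h36m.npz', allow_pickle=True)['positions_3d'].item()
print(sorted(poses_3d.keys()))  # expect the Human3.6M subjects, e.g. S1 ... S11

# 2D CPN detections: keypoint-symmetry metadata plus per-subject 2D poses
kps = np.load(root + 'data_2d_h36m_cpn_ft_h36m_dbb.npz', allow_pickle=True)
print(kps['metadata'].item()['keypoints_symmetry'])
print(list(kps['positions_2d'].item()['S1'].keys()))  # actions available for subject S1
```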
58 | ### Evaluating our pre-trained models
59 | The pretrained model is available [here](https://drive.google.com/drive/folders/134lqqu-0I6aOYr7lRufa6fMTdqm7K9Qk); please download it and put it in the './ckpt' directory. To reproduce the performance reported in the paper, run:
60 | ```
61 | python main.py --reload --previous_dir "ckpt/cpn"
62 | ```
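The `-k`/`--keypoints` flag in `common/opt.py` selects which 2D input file is loaded (`data_2d_h36m_<keypoints>.npz`). For example, to evaluate with ground-truth 2D keypoints (assuming the corresponding checkpoint is placed in `ckpt/gt`):
```
python main.py --reload --previous_dir "ckpt/gt" -k gt
```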
63 |
64 | ### Training your models
65 | If you want to train your own model, run:
66 | ```
67 | python main.py --train -n "your_model_name"
68 | ```
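The training hyper-parameters are defined in `common/opt.py`; a hedged example combining a few of them (not the exact settings used in the paper):
```
python main.py --train -n "your_model_name" -k cpn_ft_h36m_dbb --nepoch 30 --batch_size 256 --lr 0.0005 --gpu 0
```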
69 |
70 |
71 | ## Acknowledgement
72 |
73 | Our code is extended from the following repositories; we thank the authors for releasing their code.
74 | - [MHFormer](https://github.com/Vegetebird/MHFormer)
75 | - [MGCN](https://github.com/ZhimingZo/Modulated-GCN)
76 | - [VideoPose3D](https://github.com/facebookresearch/VideoPose3D)
77 | - [3d-pose-baseline](https://github.com/una-dinosauria/3d-pose-baseline)
78 | - [3d_pose_baseline_pytorch](https://github.com/weigq/3d_pose_baseline_pytorch)
79 | - [StridedTransformer-Pose3D](https://github.com/Vegetebird/StridedTransformer-Pose3D)
80 | ## License
81 |
82 | This project is licensed under the terms of the MIT license.
83 |
--------------------------------------------------------------------------------
/ckpt/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/common/camera.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import numpy as np
3 | import torch
4 | def normalize_screen_coordinates(X, w, h):
5 | assert X.shape[-1] == 2
6 | return X / w * 2 - [1, h / w]  # map x into [-1, 1]; y is scaled by the same factor to preserve the aspect ratio
7 |
8 |
9 | def world_to_camera(X, R, t):
10 | Rt = wrap(qinverse, R)
11 | return wrap(qrot, np.tile(Rt, (*X.shape[:-1], 1)), X - t)
12 |
13 | def camera_to_world(X, R, t):
14 | return wrap(qrot, np.tile(R, (*X.shape[:-1], 1)), X) + t
15 |
16 |
17 | def wrap(func, *args, unsqueeze=False):
18 | args = list(args)
19 | for i, arg in enumerate(args):
20 | if type(arg) == np.ndarray:
21 | args[i] = torch.from_numpy(arg)
22 | if unsqueeze:
23 | args[i] = args[i].unsqueeze(0)
24 |
25 | result = func(*args)
26 |
27 | if isinstance(result, tuple):
28 | result = list(result)
29 | for i, res in enumerate(result):
30 | if type(res) == torch.Tensor:
31 | if unsqueeze:
32 | res = res.squeeze(0)
33 | result[i] = res.numpy()
34 | return tuple(result)
35 | elif type(result) == torch.Tensor:
36 | if unsqueeze:
37 | result = result.squeeze(0)
38 | return result.numpy()
39 | else:
40 | return result
41 |
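# Rotate 3D vector(s) v by unit quaternion(s) q = (w, x, y, z) using
# v' = v + 2*w*(q_vec x v) + 2*q_vec x (q_vec x v), which avoids building an explicit rotation matrix.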
42 | def qrot(q, v):
43 | assert q.shape[-1] == 4
44 | assert v.shape[-1] == 3
45 | assert q.shape[:-1] == v.shape[:-1]
46 |
47 | qvec = q[..., 1:]
48 | uv = torch.cross(qvec, v, dim=len(q.shape) - 1)
49 | uuv = torch.cross(qvec, uv, dim=len(q.shape) - 1)
50 | return (v + 2 * (q[..., :1] * uv + uuv))
51 |
52 |
53 |
54 |
55 | def qinverse(q, inplace=False):
56 | if inplace:
57 | q[..., 1:] *= -1
58 | return q
59 | else:
60 | w = q[..., :1]
61 | xyz = q[..., 1:]
62 | return torch.cat((w, -xyz), dim=len(q.shape) - 1)
63 |
64 |
65 |
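# Back-project predicted (u, v, depth) values to root-relative camera-space XYZ with the
# pinhole model x = (u - c) * z / f, anchoring the depths to the ground-truth root joint depth.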
66 | def get_uvd2xyz(uvd, gt_3D, cam):
67 | N, T, V,_ = uvd.size()
68 |
69 | dec_out_all = uvd.view(-1, T, V, 3).clone()
70 | root = gt_3D[:, :, 0, :].unsqueeze(-2).repeat(1, 1, V, 1).clone()
71 | enc_in_all = uvd[:, :, :, :2].view(-1, T, V, 2).clone()
72 |
73 | cam_f_all = cam[..., :2].view(-1,1,1,2).repeat(1,T,V,1)
74 | cam_c_all = cam[..., 2:4].view(-1,1,1,2).repeat(1,T,V,1)
75 |
76 | z_global = dec_out_all[:, :, :, 2]
77 | z_global[:, :, 0] = root[:, :, 0, 2]
78 | z_global[:, :, 1:] = dec_out_all[:, :, 1:, 2] + root[:, :, 1:, 2]
79 | z_global = z_global.unsqueeze(-1)
80 |
81 | uv = enc_in_all - cam_c_all
82 | xy = uv * z_global.repeat(1, 1, 1, 2) / cam_f_all
83 | xyz_global = torch.cat((xy, z_global), -1)
84 | xyz_offset = (xyz_global - xyz_global[:, :, 0, :].unsqueeze(-2).repeat(1, 1, V, 1))
85 |
86 | return xyz_offset
--------------------------------------------------------------------------------
/common/h36m_dataset.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 | import copy
4 | from common.skeleton import Skeleton
5 | from common.mocap_dataset import MocapDataset
6 | from common.camera import normalize_screen_coordinates
7 |
8 | h36m_skeleton = Skeleton(parents=[-1, 0, 1, 2, 3, 4, 0, 6, 7, 8, 9, 0, 11, 12, 13, 14, 12,
9 | 16, 17, 18, 19, 20, 19, 22, 12, 24, 25, 26, 27, 28, 27, 30],
10 | joints_left=[6, 7, 8, 9, 10, 16, 17, 18, 19, 20, 21, 22, 23],
11 | joints_right=[1, 2, 3, 4, 5, 24, 25, 26, 27, 28, 29, 30, 31])
12 |
13 | h36m_cameras_intrinsic_params = [
14 | {
15 | 'id': '54138969',
16 | 'center': [512.54150390625, 515.4514770507812],
17 | 'focal_length': [1145.0494384765625, 1143.7811279296875],
18 | 'radial_distortion': [-0.20709891617298126, 0.24777518212795258, -0.0030751503072679043],
19 | 'tangential_distortion': [-0.0009756988729350269, -0.00142447161488235],
20 | 'res_w': 1000,
21 | 'res_h': 1002,
22 | 'azimuth': 70,
23 | },
24 | {
25 | 'id': '55011271',
26 | 'center': [508.8486328125, 508.0649108886719],
27 | 'focal_length': [1149.6756591796875, 1147.5916748046875],
28 | 'radial_distortion': [-0.1942136287689209, 0.2404085397720337, 0.006819975562393665],
29 | 'tangential_distortion': [-0.0016190266469493508, -0.0027408944442868233],
30 | 'res_w': 1000,
31 | 'res_h': 1000,
32 | 'azimuth': -70,
33 | },
34 | {
35 | 'id': '58860488',
36 | 'center': [519.8158569335938, 501.40264892578125],
37 | 'focal_length': [1149.1407470703125, 1148.7989501953125],
38 | 'radial_distortion': [-0.2083381861448288, 0.25548800826072693, -0.0024604974314570427],
39 | 'tangential_distortion': [0.0014843869721516967, -0.0007599993259645998],
40 | 'res_w': 1000,
41 | 'res_h': 1000,
42 | 'azimuth': 110,
43 | },
44 | {
45 | 'id': '60457274',
46 | 'center': [514.9682006835938, 501.88201904296875],
47 | 'focal_length': [1145.5113525390625, 1144.77392578125],
48 | 'radial_distortion': [-0.198384091258049, 0.21832367777824402, -0.008947807364165783],
49 | 'tangential_distortion': [-0.0005872055771760643, -0.0018133620033040643],
50 | 'res_w': 1000,
51 | 'res_h': 1002,
52 | 'azimuth': -110,
53 | },
54 | ]
55 |
56 | h36m_cameras_extrinsic_params = {
57 | 'S1': [
58 | {
59 | 'orientation': [0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088],
60 | 'translation': [1841.1070556640625, 4955.28466796875, 1563.4454345703125],
61 | },
62 | {
63 | 'orientation': [0.6157187819480896, -0.764836311340332, -0.14833825826644897, 0.11794740706682205],
64 | 'translation': [1761.278564453125, -5078.0068359375, 1606.2650146484375],
65 | },
66 | {
67 | 'orientation': [0.14651472866535187, -0.14647851884365082, 0.7653023600578308, -0.6094175577163696],
68 | 'translation': [-1846.7777099609375, 5215.04638671875, 1491.972412109375],
69 | },
70 | {
71 | 'orientation': [0.5834008455276489, -0.7853162288665771, 0.14548823237419128, -0.14749594032764435],
72 | 'translation': [-1794.7896728515625, -3722.698974609375, 1574.8927001953125],
73 | },
74 | ],
75 | 'S2': [
76 | {},
77 | {},
78 | {},
79 | {},
80 | ],
81 | 'S3': [
82 | {},
83 | {},
84 | {},
85 | {},
86 | ],
87 | 'S4': [
88 | {},
89 | {},
90 | {},
91 | {},
92 | ],
93 | 'S5': [
94 | {
95 | 'orientation': [0.1467377245426178, -0.162370964884758, -0.7551892995834351, 0.6178938746452332],
96 | 'translation': [2097.3916015625, 4880.94482421875, 1605.732421875],
97 | },
98 | {
99 | 'orientation': [0.6159758567810059, -0.7626792192459106, -0.15728192031383514, 0.1189815029501915],
100 | 'translation': [2031.7008056640625, -5167.93310546875, 1612.923095703125],
101 | },
102 | {
103 | 'orientation': [0.14291371405124664, -0.12907841801643372, 0.7678384780883789, -0.6110143065452576],
104 | 'translation': [-1620.5948486328125, 5171.65869140625, 1496.43701171875],
105 | },
106 | {
107 | 'orientation': [0.5920479893684387, -0.7814217805862427, 0.1274748593568802, -0.15036417543888092],
108 | 'translation': [-1637.1737060546875, -3867.3173828125, 1547.033203125],
109 | },
110 | ],
111 | 'S6': [
112 | {
113 | 'orientation': [0.1337897777557373, -0.15692396461963654, -0.7571090459823608, 0.6198879480361938],
114 | 'translation': [1935.4517822265625, 4950.24560546875, 1618.0838623046875],
115 | },
116 | {
117 | 'orientation': [0.6147197484970093, -0.7628812789916992, -0.16174767911434174, 0.11819244921207428],
118 | 'translation': [1969.803955078125, -5128.73876953125, 1632.77880859375],
119 | },
120 | {
121 | 'orientation': [0.1529948115348816, -0.13529130816459656, 0.7646096348762512, -0.6112781167030334],
122 | 'translation': [-1769.596435546875, 5185.361328125, 1476.993408203125],
123 | },
124 | {
125 | 'orientation': [0.5916101336479187, -0.7804774045944214, 0.12832270562648773, -0.1561593860387802],
126 | 'translation': [-1721.668701171875, -3884.13134765625, 1540.4879150390625],
127 | },
128 | ],
129 | 'S7': [
130 | {
131 | 'orientation': [0.1435241848230362, -0.1631336808204651, -0.7548328638076782, 0.6188824772834778],
132 | 'translation': [1974.512939453125, 4926.3544921875, 1597.8326416015625],
133 | },
134 | {
135 | 'orientation': [0.6141672730445862, -0.7638262510299683, -0.1596645563840866, 0.1177929937839508],
136 | 'translation': [1937.0584716796875, -5119.7900390625, 1631.5665283203125],
137 | },
138 | {
139 | 'orientation': [0.14550060033798218, -0.12874816358089447, 0.7660516500473022, -0.6127139329910278],
140 | 'translation': [-1741.8111572265625, 5208.24951171875, 1464.8245849609375],
141 | },
142 | {
143 | 'orientation': [0.5912848114967346, -0.7821764349937439, 0.12445473670959473, -0.15196487307548523],
144 | 'translation': [-1734.7105712890625, -3832.42138671875, 1548.5830078125],
145 | },
146 | ],
147 | 'S8': [
148 | {
149 | 'orientation': [0.14110587537288666, -0.15589867532253265, -0.7561917304992676, 0.619644045829773],
150 | 'translation': [2150.65185546875, 4896.1611328125, 1611.9046630859375],
151 | },
152 | {
153 | 'orientation': [0.6169601678848267, -0.7647668123245239, -0.14846350252628326, 0.11158157885074615],
154 | 'translation': [2219.965576171875, -5148.453125, 1613.0440673828125],
155 | },
156 | {
157 | 'orientation': [0.1471444070339203, -0.13377119600772858, 0.7670128345489502, -0.6100369691848755],
158 | 'translation': [-1571.2215576171875, 5137.0185546875, 1498.1761474609375],
159 | },
160 | {
161 | 'orientation': [0.5927824378013611, -0.7825870513916016, 0.12147816270589828, -0.14631995558738708],
162 | 'translation': [-1476.913330078125, -3896.7412109375, 1547.97216796875],
163 | },
164 | ],
165 | 'S9': [
166 | {
167 | 'orientation': [0.15540587902069092, -0.15548215806484222, -0.7532095313072205, 0.6199594736099243],
168 | 'translation': [2044.45849609375, 4935.1171875, 1481.2275390625],
169 | },
170 | {
171 | 'orientation': [0.618784487247467, -0.7634735107421875, -0.14132238924503326, 0.11933968216180801],
172 | 'translation': [1990.959716796875, -5123.810546875, 1568.8048095703125],
173 | },
174 | {
175 | 'orientation': [0.13357827067375183, -0.1367100477218628, 0.7689454555511475, -0.6100738644599915],
176 | 'translation': [-1670.9921875, 5211.98583984375, 1528.387939453125],
177 | },
178 | {
179 | 'orientation': [0.5879399180412292, -0.7823407053947449, 0.1427614390850067, -0.14794869720935822],
180 | 'translation': [-1696.04345703125, -3827.099853515625, 1591.4127197265625],
181 | },
182 | ],
183 | 'S11': [
184 | {
185 | 'orientation': [0.15232472121715546, -0.15442320704460144, -0.7547563314437866, 0.6191070079803467],
186 | 'translation': [2098.440185546875, 4926.5546875, 1500.278564453125],
187 | },
188 | {
189 | 'orientation': [0.6189449429512024, -0.7600917220115662, -0.15300633013248444, 0.1255258321762085],
190 | 'translation': [2083.182373046875, -4912.1728515625, 1561.07861328125],
191 | },
192 | {
193 | 'orientation': [0.14943228662014008, -0.15650227665901184, 0.7681233882904053, -0.6026304364204407],
194 | 'translation': [-1609.8153076171875, 5177.3359375, 1537.896728515625],
195 | },
196 | {
197 | 'orientation': [0.5894251465797424, -0.7818877100944519, 0.13991211354732513, -0.14715361595153809],
198 | 'translation': [-1590.738037109375, -3854.1689453125, 1578.017578125],
199 | },
200 | ],
201 | }
202 |
203 |
204 | class Human36mDataset(MocapDataset):
205 | def __init__(self, path, opt, remove_static_joints=True):
206 | super().__init__(fps=50, skeleton=h36m_skeleton)
207 | self.train_list = ['S1', 'S5', 'S6', 'S7', 'S8']
208 | self.test_list = ['S9', 'S11']
209 |
210 | self._cameras = copy.deepcopy(h36m_cameras_extrinsic_params)
211 | for cameras in self._cameras.values():
212 | for i, cam in enumerate(cameras):
213 | cam.update(h36m_cameras_intrinsic_params[i])
214 | for k, v in cam.items():
215 | if k not in ['id', 'res_w', 'res_h']:
216 | cam[k] = np.array(v, dtype='float32')
217 |
218 | if opt.crop_uv == 0:
219 | cam['center'] = normalize_screen_coordinates(cam['center'], w=cam['res_w'], h=cam['res_h']).astype(
220 | 'float32')
221 | cam['focal_length'] = cam['focal_length'] / cam['res_w'] * 2
222 |
223 | if 'translation' in cam:
224 | cam['translation'] = cam['translation'] / 1000
225 |
226 | cam['intrinsic'] = np.concatenate((cam['focal_length'],
227 | cam['center'],
228 | cam['radial_distortion'],
229 | cam['tangential_distortion']))
230 |
231 | data = np.load(path,allow_pickle=True)['positions_3d'].item()
232 |
233 | self._data = {}
234 | for subject, actions in data.items():
235 | self._data[subject] = {}
236 | for action_name, positions in actions.items():
237 | self._data[subject][action_name] = {
238 | 'positions': positions,
239 | 'cameras': self._cameras[subject],
240 | }
241 |
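# Reduce the 32-joint Human3.6M skeleton to the standard 17-joint layout and
# re-parent the shoulders (joints 11 and 14 after removal) to the thorax (joint 8).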
242 | if remove_static_joints:
243 | self.remove_joints([4, 5, 9, 10, 11, 16, 20, 21, 22, 23, 24, 28, 29, 30, 31])
244 |
245 | self._skeleton._parents[11] = 8
246 | self._skeleton._parents[14] = 8
247 |
248 | def supports_semi_supervised(self):
249 | return True
250 |
251 |
252 |
253 |
--------------------------------------------------------------------------------
/common/load_data_hm36.py:
--------------------------------------------------------------------------------
1 |
2 | import torch.utils.data as data
3 | import numpy as np
4 |
5 | from common.utils import deterministic_random
6 | from common.camera import world_to_camera, normalize_screen_coordinates
7 | from common.generator import ChunkedGenerator, ChunkedGenerator_Seq
8 |
9 | class Fusion(data.Dataset): #crop:0, downsample:0 pad:0 stride:1
10 | def __init__(self, opt, dataset, root_path, train=True):
11 | self.data_type = opt.dataset
12 | self.train = train
13 | self.keypoints_name = opt.keypoints
14 | self.root_path = root_path
15 |
16 | self.train_list = opt.subjects_train.split(',')
17 | self.test_list = opt.subjects_test.split(',')
18 | self.action_filter = None if opt.actions == '*' else opt.actions.split(',')
19 | self.downsample = opt.downsample
20 | self.subset = opt.subset
21 | self.stride = opt.stride
22 | self.crop_uv = opt.crop_uv
23 | self.test_aug = opt.test_augmentation
24 | self.pad = opt.pad
25 | causal_shift = 0
26 | if self.train:
27 | self.keypoints = self.prepare_data(dataset, self.train_list)
28 | self.cameras_train, self.poses_train, self.poses_train_2d = self.fetch(dataset, self.train_list,
29 | subset=self.subset)
30 | self.generator = ChunkedGenerator(opt.batch_size, self.cameras_train, self.poses_train,
31 | self.poses_train_2d, self.stride, pad=self.pad,
32 | augment=opt.data_augmentation, reverse_aug=opt.reverse_augmentation,
33 | kps_left=self.kps_left, kps_right=self.kps_right,
34 | joints_left=self.joints_left,
35 | joints_right=self.joints_right, out_all=opt.out_all)
36 |
37 | print('INFO: Training on {} frames'.format(self.generator.num_frames()))
38 | else:
39 | self.keypoints = self.prepare_data(dataset, self.test_list)
40 | self.cameras_test, self.poses_test, self.poses_test_2d = self.fetch(dataset, self.test_list,
41 | subset=self.subset)
42 | self.generator = ChunkedGenerator(opt.batch_size, self.cameras_test, self.poses_test,
43 | self.poses_test_2d,
44 | pad=self.pad, augment=False, kps_left=self.kps_left,
45 | kps_right=self.kps_right, joints_left=self.joints_left,
46 | joints_right=self.joints_right)
47 | self.key_index = self.generator.saved_index
48 | print('INFO: Testing on {} frames'.format(self.generator.num_frames()))
49 |
50 | def prepare_data(self, dataset, folder_list):
51 | for subject in folder_list:
52 | for action in dataset[subject].keys():
53 | anim = dataset[subject][action]
54 |
55 | positions_3d = []
56 | for cam in anim['cameras']:
57 | pos_3d = world_to_camera(anim['positions'], R=cam['orientation'], t=cam['translation'])
58 | pos_3d[:, 1:] -= pos_3d[:, :1]#(1265, 17, 3)
59 | positions_3d.append(pos_3d)
60 | anim['positions_3d'] = positions_3d
61 |
62 | keypoints = np.load(self.root_path + 'data_2d_' + self.data_type + '_' + self.keypoints_name + '.npz',allow_pickle=True)
63 | keypoints_symmetry = keypoints['metadata'].item()['keypoints_symmetry']
64 |
65 | self.kps_left, self.kps_right = list(keypoints_symmetry[0]), list(keypoints_symmetry[1])
66 | self.joints_left, self.joints_right = list(dataset.skeleton().joints_left()), list(dataset.skeleton().joints_right())
67 | keypoints = keypoints['positions_2d'].item()
68 |
69 | for subject in folder_list:
70 | assert subject in keypoints, 'Subject {} is missing from the 2D detections dataset'.format(subject)
71 | for action in dataset[subject].keys():
72 | assert action in keypoints[
73 | subject], 'Action {} of subject {} is missing from the 2D detections dataset'.format(action,
74 | subject)
75 | for cam_idx in range(len(keypoints[subject][action])):
76 |
77 | mocap_length = dataset[subject][action]['positions_3d'][cam_idx].shape[0]
78 | assert keypoints[subject][action][cam_idx].shape[0] >= mocap_length
79 |
80 | if keypoints[subject][action][cam_idx].shape[0] > mocap_length:
81 | keypoints[subject][action][cam_idx] = keypoints[subject][action][cam_idx][:mocap_length]
82 |
83 | for subject in keypoints.keys():
84 | for action in keypoints[subject]:
85 | for cam_idx, kps in enumerate(keypoints[subject][action]):
86 | cam = dataset.cameras()[subject][cam_idx]
87 | if self.crop_uv == 0:
88 | kps[..., :2] = normalize_screen_coordinates(kps[..., :2], w=cam['res_w'], h=cam['res_h'])
89 | keypoints[subject][action][cam_idx] = kps
90 |
91 | return keypoints
92 |
93 | def fetch(self, dataset, subjects, subset=1, parse_3d_poses=True): #self.cameras_train, self.poses_train, self.poses_train_2d
94 | out_poses_3d = {}
95 | out_poses_2d = {}
96 | out_camera_params = {}
97 |
98 | for subject in subjects:
99 | for action in self.keypoints[subject].keys():
100 | if self.action_filter is not None:
101 | found = False
102 | for a in self.action_filter:
103 | if action.startswith(a):
104 | found = True
105 | break
106 | if not found:
107 | continue
108 |
109 | poses_2d = self.keypoints[subject][action]
110 |
111 | for i in range(len(poses_2d)):
112 | out_poses_2d[(subject, action, i)] = poses_2d[i]
113 |
114 | if subject in dataset.cameras():
115 | cams = dataset.cameras()[subject]
116 | assert len(cams) == len(poses_2d), 'Camera count mismatch'
117 | for i, cam in enumerate(cams):
118 | if 'intrinsic' in cam:
119 | out_camera_params[(subject, action, i)] = cam['intrinsic']
120 |
121 | if parse_3d_poses and 'positions_3d' in dataset[subject][action]:
122 | poses_3d = dataset[subject][action]['positions_3d']
123 | assert len(poses_3d) == len(poses_2d), 'Camera count mismatch'
124 | for i in range(len(poses_3d)):
125 | out_poses_3d[(subject, action, i)] = poses_3d[i]
126 |
127 | if len(out_camera_params) == 0:
128 | out_camera_params = None
129 | if len(out_poses_3d) == 0:
130 | out_poses_3d = None
131 |
132 | stride = self.downsample
133 | if subset < 1:
134 | for key in out_poses_2d.keys():
135 | n_frames = int(round(len(out_poses_2d[key]) // stride * subset) * stride)
136 | start = deterministic_random(0, len(out_poses_2d[key]) - n_frames + 1, str(len(out_poses_2d[key])))
137 | out_poses_2d[key] = out_poses_2d[key][start:start + n_frames:stride]
138 | if out_poses_3d is not None:
139 | out_poses_3d[key] = out_poses_3d[key][start:start + n_frames:stride]
140 | elif stride > 1:  # this branch: downsample by keeping every stride-th frame
141 | for key in out_poses_2d.keys():
142 | out_poses_2d[key] = out_poses_2d[key][::stride]
143 | if out_poses_3d is not None:
144 | out_poses_3d[key] = out_poses_3d[key][::stride]
145 |
146 | return out_camera_params, out_poses_3d, out_poses_2d
147 |
148 | def __len__(self):
149 | return len(self.generator.pairs)
150 |
151 | def __getitem__(self, index):
152 | seq_name, start_3d, end_3d, flip, reverse = self.generator.pairs[index]
153 |
154 | cam, gt_3D, input_2D, action, subject, cam_ind = self.generator.get_batch(seq_name, start_3d, end_3d, flip, reverse)
155 |
156 | if self.train == False and self.test_aug:
157 | _, _, input_2D_aug, _, _,_ = self.generator.get_batch(seq_name, start_3d, end_3d, flip=True, reverse=reverse)
158 | input_2D = np.concatenate((np.expand_dims(input_2D,axis=0),np.expand_dims(input_2D_aug,axis=0)),0)
159 |
160 | bb_box = np.array([0, 0, 1, 1])
161 | input_2D_update = input_2D
162 |
163 | scale = float(1.0)
164 |
165 | return cam, gt_3D, input_2D_update, action, subject, scale, bb_box, cam_ind
166 |
167 |
168 |
169 |
--------------------------------------------------------------------------------
/common/mocap_dataset.py:
--------------------------------------------------------------------------------
1 |
2 |
3 | class MocapDataset:
4 | def __init__(self, fps, skeleton):
5 | self._skeleton = skeleton
6 | self._fps = fps
7 | self._data = None
8 | self._cameras = None
9 |
10 | def remove_joints(self, joints_to_remove):
11 | kept_joints = self._skeleton.remove_joints(joints_to_remove)
12 | for subject in self._data.keys():
13 | for action in self._data[subject].keys():
14 | s = self._data[subject][action]
15 | s['positions'] = s['positions'][:, kept_joints]
16 |
17 | def __getitem__(self, key):
18 | return self._data[key]
19 |
20 | def subjects(self):
21 | return self._data.keys()
22 |
23 | def fps(self):
24 | return self._fps
25 |
26 | def skeleton(self):
27 | return self._skeleton
28 |
29 | def cameras(self):
30 | return self._cameras
31 |
32 | def supports_semi_supervised(self):
33 | return False
34 |
35 |
36 |
--------------------------------------------------------------------------------
/common/opt.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | import os
4 | import math
5 | import time
6 | import torch
7 |
8 | class opts():
9 | def __init__(self):
10 | self.parser = argparse.ArgumentParser()
11 |
12 | def init(self):
13 | #model args
14 | self.parser.add_argument('--layers', default=3, type=int)
15 | self.parser.add_argument('--channel', default=240, type=int,help="Must be a multiple of 24")
16 | self.parser.add_argument('--frames', type=int, default=1)
17 | self.parser.add_argument('--pad', type=int, default=0)
18 | self.parser.add_argument('-n','--model_name', type=str, default='your_model', help='Name of your model')
19 | self.parser.add_argument('--d_hid', default=1024, type=int)
20 | self.parser.add_argument('--n_joints', type=int, default=17)
21 | self.parser.add_argument('--out_joints', type=int, default=17)
22 | self.parser.add_argument('--in_channels', type=int, default=2)
23 | self.parser.add_argument('--out_channels', type=int, default=3)
24 |
25 |
26 |
27 | #train args
28 | self.parser.add_argument('--gpu', default='0', type=str, help='')
29 | self.parser.add_argument('--train', action='store_true')
30 | self.parser.add_argument('--nepoch', type=int, default=300)
31 | self.parser.add_argument('--batch_size', type=int, default=512)
32 | self.parser.add_argument('--dataset', type=str, default='h36m')
33 | self.parser.add_argument('--lr', type=float, default=0.0005)
34 | self.parser.add_argument('--large_decay_epoch', type=int, default=5)
35 | self.parser.add_argument('-lrd', '--lr_decay', default=0.95, type=float)
36 | self.parser.add_argument('--lr_decay_large', type=float, default=0.5)
37 | self.parser.add_argument('--min_lr', type=float, default=1e-6, metavar='LR',
38 | help='lower lr bound for cyclic schedulers that hit 0')
39 | self.parser.add_argument('--workers', type=int, default=4)
40 | self.parser.add_argument('--out_all', type=int, default=1)
41 | self.parser.add_argument('--drop',default=0.2, type=float)
42 | self.parser.add_argument('--seed',default=1, type=int)
43 | self.parser.add_argument('-k', '--keypoints', default='cpn_ft_h36m_dbb', type=str)
44 | self.parser.add_argument('--data_augmentation', type=bool, default=True)
45 | self.parser.add_argument('--test_augmentation', type=bool, default=True)
46 | self.parser.add_argument('--reverse_augmentation', type=bool, default=False)
47 | self.parser.add_argument('--root_path', type=str, default='./dataset/',help='Put the dataset into this file')
48 | self.parser.add_argument('-a', '--actions', default='*', type=str)
49 | self.parser.add_argument('--downsample', default=1, type=int)
50 | self.parser.add_argument('--subset', default=1, type=float)
51 | self.parser.add_argument('--stride', default=1, type=float)
52 | self.parser.add_argument('--lr_min',type=float,default=0,help='Min learn rate')
53 |
54 |
55 | # test args
56 | self.parser.add_argument('--test', type=int, default=1)
57 | self.parser.add_argument('--reload', action='store_true')
58 | self.parser.add_argument('--previous_dir', type=str, default='./ckpt/your_model')
59 | self.parser.add_argument('--previous',type=str,default='ckpt')
60 | self.parser.add_argument('-previous_best_threshold', type=float, default= math.inf)
61 | self.parser.add_argument('-previous_name', type=str, default='')
62 | self.parser.add_argument('--viz', type=str, default='try')
63 |
64 | #refine
65 | self.parser.add_argument('--refine', action='store_true')
66 | self.parser.add_argument('--crop_uv', type=int, default=0)
67 | self.parser.add_argument('--lr_refine', type=float, default=1e-5)
68 | self.parser.add_argument('--refine_train_reload', action='store_true')
69 | self.parser.add_argument('--refine_test_reload', action='store_true')
70 | self.parser.add_argument('--previous_refine_name', type=str, default='')
71 |
72 | #vis
73 | self.parser.add_argument('--figure', type=str, default='demo.jpg', help='input figure')
74 | self.parser.add_argument('--video', type=str, default='demo.jpg', help='input video')
75 |
76 |
77 |
78 | def parse(self):
79 | self.init()
80 | self.opt = self.parser.parse_args()
81 | self.opt.pad = (self.opt.frames-1) // 2
82 | self.opt.subjects_train = 'S1,S5,S6,S7,S8'
83 | self.opt.subjects_test = 'S9,S11'
84 |
85 | if self.opt.train:
86 | self.opt.checkpoint = 'ckpt/' + self.opt.model_name
87 | if not os.path.exists(self.opt.checkpoint):
88 | os.makedirs(self.opt.checkpoint)
89 |
90 |
91 | args = dict((name, getattr(self.opt, name)) for name in dir(self.opt)
92 | if not name.startswith('_'))
93 | file_name = os.path.join(self.opt.checkpoint, 'opt.txt')
94 | with open(file_name, 'wt') as opt_file:
95 | opt_file.write('==> Args:\n')
96 | for k, v in sorted(args.items()):
97 | opt_file.write(' %s: %s\n' % (str(k), str(v)))
98 | opt_file.write('==> Args:\n')
99 |
100 | return self.opt
101 |
102 |
103 |
104 |
105 |
106 |
107 |
--------------------------------------------------------------------------------
/common/skeleton.py:
--------------------------------------------------------------------------------
1 |
2 | import numpy as np
3 |
4 | class Skeleton:
5 | def __init__(self, parents, joints_left, joints_right):
6 | assert len(joints_left) == len(joints_right)
7 |
8 | self._parents = np.array(parents)
9 | self._joints_left = joints_left
10 | self._joints_right = joints_right
11 | self._compute_metadata()
12 |
13 | def num_joints(self):
14 | return len(self._parents)
15 |
16 | def parents(self):
17 | return self._parents
18 |
19 | def has_children(self):
20 | return self._has_children
21 |
22 | def children(self):
23 | return self._children
24 |
25 | def remove_joints(self, joints_to_remove):
26 |
27 | valid_joints = []
28 | for joint in range(len(self._parents)):
29 | if joint not in joints_to_remove:
30 | valid_joints.append(joint)
31 |
32 | for i in range(len(self._parents)):
33 | while self._parents[i] in joints_to_remove:
34 | self._parents[i] = self._parents[self._parents[i]]
35 |
36 | index_offsets = np.zeros(len(self._parents), dtype=int)
37 | new_parents = []
38 | for i, parent in enumerate(self._parents):
39 | if i not in joints_to_remove:
40 | new_parents.append(parent - index_offsets[parent])
41 | else:
42 | index_offsets[i:] += 1
43 | self._parents = np.array(new_parents)
44 |
45 | if self._joints_left is not None:
46 | new_joints_left = []
47 | for joint in self._joints_left:
48 | if joint in valid_joints:
49 | new_joints_left.append(joint - index_offsets[joint])
50 | self._joints_left = new_joints_left
51 | if self._joints_right is not None:
52 | new_joints_right = []
53 | for joint in self._joints_right:
54 | if joint in valid_joints:
55 | new_joints_right.append(joint - index_offsets[joint])
56 | self._joints_right = new_joints_right
57 |
58 | self._compute_metadata()
59 |
60 | return valid_joints
61 |
62 | def joints_left(self):
63 | return self._joints_left
64 |
65 | def joints_right(self):
66 | return self._joints_right
67 |
68 | def _compute_metadata(self):
69 | self._has_children = np.zeros(len(self._parents)).astype(bool)
70 | for i, parent in enumerate(self._parents):
71 | if parent != -1:
72 | self._has_children[parent] = True
73 |
74 | self._children = []
75 | for i, parent in enumerate(self._parents):
76 | self._children.append([])
77 | for i, parent in enumerate(self._parents):
78 | if parent != -1:
79 | self._children[parent].append(i)
80 |
81 |
82 |
--------------------------------------------------------------------------------
/common/utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import numpy as np
3 | import hashlib
4 | from torch.autograd import Variable
5 | import os
6 |
7 | def deterministic_random(min_value, max_value, data):
8 | digest = hashlib.sha256(data.encode()).digest()
9 | raw_value = int.from_bytes(digest[:4], byteorder='little', signed=False)
10 | return int(raw_value / (2 ** 32 - 1) * (max_value - min_value)) + min_value
11 |
12 |
13 | def mpjpe_cal_mask(predicted, target, mask):
14 | assert predicted.shape == target.shape
15 | # index = [i for i in range(17) if i in mask]
16 | predicted = predicted[:,:,mask,:]
17 | target = target[:,:,mask,:]
18 | return torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1)).contiguous()
19 |
20 | def mpjpe_cal(predicted, target):
21 | assert predicted.shape == target.shape
22 | return torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1)).contiguous()
23 |
24 | def skeloss(predicted, target):
25 | assert predicted.shape == target.shape
26 | start = [0,1,2, 0,4,5, 0,7,8,9, 8,14,15, 8,11,12]
27 | end = [1,2,3, 4,5,6, 7,8,9,10, 14,15,16, 11,12,13]
28 | ske_predicted = torch.zeros(len(start))
29 | ske_target = torch.zeros(len(start))
30 | for i in range(len(start)):
31 | ske_predicted[i] = torch.mean(torch.norm(predicted[:,:,start[i],:] - predicted[:,:,end[i],:], dim=2)).contiguous()
32 | ske_target[i] = torch.mean(torch.norm(target[:,:,start[i],:] - target[:,:,end[i],:], dim=2)).contiguous()
33 |
34 |
35 |
36 | return torch.mean(torch.abs(ske_predicted - ske_target))  # mean bone-length error over all bones
37 |
38 |
39 |
40 |
41 |
42 | def frame_loss(predicted):#256,9,17,2
43 | loss = 0
44 | for k in range(predicted.size(0)-1):
45 | for i in range(predicted.size(1)-1):
46 | for j in range(predicted.size(2)-1):
47 | loss += (predicted[k+1,i+1,j+1,0] - predicted[k,i,j,0])**2
48 | loss += (predicted[k+1,i+1,j+1,1] - predicted[k,i,j,1])**2
49 | return loss
50 |
51 |
52 |
53 | ## viz loss
54 | def p_mpjpe(predicted, target):
55 | """
56 | Pose error: MPJPE after rigid alignment (scale, rotation, and translation),
57 | often referred to as "Protocol #2" in many papers.
58 | """
59 | assert predicted.shape == target.shape
60 |
61 | muX = np.mean(target, axis=1, keepdims=True)
62 | muY = np.mean(predicted, axis=1, keepdims=True)
63 |
64 | X0 = target - muX
65 | Y0 = predicted - muY
66 |
67 | normX = np.sqrt(np.sum(X0**2, axis=(1, 2), keepdims=True))
68 | normY = np.sqrt(np.sum(Y0**2, axis=(1, 2), keepdims=True))
69 |
70 | X0 /= normX
71 | Y0 /= normY
72 |
73 | H = np.matmul(X0.transpose(0, 2, 1), Y0)
74 | U, s, Vt = np.linalg.svd(H)
75 | V = Vt.transpose(0, 2, 1)
76 | R = np.matmul(V, U.transpose(0, 2, 1))
77 |
78 | # Avoid improper rotations (reflections), i.e. rotations with det(R) = -1
79 | sign_detR = np.sign(np.expand_dims(np.linalg.det(R), axis=1))
80 | V[:, :, -1] *= sign_detR
81 | s[:, -1] *= sign_detR.flatten()
82 | R = np.matmul(V, U.transpose(0, 2, 1)) # Rotation
83 |
84 | tr = np.expand_dims(np.sum(s, axis=1, keepdims=True), axis=2)
85 |
86 | a = tr * normX / normY # Scale
87 | t = muX - a*np.matmul(muY, R) # Translation
88 |
89 | # Perform rigid transformation on the input
90 | predicted_aligned = a*np.matmul(predicted, R) + t
91 |
92 | # Return MPJPE
93 | return np.mean(np.linalg.norm(predicted_aligned - target, axis=len(target.shape)-1))
94 |
95 |
96 |
97 |
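# PCK: percentage of joints whose error, converted to millimetres via the fixed scale of 1000,
# falls below the given threshold (150 mm by default).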
98 | def compute_PCK(gts, preds, scales=1000, eval_joints=None, threshold=150):
99 | PCK_THRESHOLD = threshold
100 | sample_num = len(gts)
101 | total = 0
102 | true_positive = 0
103 | if eval_joints is None:
104 | eval_joints = list(range(gts.shape[1]))
105 |
106 | for n in range(sample_num):
107 | gt = gts[n]
108 | pred = preds[n]
109 | # scale = scales[n]
110 | scale = 1000
111 | per_joint_error = np.take(np.sqrt(np.sum(np.power(pred - gt, 2), 1)) * scale, eval_joints, axis=0)
112 | true_positive += (per_joint_error < PCK_THRESHOLD).sum()
113 | total += per_joint_error.size
114 |
115 | pck = float(true_positive / total) * 100
116 | return pck
117 |
118 |
119 | def compute_AUC(gts, preds, scales=1000, eval_joints=None):
120 | # This range of thresholds mimics 'mpii_compute_3d_pck.m', which is provided as part of the
121 | # MPI-INF-3DHP test data release.
122 | thresholds = np.linspace(0, 150, 31)
123 | pck_list = []
124 | for threshold in thresholds:
125 | pck_list.append(compute_PCK(gts, preds, scales, eval_joints, threshold))
126 |
127 | auc = np.mean(pck_list)
128 |
129 | return auc
130 |
131 |
132 | def mean_velocity_error(predicted, target):
133 | """
134 | Mean per-joint velocity error (i.e. mean Euclidean distance of the 1st derivative)
135 | """
136 | assert predicted.shape == target.shape
137 |
138 | velocity_predicted = np.diff(predicted, axis=0)
139 | velocity_target = np.diff(target, axis=0)
140 |
141 | return np.mean(np.linalg.norm(velocity_predicted - velocity_target, axis=len(target.shape)-1))
142 |
143 | def weighted_mpjpe(predicted, target, w):
144 | """
145 | Weighted mean per-joint position error (i.e. mean Euclidean distance)
146 | """
147 | assert predicted.shape == target.shape
148 | assert w.shape[0] == predicted.shape[0]
149 | return torch.mean(w * torch.norm(predicted - target, dim=len(target.shape)-1))
150 |
151 |
152 |
153 | # def test_calculation(predicted, target, action, error_sum, data_type, subject):
154 | # error_sum = mpjpe_by_action(predicted, target, action, error_sum)
155 |
156 | # return error_sum
157 |
158 | def test_calculation(predicted, target, action, error_sum, data_type, subject):
159 | error_sum = mpjpe_by_action_p1(predicted, target, action, error_sum)
160 | error_sum = mpjpe_by_action_p2(predicted, target, action, error_sum)
161 |
162 | return error_sum
163 |
164 |
165 |
166 |
167 | def mpjpe_by_action(predicted, target, action, action_error_sum):
168 | assert predicted.shape == target.shape
169 | num = predicted.size(0)
170 | dist = torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1), dim=len(target.shape) - 2)
171 |
172 | if len(set(list(action))) == 1:
173 | end_index = action[0].find(' ')
174 | if end_index != -1:
175 | action_name = action[0][:end_index]
176 | else:
177 | action_name = action[0]
178 |
179 | action_error_sum[action_name].update(torch.mean(dist).item()*num, num)
180 | else:
181 | for i in range(num):
182 | end_index = action[i].find(' ')
183 | if end_index != -1:
184 | action_name = action[i][:end_index]
185 | else:
186 | action_name = action[i]
187 |
188 | action_error_sum[action_name].update(dist[i].item(), 1)
189 |
190 | return action_error_sum
191 |
192 |
193 | def mpjpe_by_action_p1(predicted, target, action, action_error_sum):
194 | assert predicted.shape == target.shape
195 | num = predicted.size(0)
196 | dist = torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1), dim=len(target.shape) - 2)
197 |
198 | if len(set(list(action))) == 1:
199 | end_index = action[0].find(' ')
200 | if end_index != -1:
201 | action_name = action[0][:end_index]
202 | else:
203 | action_name = action[0]
204 |
205 | action_error_sum[action_name]['p1'].update(torch.mean(dist).item()*num, num)
206 | else:
207 | for i in range(num):
208 | end_index = action[i].find(' ')
209 | if end_index != -1:
210 | action_name = action[i][:end_index]
211 | else:
212 | action_name = action[i]
213 |
214 | action_error_sum[action_name]['p1'].update(dist[i].item(), 1)
215 |
216 | return action_error_sum
217 |
218 | def mpjpe_by_action_p2(predicted, target, action, action_error_sum):
219 | assert predicted.shape == target.shape
220 | num = predicted.size(0)
221 | pred = predicted.detach().cpu().numpy().reshape(-1, predicted.shape[-2], predicted.shape[-1])
222 | gt = target.detach().cpu().numpy().reshape(-1, target.shape[-2], target.shape[-1])
223 | dist = p_mpjpe(pred, gt)
224 |
225 | if len(set(list(action))) == 1:
226 | end_index = action[0].find(' ')
227 | if end_index != -1:
228 | action_name = action[0][:end_index]
229 | else:
230 | action_name = action[0]
231 | action_error_sum[action_name]['p2'].update(np.mean(dist) * num, num)
232 | else:
233 | for i in range(num):
234 | end_index = action[i].find(' ')
235 | if end_index != -1:
236 | action_name = action[i][:end_index]
237 | else:
238 | action_name = action[i]
239 | action_error_sum[action_name]['p2'].update(np.mean(dist), 1)
240 |
241 | return action_error_sum
242 |
243 |
244 |
245 | def mpjpe_by_joint_mae(predicted, target,num):
246 | assert predicted.shape == target.shape
247 | # this is the joint
248 | mpjpe_joint = torch.mean(torch.mean(torch.norm(predicted - target, dim=len(target.shape) - 1), dim=len(target.shape) - 3),dim=len(target.shape)-4)
249 | print("\nthe mpjpe/joint",mpjpe_joint)
250 | # this is the order of joint from big to small
251 | index = torch.flip(mpjpe_joint.sort(-1).indices,dims=[0])
252 | index = index.split(num,-1)[0]
253 | print("\nerror joint",index)
254 | return index
255 |
256 |
257 |
258 |
259 |
260 | def define_actions( action ):
261 |
262 | actions = ["Directions","Discussion","Eating","Greeting",
263 | "Phoning","Photo","Posing","Purchases",
264 | "Sitting","SittingDown","Smoking","Waiting",
265 | "WalkDog","Walking","WalkTogether"]
266 |
267 | if action == "All" or action == "all" or action == '*':
268 | return actions
269 |
270 | if not action in actions:
271 | raise( ValueError, "Unrecognized action: %s" % action )
272 |
273 | return [action]
274 |
275 |
276 | def define_error_list(actions):
277 | error_sum = {}
278 | error_sum.update({actions[i]:
279 | {'p1':AccumLoss(), 'p2':AccumLoss()}
280 | for i in range(len(actions))})
281 | return error_sum
282 |
283 | # def define_error_list(actions):
284 | # error_sum = {}
285 | # error_sum.update({actions[i]: AccumLoss() for i in range(len(actions))})
286 | # return error_sum
287 |
288 |
289 | class AccumLoss(object):
290 | def __init__(self):
291 | self.val = 0
292 | self.avg = 0
293 | self.sum = 0
294 | self.count = 0
295 |
296 | def update(self, val, n=1):
297 | self.val = val
298 | self.sum += val
299 | self.count += n
300 | self.avg = self.sum / self.count
301 |
302 |
303 | def get_varialbe(split, target):
304 | num = len(target)
305 | var = []
306 | if split == 'train':
307 | for i in range(num):
308 | temp = Variable(target[i], requires_grad=False).contiguous().type(torch.cuda.FloatTensor)
309 | var.append(temp)
310 | else:
311 | for i in range(num):
312 | temp = Variable(target[i]).contiguous().cuda().type(torch.cuda.FloatTensor)
313 | var.append(temp)
314 |
315 | return var
316 |
317 |
318 |
319 |
320 | def print_error(data_type, action_error_sum, is_train):
321 | mean_error_p1, mean_error_p2 = print_error_action(action_error_sum, is_train)
322 |
323 | return mean_error_p1, mean_error_p2
324 |
325 |
326 |
327 |
328 | def print_error_action(action_error_sum, is_train):
329 | mean_error_each = {'p1': 0.0, 'p2': 0.0}
330 | mean_error_all = {'p1': AccumLoss(), 'p2': AccumLoss()}
331 |
332 | if is_train == 0:
333 | print("{0:=^12} {1:=^10} {2:=^8}".format("Action", "p#1 mm", "p#2 mm"))
334 |
335 |
336 | for action, value in action_error_sum.items():
337 | if is_train == 0:
338 | print("{0:<12} ".format(action), end="")
339 |
340 | mean_error_each['p1'] = action_error_sum[action]['p1'].avg * 1000.0
341 | mean_error_all['p1'].update(mean_error_each['p1'], 1)
342 |
343 | mean_error_each['p2'] = action_error_sum[action]['p2'].avg * 1000.0
344 | mean_error_all['p2'].update(mean_error_each['p2'], 1)
345 |
346 | if is_train == 0:
347 | print("{0:>6.2f} {1:>10.2f}".format(mean_error_each['p1'], mean_error_each['p2']))
348 |
349 | if is_train == 0:
350 | print("{0:<12} {1:>6.2f} {2:>10.2f}".format("Average", mean_error_all['p1'].avg, \
351 | mean_error_all['p2'].avg))
352 |
353 | return mean_error_all['p1'].avg, mean_error_all['p2'].avg
354 |
355 |
356 |
357 | def save_model_refine(previous_name, save_dir,epoch, data_threshold, model, model_name):#
358 | if os.path.exists(previous_name):
359 | os.remove(previous_name)
360 |
361 | torch.save(model.state_dict(),
362 | '%s/%s_%d_%d.pth' % (save_dir, model_name, epoch, data_threshold * 100))
363 | previous_name = '%s/%s_%d_%d.pth' % (save_dir, model_name, epoch, data_threshold * 100)
364 |
365 | return previous_name
366 |
367 |
368 | def save_model(previous_name, save_dir, epoch, data_threshold, model):
369 | if os.path.exists(previous_name):
370 | os.remove(previous_name)
371 |
372 | torch.save(model.state_dict(),
373 | '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100))
374 | previous_name = '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100)
375 | return previous_name
376 |
377 |
378 |
379 | def save_model_epoch(previous_name, save_dir, epoch, data_threshold, model):
380 | # if os.path.exists(previous_name):
381 | # os.remove(previous_name)
382 |
383 | torch.save(model.state_dict(),
384 | '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100))
385 | previous_name = '%s/model_%d_%d.pth' % (save_dir, epoch, data_threshold * 100)
386 | return previous_name
387 |
388 |
389 |
390 |
391 |
392 |
393 |
--------------------------------------------------------------------------------
/demo/figure/lindan.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/figure/lindan.jpg
--------------------------------------------------------------------------------
/demo/figure/messi.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/figure/messi.jpg
--------------------------------------------------------------------------------
/demo/lib/hrnet/experiments/w48_384x288_adam_lr1e-3.yaml:
--------------------------------------------------------------------------------
1 | AUTO_RESUME: true
2 | CUDNN:
3 | BENCHMARK: true
4 | DETERMINISTIC: false
5 | ENABLED: true
6 | DATA_DIR: ''
7 | GPUS: (0,1,2,3)
8 | OUTPUT_DIR: 'output'
9 | LOG_DIR: 'log'
10 | WORKERS: 24
11 | PRINT_FREQ: 100
12 |
13 | DATASET:
14 | COLOR_RGB: true
15 | DATASET: 'coco'
16 | DATA_FORMAT: jpg
17 | FLIP: true
18 | NUM_JOINTS_HALF_BODY: 8
19 | PROB_HALF_BODY: 0.3
20 | ROOT: 'data/coco/'
21 | ROT_FACTOR: 45
22 | SCALE_FACTOR: 0.35
23 | TEST_SET: 'val2017'
24 | TRAIN_SET: 'train2017'
25 | MODEL:
26 | INIT_WEIGHTS: true
27 | NAME: pose_hrnet
28 | NUM_JOINTS: 17
29 | PRETRAINED: 'models/pytorch/imagenet/hrnet_w48-8ef0771d.pth'
30 | TARGET_TYPE: gaussian
31 | IMAGE_SIZE:
32 | - 288
33 | - 384
34 | HEATMAP_SIZE:
35 | - 72
36 | - 96
37 | SIGMA: 3
38 | EXTRA:
39 | PRETRAINED_LAYERS:
40 | - 'conv1'
41 | - 'bn1'
42 | - 'conv2'
43 | - 'bn2'
44 | - 'layer1'
45 | - 'transition1'
46 | - 'stage2'
47 | - 'transition2'
48 | - 'stage3'
49 | - 'transition3'
50 | - 'stage4'
51 | FINAL_CONV_KERNEL: 1
52 | STAGE2:
53 | NUM_MODULES: 1
54 | NUM_BRANCHES: 2
55 | BLOCK: BASIC
56 | NUM_BLOCKS:
57 | - 4
58 | - 4
59 | NUM_CHANNELS:
60 | - 48
61 | - 96
62 | FUSE_METHOD: SUM
63 | STAGE3:
64 | NUM_MODULES: 4
65 | NUM_BRANCHES: 3
66 | BLOCK: BASIC
67 | NUM_BLOCKS:
68 | - 4
69 | - 4
70 | - 4
71 | NUM_CHANNELS:
72 | - 48
73 | - 96
74 | - 192
75 | FUSE_METHOD: SUM
76 | STAGE4:
77 | NUM_MODULES: 3
78 | NUM_BRANCHES: 4
79 | BLOCK: BASIC
80 | NUM_BLOCKS:
81 | - 4
82 | - 4
83 | - 4
84 | - 4
85 | NUM_CHANNELS:
86 | - 48
87 | - 96
88 | - 192
89 | - 384
90 | FUSE_METHOD: SUM
91 | LOSS:
92 | USE_TARGET_WEIGHT: true
93 | TRAIN:
94 | BATCH_SIZE_PER_GPU: 24
95 | SHUFFLE: true
96 | BEGIN_EPOCH: 0
97 | END_EPOCH: 210
98 | OPTIMIZER: adam
99 | LR: 0.001
100 | LR_FACTOR: 0.1
101 | LR_STEP:
102 | - 170
103 | - 200
104 | WD: 0.0001
105 | GAMMA1: 0.99
106 | GAMMA2: 0.0
107 | MOMENTUM: 0.9
108 | NESTEROV: false
109 | TEST:
110 | BATCH_SIZE_PER_GPU: 24
111 | COCO_BBOX_FILE: 'data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json'
112 | BBOX_THRE: 1.0
113 | IMAGE_THRE: 0.0
114 | IN_VIS_THRE: 0.2
115 | MODEL_FILE: ''
116 | NMS_THRE: 1.0
117 | OKS_THRE: 0.9
118 | USE_GT_BBOX: true
119 | FLIP_TEST: true
120 | POST_PROCESS: true
121 | SHIFT_HEATMAP: true
122 | DEBUG:
123 | DEBUG: true
124 | SAVE_BATCH_IMAGES_GT: true
125 | SAVE_BATCH_IMAGES_PRED: true
126 | SAVE_HEATMAPS_GT: true
127 | SAVE_HEATMAPS_PRED: true
128 |
--------------------------------------------------------------------------------
/demo/lib/hrnet/gen_kpts.py:
--------------------------------------------------------------------------------
1 | from __future__ import absolute_import
2 | from __future__ import division
3 | from __future__ import print_function
4 |
5 | import sys
6 | import os
7 | import os.path as osp
8 | import argparse
9 | import time
10 | import numpy as np
11 | from tqdm import tqdm
12 | import json
13 | import torch
14 | import torch.backends.cudnn as cudnn
15 | import cv2
16 | import copy
17 |
18 | from lib.hrnet.lib.utils.utilitys import plot_keypoint, PreProcess, write, load_json
19 | from lib.hrnet.lib.config import cfg, update_config
20 | from lib.hrnet.lib.utils.transforms import *
21 | from lib.hrnet.lib.utils.inference import get_final_preds
22 | from lib.hrnet.lib.models import pose_hrnet
23 |
24 | cfg_dir = 'demo/lib/hrnet/experiments/'
25 | model_dir = 'demo/lib/checkpoint/'
26 |
27 | # Loading human detector model
28 | from lib.yolov3.human_detector import load_model as yolo_model
29 | from lib.yolov3.human_detector import yolo_human_det as yolo_det
30 | from lib.sort.sort import Sort
31 |
32 |
33 | def parse_args():
34 | parser = argparse.ArgumentParser(description='Train keypoints network')
35 | # general
36 | parser.add_argument('--cfg', type=str, default=cfg_dir + 'w48_384x288_adam_lr1e-3.yaml',
37 | help='experiment configure file name')
38 | parser.add_argument('opts', nargs=argparse.REMAINDER, default=None,
39 | help="Modify config options using the command-line")
40 | parser.add_argument('--modelDir', type=str, default=model_dir + 'pose_hrnet_w48_384x288.pth',
41 | help='The model directory')
42 | parser.add_argument('--det-dim', type=int, default=416,
43 | help='The input dimension of the detected image')
44 | parser.add_argument('--thred-score', type=float, default=0.30,
45 | help='The threshold of object Confidence')
46 | parser.add_argument('-a', '--animation', action='store_true',
47 | help='output animation')
48 | parser.add_argument('-np', '--num-person', type=int, default=1,
49 | help='The maximum number of estimated poses')
50 | parser.add_argument("-v", "--video", type=str, default='camera',
51 | help="input video file name")
52 | parser.add_argument("-f", "--figure", type=str, default='demo.jpg',
53 | help="input figure file name")
54 | parser.add_argument('--gpu', type=str, default='0', help='input video')
55 | args = parser.parse_args()
56 |
57 | return args
58 |
59 |
60 | def reset_config(args):
61 | update_config(cfg, args)
62 |
63 | # cudnn related setting
64 | cudnn.benchmark = cfg.CUDNN.BENCHMARK
65 | torch.backends.cudnn.deterministic = cfg.CUDNN.DETERMINISTIC
66 | torch.backends.cudnn.enabled = cfg.CUDNN.ENABLED
67 |
68 |
69 | # load model
70 | def model_load(config):
71 | model = pose_hrnet.get_pose_net(config, is_train=False)
72 | if torch.cuda.is_available():
73 | model = model.cuda()
74 |
75 | state_dict = torch.load(config.OUTPUT_DIR)
76 | from collections import OrderedDict
77 | new_state_dict = OrderedDict()
78 | for k, v in state_dict.items():
79 | name = k # remove module.
80 | # print(name,'\t')
81 | new_state_dict[name] = v
82 | model.load_state_dict(new_state_dict)
83 | model.eval()
84 | # print('HRNet network successfully loaded')
85 |
86 | return model
87 |
88 |
89 | def gen_video_kpts(video, det_dim=416, num_peroson=1, gen_output=False):
90 | # Updating configuration
91 | args = parse_args()
92 | reset_config(args)
93 |
94 | cap = cv2.VideoCapture(video)
95 |
96 | # Loading detector and pose model, initialize sort for track
97 | human_model = yolo_model(inp_dim=det_dim)
98 | pose_model = model_load(cfg)
99 | people_sort = Sort(min_hits=0)
100 |
101 | video_length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
102 |
103 | kpts_result = []
104 | scores_result = []
105 | for ii in tqdm(range(video_length)):
106 | ret, frame = cap.read()
107 |
108 | if not ret:
109 | continue
110 |
111 | bboxs, scores = yolo_det(frame, human_model, reso=det_dim, confidence=args.thred_score)
112 |
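# If no person is detected in this frame, fall back to the previous frame's boxes
# (this assumes at least one earlier frame produced a detection).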
113 | if bboxs is None or not bboxs.any():
114 | print('No person detected!')
115 | bboxs = bboxs_pre
116 | scores = scores_pre
117 | else:
118 | bboxs_pre = copy.deepcopy(bboxs)
119 | scores_pre = copy.deepcopy(scores)
120 |
121 | # Using Sort to track people
122 | people_track = people_sort.update(bboxs)
123 |
124 | # Track the first two people in the video and remove the ID
125 | if people_track.shape[0] == 1:
126 | people_track_ = people_track[-1, :-1].reshape(1, 4)
127 | elif people_track.shape[0] >= 2:
128 | people_track_ = people_track[-num_peroson:, :-1].reshape(num_peroson, 4)
129 | people_track_ = people_track_[::-1]
130 | else:
131 | continue
132 |
133 | track_bboxs = []
134 | for bbox in people_track_:
135 | bbox = [round(i, 2) for i in list(bbox)]
136 | track_bboxs.append(bbox)
137 |
138 | with torch.no_grad():
139 | # bbox is coordinate location
140 | inputs, origin_img, center, scale = PreProcess(frame, track_bboxs, cfg, num_peroson)
141 |
142 | inputs = inputs[:, [2, 1, 0]]
143 |
144 | if torch.cuda.is_available():
145 | inputs = inputs.cuda()
146 | output = pose_model(inputs)
147 |
148 | # compute coordinate
149 | preds, maxvals = get_final_preds(cfg, output.clone().cpu().numpy(), np.asarray(center), np.asarray(scale))
150 |
151 | kpts = np.zeros((num_peroson, 17, 2), dtype=np.float32)
152 | scores = np.zeros((num_peroson, 17), dtype=np.float32)
153 | for i, kpt in enumerate(preds):
154 | kpts[i] = kpt
155 |
156 | for i, score in enumerate(maxvals):
157 | scores[i] = score.squeeze()
158 |
159 | kpts_result.append(kpts)
160 | scores_result.append(scores)
161 |
162 | keypoints = np.array(kpts_result)
163 | scores = np.array(scores_result)
164 |
165 | keypoints = keypoints.transpose(1, 0, 2, 3) # (T, M, N, 2) --> (M, T, N, 2)
166 | scores = scores.transpose(1, 0, 2) # (T, M, N) --> (M, T, N)
167 |
168 | return keypoints, scores
169 |
--------------------------------------------------------------------------------
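A note on the array layout returned by gen_video_kpts above: per-frame HRNet outputs are collected as (T, M, 17, 2) and then transposed to (M, T, 17, 2) so that each tracked person owns a contiguous (T, 17, 2) sequence. A minimal numpy sketch of that final step (dummy data only, no detector or HRNet weights needed):

import numpy as np

T, M = 5, 1                                   # frames, tracked people
kpts_result = [np.zeros((M, 17, 2), dtype=np.float32) for _ in range(T)]
scores_result = [np.zeros((M, 17), dtype=np.float32) for _ in range(T)]

keypoints = np.array(kpts_result)             # (T, M, 17, 2)
scores = np.array(scores_result)              # (T, M, 17)

keypoints = keypoints.transpose(1, 0, 2, 3)   # -> (M, T, 17, 2), one sequence per person
scores = scores.transpose(1, 0, 2)            # -> (M, T, 17)
print(keypoints.shape, scores.shape)          # (1, 5, 17, 2) (1, 5, 17)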
/demo/lib/hrnet/lib/config/__init__.py:
--------------------------------------------------------------------------------
1 | # ------------------------------------------------------------------------------
2 | # Copyright (c) Microsoft
3 | # Licensed under the MIT License.
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com)
5 | # ------------------------------------------------------------------------------
6 |
7 | from .default import _C as cfg
8 | from .default import update_config
9 | from .models import MODEL_EXTRAS
10 |
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-38.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/__init__.cpython-39.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/config/__pycache__/default.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/default.cpython-38.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/config/__pycache__/default.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/default.cpython-39.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/config/__pycache__/models.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/models.cpython-38.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/config/__pycache__/models.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/config/__pycache__/models.cpython-39.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/config/default.py:
--------------------------------------------------------------------------------
1 |
2 | # ------------------------------------------------------------------------------
3 | # Copyright (c) Microsoft
4 | # Licensed under the MIT License.
5 | # Written by Bin Xiao (Bin.Xiao@microsoft.com)
6 | # ------------------------------------------------------------------------------
7 |
8 | from __future__ import absolute_import
9 | from __future__ import division
10 | from __future__ import print_function
11 |
12 | import os
13 |
14 | from yacs.config import CfgNode as CN
15 |
16 |
17 | _C = CN()
18 |
19 | _C.OUTPUT_DIR = ''
20 | _C.LOG_DIR = ''
21 | _C.DATA_DIR = ''
22 | _C.GPUS = (0,)
23 | _C.WORKERS = 4
24 | _C.PRINT_FREQ = 20
25 | _C.AUTO_RESUME = False
26 | _C.PIN_MEMORY = True
27 | _C.RANK = 0
28 |
29 | # Cudnn related params
30 | _C.CUDNN = CN()
31 | _C.CUDNN.BENCHMARK = True
32 | _C.CUDNN.DETERMINISTIC = False
33 | _C.CUDNN.ENABLED = True
34 |
35 | # common params for NETWORK
36 | _C.MODEL = CN()
37 | _C.MODEL.NAME = 'pose_hrnet'
38 | _C.MODEL.INIT_WEIGHTS = True
39 | _C.MODEL.PRETRAINED = ''
40 | _C.MODEL.NUM_JOINTS = 17
41 | _C.MODEL.TAG_PER_JOINT = True
42 | _C.MODEL.TARGET_TYPE = 'gaussian'
43 | _C.MODEL.IMAGE_SIZE = [256, 256] # width * height, ex: 192 * 256
44 | _C.MODEL.HEATMAP_SIZE = [64, 64] # width * height, ex: 24 * 32
45 | _C.MODEL.SIGMA = 2
46 | _C.MODEL.EXTRA = CN(new_allowed=True)
47 |
48 | _C.LOSS = CN()
49 | _C.LOSS.USE_OHKM = False
50 | _C.LOSS.TOPK = 8
51 | _C.LOSS.USE_TARGET_WEIGHT = True
52 | _C.LOSS.USE_DIFFERENT_JOINTS_WEIGHT = False
53 |
54 | # DATASET related params
55 | _C.DATASET = CN()
56 | _C.DATASET.ROOT = ''
57 | _C.DATASET.DATASET = 'mpii'
58 | _C.DATASET.TRAIN_SET = 'train'
59 | _C.DATASET.TEST_SET = 'valid'
60 | _C.DATASET.DATA_FORMAT = 'jpg'
61 | _C.DATASET.HYBRID_JOINTS_TYPE = ''
62 | _C.DATASET.SELECT_DATA = False
63 |
64 | # training data augmentation
65 | _C.DATASET.FLIP = True
66 | _C.DATASET.SCALE_FACTOR = 0.25
67 | _C.DATASET.ROT_FACTOR = 30
68 | _C.DATASET.PROB_HALF_BODY = 0.0
69 | _C.DATASET.NUM_JOINTS_HALF_BODY = 8
70 | _C.DATASET.COLOR_RGB = False
71 |
72 | # train
73 | _C.TRAIN = CN()
74 |
75 | _C.TRAIN.LR_FACTOR = 0.1
76 | _C.TRAIN.LR_STEP = [90, 110]
77 | _C.TRAIN.LR = 0.001
78 |
79 | _C.TRAIN.OPTIMIZER = 'adam'
80 | _C.TRAIN.MOMENTUM = 0.9
81 | _C.TRAIN.WD = 0.0001
82 | _C.TRAIN.NESTEROV = False
83 | _C.TRAIN.GAMMA1 = 0.99
84 | _C.TRAIN.GAMMA2 = 0.0
85 |
86 | _C.TRAIN.BEGIN_EPOCH = 0
87 | _C.TRAIN.END_EPOCH = 140
88 |
89 | _C.TRAIN.RESUME = False
90 | _C.TRAIN.CHECKPOINT = ''
91 |
92 | _C.TRAIN.BATCH_SIZE_PER_GPU = 32
93 | _C.TRAIN.SHUFFLE = True
94 |
95 | # testing
96 | _C.TEST = CN()
97 |
98 | # size of images for each device
99 | _C.TEST.BATCH_SIZE_PER_GPU = 32
100 | # Test Model Epoch
101 | _C.TEST.FLIP_TEST = False
102 | _C.TEST.POST_PROCESS = False
103 | _C.TEST.SHIFT_HEATMAP = False
104 |
105 | _C.TEST.USE_GT_BBOX = False
106 |
107 | # nms
108 | _C.TEST.IMAGE_THRE = 0.1
109 | _C.TEST.NMS_THRE = 0.6
110 | _C.TEST.SOFT_NMS = False
111 | _C.TEST.OKS_THRE = 0.5
112 | _C.TEST.IN_VIS_THRE = 0.0
113 | _C.TEST.COCO_BBOX_FILE = ''
114 | _C.TEST.BBOX_THRE = 1.0
115 | _C.TEST.MODEL_FILE = ''
116 |
117 | # debug
118 | _C.DEBUG = CN()
119 | _C.DEBUG.DEBUG = False
120 | _C.DEBUG.SAVE_BATCH_IMAGES_GT = False
121 | _C.DEBUG.SAVE_BATCH_IMAGES_PRED = False
122 | _C.DEBUG.SAVE_HEATMAPS_GT = False
123 | _C.DEBUG.SAVE_HEATMAPS_PRED = False
124 |
125 |
126 | def update_config(cfg, args):
127 | cfg.defrost()
128 | cfg.merge_from_file(args.cfg)
129 | cfg.merge_from_list(args.opts)
130 |
131 | if args.modelDir:
132 | cfg.OUTPUT_DIR = args.modelDir
133 |
134 | # if args.logDir:
135 | # cfg.LOG_DIR = args.logDir
136 | #
137 | # if args.dataDir:
138 | # cfg.DATA_DIR = args.dataDir
139 | #
140 | # cfg.DATASET.ROOT = os.path.join(
141 | # cfg.DATA_DIR, cfg.DATASET.ROOT
142 | # )
143 | #
144 | # cfg.MODEL.PRETRAINED = os.path.join(
145 | # cfg.DATA_DIR, cfg.MODEL.PRETRAINED
146 | # )
147 | #
148 | # if cfg.TEST.MODEL_FILE:
149 | # cfg.TEST.MODEL_FILE = os.path.join(
150 | # cfg.DATA_DIR, cfg.TEST.MODEL_FILE
151 | # )
152 |
153 | cfg.freeze()
154 |
155 |
156 | if __name__ == '__main__':
157 | import sys
158 | with open(sys.argv[1], 'w') as f:
159 | print(_C, file=f)
160 |
161 |
--------------------------------------------------------------------------------
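The update_config function above follows the usual yacs defrost/merge/freeze cycle: the frozen defaults are unlocked, overridden from a YAML file and/or a flat key-value list, then frozen again. A small self-contained sketch of the same cycle (plain yacs, no repo files required; the keys mirror the defaults above):

from yacs.config import CfgNode as CN

cfg = CN()
cfg.MODEL = CN()
cfg.MODEL.NUM_JOINTS = 17
cfg.MODEL.SIGMA = 2
cfg.freeze()

cfg.defrost()                               # unlock before modifying
cfg.merge_from_list(['MODEL.SIGMA', 3])     # same mechanism as cfg.merge_from_list(args.opts)
cfg.freeze()                                # lock again, as update_config does
print(cfg.MODEL.SIGMA)                      # 3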
/demo/lib/hrnet/lib/config/models.py:
--------------------------------------------------------------------------------
1 | # ------------------------------------------------------------------------------
2 | # Copyright (c) Microsoft
3 | # Licensed under the MIT License.
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com)
5 | # ------------------------------------------------------------------------------
6 |
7 | from __future__ import absolute_import
8 | from __future__ import division
9 | from __future__ import print_function
10 |
11 | from yacs.config import CfgNode as CN
12 |
13 |
14 | # pose_resnet related params
15 | POSE_RESNET = CN()
16 | POSE_RESNET.NUM_LAYERS = 50
17 | POSE_RESNET.DECONV_WITH_BIAS = False
18 | POSE_RESNET.NUM_DECONV_LAYERS = 3
19 | POSE_RESNET.NUM_DECONV_FILTERS = [256, 256, 256]
20 | POSE_RESNET.NUM_DECONV_KERNELS = [4, 4, 4]
21 | POSE_RESNET.FINAL_CONV_KERNEL = 1
22 | POSE_RESNET.PRETRAINED_LAYERS = ['*']
23 |
24 | # pose_multi_resoluton_net related params
25 | POSE_HIGH_RESOLUTION_NET = CN()
26 | POSE_HIGH_RESOLUTION_NET.PRETRAINED_LAYERS = ['*']
27 | POSE_HIGH_RESOLUTION_NET.STEM_INPLANES = 64
28 | POSE_HIGH_RESOLUTION_NET.FINAL_CONV_KERNEL = 1
29 |
30 | POSE_HIGH_RESOLUTION_NET.STAGE2 = CN()
31 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_MODULES = 1
32 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_BRANCHES = 2
33 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_BLOCKS = [4, 4]
34 | POSE_HIGH_RESOLUTION_NET.STAGE2.NUM_CHANNELS = [32, 64]
35 | POSE_HIGH_RESOLUTION_NET.STAGE2.BLOCK = 'BASIC'
36 | POSE_HIGH_RESOLUTION_NET.STAGE2.FUSE_METHOD = 'SUM'
37 |
38 | POSE_HIGH_RESOLUTION_NET.STAGE3 = CN()
39 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_MODULES = 1
40 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_BRANCHES = 3
41 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_BLOCKS = [4, 4, 4]
42 | POSE_HIGH_RESOLUTION_NET.STAGE3.NUM_CHANNELS = [32, 64, 128]
43 | POSE_HIGH_RESOLUTION_NET.STAGE3.BLOCK = 'BASIC'
44 | POSE_HIGH_RESOLUTION_NET.STAGE3.FUSE_METHOD = 'SUM'
45 |
46 | POSE_HIGH_RESOLUTION_NET.STAGE4 = CN()
47 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_MODULES = 1
48 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_BRANCHES = 4
49 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_BLOCKS = [4, 4, 4, 4]
50 | POSE_HIGH_RESOLUTION_NET.STAGE4.NUM_CHANNELS = [32, 64, 128, 256]
51 | POSE_HIGH_RESOLUTION_NET.STAGE4.BLOCK = 'BASIC'
52 | POSE_HIGH_RESOLUTION_NET.STAGE4.FUSE_METHOD = 'SUM'
53 |
54 |
55 | MODEL_EXTRAS = {
56 | 'pose_resnet': POSE_RESNET,
57 | 'pose_high_resolution_net': POSE_HIGH_RESOLUTION_NET,
58 | }
59 |
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-38.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/models/__pycache__/pose_hrnet.cpython-39.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/models/pose_hrnet.py:
--------------------------------------------------------------------------------
1 | # ------------------------------------------------------------------------------
2 | # Copyright (c) Microsoft
3 | # Licensed under the MIT License.
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com)
5 | # ------------------------------------------------------------------------------
6 |
7 | from __future__ import absolute_import
8 | from __future__ import division
9 | from __future__ import print_function
10 |
11 | import os
12 | import logging
13 |
14 | import torch
15 | import torch.nn as nn
16 |
17 |
18 | BN_MOMENTUM = 0.1
19 | logger = logging.getLogger(__name__)
20 |
21 |
22 | def conv3x3(in_planes, out_planes, stride=1):
23 | """3x3 convolution with padding"""
24 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
25 | padding=1, bias=False)
26 |
27 |
28 | class BasicBlock(nn.Module):
29 | expansion = 1
30 |
31 | def __init__(self, inplanes, planes, stride=1, downsample=None):
32 | super(BasicBlock, self).__init__()
33 | self.conv1 = conv3x3(inplanes, planes, stride)
34 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
35 | self.relu = nn.ReLU(inplace=True)
36 | self.conv2 = conv3x3(planes, planes)
37 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
38 | self.downsample = downsample
39 | self.stride = stride
40 |
41 | def forward(self, x):
42 | residual = x
43 |
44 | out = self.conv1(x)
45 | out = self.bn1(out)
46 | out = self.relu(out)
47 |
48 | out = self.conv2(out)
49 | out = self.bn2(out)
50 |
51 | if self.downsample is not None:
52 | residual = self.downsample(x)
53 |
54 | out += residual
55 | out = self.relu(out)
56 |
57 | return out
58 |
59 |
60 | class Bottleneck(nn.Module):
61 | expansion = 4
62 |
63 | def __init__(self, inplanes, planes, stride=1, downsample=None):
64 | super(Bottleneck, self).__init__()
65 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
66 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
67 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
68 | padding=1, bias=False)
69 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
70 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
71 | bias=False)
72 | self.bn3 = nn.BatchNorm2d(planes * self.expansion,
73 | momentum=BN_MOMENTUM)
74 | self.relu = nn.ReLU(inplace=True)
75 | self.downsample = downsample
76 | self.stride = stride
77 |
78 | def forward(self, x):
79 | residual = x
80 |
81 | out = self.conv1(x)
82 | out = self.bn1(out)
83 | out = self.relu(out)
84 |
85 | out = self.conv2(out)
86 | out = self.bn2(out)
87 | out = self.relu(out)
88 |
89 | out = self.conv3(out)
90 | out = self.bn3(out)
91 |
92 | if self.downsample is not None:
93 | residual = self.downsample(x)
94 |
95 | out += residual
96 | out = self.relu(out)
97 |
98 | return out
99 |
100 |
101 | class HighResolutionModule(nn.Module):
102 | def __init__(self, num_branches, blocks, num_blocks, num_inchannels,
103 | num_channels, fuse_method, multi_scale_output=True):
104 | super(HighResolutionModule, self).__init__()
105 | self._check_branches(
106 | num_branches, blocks, num_blocks, num_inchannels, num_channels)
107 |
108 | self.num_inchannels = num_inchannels
109 | self.fuse_method = fuse_method
110 | self.num_branches = num_branches
111 |
112 | self.multi_scale_output = multi_scale_output
113 |
114 | self.branches = self._make_branches(
115 | num_branches, blocks, num_blocks, num_channels)
116 | self.fuse_layers = self._make_fuse_layers()
117 | self.relu = nn.ReLU(True)
118 |
119 | def _check_branches(self, num_branches, blocks, num_blocks,
120 | num_inchannels, num_channels):
121 | if num_branches != len(num_blocks):
122 | error_msg = 'NUM_BRANCHES({}) <> NUM_BLOCKS({})'.format(
123 | num_branches, len(num_blocks))
124 | logger.error(error_msg)
125 | raise ValueError(error_msg)
126 |
127 | if num_branches != len(num_channels):
128 | error_msg = 'NUM_BRANCHES({}) <> NUM_CHANNELS({})'.format(
129 | num_branches, len(num_channels))
130 | logger.error(error_msg)
131 | raise ValueError(error_msg)
132 |
133 | if num_branches != len(num_inchannels):
134 | error_msg = 'NUM_BRANCHES({}) <> NUM_INCHANNELS({})'.format(
135 | num_branches, len(num_inchannels))
136 | logger.error(error_msg)
137 | raise ValueError(error_msg)
138 |
139 | def _make_one_branch(self, branch_index, block, num_blocks, num_channels,
140 | stride=1):
141 | downsample = None
142 | if stride != 1 or \
143 | self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion:
144 | downsample = nn.Sequential(
145 | nn.Conv2d(
146 | self.num_inchannels[branch_index],
147 | num_channels[branch_index] * block.expansion,
148 | kernel_size=1, stride=stride, bias=False
149 | ),
150 | nn.BatchNorm2d(
151 | num_channels[branch_index] * block.expansion,
152 | momentum=BN_MOMENTUM
153 | ),
154 | )
155 |
156 | layers = []
157 | layers.append(
158 | block(
159 | self.num_inchannels[branch_index],
160 | num_channels[branch_index],
161 | stride,
162 | downsample
163 | )
164 | )
165 | self.num_inchannels[branch_index] = \
166 | num_channels[branch_index] * block.expansion
167 | for i in range(1, num_blocks[branch_index]):
168 | layers.append(
169 | block(
170 | self.num_inchannels[branch_index],
171 | num_channels[branch_index]
172 | )
173 | )
174 |
175 | return nn.Sequential(*layers)
176 |
177 | def _make_branches(self, num_branches, block, num_blocks, num_channels):
178 | branches = []
179 |
180 | for i in range(num_branches):
181 | branches.append(
182 | self._make_one_branch(i, block, num_blocks, num_channels)
183 | )
184 |
185 | return nn.ModuleList(branches)
186 |
187 | def _make_fuse_layers(self):
188 | if self.num_branches == 1:
189 | return None
190 |
191 | num_branches = self.num_branches
192 | num_inchannels = self.num_inchannels
193 | fuse_layers = []
194 | for i in range(num_branches if self.multi_scale_output else 1):
195 | fuse_layer = []
196 | for j in range(num_branches):
197 | if j > i:
198 | fuse_layer.append(
199 | nn.Sequential(
200 | nn.Conv2d(
201 | num_inchannels[j],
202 | num_inchannels[i],
203 | 1, 1, 0, bias=False
204 | ),
205 | nn.BatchNorm2d(num_inchannels[i]),
206 | nn.Upsample(scale_factor=2**(j-i), mode='nearest')
207 | )
208 | )
209 | elif j == i:
210 | fuse_layer.append(None)
211 | else:
212 | conv3x3s = []
213 | for k in range(i-j):
214 | if k == i - j - 1:
215 | num_outchannels_conv3x3 = num_inchannels[i]
216 | conv3x3s.append(
217 | nn.Sequential(
218 | nn.Conv2d(
219 | num_inchannels[j],
220 | num_outchannels_conv3x3,
221 | 3, 2, 1, bias=False
222 | ),
223 | nn.BatchNorm2d(num_outchannels_conv3x3)
224 | )
225 | )
226 | else:
227 | num_outchannels_conv3x3 = num_inchannels[j]
228 | conv3x3s.append(
229 | nn.Sequential(
230 | nn.Conv2d(
231 | num_inchannels[j],
232 | num_outchannels_conv3x3,
233 | 3, 2, 1, bias=False
234 | ),
235 | nn.BatchNorm2d(num_outchannels_conv3x3),
236 | nn.ReLU(True)
237 | )
238 | )
239 | fuse_layer.append(nn.Sequential(*conv3x3s))
240 | fuse_layers.append(nn.ModuleList(fuse_layer))
241 |
242 | return nn.ModuleList(fuse_layers)
243 |
244 | def get_num_inchannels(self):
245 | return self.num_inchannels
246 |
247 | def forward(self, x):
248 | if self.num_branches == 1:
249 | return [self.branches[0](x[0])]
250 |
251 | for i in range(self.num_branches):
252 | x[i] = self.branches[i](x[i])
253 |
254 | x_fuse = []
255 |
256 | for i in range(len(self.fuse_layers)):
257 | y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])
258 | for j in range(1, self.num_branches):
259 | if i == j:
260 | y = y + x[j]
261 | else:
262 | y = y + self.fuse_layers[i][j](x[j])
263 | x_fuse.append(self.relu(y))
264 |
265 | return x_fuse
266 |
267 |
268 | blocks_dict = {
269 | 'BASIC': BasicBlock,
270 | 'BOTTLENECK': Bottleneck
271 | }
272 |
273 |
274 | class PoseHighResolutionNet(nn.Module):
275 |
276 | def __init__(self, cfg, **kwargs):
277 | self.inplanes = 64
278 | extra = cfg['MODEL']['EXTRA']
279 | super(PoseHighResolutionNet, self).__init__()
280 |
281 | # stem net
282 | self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1,
283 | bias=False)
284 | self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
285 | self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1,
286 | bias=False)
287 | self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
288 | self.relu = nn.ReLU(inplace=True)
289 | self.layer1 = self._make_layer(Bottleneck, 64, 4)
290 |
291 | self.stage2_cfg = extra['STAGE2']
292 | num_channels = self.stage2_cfg['NUM_CHANNELS']
293 | block = blocks_dict[self.stage2_cfg['BLOCK']]
294 | num_channels = [
295 | num_channels[i] * block.expansion for i in range(len(num_channels))
296 | ]
297 | self.transition1 = self._make_transition_layer([256], num_channels)
298 | self.stage2, pre_stage_channels = self._make_stage(
299 | self.stage2_cfg, num_channels)
300 |
301 | self.stage3_cfg = extra['STAGE3']
302 | num_channels = self.stage3_cfg['NUM_CHANNELS']
303 | block = blocks_dict[self.stage3_cfg['BLOCK']]
304 | num_channels = [
305 | num_channels[i] * block.expansion for i in range(len(num_channels))
306 | ]
307 | self.transition2 = self._make_transition_layer(
308 | pre_stage_channels, num_channels)
309 | self.stage3, pre_stage_channels = self._make_stage(
310 | self.stage3_cfg, num_channels)
311 |
312 | self.stage4_cfg = extra['STAGE4']
313 | num_channels = self.stage4_cfg['NUM_CHANNELS']
314 | block = blocks_dict[self.stage4_cfg['BLOCK']]
315 | num_channels = [
316 | num_channels[i] * block.expansion for i in range(len(num_channels))
317 | ]
318 | self.transition3 = self._make_transition_layer(
319 | pre_stage_channels, num_channels)
320 | self.stage4, pre_stage_channels = self._make_stage(
321 | self.stage4_cfg, num_channels, multi_scale_output=False)
322 |
323 | self.final_layer = nn.Conv2d(
324 | in_channels=pre_stage_channels[0],
325 | out_channels=cfg['MODEL']['NUM_JOINTS'],
326 | kernel_size=extra['FINAL_CONV_KERNEL'],
327 | stride=1,
328 | padding=1 if extra['FINAL_CONV_KERNEL'] == 3 else 0
329 | )
330 |
331 | self.pretrained_layers = extra['PRETRAINED_LAYERS']
332 |
333 | def _make_transition_layer(
334 | self, num_channels_pre_layer, num_channels_cur_layer):
335 | num_branches_cur = len(num_channels_cur_layer)
336 | num_branches_pre = len(num_channels_pre_layer)
337 |
338 | transition_layers = []
339 | for i in range(num_branches_cur):
340 | if i < num_branches_pre:
341 | if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
342 | transition_layers.append(
343 | nn.Sequential(
344 | nn.Conv2d(
345 | num_channels_pre_layer[i],
346 | num_channels_cur_layer[i],
347 | 3, 1, 1, bias=False
348 | ),
349 | nn.BatchNorm2d(num_channels_cur_layer[i]),
350 | nn.ReLU(inplace=True)
351 | )
352 | )
353 | else:
354 | transition_layers.append(None)
355 | else:
356 | conv3x3s = []
357 | for j in range(i+1-num_branches_pre):
358 | inchannels = num_channels_pre_layer[-1]
359 | outchannels = num_channels_cur_layer[i] \
360 | if j == i-num_branches_pre else inchannels
361 | conv3x3s.append(
362 | nn.Sequential(
363 | nn.Conv2d(
364 | inchannels, outchannels, 3, 2, 1, bias=False
365 | ),
366 | nn.BatchNorm2d(outchannels),
367 | nn.ReLU(inplace=True)
368 | )
369 | )
370 | transition_layers.append(nn.Sequential(*conv3x3s))
371 |
372 | return nn.ModuleList(transition_layers)
373 |
374 | def _make_layer(self, block, planes, blocks, stride=1):
375 | downsample = None
376 | if stride != 1 or self.inplanes != planes * block.expansion:
377 | downsample = nn.Sequential(
378 | nn.Conv2d(
379 | self.inplanes, planes * block.expansion,
380 | kernel_size=1, stride=stride, bias=False
381 | ),
382 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
383 | )
384 |
385 | layers = []
386 | layers.append(block(self.inplanes, planes, stride, downsample))
387 | self.inplanes = planes * block.expansion
388 | for i in range(1, blocks):
389 | layers.append(block(self.inplanes, planes))
390 |
391 | return nn.Sequential(*layers)
392 |
393 | def _make_stage(self, layer_config, num_inchannels,
394 | multi_scale_output=True):
395 | num_modules = layer_config['NUM_MODULES']
396 | num_branches = layer_config['NUM_BRANCHES']
397 | num_blocks = layer_config['NUM_BLOCKS']
398 | num_channels = layer_config['NUM_CHANNELS']
399 | block = blocks_dict[layer_config['BLOCK']]
400 | fuse_method = layer_config['FUSE_METHOD']
401 |
402 | modules = []
403 | for i in range(num_modules):
404 |             # multi_scale_output only applies to the last module
405 | if not multi_scale_output and i == num_modules - 1:
406 | reset_multi_scale_output = False
407 | else:
408 | reset_multi_scale_output = True
409 |
410 | modules.append(
411 | HighResolutionModule(
412 | num_branches,
413 | block,
414 | num_blocks,
415 | num_inchannels,
416 | num_channels,
417 | fuse_method,
418 | reset_multi_scale_output
419 | )
420 | )
421 | num_inchannels = modules[-1].get_num_inchannels()
422 |
423 | return nn.Sequential(*modules), num_inchannels
424 |
425 | def forward(self, x):
426 | x = self.conv1(x)
427 | x = self.bn1(x)
428 | x = self.relu(x)
429 | x = self.conv2(x)
430 | x = self.bn2(x)
431 | x = self.relu(x)
432 | x = self.layer1(x)
433 |
434 | x_list = []
435 | for i in range(self.stage2_cfg['NUM_BRANCHES']):
436 | if self.transition1[i] is not None:
437 | x_list.append(self.transition1[i](x))
438 | else:
439 | x_list.append(x)
440 | y_list = self.stage2(x_list)
441 |
442 | x_list = []
443 | for i in range(self.stage3_cfg['NUM_BRANCHES']):
444 | if self.transition2[i] is not None:
445 | x_list.append(self.transition2[i](y_list[-1]))
446 | else:
447 | x_list.append(y_list[i])
448 | y_list = self.stage3(x_list)
449 |
450 | x_list = []
451 | for i in range(self.stage4_cfg['NUM_BRANCHES']):
452 | if self.transition3[i] is not None:
453 | x_list.append(self.transition3[i](y_list[-1]))
454 | else:
455 | x_list.append(y_list[i])
456 | y_list = self.stage4(x_list)
457 |
458 | x = self.final_layer(y_list[0])
459 |
460 | return x
461 |
462 | def init_weights(self, pretrained=''):
463 | logger.info('=> init weights from normal distribution')
464 | for m in self.modules():
465 | if isinstance(m, nn.Conv2d):
466 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
467 | nn.init.normal_(m.weight, std=0.001)
468 | for name, _ in m.named_parameters():
469 | if name in ['bias']:
470 | nn.init.constant_(m.bias, 0)
471 | elif isinstance(m, nn.BatchNorm2d):
472 | nn.init.constant_(m.weight, 1)
473 | nn.init.constant_(m.bias, 0)
474 | elif isinstance(m, nn.ConvTranspose2d):
475 | nn.init.normal_(m.weight, std=0.001)
476 | for name, _ in m.named_parameters():
477 | if name in ['bias']:
478 | nn.init.constant_(m.bias, 0)
479 |
480 | if os.path.isfile(pretrained):
481 | pretrained_state_dict = torch.load(pretrained)
482 | logger.info('=> loading pretrained model {}'.format(pretrained))
483 |
484 | need_init_state_dict = {}
485 | for name, m in pretrained_state_dict.items():
486 | if name.split('.')[0] in self.pretrained_layers \
487 |                         or self.pretrained_layers[0] == '*':
488 | need_init_state_dict[name] = m
489 | self.load_state_dict(need_init_state_dict, strict=False)
490 | elif pretrained:
491 | logger.error('=> please download pre-trained models first!')
492 |             raise ValueError('{} does not exist!'.format(pretrained))
493 |
494 |
495 | def get_pose_net(cfg, is_train, **kwargs):
496 | model = PoseHighResolutionNet(cfg, **kwargs)
497 |
498 | if is_train and cfg['MODEL']['INIT_WEIGHTS']:
499 | model.init_weights(cfg['MODEL']['PRETRAINED'])
500 |
501 | return model
502 |
--------------------------------------------------------------------------------
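A shape-check sketch for the HRNet above: the stem downsamples the input by 4, and the final 1x1 layer predicts one heatmap per joint at that 1/4 resolution. The import paths below are an assumption (demo/lib/hrnet/lib added to sys.path so the config and models packages above resolve); the EXTRA node is filled from MODEL_EXTRAS defined in models.py above.

import torch
# assumed paths: demo/lib/hrnet/lib on sys.path
from config.default import _C as cfg
from config.models import MODEL_EXTRAS
from models.pose_hrnet import get_pose_net

cfg.defrost()
cfg.MODEL.EXTRA = MODEL_EXTRAS['pose_high_resolution_net'].clone()  # STAGE2-4 definitions from models.py
cfg.freeze()

net = get_pose_net(cfg, is_train=False)
with torch.no_grad():
    heatmaps = net(torch.randn(1, 3, 384, 288))   # (B, 3, H, W) input
print(heatmaps.shape)  # torch.Size([1, 17, 96, 72]): one heatmap per joint at 1/4 input resolution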
/demo/lib/hrnet/lib/utils/__pycache__/coco_h36m.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/coco_h36m.cpython-39.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/utils/__pycache__/inference.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/inference.cpython-39.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/utils/__pycache__/transforms.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/transforms.cpython-39.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/utils/__pycache__/utilitys.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/hrnet/lib/utils/__pycache__/utilitys.cpython-39.pyc
--------------------------------------------------------------------------------
/demo/lib/hrnet/lib/utils/coco_h36m.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 |
4 | h36m_coco_order = [9, 11, 14, 12, 15, 13, 16, 4, 1, 5, 2, 6, 3]
5 | coco_order = [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
6 | spple_keypoints = [10, 8, 0, 7]
7 |
8 |
9 | def coco_h36m(keypoints):
10 | # keypoints: (T, N, 2) or (M, N, 2)
11 |
12 | temporal = keypoints.shape[0]
13 | keypoints_h36m = np.zeros_like(keypoints, dtype=np.float32)
14 | htps_keypoints = np.zeros((temporal, 4, 2), dtype=np.float32)
15 |
16 | # htps_keypoints: head, thorax, pelvis, spine
17 | htps_keypoints[:, 0, 0] = np.mean(keypoints[:, 1:5, 0], axis=1, dtype=np.float32)
18 | htps_keypoints[:, 0, 1] = np.sum(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1]
19 | htps_keypoints[:, 1, :] = np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32)
20 | htps_keypoints[:, 1, :] += (keypoints[:, 0, :] - htps_keypoints[:, 1, :]) / 3
21 |
22 | htps_keypoints[:, 2, :] = np.mean(keypoints[:, 11:13, :], axis=1, dtype=np.float32)
23 | htps_keypoints[:, 3, :] = np.mean(keypoints[:, [5, 6, 11, 12], :], axis=1, dtype=np.float32)
24 |
25 | keypoints_h36m[:, spple_keypoints, :] = htps_keypoints
26 | keypoints_h36m[:, h36m_coco_order, :] = keypoints[:, coco_order, :]
27 |
28 | keypoints_h36m[:, 9, :] -= (keypoints_h36m[:, 9, :] - np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32)) / 4
29 | keypoints_h36m[:, 7, 0] += 0.3*(keypoints_h36m[:, 7, 0] - np.mean(keypoints_h36m[:, [0, 8], 0], axis=1, dtype=np.float32))
30 | keypoints_h36m[:, 8, 1] -= (np.mean(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1])*2/3
31 |
32 |     # half body: set the ankle and knee joints equal to the hip
33 | # keypoints_h36m[:, [2, 3]] = keypoints_h36m[:, [1, 1]]
34 | # keypoints_h36m[:, [5, 6]] = keypoints_h36m[:, [4, 4]]
35 | return keypoints_h36m
36 |
37 |
38 | h36m_mpii_order = [3, 2, 1, 4, 5, 6, 0, 8, 9, 10, 16, 15, 14, 11, 12, 13]
39 | mpii_order = [i for i in range(16)]
40 | lr_hip_shoulder = [2, 3, 12, 13]
41 |
42 |
43 | def mpii_h36m(keypoints):
44 | temporal = keypoints.shape[0]
45 | keypoints_h36m = np.zeros((temporal, 17, 2), dtype=np.float32)
46 | keypoints_h36m[:, h36m_mpii_order] = keypoints
47 | # keypoints_h36m[:, 7] = np.mean(keypoints[:, 6:8], axis=1, dtype=np.float32)
48 |     keypoints_h36m[:, 7] = np.mean(keypoints[:, lr_hip_shoulder], axis=1, dtype=np.float32)
49 | return keypoints_h36m
50 |
51 |
52 |
--------------------------------------------------------------------------------
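A quick shape and ordering check for coco_h36m above; the import path is an assumption (the repo's demo/ directory on sys.path, so the module resolves as lib.hrnet.lib.utils.coco_h36m):

import numpy as np
# assumed path: repo's demo/ directory on sys.path
from lib.hrnet.lib.utils.coco_h36m import coco_h36m

coco_kpts = np.random.rand(10, 17, 2).astype(np.float32)   # (T, 17, 2) keypoints in COCO order
h36m_kpts = coco_h36m(coco_kpts)
print(h36m_kpts.shape)   # (10, 17, 2), re-ordered to the Human3.6M skeleton
# spple_keypoints = [10, 8, 0, 7]: head (10), thorax (8), pelvis (0) and spine (7) are synthesized
# from the COCO joints, since COCO has no such keypoints of its own.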
/demo/lib/hrnet/lib/utils/inference.py:
--------------------------------------------------------------------------------
1 | # ------------------------------------------------------------------------------
2 | # Copyright (c) Microsoft
3 | # Licensed under the MIT License.
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com)
5 | # ------------------------------------------------------------------------------
6 |
7 | from __future__ import absolute_import
8 | from __future__ import division
9 | from __future__ import print_function
10 |
11 | import math
12 | import sys
13 | import os.path as osp
14 | import numpy as np
15 |
16 | sys.path.insert(0, osp.join(osp.dirname(osp.realpath(__file__)), '..'))
17 | from utils.transforms import transform_preds
18 | sys.path.pop(0)
19 |
20 |
21 | def get_max_preds(batch_heatmaps):
22 | '''
23 | get predictions from score maps
24 | heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
25 | '''
26 | assert isinstance(batch_heatmaps, np.ndarray), \
27 | 'batch_heatmaps should be numpy.ndarray'
28 | assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim'
29 |
30 | batch_size = batch_heatmaps.shape[0]
31 | num_joints = batch_heatmaps.shape[1]
32 | width = batch_heatmaps.shape[3]
33 | heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1))
34 | idx = np.argmax(heatmaps_reshaped, 2)
35 | maxvals = np.amax(heatmaps_reshaped, 2)
36 |
37 | maxvals = maxvals.reshape((batch_size, num_joints, 1))
38 | idx = idx.reshape((batch_size, num_joints, 1))
39 |
40 | preds = np.tile(idx, (1, 1, 2)).astype(np.float32)
41 |
42 | preds[:, :, 0] = (preds[:, :, 0]) % width
43 | preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)
44 |
45 | pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
46 | pred_mask = pred_mask.astype(np.float32)
47 |
48 | preds *= pred_mask
49 | return preds, maxvals
50 |
51 |
52 | def get_final_preds(config, batch_heatmaps, center, scale):
53 | coords, maxvals = get_max_preds(batch_heatmaps)
54 |
55 | heatmap_height = batch_heatmaps.shape[2]
56 | heatmap_width = batch_heatmaps.shape[3]
57 |
58 | # post-processing
59 | if config.TEST.POST_PROCESS:
60 | for n in range(coords.shape[0]):
61 | for p in range(coords.shape[1]):
62 | hm = batch_heatmaps[n][p]
63 | px = int(math.floor(coords[n][p][0] + 0.5))
64 | py = int(math.floor(coords[n][p][1] + 0.5))
65 | if 1 < px < heatmap_width-1 and 1 < py < heatmap_height-1:
66 | diff = np.array(
67 | [
68 | hm[py][px+1] - hm[py][px-1],
69 | hm[py+1][px]-hm[py-1][px]
70 | ]
71 | )
72 | coords[n][p] += np.sign(diff) * .25
73 |
74 | preds = coords.copy()
75 |
76 | # Transform back
77 | for i in range(coords.shape[0]):
78 | preds[i] = transform_preds(
79 | coords[i], center[i], scale[i], [heatmap_width, heatmap_height]
80 | )
81 |
82 | return preds, maxvals
83 |
--------------------------------------------------------------------------------
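The core of get_max_preds above is the flat-argmax decoding of each heatmap into (x, y) = (idx % width, idx // width). A standalone numpy sketch of that arithmetic (no repo imports needed):

import numpy as np

heatmaps = np.zeros((1, 2, 64, 48), dtype=np.float32)   # (batch, joints, height, width)
heatmaps[0, 0, 10, 20] = 1.0                            # joint 0 peaks at (x=20, y=10)
heatmaps[0, 1, 30, 5] = 0.8                             # joint 1 peaks at (x=5,  y=30)

width = heatmaps.shape[3]
flat = heatmaps.reshape(1, 2, -1)
idx = np.argmax(flat, axis=2)                           # flat index of each peak
xs, ys = idx % width, idx // width
print([(int(x), int(y)) for x, y in zip(xs[0], ys[0])])  # [(20, 10), (5, 30)]
print(np.amax(flat, axis=2))                             # peak values, i.e. maxvals in get_max_preds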
/demo/lib/hrnet/lib/utils/transforms.py:
--------------------------------------------------------------------------------
1 | # ------------------------------------------------------------------------------
2 | # Copyright (c) Microsoft
3 | # Licensed under the MIT License.
4 | # Written by Bin Xiao (Bin.Xiao@microsoft.com)
5 | # ------------------------------------------------------------------------------
6 |
7 | from __future__ import absolute_import
8 | from __future__ import division
9 | from __future__ import print_function
10 |
11 | import numpy as np
12 | import cv2
13 |
14 |
15 | def flip_back(output_flipped, matched_parts):
16 | '''
17 |     output_flipped: numpy.ndarray(batch_size, num_joints, height, width)
18 | '''
19 | assert output_flipped.ndim == 4,\
20 | 'output_flipped should be [batch_size, num_joints, height, width]'
21 |
22 | output_flipped = output_flipped[:, :, :, ::-1]
23 |
24 |     # The input image is horizontally flipped, so the left/right joints in the output heatmaps are swapped as well (during training, the flipped image is fed in and the left/right joints of the target are swapped accordingly).
25 | for pair in matched_parts:
26 | tmp = output_flipped[:, pair[0], :, :].copy()
27 | output_flipped[:, pair[0], :, :] = output_flipped[:, pair[1], :, :]
28 | output_flipped[:, pair[1], :, :] = tmp
29 |
30 | return output_flipped
31 |
32 |
33 | def fliplr_joints(joints, joints_vis, width, matched_parts):
34 | """
35 | flip coords
36 | """
37 | # Flip horizontal
38 | joints[:, 0] = width - joints[:, 0] - 1
39 |
40 | # Change left-right parts
41 | for pair in matched_parts:
42 | joints[pair[0], :], joints[pair[1], :] = \
43 | joints[pair[1], :], joints[pair[0], :].copy()
44 | joints_vis[pair[0], :], joints_vis[pair[1], :] = \
45 | joints_vis[pair[1], :], joints_vis[pair[0], :].copy()
46 |
47 | return joints*joints_vis, joints_vis
48 |
49 |
50 | def transform_preds(coords, center, scale, output_size):
51 | target_coords = np.zeros(coords.shape)
52 | trans = get_affine_transform(center, scale, 0, output_size, inv=1)
53 | for p in range(coords.shape[0]):
54 | target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
55 | return target_coords
56 |
57 |
58 | def get_affine_transform(
59 | center, scale, rot, output_size,
60 | shift=np.array([0, 0], dtype=np.float32), inv=0
61 | ):
62 | if not isinstance(scale, np.ndarray) and not isinstance(scale, list):
63 | print(scale)
64 | scale = np.array([scale, scale])
65 |
66 | scale_tmp = scale * 200.0
67 | src_w = scale_tmp[0]
68 | dst_w = output_size[0]
69 | dst_h = output_size[1]
70 |
71 | rot_rad = np.pi * rot / 180
72 | src_dir = get_dir([0, src_w * -0.5], rot_rad)
73 | dst_dir = np.array([0, dst_w * -0.5], np.float32)
74 |
75 | src = np.zeros((3, 2), dtype=np.float32)
76 | dst = np.zeros((3, 2), dtype=np.float32)
77 | src[0, :] = center + scale_tmp * shift
78 | src[1, :] = center + src_dir + scale_tmp * shift
79 | dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
80 | dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir
81 |
82 | src[2:, :] = get_3rd_point(src[0, :], src[1, :])
83 | dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :])
84 |
85 | if inv:
86 | trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
87 | else:
88 | trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
89 |
90 | return trans
91 |
92 |
93 | def affine_transform(pt, t):
94 | new_pt = np.array([pt[0], pt[1], 1.]).T
95 | new_pt = np.dot(t, new_pt)
96 | return new_pt[:2]
97 |
98 |
99 | def get_3rd_point(a, b):
100 | direct = a - b
101 | return b + np.array([-direct[1], direct[0]], dtype=np.float32)
102 |
103 |
104 | def get_dir(src_point, rot_rad):
105 | sn, cs = np.sin(rot_rad), np.cos(rot_rad)
106 |
107 | src_result = [0, 0]
108 | src_result[0] = src_point[0] * cs - src_point[1] * sn
109 | src_result[1] = src_point[0] * sn + src_point[1] * cs
110 |
111 | return src_result
112 |
113 |
114 | def crop(img, center, scale, output_size, rot=0):
115 | trans = get_affine_transform(center, scale, rot, output_size)
116 |
117 | dst_img = cv2.warpAffine(
118 | img, trans, (int(output_size[0]), int(output_size[1])),
119 | flags=cv2.INTER_LINEAR
120 | )
121 |
122 | return dst_img
123 |
--------------------------------------------------------------------------------
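A round-trip sketch for get_affine_transform / affine_transform above: the forward transform maps the box centre to the centre of the network crop, and inv=1 (as used by transform_preds) maps it back. The import path is an assumption (repo's demo/ directory on sys.path):

import numpy as np
# assumed path: repo's demo/ directory on sys.path
from lib.hrnet.lib.utils.transforms import get_affine_transform, affine_transform

center = np.array([320., 240.], dtype=np.float32)    # bbox centre in the source image
scale = np.array([1.0, 1.5], dtype=np.float32)       # bbox size in units of 200 px (pixel_std)

trans = get_affine_transform(center, scale, 0, [288, 384])          # source -> 288x384 crop
print(affine_transform(center, trans))                              # ~[144. 192.], centre of the crop

inv = get_affine_transform(center, scale, 0, [288, 384], inv=1)     # crop -> source, as in transform_preds
print(affine_transform(np.array([144., 192.]), inv))                # ~[320. 240.], back to the source centre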
/demo/lib/hrnet/lib/utils/utilitys.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import sys
3 | import torch
4 | import json
5 | import torchvision.transforms as transforms
6 | from lib.hrnet.lib.utils.transforms import *
7 |
8 | from lib.hrnet.lib.utils.coco_h36m import coco_h36m
9 | import numpy as np
10 |
11 | joint_pairs = [[0, 1], [1, 3], [0, 2], [2, 4],
12 | [5, 6], [5, 7], [7, 9], [6, 8], [8, 10],
13 | [5, 11], [6, 12], [11, 12],
14 | [11, 13], [12, 14], [13, 15], [14, 16]]
15 |
16 | h36m_pairs = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12),
17 | (12, 13), (8, 14), (14, 15), (15, 16)]
18 |
19 | colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0], \
20 | [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], \
21 | [170, 0, 255], [255, 0, 255]]
22 |
23 |
24 | def plot_keypoint(image, coordinates, confidence, keypoint_thresh=0.3):
25 | # USE cv2
26 | joint_visible = confidence[:, :, 0] > keypoint_thresh
27 | coordinates = coco_h36m(coordinates)
28 | for i in range(coordinates.shape[0]):
29 | pts = coordinates[i]
30 |
31 | for joint in pts:
32 | cv2.circle(image, (int(joint[0]), int(joint[1])), 8, (255, 255, 255), 1)
33 |
34 | for color_i, jp in zip(colors, h36m_pairs):
35 | if joint_visible[i, jp[0]] and joint_visible[i, jp[1]]:
36 | pt0 = pts[jp, 0]
37 | pt1 = pts[jp, 1]
38 | pt0_0, pt0_1, pt1_0, pt1_1 = int(pt0[0]), int(pt0[1]), int(pt1[0]), int(pt1[1])
39 |
40 | cv2.line(image, (pt0_0, pt1_0), (pt0_1, pt1_1), color_i, 6)
41 | # cv2.circle(image,(pt0_0, pt0_1), 2, color_i, thickness=-1)
42 | # cv2.circle(image,(pt1_0, pt1_1), 2, color_i, thickness=-1)
43 | return image
44 |
45 |
46 | def write(x, img):
47 | x = [int(i) for i in x]
48 | c1 = tuple(x[0:2])
49 | c2 = tuple(x[2:4])
50 |
51 | color = [0, 97, 255]
52 | label = 'People {}'.format(x[-1])
53 | cv2.rectangle(img, c1, c2, color, 2)
54 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1, 1)[0]
55 | c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
56 | cv2.rectangle(img, c1, c2, [0, 128, 255], -1)
57 | cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225, 255, 255], 1)
58 | return img
59 |
60 |
61 | def load_json(file_path):
62 | with open(file_path, 'r') as fr:
63 | video_info = json.load(fr)
64 |
65 | label = video_info['label']
66 | label_index = video_info['label_index']
67 |
68 | num_frames = video_info['data'][-1]['frame_index']
69 | keypoints = np.zeros((2, num_frames, 17, 2), dtype=np.float32) # (M, T, N, 2)
70 | scores = np.zeros((2, num_frames, 17), dtype=np.float32) # (M, T, N)
71 |
72 | for frame_info in video_info['data']:
73 | frame_index = frame_info['frame_index']
74 |
75 | for index, skeleton_info in enumerate(frame_info['skeleton']):
76 | pose = skeleton_info['pose']
77 | score = skeleton_info['score']
78 | bbox = skeleton_info['bbox']
79 |
80 | if len(bbox) == 0 or index+1 > 2:
81 | continue
82 |
83 | pose = np.asarray(pose, dtype=np.float32)
84 | score = np.asarray(score, dtype=np.float32)
85 | score = score.reshape(-1)
86 |
87 | keypoints[index, frame_index-1] = pose
88 | scores[index, frame_index-1] = score
89 |
90 | new_kpts = []
91 | for i in range(keypoints.shape[0]):
92 | kps = keypoints[i]
93 | if np.sum(kps) != 0.:
94 | new_kpts.append(kps)
95 |
96 | new_kpts = np.asarray(new_kpts, dtype=np.float32)
97 | scores = np.asarray(scores, dtype=np.float32)
98 | scores = scores[:, :, :, np.newaxis]
99 | return new_kpts, scores, label, label_index
100 |
101 |
102 | def box_to_center_scale(box, model_image_width, model_image_height):
103 | """convert a box to center,scale information required for pose transformation
104 | Parameters
105 | ----------
106 | box : (x1, y1, x2, y2)
107 | model_image_width : int
108 | model_image_height : int
109 |
110 | Returns
111 | -------
112 | (numpy array, numpy array)
113 | Two numpy arrays, coordinates for the center of the box and the scale of the box
114 | """
115 | center = np.zeros((2), dtype=np.float32)
116 | x1, y1, x2, y2 = box[:4]
117 | box_width, box_height = x2 - x1, y2 - y1
118 |
119 | center[0] = x1 + box_width * 0.5
120 | center[1] = y1 + box_height * 0.5
121 |
122 | aspect_ratio = model_image_width * 1.0 / model_image_height
123 | pixel_std = 200
124 |
125 | if box_width > aspect_ratio * box_height:
126 | box_height = box_width * 1.0 / aspect_ratio
127 | elif box_width < aspect_ratio * box_height:
128 | box_width = box_height * aspect_ratio
129 | scale = np.array(
130 | [box_width * 1.0 / pixel_std, box_height * 1.0 / pixel_std],
131 | dtype=np.float32)
132 | if center[0] != -1:
133 | scale = scale * 1.25
134 |
135 | return center, scale
136 |
137 |
138 | # Pre-process
139 | def PreProcess(image, bboxs, cfg, num_pos=2):
140 | if type(image) == str:
141 | data_numpy = cv2.imread(image, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
142 | # data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)
143 | else:
144 | data_numpy = image
145 |
146 | inputs = []
147 | centers = []
148 | scales = []
149 |
150 | for bbox in bboxs[:num_pos]:
151 | c, s = box_to_center_scale(bbox, data_numpy.shape[0], data_numpy.shape[1])
152 | centers.append(c)
153 | scales.append(s)
154 | r = 0
155 |
156 | trans = get_affine_transform(c, s, r, cfg.MODEL.IMAGE_SIZE)
157 | input = cv2.warpAffine(
158 | data_numpy,
159 | trans,
160 | (int(cfg.MODEL.IMAGE_SIZE[0]), int(cfg.MODEL.IMAGE_SIZE[1])),
161 | flags=cv2.INTER_LINEAR)
162 |
163 | transform = transforms.Compose([transforms.ToTensor(),
164 | transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
165 | input = transform(input).unsqueeze(0)
166 | inputs.append(input)
167 |
168 | inputs = torch.cat(inputs)
169 | return inputs, data_numpy, centers, scales
170 |
--------------------------------------------------------------------------------
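A small example of box_to_center_scale above, which PreProcess uses to turn each detector box into the (center, scale) pair HRNet expects; the box is padded by 1.25, expanded to the target aspect ratio, and the scale is expressed in units of 200 px. The import path is an assumption (repo's demo/ directory on sys.path):

# assumed path: repo's demo/ directory on sys.path
from lib.hrnet.lib.utils.utilitys import box_to_center_scale

center, scale = box_to_center_scale((100, 50, 300, 450), 288, 384)  # bbox (x1, y1, x2, y2), model w, h
print(center)   # [200. 250.]    -- bbox centre
print(scale)    # [1.875 2.5 ]   -- (300, 400) px box, padded by 1.25, divided by 200, at a 288:384 aspect ratio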
/demo/lib/preprocess.py:
--------------------------------------------------------------------------------
1 | import json
2 | import numpy as np
3 | import os
4 |
5 | h36m_coco_order = [9, 11, 14, 12, 15, 13, 16, 4, 1, 5, 2, 6, 3]
6 | coco_order = [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
7 | spple_keypoints = [10, 8, 0, 7]
8 |
9 |
10 | def coco_h36m(keypoints):
11 | temporal = keypoints.shape[0]
12 | keypoints_h36m = np.zeros_like(keypoints, dtype=np.float32)
13 | htps_keypoints = np.zeros((temporal, 4, 2), dtype=np.float32)
14 |
15 | # htps_keypoints: head, thorax, pelvis, spine
16 | htps_keypoints[:, 0, 0] = np.mean(keypoints[:, 1:5, 0], axis=1, dtype=np.float32)
17 | htps_keypoints[:, 0, 1] = np.sum(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1]
18 | htps_keypoints[:, 1, :] = np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32)
19 | htps_keypoints[:, 1, :] += (keypoints[:, 0, :] - htps_keypoints[:, 1, :]) / 3
20 |
21 | htps_keypoints[:, 2, :] = np.mean(keypoints[:, 11:13, :], axis=1, dtype=np.float32)
22 | htps_keypoints[:, 3, :] = np.mean(keypoints[:, [5, 6, 11, 12], :], axis=1, dtype=np.float32)
23 |
24 | keypoints_h36m[:, spple_keypoints, :] = htps_keypoints
25 | keypoints_h36m[:, h36m_coco_order, :] = keypoints[:, coco_order, :]
26 |
27 | keypoints_h36m[:, 9, :] -= (keypoints_h36m[:, 9, :] - np.mean(keypoints[:, 5:7, :], axis=1, dtype=np.float32)) / 4
28 | keypoints_h36m[:, 7, 0] += 2*(keypoints_h36m[:, 7, 0] - np.mean(keypoints_h36m[:, [0, 8], 0], axis=1, dtype=np.float32))
29 | keypoints_h36m[:, 8, 1] -= (np.mean(keypoints[:, 1:3, 1], axis=1, dtype=np.float32) - keypoints[:, 0, 1])*2/3
30 |
31 |     # half body: set the ankle and knee joints equal to the hip
32 | # keypoints_h36m[:, [2, 3]] = keypoints_h36m[:, [1, 1]]
33 | # keypoints_h36m[:, [5, 6]] = keypoints_h36m[:, [4, 4]]
34 |
35 | valid_frames = np.where(np.sum(keypoints_h36m.reshape(-1, 34), axis=1) != 0)[0]
36 |
37 | return keypoints_h36m, valid_frames
38 |
39 |
40 | def h36m_coco_format(keypoints, scores):
41 | assert len(keypoints.shape) == 4 and len(scores.shape) == 3
42 |
43 | h36m_kpts = []
44 | h36m_scores = []
45 | valid_frames = []
46 |
47 | for i in range(keypoints.shape[0]):
48 | kpts = keypoints[i]
49 | score = scores[i]
50 |
51 | new_score = np.zeros_like(score, dtype=np.float32)
52 |
53 | if np.sum(kpts) != 0.:
54 | kpts, valid_frame = coco_h36m(kpts)
55 | h36m_kpts.append(kpts)
56 | valid_frames.append(valid_frame)
57 |
58 | new_score[:, h36m_coco_order] = score[:, coco_order]
59 | new_score[:, 0] = np.mean(score[:, [11, 12]], axis=1, dtype=np.float32)
60 | new_score[:, 8] = np.mean(score[:, [5, 6]], axis=1, dtype=np.float32)
61 | new_score[:, 7] = np.mean(new_score[:, [0, 8]], axis=1, dtype=np.float32)
62 | new_score[:, 10] = np.mean(score[:, [1, 2, 3, 4]], axis=1, dtype=np.float32)
63 |
64 | h36m_scores.append(new_score)
65 |
66 | h36m_kpts = np.asarray(h36m_kpts, dtype=np.float32)
67 | h36m_scores = np.asarray(h36m_scores, dtype=np.float32)
68 |
69 | return h36m_kpts, h36m_scores, valid_frames
70 |
71 |
72 | def revise_kpts(h36m_kpts, h36m_scores, valid_frames):
73 |
74 | new_h36m_kpts = np.zeros_like(h36m_kpts)
75 | for index, frames in enumerate(valid_frames):
76 | kpts = h36m_kpts[index, frames]
77 | score = h36m_scores[index, frames]
78 |
79 | index_frame = np.where(np.sum(score < 0.3, axis=1) > 0)[0]
80 |
81 | for frame in index_frame:
82 | less_threshold_joints = np.where(score[frame] < 0.3)[0]
83 |
84 | intersect = [i for i in [2, 3, 5, 6] if i in less_threshold_joints]
85 |
86 | if [2, 3, 5, 6] == intersect:
87 | kpts[frame, [2, 3, 5, 6]] = kpts[frame, [1, 1, 4, 4]]
88 | elif [2, 3, 6] == intersect:
89 | kpts[frame, [2, 3, 6]] = kpts[frame, [1, 1, 5]]
90 | elif [3, 5, 6] == intersect:
91 | kpts[frame, [3, 5, 6]] = kpts[frame, [2, 4, 4]]
92 | elif [3, 6] == intersect:
93 | kpts[frame, [3, 6]] = kpts[frame, [2, 5]]
94 | elif [3] == intersect:
95 | kpts[frame, 3] = kpts[frame, 2]
96 | elif [6] == intersect:
97 | kpts[frame, 6] = kpts[frame, 5]
98 | else:
99 | continue
100 |
101 | new_h36m_kpts[index, frames] = kpts
102 |
103 | return new_h36m_kpts
104 |
105 |
106 |
--------------------------------------------------------------------------------
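A usage sketch for h36m_coco_format and revise_kpts above: HRNet's (M, T, 17, 2) COCO keypoints and (M, T, 17) scores are converted to the Human3.6M joint order, and leg joints with score < 0.3 are replaced by their parent joints. The import path is an assumption (repo's demo/ directory on sys.path):

import numpy as np
# assumed path: repo's demo/ directory on sys.path
from lib.preprocess import h36m_coco_format, revise_kpts

kpts = np.random.rand(1, 30, 17, 2).astype(np.float32)   # (M, T, 17, 2) COCO keypoints from HRNet
scores = np.random.rand(1, 30, 17).astype(np.float32)    # (M, T, 17) confidences

h36m_kpts, h36m_scores, valid = h36m_coco_format(kpts, scores)
print(h36m_kpts.shape)   # (1, 30, 17, 2), joints re-ordered to the Human3.6M skeleton
clean = revise_kpts(h36m_kpts, h36m_scores, valid)   # low-confidence knee/ankle joints copied from their parents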
/demo/lib/sort/sort.py:
--------------------------------------------------------------------------------
1 | """
2 | https://arxiv.org/abs/1602.00763
3 | """
4 | from __future__ import print_function
5 |
6 | from numba import jit
7 | import os.path
8 | import numpy as np
9 | from skimage import io
10 | from scipy.optimize import linear_sum_assignment
11 | import argparse
12 | from filterpy.kalman import KalmanFilter
13 |
14 |
15 | @jit
16 | def iou(bb_test, bb_gt):
17 | """
18 |     Computes IOU between two bboxes in the form [x1,y1,x2,y2]
19 | """
20 | xx1 = np.maximum(bb_test[0], bb_gt[0])
21 | yy1 = np.maximum(bb_test[1], bb_gt[1])
22 | xx2 = np.minimum(bb_test[2], bb_gt[2])
23 | yy2 = np.minimum(bb_test[3], bb_gt[3])
24 | w = np.maximum(0., xx2 - xx1)
25 | h = np.maximum(0., yy2 - yy1)
26 | wh = w * h
27 | o = wh / ((bb_test[2] - bb_test[0]) * (bb_test[3] - bb_test[1])
28 | + (bb_gt[2] - bb_gt[0]) * (bb_gt[3] - bb_gt[1]) - wh)
29 |
30 | return o
31 |
32 |
33 | def convert_bbox_to_z(bbox):
34 | """
35 | Takes a bounding box in the form [x1,y1,x2,y2] and returns z in the form
36 | [x,y,s,r] where x,y is the centre of the box and s is the scale/area and r is
37 | the aspect ratio
38 | """
39 | w = bbox[2] - bbox[0]
40 | h = bbox[3] - bbox[1]
41 | x = bbox[0] + w / 2.
42 | y = bbox[1] + h / 2.
43 | s = w * h # scale is just area
44 | r = w / float(h)
45 | return np.array([x, y, s, r]).reshape((4, 1))
46 |
47 |
48 | def convert_x_to_bbox(x, score=None):
49 | """
50 | Takes a bounding box in the centre form [x,y,s,r] and returns it in the form
51 | [x1,y1,x2,y2] where x1,y1 is the top left and x2,y2 is the bottom right
52 | """
53 | w = np.sqrt(x[2] * x[3])
54 | h = x[2] / w
55 |     if score is None:
56 | return np.array([x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., x[1] + h / 2.]).reshape((1, 4))
57 | else:
58 | return np.array([x[0] - w / 2., x[1] - h / 2., x[0] + w / 2., x[1] + h / 2., score]).reshape((1, 5))
59 |
60 |
61 | class KalmanBoxTracker(object):
62 | """
63 |     This class represents the internal state of individual tracked objects observed as bbox.
64 | """
65 | count = 0
66 |
67 | def __init__(self, bbox):
68 | """
69 | Initialises a tracker using initial bounding box.
70 | """
71 | # define constant velocity model
72 | self.kf = KalmanFilter(dim_x=7, dim_z=4)
73 | self.kf.F = np.array(
74 | [[1, 0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 1, 0], [0, 0, 1, 0, 0, 0, 1], [0, 0, 0, 1, 0, 0, 0],
75 | [0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 1]])
76 | self.kf.H = np.array(
77 | [[1, 0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0]])
78 |
79 | self.kf.R[2:, 2:] *= 10.
80 | self.kf.P[4:, 4:] *= 1000. # give high uncertainty to the unobservable initial velocities
81 | self.kf.P *= 10.
82 | self.kf.Q[-1, -1] *= 0.01
83 | self.kf.Q[4:, 4:] *= 0.01
84 |
85 | self.kf.x[:4] = convert_bbox_to_z(bbox)
86 | self.time_since_update = 0
87 | self.id = KalmanBoxTracker.count
88 | KalmanBoxTracker.count += 1
89 | self.history = []
90 | self.hits = 0
91 | self.hit_streak = 0
92 | self.age = 0
93 |
94 | def update(self, bbox):
95 | """
96 | Updates the state vector with observed bbox.
97 | """
98 | self.time_since_update = 0
99 | self.history = []
100 | self.hits += 1
101 | self.hit_streak += 1
102 | self.kf.update(convert_bbox_to_z(bbox))
103 |
104 | def predict(self):
105 | """
106 | Advances the state vector and returns the predicted bounding box estimate.
107 | """
108 | if ((self.kf.x[6] + self.kf.x[2]) <= 0):
109 | self.kf.x[6] *= 0.0
110 | self.kf.predict()
111 | self.age += 1
112 | if (self.time_since_update > 0):
113 | self.hit_streak = 0
114 | self.time_since_update += 1
115 | self.history.append(convert_x_to_bbox(self.kf.x))
116 | return self.history[-1]
117 |
118 | def get_state(self):
119 | """
120 | Returns the current bounding box estimate.
121 | """
122 | return convert_x_to_bbox(self.kf.x)
123 |
124 |
125 | def associate_detections_to_trackers(detections, trackers, iou_threshold=0.3):
126 | """
127 | Assigns detections to tracked object (both represented as bounding boxes)
128 |
129 | Returns 3 lists of matches, unmatched_detections and unmatched_trackers
130 | """
131 | if (len(trackers) == 0):
132 | return np.empty((0, 2), dtype=int), np.arange(len(detections)), np.empty((0, 5), dtype=int)
133 | iou_matrix = np.zeros((len(detections), len(trackers)), dtype=np.float32)
134 |
135 | for d, det in enumerate(detections):
136 | for t, trk in enumerate(trackers):
137 | iou_matrix[d, t] = iou(det, trk)
138 | matched_indices = linear_sum_assignment(-iou_matrix)
139 | matched_indices = np.asarray(matched_indices)
140 | matched_indices = matched_indices.transpose()
141 |
142 | unmatched_detections = []
143 | for d, det in enumerate(detections):
144 | if (d not in matched_indices[:, 0]):
145 | unmatched_detections.append(d)
146 | unmatched_trackers = []
147 | for t, trk in enumerate(trackers):
148 | if (t not in matched_indices[:, 1]):
149 | unmatched_trackers.append(t)
150 |
151 | # filter out matched with low IOU
152 | matches = []
153 | for m in matched_indices:
154 | if (iou_matrix[m[0], m[1]] < iou_threshold):
155 | unmatched_detections.append(m[0])
156 | unmatched_trackers.append(m[1])
157 | else:
158 | matches.append(m.reshape(1, 2))
159 | if (len(matches) == 0):
160 | matches = np.empty((0, 2), dtype=int)
161 | else:
162 | matches = np.concatenate(matches, axis=0)
163 |
164 | return matches, np.array(unmatched_detections), np.array(unmatched_trackers)
165 |
166 |
167 | class Sort(object):
168 | def __init__(self, max_age=1, min_hits=3):
169 | """
170 | Sets key parameters for SORT
171 | """
172 | self.max_age = max_age
173 | self.min_hits = min_hits
174 | self.trackers = []
175 | self.frame_count = 0
176 |
177 | def update(self, dets):
178 | """
179 | Params:
180 | dets - a numpy array of detections in the format [[x1,y1,x2,y2,score],[x1,y1,x2,y2,score],...]
181 | Requires: this method must be called once for each frame even with empty detections.
182 |     Returns a similar array, where the last column is the object ID.
183 |
184 | NOTE: The number of objects returned may differ from the number of detections provided.
185 | """
186 | self.frame_count += 1
187 | # get predicted locations from existing trackers.
188 | trks = np.zeros((len(self.trackers), 5))
189 | to_del = []
190 | ret = []
191 | for t, trk in enumerate(trks):
192 | pos = self.trackers[t].predict()[0]
193 | trk[:] = [pos[0], pos[1], pos[2], pos[3], 0]
194 | if np.any(np.isnan(pos)):
195 | to_del.append(t)
196 | trks = np.ma.compress_rows(np.ma.masked_invalid(trks))
197 | for t in reversed(to_del):
198 | self.trackers.pop(t)
199 | matched, unmatched_dets, unmatched_trks = associate_detections_to_trackers(dets, trks)
200 |
201 | # update matched trackers with assigned detections
202 | for t, trk in enumerate(self.trackers):
203 | if t not in unmatched_trks:
204 | d = matched[np.where(matched[:, 1] == t)[0], 0] # d: [n]
205 | trk.update(dets[d, :][0])
206 |
207 | # create and initialise new trackers for unmatched detections
208 | for i in unmatched_dets:
209 | trk = KalmanBoxTracker(dets[i, :])
210 | self.trackers.append(trk)
211 | i = len(self.trackers)
212 | for trk in reversed(self.trackers):
213 | d = trk.get_state()[0]
214 | if ((trk.time_since_update < 1) and (trk.hit_streak >= self.min_hits or self.frame_count <= self.min_hits)):
215 | ret.append(np.concatenate((d, [trk.id + 1])).reshape(1, -1)) # +1 as MOT benchmark requires positive
216 | i -= 1
217 | # remove dead tracklet
218 | if (trk.time_since_update > self.max_age):
219 | self.trackers.pop(i)
220 | if (len(ret) > 0):
221 | return np.concatenate(ret)
222 | return np.empty((0, 5))
223 |
224 |
225 | def parse_args():
226 | """Parse input arguments."""
227 | parser = argparse.ArgumentParser(description='SORT demo')
228 | parser.add_argument('--display', dest='display', help='Display online tracker output (slow) [False]',
229 | action='store_true')
230 | args = parser.parse_args()
231 | return args
232 |
--------------------------------------------------------------------------------
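A minimal usage sketch for the Sort tracker above, mirroring the min_hits=0 setting used in gen_video_kpts: detections in [x1, y1, x2, y2, score] form are fed per frame and come back with a track ID in the last column. The import path is an assumption (repo's demo/ directory on sys.path; filterpy, numba and scikit-image installed):

import numpy as np
# assumed path: repo's demo/ directory on sys.path
from lib.sort.sort import Sort

tracker = Sort(min_hits=0)                            # same setting as gen_video_kpts above
frame1 = np.array([[100., 100., 200., 300., 0.9]])    # [x1, y1, x2, y2, score]
frame2 = np.array([[105., 102., 205., 304., 0.9]])
print(tracker.update(frame1))   # [[100. 100. 200. 300.   1.]] on a fresh run; last column is the track ID
print(tracker.update(frame2))   # same ID, box smoothed by the constant-velocity Kalman filter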
/demo/lib/yolov3/bbox.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | import torch
4 | import random
5 | import numpy as np
6 | import cv2
7 |
8 |
9 | def confidence_filter(result, confidence):
10 | conf_mask = (result[:,:,4] > confidence).float().unsqueeze(2)
11 | result = result*conf_mask
12 |
13 | return result
14 |
15 |
16 | def confidence_filter_cls(result, confidence):
17 |     max_scores = torch.max(result[:,:,5:25], 2)[0].unsqueeze(2)  # keep dims so the cat along dim 2 works
18 |     res = torch.cat((result, max_scores), 2)
19 | print(res.shape)
20 |
21 |
22 | cond_1 = (res[:,:,4] > confidence).float()
23 | cond_2 = (res[:,:,25] > 0.995).float()
24 |
25 | conf = cond_1 + cond_2
26 | conf = torch.clamp(conf, 0.0, 1.0)
27 | conf = conf.unsqueeze(2)
28 | result = result*conf
29 | return result
30 |
31 |
32 | def get_abs_coord(box):
33 | box[2], box[3] = abs(box[2]), abs(box[3])
34 | x1 = (box[0] - box[2]/2) - 1
35 | y1 = (box[1] - box[3]/2) - 1
36 | x2 = (box[0] + box[2]/2) - 1
37 | y2 = (box[1] + box[3]/2) - 1
38 | return x1, y1, x2, y2
39 |
40 |
41 | def sanity_fix(box):
42 | if (box[0] > box[2]):
43 | box[0], box[2] = box[2], box[0]
44 |
45 | if (box[1] > box[3]):
46 | box[1], box[3] = box[3], box[1]
47 |
48 | return box
49 |
50 |
51 | def bbox_iou(box1, box2):
52 | """
53 | Returns the IoU of two bounding boxes
54 |
55 | """
56 | # Get the coordinates of bounding boxes
57 | b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
58 | b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]
59 |
60 |     # get the coordinates of the intersection rectangle
61 | inter_rect_x1 = torch.max(b1_x1, b2_x1)
62 | inter_rect_y1 = torch.max(b1_y1, b2_y1)
63 | inter_rect_x2 = torch.min(b1_x2, b2_x2)
64 | inter_rect_y2 = torch.min(b1_y2, b2_y2)
65 |
66 | # Intersection area
67 | if torch.cuda.is_available():
68 | inter_area = torch.max(inter_rect_x2 - inter_rect_x1 + 1, torch.zeros(inter_rect_x2.shape).cuda())*torch.max(inter_rect_y2 - inter_rect_y1 + 1, torch.zeros(inter_rect_x2.shape).cuda())
69 | else:
70 | inter_area = torch.max(inter_rect_x2 - inter_rect_x1 + 1, torch.zeros(inter_rect_x2.shape))*torch.max(inter_rect_y2 - inter_rect_y1 + 1, torch.zeros(inter_rect_x2.shape))
71 |
72 | # Union Area
73 | b1_area = (b1_x2 - b1_x1 + 1)*(b1_y2 - b1_y1 + 1)
74 | b2_area = (b2_x2 - b2_x1 + 1)*(b2_y2 - b2_y1 + 1)
75 |
76 | iou = inter_area / (b1_area + b2_area - inter_area)
77 |
78 | return iou
79 |
80 |
81 | def pred_corner_coord(prediction):
82 | #Get indices of non-zero confidence bboxes
83 | ind_nz = torch.nonzero(prediction[:,:,4]).transpose(0,1).contiguous()
84 |
85 | box = prediction[ind_nz[0], ind_nz[1]]
86 |
87 | box_a = box.new(box.shape)
88 | box_a[:,0] = (box[:,0] - box[:,2]/2)
89 | box_a[:,1] = (box[:,1] - box[:,3]/2)
90 | box_a[:,2] = (box[:,0] + box[:,2]/2)
91 | box_a[:,3] = (box[:,1] + box[:,3]/2)
92 | box[:,:4] = box_a[:,:4]
93 |
94 | prediction[ind_nz[0], ind_nz[1]] = box
95 |
96 | return prediction
97 |
98 |
99 | def write(x, batches, results, colors, classes):
100 | c1 = tuple(x[1:3].int())
101 | c2 = tuple(x[3:5].int())
102 | img = results[int(x[0])]
103 | cls = int(x[-1])
104 | label = "{0}".format(classes[cls])
105 | color = random.choice(colors)
106 | cv2.rectangle(img, c1, c2,color, 1)
107 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0]
108 | c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
109 | cv2.rectangle(img, c1, c2,color, -1)
110 |     cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1)
111 | return img
112 |
--------------------------------------------------------------------------------
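A quick, hedged check of `bbox_iou` from bbox.py: both arguments are tensors of corner coordinates `[x1, y1, x2, y2, ...]`, and the IoU of the single box in `box1` is computed row-wise against `box2`. The boxes below are made up; note that the function allocates its zero tensor on the GPU whenever CUDA is available, so the inputs are moved there in that case.

```python
import torch
from lib.yolov3.bbox import bbox_iou  # assumes demo/lib is on sys.path

box1 = torch.tensor([[0., 0., 100., 100.]])
boxes = torch.tensor([[0., 0., 100., 100.],       # identical box   -> IoU = 1.0
                      [50., 50., 150., 150.],     # partial overlap -> IoU ~ 0.15
                      [200., 200., 300., 300.]])  # disjoint box    -> IoU = 0.0
if torch.cuda.is_available():                     # keep inputs on the same device
    box1, boxes = box1.cuda(), boxes.cuda()       # as the zeros created inside bbox_iou
print(bbox_iou(box1, boxes))
```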
/demo/lib/yolov3/cfg/tiny-yolo-voc.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | batch=64
3 | subdivisions=8
4 | width=416
5 | height=416
6 | channels=3
7 | momentum=0.9
8 | decay=0.0005
9 | angle=0
10 | saturation = 1.5
11 | exposure = 1.5
12 | hue=.1
13 |
14 | learning_rate=0.001
15 | max_batches = 40200
16 | policy=steps
17 | steps=-1,100,20000,30000
18 | scales=.1,10,.1,.1
19 |
20 | [convolutional]
21 | batch_normalize=1
22 | filters=16
23 | size=3
24 | stride=1
25 | pad=1
26 | activation=leaky
27 |
28 | [maxpool]
29 | size=2
30 | stride=2
31 |
32 | [convolutional]
33 | batch_normalize=1
34 | filters=32
35 | size=3
36 | stride=1
37 | pad=1
38 | activation=leaky
39 |
40 | [maxpool]
41 | size=2
42 | stride=2
43 |
44 | [convolutional]
45 | batch_normalize=1
46 | filters=64
47 | size=3
48 | stride=1
49 | pad=1
50 | activation=leaky
51 |
52 | [maxpool]
53 | size=2
54 | stride=2
55 |
56 | [convolutional]
57 | batch_normalize=1
58 | filters=128
59 | size=3
60 | stride=1
61 | pad=1
62 | activation=leaky
63 |
64 | [maxpool]
65 | size=2
66 | stride=2
67 |
68 | [convolutional]
69 | batch_normalize=1
70 | filters=256
71 | size=3
72 | stride=1
73 | pad=1
74 | activation=leaky
75 |
76 | [maxpool]
77 | size=2
78 | stride=2
79 |
80 | [convolutional]
81 | batch_normalize=1
82 | filters=512
83 | size=3
84 | stride=1
85 | pad=1
86 | activation=leaky
87 |
88 | [maxpool]
89 | size=2
90 | stride=1
91 |
92 | [convolutional]
93 | batch_normalize=1
94 | filters=1024
95 | size=3
96 | stride=1
97 | pad=1
98 | activation=leaky
99 |
100 | ###########
101 |
102 | [convolutional]
103 | batch_normalize=1
104 | size=3
105 | stride=1
106 | pad=1
107 | filters=1024
108 | activation=leaky
109 |
110 | [convolutional]
111 | size=1
112 | stride=1
113 | pad=1
114 | filters=125
115 | activation=linear
116 |
117 | [region]
118 | anchors = 1.08,1.19, 3.42,4.41, 6.63,11.38, 9.42,5.11, 16.62,10.52
119 | bias_match=1
120 | classes=20
121 | coords=4
122 | num=5
123 | softmax=1
124 | jitter=.2
125 | rescore=1
126 |
127 | object_scale=5
128 | noobject_scale=1
129 | class_scale=1
130 | coord_scale=1
131 |
132 | absolute=1
133 | thresh = .6
134 | random=1
135 |
--------------------------------------------------------------------------------
/demo/lib/yolov3/cfg/yolo-voc.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | batch=64
4 | subdivisions=8
5 | # Training
6 | # batch=64
7 | # subdivisions=8
8 | height=416
9 | width=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 80200
21 | policy=steps
22 | steps=-1,500,40000,60000
23 | scales=0.1,10,.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | [maxpool]
34 | size=2
35 | stride=2
36 |
37 | [convolutional]
38 | batch_normalize=1
39 | filters=64
40 | size=3
41 | stride=1
42 | pad=1
43 | activation=leaky
44 |
45 | [maxpool]
46 | size=2
47 | stride=2
48 |
49 | [convolutional]
50 | batch_normalize=1
51 | filters=128
52 | size=3
53 | stride=1
54 | pad=1
55 | activation=leaky
56 |
57 | [convolutional]
58 | batch_normalize=1
59 | filters=64
60 | size=1
61 | stride=1
62 | pad=1
63 | activation=leaky
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=1
70 | pad=1
71 | activation=leaky
72 |
73 | [maxpool]
74 | size=2
75 | stride=2
76 |
77 | [convolutional]
78 | batch_normalize=1
79 | filters=256
80 | size=3
81 | stride=1
82 | pad=1
83 | activation=leaky
84 |
85 | [convolutional]
86 | batch_normalize=1
87 | filters=128
88 | size=1
89 | stride=1
90 | pad=1
91 | activation=leaky
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=256
96 | size=3
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [maxpool]
102 | size=2
103 | stride=2
104 |
105 | [convolutional]
106 | batch_normalize=1
107 | filters=512
108 | size=3
109 | stride=1
110 | pad=1
111 | activation=leaky
112 |
113 | [convolutional]
114 | batch_normalize=1
115 | filters=256
116 | size=1
117 | stride=1
118 | pad=1
119 | activation=leaky
120 |
121 | [convolutional]
122 | batch_normalize=1
123 | filters=512
124 | size=3
125 | stride=1
126 | pad=1
127 | activation=leaky
128 |
129 | [convolutional]
130 | batch_normalize=1
131 | filters=256
132 | size=1
133 | stride=1
134 | pad=1
135 | activation=leaky
136 |
137 | [convolutional]
138 | batch_normalize=1
139 | filters=512
140 | size=3
141 | stride=1
142 | pad=1
143 | activation=leaky
144 |
145 | [maxpool]
146 | size=2
147 | stride=2
148 |
149 | [convolutional]
150 | batch_normalize=1
151 | filters=1024
152 | size=3
153 | stride=1
154 | pad=1
155 | activation=leaky
156 |
157 | [convolutional]
158 | batch_normalize=1
159 | filters=512
160 | size=1
161 | stride=1
162 | pad=1
163 | activation=leaky
164 |
165 | [convolutional]
166 | batch_normalize=1
167 | filters=1024
168 | size=3
169 | stride=1
170 | pad=1
171 | activation=leaky
172 |
173 | [convolutional]
174 | batch_normalize=1
175 | filters=512
176 | size=1
177 | stride=1
178 | pad=1
179 | activation=leaky
180 |
181 | [convolutional]
182 | batch_normalize=1
183 | filters=1024
184 | size=3
185 | stride=1
186 | pad=1
187 | activation=leaky
188 |
189 |
190 | #######
191 |
192 | [convolutional]
193 | batch_normalize=1
194 | size=3
195 | stride=1
196 | pad=1
197 | filters=1024
198 | activation=leaky
199 |
200 | [convolutional]
201 | batch_normalize=1
202 | size=3
203 | stride=1
204 | pad=1
205 | filters=1024
206 | activation=leaky
207 |
208 | [route]
209 | layers=-9
210 |
211 | [convolutional]
212 | batch_normalize=1
213 | size=1
214 | stride=1
215 | pad=1
216 | filters=64
217 | activation=leaky
218 |
219 | [reorg]
220 | stride=2
221 |
222 | [route]
223 | layers=-1,-4
224 |
225 | [convolutional]
226 | batch_normalize=1
227 | size=3
228 | stride=1
229 | pad=1
230 | filters=1024
231 | activation=leaky
232 |
233 | [convolutional]
234 | size=1
235 | stride=1
236 | pad=1
237 | filters=125
238 | activation=linear
239 |
240 |
241 | [region]
242 | anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
243 | bias_match=1
244 | classes=20
245 | coords=4
246 | num=5
247 | softmax=1
248 | jitter=.3
249 | rescore=1
250 |
251 | object_scale=5
252 | noobject_scale=1
253 | class_scale=1
254 | coord_scale=1
255 |
256 | absolute=1
257 | thresh = .6
258 | random=1
259 |
--------------------------------------------------------------------------------
/demo/lib/yolov3/cfg/yolo.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | batch=1
4 | subdivisions=1
5 | # Training
6 | # batch=64
7 | # subdivisions=8
8 | width=416
9 | height=416
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | [maxpool]
34 | size=2
35 | stride=2
36 |
37 | [convolutional]
38 | batch_normalize=1
39 | filters=64
40 | size=3
41 | stride=1
42 | pad=1
43 | activation=leaky
44 |
45 | [maxpool]
46 | size=2
47 | stride=2
48 |
49 | [convolutional]
50 | batch_normalize=1
51 | filters=128
52 | size=3
53 | stride=1
54 | pad=1
55 | activation=leaky
56 |
57 | [convolutional]
58 | batch_normalize=1
59 | filters=64
60 | size=1
61 | stride=1
62 | pad=1
63 | activation=leaky
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=1
70 | pad=1
71 | activation=leaky
72 |
73 | [maxpool]
74 | size=2
75 | stride=2
76 |
77 | [convolutional]
78 | batch_normalize=1
79 | filters=256
80 | size=3
81 | stride=1
82 | pad=1
83 | activation=leaky
84 |
85 | [convolutional]
86 | batch_normalize=1
87 | filters=128
88 | size=1
89 | stride=1
90 | pad=1
91 | activation=leaky
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=256
96 | size=3
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [maxpool]
102 | size=2
103 | stride=2
104 |
105 | [convolutional]
106 | batch_normalize=1
107 | filters=512
108 | size=3
109 | stride=1
110 | pad=1
111 | activation=leaky
112 |
113 | [convolutional]
114 | batch_normalize=1
115 | filters=256
116 | size=1
117 | stride=1
118 | pad=1
119 | activation=leaky
120 |
121 | [convolutional]
122 | batch_normalize=1
123 | filters=512
124 | size=3
125 | stride=1
126 | pad=1
127 | activation=leaky
128 |
129 | [convolutional]
130 | batch_normalize=1
131 | filters=256
132 | size=1
133 | stride=1
134 | pad=1
135 | activation=leaky
136 |
137 | [convolutional]
138 | batch_normalize=1
139 | filters=512
140 | size=3
141 | stride=1
142 | pad=1
143 | activation=leaky
144 |
145 | [maxpool]
146 | size=2
147 | stride=2
148 |
149 | [convolutional]
150 | batch_normalize=1
151 | filters=1024
152 | size=3
153 | stride=1
154 | pad=1
155 | activation=leaky
156 |
157 | [convolutional]
158 | batch_normalize=1
159 | filters=512
160 | size=1
161 | stride=1
162 | pad=1
163 | activation=leaky
164 |
165 | [convolutional]
166 | batch_normalize=1
167 | filters=1024
168 | size=3
169 | stride=1
170 | pad=1
171 | activation=leaky
172 |
173 | [convolutional]
174 | batch_normalize=1
175 | filters=512
176 | size=1
177 | stride=1
178 | pad=1
179 | activation=leaky
180 |
181 | [convolutional]
182 | batch_normalize=1
183 | filters=1024
184 | size=3
185 | stride=1
186 | pad=1
187 | activation=leaky
188 |
189 |
190 | #######
191 |
192 | [convolutional]
193 | batch_normalize=1
194 | size=3
195 | stride=1
196 | pad=1
197 | filters=1024
198 | activation=leaky
199 |
200 | [convolutional]
201 | batch_normalize=1
202 | size=3
203 | stride=1
204 | pad=1
205 | filters=1024
206 | activation=leaky
207 |
208 | [route]
209 | layers=-9
210 |
211 | [convolutional]
212 | batch_normalize=1
213 | size=1
214 | stride=1
215 | pad=1
216 | filters=64
217 | activation=leaky
218 |
219 | [reorg]
220 | stride=2
221 |
222 | [route]
223 | layers=-1,-4
224 |
225 | [convolutional]
226 | batch_normalize=1
227 | size=3
228 | stride=1
229 | pad=1
230 | filters=1024
231 | activation=leaky
232 |
233 | [convolutional]
234 | size=1
235 | stride=1
236 | pad=1
237 | filters=425
238 | activation=linear
239 |
240 |
241 | [region]
242 | anchors = 0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
243 | bias_match=1
244 | classes=80
245 | coords=4
246 | num=5
247 | softmax=1
248 | jitter=.3
249 | rescore=1
250 |
251 | object_scale=5
252 | noobject_scale=1
253 | class_scale=1
254 | coord_scale=1
255 |
256 | absolute=1
257 | thresh = .6
258 | random=1
259 |
--------------------------------------------------------------------------------
/demo/lib/yolov3/cfg/yolov3.cfg:
--------------------------------------------------------------------------------
1 | [net]
2 | # Testing
3 | batch=1
4 | subdivisions=1
5 | # Training
6 | # batch=64
7 | # subdivisions=16
8 | width= 320
9 | height = 320
10 | channels=3
11 | momentum=0.9
12 | decay=0.0005
13 | angle=0
14 | saturation = 1.5
15 | exposure = 1.5
16 | hue=.1
17 |
18 | learning_rate=0.001
19 | burn_in=1000
20 | max_batches = 500200
21 | policy=steps
22 | steps=400000,450000
23 | scales=.1,.1
24 |
25 | [convolutional]
26 | batch_normalize=1
27 | filters=32
28 | size=3
29 | stride=1
30 | pad=1
31 | activation=leaky
32 |
33 | # Downsample
34 |
35 | [convolutional]
36 | batch_normalize=1
37 | filters=64
38 | size=3
39 | stride=2
40 | pad=1
41 | activation=leaky
42 |
43 | [convolutional]
44 | batch_normalize=1
45 | filters=32
46 | size=1
47 | stride=1
48 | pad=1
49 | activation=leaky
50 |
51 | [convolutional]
52 | batch_normalize=1
53 | filters=64
54 | size=3
55 | stride=1
56 | pad=1
57 | activation=leaky
58 |
59 | [shortcut]
60 | from=-3
61 | activation=linear
62 |
63 | # Downsample
64 |
65 | [convolutional]
66 | batch_normalize=1
67 | filters=128
68 | size=3
69 | stride=2
70 | pad=1
71 | activation=leaky
72 |
73 | [convolutional]
74 | batch_normalize=1
75 | filters=64
76 | size=1
77 | stride=1
78 | pad=1
79 | activation=leaky
80 |
81 | [convolutional]
82 | batch_normalize=1
83 | filters=128
84 | size=3
85 | stride=1
86 | pad=1
87 | activation=leaky
88 |
89 | [shortcut]
90 | from=-3
91 | activation=linear
92 |
93 | [convolutional]
94 | batch_normalize=1
95 | filters=64
96 | size=1
97 | stride=1
98 | pad=1
99 | activation=leaky
100 |
101 | [convolutional]
102 | batch_normalize=1
103 | filters=128
104 | size=3
105 | stride=1
106 | pad=1
107 | activation=leaky
108 |
109 | [shortcut]
110 | from=-3
111 | activation=linear
112 |
113 | # Downsample
114 |
115 | [convolutional]
116 | batch_normalize=1
117 | filters=256
118 | size=3
119 | stride=2
120 | pad=1
121 | activation=leaky
122 |
123 | [convolutional]
124 | batch_normalize=1
125 | filters=128
126 | size=1
127 | stride=1
128 | pad=1
129 | activation=leaky
130 |
131 | [convolutional]
132 | batch_normalize=1
133 | filters=256
134 | size=3
135 | stride=1
136 | pad=1
137 | activation=leaky
138 |
139 | [shortcut]
140 | from=-3
141 | activation=linear
142 |
143 | [convolutional]
144 | batch_normalize=1
145 | filters=128
146 | size=1
147 | stride=1
148 | pad=1
149 | activation=leaky
150 |
151 | [convolutional]
152 | batch_normalize=1
153 | filters=256
154 | size=3
155 | stride=1
156 | pad=1
157 | activation=leaky
158 |
159 | [shortcut]
160 | from=-3
161 | activation=linear
162 |
163 | [convolutional]
164 | batch_normalize=1
165 | filters=128
166 | size=1
167 | stride=1
168 | pad=1
169 | activation=leaky
170 |
171 | [convolutional]
172 | batch_normalize=1
173 | filters=256
174 | size=3
175 | stride=1
176 | pad=1
177 | activation=leaky
178 |
179 | [shortcut]
180 | from=-3
181 | activation=linear
182 |
183 | [convolutional]
184 | batch_normalize=1
185 | filters=128
186 | size=1
187 | stride=1
188 | pad=1
189 | activation=leaky
190 |
191 | [convolutional]
192 | batch_normalize=1
193 | filters=256
194 | size=3
195 | stride=1
196 | pad=1
197 | activation=leaky
198 |
199 | [shortcut]
200 | from=-3
201 | activation=linear
202 |
203 |
204 | [convolutional]
205 | batch_normalize=1
206 | filters=128
207 | size=1
208 | stride=1
209 | pad=1
210 | activation=leaky
211 |
212 | [convolutional]
213 | batch_normalize=1
214 | filters=256
215 | size=3
216 | stride=1
217 | pad=1
218 | activation=leaky
219 |
220 | [shortcut]
221 | from=-3
222 | activation=linear
223 |
224 | [convolutional]
225 | batch_normalize=1
226 | filters=128
227 | size=1
228 | stride=1
229 | pad=1
230 | activation=leaky
231 |
232 | [convolutional]
233 | batch_normalize=1
234 | filters=256
235 | size=3
236 | stride=1
237 | pad=1
238 | activation=leaky
239 |
240 | [shortcut]
241 | from=-3
242 | activation=linear
243 |
244 | [convolutional]
245 | batch_normalize=1
246 | filters=128
247 | size=1
248 | stride=1
249 | pad=1
250 | activation=leaky
251 |
252 | [convolutional]
253 | batch_normalize=1
254 | filters=256
255 | size=3
256 | stride=1
257 | pad=1
258 | activation=leaky
259 |
260 | [shortcut]
261 | from=-3
262 | activation=linear
263 |
264 | [convolutional]
265 | batch_normalize=1
266 | filters=128
267 | size=1
268 | stride=1
269 | pad=1
270 | activation=leaky
271 |
272 | [convolutional]
273 | batch_normalize=1
274 | filters=256
275 | size=3
276 | stride=1
277 | pad=1
278 | activation=leaky
279 |
280 | [shortcut]
281 | from=-3
282 | activation=linear
283 |
284 | # Downsample
285 |
286 | [convolutional]
287 | batch_normalize=1
288 | filters=512
289 | size=3
290 | stride=2
291 | pad=1
292 | activation=leaky
293 |
294 | [convolutional]
295 | batch_normalize=1
296 | filters=256
297 | size=1
298 | stride=1
299 | pad=1
300 | activation=leaky
301 |
302 | [convolutional]
303 | batch_normalize=1
304 | filters=512
305 | size=3
306 | stride=1
307 | pad=1
308 | activation=leaky
309 |
310 | [shortcut]
311 | from=-3
312 | activation=linear
313 |
314 |
315 | [convolutional]
316 | batch_normalize=1
317 | filters=256
318 | size=1
319 | stride=1
320 | pad=1
321 | activation=leaky
322 |
323 | [convolutional]
324 | batch_normalize=1
325 | filters=512
326 | size=3
327 | stride=1
328 | pad=1
329 | activation=leaky
330 |
331 | [shortcut]
332 | from=-3
333 | activation=linear
334 |
335 |
336 | [convolutional]
337 | batch_normalize=1
338 | filters=256
339 | size=1
340 | stride=1
341 | pad=1
342 | activation=leaky
343 |
344 | [convolutional]
345 | batch_normalize=1
346 | filters=512
347 | size=3
348 | stride=1
349 | pad=1
350 | activation=leaky
351 |
352 | [shortcut]
353 | from=-3
354 | activation=linear
355 |
356 |
357 | [convolutional]
358 | batch_normalize=1
359 | filters=256
360 | size=1
361 | stride=1
362 | pad=1
363 | activation=leaky
364 |
365 | [convolutional]
366 | batch_normalize=1
367 | filters=512
368 | size=3
369 | stride=1
370 | pad=1
371 | activation=leaky
372 |
373 | [shortcut]
374 | from=-3
375 | activation=linear
376 |
377 | [convolutional]
378 | batch_normalize=1
379 | filters=256
380 | size=1
381 | stride=1
382 | pad=1
383 | activation=leaky
384 |
385 | [convolutional]
386 | batch_normalize=1
387 | filters=512
388 | size=3
389 | stride=1
390 | pad=1
391 | activation=leaky
392 |
393 | [shortcut]
394 | from=-3
395 | activation=linear
396 |
397 |
398 | [convolutional]
399 | batch_normalize=1
400 | filters=256
401 | size=1
402 | stride=1
403 | pad=1
404 | activation=leaky
405 |
406 | [convolutional]
407 | batch_normalize=1
408 | filters=512
409 | size=3
410 | stride=1
411 | pad=1
412 | activation=leaky
413 |
414 | [shortcut]
415 | from=-3
416 | activation=linear
417 |
418 |
419 | [convolutional]
420 | batch_normalize=1
421 | filters=256
422 | size=1
423 | stride=1
424 | pad=1
425 | activation=leaky
426 |
427 | [convolutional]
428 | batch_normalize=1
429 | filters=512
430 | size=3
431 | stride=1
432 | pad=1
433 | activation=leaky
434 |
435 | [shortcut]
436 | from=-3
437 | activation=linear
438 |
439 | [convolutional]
440 | batch_normalize=1
441 | filters=256
442 | size=1
443 | stride=1
444 | pad=1
445 | activation=leaky
446 |
447 | [convolutional]
448 | batch_normalize=1
449 | filters=512
450 | size=3
451 | stride=1
452 | pad=1
453 | activation=leaky
454 |
455 | [shortcut]
456 | from=-3
457 | activation=linear
458 |
459 | # Downsample
460 |
461 | [convolutional]
462 | batch_normalize=1
463 | filters=1024
464 | size=3
465 | stride=2
466 | pad=1
467 | activation=leaky
468 |
469 | [convolutional]
470 | batch_normalize=1
471 | filters=512
472 | size=1
473 | stride=1
474 | pad=1
475 | activation=leaky
476 |
477 | [convolutional]
478 | batch_normalize=1
479 | filters=1024
480 | size=3
481 | stride=1
482 | pad=1
483 | activation=leaky
484 |
485 | [shortcut]
486 | from=-3
487 | activation=linear
488 |
489 | [convolutional]
490 | batch_normalize=1
491 | filters=512
492 | size=1
493 | stride=1
494 | pad=1
495 | activation=leaky
496 |
497 | [convolutional]
498 | batch_normalize=1
499 | filters=1024
500 | size=3
501 | stride=1
502 | pad=1
503 | activation=leaky
504 |
505 | [shortcut]
506 | from=-3
507 | activation=linear
508 |
509 | [convolutional]
510 | batch_normalize=1
511 | filters=512
512 | size=1
513 | stride=1
514 | pad=1
515 | activation=leaky
516 |
517 | [convolutional]
518 | batch_normalize=1
519 | filters=1024
520 | size=3
521 | stride=1
522 | pad=1
523 | activation=leaky
524 |
525 | [shortcut]
526 | from=-3
527 | activation=linear
528 |
529 | [convolutional]
530 | batch_normalize=1
531 | filters=512
532 | size=1
533 | stride=1
534 | pad=1
535 | activation=leaky
536 |
537 | [convolutional]
538 | batch_normalize=1
539 | filters=1024
540 | size=3
541 | stride=1
542 | pad=1
543 | activation=leaky
544 |
545 | [shortcut]
546 | from=-3
547 | activation=linear
548 |
549 | ######################
550 |
551 | [convolutional]
552 | batch_normalize=1
553 | filters=512
554 | size=1
555 | stride=1
556 | pad=1
557 | activation=leaky
558 |
559 | [convolutional]
560 | batch_normalize=1
561 | size=3
562 | stride=1
563 | pad=1
564 | filters=1024
565 | activation=leaky
566 |
567 | [convolutional]
568 | batch_normalize=1
569 | filters=512
570 | size=1
571 | stride=1
572 | pad=1
573 | activation=leaky
574 |
575 | [convolutional]
576 | batch_normalize=1
577 | size=3
578 | stride=1
579 | pad=1
580 | filters=1024
581 | activation=leaky
582 |
583 | [convolutional]
584 | batch_normalize=1
585 | filters=512
586 | size=1
587 | stride=1
588 | pad=1
589 | activation=leaky
590 |
591 | [convolutional]
592 | batch_normalize=1
593 | size=3
594 | stride=1
595 | pad=1
596 | filters=1024
597 | activation=leaky
598 |
599 | [convolutional]
600 | size=1
601 | stride=1
602 | pad=1
603 | filters=255
604 | activation=linear
605 |
606 |
607 | [yolo]
608 | mask = 6,7,8
609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
610 | classes=80
611 | num=9
612 | jitter=.3
613 | ignore_thresh = .5
614 | truth_thresh = 1
615 | random=1
616 |
617 |
618 | [route]
619 | layers = -4
620 |
621 | [convolutional]
622 | batch_normalize=1
623 | filters=256
624 | size=1
625 | stride=1
626 | pad=1
627 | activation=leaky
628 |
629 | [upsample]
630 | stride=2
631 |
632 | [route]
633 | layers = -1, 61
634 |
635 |
636 |
637 | [convolutional]
638 | batch_normalize=1
639 | filters=256
640 | size=1
641 | stride=1
642 | pad=1
643 | activation=leaky
644 |
645 | [convolutional]
646 | batch_normalize=1
647 | size=3
648 | stride=1
649 | pad=1
650 | filters=512
651 | activation=leaky
652 |
653 | [convolutional]
654 | batch_normalize=1
655 | filters=256
656 | size=1
657 | stride=1
658 | pad=1
659 | activation=leaky
660 |
661 | [convolutional]
662 | batch_normalize=1
663 | size=3
664 | stride=1
665 | pad=1
666 | filters=512
667 | activation=leaky
668 |
669 | [convolutional]
670 | batch_normalize=1
671 | filters=256
672 | size=1
673 | stride=1
674 | pad=1
675 | activation=leaky
676 |
677 | [convolutional]
678 | batch_normalize=1
679 | size=3
680 | stride=1
681 | pad=1
682 | filters=512
683 | activation=leaky
684 |
685 | [convolutional]
686 | size=1
687 | stride=1
688 | pad=1
689 | filters=255
690 | activation=linear
691 |
692 |
693 | [yolo]
694 | mask = 3,4,5
695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
696 | classes=80
697 | num=9
698 | jitter=.3
699 | ignore_thresh = .5
700 | truth_thresh = 1
701 | random=1
702 |
703 |
704 |
705 | [route]
706 | layers = -4
707 |
708 | [convolutional]
709 | batch_normalize=1
710 | filters=128
711 | size=1
712 | stride=1
713 | pad=1
714 | activation=leaky
715 |
716 | [upsample]
717 | stride=2
718 |
719 | [route]
720 | layers = -1, 36
721 |
722 |
723 |
724 | [convolutional]
725 | batch_normalize=1
726 | filters=128
727 | size=1
728 | stride=1
729 | pad=1
730 | activation=leaky
731 |
732 | [convolutional]
733 | batch_normalize=1
734 | size=3
735 | stride=1
736 | pad=1
737 | filters=256
738 | activation=leaky
739 |
740 | [convolutional]
741 | batch_normalize=1
742 | filters=128
743 | size=1
744 | stride=1
745 | pad=1
746 | activation=leaky
747 |
748 | [convolutional]
749 | batch_normalize=1
750 | size=3
751 | stride=1
752 | pad=1
753 | filters=256
754 | activation=leaky
755 |
756 | [convolutional]
757 | batch_normalize=1
758 | filters=128
759 | size=1
760 | stride=1
761 | pad=1
762 | activation=leaky
763 |
764 | [convolutional]
765 | batch_normalize=1
766 | size=3
767 | stride=1
768 | pad=1
769 | filters=256
770 | activation=leaky
771 |
772 | [convolutional]
773 | size=1
774 | stride=1
775 | pad=1
776 | filters=255
777 | activation=linear
778 |
779 |
780 | [yolo]
781 | mask = 0,1,2
782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
783 | classes=80
784 | num=9
785 | jitter=.3
786 | ignore_thresh = .5
787 | truth_thresh = 1
788 | random=1
789 |
790 |
--------------------------------------------------------------------------------
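The three `[yolo]` blocks in yolov3.cfg share one anchor list; each block's `mask` selects which `(w, h)` anchor pairs that detection head predicts with, exactly as `create_modules()` in darknet.py below parses it. The sketch prints that mapping; the grid sizes in the labels assume the 416x416 input used by human_detector.py (the cfg's own test size is 320x320).

```python
# Anchors copied from the [yolo] blocks above; each mask indexes into this list.
anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
masks = {'13x13 head (large objects)':  [6, 7, 8],
         '26x26 head (medium objects)': [3, 4, 5],
         '52x52 head (small objects)':  [0, 1, 2]}
for head, mask in masks.items():
    print(head, '->', [anchors[i] for i in mask])
```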
/demo/lib/yolov3/darknet.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | import torch
4 | import torch.nn as nn
5 | import torch.nn.functional as F
6 | import numpy as np
7 | import cv2
8 | import os
9 | import sys
10 |
11 | from lib.yolov3.util import convert2cpu as cpu
12 | from lib.yolov3.util import predict_transform
13 |
14 |
15 | class test_net(nn.Module):
16 | def __init__(self, num_layers, input_size):
17 | super(test_net, self).__init__()
18 | self.num_layers= num_layers
19 | self.linear_1 = nn.Linear(input_size, 5)
20 | self.middle = nn.ModuleList([nn.Linear(5,5) for x in range(num_layers)])
21 | self.output = nn.Linear(5,2)
22 |
23 | def forward(self, x):
24 | x = x.view(-1)
25 | fwd = nn.Sequential(self.linear_1, *self.middle, self.output)
26 | return fwd(x)
27 |
28 |
29 | def get_test_input():
30 | img = cv2.imread("dog-cycle-car.png")
31 | img = cv2.resize(img, (416, 416))
32 | img_ = img[:, :, ::-1].transpose((2, 0, 1))
33 | img_ = img_[np.newaxis, :, :, :]/255.0
34 | img_ = torch.from_numpy(img_).float()
35 | return img_
36 |
37 |
38 | def parse_cfg(cfgfile):
39 | """
40 | Takes a configuration file
41 |
42 |     Returns a list of blocks. Each block describes a block in the neural
43 |     network to be built. A block is represented as a dictionary in the list.
44 |
45 | """
46 | # cfgfile = os.path.join(sys.path[-1], cfgfile)
47 | file = open(cfgfile, 'r')
48 | lines = file.read().split('\n') # store the lines in a list
49 | lines = [x for x in lines if len(x) > 0] # get read of the empty lines
50 | lines = [x for x in lines if x[0] != '#']
51 | lines = [x.rstrip().lstrip() for x in lines]
52 |
53 | block = {}
54 | blocks = []
55 |
56 | for line in lines:
57 | if line[0] == "[": # This marks the start of a new block
58 | if len(block) != 0:
59 | blocks.append(block)
60 | block = {}
61 | block["type"] = line[1:-1].rstrip()
62 | else:
63 | key,value = line.split("=")
64 | block[key.rstrip()] = value.lstrip()
65 | blocks.append(block)
66 |
67 | return blocks
68 |
69 |
70 | class MaxPoolStride1(nn.Module):
71 | def __init__(self, kernel_size):
72 | super(MaxPoolStride1, self).__init__()
73 | self.kernel_size = kernel_size
74 | self.pad = kernel_size - 1
75 |
76 | def forward(self, x):
77 | padded_x = F.pad(x, (0, self.pad, 0, self.pad), mode="replicate")
78 | pooled_x = nn.MaxPool2d(self.kernel_size, self.pad)(padded_x)
79 | return pooled_x
80 |
81 |
82 | class EmptyLayer(nn.Module):
83 | def __init__(self):
84 | super(EmptyLayer, self).__init__()
85 |
86 |
87 | class DetectionLayer(nn.Module):
88 | def __init__(self, anchors):
89 | super(DetectionLayer, self).__init__()
90 | self.anchors = anchors
91 |
92 | def forward(self, x, inp_dim, num_classes, confidence):
93 | x = x.data
94 | global CUDA
95 | prediction = x
96 | prediction = predict_transform(prediction, inp_dim, self.anchors, num_classes, confidence, CUDA)
97 | return prediction
98 |
99 |
100 | class Upsample(nn.Module):
101 | def __init__(self, stride=2):
102 | super(Upsample, self).__init__()
103 | self.stride = stride
104 |
105 | def forward(self, x):
106 | stride = self.stride
107 | assert(x.data.dim() == 4)
108 | B = x.data.size(0)
109 | C = x.data.size(1)
110 | H = x.data.size(2)
111 | W = x.data.size(3)
112 | ws = stride
113 | hs = stride
114 | x = x.view(B, C, H, 1, W, 1).expand(B, C, H, stride, W, stride).contiguous().view(B, C, H*stride, W*stride)
115 | return x
116 |
117 |
118 | class ReOrgLayer(nn.Module):
119 | def __init__(self, stride=2):
120 | super(ReOrgLayer, self).__init__()
121 | self.stride= stride
122 |
123 | def forward(self, x):
124 | assert(x.data.dim() == 4)
125 | B, C, H, W = x.data.shape
126 | hs = self.stride
127 | ws = self.stride
128 | assert(H % hs == 0), "The stride " + str(self.stride) + " is not a proper divisor of height " + str(H)
129 |         assert(W % ws == 0), "The stride " + str(self.stride) + " is not a proper divisor of width " + str(W)
130 | x = x.view(B, C, H // hs, hs, W // ws, ws).transpose(-2, -3).contiguous()
131 | x = x.view(B, C, H // hs * W // ws, hs, ws)
132 | x = x.view(B, C, H // hs * W // ws, hs*ws).transpose(-1, -2).contiguous()
133 | x = x.view(B, C, ws*hs, H // ws, W // ws).transpose(1, 2).contiguous()
134 | x = x.view(B, C*ws*hs, H // ws, W // ws)
135 | return x
136 |
137 |
138 | def create_modules(blocks):
139 | net_info = blocks[0] # Captures the information about the input and pre-processing
140 |
141 | module_list = nn.ModuleList()
142 |
143 | index = 0 # indexing blocks helps with implementing route layers (skip connections)
144 | prev_filters = 3
145 | output_filters = []
146 |
147 | for x in blocks:
148 | module = nn.Sequential()
149 | if x["type"] == "net":
150 | continue
151 |
152 | # If it's a convolutional layer
153 | if x["type"] == "convolutional":
154 | # Get the info about the layer
155 | activation = x["activation"]
156 | try:
157 | batch_normalize = int(x["batch_normalize"])
158 | bias = False
159 | except:
160 | batch_normalize = 0
161 | bias = True
162 |
163 | filters= int(x["filters"])
164 | padding = int(x["pad"])
165 | kernel_size = int(x["size"])
166 | stride = int(x["stride"])
167 |
168 | if padding:
169 | pad = (kernel_size - 1) // 2
170 | else:
171 | pad = 0
172 |
173 | # Add the convolutional layer
174 | conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias = bias)
175 | module.add_module("conv_{0}".format(index), conv)
176 |
177 | # Add the Batch Norm Layer
178 | if batch_normalize:
179 | bn = nn.BatchNorm2d(filters)
180 | module.add_module("batch_norm_{0}".format(index), bn)
181 |
182 | # Check the activation.
183 | # It is either Linear or a Leaky ReLU for YOLO
184 | if activation == "leaky":
185 | activn = nn.LeakyReLU(0.1, inplace = True)
186 | module.add_module("leaky_{0}".format(index), activn)
187 |
188 | # If it's an upsampling layer
189 | # We use Bilinear2dUpsampling
190 |
191 | elif x["type"] == "upsample":
192 | stride = int(x["stride"])
193 | # upsample = Upsample(stride)
194 | upsample = nn.Upsample(scale_factor=2, mode="nearest")
195 | module.add_module("upsample_{}".format(index), upsample)
196 |
197 | # If it is a route layer
198 | elif (x["type"] == "route"):
199 | x["layers"] = x["layers"].split(',')
200 |
201 | # Start of a route
202 | start = int(x["layers"][0])
203 |
204 | # end, if there exists one.
205 | try:
206 | end = int(x["layers"][1])
207 | except:
208 | end = 0
209 |
210 |             # Positive annotation: convert absolute layer indices to offsets relative to this layer
211 | if start > 0:
212 | start = start - index
213 |
214 | if end > 0:
215 | end = end - index
216 |
217 | route = EmptyLayer()
218 | module.add_module("route_{0}".format(index), route)
219 |
220 | if end < 0:
221 | filters = output_filters[index + start] + output_filters[index + end]
222 | else:
223 | filters = output_filters[index + start]
224 |
225 | # shortcut corresponds to skip connection
226 | elif x["type"] == "shortcut":
227 | from_ = int(x["from"])
228 | shortcut = EmptyLayer()
229 | module.add_module("shortcut_{}".format(index), shortcut)
230 |
231 | elif x["type"] == "maxpool":
232 | stride = int(x["stride"])
233 | size = int(x["size"])
234 | if stride != 1:
235 | maxpool = nn.MaxPool2d(size, stride)
236 | else:
237 | maxpool = MaxPoolStride1(size)
238 |
239 | module.add_module("maxpool_{}".format(index), maxpool)
240 |
241 | # Yolo is the detection layer
242 | elif x["type"] == "yolo":
243 | mask = x["mask"].split(",")
244 | mask = [int(x) for x in mask]
245 |
246 | anchors = x["anchors"].split(",")
247 | anchors = [int(a) for a in anchors]
248 | anchors = [(anchors[i], anchors[i+1]) for i in range(0, len(anchors),2)]
249 | anchors = [anchors[i] for i in mask]
250 |
251 | detection = DetectionLayer(anchors)
252 | module.add_module("Detection_{}".format(index), detection)
253 |
254 | else:
255 |             print("Unknown block type: {}".format(x["type"]))
256 | assert False
257 |
258 | module_list.append(module)
259 | prev_filters = filters
260 | output_filters.append(filters)
261 | index += 1
262 |
263 | return (net_info, module_list)
264 |
265 |
266 | class Darknet(nn.Module):
267 | def __init__(self, cfgfile):
268 | super(Darknet, self).__init__()
269 | self.blocks = parse_cfg(cfgfile)
270 | self.net_info, self.module_list = create_modules(self.blocks)
271 | self.header = torch.IntTensor([0, 0, 0, 0])
272 | self.seen = 0
273 |
274 | def get_blocks(self):
275 | return self.blocks
276 |
277 | def get_module_list(self):
278 | return self.module_list
279 |
280 | def forward(self, x, CUDA):
281 | detections = []
282 | modules = self.blocks[1:]
283 | outputs = {} # We cache the outputs for the route layer
284 |
285 | write = 0
286 | for i in range(len(modules)):
287 |
288 | module_type = (modules[i]["type"])
289 | if module_type == "convolutional" or module_type == "upsample" or module_type == "maxpool":
290 |
291 | x = self.module_list[i](x)
292 | outputs[i] = x
293 |
294 | elif module_type == "route":
295 | layers = modules[i]["layers"]
296 | layers = [int(a) for a in layers]
297 |
298 | if (layers[0]) > 0:
299 | layers[0] = layers[0] - i
300 |
301 | if len(layers) == 1:
302 | x = outputs[i + (layers[0])]
303 |
304 | else:
305 | if (layers[1]) > 0:
306 | layers[1] = layers[1] - i
307 |
308 | map1 = outputs[i + layers[0]]
309 | map2 = outputs[i + layers[1]]
310 |
311 | x = torch.cat((map1, map2), 1)
312 | outputs[i] = x
313 |
314 | elif module_type == "shortcut":
315 | from_ = int(modules[i]["from"])
316 | x = outputs[i-1] + outputs[i+from_]
317 | outputs[i] = x
318 |
319 | elif module_type == 'yolo':
320 |
321 | anchors = self.module_list[i][0].anchors
322 | # Get the input dimensions
323 | inp_dim = int(self.net_info["height"])
324 |
325 | # Get the number of classes
326 | num_classes = int(modules[i]["classes"])
327 |
328 | # Output the result
329 | x = x.data
330 | x = predict_transform(x, inp_dim, anchors, num_classes, CUDA)
331 |
332 | if type(x) == int:
333 | continue
334 |
335 | if not write:
336 | detections = x
337 | write = 1
338 | else:
339 | detections = torch.cat((detections, x), 1)
340 |
341 | outputs[i] = outputs[i-1]
342 |
343 | try:
344 | return detections
345 | except:
346 | return 0
347 |
348 | def load_weights(self, weightfile):
349 | # Introduction: https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-3/
350 | # Open the weights file
351 | # weightfile = os.path.join(sys.path[-1], weightfile)
352 | fp = open(weightfile, "rb")
353 |
354 | # The first 5 values are header information
355 | # 1. Major version number
356 | # 2. Minor Version Number
357 | # 3. Subversion number
358 |         # 4, 5. Images seen by the network (during training)
359 | header = np.fromfile(fp, dtype = np.int32, count = 5)
360 | self.header = torch.from_numpy(header)
361 | self.seen = self.header[3]
362 |
363 | # The rest of the values are the weights
364 | # Let's load them up
365 | weights = np.fromfile(fp, dtype = np.float32)
366 |
367 | ptr = 0
368 | for i in range(len(self.module_list)):
369 | module_type = self.blocks[i + 1]["type"]
370 |
371 | if module_type == "convolutional":
372 | model = self.module_list[i]
373 | try:
374 | batch_normalize = int(self.blocks[i+1]["batch_normalize"])
375 | except:
376 | batch_normalize = 0
377 |
378 | conv = model[0]
379 |
380 | if (batch_normalize):
381 | bn = model[1]
382 |
383 | # Get the number of weights of Batch Norm Layer
384 | num_bn_biases = bn.bias.numel()
385 |
386 | # Load the weights
387 | bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases])
388 | ptr += num_bn_biases
389 |
390 | bn_weights = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
391 | ptr += num_bn_biases
392 |
393 | bn_running_mean = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
394 | ptr += num_bn_biases
395 |
396 | bn_running_var = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
397 | ptr += num_bn_biases
398 |
399 | # Cast the loaded weights into dims of model weights.
400 | bn_biases = bn_biases.view_as(bn.bias.data)
401 | bn_weights = bn_weights.view_as(bn.weight.data)
402 | bn_running_mean = bn_running_mean.view_as(bn.running_mean)
403 | bn_running_var = bn_running_var.view_as(bn.running_var)
404 |
405 | # Copy the data to model
406 | bn.bias.data.copy_(bn_biases)
407 | bn.weight.data.copy_(bn_weights)
408 | bn.running_mean.copy_(bn_running_mean)
409 | bn.running_var.copy_(bn_running_var)
410 |
411 | else:
412 | # Number of biases
413 | num_biases = conv.bias.numel()
414 |
415 | # Load the weights
416 | conv_biases = torch.from_numpy(weights[ptr: ptr + num_biases])
417 | ptr = ptr + num_biases
418 |
419 | # reshape the loaded weights according to the dims of the model weights
420 | conv_biases = conv_biases.view_as(conv.bias.data)
421 |
422 | # Finally copy the data
423 | conv.bias.data.copy_(conv_biases)
424 |
425 | # Let us load the weights for the Convolutional layers
426 | num_weights = conv.weight.numel()
427 |
428 | # Do the same as above for weights
429 | conv_weights = torch.from_numpy(weights[ptr:ptr+num_weights])
430 | ptr = ptr + num_weights
431 |
432 | conv_weights = conv_weights.view_as(conv.weight.data)
433 | conv.weight.data.copy_(conv_weights)
434 |
--------------------------------------------------------------------------------
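A minimal sketch of building and running the `Darknet` class above, under a few assumptions: the cfg path matches this repo's layout, `yolov3.weights` has been downloaded to the default path used by human_detector.py, and `net_info["height"]` is overridden to the 416x416 working resolution (as `load_model()` does), since `forward()` derives the stride from it.

```python
import torch
from lib.yolov3.darknet import Darknet  # assumes demo/lib is on sys.path

model = Darknet('demo/lib/yolov3/cfg/yolov3.cfg')
model.load_weights('demo/lib/checkpoint/yolov3.weights')  # official YOLOv3 weights
model.net_info["height"] = 416                            # predict_transform reads this
model.eval()

cuda = torch.cuda.is_available()
if cuda:
    model.cuda()

x = torch.rand(1, 3, 416, 416)                            # dummy input batch
if cuda:
    x = x.cuda()
with torch.no_grad():
    detections = model(x, cuda)                           # (1, 10647, 85) raw predictions
print(detections.shape)
```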
/demo/lib/yolov3/data/coco.names:
--------------------------------------------------------------------------------
1 | person
2 | bicycle
3 | car
4 | motorbike
5 | aeroplane
6 | bus
7 | train
8 | truck
9 | boat
10 | traffic light
11 | fire hydrant
12 | stop sign
13 | parking meter
14 | bench
15 | bird
16 | cat
17 | dog
18 | horse
19 | sheep
20 | cow
21 | elephant
22 | bear
23 | zebra
24 | giraffe
25 | backpack
26 | umbrella
27 | handbag
28 | tie
29 | suitcase
30 | frisbee
31 | skis
32 | snowboard
33 | sports ball
34 | kite
35 | baseball bat
36 | baseball glove
37 | skateboard
38 | surfboard
39 | tennis racket
40 | bottle
41 | wine glass
42 | cup
43 | fork
44 | knife
45 | spoon
46 | bowl
47 | banana
48 | apple
49 | sandwich
50 | orange
51 | broccoli
52 | carrot
53 | hot dog
54 | pizza
55 | donut
56 | cake
57 | chair
58 | sofa
59 | pottedplant
60 | bed
61 | diningtable
62 | toilet
63 | tvmonitor
64 | laptop
65 | mouse
66 | remote
67 | keyboard
68 | cell phone
69 | microwave
70 | oven
71 | toaster
72 | sink
73 | refrigerator
74 | book
75 | clock
76 | vase
77 | scissors
78 | teddy bear
79 | hair drier
80 | toothbrush
81 |
--------------------------------------------------------------------------------
/demo/lib/yolov3/data/pallete:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/demo/lib/yolov3/data/pallete
--------------------------------------------------------------------------------
/demo/lib/yolov3/data/voc.names:
--------------------------------------------------------------------------------
1 | aeroplane
2 | bicycle
3 | bird
4 | boat
5 | bottle
6 | bus
7 | car
8 | cat
9 | chair
10 | cow
11 | diningtable
12 | dog
13 | horse
14 | motorbike
15 | person
16 | pottedplant
17 | sheep
18 | sofa
19 | train
20 | tvmonitor
21 |
--------------------------------------------------------------------------------
/demo/lib/yolov3/human_detector.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 | import time
3 | import torch
4 | import numpy as np
5 | import cv2
6 | import os
7 | import sys
8 | import random
9 | import pickle as pkl
10 | import argparse
11 |
12 | from lib.yolov3.util import *
13 | from lib.yolov3.darknet import Darknet
14 | from lib.yolov3 import preprocess
15 |
16 | cur_dir = os.path.dirname(os.path.realpath(__file__))
17 | project_root = os.path.join(cur_dir, '../../../')
18 | chk_root = os.path.join(project_root, 'checkpoint/')
19 | data_root = os.path.join(project_root, 'data/')
20 |
21 |
22 | sys.path.insert(0, project_root)
23 | sys.path.pop(0)
24 |
25 |
26 | def prep_image(img, inp_dim):
27 | """
28 |     Prepare image for inputting to the neural network.
29 |
30 |     Returns the preprocessed tensor, the original image, and its (width, height)
31 |     """
32 | ori_img = img
33 | dim = ori_img.shape[1], ori_img.shape[0]
34 | img = cv2.resize(ori_img, (inp_dim, inp_dim))
35 | img_ = img[:, :, ::-1].transpose((2, 0, 1)).copy()
36 | img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0)
37 | return img_, ori_img, dim
38 |
39 |
40 | def write(x, img, colors):
41 | x = [int(i) for i in x]
42 | c1 = tuple(x[0:2])
43 | c2 = tuple(x[2:4])
44 |
45 | label = 'People {}'.format(0)
46 | color = (0, 0, 255)
47 | cv2.rectangle(img, c1, c2, color, 2)
48 | t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1, 1)[0]
49 | c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
50 | cv2.rectangle(img, c1, c2, color, -1)
51 | cv2.putText(img, label, (c1[0], c1[1] + t_size[1] + 4), cv2.FONT_HERSHEY_PLAIN, 1, [225, 255, 255], 1)
52 | return img
53 |
54 |
55 | def arg_parse():
56 |     """
57 |     Parse arguments to the detect module.
58 |
59 | """
60 | parser = argparse.ArgumentParser(description='YOLO v3 Cam Demo')
61 | parser.add_argument('--confidence', dest='confidence', type=float, default=0.70,
62 | help='Object Confidence to filter predictions')
63 | parser.add_argument('--nms-thresh', dest='nms_thresh', type=float, default=0.4, help='NMS Threshold')
64 | parser.add_argument('--reso', dest='reso', default=416, type=int, help='Input resolution of the network. '
65 | 'Increase to increase accuracy. Decrease to increase speed. (160, 416)')
66 |     parser.add_argument('-wf', '--weight-file', type=str, default='demo/lib/checkpoint/yolov3.weights',
67 |                         help='The path of the model weight file')
68 |     parser.add_argument('-cf', '--cfg-file', type=str, default=cur_dir + '/cfg/yolov3.cfg', help='The path of the model cfg file')
69 | parser.add_argument('-a', '--animation', action='store_true', help='output animation')
70 | parser.add_argument('-v', '--video', type=str, default='camera', help='The input video path')
71 | parser.add_argument("-f", "--figure", type=str, default='demo.jpg',help="input figure file name")
72 | parser.add_argument('-i', '--image', type=str, default=cur_dir + '/data/dog-cycle-car.png',
73 |                         help='The input image path')
74 | parser.add_argument('-np', '--num-person', type=int, default=1, help='number of estimated human poses. [1, 2]')
75 |     parser.add_argument('--gpu', type=str, default='0', help='GPU id')
76 |
77 | return parser.parse_args()
78 |
79 |
80 | def load_model(args=None, CUDA=None, inp_dim=416):
81 | if args is None:
82 | args = arg_parse()
83 |
84 | if CUDA is None:
85 | CUDA = torch.cuda.is_available()
86 |
87 | # Set up the neural network
88 | model = Darknet(args.cfg_file)
89 | model.load_weights(args.weight_file)
90 | # print("YOLOv3 network successfully loaded")
91 |
92 | model.net_info["height"] = inp_dim
93 | assert inp_dim % 32 == 0
94 | assert inp_dim > 32
95 |
96 |     # If there's a GPU available, put the model on GPU
97 | if CUDA:
98 | model.cuda()
99 |
100 | # Set the model in evaluation mode
101 | model.eval()
102 |
103 | return model
104 |
105 |
106 | def yolo_human_det(img, model=None, reso=416, confidence=0.70):
107 | args = arg_parse()
108 | # args.reso = reso
109 | inp_dim = reso
110 | num_classes = 80
111 |
112 | CUDA = torch.cuda.is_available()
113 | if model is None:
114 | model = load_model(args, CUDA, inp_dim)
115 |
116 | if type(img) == str:
117 | assert os.path.isfile(img), 'The image path does not exist'
118 | img = cv2.imread(img)
119 |
120 | img, ori_img, img_dim = preprocess.prep_image(img, inp_dim)
121 | img_dim = torch.FloatTensor(img_dim).repeat(1, 2)
122 |
123 | with torch.no_grad():
124 | if CUDA:
125 | img_dim = img_dim.cuda()
126 | img = img.cuda()
127 | output = model(img, CUDA)
128 | output = write_results(output, confidence, num_classes, nms=True, nms_conf=args.nms_thresh, det_hm=True)
129 |
130 | if len(output) == 0:
131 | return None, None
132 |
133 | img_dim = img_dim.repeat(output.size(0), 1)
134 | scaling_factor = torch.min(inp_dim / img_dim, 1)[0].view(-1, 1)
135 |
136 | output[:, [1, 3]] -= (inp_dim - scaling_factor * img_dim[:, 0].view(-1, 1)) / 2
137 | output[:, [2, 4]] -= (inp_dim - scaling_factor * img_dim[:, 1].view(-1, 1)) / 2
138 | output[:, 1:5] /= scaling_factor
139 |
140 | for i in range(output.shape[0]):
141 | output[i, [1, 3]] = torch.clamp(output[i, [1, 3]], 0.0, img_dim[i, 0])
142 | output[i, [2, 4]] = torch.clamp(output[i, [2, 4]], 0.0, img_dim[i, 1])
143 |
144 | bboxs = []
145 | scores = []
146 | for i in range(len(output)):
147 | item = output[i]
148 | bbox = item[1:5].cpu().numpy()
149 |         # convert float32 coordinates to 2-decimal values
150 | bbox = [round(i, 2) for i in list(bbox)]
151 | score = item[5].cpu().numpy()
152 | bboxs.append(bbox)
153 | scores.append(score)
154 | scores = np.expand_dims(np.array(scores), 1)
155 | bboxs = np.array(bboxs)
156 |
157 | return bboxs, scores
158 |
--------------------------------------------------------------------------------
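An end-to-end sketch of the human detector above, assuming it is run from the repo root with `demo/lib` on `sys.path` and with the YOLOv3 weights in place; note that both `load_model()` and `yolo_human_det()` fall back to `arg_parse()` internally, so they expect the module's default command-line flags.

```python
from lib.yolov3.human_detector import load_model, yolo_human_det

model = load_model(inp_dim=416)        # builds Darknet and loads the default weight file
bboxs, scores = yolo_human_det('demo/figure/messi.jpg', model=model, reso=416)
if bboxs is None:
    print('no person detected')
else:
    # bboxs: (N, 4) array of [x1, y1, x2, y2]; scores: (N, 1) objectness scores
    print(bboxs, scores)
```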
/demo/lib/yolov3/preprocess.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | import torch
4 | import numpy as np
5 | import cv2
6 | from PIL import Image
7 |
8 |
9 | def letterbox_image(img, inp_dim):
10 | '''resize image with unchanged aspect ratio using padding'''
11 | img_w, img_h = img.shape[1], img.shape[0]
12 | w, h = inp_dim
13 | new_w = int(img_w * min(w/img_w, h/img_h))
14 | new_h = int(img_h * min(w/img_w, h/img_h))
15 | resized_image = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_CUBIC)
16 |
17 | canvas = np.full((inp_dim[1], inp_dim[0], 3), 128)
18 |
19 | canvas[(h - new_h) // 2:(h - new_h) // 2 + new_h, (w - new_w) // 2:(w - new_w) // 2 + new_w, :] = resized_image
20 |
21 | return canvas
22 |
23 |
24 | def prep_image(img, inp_dim):
25 | """
26 | Prepare image for inputting to the neural network.
27 |
28 |     Returns the preprocessed tensor, the original image, and its (width, height)
29 | """
30 | if type(img) == str:
31 | orig_im = cv2.imread(img)
32 | else:
33 | orig_im = img
34 | dim = orig_im.shape[1], orig_im.shape[0]
35 | img = (letterbox_image(orig_im, (inp_dim, inp_dim)))
36 | img_ = img[:, :, ::-1].transpose((2, 0, 1)).copy()
37 | img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0)
38 | return img_, orig_im, dim
39 |
40 |
41 | def prep_image_pil(img, network_dim):
42 | orig_im = Image.open(img)
43 | img = orig_im.convert('RGB')
44 | dim = img.size
45 | img = img.resize(network_dim)
46 | img = torch.ByteTensor(torch.ByteStorage.from_buffer(img.tobytes()))
47 | img = img.view(*network_dim, 3).transpose(0, 1).transpose(0, 2).contiguous()
48 | img = img.view(1, 3, *network_dim)
49 | img = img.float().div(255.0)
50 | return img, orig_im, dim
51 |
52 |
53 | def inp_to_image(inp):
54 | inp = inp.cpu().squeeze()
55 | inp = inp * 255
56 | try:
57 | inp = inp.data.numpy()
58 | except RuntimeError:
59 | inp = inp.numpy()
60 | inp = inp.transpose(1, 2, 0)
61 |
62 | inp = inp[:, :, ::-1]
63 | return inp
64 |
--------------------------------------------------------------------------------
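A short sketch of the letterbox preprocessing above: `prep_image()` keeps the aspect ratio, pads the frame with grey (value 128) to a square canvas, and returns a normalised CHW tensor plus the original frame and its size. The random frame below is a stand-in for a real BGR image.

```python
import numpy as np
from lib.yolov3.preprocess import prep_image  # assumes demo/lib is on sys.path

frame = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)  # dummy BGR frame
tensor, orig_im, (w, h) = prep_image(frame, 416)
print(tensor.shape)   # torch.Size([1, 3, 416, 416]), values scaled to [0, 1]
print(w, h)           # original width/height, used later to rescale boxes
```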
/demo/lib/yolov3/util.py:
--------------------------------------------------------------------------------
1 | from __future__ import division
2 |
3 | import torch
4 | import numpy as np
5 | import cv2
6 | import os.path as osp
7 | from lib.yolov3.bbox import bbox_iou
8 |
9 |
10 | def get_path(cur_file):
11 | cur_dir = osp.dirname(osp.realpath(cur_file))
12 | project_root = osp.join(cur_dir, '../../../')
13 | chk_root = osp.join(project_root, 'checkpoint/')
14 | data_root = osp.join(project_root, 'data/')
15 |
16 | return project_root, chk_root, data_root, cur_dir
17 |
18 |
19 | def count_parameters(model):
20 | return sum(p.numel() for p in model.parameters())
21 |
22 |
23 | def count_learnable_parameters(model):
24 | return sum(p.numel() for p in model.parameters() if p.requires_grad)
25 |
26 |
27 | def convert2cpu(matrix):
28 | if matrix.is_cuda:
29 | return torch.FloatTensor(matrix.size()).copy_(matrix)
30 | else:
31 | return matrix
32 |
33 |
34 | def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):
35 | batch_size = prediction.size(0)
36 | stride = inp_dim // prediction.size(2)
37 | grid_size = inp_dim // stride
38 | bbox_attrs = 5 + num_classes
39 | num_anchors = len(anchors)
40 |
41 | anchors = [(a[0]/stride, a[1]/stride) for a in anchors]
42 |
43 | prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
44 | prediction = prediction.transpose(1, 2).contiguous()
45 | prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)
46 |
47 |     # Sigmoid the centre_X, centre_Y, and object confidence
48 | prediction[:, :, 0] = torch.sigmoid(prediction[:, :, 0])
49 | prediction[:, :, 1] = torch.sigmoid(prediction[:, :, 1])
50 | prediction[:, :, 4] = torch.sigmoid(prediction[:, :, 4])
51 |
52 | # Add the center offsets
53 | grid_len = np.arange(grid_size)
54 | a, b = np.meshgrid(grid_len, grid_len)
55 |
56 | x_offset = torch.FloatTensor(a).view(-1, 1)
57 | y_offset = torch.FloatTensor(b).view(-1, 1)
58 |
59 | if CUDA:
60 | x_offset = x_offset.cuda()
61 | y_offset = y_offset.cuda()
62 |
63 | x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1, num_anchors).view(-1, 2).unsqueeze(0)
64 |
65 | prediction[:, :, :2] += x_y_offset
66 |
67 | # log space transform height and the width
68 | anchors = torch.FloatTensor(anchors)
69 |
70 | if CUDA:
71 | anchors = anchors.cuda()
72 |
73 | anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
74 | prediction[:, :, 2:4] = torch.exp(prediction[:, :, 2:4])*anchors
75 |
76 |     # Sigmoid the class scores (YOLOv3 uses independent logistic classifiers rather than softmax)
77 | prediction[:, :, 5: 5 + num_classes] = torch.sigmoid((prediction[:, :, 5: 5 + num_classes]))
78 |
79 | prediction[:, :, :4] *= stride
80 |
81 | return prediction
82 |
83 |
84 | def load_classes(namesfile):
85 | fp = open(namesfile, "r")
86 | names = fp.read().split("\n")[:-1]
87 | return names
88 |
89 |
90 | def get_im_dim(im):
91 | im = cv2.imread(im)
92 | w, h = im.shape[1], im.shape[0]
93 | return w, h
94 |
95 |
96 | def unique(tensor):
97 | tensor_np = tensor.cpu().numpy()
98 | unique_np = np.unique(tensor_np)
99 | unique_tensor = torch.from_numpy(unique_np)
100 |
101 | tensor_res = tensor.new(unique_tensor.shape)
102 | tensor_res.copy_(unique_tensor)
103 | return tensor_res
104 |
105 |
106 | # TODO: add soft-NMS (a commented-out sketch is kept inside write_results below)
107 | def write_results(prediction, confidence, num_classes, nms=True, nms_conf=0.4, det_hm=False):
108 | """
109 | https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-4/
110 | prediction: (B x 10647 x 85)
111 | B: the number of images in a batch,
112 | 10647: the number of bounding boxes predicted per image. (52×52+26×26+13×13)×3=10647
113 | 85: the number of bounding box attributes. (c_x, c_y, w, h, object confidence, and 80 class scores)
114 |
115 | output: Num_obj × [img_index, x_1, y_1, x_2, y_2, object confidence, class_score, label_index]
116 | """
117 |
118 | conf_mask = (prediction[:, :, 4] > confidence).float().unsqueeze(2)
119 | prediction = prediction*conf_mask
120 |
121 | box_a = prediction.new(prediction.shape)
122 | box_a[:, :, 0] = (prediction[:, :, 0] - prediction[:, :, 2]/2)
123 | box_a[:, :, 1] = (prediction[:, :, 1] - prediction[:, :, 3]/2)
124 | box_a[:, :, 2] = (prediction[:, :, 0] + prediction[:, :, 2]/2)
125 | box_a[:, :, 3] = (prediction[:, :, 1] + prediction[:, :, 3]/2)
126 | prediction[:, :, :4] = box_a[:, :, :4]
127 |
128 | batch_size = prediction.size(0)
129 |
130 | output = prediction.new(1, prediction.size(2) + 1)
131 | write = False
132 |
133 | for ind in range(batch_size):
134 | # select the image from the batch
135 | image_pred = prediction[ind]
136 |
137 | # Get the class having maximum score, and the index of that class
138 | # Get rid of num_classes softmax scores
139 | # Add the class index and the class score of class having maximum score
140 | max_conf, max_conf_index = torch.max(image_pred[:, 5:5 + num_classes], 1)
141 | max_conf = max_conf.float().unsqueeze(1)
142 | max_conf_index = max_conf_index.float().unsqueeze(1)
143 | seq = (image_pred[:, :5], max_conf, max_conf_index)
144 | image_pred = torch.cat(seq, 1) # image_pred:(10647, 7) 7:[x1, y1, x2, y2, obj_score, max_conf, max_conf_index]
145 |
146 | # Get rid of the zero entries
147 | non_zero_ind = (torch.nonzero(image_pred[:, 4]))
148 | image_pred__ = image_pred[non_zero_ind.squeeze(), :].view(-1, 7)
149 |
150 |         # keep only the person class (class index 0)
151 | if det_hm:
152 | cls_mask = (image_pred__[:, -1] == 0).float()
153 | class_mask_ind = torch.nonzero(cls_mask).squeeze()
154 | image_pred_ = image_pred__[class_mask_ind].view(-1, 7)
155 |
156 | if torch.sum(cls_mask) == 0:
157 | return image_pred_
158 | else:
159 | image_pred_ = image_pred__
160 |
161 | # Get the various classes detected in the image
162 | try:
163 | # img_classes = unique(image_pred_[:, -1])
164 | img_classes = torch.unique(image_pred_[:, -1], sorted=True).float()
165 | except:
166 | continue
167 |
168 | # We will do NMS classwise
169 | # import ipdb;ipdb.set_trace()
170 | for cls in img_classes:
171 | # get the detections with one particular class
172 | cls_mask = image_pred_*(image_pred_[:, -1] == cls).float().unsqueeze(1)
173 | class_mask_ind = torch.nonzero(cls_mask[:, -2]).squeeze()
174 | image_pred_class = image_pred_[class_mask_ind].view(-1, 7)
175 |
176 | # sort the detections such that the entry with the maximum objectness
177 | # confidence is at the top
178 | conf_sort_index = torch.sort(image_pred_class[:, 4], descending=True)[1]
179 | image_pred_class = image_pred_class[conf_sort_index]
180 | idx = image_pred_class.size(0)
181 |
182 | # from soft_NMS import soft_nms
183 | # boxes = image_pred_class[:,:4]
184 | # scores = image_pred_class[:, 4]
185 | # k, N = soft_nms(boxes, scores, method=2)
186 | # image_pred_class = image_pred_class[k]
187 |
188 | # if nms has to be done
189 | if nms:
190 | # For each detection
191 | for i in range(idx):
192 | # Get the IOUs of all boxes that come after the one we are looking at
193 | # in the loop
194 | try:
195 | ious = bbox_iou(image_pred_class[i].unsqueeze(0), image_pred_class[i+1:])
196 | except ValueError:
197 | break
198 |
199 | except IndexError:
200 | break
201 |
202 | # Zero out all the detections that have IoU > threshold
203 | iou_mask = (ious < nms_conf).float().unsqueeze(1)
204 | image_pred_class[i+1:] *= iou_mask
205 |
206 | # Remove the zero entries
207 | non_zero_ind = torch.nonzero(image_pred_class[:, 4]).squeeze()
208 | image_pred_class = image_pred_class[non_zero_ind].view(-1, 7)
209 |
210 | # Concatenate the batch_id of the image to the detection
211 | # this helps us identify which image does the detection correspond to
212 | # We use a linear structure to hold ALL the detections from the batch
213 | # the batch_dim is flattened
214 | # batch is identified by extra batch column
215 |
216 | batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
217 | seq = batch_ind, image_pred_class
218 | if not write:
219 | output = torch.cat(seq, 1)
220 | write = True
221 | else:
222 | out = torch.cat(seq, 1)
223 | output = torch.cat((output, out))
224 |
225 | return output
226 |
--------------------------------------------------------------------------------
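A minimal, self-contained sketch of the two core steps in write_results above: converting boxes from (c_x, c_y, w, h) centre format to corner format, and greedily suppressing overlaps by IoU. The iou and nms helpers below are simplified stand-ins written for this illustration, not the repository's bbox_iou or the full class-wise loop.

import torch

def boxes_center_to_corner(pred):
    # pred: (N, >=4) with (c_x, c_y, w, h) in the first four columns
    box = pred.clone()
    box[:, 0] = pred[:, 0] - pred[:, 2] / 2   # x1
    box[:, 1] = pred[:, 1] - pred[:, 3] / 2   # y1
    box[:, 2] = pred[:, 0] + pred[:, 2] / 2   # x2
    box[:, 3] = pred[:, 1] + pred[:, 3] / 2   # y2
    return box

def iou(box, boxes):
    # box: (4,), boxes: (M, 4), both in corner format
    x1 = torch.max(box[0], boxes[:, 0])
    y1 = torch.max(box[1], boxes[:, 1])
    x2 = torch.min(box[2], boxes[:, 2])
    y2 = torch.min(box[3], boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(sorted_boxes, thresh=0.4):
    # sorted_boxes: (N, >=4) already sorted by objectness, descending
    keep = []
    boxes = sorted_boxes.clone()
    while boxes.size(0) > 0:
        keep.append(boxes[0])
        if boxes.size(0) == 1:
            break
        overlap = iou(boxes[0, :4], boxes[1:, :4])
        boxes = boxes[1:][overlap < thresh]   # drop boxes overlapping the kept one too much
    return torch.stack(keep)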
/demo/vis.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import cv2
3 |
4 | from lib.preprocess import h36m_coco_format, revise_kpts
5 | from lib.hrnet.gen_kpts import gen_video_kpts as hrnet_pose
6 | import os
7 | import numpy as np
8 | import torch
9 | import glob
10 | from tqdm import tqdm
11 | import copy
12 | import shutil
13 | from IPython import embed
14 |
15 |
16 | sys.path.append(os.getcwd())
17 | from model.GCN_conv import adj_mx_from_skeleton
18 | from model.trans import HTNet
19 | from common.camera import *
20 | from common.h36m_dataset import Human36mDataset
21 | from common.camera import camera_to_world
22 | from common.opt import opts
23 | opt = opts().parse()
24 |
25 | import matplotlib
26 | import matplotlib.pyplot as plt
27 | import matplotlib.gridspec as gridspec
28 |
29 | os.environ["CUDA_VISIBLE_DEVICES"] = opt.gpu
30 |
31 | plt.switch_backend('agg')
32 | matplotlib.rcParams['pdf.fonttype'] = 42
33 | matplotlib.rcParams['ps.fonttype'] = 42
34 |
35 | dataset_path = './dataset/data_3d_h36m.npz'
36 | dataset = Human36mDataset(dataset_path, opt)
37 | adj = adj_mx_from_skeleton(dataset.skeleton())
38 |
39 | def show2Dpose(kps, img):
40 | connections = [[0, 1], [1, 2], [2, 3], [0, 4], [4, 5],
41 | [5, 6], [0, 7], [7, 8], [8, 9], [9, 10],
42 | [8, 11], [11, 12], [12, 13], [8, 14], [14, 15], [15, 16]]
43 |
44 | LR = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0], dtype=bool)
45 |
46 | lcolor = (255, 0, 0)
47 | rcolor = (0, 0, 255)
48 | thickness = 3
49 |
50 | for j,c in enumerate(connections):
51 | start = map(int, kps[c[0]])
52 | end = map(int, kps[c[1]])
53 | start = list(start)
54 | end = list(end)
55 | cv2.line(img, (start[0], start[1]), (end[0], end[1]), lcolor if LR[j] else rcolor, thickness)
56 | cv2.circle(img, (start[0], start[1]), thickness=-1, color=(0, 255, 0), radius=3)
57 | cv2.circle(img, (end[0], end[1]), thickness=-1, color=(0, 255, 0), radius=3)
58 |
59 | return img
60 |
61 |
62 |
63 | def show3Dpose(vals, ax):
64 | ax.view_init(elev=15., azim=70)
65 |
66 | lcolor=(0,0,1)
67 | rcolor=(1,0,0)
68 |
69 | I = np.array( [0, 0, 1, 4, 2, 5, 0, 7, 8, 8, 14, 15, 11, 12, 8, 9])
70 | J = np.array( [1, 4, 2, 5, 3, 6, 7, 8, 14, 11, 15, 16, 12, 13, 9, 10])
71 |
72 | LR = np.array([0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0], dtype=bool)
73 |
74 | for i in np.arange( len(I) ):
75 | x, y, z = [np.array( [vals[I[i], j], vals[J[i], j]] ) for j in range(3)]
76 | ax.plot(x, y, z, lw=2, color = lcolor if LR[i] else rcolor)
77 |
78 | RADIUS = 0.72
79 | RADIUS_Z = 0.7
80 |
81 | xroot, yroot, zroot = vals[0,0], vals[0,1], vals[0,2]
82 | ax.set_xlim3d([-RADIUS+xroot, RADIUS+xroot])
83 | ax.set_ylim3d([-RADIUS+yroot, RADIUS+yroot])
84 | ax.set_zlim3d([-RADIUS_Z+zroot, RADIUS_Z+zroot])
85 | ax.set_aspect('auto') # works fine in matplotlib==2.2.2
86 |
87 | white = (1.0, 1.0, 1.0, 0.0)
88 | ax.xaxis.set_pane_color(white)
89 | ax.yaxis.set_pane_color(white)
90 | ax.zaxis.set_pane_color(white)
91 |
92 | ax.tick_params('x', labelbottom = False)
93 | ax.tick_params('y', labelleft = False)
94 | ax.tick_params('z', labelleft = False)
95 |
96 |
97 |
98 |
99 | def showimage(ax, img):
100 | ax.set_xticks([])
101 | ax.set_yticks([])
102 | plt.axis('off')
103 | ax.imshow(img)
104 |
105 |
106 | def get_pose3D(figure_path, output_dir, file_name):
107 |     # Generate 2D pose
108 | keypoints, scores = hrnet_pose(figure_path, det_dim=416, num_peroson=1, gen_output=True)
109 | keypoints, scores, valid_frames = h36m_coco_format(keypoints, scores)
110 |
111 | ## Reload
112 | previous_dir = './ckpt/cpn'
113 | model = HTNet(opt, adj).cuda()
114 | model_dict = model.state_dict()
115 | model_path = sorted(glob.glob(os.path.join(previous_dir, '*.pth')))[0]
116 | pre_dict = torch.load(model_path)
117 | for name, key in model_dict.items():
118 | model_dict[name] = pre_dict[name]
119 | model.load_state_dict(model_dict)
120 | model.eval()
121 |
122 | ## 3D
123 | img = cv2.imread(figure_path)
124 | img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
125 | img_size = img.shape
126 | input_2D_no = keypoints[:,0,:,:]
127 |
128 | joints_left = [4, 5, 6, 11, 12, 13]
129 | joints_right = [1, 2, 3, 14, 15, 16]
130 |
131 | input_2D = normalize_screen_coordinates(input_2D_no, w=img_size[1], h=img_size[0])
132 |
133 | input_2D_aug = copy.deepcopy(input_2D)
134 | input_2D_aug[ :, :, 0] *= -1
135 | input_2D_aug[ :, joints_left + joints_right] = input_2D_aug[ :, joints_right + joints_left]
136 | input_2D = np.concatenate((np.expand_dims(input_2D, axis=0), np.expand_dims(input_2D_aug, axis=0)), 0)
137 |
138 | input_2D = input_2D[np.newaxis, :, :, :, :]
139 |
140 | input_2D = torch.from_numpy(input_2D.astype('float32')).cuda()
141 |
142 | N = input_2D.size(0)
143 |
144 | ## estimation
145 | output_3D_non_flip = model(input_2D[:, 0])
146 | output_3D_flip = model(input_2D[:, 1])
147 |
148 | output_3D_flip[:, :, :, 0] *= -1
149 | output_3D_flip[:, :, joints_left + joints_right, :] = output_3D_flip[:, :, joints_right + joints_left, :]
150 |
151 | output_3D = (output_3D_non_flip + output_3D_flip) / 2
152 |
153 | output_3D = output_3D[0:, opt.pad].unsqueeze(1)
154 | output_3D[:, :, 0, :] = 0
155 | post_out = output_3D[0, 0].cpu().detach().numpy()
156 |
157 | rot = [0.1407056450843811, -0.1500701755285263, -0.755240797996521, 0.6223280429840088]
158 | rot = np.array(rot, dtype='float32')
159 | post_out = camera_to_world(post_out, R=rot, t=0)
160 | post_out[:, 2] -= np.min(post_out[:, 2])
161 |
162 | input_2D_no = input_2D_no[opt.pad]
163 |
164 | ## 2D
165 | image = show2Dpose(input_2D_no, copy.deepcopy(img))
166 |
167 |
168 | ## 3D
169 | fig = plt.figure( figsize=(9.6, 5.4))
170 | gs = gridspec.GridSpec(1, 1)
171 | gs.update(wspace=-0.00, hspace=0.05)
172 | ax = plt.subplot(gs[0], projection='3d')
173 | show3Dpose( post_out, ax)
174 |
175 |
176 | output_dir_3D = output_dir +'pose3D/'
177 | os.makedirs(output_dir_3D, exist_ok=True)
178 | plt.savefig(output_dir_3D + '_3D.png', dpi=200, format='png', bbox_inches = 'tight')
179 |
180 |
181 |
182 |
183 | ## all
184 | image_3d_dir = sorted(glob.glob(os.path.join(output_dir_3D, '*.png')))
185 |
186 | for i in range(len(image_3d_dir)):
187 | image_2d = image
188 | image_3d = plt.imread(image_3d_dir[i])
189 | ## crop
190 | edge = (image_2d.shape[1] - image_2d.shape[0]) // 2
191 | image_2d = image_2d[:, edge:image_2d.shape[1] - edge]
192 |
193 | edge = 130
194 | image_3d = image_3d[edge:image_3d.shape[0] - edge, edge:image_3d.shape[1] - edge]
195 | ## show
196 | font_size = 12
197 | ax = plt.subplot(121)
198 | showimage(ax, image_2d)
199 | ax.set_title("Input", fontsize = font_size)
200 |
201 | ax = plt.subplot(122)
202 | showimage(ax, image_3d)
203 | ax.set_title("Pose", fontsize = font_size)
204 |
205 | ## save
206 | output_dir_pose = output_dir
207 | plt.savefig(output_dir + file_name + '_pose.png', dpi=200, bbox_inches = 'tight')
208 |
209 | shutil.rmtree("./demo/output/pose3D")
210 |
211 |
212 |
213 | if __name__ == "__main__":
214 | items = os.listdir('./demo/figure/')
215 | print(items)
216 | for i, file_name in enumerate(items):
217 |         print("Generate Pose For " + file_name)
218 | figure_path = './demo/figure/' + file_name
219 | output_dir = './demo/output/'
220 | get_pose3D(figure_path, output_dir, file_name[:-4])
221 |
222 |
223 |
224 |
--------------------------------------------------------------------------------
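Both get_pose3D above and input_augmentation in main.py below use the same flip-based test-time augmentation: mirror the 2D input, swap the left/right joint indices, run the model, undo the mirroring on the output, and average with the un-flipped prediction. A compact NumPy sketch, where predict stands in for the HTNet forward pass:

import numpy as np

joints_left = [4, 5, 6, 11, 12, 13]
joints_right = [1, 2, 3, 14, 15, 16]

def flip_average(predict, pose_2d):
    # pose_2d: (frames, 17, 2) normalized 2D keypoints
    flipped = pose_2d.copy()
    flipped[..., 0] *= -1                                        # mirror x
    flipped[:, joints_left + joints_right] = flipped[:, joints_right + joints_left]

    out = predict(pose_2d)                                       # (frames, 17, 3)
    out_flip = predict(flipped)
    out_flip[..., 0] *= -1                                       # undo the mirror
    out_flip[:, joints_left + joints_right] = out_flip[:, joints_right + joints_left]
    return (out + out_flip) / 2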
/figure/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/figure/messi_pose.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/figure/messi_pose.png
--------------------------------------------------------------------------------
/figure/structure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/figure/structure.png
--------------------------------------------------------------------------------
/figure/wild.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/figure/wild.png
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | import os
2 | import glob
3 | import torch
4 | import random
5 | import logging
6 | import numpy as np
7 | from tqdm import tqdm
8 | import torch.utils.data
9 | import torch.optim as optim
10 |
11 | from common.opt import opts
12 | from common.utils import *
13 | from common.load_data_hm36 import Fusion
14 | from common.h36m_dataset import Human36mDataset
15 | from model.GCN_conv import adj_mx_from_skeleton
16 | from model.trans import HTNet
17 |
18 | opt = opts().parse()
19 | os.environ["CUDA_VISIBLE_DEVICES"] = opt.gpu
20 | from tensorboardX import SummaryWriter
21 |
22 | writer = SummaryWriter(log_dir='./runs/' + opt.model_name)
23 |
24 |
25 | def train(opt, actions, train_loader, model, optimizer, epoch):
26 | return step('train', opt, actions, train_loader, model, optimizer, epoch)
27 |
28 | def val(opt, actions, val_loader, model):
29 | with torch.no_grad():
30 | return step('test', opt, actions, val_loader, model)
31 |
32 | def step(split, opt, actions, dataLoader, model, optimizer=None, epoch=None):
33 | loss_all = {'loss': AccumLoss()}
34 | action_error_sum = define_error_list(actions)
35 | if split == 'train':
36 | model.train()
37 | else:
38 | model.eval()
39 | for i, data in enumerate(tqdm(dataLoader, 0)):
40 | batch_cam, gt_3D, input_2D, action, subject, scale, bb_box, cam_ind = data
41 | [input_2D, gt_3D, batch_cam, scale, bb_box] = get_varialbe(split, [input_2D, gt_3D, batch_cam, scale, bb_box])
42 | if split =='train':
43 | output_3D = model(input_2D)
44 | else:
45 | input_2D, output_3D = input_augmentation(input_2D, model)
46 | out_target = gt_3D.clone()
47 | out_target[:, :, 0] = 0
48 | if split == 'train':
49 | loss = mpjpe_cal(output_3D, out_target)
50 | N = input_2D.size(0)
51 | loss_all['loss'].update(loss.detach().cpu().numpy() * N, N)
52 | optimizer.zero_grad()
53 | loss.backward()
54 | optimizer.step()
55 | elif split == 'test':
56 | output_3D = output_3D[:, opt.pad].unsqueeze(1)
57 | output_3D[:, :, 0, :] = 0
58 | action_error_sum = test_calculation(output_3D, out_target, action, action_error_sum, opt.dataset, subject)
59 | if split == 'train':
60 | return loss_all['loss'].avg
61 | elif split == 'test':
62 | p1, p2 = print_error(opt.dataset, action_error_sum, opt.train)
63 | return p1, p2
64 |
65 | def input_augmentation(input_2D, model):
66 | joints_left = [4, 5, 6, 11, 12, 13]
67 | joints_right = [1, 2, 3, 14, 15, 16]
68 | input_2D_non_flip = input_2D[:, 0]
69 | input_2D_flip = input_2D[:, 1]
70 | output_3D_non_flip = model(input_2D_non_flip)
71 | output_3D_flip = model(input_2D_flip)
72 | output_3D_flip[:, :, :, 0] *= -1
73 | output_3D_flip[:, :, joints_left + joints_right, :] = output_3D_flip[:, :, joints_right + joints_left, :]
74 | output_3D = (output_3D_non_flip + output_3D_flip) / 2
75 | input_2D = input_2D_non_flip
76 | return input_2D, output_3D
77 |
78 | if __name__ == '__main__':
79 | manualSeed = opt.seed
80 | random.seed(manualSeed)
81 | torch.manual_seed(manualSeed)
82 | np.random.seed(manualSeed)
83 | torch.cuda.manual_seed_all(manualSeed)
84 | torch.backends.cudnn.benchmark = False
85 | torch.backends.cudnn.deterministic = True
86 |
87 | print("lr: ", opt.lr)
88 | print("batch_size: ", opt.batch_size)
89 | print("channel: ", opt.channel)
90 | print("GPU: ", opt.gpu)
91 |
92 | if opt.train:
93 | logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%Y/%m/%d %H:%M:%S', \
94 | filename=os.path.join(opt.checkpoint, 'train.log'), level=logging.INFO)
95 |
96 | root_path = opt.root_path
97 | dataset_path = root_path + 'data_3d_' + opt.dataset + '.npz'
98 |
99 | dataset = Human36mDataset(dataset_path, opt)
100 | actions = define_actions(opt.actions)
101 | adj = adj_mx_from_skeleton(dataset.skeleton())
102 |
103 |
104 | if opt.train:
105 | train_data = Fusion(opt=opt, train=True, dataset=dataset, root_path=root_path)
106 | train_dataloader = torch.utils.data.DataLoader(train_data, batch_size=opt.batch_size,
107 | shuffle=True, num_workers=int(opt.workers), pin_memory=True)
108 |
109 | test_data = Fusion(opt=opt, train=False, dataset=dataset, root_path =root_path)
110 | test_dataloader = torch.utils.data.DataLoader(test_data, batch_size=opt.batch_size,
111 | shuffle=False, num_workers=int(opt.workers), pin_memory=True)
112 |
113 | model = HTNet(opt,adj).cuda()
114 |
115 | if opt.reload:
116 | model_dict = model.state_dict()
117 | model_path = sorted(glob.glob(os.path.join(opt.previous_dir, '*.pth')))[0]
118 | print(model_path)
119 | pre_dict = torch.load(model_path)
120 | pre_key = pre_dict.keys()
121 | for name, key in model_dict.items():
122 | model_dict[name] = pre_dict[name]
123 | model.load_state_dict(model_dict)
124 |
125 | model_params = 0
126 | for parameter in model.parameters():
127 | model_params += parameter.numel()
128 | print('INFO: Trainable parameter count:', model_params / 1000000)
129 |
130 |
131 | all_param = []
132 | lr = opt.lr
133 | all_param += list(model.parameters())
134 | optimizer = optim.Adam(all_param, lr=opt.lr, amsgrad=True)
135 | scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.317, patience=5, verbose=True)
136 |
137 | for epoch in range(1, opt.nepoch):
138 | if opt.train:
139 | loss = train(opt, actions, train_dataloader, model, optimizer, epoch)
140 | p1, p2 = val(opt, actions, test_dataloader, model)
141 | writer.add_scalar('mpjpe',p1,epoch)
142 |         writer.add_scalar('p2', p2, epoch)
143 |
144 | if opt.train and p1 < opt.previous_best_threshold:
145 | opt.previous_name = save_model(opt.previous_name, opt.checkpoint, epoch, p1, model)
146 | opt.previous_best_threshold = p1
147 | if opt.train == 0:
148 | print('p1: %.2f, p2: %.2f' % (p1, p2))
149 | break
150 | else:
151 | logging.info('epoch: %d, lr: %.7f, loss: %.4f, p1: %.2f, p2: %.2f' % (epoch, lr, loss, p1, p2))
152 | print('e: %d, lr: %.7f, loss: %.4f, p1: %.2f, p2: %.2f' % (epoch, lr, loss, p1, p2))
153 | if epoch % opt.large_decay_epoch == 0:
154 | for param_group in optimizer.param_groups:
155 | param_group['lr'] *= opt.lr_decay_large
156 | lr *= opt.lr_decay_large
157 | else:
158 | for param_group in optimizer.param_groups:
159 | param_group['lr'] *= opt.lr_decay
160 | lr *= opt.lr_decay
161 |
162 |
163 |
164 |
165 |
166 |
167 |
168 |
169 |
--------------------------------------------------------------------------------
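The learning-rate schedule at the end of the training loop multiplies the rate by opt.lr_decay every epoch, except every opt.large_decay_epoch-th epoch where opt.lr_decay_large is applied instead. A small sketch of the cumulative effect; the numeric defaults here are illustrative, not taken from common/opt.py:

def decayed_lr(base_lr, epoch, lr_decay=0.99, lr_decay_large=0.5, large_decay_epoch=5):
    # Reproduce the per-epoch multiplicative update applied to the optimizer's param groups.
    lr = base_lr
    for e in range(1, epoch + 1):
        lr *= lr_decay_large if e % large_decay_epoch == 0 else lr_decay
    return lr

# e.g. with the defaults above, decayed_lr(1e-3, 10) applies the large factor at epochs 5 and 10.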
/model/Block.py:
--------------------------------------------------------------------------------
1 | from functools import partial
2 | import torch
3 | import torch.nn as nn
4 |
5 | from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
6 | from timm.models.layers import DropPath
7 | from model.GCN_conv import ModulatedGraphConv
8 | from model.Transformer import Attention, Mlp
9 |
10 | #X_1
11 | rl_2joints = [2,3]
12 | ll_2joints = [5,6]
13 | la_2joints = [12,13]
14 | ra_2joints = [15,16]
15 | part_2joints = [rl_2joints,ll_2joints,la_2joints,ra_2joints]
16 | # X_2
17 | rl_3joints = [1,2,3]
18 | ll_3joints = [4,5,6]
19 | ra_3joints = [14,15,16]
20 | la_3joints = [11,12,13]
21 | part_3joints = [rl_3joints,ll_3joints,la_3joints,ra_3joints]
22 |
23 | class LJC(nn.Module):
24 | def __init__(self, adj, dim, drop_path=0., norm_layer=nn.LayerNorm):
25 | super().__init__()
26 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
27 | self.adj = adj
28 | self.norm_gcn1 = norm_layer(dim)
29 | self.gcn1 = ModulatedGraphConv(dim,384,self.adj)
30 | self.gelu = nn.GELU()
31 | self.gcn2 = ModulatedGraphConv(384,dim,self.adj)
32 | self.norm_gcn2 = norm_layer(dim)
33 |
34 | def forward(self, x_gcn):
35 | x_gcn = x_gcn + self.drop_path(self.norm_gcn2(self.gcn2(self.gelu(self.gcn1(self.norm_gcn1(x_gcn))))))
36 | return x_gcn
37 |
38 |
39 | class IPC(nn.Module):
40 | def __init__(self, dim, mlp_hidden_dim, drop=0., drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm):
41 | super().__init__()
42 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
43 | self.index_1 = [1,2,3, 4,5,6, 11,12,13, 14,15,16] # 6parts
44 | self.index_2 = [2,3, 5,6, 12,13, 15,16]
45 | self.gelu = nn.GELU()
46 | self.norm_conv1 = norm_layer(dim)
47 | self.conv1 = nn.Conv1d(dim,dim, kernel_size=3, padding=0, stride=3)
48 | self.norm_conv1_mlp = norm_layer(dim)
49 | self.mlp_down_1 = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
50 | self.norm_conv2 = norm_layer(dim)
51 | self.conv2 = nn.Conv1d(dim,dim, kernel_size=2, padding=0, stride=2)
52 | self.norm_conv2_mlp = norm_layer(dim)
53 | self.mlp_down_2 = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
54 |
55 |
56 | def forward(self, x_gcn, x_conv):
57 | x_conv = x_conv + x_gcn
58 |
59 | #NOTE:Conv_1 3 joints per limb
60 | x_conv_1 = self.norm_conv1(x_conv)
61 | x_conv_1 = x_conv_1.permute(0,2,1)
62 | x_pooling_1 = x_conv_1[:, :, self.index_1]
63 | x_pooling_1 = self.drop_path(self.gelu(self.conv1(x_pooling_1)))
64 |
65 | x_pooling_1 = x_pooling_1.permute(0,2,1)
66 | x_pooling_1 = x_pooling_1 + self.drop_path(self.mlp_down_1(self.norm_conv1_mlp(x_pooling_1)))
67 | x_pooling_1 = x_pooling_1.permute(0,2,1)
68 | for i in range(len(part_3joints)):
69 | num_joints = len(part_3joints[i]) - 1
70 | x_conv_1[:,:,part_3joints[i][1:]] = x_pooling_1[:,:,i].unsqueeze(-1).repeat(1,1,num_joints)
71 | x_conv_1 = x_conv_1.permute(0,2,1)
72 |
73 | #NOTE:Conv_2 2 joints per limb
74 | x_conv_2 = self.norm_conv2(x_conv)
75 | x_conv_2 = x_conv_2.permute(0,2,1)
76 | x_pooling_2 = x_conv_2[:, :, self.index_2]
77 | x_pooling_2 = self.drop_path(self.gelu(self.conv2(x_pooling_2)))
78 |
79 | x_pooling_2 = x_pooling_2.permute(0,2,1)
80 | x_pooling_2 = x_pooling_2 + self.drop_path(self.mlp_down_2(self.norm_conv2_mlp(x_pooling_2)))
81 | x_pooling_2 = x_pooling_2.permute(0,2,1)
82 | for i in range(len(part_2joints)):
83 | num_joints = len(part_2joints[i]) - 1
84 | x_conv_2[:,:,part_2joints[i][1:]] = x_pooling_2[:,:,i].unsqueeze(-1).repeat(1,1,num_joints)
85 | x_conv_2 = x_conv_2.permute(0,2,1)
86 |
87 | x_conv = x_conv_1 + x_conv_2 + x_conv
88 | return x_conv
89 |
90 |
91 | class GBI(nn.Module):
92 | def __init__(self, dim, num_heads, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
93 | drop_path=0., norm_layer=nn.LayerNorm, length=1):
94 | super().__init__()
95 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
96 | self.norm_attn = norm_layer(dim)
97 | self.attn = Attention(dim, num_heads=num_heads, qkv_bias=qkv_bias, \
98 | qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop, length=length)
99 |
100 | def forward(self, x_conv, x_attn):
101 | x_attn = x_attn + x_conv
102 | x_attn = x_attn + self.drop_path(self.attn(self.norm_attn(x_attn)))
103 | return x_attn
104 |
105 |
106 |
107 |
108 |
109 | class Hiremixer(nn.Module):
110 | def __init__(self, adj, depth=8, embed_dim=512, mlp_hidden_dim=1024, h=8, drop_rate=0.1, length=9):
111 | super().__init__()
112 | drop_path_rate = 0.3
113 | attn_drop_rate = 0.
114 | qkv_bias = True
115 | qk_scale = None
116 | norm_layer = partial(nn.LayerNorm, eps=1e-6)
117 | # Stochastic depth decay rule
118 | dpr = [x.item() for x in torch.linspace(0.1, drop_path_rate, depth)]
119 | self.blocks = nn.ModuleList([
120 | Block(
121 | adj, dim=embed_dim, num_heads=h, mlp_hidden_dim=mlp_hidden_dim, qkv_bias=qkv_bias, qk_scale=qk_scale,
122 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer, length=length)
123 | for i in range(depth)])
124 | self.Temporal_norm = norm_layer(embed_dim)
125 |
126 | def forward(self, x):
127 | for blk in self.blocks:
128 | x = blk(x)
129 | x = self.Temporal_norm(x)
130 | return x
131 |
132 |
133 | class Block(nn.Module):
134 | def __init__(self, adj, dim, num_heads, mlp_hidden_dim, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
135 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, length=1):
136 | super().__init__()
137 |
138 | dim = int(dim/3)
139 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
140 |
141 | # Three sub-modules
142 | self.lgc = LJC(adj, dim, drop_path=drop_path, norm_layer=nn.LayerNorm)
143 | self.ipc = IPC(dim, mlp_hidden_dim, drop=drop, drop_path=drop_path, act_layer=nn.GELU, norm_layer=nn.LayerNorm)
144 | self.gbi = GBI(dim, num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, drop=0.1, attn_drop=attn_drop,
145 | drop_path=drop_path, norm_layer=nn.LayerNorm, length=length)
146 |
147 | self.norm_mlp = norm_layer(dim*3)
148 | self.mlp = Mlp(in_features=dim*3, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
149 |
150 |
151 |
152 | def forward(self, x):
153 | x_split = torch.chunk(x,3,-1)
154 | x_lgc, x_ipc, x_gbi = x_split
155 | # Local Joint-level Connection (LJC)
156 | x_lgc = self.lgc(x_lgc)
157 | # Inter-Part Constraint (IPC)
158 | x_ipc = self.ipc(x_lgc, x_ipc)
159 | # Global body-level Interaction (GBI)
160 | x_gbi = self.gbi(x_ipc, x_gbi)
161 | x_cat = torch.cat([x_lgc,x_ipc,x_gbi], -1)
162 | x = x_cat + self.drop_path(self.mlp(self.norm_mlp(x_cat)))
163 | return x
164 |
165 |
166 |
167 | class Hiremixer_frame(nn.Module):
168 | def __init__(self, adj, depth=8, embed_dim=512, mlp_hidden_dim=1024, h=8, drop_rate=0.1, length=9):
169 | super().__init__()
170 | drop_path_rate = 0.3
171 | attn_drop_rate = 0.
172 | qkv_bias = True
173 | qk_scale = None
174 | norm_layer = partial(nn.LayerNorm, eps=1e-6)
175 | # Stochastic depth decay rule
176 | dpr = [x.item() for x in torch.linspace(0.1, drop_path_rate, depth)]
177 | self.blocks = nn.ModuleList([
178 | Block_frame(
179 | adj, dim=embed_dim, num_heads=h, mlp_hidden_dim=mlp_hidden_dim, qkv_bias=qkv_bias, qk_scale=qk_scale,
180 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer, length=length)
181 | for i in range(depth)])
182 | self.Temporal_norm = norm_layer(embed_dim)
183 |
184 | def forward(self, x):
185 | for blk in self.blocks:
186 | x = blk(x)
187 | x = self.Temporal_norm(x)
188 | return x
189 |
190 |
191 | class Block_frame(nn.Module):
192 | def __init__(self, adj, dim, num_heads, mlp_hidden_dim, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
193 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, length=1):
194 | super().__init__()
195 |
196 | dim = int(dim/2)
197 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
198 |
199 |         # Two sub-modules
200 | self.lgc = LJC(adj, dim, drop_path=drop_path, norm_layer=nn.LayerNorm)
201 | self.gbi = GBI(dim, num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, drop=0.1, attn_drop=attn_drop,
202 | drop_path=drop_path, norm_layer=nn.LayerNorm, length=length)
203 |
204 | self.norm_mlp = norm_layer(dim*2)
205 | self.mlp = Mlp(in_features=dim*2, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
206 |
207 |
208 |
209 | def forward(self, x):
210 | x_split = torch.chunk(x,2,-1)
211 | x_lgc, x_gbi = x_split
212 | # Local Joint-level Connection (LJC)
213 | x_lgc = self.lgc(x_lgc)
214 | # Global body-level Interaction (GBI)
215 | x_gbi = self.gbi(x_lgc, x_gbi)
216 | x_cat = torch.cat([x_lgc,x_gbi], -1)
217 | x = x_cat + self.drop_path(self.mlp(self.norm_mlp(x_cat)))
218 | return x
219 |
220 |
221 |
222 | class Block_ipc(nn.Module):
223 | def __init__(self, adj, dim, num_heads, mlp_hidden_dim, qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
224 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, length=1):
225 | super().__init__()
226 |
227 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
228 |
229 |         # Single IPC sub-module
230 | self.ipc = IPC(dim, mlp_hidden_dim, drop=drop, drop_path=drop_path, act_layer=nn.GELU, norm_layer=nn.LayerNorm)
231 |
232 | self.norm_mlp = norm_layer(dim)
233 | self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
234 |
235 |
236 |
237 | def forward(self, x):
238 | x = x + self.ipc(x)
239 | x = x + self.drop_path(self.mlp(self.norm_mlp(x)))
240 | return x
--------------------------------------------------------------------------------
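To make the pooling inside IPC concrete: the Conv1d with kernel_size = stride = 3 collapses the 12 limb joints gathered by index_1 into a single feature per limb (right leg, left leg, left arm, right arm), which is then broadcast back onto that limb's joints; conv2 does the same for the 2-joint groups. A quick shape check, with a small channel width used purely for illustration:

import torch
import torch.nn as nn

dim = 8                                                  # illustrative channel width
index_1 = [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]     # 4 limbs x 3 joints
conv1 = nn.Conv1d(dim, dim, kernel_size=3, stride=3)     # one output per limb

x = torch.randn(2, 17, dim)                              # (batch, joints, channels)
limbs = x.permute(0, 2, 1)[:, :, index_1]                # (2, dim, 12)
pooled = conv1(limbs)                                    # (2, dim, 4): rl, ll, la, ra
print(pooled.shape)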
/model/GCN_conv.py:
--------------------------------------------------------------------------------
1 | from __future__ import absolute_import, division
2 |
3 | import math
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | import numpy as np
8 |
9 | import scipy.sparse as sp
10 |
11 |
12 | class _NonLocalBlockND(nn.Module):
13 | def __init__(self, in_channels, inter_channels=None, dimension=3, sub_sample=True, bn_layer=True):
14 | super(_NonLocalBlockND, self).__init__()
15 |
16 | assert dimension in [1, 2, 3]
17 |
18 | self.dimension = dimension
19 | self.sub_sample = sub_sample
20 |
21 | self.in_channels = in_channels
22 | self.inter_channels = inter_channels
23 |
24 | if self.inter_channels is None:
25 | self.inter_channels = in_channels // 2
26 | if self.inter_channels == 0:
27 | self.inter_channels = 1
28 |
29 | if dimension == 3:
30 | conv_nd = nn.Conv3d
31 | max_pool_layer = nn.MaxPool3d(kernel_size=(1, 2, 2))
32 | bn = nn.BatchNorm3d
33 | elif dimension == 2:
34 | conv_nd = nn.Conv2d
35 | max_pool_layer = nn.MaxPool2d(kernel_size=(2, 2))
36 | bn = nn.BatchNorm2d
37 | else:
38 | conv_nd = nn.Conv1d
39 | max_pool_layer = nn.MaxPool1d(kernel_size=(2))
40 | bn = nn.BatchNorm1d
41 |
42 | self.g = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
43 | kernel_size=1, stride=1, padding=0)
44 |
45 | if bn_layer:
46 | self.W = nn.Sequential(
47 | conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels,
48 | kernel_size=1, stride=1, padding=0),
49 | bn(self.in_channels)
50 | )
51 | nn.init.constant_(self.W[1].weight, 0)
52 | nn.init.constant_(self.W[1].bias, 0)
53 | else:
54 | self.W = conv_nd(in_channels=self.inter_channels, out_channels=self.in_channels,
55 | kernel_size=1, stride=1, padding=0)
56 | nn.init.constant_(self.W.weight, 0)
57 | nn.init.constant_(self.W.bias, 0)
58 |
59 | self.theta = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
60 | kernel_size=1, stride=1, padding=0)
61 | self.phi = conv_nd(in_channels=self.in_channels, out_channels=self.inter_channels,
62 | kernel_size=1, stride=1, padding=0)
63 |
64 | if sub_sample:
65 |             self.g = nn.Sequential(self.g, max_pool_layer)  # 384 -> 192
66 | self.phi = nn.Sequential(self.phi, max_pool_layer)
67 |
68 | def forward(self, x):
69 | '''
70 | :param x: (b, c, t, h, w)
71 | :return:
72 | '''
73 |
74 | batch_size = x.size(0)#torch.Size([256, 384, 1, 17])
75 |
76 | g_x = self.g(x).view(batch_size, self.inter_channels, -1)#256,192,17
77 | g_x = g_x.permute(0, 2, 1)#torch.Size([256, 17, 192])
78 |
79 | theta_x = self.theta(x).view(batch_size, self.inter_channels, -1)
80 | theta_x = theta_x.permute(0, 2, 1)#torch.Size([256, 17, 192])
81 | phi_x = self.phi(x).view(batch_size, self.inter_channels, -1)#torch.Size([256, 192, 17])
82 | f = torch.matmul(theta_x, phi_x)#torch.Size([256, 17, 17])
83 |         f_div_C = F.softmax(f, dim=-1)  # torch.Size([256, 17, 17]); softmax over the last dimension
84 |
85 |         y = torch.matmul(f_div_C, g_x)  # essentially an attention-weighted mixing (AMIX-like) step
86 | y = y.permute(0, 2, 1).contiguous()#torch.Size([256, 17, 192])
87 | y = y.view(batch_size, self.inter_channels, *x.size()[2:])#torch.Size([256, 192, 1, 17])
88 | W_y = self.W(y)
89 |         z = W_y + x  # torch.Size([256, 384, 1, 17]); residual add (AMIX)
90 |
91 | return z
92 |
93 |
94 |
95 |
96 | class NONLocalBlock2D(_NonLocalBlockND):
97 | def __init__(self, in_channels, inter_channels=None, sub_sample=True, bn_layer=True):
98 | super(NONLocalBlock2D, self).__init__(in_channels,
99 | inter_channels=inter_channels,
100 | dimension=2, sub_sample=sub_sample,
101 | bn_layer=bn_layer)
102 |
103 |
104 | class ModulatedGraphConv(nn.Module):
105 | """
106 |     Modulated graph convolution layer
107 | """
108 |
109 | def __init__(self, in_features, out_features, adj, bias=True):
110 | super(ModulatedGraphConv, self).__init__()
111 | self.in_features = in_features
112 | self.out_features = out_features
113 |
114 | self.W = nn.Parameter(torch.zeros(size=(2, in_features, out_features), dtype=torch.float)) #torch.Size([2,2, 384])
115 | nn.init.xavier_uniform_(self.W.data, gain=1.414)
116 |
117 |         self.M = nn.Parameter(torch.zeros(size=(adj.size(0), out_features), dtype=torch.float))  # shape (17, 384)
118 | nn.init.xavier_uniform_(self.M.data, gain=1.414)
119 |
120 | self.adj = adj
121 |
122 | self.adj2 = nn.Parameter(torch.ones_like(adj))
123 | nn.init.constant_(self.adj2, 1e-6)
124 |
125 | if bias:
126 | self.bias = nn.Parameter(torch.zeros(out_features, dtype=torch.float))
127 | stdv = 1. / math.sqrt(self.W.size(2))
128 | self.bias.data.uniform_(-stdv, stdv)
129 | else:
130 | self.register_parameter('bias', None)
131 |
132 | def forward(self, input):
133 | h0 = torch.matmul(input, self.W[0]) #input 256,17,2 -> 256,17,384
134 | h1 = torch.matmul(input, self.W[1])
135 |
136 | adj = self.adj.to(input.device) + self.adj2.to(input.device)
137 | adj = (adj.T + adj)/2
138 |         E = torch.eye(adj.size(0), dtype=torch.float).to(input.device)  # 17x17 identity matrix I
139 |
140 |         output = torch.matmul(adj * E, self.M*h0) + torch.matmul(adj * (1 - E), self.M*h1)  # first term: self (diagonal) connections; second term: neighbouring joints
141 | if self.bias is not None:
142 |             return output + self.bias.view(1, 1, -1)  # torch.Size([256, 17, 384]); bias added to every joint
143 | else:
144 | return output
145 |
146 | def __repr__(self):
147 | return self.__class__.__name__ + ' (' + str(self.in_features) + ' -> ' + str(self.out_features) + ')'
148 |
149 |
150 |
151 | def normalize(mx):
152 | """Row-normalize sparse matrix"""
153 | rowsum = np.array(mx.sum(1))
154 | r_inv = np.power(rowsum, -1).flatten()
155 | r_inv[np.isinf(r_inv)] = 0.
156 | r_mat_inv = sp.diags(r_inv)
157 | mx = r_mat_inv.dot(mx)
158 | return mx
159 |
160 | def sparse_mx_to_torch_sparse_tensor(sparse_mx):
161 | """Convert a scipy sparse matrix to a torch sparse tensor."""
162 | sparse_mx = sparse_mx.tocoo().astype(np.float32)
163 | indices = torch.from_numpy(np.vstack((sparse_mx.row, sparse_mx.col)).astype(np.int64))
164 | values = torch.from_numpy(sparse_mx.data)
165 | shape = torch.Size(sparse_mx.shape)
166 | return torch.sparse.FloatTensor(indices, values, shape)
167 |
168 |
169 | def adj_mx_from_edges(num_pts, edges, sparse=True):
170 | edges = np.array(edges, dtype=np.int32)
171 | data, i, j = np.ones(edges.shape[0]), edges[:, 0], edges[:, 1]
172 | adj_mx = sp.coo_matrix((data, (i, j)), shape=(num_pts, num_pts), dtype=np.float32)
173 |
174 | # build symmetric adjacency matrix
175 | adj_mx = adj_mx + adj_mx.T.multiply(adj_mx.T > adj_mx) - adj_mx.multiply(adj_mx.T > adj_mx)
176 | adj_mx = normalize(adj_mx) #+ sp.eye(adj_mx.shape[0]))
177 | if sparse:
178 | adj_mx = sparse_mx_to_torch_sparse_tensor(adj_mx)
179 | else:
180 | adj_mx = torch.tensor(adj_mx.todense(), dtype=torch.float)
181 |
182 | adj_mx = adj_mx * (1-torch.eye(adj_mx.shape[0])) + torch.eye(adj_mx.shape[0])
183 | return adj_mx
184 |
185 |
186 | def adj_mx_from_skeleton(skeleton):
187 | num_joints = skeleton.num_joints()
188 | edges = list(filter(lambda x: x[1] >= 0, zip(list(range(0, num_joints)), skeleton.parents())))
189 | return adj_mx_from_edges(num_joints, edges, sparse=False)
190 |
--------------------------------------------------------------------------------
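adj_mx_from_skeleton builds a symmetric, row-normalised adjacency matrix with unit self-loops from (joint, parent) pairs. The NumPy sketch below mirrors adj_mx_from_edges for a tiny 4-joint chain so the normalisation and the forced unit diagonal are easy to inspect; it is a re-implementation for illustration, not a call into the repository code.

import numpy as np

def adjacency(num_pts, edges):
    A = np.zeros((num_pts, num_pts), dtype=np.float32)
    for i, j in edges:
        A[i, j] = 1.0
    A = np.maximum(A, A.T)                                 # symmetrise
    rowsum = A.sum(1, keepdims=True)
    rowsum[rowsum == 0] = 1.0                              # guard empty rows
    A = A / rowsum                                         # row-normalise
    return A * (1 - np.eye(num_pts)) + np.eye(num_pts)     # unit self-loops

# 4-joint chain 0-1-2-3, expressed as (joint, parent) pairs as in adj_mx_from_skeleton
print(adjacency(4, [(1, 0), (2, 1), (3, 2)]))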
/model/Transformer.py:
--------------------------------------------------------------------------------
1 | import torch.nn as nn
2 | from timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
3 | import torch
4 |
5 |
6 | class Mlp(nn.Module):
7 | def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
8 | super().__init__()
9 | out_features = out_features or in_features
10 | hidden_features = hidden_features or in_features
11 | self.fc1 = nn.Linear(in_features, hidden_features)
12 | self.act = act_layer()
13 | self.fc2 = nn.Linear(hidden_features, out_features)
14 | self.drop = nn.Dropout(drop)
15 |
16 | def forward(self, x):
17 | x = self.fc1(x)
18 | x = self.act(x)
19 | x = self.drop(x)
20 | x = self.fc2(x)
21 | x = self.drop(x)
22 | return x
23 |
24 | class Attention(nn.Module):
25 | def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., length=27):
26 | super().__init__()
27 |
28 | self.num_heads = num_heads
29 | head_dim = torch.div(dim, num_heads)
30 | self.scale = qk_scale or head_dim ** -0.5
31 | self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
32 | self.attn_drop = nn.Dropout(attn_drop)
33 | self.proj = nn.Linear(dim, dim)
34 | self.proj_drop = nn.Dropout(proj_drop)
35 |
36 | def forward(self, x):
37 | B, N, C = x.shape
38 | qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, torch.div(C, self.num_heads, rounding_mode='floor')).permute(2, 0, 3, 1, 4)
39 | q, k, v = qkv[0], qkv[1], qkv[2]
40 |
41 | attn = (q @ k.transpose(-2, -1)) * self.scale
42 | attn = attn.softmax(dim=-1)
43 |
44 | attn = self.attn_drop(attn)
45 |
46 | x = (attn @ v).transpose(1, 2).reshape(B, N, C)
47 | x = self.proj(x)
48 | x = self.proj_drop(x)
49 | return x
--------------------------------------------------------------------------------
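A quick shape check for the Attention module above, which GBI instantiates with the per-branch width channel/3; the dim and num_heads values here are illustrative only:

import torch
from model.Transformer import Attention

attn = Attention(dim=96, num_heads=8, qkv_bias=True, length=1)
x = torch.randn(4, 17, 96)        # (batch, joints, channels)
y = attn(x)                       # multi-head self-attention over the 17 joints
print(y.shape)                    # torch.Size([4, 17, 96])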
/model/__pycache__/Block.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/Block.cpython-39.pyc
--------------------------------------------------------------------------------
/model/__pycache__/GCN_conv.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/GCN_conv.cpython-39.pyc
--------------------------------------------------------------------------------
/model/__pycache__/Transformer.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/Transformer.cpython-39.pyc
--------------------------------------------------------------------------------
/model/__pycache__/trans.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vefalun/HTNet/8c5f9f3f0a24af33d6d66ecc4b64755acd525853/model/__pycache__/trans.cpython-39.pyc
--------------------------------------------------------------------------------
/model/post_refine.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 | from torch.autograd import Variable
5 |
6 |
7 |
8 | inter_channels = [128, 256]
9 | fc_out = inter_channels[1]
10 | fc_unit = 1024
11 | class post_refine(nn.Module):
12 |
13 |
14 | def __init__(self, opt):
15 | super().__init__()
16 |
17 | out_seqlen = 1
18 | fc_in = opt.out_channels*2*out_seqlen*opt.n_joints
19 |
20 | fc_out = opt.in_channels * opt.n_joints
21 | self.post_refine = nn.Sequential(
22 | nn.Linear(fc_in, fc_unit),
23 | nn.ReLU(),
24 | nn.Dropout(0.5,inplace=True),
25 | nn.Linear(fc_unit, fc_out),
26 | nn.Sigmoid()
27 |
28 | )
29 |
30 |
31 | def forward(self, x, x_1):
32 | """
33 |
34 | :param x: N*T*V*3
35 | :param x_1: N*T*V*2
36 | :return:
37 | """
38 | # data normalization
39 | N, T, V,_ = x.size()
40 | x_in = torch.cat((x, x_1), -1) #N*T*V*5
41 | x_in = x_in.view(N, -1)
42 |
43 |
44 |
45 | score = self.post_refine(x_in).view(N,T,V,2)
46 | score_cm = Variable(torch.ones(score.size()), requires_grad=False).cuda() - score
47 | x_out = x.clone()
48 | x_out[:, :, :, :2] = score * x[:, :, :, :2] + score_cm * x_1[:, :, :, :2]
49 |
50 | return x_out
--------------------------------------------------------------------------------
/model/refine.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | from torch.autograd import Variable
4 |
5 | fc_out = 256
6 | fc_unit = 1024
7 |
8 | class refine(nn.Module):
9 | def __init__(self, opt):
10 | super().__init__()
11 |
12 | out_seqlen = 1
13 | fc_in = opt.out_channels*2*out_seqlen*opt.n_joints
14 | fc_out = opt.in_channels * opt.n_joints
15 |
16 | self.post_refine = nn.Sequential(
17 | nn.Linear(fc_in, fc_unit),
18 | nn.ReLU(inplace =False),
19 | nn.Dropout(0.5,inplace =False),
20 | nn.Linear(fc_unit, fc_out),
21 | nn.Sigmoid()
22 | )
23 |
24 | def forward(self, x, x_1):
25 | N, T, V,_ = x.size()#256,1,17,3
26 | x_in = torch.cat((x, x_1), -1) #torch.Size([256, 1, 17, 6])
27 | x_in = x_in.view(N, -1) #torch.Size([256, 102])
28 |
29 | score = self.post_refine(x_in).view(N,T,V,2) #torch.Size([256, 1, 17, 2])
30 | score_cm = Variable(torch.ones(score.size()), requires_grad=False).cuda() - score
31 | x_out = x.clone()
32 | x_out[:, :, :, :2] = score * x[:, :, :, :2] + score_cm * x_1[:, :, :, :2]#torch.Size([256, 1, 17, 3])
33 |
34 | return x_out
35 |
36 |
37 |
--------------------------------------------------------------------------------
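Both refine and post_refine learn a per-coordinate gate: a sigmoid score in [0, 1] that blends the x/y of the predicted 3D pose with the original 2D keypoints. The standalone sketch below reproduces only that blending step, with a random score standing in for the MLP output; shapes follow the comments in refine.py.

import torch

N, T, V = 256, 1, 17
x = torch.randn(N, T, V, 3)        # predicted 3D pose
x_1 = torch.randn(N, T, V, 2)      # input 2D keypoints
score = torch.rand(N, T, V, 2)     # stand-in for the sigmoid output of the refinement MLP

x_out = x.clone()
x_out[..., :2] = score * x[..., :2] + (1 - score) * x_1    # gated blend of the x/y coordinates
print(x_out.shape)                 # torch.Size([256, 1, 17, 3])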
/model/trans.py:
--------------------------------------------------------------------------------
1 | import sys
2 | from einops.einops import rearrange
3 | sys.path.append("..")
4 | import torch
5 | import torch.nn as nn
6 | from model.Block import Hiremixer
7 | from common.opt import opts
8 | opt = opts().parse()
9 |
10 |
11 |
12 | class HTNet(nn.Module):
13 | def __init__(self, args, adj):
14 | super().__init__()
15 |
16 | if args == -1:
17 | layers, channel, d_hid, length = 3, 512, 1024, 27
18 | self.num_joints_in, self.num_joints_out = 17, 17
19 | else:
20 | layers, channel, d_hid, length = args.layers, args.channel, args.d_hid, args.frames
21 | self.num_joints_in, self.num_joints_out = args.n_joints, args.out_joints
22 |
23 | self.patch_embed = nn.Linear(2, channel)
24 | self.pos_embed = nn.Parameter(torch.zeros(1, self.num_joints_in, channel))
25 | self.Hiremixer = Hiremixer(adj, layers, channel, d_hid, length=length)
26 |         self.fcn = nn.Linear(channel, 3)  # use the resolved channel so the args == -1 default branch also works
27 |
28 | def forward(self, x):
29 | x = rearrange(x, 'b f j c -> (b f) j c').contiguous()
30 | x = self.patch_embed(x)
31 | x = x + self.pos_embed
32 | x = self.Hiremixer(x)
33 | x = self.fcn(x)
34 | x = x.view(x.shape[0], -1, self.num_joints_out, x.shape[2])
35 | return x
36 |
37 |
38 |
--------------------------------------------------------------------------------
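Putting it together, HTNet maps a batch of 2D poses of shape (batch, frames, 17, 2) to root-relative 3D poses of shape (batch, 1, 17, 3). A minimal forward pass following the construction in main.py and demo/vis.py; it assumes the Human3.6M .npz file is available under opt.root_path and that a GPU is present, as elsewhere in the repository:

import torch
from common.opt import opts
from common.h36m_dataset import Human36mDataset
from model.GCN_conv import adj_mx_from_skeleton
from model.trans import HTNet

opt = opts().parse()
dataset = Human36mDataset(opt.root_path + 'data_3d_' + opt.dataset + '.npz', opt)
adj = adj_mx_from_skeleton(dataset.skeleton())

model = HTNet(opt, adj).cuda().eval()
x = torch.randn(1, 1, 17, 2).cuda()       # (batch, frame, joints, xy)
with torch.no_grad():
    y = model(x)                          # (1, 1, 17, 3) root-relative 3D pose
print(y.shape)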
/requirement.txt:
--------------------------------------------------------------------------------
1 | einops
2 | timm
3 | tensorboardX
4 | scipy
5 | filterpy
6 | tqdm
--------------------------------------------------------------------------------
/runs/README.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------