├── LICENSE ├── README.md ├── demo_img ├── cxk.jpg ├── cxk.mp4 ├── epoch0_step1500_i_1.jpg ├── epoch0_step200_i_1.jpg ├── epoch0_step500_i_0.jpg ├── epoch0_step900_i_1.jpg ├── epoch1_step1500_i_1.jpg ├── epoch1_step200_i_1.jpg ├── epoch1_step500_i_0.jpg ├── epoch1_step900_i_1.jpg ├── epoch2_step1500_i_1.jpg ├── epoch2_step200_i_1.jpg ├── epoch2_step500_i_0.jpg ├── epoch2_step900_i_1.jpg ├── epoch3_step1500_i_1.jpg ├── epoch3_step200_i_1.jpg ├── epoch3_step500_i_0.jpg ├── epoch3_step900_i_1.jpg └── result.jpg ├── src ├── __pycache__ │ ├── dataset.cpython-36.pyc │ ├── heatmap.cpython-36.pyc │ ├── hrnet.cpython-36.pyc │ └── utils.cpython-36.pyc ├── dataset.py ├── evaluate.py ├── heatmap.py ├── hrnet.py ├── temp.py ├── test.py ├── train.py └── utils.py └── test_img ├── step11_i_0.jpg └── step136_i_0.jpg /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 VXallset 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # deep-high-resolution-net.TensorFlow 2 | A TensorFlow implementation of HRNet-32.The dataset used to train the model is the AI Challenger dataset. 3 | 4 | Just for fun! A **'famous' actor** CXK in China and the keypoints estimated using the HRNet-32. 5 |
6 | 7 | For more details, please refer to the [paper](https://arxiv.org/abs/1902.09212) and the [dataset](https://challenger.ai/competition/keypoint). 8 | 9 | # Environment 10 | - Python 3.6 or higher 11 | - TensorFlow 1.11 or higher 12 | - PyCharm 13 | 14 | # How to Use 15 | ### For Training 16 | - Download the AI Challenger dataset. 17 | - Convert the images in the AI Challenger dataset (train_images folder) to TFRecords by running dataset.py. Please make sure that the **dataset_root_path** you use in the **extract_people_from_dataset()** function is the path of the AI Challenger dataset you saved in the previous step. 18 | - Run train.py! 19 | 20 | Please note that the structure of the HRNet is complicated. I trained the HRNet-32 network using 2 Nvidia Titan V graphics cards. Due to the limited graphics memory (16 GB), the maximum batch size I could use was 2, and it took around 30 hours to finish 1 epoch (189176 steps). The model files were uploaded to [Google Drive](https://drive.google.com/drive/folders/13ll_UyKLW31ozasChqzB_91sWEE4I2PZ?usp=sharing) and [Baidu Cloud](https://pan.baidu.com/s/1bTmiP3MxxC17pF1S4pDpWQ) (Extraction code: 7hym). 21 | 22 | ### For Testing 23 | - Finish the steps in the training section above. 24 | - Make sure the dataset name and model file name are correct. 25 | - Run test.py! 26 | 27 | The result images will be saved in the _test_img_ folder. It will also generate the distances.npy and classes.npy files, which will be used to calculate the AP50 and AP75 later. 28 | 29 | ### For Evaluating 30 | - Run evaluate.py. 31 | 32 | It will print the AP50 and AP75 values in the command line. 33 | 34 | ### For Debugging 35 | If you encounter any problems, please try to run the _temp.py_ file to see whether it works properly. It is a simple demo file that predicts the human pose in the cxk.mp4 file. Compared to the other scripts, this one is easier to debug. 36 | 37 | # What You Will See 38 | ### For Training 39 | - The loss information. 40 | - Example images predicted by the network, which will be saved into the _./demo_img/_ folder. 41 | 42 | Epoch Number | example image 1 | example image 2 | example image 3 | example image 4 43 | :-: | :-: | :-: | :-: | :-: 44 | epoch 0|<div
|
|
|
| 45 | epoch 1|
|
|
|
| 46 | epoch 2|
|
|
|
| 47 | epoch 3|
|
|
|
| 48 | 49 | ### For Testing 50 | - The resulting test images will be saved into the _./test_img/_ folder. 51 |<div
52 | 53 | # For More 54 | Contact me: vxallset@outlook.com 55 | -------------------------------------------------------------------------------- /demo_img/cxk.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/cxk.jpg -------------------------------------------------------------------------------- /demo_img/cxk.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/cxk.mp4 -------------------------------------------------------------------------------- /demo_img/epoch0_step1500_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch0_step1500_i_1.jpg -------------------------------------------------------------------------------- /demo_img/epoch0_step200_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch0_step200_i_1.jpg -------------------------------------------------------------------------------- /demo_img/epoch0_step500_i_0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch0_step500_i_0.jpg -------------------------------------------------------------------------------- /demo_img/epoch0_step900_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch0_step900_i_1.jpg -------------------------------------------------------------------------------- /demo_img/epoch1_step1500_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch1_step1500_i_1.jpg -------------------------------------------------------------------------------- /demo_img/epoch1_step200_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch1_step200_i_1.jpg -------------------------------------------------------------------------------- /demo_img/epoch1_step500_i_0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch1_step500_i_0.jpg -------------------------------------------------------------------------------- /demo_img/epoch1_step900_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch1_step900_i_1.jpg 
-------------------------------------------------------------------------------- /demo_img/epoch2_step1500_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch2_step1500_i_1.jpg -------------------------------------------------------------------------------- /demo_img/epoch2_step200_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch2_step200_i_1.jpg -------------------------------------------------------------------------------- /demo_img/epoch2_step500_i_0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch2_step500_i_0.jpg -------------------------------------------------------------------------------- /demo_img/epoch2_step900_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch2_step900_i_1.jpg -------------------------------------------------------------------------------- /demo_img/epoch3_step1500_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch3_step1500_i_1.jpg -------------------------------------------------------------------------------- /demo_img/epoch3_step200_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch3_step200_i_1.jpg -------------------------------------------------------------------------------- /demo_img/epoch3_step500_i_0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch3_step500_i_0.jpg -------------------------------------------------------------------------------- /demo_img/epoch3_step900_i_1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/epoch3_step900_i_1.jpg -------------------------------------------------------------------------------- /demo_img/result.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/demo_img/result.jpg -------------------------------------------------------------------------------- /src/__pycache__/dataset.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/src/__pycache__/dataset.cpython-36.pyc -------------------------------------------------------------------------------- 
/src/__pycache__/heatmap.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/src/__pycache__/heatmap.cpython-36.pyc -------------------------------------------------------------------------------- /src/__pycache__/hrnet.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/src/__pycache__/hrnet.cpython-36.pyc -------------------------------------------------------------------------------- /src/__pycache__/utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/src/__pycache__/utils.cpython-36.pyc -------------------------------------------------------------------------------- /src/dataset.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file is used to generate TFRecords using the AI Challenger dataset. 3 | 4 | @ Author: Yu Sun. vxallset@outlook.com 5 | 6 | @ Date created: Jun 04, 2019 7 | 8 | @ Last modified: Jun 27, 2019 9 | 10 | """ 11 | import numpy as np 12 | import os 13 | import time 14 | from skimage import io, draw 15 | from skimage.transform import resize 16 | from random import shuffle 17 | import tensorflow as tf 18 | import json 19 | 20 | """ 21 | # The image which contains a person is collected from the AI Challenger dataset in the following steps: 22 | 1. Get the coordinate of the bounding box in the original image. 23 | 2. Adjust the ratio of the bounding box to be 4:3 (height : width) 24 | 25 | Note that the coordinates of keypoints are also re-calculated when the foreground parts are clipped from the 26 | original images. 
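Concretely, given a bounding box (p1_hor, p1_ver, p2_hor, p2_ver) and the crop resized to 192 x 256 (width x height), a keypoint at (x, y) in the original image is remapped to hor = (x - p1_hor) / (p2_hor - p1_hor) * 192 and ver = (y - p1_ver) / (p2_ver - p1_ver) * 256; this is exactly what extract_people_from_dataset() below computes.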
27 | 28 | """ 29 | 30 | 31 | def draw_points_on_img(img, point_ver, point_hor, point_class): 32 | for i in range(len(point_class)): 33 | if point_class[i] != 3: 34 | rr, cc = draw.circle(point_ver[i], point_hor[i], 10, (256, 192)) 35 | #draw.set_color(img, [rr, cc], [0., 0., 0.], alpha=5) 36 | img[rr, cc, :] = 0 37 | #io.imshow(img) 38 | #io.show() 39 | 40 | return img 41 | 42 | 43 | def draw_lines_on_img(img, point_ver, point_hor, point_class): 44 | line_list = [[0, 1], [1, 2], [3, 4], [4, 5], [6, 7], [7, 8], [9, 10], 45 | [10, 11], [12, 13], [13, 6], [13, 9], [13, 0], [13, 3]] 46 | 47 | # key point class: 1:visible, 2: not visible, 3: not marked 48 | for start_point_id in range(len(point_class)): 49 | if point_class[start_point_id] == 3: 50 | continue 51 | for end_point_id in range(len(point_class)): 52 | if point_class[end_point_id] == 3: 53 | continue 54 | 55 | if [start_point_id, end_point_id] in line_list: 56 | rr, cc = draw.line(int(point_ver[start_point_id]), int(point_hor[start_point_id]), 57 | int(point_ver[end_point_id]), int(point_hor[end_point_id])) 58 | draw.set_color(img, [rr, cc], [255, 0, 0]) 59 | 60 | return img 61 | 62 | 63 | def _int64_feature(value): 64 | return tf.train.Feature(int64_list=tf.train.Int64List(value=[value])) 65 | 66 | 67 | def _bytes_feature(value): 68 | return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value])) 69 | 70 | 71 | def _float_feature(value): 72 | return tf.train.Feature(float_list=tf.train.FloatList(value=[value])) 73 | 74 | 75 | def extract_people_from_dataset(dataset_root_path='../../../dataset/ai_challenger/', image_save_path='../dataset/imgs/', 76 | tfrecords_path='../dataset/', is_shuffle=True): 77 | """ 78 | This function is used to extract people from the AI Challenger dataset. The extract image will contain only one 79 | person each and will be saved as a single .jpg file. At last, the image and the the corresponding annotation will 80 | be saved into a .tfrecord file. 81 | 82 | :param dataset_root_path: the root path of the AI Challenger dataset. 83 | :param image_save_path: the path used to save the clipped images. 84 | :param tfrecord_path: the path used to save the .tfrecords file. 85 | :param is_shuffle: is shuffle. 86 | :return: None. 87 | """ 88 | annotation_file = os.path.join(dataset_root_path, 'keypoint_train_annotations_20170909.json') 89 | image_read_path = os.path.join(dataset_root_path, 'train_images') 90 | tfrecords_file = os.path.join(tfrecords_path, 'train.tfrecords') 91 | 92 | if not os.path.exists(tfrecords_path): 93 | os.mkdir(tfrecords_path) 94 | if os.path.exists(tfrecords_file): 95 | os.remove(tfrecords_file) 96 | if os.path.exists(image_save_path): 97 | useless = os.listdir(image_save_path) 98 | for onefile in useless: 99 | os.remove(os.path.join(image_save_path, onefile)) 100 | else: 101 | os.mkdir(image_save_path) 102 | 103 | saved_number = 0 104 | image_number = 0 105 | start_time = time.time() 106 | with tf.python_io.TFRecordWriter(tfrecords_file) as tfwriter: 107 | 108 | with open(annotation_file, 'r') as jsfile: 109 | data = json.load(jsfile) 110 | 111 | for one_item in data: 112 | img_id = one_item['image_id'] 113 | image_number += 1 114 | if image_number % 100 == 0: 115 | print('Processed {} images, extracted {} people from the dataset. 
' 116 | 'time = {}'.format(image_number, saved_number, time.time() - start_time)) 117 | 118 | kps = one_item['keypoint_annotations'] 119 | boxes = one_item['human_annotations'] 120 | 121 | # read image 122 | img_filename = os.path.join(image_read_path, img_id + '.jpg') 123 | img = io.imread(img_filename) 124 | 125 | for i in range(len(boxes)): 126 | # construct the name of a human in the dictionary, 127 | # for example, the first one (when i = 0) is 'human1' 128 | human_name = 'human' + str(i+1) 129 | 130 | kp = kps[human_name] 131 | box = boxes[human_name] 132 | p1_hor, p1_ver, p2_hor, p2_ver = box 133 | foreground = img[p1_ver:p2_ver, p1_hor:p2_hor, :] 134 | 135 | try: 136 | foreground = resize(foreground, (256, 192, 3)) 137 | except ValueError: 138 | print('ValueError at image {} and {}'.format(image_number, human_name)) 139 | continue 140 | 141 | foreground = foreground * 255.0 142 | foreground_uint8 = np.uint8(foreground) 143 | 144 | kp_hor = (np.array(kp[0::3]) - p1_hor) / (p2_hor - p1_hor) * 192 145 | kp_ver = (np.array(kp[1::3]) - p1_ver) / (p2_ver - p1_ver) * 256 146 | kp_class = np.array(kp[2::3]) 147 | 148 | img_name = img_id + '_' + human_name + '.jpg' 149 | 150 | io.imsave(os.path.join(image_save_path, img_id + '_' + human_name + '.jpg'), foreground_uint8) 151 | 152 | example = tf.train.Example( 153 | features=tf.train.Features( 154 | feature={ 155 | 'image_name': _bytes_feature(img_name.encode()), 156 | 'image_raw': _bytes_feature(foreground_uint8.tobytes()), 157 | 'keypoints_ver': _bytes_feature(np.uint8(kp_ver).tobytes()), 158 | 'keypoints_hor': _bytes_feature(np.uint8(kp_hor).tobytes()), 159 | 'keypoints_class': _bytes_feature(np.uint8(kp_class).tobytes()) 160 | })) 161 | tfwriter.write(example.SerializeToString()) 162 | 163 | saved_number += 1 164 | print('Extracted {} people from the dataset in total.'.format(saved_number)) 165 | 166 | 167 | def decode_proto(proto): 168 | features = tf.parse_single_example(proto, 169 | features={ 170 | 'image_name': tf.FixedLenFeature([], tf.string), 171 | 'image_raw': tf.FixedLenFeature([], tf.string), 172 | 'keypoints_ver': tf.FixedLenFeature([], tf.string), 173 | 'keypoints_hor': tf.FixedLenFeature([], tf.string), 174 | 'keypoints_class': tf.FixedLenFeature([], tf.string), 175 | }) 176 | image_name = features['image_name'] 177 | 178 | image_raw = tf.decode_raw(features['image_raw'], out_type=np.uint8) 179 | image = tf.reshape(image_raw, [256, 192, 3]) 180 | 181 | keypoints_ver = tf.decode_raw(features['keypoints_ver'], out_type=np.uint8) 182 | keypoints_hor = tf.decode_raw(features['keypoints_hor'], out_type=np.uint8) 183 | keypoints_class = tf.decode_raw(features['keypoints_class'], out_type=np.uint8) 184 | return image_name, image, keypoints_ver, keypoints_hor, keypoints_class 185 | 186 | 187 | def decode_tfrecord(filename_queue): 188 | tfreader = tf.TFRecordReader() 189 | _, proto = tfreader.read(filename_queue) 190 | image_name, image, keypoints_ver, keypoints_hor, keypoints_class = decode_proto(proto) 191 | 192 | return image_name, image, keypoints_ver, keypoints_hor, keypoints_class 193 | 194 | 195 | def input_batch(datasetname, batch_size, num_epochs): 196 | """ 197 | This function is used to decode the TFrecord and return a batch of images as well as their information 198 | :param datasetname: the name of the TFrecord file. 
199 | :param batch_size: the number of images in a batch 200 | :param num_epochs: the number of epochs 201 | :return: a batch of images as well as their information 202 | """ 203 | with tf.name_scope('input_batch'): 204 | # The shuffle transformation uses a finite-sized buffer to shuffle elements 205 | # in memory. The parameter is the number of elements in the buffer. For 206 | # completely uniform shuffling, set the parameter to be the same as the 207 | # number of elements in the dataset. 208 | mydataset = tf.data.TFRecordDataset(datasetname) 209 | mydataset = mydataset.map(decode_proto) 210 | 211 | # have no idea why I can't set the parameter of mydataset.shuffle to be the number of the dataset...... 212 | # mydataset = mydataset.shuffle(200) 213 | mydataset = mydataset.repeat(num_epochs * 2) 214 | # drop all the data that can't be used to make up a batch 215 | mydataset = mydataset.batch(batch_size, drop_remainder=True) 216 | iterator = mydataset.make_one_shot_iterator() 217 | 218 | nextelement = iterator.get_next() 219 | return nextelement 220 | 221 | 222 | def mytest(): 223 | tfrecord_file = '../dataset/train.tfrecords' 224 | 225 | filename_queue = tf.train.string_input_producer([tfrecord_file], num_epochs=None) 226 | image_name, image, keypoints_ver, keypoints_hor, keypoints_class = decode_tfrecord(filename_queue) 227 | 228 | with tf.Session() as sess: 229 | init_op = tf.global_variables_initializer() 230 | sess.run(init_op) 231 | coord = tf.train.Coordinator() 232 | threads = tf.train.start_queue_runners(coord=coord) 233 | try: 234 | # while not coord.should_stop(): 235 | for i in range(10): 236 | img_name, img, point_ver, point_hor, point_class = sess.run([image_name, image, keypoints_ver, 237 | keypoints_hor, keypoints_class]) 238 | 239 | print(img_name, point_hor, point_ver, point_class) 240 | 241 | for i in range(len(point_class)): 242 | if point_class[i] > 0: 243 | rr, cc = draw.circle(point_ver[i], point_hor[i], 10, (256, 192)) 244 | img[rr, cc, :] = 0 245 | 246 | io.imshow(img) 247 | io.show() 248 | 249 | except tf.errors.OutOfRangeError: 250 | print('Done reading') 251 | finally: 252 | coord.request_stop() 253 | 254 | 255 | if __name__ == '__main__': 256 | extract_people_from_dataset() 257 | #mytest() 258 | 259 | 260 | -------------------------------------------------------------------------------- /src/evaluate.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file is used to evaluate the performance of the model. Please run the test.py before running this file. 3 | 4 | @ Author: Yu Sun. 
vxallset@outlook.com 5 | 6 | @ Date created: Jun 04, 2019 7 | 8 | @ Last modified: Jun 27, 2019 9 | 10 | """ 11 | import numpy as np 12 | 13 | 14 | def calculate_sigma2s(distances): 15 | sigma2s = np.zeros(14, dtype=np.float) 16 | for keypoint_id in range(14): 17 | distance = distances[:, keypoint_id] 18 | distance2 = distance ** 2 19 | sigma2s[keypoint_id] = np.mean(distance2) 20 | return sigma2s 21 | 22 | 23 | def calculate_OKS(distances, classes): 24 | sigma2s = calculate_sigma2s(distances) 25 | sigmas = np.sqrt(sigma2s) 26 | oks = np.zeros(len(distances)) 27 | for id in range(len(distances)): 28 | one_distance = distances[id] 29 | one_class = classes[id] 30 | one_oks = np.sum(np.exp(-one_distance ** 2 / (2.0 * (1 * sigmas) ** 2)) * 31 | np.array(one_class != 3, dtype=np.int)) / np.sum(np.array(one_class != 3, dtype=np.int)) 32 | oks[id] = one_oks 33 | 34 | return oks 35 | 36 | 37 | if __name__ == '__main__': 38 | distance_file = 'distances.npy' 39 | classes_file = 'classes.npy' 40 | distances = np.load(distance_file) 41 | classes = np.load(classes_file) 42 | oks = calculate_OKS(distances, classes) 43 | oks50_mask = np.array(oks > 0.5, dtype=np.int) 44 | oks75_mask = np.array(oks > 0.75, dtype=np.int) 45 | ap50 = np.sum(oks50_mask) / len(oks50_mask) 46 | ap75 = np.sum(oks75_mask) / len(oks75_mask) 47 | print("AP50 = {}, AP75 = {}".format(ap50, ap75)) 48 | -------------------------------------------------------------------------------- /src/heatmap.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file is used to generate the heat map and other stuffs. 3 | 4 | @ Author: Yu Sun. vxallset@outlook.com 5 | 6 | @ Date created: Jun 04, 2019 7 | 8 | @ Last modified: Jun 27, 2019 9 | 10 | """ 11 | import numpy as np 12 | from skimage import io, draw 13 | from dataset import draw_lines_on_img 14 | 15 | 16 | def gaussian_kernel(kernel_length=3, sigma=1.): 17 | """ 18 | creates gaussian kernel with side length l and a sigma of sig 19 | """ 20 | 21 | ax = np.arange(-kernel_length // 2 + 1., kernel_length // 2 + 1.) 22 | xx, yy = np.meshgrid(ax, ax) 23 | 24 | kernel = np.exp(-0.5 * (np.square(xx) + np.square(yy)) / np.square(sigma)) 25 | 26 | return kernel / np.sum(kernel) 27 | 28 | 29 | def calculate_groundtruth_heatmap(keypoint_ver, keypoint_hor, kepoint_class, kernel_length=3, sigma=1.0): 30 | batch_size, keypoints_number = kepoint_class.shape 31 | assert kernel_length % 2 == 1, 'kernel_length must be odd!' 
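# Note: each labelled keypoint (class 1 or 2) is stamped onto its own heatmap channel as a small Gaussian bump centred at (ver, hor); with the default kernel_length=3 and sigma=1.0 the normalised kernel peaks at about 0.204, matching the example kernel values quoted in the comments of hrnet.py.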
32 | kernel = gaussian_kernel(kernel_length=kernel_length, sigma=sigma) 33 | half_length = kernel_length // 2 34 | heatmap = np.zeros((batch_size, 256, 192, keypoints_number), dtype=np.float32) 35 | 36 | for b in range(batch_size): 37 | for n in range(keypoints_number): 38 | # if the keypoint class is 3, continue 39 | if kepoint_class[b, n] == 3: 40 | continue 41 | 42 | for i in range(-half_length, half_length + 1): 43 | for j in range(-half_length, half_length + 1): 44 | if keypoint_ver[b, n] + i >= 256 or keypoint_ver[b, n] + i < 0 \ 45 | or keypoint_hor[b, n] + j >= 192 or keypoint_hor[b, n] + j < 0: 46 | continue 47 | heatmap[b, keypoint_ver[b, n] + i, keypoint_hor[b, n] + j, n] += kernel[i + half_length, j + half_length] 48 | return heatmap 49 | 50 | 51 | def decode_output(net_output, threshold=0.0): 52 | batch_size, size_ver, size_hor, keypoints_number = net_output.shape 53 | kp_ver = np.zeros((batch_size, keypoints_number)) 54 | kp_hor = np.zeros_like(kp_ver) 55 | kp_class = np.ones_like(kp_hor) * 3 56 | 57 | for b in range(batch_size): 58 | for n in range(keypoints_number): 59 | max_index = np.argmax(net_output[b, :, :, n]) 60 | max_row = max_index // 192 61 | max_col = max_index % 192 62 | if net_output[b, max_row, max_col, n] > threshold: 63 | # print(net_output[b, max_row, max_col, n]) 64 | kp_ver[b, n] = max_row 65 | kp_hor[b, n] = max_col 66 | kp_class[b, n] = 1 67 | prediction = np.zeros((batch_size, keypoints_number * 3)) 68 | prediction[:, ::3] = kp_ver 69 | prediction[:, 1::3] = kp_hor 70 | prediction[:, 2::3] = kp_class 71 | return prediction 72 | 73 | 74 | def decode_pose(images, net_output, threshold=0.001): 75 | # key point class: 1:visible, 2: invisible, 3: not marked 76 | prediction = decode_output(net_output, threshold=threshold) 77 | 78 | batch_size, size_ver, size_hor, keypoints_number = net_output.shape 79 | kp_ver = prediction[:, ::3] 80 | kp_hor = prediction[:, 1::3] 81 | kp_class = prediction[:, 2::3] 82 | 83 | for b in range(batch_size): 84 | point_hor = kp_hor[b] 85 | point_ver = kp_ver[b] 86 | point_class = kp_class[b] 87 | images[b, :, :, :] = draw_lines_on_img(images[b], point_ver, point_hor, point_class) 88 | for i in range(len(point_class)): 89 | if point_class[i] != 3: 90 | rr, cc = draw.circle(point_ver[i], point_hor[i], 10, (256, 192)) 91 | images[b, rr, cc, :] = 0 92 | 93 | return images 94 | 95 | 96 | def calculate_distance(prediction, groundtruth): 97 | kp_ver_pred = prediction[:, ::3] 98 | kp_hor_pred = prediction[:, 1::3] 99 | kp_class_pred = prediction[:, 2::3] 100 | 101 | kp_ver_gt = groundtruth[:, ::3] 102 | kp_hor_gt = groundtruth[:, 1::3] 103 | kp_class_gt = groundtruth[:, 2::3] 104 | 105 | distance2 = (kp_ver_gt - kp_ver_pred) ** 2 + (kp_hor_gt - kp_hor_pred) ** 2 106 | mask = np.array(kp_class_gt != 3, dtype=np.int) 107 | result = np.sqrt(distance2) * mask 108 | return result 109 | -------------------------------------------------------------------------------- /src/hrnet.py: -------------------------------------------------------------------------------- 1 | """ 2 | This is the structure of the HRNet-32, an implementation of the CVPR 2019 paper "Deep High-Resolution Representation 3 | Learning for Human Pose Estimation" using TensorFlow. 4 | 5 | @ Author: Yu Sun. 
vxallset@outlook.com 6 | 7 | @ Date created: Jun 04, 2019 8 | 9 | @ Last modified: Jun 06, 2019 10 | 11 | """ 12 | import tensorflow as tf 13 | from utils import * 14 | 15 | 16 | def stage1(input, name='stage1', is_training=True): 17 | output = [] 18 | with tf.variable_scope(name): 19 | s1_res1 = residual_unit_bottleneck(input, name='rs1', is_training=is_training) 20 | s1_res2 = residual_unit_bottleneck(s1_res1, name='rs2', is_training=is_training) 21 | s1_res3 = residual_unit_bottleneck(s1_res2, name='rs3', is_training=is_training) 22 | s1_res4 = residual_unit_bottleneck(s1_res3, name='rs4', is_training=is_training) 23 | output.append(conv_2d(s1_res4, channels=32, activation=leaky_Relu, name=name + '_output', 24 | is_training=is_training)) 25 | return output 26 | 27 | 28 | def stage2(input, name='stage2', is_training=True): 29 | with tf.variable_scope(name): 30 | sub_networks = exchange_between_stage(input, name='between_stage', is_training=is_training) 31 | sub_networks = exchange_block(sub_networks, name='exchange_block', is_training=is_training) 32 | return sub_networks 33 | 34 | 35 | def stage3(input, name='stage3', is_training=True): 36 | with tf.variable_scope(name): 37 | sub_networks = exchange_between_stage(input, name=name, is_training=is_training) 38 | sub_networks = exchange_block(sub_networks, name='exchange_block1', is_training=is_training) 39 | sub_networks = exchange_block(sub_networks, name='exchange_block2', is_training=is_training) 40 | sub_networks = exchange_block(sub_networks, name='exchange_block3', is_training=is_training) 41 | sub_networks = exchange_block(sub_networks, name='exchange_block4', is_training=is_training) 42 | return sub_networks 43 | 44 | 45 | def stage4(input, name='stage4', is_training=True): 46 | with tf.variable_scope(name): 47 | sub_networks = exchange_between_stage(input, name=name, is_training=is_training) 48 | sub_networks = exchange_block(sub_networks, name='exchange_block1', is_training=is_training) 49 | sub_networks = exchange_block(sub_networks, name='exchange_block2', is_training=is_training) 50 | sub_networks = exchange_block(sub_networks, name='exchange_block3', is_training=is_training) 51 | return sub_networks 52 | 53 | 54 | def HRNet(input, is_training=True, eps=1e-10): 55 | output = stage1(input=input, is_training=is_training) 56 | output = stage2(input=output, is_training=is_training) 57 | output = stage3(input=output, is_training=is_training) 58 | output = stage4(input=output, is_training=is_training) 59 | 60 | # The output contains 4 sub-networks, we only need the first one, which contains information of all 61 | # resolution levels 62 | output = output[0] 63 | 64 | # using a 3x3 convolution to reduce the channels of feature maps to 14 (the number of keypoints) 65 | output = conv_2d(output, channels=14, kernel_size=3, batch_normalization=False, name='change_channel', 66 | is_training=is_training, activation=tf.nn.relu) 67 | # sigmoid can convert the output to the interval of (0, 1) 68 | # output = tf.nn.sigmoid(output, name='net_output') 69 | 70 | # If we don't normalize the value of the output to 1, the net may predict the values on all pixels to be 0, which 71 | # will make the loss of one image to be around 1.75 (batch_size = 1, 256, 192, 14). 
This is because that the value 72 | # of an 3 x 3 gaussian kernel is g = 73 | # [[0.07511361 0.1238414 0.07511361] 74 | # [0.1238414 0.20417996 0.1238414 ] 75 | # [0.07511361 0.1238414 0.07511361]] 76 | 77 | # so g^2 = 78 | # [[0.00564205 0.01533669 0.00564205] 79 | # [0.01533669 0.04168945 0.01533669] 80 | # [0.00564205 0.01533669 0.00564205]] 81 | # therefore, np.sum(g^2) * 14 = 1.75846 82 | 83 | # In order to avoid this from happening, we need to normalize the value of the net output by dividing the value on 84 | # all pixels by the sum of the value on that image (1, 256, 192, 1). Or we may calculate the classification loss 85 | # to indicate the class of the key points. 86 | 87 | 88 | # sum up the value on each pixels, the result should be a [batch_size, 14] tensor, then expend dim to be 89 | # [batch_size, 1, 1, 14] tensor so as to normalize the output 90 | output_sum = tf.expand_dims(tf.expand_dims(tf.reduce_sum(tf.reduce_sum(output, axis=-2), 91 | axis=-2), axis=-2), axis=-2, name='net_output_sum') 92 | 93 | output = tf.truediv(output, output_sum + eps, name='net_output_final') 94 | 95 | return output 96 | 97 | 98 | def mytest(): 99 | input = tf.ones((16, 256, 192, 3)) 100 | output = HRNet(input) 101 | 102 | print(output) 103 | 104 | 105 | def compute_loss(net_output, ground_truth): 106 | diff = tf.square(tf.subtract(net_output, ground_truth), name='square_difference') 107 | loss = tf.reduce_sum(diff, name='loss') 108 | #loss = tf.losses.mean_squared_error(ground_truth, net_output) 109 | 110 | return loss 111 | 112 | 113 | if __name__ == '__main__': 114 | mytest() 115 | -------------------------------------------------------------------------------- /src/temp.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file is used to test the model using the AI Challenger dataset. 3 | 4 | @ Author: Yu Sun. 
vxallset@outlook.com 5 | 6 | @ Date created: Jun 04, 2019 7 | 8 | @ Last modified: Apr 13, 2019 9 | 10 | """ 11 | import numpy as np 12 | import tensorflow as tf 13 | from hrnet import * 14 | import dataset 15 | from heatmap import * 16 | import time 17 | import os 18 | from skimage import io 19 | from skimage.transform import resize 20 | import cv2 21 | 22 | def main(use_GPU = True): 23 | batch_size = 1 24 | num_epochs = 10 25 | image_numbers = 378352 26 | #image_numbers = 4500 27 | 28 | root_path = os.getcwd()[:-3] 29 | 30 | datasetname = os.path.join(root_path, 'dataset/test.tfrecords') 31 | model_folder = os.path.join(root_path, 'models/') 32 | modelfile = os.path.join(root_path, 'models/epoch2.ckpt-567528') 33 | 34 | global_step = tf.Variable(0, trainable=False) 35 | 36 | image_name, image, keypoints_ver, keypoints_hor, keypoints_class = dataset.input_batch( 37 | datasetname=datasetname, batch_size=batch_size, num_epochs=num_epochs) 38 | 39 | input_images = tf.placeholder(tf.float32, [None, 256, 192, 3]) 40 | ground_truth = tf.placeholder(tf.float32, [None, 256, 192, 14]) 41 | 42 | input_images = tf.cast(input_images / 255.0, tf.float32, name='change_type') 43 | net_output = HRNet(input=input_images) 44 | loss = compute_loss(net_output=net_output, ground_truth=ground_truth) 45 | 46 | saver = tf.train.Saver() 47 | device = '/gpu:0' 48 | if not use_GPU: 49 | os.environ['CUDA_VISIBLE_DEVICES'] = '' 50 | device = '/cpu:0' 51 | 52 | video_capture = cv2.VideoCapture('./cxk.mp4') 53 | fps = video_capture.get(cv2.CAP_PROP_FPS) 54 | start_second = 0 55 | start_frame = fps * start_second 56 | video_capture.set(cv2.CAP_PROP_POS_FRAMES, start_frame) 57 | 58 | with tf.Session() as sess: 59 | with tf.device(device): 60 | sess.run(tf.global_variables_initializer()) 61 | saver.restore(sess=sess, save_path=modelfile) 62 | 63 | 64 | try: 65 | framid = 0 66 | while True: 67 | start_time = time.time() 68 | retval, img_data = video_capture.read() 69 | if not retval: 70 | break 71 | img_data = cv2.cvtColor(img_data, code=cv2.COLOR_BGR2RGB) 72 | _img = cv2.resize(img_data, (192, 256)) 73 | _img = np.array([_img]) 74 | 75 | tnet_output = sess.run(net_output, feed_dict={input_images: _img}) 76 | 77 | #prediction = decode_output(tnet_output, threshold=0.001) 78 | 79 | timgs = decode_pose(_img, tnet_output, threshold=0.001) 80 | resultimg = timgs[0]/ 255.0 81 | 82 | io.imsave('../demo_img/frame_{}.jpg'.format(framid), resultimg) 83 | framid += 1 84 | print('time = {}'.format(time.time() - start_time)) 85 | print('---------------------------------------------------------------------------------') 86 | 87 | except tf.errors.OutOfRangeError: 88 | print('End testing...') 89 | finally: 90 | total_time = time.time() - start_time 91 | print('Running time: {} s'.format(total_time)) 92 | print('Done!') 93 | 94 | 95 | if __name__ == '__main__': 96 | main(use_GPU=True) 97 | -------------------------------------------------------------------------------- /src/test.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file is used to test the model using the AI Challenger dataset. 3 | 4 | @ Author: Yu Sun. 
vxallset@outlook.com 5 | 6 | @ Date created: Jun 04, 2019 7 | 8 | @ Last modified: Jun 27, 2019 9 | 10 | """ 11 | import numpy as np 12 | import tensorflow as tf 13 | from hrnet import * 14 | import dataset 15 | from heatmap import * 16 | import time 17 | import os 18 | 19 | from functools import reduce 20 | from operator import mul 21 | 22 | def get_num_params(): 23 | num_params = 0 24 | for variable in tf.trainable_variables(): 25 | shape = variable.get_shape() 26 | num_params += reduce(mul, [dim.value for dim in shape], 1) 27 | return num_params 28 | 29 | 30 | def main(device_option='/gpu:0'): 31 | batch_size = 1 32 | num_epochs = 10 33 | image_numbers = 378352 34 | #image_numbers = 4500 35 | 36 | root_path = os.getcwd()[:-3] 37 | 38 | datasetname = os.path.join(root_path, 'dataset/train.tfrecords') 39 | model_folder = os.path.join(root_path, 'models/') 40 | modelfile = os.path.join(root_path, 'models/epoch2.ckpt-567528') 41 | 42 | global_step = tf.Variable(0, trainable=False) 43 | 44 | image_name, image, keypoints_ver, keypoints_hor, keypoints_class = dataset.input_batch( 45 | datasetname=datasetname, batch_size=batch_size, num_epochs=num_epochs) 46 | 47 | input_images = tf.placeholder(tf.float32, [None, 256, 192, 3]) 48 | ground_truth = tf.placeholder(tf.float32, [None, 256, 192, 14]) 49 | 50 | input_images = tf.cast(input_images / 255.0, tf.float32, name='change_type') 51 | net_output = HRNet(input=input_images) 52 | loss = compute_loss(net_output=net_output, ground_truth=ground_truth) 53 | 54 | saver = tf.train.Saver() 55 | # os.environ['CUDA_VISIBLE_DEVICES'] = '' 56 | 57 | with tf.Session() as sess: 58 | with tf.device(device_option): 59 | sess.run(tf.global_variables_initializer()) 60 | saver.restore(sess=sess, save_path=modelfile) 61 | #print(get_num_params()) 62 | 63 | writer = tf.summary.FileWriter('../log/', sess.graph) 64 | start_time = time.time() 65 | try: 66 | distances = 0 67 | classes = 0 68 | for step in range(int(image_numbers / batch_size)): 69 | _img, _kp_ver, _kp_hor, _kp_class = sess.run( 70 | [image, keypoints_ver, keypoints_hor, keypoints_class]) 71 | _gt = calculate_groundtruth_heatmap(_kp_ver, _kp_hor, _kp_class) 72 | 73 | tloss, tnet_output = sess.run([loss, net_output], 74 | feed_dict={input_images: _img, ground_truth: _gt}) 75 | 76 | prediction = decode_output(tnet_output, threshold=0.001) 77 | gt_all = np.zeros((batch_size, 14*3)) 78 | gt_all[:, ::3] = _kp_ver 79 | gt_all[:, 1::3] = _kp_hor 80 | gt_all[:, 2::3] = _kp_class 81 | distance = calculate_distance(prediction, gt_all) 82 | 83 | if step == 0: 84 | distances = distance 85 | classes = _kp_class 86 | elif step == 1000: 87 | np.save('distances.npy', distances) 88 | np.save('classes.npy', classes) 89 | break 90 | else: 91 | distances = np.append(distances, distance, axis=0) 92 | classes = np.append(classes, _kp_class, axis=0) 93 | 94 | timgs = decode_pose(_img, tnet_output, threshold=0.001) 95 | for i in range(batch_size): 96 | io.imsave('../test_img/step{}_i_{}.jpg'.format(step, i), timgs[i]) 97 | print('Step = {:>6}/{:>6}, loss = {:.6f}, time = {}' 98 | .format(step, int(image_numbers / batch_size), tloss, 99 | time.time() - start_time)) 100 | print('---------------------------------------------------------------------------------') 101 | 102 | except tf.errors.OutOfRangeError: 103 | print('End testing...') 104 | finally: 105 | total_time = time.time() - start_time 106 | print('Running time: {} s'.format(total_time)) 107 | print('Done!') 108 | 109 | 110 | if __name__ == '__main__': 111 | 
main(device_option='/gpu:0') 112 | -------------------------------------------------------------------------------- /src/train.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file is used to train the HRNet-32 model. 3 | 4 | @ Author: Yu Sun. vxallset@outlook.com 5 | 6 | @ Date created: Jun 04, 2019 7 | 8 | @ Last modified: Jun 27, 2019 9 | 10 | """ 11 | import numpy as np 12 | import tensorflow as tf 13 | from hrnet import * 14 | import dataset 15 | from heatmap import * 16 | import time 17 | import os 18 | 19 | 20 | def main(gpu_divice='/gpu:0'): 21 | is_training = True 22 | 23 | batch_size = 1 24 | num_epochs = 10 25 | image_numbers = 378352 26 | learning_rate = 0.001 27 | save_epoch_number = 1 28 | root_path = os.getcwd()[:-3] 29 | 30 | datasetname = os.path.join(root_path, 'dataset/train.tfrecords') 31 | model_folder = os.path.join(root_path, 'models/') 32 | modelfile = os.path.join(root_path, 'models/model.ckpt') 33 | 34 | global_step = tf.Variable(0, trainable=False) 35 | 36 | image_name, image, keypoints_ver, keypoints_hor, keypoints_class = dataset.input_batch( 37 | datasetname=datasetname, batch_size=batch_size, num_epochs=num_epochs) 38 | 39 | input_images = tf.placeholder(tf.float32, [None, 256, 192, 3]) 40 | ground_truth = tf.placeholder(tf.float32, [None, 256, 192, 14]) 41 | 42 | input_images = tf.cast(input_images / 255.0, tf.float32, name='change_type') 43 | net_output = HRNet(input=input_images, is_training=is_training) 44 | loss = compute_loss(net_output=net_output, ground_truth=ground_truth) 45 | 46 | saver = tf.train.Saver() 47 | train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step) 48 | 49 | with tf.Session() as sess: 50 | with tf.device(gpu_divice): 51 | sess.run(tf.global_variables_initializer()) 52 | 53 | writer = tf.summary.FileWriter('../log/', sess.graph) 54 | start_time = time.time() 55 | try: 56 | for epoch in range(num_epochs): 57 | epoch_time = time.time() 58 | for step in range(int(image_numbers / batch_size)): 59 | _img, _kp_ver, _kp_hor, _kp_class = sess.run( 60 | [image, keypoints_ver, keypoints_hor, keypoints_class]) 61 | _gt = calculate_groundtruth_heatmap(_kp_ver, _kp_hor, _kp_class) 62 | 63 | train_step.run(feed_dict={input_images: _img, ground_truth: _gt}) 64 | 65 | if step % 100 == 0: 66 | tloss, tnet_output = sess.run([loss, net_output], 67 | feed_dict={input_images: _img, ground_truth: _gt}) 68 | 69 | timgs = decode_pose(_img, tnet_output, threshold=0.0) 70 | for i in range(batch_size): 71 | io.imsave('../demo_img/epoch{}_step{}_i_{}.jpg'.format(epoch, step, i), timgs[i]) 72 | print('Epoch {:>2}/{}, step = {:>6}/{:>6}, loss = {:.6f}, time = {}' 73 | .format(epoch, num_epochs, step, int(image_numbers / batch_size), tloss, 74 | time.time() - epoch_time)) 75 | print('---------------------------------------------------------------------------------') 76 | if epoch % save_epoch_number == 0: 77 | saver.save(sess, model_folder + 'epoch{}.ckpt'.format(epoch), global_step=global_step) 78 | print('Model saved in: {}'.format(model_folder + 'epoch{}.ckpt'.format(epoch))) 79 | except tf.errors.OutOfRangeError: 80 | print('End training...') 81 | finally: 82 | total_time = time.time() - start_time 83 | saver.save(sess, modelfile, global_step=global_step) 84 | print('Model saved as: {}, runing time: {} s'.format(modelfile, total_time)) 85 | print('Done!') 86 | 87 | """ 88 | imgs, kp_vers, kp_hors, kp_classses = sess.run([output, keypoints_ver, keypoints_hor, 
keypoints_class]) 89 | img = imgs[0] 90 | kp_ver = kp_vers[0] 91 | kp_hor = kp_hors[0] 92 | kp_classs = kp_classses[0] 93 | 94 | dataset.draw_points_on_img(img, point_ver=kp_ver, point_hor=kp_hor, point_class=kp_classs) 95 | """ 96 | 97 | 98 | if __name__ == '__main__': 99 | main(gpu_divice='/gpu:0') 100 | -------------------------------------------------------------------------------- /src/utils.py: -------------------------------------------------------------------------------- 1 | """ 2 | This is the utils for deep learning, implemented with TensorFlow. 3 | 4 | @ Author: Yu Sun. vxallset@outlook.com 5 | 6 | @ Date created: Jun 04, 2019 7 | 8 | @ Last modified: Jun 06, 2019 9 | 10 | """ 11 | import tensorflow as tf 12 | 13 | 14 | def leaky_Relu(input, name=''): 15 | return tf.nn.leaky_relu(input, alpha=0.1, name=name + '_relu') 16 | 17 | 18 | def conv_2d(inputs, channels, kernel_size=3, strides=1, batch_normalization=True, activation=None, 19 | name='', padding='same', kernel_initializer=tf.random_normal_initializer(stddev=0.01), is_training=True): 20 | 21 | output = tf.layers.conv2d(inputs=inputs, filters=channels, kernel_size=kernel_size, strides=strides, 22 | padding=padding, name=name + '_conv', kernel_initializer=kernel_initializer) 23 | name = name + '_conv' 24 | 25 | if batch_normalization: 26 | output = tf.layers.batch_normalization(output, axis=-1, momentum=0.9, name=name+'_bn', training=is_training) 27 | name = name + '_bn' 28 | 29 | if activation: 30 | output = activation(output, name=name) 31 | 32 | return output 33 | 34 | 35 | def down_sampling(input, method='strided_convolution', rate=2, name='', activation=leaky_Relu, is_training=True): 36 | assert method == 'max_pooling' or method == 'strided_convolution', \ 37 | 'Unknown type of down_sample method! "strided_convolution" and "' \ 38 | 'max_pooling" are expected, but "' + method + '" is provided!' 39 | output = input 40 | 41 | if method == 'strided_convolution': 42 | _, _, _, channels = input.get_shape() 43 | channels = channels.value 44 | output = input 45 | loop_index = 1 46 | new_rate = rate 47 | while new_rate > 1: 48 | assert new_rate % 2 == 0, 'The rate of down_sampling (using "strided_convolution") must be the power of ' \ 49 | '2, but "{}" is provided!'.format(rate) 50 | output = conv_2d(output, channels=channels * (2 ** loop_index), strides=2, activation=activation, 51 | name=name + 'down_sampling' + '_x' + str(loop_index * 2), is_training=is_training) 52 | loop_index += 1 53 | new_rate = int(new_rate / 2) 54 | 55 | elif method == 'max_pooling': 56 | output = tf.layers.max_pooling2d(input, pool_size=rate, strides=rate, name=name+'_max_pooling') 57 | 58 | return output 59 | 60 | 61 | def up_sampling(input, channels, method='nearest_neighbor', rate=2, name='', activation=leaky_Relu, is_training=True): 62 | assert method == 'nearest_neighbor', 'Only "nearest_neighbor" method is supported now! ' \ 63 | 'However, "' + method + '" is provided.' 
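# Note: up-sampling here is nearest-neighbour resizing by the given rate, followed by a 1x1 convolution ('_align_channels' below) so the channel count matches the target branch before the feature maps are summed in the exchange units.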
64 | output = input 65 | if method == 'nearest_neighbor': 66 | _, x, y, _= input.get_shape() 67 | x = x.value 68 | y = y.value 69 | 70 | output = tf.image.resize_nearest_neighbor(input, size=(x*rate, y*rate), name=name + '_upsampling') 71 | name += '_upsampling' 72 | output = conv_2d(output, channels=channels, kernel_size=1, activation=activation, 73 | name=name + '_align_channels', is_training=is_training) 74 | 75 | return output 76 | 77 | 78 | # Repeated multi-scale fusion (namely the exchange block) within a stage (the input and the output has the same number 79 | # of sub-networks) 80 | def exchange_within_stage(inputs, name='exchange_within_stage', is_training=True): 81 | with tf.variable_scope(name): 82 | subnetworks_number = len(inputs) 83 | outputs = [] 84 | 85 | # suppose i is the index of the input sub-network, o is the index of the output sub-network 86 | for o in range(subnetworks_number): 87 | one_subnetwork = 0 88 | for i in range(subnetworks_number): 89 | if i == o: 90 | # if in the same resolution 91 | temp_subnetwork = inputs[i] 92 | elif i - o < 0: 93 | # if the input resolution is greater the output resolution, down-sampling with rate 94 | # of 2 ** (o - i) 95 | temp_subnetwork = down_sampling(inputs[i], rate=2 ** (o - i), name='i_{}_o_{}'.format(i, o), 96 | is_training=is_training) 97 | else: 98 | # if the input resolution is smaller the output resolution, up-sampling with rate of 99 | # 2 ** (o - i) 100 | _, _, _, c = inputs[o].get_shape() 101 | temp_subnetwork = up_sampling(inputs[i], channels=c, rate=2 ** (i - o), 102 | name='i_{}_o_{}'.format(i, o), is_training=is_training) 103 | one_subnetwork = tf.add(temp_subnetwork, one_subnetwork, name='add_i_{}_o_{}'.format(i, o)) 104 | outputs.append(one_subnetwork) 105 | return outputs 106 | 107 | 108 | # Repeated multi-scale fusion (namely the exchange block) between two stages (the input and the output has the same 109 | # number of sub-networks) 110 | def exchange_between_stage(inputs, name='exchange_between_stage', is_training=True): 111 | subnetworks_number = len(inputs) 112 | outputs = [] 113 | 114 | # suppose i is the index of the input sub-network, o is the index of the output sub-network 115 | for o in range(subnetworks_number): 116 | one_subnetwork = 0 117 | for i in range(subnetworks_number): 118 | if i == o: 119 | # if in the same resolution 120 | temp_subnetwork = inputs[i] 121 | elif i - o < 0: 122 | # if the input resolution is greater the output resolution, down-sampling with rate 123 | # of 2 ** (o - i) 124 | temp_subnetwork = down_sampling(inputs[i], rate=2 ** (o - i), name='i_{}_o_{}'.format(i, o), 125 | is_training=is_training) 126 | else: 127 | # if the input resolution is smaller the output resolution, up-sampling with rate of 128 | # 2 ** (o - i) 129 | _, _, _, c = inputs[o].get_shape() 130 | temp_subnetwork = up_sampling(inputs[i], channels=c, rate=2 ** (i - o), 131 | name='i_{}_o_{}'.format(i, o), is_training=is_training) 132 | one_subnetwork = tf.add(temp_subnetwork, one_subnetwork, name='add_i_{}_o_{}'.format(i, o)) 133 | outputs.append(one_subnetwork) 134 | one_subnetwork = down_sampling(inputs[-1], rate=2, name='new_resolution', is_training=is_training) 135 | outputs.append(one_subnetwork) 136 | return outputs 137 | 138 | 139 | def residual_unit_bottleneck(input, name='RU_bottleneck', channels=64, is_training=True): 140 | """ 141 | Residual unit with bottleneck design, default width is 64. 
142 | :param input: 143 | :param name: 144 | :return: 145 | """ 146 | _, _, _, c = input.get_shape() 147 | conv_1x1_1 = conv_2d(input, channels=channels, kernel_size=1, activation=leaky_Relu, name=name + '_conv1x1_1', 148 | is_training = is_training) 149 | conv_3x3 = conv_2d(conv_1x1_1, channels=channels, activation=leaky_Relu, name=name + '_conv3x3', 150 | is_training=is_training) 151 | conv_1x1_2 = conv_2d(conv_3x3, channels=c, kernel_size=1, name=name + '_conv1x1_2', is_training=is_training) 152 | _output = tf.add(input, conv_1x1_2, name=name + '_add') 153 | output = leaky_Relu(_output, name=name + '_out') 154 | return output 155 | 156 | 157 | def residual_unit(input, name='RU', is_training=True): 158 | """ 159 | Residual unit with two 3 x 3 convolution layers. 160 | :param input: 161 | :param name: 162 | :return: 163 | """ 164 | _, _, _, channels = input.get_shape() 165 | conv3x3_1 = conv_2d(inputs=input, channels=channels, activation=leaky_Relu, name=name + '_conv3x3_1', 166 | is_training=is_training) 167 | conv3x3_2 = conv_2d(inputs=conv3x3_1, channels=channels, name=name + '_conv3x3_2', is_training=is_training) 168 | _output = tf.add(input, conv3x3_2, name=name + '_add') 169 | output = leaky_Relu(_output, name=name + '_out') 170 | return output 171 | 172 | 173 | def exchange_block(inputs, name='exchange_block', is_training=True): 174 | with tf.variable_scope(name): 175 | output = [] 176 | level = 0 177 | for input in inputs: 178 | sub_network = residual_unit(input, name='level{}RU1'.format(level), is_training=is_training) 179 | sub_network = residual_unit(sub_network, name='level{}RU2'.format(level), is_training=is_training) 180 | sub_network = residual_unit(sub_network, name='level{}RU3'.format(level), is_training=is_training) 181 | sub_network = residual_unit(sub_network, name='level{}RU4'.format(level), is_training=is_training) 182 | output.append(sub_network) 183 | level += 1 184 | outputs = exchange_within_stage(output, is_training=is_training) 185 | return outputs -------------------------------------------------------------------------------- /test_img/step11_i_0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/test_img/step11_i_0.jpg -------------------------------------------------------------------------------- /test_img/step136_i_0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/VXallset/deep-high-resolution-net.TensorFlow/d885abc6f8699f5dfd09b270170f3c68fbf32ac2/test_img/step136_i_0.jpg --------------------------------------------------------------------------------