├── .gitignore
├── LICENSE
├── README.md
├── assets
│   ├── 1.png
│   └── 2.png
├── configuration.py
├── convert_to_tflite.py
├── data_process
│   ├── __init__.py
│   ├── make_dataset.py
│   ├── parse_coco.py
│   ├── parse_voc.py
│   └── read_txt.py
├── dataset
│   └── README.md
├── detect_objects_in_video.py
├── parse_cfg.py
├── saved_model
│   └── README.md
├── test_data
│   └── README.md
├── test_on_single_image.py
├── test_results_during_training
│   └── README.md
├── train_from_scratch.py
├── utils
│   ├── __init__.py
│   ├── iou.py
│   ├── nms.py
│   ├── preprocess.py
│   ├── resize_image.py
│   └── visualize.py
├── write_coco_to_txt.py
├── write_voc_to_txt.py
└── yolo
    ├── __init__.py
    ├── anchor.py
    ├── bounding_box.py
    ├── inference.py
    ├── loss.py
    ├── make_label.py
    └── yolo_v3.py

/.gitignore:
--------------------------------------------------------------------------------
1 | .idea
2 | __pycache__
3 | /dataset/*
4 | !/dataset/README.md
5 | /data_process/data.txt
6 | /saved_model/*
7 | !/saved_model/README.md
8 | /test_data/*
9 | !/test_data/README.md
10 | /test_results_during_training/*
11 | !/test_results_during_training/README.md
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2019 calmisential
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # YOLOv3_TensorFlow2
2 | A TensorFlow 2 implementation of YOLOv3.
3 | 
4 | ## Requirements:
5 | + Python == 3.7
6 | + TensorFlow == 2.1.0
7 | + numpy == 1.17.0
8 | + opencv-python == 4.1.0
9 | 
10 | ## Usage
11 | ### Train on PASCAL VOC 2012
12 | 1. Download the [PASCAL VOC 2012 dataset](http://host.robots.ox.ac.uk/pascal/VOC/).
13 | 2. Unzip the file and place it in the 'dataset' folder; make sure the directory looks like this:
14 | ```
15 | |——dataset
16 |    |——VOCdevkit
17 |       |——VOC2012
18 |          |——Annotations
19 |          |——ImageSets
20 |          |——JPEGImages
21 |          |——SegmentationClass
22 |          |——SegmentationObject
23 | ```
24 | 3. Change the parameters in **configuration.py** according to the specific situation. In particular, you can set *"load_weights_before_training"* to **True** if you would like to resume training from saved weights. You
25 | can also set *"test_images_during_training"* to **True**, so that the detection results are shown after each epoch.
26 | 4. Run **write_voc_to_txt.py** to generate *data.txt*, and then run **train_from_scratch.py** to start training.
27 | 
28 | ### Train on COCO2017
29 | 1. Download the COCO2017 dataset.
30 | 2. Unzip **train2017.zip** and **annotations_trainval2017.zip** and place them in the 'dataset' folder; make sure the directory looks like this:
31 | ```
32 | |——dataset
33 |    |——COCO
34 |       |——2017
35 |          |——annotations
36 |          |——train2017
37 | ```
38 | 3. Change the parameters in **configuration.py** according to the specific situation. In particular, you can set *"load_weights_before_training"* to **True** if you would like to resume training from saved weights. You
39 | can also set *"test_images_during_training"* to **True**, so that the detection results are shown after each epoch.
40 | 4. Run **write_coco_to_txt.py** to generate *data.txt*, and then run **train_from_scratch.py** to start training.
41 | 
42 | 
43 | 
44 | ### Train on custom dataset
45 | 1. Turn your custom dataset's labels into this form:
46 | ```xxx.jpg 100 200 300 400 1 300 600 500 800 2```.
47 | The first element is the image file name, and every following group of five numbers describes one box: [xmin, ymin, xmax, ymax, class_id]. If there are multiple boxes, append one such group per box.
**Because the image is resized (with padding) before it is fed into the network, the values of xmin, ymin, xmax, and ymax must be transformed accordingly (see the sketch after the example images below).**
48 | An example of the **original picture** (from the PASCAL VOC 2012 dataset) and the **resized picture**:
49 | ![original picture](https://raw.githubusercontent.com/calmisential/YOLOv3_TensorFlow2/master/assets/1.png) 50 | ![resized picture](https://raw.githubusercontent.com/calmisential/YOLOv3_TensorFlow2/master/assets/2.png)
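A minimal sketch of this coordinate transform, using the repository's own `ResizeWithPad` helper (the raw picture size and box values below are made up for illustration):
```python
from utils.resize_image import ResizeWithPad

# A hypothetical 375x500 (height x width) raw picture with one box.
x_min, y_min, x_max, y_max = ResizeWithPad(h=375, w=500).raw_to_resized(
    x_min=100, y_min=200, x_max=300, y_max=350)
# The returned values are coordinates in the 416x416 padded image; cast
# them to int before writing the line to data.txt.
print(int(x_min), int(y_min), int(x_max), int(y_max))
```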
51 | Create a new file *data.txt* in the data_process directory and write the label of every picture into it, one line per image.
52 | 2. Change the parameters *CATEGORY_NUM*, *use_dataset*, *custom_dataset_dir*, *custom_dataset_classes* in **configuration.py**.
53 | 3. Run **train_from_scratch.py** to start training.
54 | 
55 | ### Test
56 | 1. Change *"test_picture_dir"* in **configuration.py** according to the specific situation.
57 | 2. Run **test_on_single_image.py** to test a single picture.
58 | 
59 | ### Convert model to TensorFlow Lite format
60 | 1. Change the *"TFLite_model_dir"* in **configuration.py** according to the specific situation.
61 | 2. Run **convert_to_tflite.py** to generate the TensorFlow Lite model.
62 | 
63 | 
64 | ## References
65 | 1. YOLO_v3 paper: https://pjreddie.com/media/files/papers/YOLOv3.pdf or https://arxiv.org/abs/1804.02767
66 | 2. Keras implementation of YOLOV3: https://github.com/qqwweee/keras-yolo3
67 | 3. [blog 1](https://www.cnblogs.com/wangxinzhe/p/10592184.html), [blog 2](https://www.cnblogs.com/wangxinzhe/p/10648465.html), [blog 3](https://blog.csdn.net/leviopku/article/details/82660381), [blog 4](https://blog.csdn.net/qq_37541097/article/details/81214953), [blog 5](https://blog.csdn.net/Gentleman_Qin/article/details/84349144), [blog 6](https://blog.csdn.net/qq_34199326/article/details/84109828), [blog 7](https://blog.csdn.net/weixin_38145317/article/details/95349201)
68 | 4. Li Jinhong. Deep Learning with TensorFlow: Practical Engineering Projects [M]. Beijing: Publishing House of Electronics Industry, 2019: 343-375
69 | 5. https://zhuanlan.zhihu.com/p/49556105
--------------------------------------------------------------------------------
/assets/1.png:
--------------------------------------------------------------------------------
 https://raw.githubusercontent.com/calmiLovesAI/YOLOv3_TensorFlow2/7789280e4974f62b20304f9746aa4121be4c5333/assets/1.png
--------------------------------------------------------------------------------
/assets/2.png:
--------------------------------------------------------------------------------
 https://raw.githubusercontent.com/calmiLovesAI/YOLOv3_TensorFlow2/7789280e4974f62b20304f9746aa4121be4c5333/assets/2.png
--------------------------------------------------------------------------------
/configuration.py:
--------------------------------------------------------------------------------
1 | # training
2 | EPOCHS = 1000
3 | BATCH_SIZE = 8
4 | load_weights_before_training = False
5 | load_weights_from_epoch = 10
6 | 
7 | # input image
8 | IMAGE_HEIGHT = 416
9 | IMAGE_WIDTH = 416
10 | CHANNELS = 3
11 | 
12 | # Dataset
13 | CATEGORY_NUM = 80
14 | ANCHOR_NUM_EACH_SCALE = 3
15 | COCO_ANCHORS = [[116, 90], [156, 198], [373, 326], [30, 61], [62, 45], [59, 119], [10, 13], [16, 30], [33, 23]]
16 | COCO_ANCHOR_INDEX = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
17 | SCALE_SIZE = [13, 26, 52]
18 | 
19 | use_dataset = "pascal_voc"  # "custom", "pascal_voc", "coco"
20 | 
21 | PASCAL_VOC_DIR = "./dataset/VOCdevkit/VOC2012/"
22 | PASCAL_VOC_ANNOTATION = PASCAL_VOC_DIR + "Annotations"
23 | PASCAL_VOC_IMAGE = PASCAL_VOC_DIR + "JPEGImages"
24 | # The 20 object classes of PASCAL VOC
25 | PASCAL_VOC_CLASSES = {"person": 1, "bird": 2, "cat": 3, "cow": 4, "dog": 5,
26 |                       "horse": 6, "sheep": 7, "aeroplane": 8, "bicycle": 9,
27 |                       "boat": 10, "bus": 11, "car": 12, "motorbike": 13,
28 |                       "train": 14, "bottle": 15, "chair": 16, "diningtable": 17,
29 |                       "pottedplant": 18, "sofa": 19, "tvmonitor": 20}
30 | 
31 | COCO_DIR = "./dataset/COCO/2017/"
32 | COCO_CLASSES = {"person": 
1, "bicycle": 2, "car": 3, "motorcycle": 4, "airplane": 5, 33 | "bus": 6, "train": 7, "truck": 8, "boat": 9, "traffic light": 10, 34 | "fire hydrant": 11, "stop sign": 12, "parking meter": 13, "bench": 14, 35 | "bird": 15, "cat": 16, "dog": 17, "horse": 18, "sheep": 19, "cow": 20, 36 | "elephant": 21, "bear": 22, "zebra": 23, "giraffe": 24, "backpack": 25, 37 | "umbrella": 26, "handbag": 27, "tie": 28, "suitcase": 29, "frisbee": 30, 38 | "skis": 31, "snowboard": 32, "sports ball": 33, "kite": 34, "baseball bat": 35, 39 | "baseball glove": 36, "skateboard": 37, "surfboard": 38, "tennis racket": 39, 40 | "bottle": 40, "wine glass": 41, "cup": 42, "fork": 43, "knife": 44, "spoon": 45, 41 | "bowl": 46, "banana": 47, "apple": 48, "sandwich": 49, "orange": 50, "broccoli": 51, 42 | "carrot": 52, "hot dog": 53, "pizza": 54, "donut": 55, "cake": 56, "chair": 57, 43 | "couch": 58, "potted plant": 59, "bed": 60, "dining table": 61, "toilet": 62, 44 | "tv": 63, "laptop": 64, "mouse": 65, "remote": 66, "keyboard": 67, "cell phone": 68, 45 | "microwave": 69, "oven": 70, "toaster": 71, "sink": 72, "refrigerator": 73, 46 | "book": 74, "clock": 75, "vase": 76, "scissors": 77, "teddy bear": 78, 47 | "hair drier": 79, "toothbrush": 80} 48 | 49 | 50 | 51 | TXT_DIR = "./data_process/data.txt" 52 | 53 | custom_dataset_dir = "" 54 | custom_dataset_classes = {} 55 | 56 | 57 | 58 | # loss 59 | IGNORE_THRESHOLD = 0.5 60 | 61 | 62 | # NMS 63 | CONFIDENCE_THRESHOLD = 0.6 64 | IOU_THRESHOLD = 0.5 65 | MAX_BOX_NUM = 50 66 | 67 | MAX_TRUE_BOX_NUM_PER_IMG = 20 68 | 69 | 70 | # save model 71 | save_model_dir = "saved_model/" 72 | save_frequency = 5 73 | 74 | # tensorflow lite model 75 | TFLite_model_dir = "yolov3_model.tflite" 76 | 77 | test_images_during_training = True 78 | training_results_save_dir = "./test_results_during_training/" 79 | test_images = ["", ""] 80 | 81 | test_picture_dir = "./test_data/1.jpg" 82 | test_video_dir = "./test_data/test_video.mp4" 83 | temp_frame_dir = "./test_data/temp.jpg" -------------------------------------------------------------------------------- /convert_to_tflite.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | from configuration import CATEGORY_NUM, save_model_dir, IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS, TFLite_model_dir 4 | from yolo.yolo_v3 import YOLOV3 5 | 6 | if __name__ == '__main__': 7 | # GPU settings 8 | gpus = tf.config.list_physical_devices(device_type="GPU") 9 | if gpus: 10 | for gpu in gpus: 11 | tf.config.experimental.set_memory_growth(device=gpu, enable=True) 12 | 13 | # load model 14 | yolo_v3 = YOLOV3(out_channels=3 * (CATEGORY_NUM + 5)) 15 | yolo_v3.load_weights(filepath=save_model_dir+"saved_model") 16 | yolo_v3._set_inputs(inputs=tf.random.normal(shape=(1, IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS))) 17 | 18 | converter = tf.lite.TFLiteConverter.from_keras_model(yolo_v3) 19 | tflite_model = converter.convert() 20 | open(TFLite_model_dir, "wb").write(tflite_model) -------------------------------------------------------------------------------- /data_process/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calmiLovesAI/YOLOv3_TensorFlow2/7789280e4974f62b20304f9746aa4121be4c5333/data_process/__init__.py -------------------------------------------------------------------------------- /data_process/make_dataset.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 
from data_process.read_txt import ReadTxt
3 | from configuration import BATCH_SIZE, TXT_DIR
4 | import numpy as np
5 | 
6 | 
7 | def get_length_of_dataset(dataset):
8 |     count = 0
9 |     for _ in dataset:
10 |         count += 1
11 |     return count
12 | 
13 | 
14 | def generate_dataset():
15 |     txt_dataset = tf.data.TextLineDataset(filenames=TXT_DIR)
16 | 
17 |     train_count = get_length_of_dataset(txt_dataset)
18 |     train_dataset = txt_dataset.batch(batch_size=BATCH_SIZE)
19 | 
20 |     return train_dataset, train_count
21 | 
22 | 
23 | # Return :
24 | # image_name_list : list, length is N (N is the batch size.)
25 | # boxes_array : numpy.ndarray, shape is (N, MAX_TRUE_BOX_NUM_PER_IMG, 5)
26 | def parse_dataset_batch(dataset):
27 |     image_name_list = []
28 |     boxes_list = []
29 |     len_of_batch = dataset.shape[0]
30 |     for i in range(len_of_batch):
31 |         image_name, boxes = ReadTxt(line_bytes=dataset[i].numpy()).parse_line()
32 |         image_name_list.append(image_name)
33 |         boxes_list.append(boxes)
34 |     boxes_array = np.array(boxes_list)
35 |     return image_name_list, boxes_array
36 | 
--------------------------------------------------------------------------------
/data_process/parse_coco.py:
--------------------------------------------------------------------------------
1 | from configuration import COCO_DIR, COCO_CLASSES
2 | import json
3 | from pathlib import Path
4 | import time
5 | 
6 | from utils.resize_image import ResizeWithPad
7 | 
8 | 
9 | class ParseCOCO(object):
10 |     def __init__(self):
11 |         self.annotation_dir = COCO_DIR + "annotations/"
12 |         self.images_dir = COCO_DIR + "train2017/"
13 |         self.train_annotation = Path(self.annotation_dir + "instances_train2017.json")
14 |         start_time = time.time()
15 |         self.train_dict = self.__load_json(self.train_annotation)
16 |         print("It took {:.2f} seconds to load the json files.".format(time.time() - start_time))
17 |         print(self.__get_category_id_information(self.train_dict))
18 | 
19 |     def __load_json(self, json_file):
20 |         print("Start loading {}...".format(json_file.name))
21 |         with json_file.open(mode='r') as f:
22 |             load_dict = json.load(f)
23 |         print("Loading is complete!")
24 |         return load_dict
25 | 
26 |     def __find_all(self, x, value):
27 |         list_data = []
28 |         for i in range(len(x)):
29 |             if x[i] == value:
30 |                 list_data.append(i)
31 |         return list_data
32 | 
33 |     def __get_image_information(self, data_dict):
34 |         images = data_dict["images"]
35 |         image_file_list = []
36 |         image_id_list = []
37 |         image_height_list = []
38 |         image_width_list = []
39 |         for image in images:
40 |             image_file_list.append(image["file_name"])
41 |             image_id_list.append(image["id"])
42 |             image_height_list.append(image["height"])
43 |             image_width_list.append(image["width"])
44 |         return image_file_list, image_id_list, image_height_list, image_width_list
45 | 
46 |     def __get_bounding_box_information(self, data_dict):
47 |         annotations = data_dict["annotations"]
48 |         image_id_list = []
49 |         bbox_list = []
50 |         category_id_list = []
51 |         for annotation in annotations:
52 |             category_id_list.append(annotation["category_id"])
53 |             image_id_list.append(annotation["image_id"])
54 |             bbox_list.append(annotation["bbox"])
55 |         return image_id_list, bbox_list, category_id_list
56 | 
57 |     def __get_category_id_information(self, data_dict):
58 |         categories = data_dict["categories"]
59 |         category_dict = {}
60 |         for category in categories:
61 |             category_dict[category["name"]] = category["id"]
62 |         return category_dict
63 | 
64 |     def __process_coord(self, h, w, x_min, y_min, x_max, y_max):
65 |         x_min, y_min, x_max, y_max = ResizeWithPad(h=h, w=w).raw_to_resized(x_min=x_min, y_min=y_min, x_max=x_max, y_max=y_max)
66 |         return int(x_min), int(y_min), int(x_max), int(y_max)
67 | 
68 |     def __bbox_information(self, image_id, image_ids_from_annotation, bboxes, image_height, image_width, category_ids):
69 |         processed_bboxes = []
70 |         index_list = self.__find_all(x=image_ids_from_annotation, value=image_id)
71 |         for index in index_list:
72 |             x, y, w, h = bboxes[index]
73 |             xmax = int(x + w)
74 |             ymax = int(y + h)
75 |             x_min, y_min, x_max, y_max = self.__process_coord(h=image_height, w=image_width, x_min=x, y_min=y, x_max=xmax, y_max=ymax)
76 |             processed_bboxes.append([x_min, y_min, x_max, y_max, self.__category_id_transform(category_ids[index])])
77 |         return processed_bboxes
78 | 
79 |     def __category_id_transform(self, original_id):
80 |         category_id_dict = self.__get_category_id_information(self.train_dict)
81 |         original_name = "none"
82 |         for category_name, category_id in category_id_dict.items():
83 |             if category_id == original_id:
84 |                 original_name = category_name
85 |         if original_name == "none":
86 |             raise ValueError("An error occurred while transforming the category id.")
87 |         return COCO_CLASSES[original_name]
88 | 
89 |     def __bbox_str(self, bboxes):
90 |         bbox_info = ""
91 |         for bbox in bboxes:
92 |             for item in bbox:
93 |                 bbox_info += str(item)
94 |                 bbox_info += " "
95 |         return bbox_info.strip()
96 | 
97 |     def write_data_to_txt(self, txt_dir):
98 |         image_files, image_ids, image_heights, image_widths = self.__get_image_information(self.train_dict)
99 |         image_ids_from_annotation, bboxes, category_ids = self.__get_bounding_box_information(self.train_dict)
100 |         with open(file=txt_dir, mode="a+") as f:
101 |             picture_index = 0
102 |             for i in range(len(image_files)):
103 |                 write_line_start_time = time.time()
104 |                 line_info = ""
105 |                 line_info += image_files[i] + " "
106 |                 processed_bboxes = self.__bbox_information(image_ids[i],
107 |                                                            image_ids_from_annotation,
108 |                                                            bboxes,
109 |                                                            image_heights[i],
110 |                                                            image_widths[i],
111 |                                                            category_ids)
112 |                 # Skip pictures that have no valid boxes: a line without box
113 |                 # fields cannot be parsed by data_process/read_txt.py.
114 |                 if processed_bboxes:
115 |                     picture_index += 1
116 |                     line_info += self.__bbox_str(bboxes=processed_bboxes)
117 |                     line_info += "\n"
118 |                     print("Writing information of the {}th picture {} to {}, which took {:.2f}s".format(picture_index, image_files[i], txt_dir, time.time() - write_line_start_time))
119 |                     f.write(line_info)
120 | 
121 | 
122 | 
--------------------------------------------------------------------------------
/data_process/parse_voc.py:
--------------------------------------------------------------------------------
1 | import xml.dom.minidom as xdom
2 | from configuration import PASCAL_VOC_CLASSES, PASCAL_VOC_ANNOTATION, PASCAL_VOC_IMAGE
3 | import os
4 | from utils.resize_image import ResizeWithPad
5 | 
6 | 
7 | class ParsePascalVOC(object):
8 |     def __init__(self):
9 |         super(ParsePascalVOC, self).__init__()
10 |         self.all_xml_dir = PASCAL_VOC_ANNOTATION
11 |         self.all_image_dir = PASCAL_VOC_IMAGE
12 | 
13 |     def __str_to_int(self, x):
14 |         return int(float(x))
15 | 
16 |     def __process_coord(self, h, w, x_min, y_min, x_max, y_max):
17 |         h = self.__str_to_int(h)
18 |         w = self.__str_to_int(w)
19 |         x_min = self.__str_to_int(x_min)
20 |         y_min = self.__str_to_int(y_min)
21 |         x_max = self.__str_to_int(x_max)
22 |         y_max = self.__str_to_int(y_max)
23 | 
24 |         x_min, y_min, x_max, y_max = ResizeWithPad(h=h, w=w).raw_to_resized(x_min=x_min, y_min=y_min, x_max=x_max, y_max=y_max)
25 | 
26 |         return int(x_min), int(y_min), int(x_max), int(y_max)
27 | 
28 |     # parse one xml file
29 | 
def __parse_xml(self, xml): 30 | obj_and_box_list = [] 31 | DOMTree = xdom.parse(os.path.join(self.all_xml_dir, xml)) 32 | annotation = DOMTree.documentElement 33 | image_name = annotation.getElementsByTagName("filename")[0].childNodes[0].data 34 | size = annotation.getElementsByTagName("size") 35 | image_height = 0 36 | image_width = 0 37 | for s in size: 38 | image_height = s.getElementsByTagName("height")[0].childNodes[0].data 39 | image_width = s.getElementsByTagName("width")[0].childNodes[0].data 40 | 41 | obj = annotation.getElementsByTagName("object") 42 | for o in obj: 43 | o_list = [] 44 | obj_name = o.getElementsByTagName("name")[0].childNodes[0].data 45 | bndbox = o.getElementsByTagName("bndbox") 46 | for box in bndbox: 47 | xmin = box.getElementsByTagName("xmin")[0].childNodes[0].data 48 | ymin = box.getElementsByTagName("ymin")[0].childNodes[0].data 49 | xmax = box.getElementsByTagName("xmax")[0].childNodes[0].data 50 | ymax = box.getElementsByTagName("ymax")[0].childNodes[0].data 51 | xmin, ymin, xmax, ymax = self.__process_coord(image_height, image_width, xmin, ymin, xmax, ymax) 52 | o_list.append(xmin) 53 | o_list.append(ymin) 54 | o_list.append(xmax) 55 | o_list.append(ymax) 56 | break 57 | o_list.append(PASCAL_VOC_CLASSES[obj_name]) 58 | obj_and_box_list.append(o_list) 59 | return image_name, obj_and_box_list 60 | 61 | def __combine_info(self, image_name, box_list): 62 | line_str = image_name 63 | line_str += " " 64 | for box in box_list: 65 | for item in box: 66 | item_str = str(item) 67 | line_str += item_str 68 | line_str += " " 69 | line_str = line_str.strip() 70 | return line_str 71 | 72 | def write_data_to_txt(self, txt_dir): 73 | for item in os.listdir(self.all_xml_dir): 74 | image_name, box_list = self.__parse_xml(xml=item) 75 | print("Writing information of picture {} to {}".format(image_name, txt_dir)) 76 | # Combine the information into one line. 
77 | line_info = self.__combine_info(image_name, box_list) 78 | line_info += "\n" 79 | with open(txt_dir, mode="a+") as f: 80 | f.write(line_info) 81 | -------------------------------------------------------------------------------- /data_process/read_txt.py: -------------------------------------------------------------------------------- 1 | from configuration import MAX_TRUE_BOX_NUM_PER_IMG 2 | 3 | 4 | class ReadTxt(object): 5 | def __init__(self, line_bytes): 6 | super(ReadTxt, self).__init__() 7 | # bytes -> string 8 | self.line_str = bytes.decode(line_bytes, encoding="utf-8") 9 | 10 | def parse_line(self): 11 | line_info = self.line_str.strip('\n') 12 | split_line = line_info.split(" ") 13 | box_num = (len(split_line) - 1) / 5 14 | image_name = split_line[0] 15 | # print("Reading {}".format(image_name)) 16 | split_line = split_line[1:] 17 | boxes = [] 18 | for i in range(MAX_TRUE_BOX_NUM_PER_IMG): 19 | if i < box_num: 20 | box_xmin = int(float(split_line[i * 5])) 21 | box_ymin = int(float(split_line[i * 5 + 1])) 22 | box_xmax = int(float(split_line[i * 5 + 2])) 23 | box_ymax = int(float(split_line[i * 5 + 3])) 24 | class_id = int(split_line[i * 5 + 4]) 25 | boxes.append([box_xmin, box_ymin, box_xmax, box_ymax, class_id]) 26 | else: 27 | box_xmin = 0 28 | box_ymin = 0 29 | box_xmax = 0 30 | box_ymax = 0 31 | class_id = 0 32 | boxes.append([box_xmin, box_ymin, box_xmax, box_ymax, class_id]) 33 | 34 | return image_name, boxes 35 | 36 | -------------------------------------------------------------------------------- /dataset/README.md: -------------------------------------------------------------------------------- 1 | Dataset -------------------------------------------------------------------------------- /detect_objects_in_video.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import cv2 3 | from configuration import test_video_dir, temp_frame_dir, CATEGORY_NUM, save_model_dir 4 | from test_on_single_image import single_image_inference 5 | from yolo.yolo_v3 import YOLOV3 6 | 7 | 8 | def frame_detection(frame, model): 9 | cv2.imwrite(filename=temp_frame_dir, img=frame) 10 | frame = single_image_inference(image_dir=temp_frame_dir, model=model) 11 | return frame 12 | 13 | 14 | if __name__ == '__main__': 15 | # GPU settings 16 | gpus = tf.config.experimental.list_physical_devices('GPU') 17 | if gpus: 18 | for gpu in gpus: 19 | tf.config.experimental.set_memory_growth(gpu, True) 20 | 21 | # load model 22 | yolo_v3 = YOLOV3(out_channels=3 * (CATEGORY_NUM + 5)) 23 | yolo_v3.load_weights(filepath=save_model_dir+"saved_model") 24 | 25 | capture = cv2.VideoCapture(test_video_dir) 26 | fps = capture.get(cv2.CAP_PROP_FPS) 27 | while True: 28 | ret, frame = capture.read() 29 | if ret: 30 | new_frame = frame_detection(frame, yolo_v3) 31 | cv2.namedWindow("detect result", flags=cv2.WINDOW_NORMAL) 32 | cv2.imshow("detect result", new_frame) 33 | cv2.waitKey(int(1000 / fps)) 34 | else: 35 | break 36 | capture.release() 37 | cv2.destroyAllWindows() -------------------------------------------------------------------------------- /parse_cfg.py: -------------------------------------------------------------------------------- 1 | from configuration import PASCAL_VOC_DIR, PASCAL_VOC_CLASSES, \ 2 | custom_dataset_classes, custom_dataset_dir, use_dataset, COCO_CLASSES, COCO_DIR 3 | 4 | 5 | class ParseCfg(): 6 | 7 | def get_images_dir(self): 8 | if use_dataset == "custom": 9 | return custom_dataset_dir 10 | elif use_dataset == "pascal_voc": 11 | 
return PASCAL_VOC_DIR + "JPEGImages"
12 |         elif use_dataset == "coco":
13 |             return COCO_DIR + "train2017"
14 | 
15 |     def get_classes(self):
16 |         if use_dataset == "custom":
17 |             return custom_dataset_classes
18 |         elif use_dataset == "pascal_voc":
19 |             return PASCAL_VOC_CLASSES
20 |         elif use_dataset == "coco":
21 |             return COCO_CLASSES
22 | 
23 | 
24 | 
--------------------------------------------------------------------------------
/saved_model/README.md:
--------------------------------------------------------------------------------
1 | The model will be saved here.
--------------------------------------------------------------------------------
/test_data/README.md:
--------------------------------------------------------------------------------
1 | Test pictures and videos
--------------------------------------------------------------------------------
/test_on_single_image.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import cv2
3 | from configuration import test_picture_dir, save_model_dir, CHANNELS, CATEGORY_NUM
4 | from parse_cfg import ParseCfg
5 | from yolo.inference import Inference
6 | from yolo.yolo_v3 import YOLOV3
7 | from utils.preprocess import resize_image_with_pad
8 | 
9 | 
10 | def find_class_name(class_id):
11 |     for k, v in ParseCfg().get_classes().items():
12 |         if v == class_id:
13 |             return k
14 | 
15 | 
16 | # shape of boxes : (N, 4)  (xmin, ymin, xmax, ymax)
17 | # shape of scores : (N,)
18 | # shape of classes : (N,)
19 | def draw_boxes_on_image(image, boxes, scores, classes):
20 | 
21 |     num_boxes = boxes.shape[0]
22 |     for i in range(num_boxes):
23 |         class_and_score = str(find_class_name(classes[i] + 1)) + ": " + str(scores[i].numpy())
24 |         # cv2 expects integer pixel coordinates.
25 |         cv2.rectangle(img=image, pt1=(int(boxes[i, 0]), int(boxes[i, 1])), pt2=(int(boxes[i, 2]), int(boxes[i, 3])), color=(255, 0, 0), thickness=2)
26 |         cv2.putText(img=image, text=class_and_score, org=(int(boxes[i, 0]), int(boxes[i, 1]) - 10), fontFace=cv2.FONT_HERSHEY_COMPLEX, fontScale=1.5, color=(0, 255, 255), thickness=2)
27 |     return image
28 | 
29 | 
30 | def single_image_inference(image_dir, model):
31 |     image = tf.io.decode_image(contents=tf.io.read_file(image_dir), channels=CHANNELS)
32 |     h = image.shape[0]
33 |     w = image.shape[1]
34 |     input_image_shape = tf.constant([h, w], dtype=tf.dtypes.float32)
35 |     img_tensor = resize_image_with_pad(image)
36 |     img_tensor = tf.dtypes.cast(img_tensor, dtype=tf.dtypes.float32)
37 |     yolo_output = model(img_tensor, training=False)
38 |     boxes, scores, classes = Inference(yolo_output=yolo_output, input_image_shape=input_image_shape).get_final_boxes()
39 |     image_with_boxes = draw_boxes_on_image(cv2.imread(image_dir), boxes, scores, classes)
40 |     return image_with_boxes
41 | 
42 | 
43 | if __name__ == '__main__':
44 |     # GPU settings
45 |     gpus = tf.config.list_physical_devices(device_type="GPU")
46 |     if gpus:
47 |         for gpu in gpus:
48 |             tf.config.experimental.set_memory_growth(device=gpu, enable=True)
49 | 
50 |     # load model
51 |     yolo_v3 = YOLOV3(out_channels=3 * (CATEGORY_NUM + 5))
52 |     yolo_v3.load_weights(filepath=save_model_dir+"saved_model")
53 |     # inference
54 |     image = single_image_inference(image_dir=test_picture_dir, model=yolo_v3)
55 | 
56 |     cv2.namedWindow("detect result", flags=cv2.WINDOW_NORMAL)
57 |     cv2.imshow("detect result", image)
58 |     cv2.waitKey(0)
--------------------------------------------------------------------------------
/test_results_during_training/README.md:
-------------------------------------------------------------------------------- 1 | The test results on the sample images during training will be saved here. -------------------------------------------------------------------------------- /train_from_scratch.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | from utils.visualize import visualize_training_results 4 | from yolo.yolo_v3 import YOLOV3 5 | from configuration import CATEGORY_NUM, IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS, EPOCHS, BATCH_SIZE, \ 6 | save_model_dir, save_frequency, load_weights_before_training, load_weights_from_epoch, \ 7 | test_images_during_training, test_images 8 | from yolo.loss import YoloLoss 9 | from data_process.make_dataset import generate_dataset, parse_dataset_batch 10 | from yolo.make_label import GenerateLabel 11 | from utils.preprocess import process_image_filenames 12 | 13 | 14 | def print_model_summary(network): 15 | network.build(input_shape=(None, IMAGE_HEIGHT, IMAGE_WIDTH, CHANNELS)) 16 | network.summary() 17 | 18 | 19 | def generate_label_batch(true_boxes): 20 | true_label = GenerateLabel(true_boxes=true_boxes, input_shape=[IMAGE_HEIGHT, IMAGE_WIDTH]).generate_label() 21 | return true_label 22 | 23 | 24 | if __name__ == '__main__': 25 | # GPU settings 26 | gpus = tf.config.list_physical_devices(device_type="GPU") 27 | if gpus: 28 | for gpu in gpus: 29 | tf.config.experimental.set_memory_growth(device=gpu, enable=True) 30 | 31 | # dataset 32 | train_dataset, train_count = generate_dataset() 33 | 34 | net = YOLOV3(out_channels=3 * (CATEGORY_NUM + 5)) 35 | print_model_summary(network=net) 36 | 37 | if load_weights_before_training: 38 | net.load_weights(filepath=save_model_dir+"epoch-{}".format(load_weights_from_epoch)) 39 | print("Successfully load weights!") 40 | else: 41 | load_weights_from_epoch = -1 42 | 43 | # loss and optimizer 44 | yolo_loss = YoloLoss() 45 | lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay( 46 | initial_learning_rate=0.001, 47 | decay_steps=3000, 48 | decay_rate=0.96, 49 | staircase=True 50 | ) 51 | optimizer = tf.optimizers.Adam(learning_rate=lr_schedule) 52 | 53 | 54 | # metrics 55 | loss_metric = tf.metrics.Mean() 56 | 57 | def train_step(image_batch, label_batch): 58 | with tf.GradientTape() as tape: 59 | yolo_output = net(image_batch, training=True) 60 | loss = yolo_loss(y_true=label_batch, y_pred=yolo_output) 61 | gradients = tape.gradient(loss, net.trainable_variables) 62 | optimizer.apply_gradients(grads_and_vars=zip(gradients, net.trainable_variables)) 63 | loss_metric.update_state(values=loss) 64 | 65 | 66 | for epoch in range(load_weights_from_epoch + 1, EPOCHS): 67 | step = 0 68 | for dataset_batch in train_dataset: 69 | step += 1 70 | images, boxes = parse_dataset_batch(dataset=dataset_batch) 71 | labels = generate_label_batch(true_boxes=boxes) 72 | train_step(image_batch=process_image_filenames(images), label_batch=labels) 73 | print("Epoch: {}/{}, step: {}/{}, loss: {:.5f}".format(epoch, 74 | EPOCHS, 75 | step, 76 | tf.math.ceil(train_count / BATCH_SIZE), 77 | loss_metric.result())) 78 | 79 | loss_metric.reset_states() 80 | 81 | if epoch % save_frequency == 0: 82 | net.save_weights(filepath=save_model_dir+"epoch-{}".format(epoch), save_format='tf') 83 | 84 | if test_images_during_training: 85 | visualize_training_results(pictures=test_images, model=net, epoch=epoch) 86 | 87 | net.save_weights(filepath=save_model_dir+"saved_model", save_format='tf') 
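88 | 
89 |     # Shape summary for one training step above (B = BATCH_SIZE, C = CATEGORY_NUM):
90 |     #   dataset_batch                   : (B,) string tensor, one data.txt line per image
91 |     #   boxes                           : (B, MAX_TRUE_BOX_NUM_PER_IMG, 5) numpy array, zero-padded with empty boxes
92 |     #   labels                          : 3 numpy arrays of shape (B, S, S, 3, C + 5) for S in (13, 26, 52)
93 |     #   process_image_filenames(images) : (B, IMAGE_HEIGHT, IMAGE_WIDTH, 3) float32 tensor scaled to [0, 1]
94 |     #   net(image_batch)                : 3 tensors of shape (B, S, S, 3 * (C + 5))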
-------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calmiLovesAI/YOLOv3_TensorFlow2/7789280e4974f62b20304f9746aa4121be4c5333/utils/__init__.py -------------------------------------------------------------------------------- /utils/iou.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | # Box_1 and box_2 have different center points, and their last dimension is 4 (x, y, w, h). 5 | class IOUDifferentXY(): 6 | def __init__(self, box_1, box_2): 7 | super(IOUDifferentXY, self).__init__() 8 | self.box_1_min, self.box_1_max = IOUDifferentXY.__get_box_min_and_max(box_1) 9 | self.box_2_min, self.box_2_max = IOUDifferentXY.__get_box_min_and_max(box_2) 10 | self.box_1_area = IOUDifferentXY.__get_box_area(box_1) 11 | self.box_2_area = IOUDifferentXY.__get_box_area(box_2) 12 | 13 | @staticmethod 14 | def __get_box_min_and_max(box): 15 | box_xy = box[..., 0:2] 16 | box_wh = box[..., 2:4] 17 | box_min = box_xy - box_wh / 2 18 | box_max = box_xy + box_wh / 2 19 | return box_min, box_max 20 | 21 | @staticmethod 22 | def __get_box_area(box): 23 | return box[..., 2] * box[..., 3] 24 | 25 | def calculate_iou(self): 26 | intersect_min = np.maximum(self.box_1_min, self.box_2_min) 27 | intersect_max = np.minimum(self.box_1_max, self.box_2_max) 28 | intersect_wh = np.maximum(intersect_max - intersect_min, 0.0) 29 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 30 | union_area = self.box_1_area + self.box_2_area - intersect_area 31 | iou = intersect_area / union_area 32 | return iou 33 | 34 | 35 | # Calculate the IOU between two boxes, both center points are (0, 0). 
36 | # The shape of anchors : [1, 9, 2]
37 | # The shape of boxes : [N, 1, 2]
38 | class IOUSameXY():
39 |     def __init__(self, anchors, boxes):
40 |         super(IOUSameXY, self).__init__()
41 |         self.anchor_max = anchors / 2
42 |         self.anchor_min = - self.anchor_max
43 |         self.box_max = boxes / 2
44 |         self.box_min = - self.box_max
45 |         self.anchor_area = anchors[..., 0] * anchors[..., 1]
46 |         self.box_area = boxes[..., 0] * boxes[..., 1]
47 | 
48 |     def calculate_iou(self):
49 |         intersect_min = np.maximum(self.box_min, self.anchor_min)
50 |         intersect_max = np.minimum(self.box_max, self.anchor_max)
51 |         intersect_wh = np.maximum(intersect_max - intersect_min + 1.0, 0.0)
52 |         intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]  # w * h
53 |         union_area = self.anchor_area + self.box_area - intersect_area
54 |         iou = intersect_area / union_area  # shape : [N, 9]
55 | 
56 |         return iou
57 | 
58 | 
59 | 
--------------------------------------------------------------------------------
/utils/nms.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | from configuration import IOU_THRESHOLD, CONFIDENCE_THRESHOLD, MAX_BOX_NUM, CATEGORY_NUM
3 | 
4 | 
5 | class NMS():
6 |     def __init__(self):
7 |         super(NMS, self).__init__()
8 |         self.max_box_num = MAX_BOX_NUM
9 |         self.num_class = CATEGORY_NUM
10 | 
11 |     def nms(self, boxes, box_scores):
12 |         mask = box_scores >= CONFIDENCE_THRESHOLD
13 |         box_list = []
14 |         score_list = []
15 |         class_list = []
16 |         for i in range(self.num_class):
17 |             box_of_class = tf.boolean_mask(boxes, mask[:, i])
18 |             score_of_class = tf.boolean_mask(box_scores[:, i], mask[:, i])
19 |             selected_indices = tf.image.non_max_suppression(boxes=box_of_class,
20 |                                                             scores=score_of_class,
21 |                                                             max_output_size=self.max_box_num,
22 |                                                             iou_threshold=IOU_THRESHOLD)
23 |             selected_boxes = tf.gather(box_of_class, selected_indices)
24 |             selected_scores = tf.gather(score_of_class, selected_indices)
25 |             classes = tf.ones_like(selected_scores, dtype=tf.dtypes.int32) * i
26 |             box_list.append(selected_boxes)
27 |             score_list.append(selected_scores)
28 |             class_list.append(classes)
29 |         box_array = tf.concat(values=box_list, axis=0)
30 |         score_array = tf.concat(values=score_list, axis=0)
31 |         class_array = tf.concat(values=class_list, axis=0)
32 | 
33 |         return box_array, score_array, class_array
--------------------------------------------------------------------------------
/utils/preprocess.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | from configuration import IMAGE_WIDTH, IMAGE_HEIGHT, CHANNELS
3 | from parse_cfg import ParseCfg
4 | import os
5 | 
6 | 
7 | def resize_image_with_pad(image):
8 |     image_tensor = tf.image.resize_with_pad(image=image, target_height=IMAGE_HEIGHT, target_width=IMAGE_WIDTH)
9 |     image_tensor = tf.cast(image_tensor, tf.float32)
10 |     image_tensor = image_tensor / 255.0
11 |     image_tensor = tf.expand_dims(image_tensor, axis=0)
12 |     return image_tensor
13 | 
14 | 
15 | def process_single_image(image_filename):
16 |     img_raw = tf.io.read_file(image_filename)
17 |     image = tf.io.decode_jpeg(img_raw, channels=CHANNELS)
18 |     image = resize_image_with_pad(image=image)
19 |     # Note: resize_image_with_pad already casts to float32 and scales to [0, 1],
20 |     # so the image must not be divided by 255 again here.
21 |     return image
22 | 
23 | 
24 | def process_image_filenames(filenames):
25 |     image_list = []
26 |     for filename in filenames:
27 |         image_path = os.path.join(ParseCfg().get_images_dir(), filename)
28 |         image_tensor = 
process_single_image(image_path) 29 | image_list.append(image_tensor) 30 | return tf.concat(values=image_list, axis=0) 31 | 32 | -------------------------------------------------------------------------------- /utils/resize_image.py: -------------------------------------------------------------------------------- 1 | from configuration import IMAGE_HEIGHT, IMAGE_WIDTH 2 | 3 | 4 | class ResizeWithPad(): 5 | def __init__(self, h, w): 6 | super(ResizeWithPad, self).__init__() 7 | self.H = IMAGE_HEIGHT 8 | self.W = IMAGE_WIDTH 9 | self.h = h 10 | self.w = w 11 | 12 | def get_transform_coefficient(self): 13 | if self.h <= self.w: 14 | longer_edge = "w" 15 | scale = self.W / self.w 16 | padding_length = (self.H - self.h * scale) / 2 17 | else: 18 | longer_edge = "h" 19 | scale = self.H / self.h 20 | padding_length = (self.W - self.w * scale) / 2 21 | return longer_edge, scale, padding_length 22 | 23 | def raw_to_resized(self, x_min, y_min, x_max, y_max): 24 | longer_edge, scale, padding_length = self.get_transform_coefficient() 25 | x_min = x_min * scale 26 | x_max = x_max * scale 27 | y_min = y_min * scale 28 | y_max = y_max * scale 29 | if longer_edge == "h": 30 | x_min += padding_length 31 | x_max += padding_length 32 | else: 33 | y_min += padding_length 34 | y_max += padding_length 35 | return x_min, y_min, x_max, y_max 36 | 37 | def resized_to_raw(self, center_x, center_y, width, height): 38 | longer_edge, scale, padding_length = self.get_transform_coefficient() 39 | center_x *= self.W 40 | width *= self.W 41 | center_y *= self.H 42 | height *= self.H 43 | if longer_edge == "h": 44 | center_x -= padding_length 45 | else: 46 | center_y -= padding_length 47 | center_x = center_x / scale 48 | center_y = center_y / scale 49 | width = width / scale 50 | height = height / scale 51 | return center_x, center_y, width, height -------------------------------------------------------------------------------- /utils/visualize.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | from test_on_single_image import single_image_inference 3 | from configuration import training_results_save_dir 4 | 5 | 6 | def visualize_training_results(pictures, model, epoch): 7 | # pictures : List of image directories. 
8 | index = 0 9 | for picture in pictures: 10 | index += 1 11 | result = single_image_inference(image_dir=picture, model=model) 12 | cv2.imwrite(filename=training_results_save_dir + "epoch-{}-picture-{}.jpg".format(epoch, index), img=result) 13 | 14 | -------------------------------------------------------------------------------- /write_coco_to_txt.py: -------------------------------------------------------------------------------- 1 | from data_process.parse_coco import ParseCOCO 2 | from configuration import TXT_DIR 3 | 4 | 5 | if __name__ == '__main__': 6 | coco = ParseCOCO() 7 | coco.write_data_to_txt(txt_dir=TXT_DIR) -------------------------------------------------------------------------------- /write_voc_to_txt.py: -------------------------------------------------------------------------------- 1 | from data_process.parse_voc import ParsePascalVOC 2 | from configuration import TXT_DIR 3 | 4 | 5 | if __name__ == '__main__': 6 | ParsePascalVOC().write_data_to_txt(txt_dir=TXT_DIR) -------------------------------------------------------------------------------- /yolo/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calmiLovesAI/YOLOv3_TensorFlow2/7789280e4974f62b20304f9746aa4121be4c5333/yolo/__init__.py -------------------------------------------------------------------------------- /yolo/anchor.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from configuration import COCO_ANCHORS, COCO_ANCHOR_INDEX 3 | 4 | 5 | def get_coco_anchors(scale_type): 6 | index_list = COCO_ANCHOR_INDEX[scale_type] 7 | return tf.convert_to_tensor(COCO_ANCHORS[index_list[0]: index_list[-1] + 1], dtype=tf.dtypes.float32) -------------------------------------------------------------------------------- /yolo/bounding_box.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from configuration import ANCHOR_NUM_EACH_SCALE, CATEGORY_NUM, IMAGE_HEIGHT 3 | from yolo.anchor import get_coco_anchors 4 | 5 | 6 | def generate_grid_index(grid_dim): 7 | x = tf.range(grid_dim, dtype=tf.dtypes.float32) 8 | y = tf.range(grid_dim, dtype=tf.dtypes.float32) 9 | X, Y = tf.meshgrid(x, y) 10 | X = tf.reshape(X, shape=(-1, 1)) 11 | Y = tf.reshape(Y, shape=(-1, 1)) 12 | return tf.concat(values=[X, Y], axis=-1) 13 | 14 | 15 | def bounding_box_predict(feature_map, scale_type, is_training=False): 16 | h = feature_map.shape[1] 17 | w = feature_map.shape[2] 18 | if h != w: 19 | raise ValueError("The shape[1] and shape[2] of feature map must be the same value.") 20 | area = h * w 21 | pred = tf.reshape(feature_map, shape=(-1, ANCHOR_NUM_EACH_SCALE * area, CATEGORY_NUM + 5)) 22 | # pred = tf.nn.sigmoid(pred) 23 | tx_ty, tw_th, confidence, class_prob = tf.split(pred, num_or_size_splits=[2, 2, 1, CATEGORY_NUM], axis=-1) 24 | confidence = tf.nn.sigmoid(confidence) 25 | class_prob = tf.nn.sigmoid(class_prob) 26 | center_index = generate_grid_index(grid_dim=h) 27 | center_index = tf.tile(center_index, [1, ANCHOR_NUM_EACH_SCALE]) 28 | center_index = tf.reshape(center_index, shape=(1, -1, 2)) 29 | # shape : (1, 507, 2), (1, 2028, 2), (1, 8112, 2) 30 | 31 | center_coord = center_index + tf.nn.sigmoid(tx_ty) 32 | anchors = tf.tile(get_coco_anchors(scale_type) / IMAGE_HEIGHT, [area, 1]) # shape: (507, 2), (2028, 2), (8112, 2) 33 | bw_bh = tf.math.exp(tw_th) * anchors 34 | 35 | box_xy = center_coord / h 36 | box_wh = bw_bh 37 | 38 | 39 | # 
reshape 40 | center_index = tf.reshape(center_index, shape=(-1, h, w, ANCHOR_NUM_EACH_SCALE, 2)) 41 | box_xy = tf.reshape(box_xy, shape=(-1, h, w, ANCHOR_NUM_EACH_SCALE, 2)) 42 | box_wh = tf.reshape(box_wh, shape=(-1, h, w, ANCHOR_NUM_EACH_SCALE, 2)) 43 | feature_map = tf.reshape(feature_map, shape=(-1, h, w, ANCHOR_NUM_EACH_SCALE, CATEGORY_NUM + 5)) 44 | 45 | # cast dtype 46 | center_index = tf.cast(center_index, dtype=tf.dtypes.float32) 47 | box_xy = tf.cast(box_xy, dtype=tf.dtypes.float32) 48 | box_wh = tf.cast(box_wh, dtype=tf.dtypes.float32) 49 | 50 | if is_training: 51 | return box_xy, box_wh, center_index, feature_map 52 | else: 53 | return box_xy, box_wh, confidence, class_prob 54 | -------------------------------------------------------------------------------- /yolo/inference.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from yolo.bounding_box import bounding_box_predict 3 | from configuration import CATEGORY_NUM, SCALE_SIZE 4 | from utils.nms import NMS 5 | from utils.resize_image import ResizeWithPad 6 | 7 | 8 | class Inference(): 9 | def __init__(self, yolo_output, input_image_shape): 10 | super(Inference, self).__init__() 11 | self.yolo_output = yolo_output 12 | self.input_image_h = input_image_shape[0] 13 | self.input_image_w = input_image_shape[1] 14 | 15 | def __yolo_post_processing(self, feature, scale_type): 16 | box_xy, box_wh, confidence, class_prob = bounding_box_predict(feature_map=feature, 17 | scale_type=scale_type, 18 | is_training=False) 19 | boxes = self.__boxes_to_original_image(box_xy, box_wh) 20 | boxes = tf.reshape(boxes, shape=(-1, 4)) 21 | box_scores = confidence * class_prob 22 | box_scores = tf.reshape(box_scores, shape=(-1, CATEGORY_NUM)) 23 | return boxes, box_scores 24 | 25 | def __boxes_to_original_image(self, box_xy, box_wh): 26 | x = tf.expand_dims(box_xy[..., 0], axis=-1) 27 | y = tf.expand_dims(box_xy[..., 1], axis=-1) 28 | w = tf.expand_dims(box_wh[..., 0], axis=-1) 29 | h = tf.expand_dims(box_wh[..., 1], axis=-1) 30 | x, y, w, h = ResizeWithPad(h=self.input_image_h, w=self.input_image_w).resized_to_raw(center_x=x, center_y=y, width=w, height=h) 31 | xmin = x - w / 2 32 | ymin = y - h / 2 33 | xmax = x + w / 2 34 | ymax = y + h / 2 35 | boxes = tf.concat(values=[xmin, ymin, xmax, ymax], axis=-1) 36 | return boxes 37 | 38 | def get_final_boxes(self): 39 | boxes_list = [] 40 | box_scores_list = [] 41 | for i in range(len(SCALE_SIZE)): 42 | boxes, box_scores = self.__yolo_post_processing(feature=self.yolo_output[i], 43 | scale_type=i) 44 | boxes_list.append(boxes) 45 | box_scores_list.append(box_scores) 46 | boxes_array = tf.concat(boxes_list, axis=0) 47 | box_scores_array = tf.concat(box_scores_list, axis=0) 48 | return NMS().nms(boxes=boxes_array, box_scores=box_scores_array) 49 | 50 | -------------------------------------------------------------------------------- /yolo/loss.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from configuration import SCALE_SIZE, IMAGE_HEIGHT, IMAGE_WIDTH, IGNORE_THRESHOLD 3 | from utils.iou import IOUDifferentXY 4 | from yolo.bounding_box import bounding_box_predict 5 | from yolo.anchor import get_coco_anchors 6 | 7 | 8 | class YoloLoss(tf.keras.losses.Loss): 9 | def __init__(self): 10 | super(YoloLoss, self).__init__() 11 | self.scale_num = len(SCALE_SIZE) 12 | 13 | def call(self, y_true, y_pred): 14 | loss = self.__calculate_loss(y_true=y_true, y_pred=y_pred) 15 | return loss 
16 | 17 | def __generate_grid_shape(self): 18 | scale_tensor = tf.convert_to_tensor(SCALE_SIZE, dtype=tf.dtypes.float32) 19 | grid_shape = tf.stack(values=[scale_tensor, scale_tensor], axis=-1) 20 | return grid_shape 21 | 22 | def __get_scale_size(self, scale): 23 | return tf.convert_to_tensor([IMAGE_HEIGHT, IMAGE_WIDTH], dtype=tf.dtypes.float32) / get_coco_anchors(scale_type=scale) 24 | 25 | def __binary_crossentropy_keep_dim(self, y_true, y_pred, from_logits): 26 | x = tf.keras.losses.binary_crossentropy(y_true=y_true, y_pred=y_pred, from_logits=from_logits) 27 | x = tf.expand_dims(x, axis=-1) 28 | return x 29 | 30 | def __calculate_loss(self, y_true, y_pred): 31 | grid_shapes = self.__generate_grid_shape() 32 | total_loss = 0 33 | # batch size 34 | B = y_pred[0].shape[0] 35 | B_int = tf.convert_to_tensor(B, dtype=tf.dtypes.int32) # tf.Tensor(4, shape=(), dtype=int32) 36 | B_float = tf.convert_to_tensor(B, dtype=tf.dtypes.float32) # tf.Tensor(4.0, shape=(), dtype=float32) 37 | for i in range(self.scale_num): 38 | true_object_mask = y_true[i][..., 4:5] 39 | true_object_mask_bool = tf.cast(true_object_mask, dtype=tf.dtypes.bool) 40 | true_class_probs = y_true[i][..., 5:] 41 | 42 | pred_xy, pred_wh, grid, pred_features = bounding_box_predict(feature_map=y_pred[i], 43 | scale_type=i, 44 | is_training=True) 45 | 46 | pred_box = tf.concat(values=[pred_xy, pred_wh], axis=-1) 47 | true_xy_offset = y_true[i][..., 0:2] * grid_shapes[i] - grid 48 | true_wh_offset = tf.math.log(y_true[i][..., 2:4] * self.__get_scale_size(scale=i) + 1e-10) 49 | true_wh_offset = tf.keras.backend.switch(true_object_mask_bool, true_wh_offset, tf.zeros_like(true_wh_offset)) 50 | 51 | 52 | box_loss_scale = 2 - y_true[i][..., 2:3] * y_true[i][..., 3:4] 53 | 54 | ignore_mask = tf.TensorArray(dtype=tf.dtypes.float32, size=1, dynamic_size=True) 55 | 56 | def loop_body(b, ignore_mask): 57 | true_box = tf.boolean_mask(y_true[i][b, ..., 0:4], true_object_mask_bool[b, ..., 0]) 58 | true_box = tf.cast(true_box, dtype=tf.dtypes.float32) 59 | # expand dim for broadcasting 60 | box_1 = tf.expand_dims(pred_box[b], axis=-2) 61 | box_2 = tf.expand_dims(true_box, axis=0) 62 | iou = IOUDifferentXY(box_1=box_1, box_2=box_2).calculate_iou() 63 | best_iou = tf.keras.backend.max(iou, axis=-1) 64 | ignore_mask = ignore_mask.write(b, tf.cast(best_iou < IGNORE_THRESHOLD, dtype=tf.dtypes.float32)) 65 | return b + 1, ignore_mask 66 | 67 | _, ignore_mask = tf.while_loop(lambda b, *args: b < B_int, loop_body, [0, ignore_mask]) 68 | ignore_mask = ignore_mask.stack() 69 | ignore_mask = tf.expand_dims(ignore_mask, axis=-1) 70 | 71 | xy_loss = true_object_mask * box_loss_scale * self.__binary_crossentropy_keep_dim(true_xy_offset, pred_features[..., 0:2], from_logits=True) 72 | wh_loss = true_object_mask * box_loss_scale * 0.5 * tf.math.square(true_wh_offset - pred_features[..., 2:4]) 73 | confidence_loss = true_object_mask * self.__binary_crossentropy_keep_dim(true_object_mask, pred_features[..., 4:5], from_logits=True) + (1 - true_object_mask) * self.__binary_crossentropy_keep_dim(true_object_mask, pred_features[..., 4:5], from_logits=True) * ignore_mask 74 | class_loss = true_object_mask * self.__binary_crossentropy_keep_dim(true_class_probs, pred_features[..., 5:], from_logits=True) 75 | 76 | average_xy_loss = tf.keras.backend.sum(xy_loss) / B_float 77 | average_wh_loss = tf.keras.backend.sum(wh_loss) / B_float 78 | average_confidence_loss = tf.keras.backend.sum(confidence_loss) / B_float 79 | average_class_loss = tf.keras.backend.sum(class_loss) / 
B_float 80 | total_loss += average_xy_loss + average_wh_loss + average_confidence_loss + average_class_loss 81 | 82 | return total_loss 83 | -------------------------------------------------------------------------------- /yolo/make_label.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from configuration import CATEGORY_NUM, SCALE_SIZE, \ 3 | COCO_ANCHORS, ANCHOR_NUM_EACH_SCALE, COCO_ANCHOR_INDEX 4 | from utils import iou 5 | 6 | 7 | class GenerateLabel(): 8 | def __init__(self, true_boxes, input_shape): 9 | super(GenerateLabel, self).__init__() 10 | self.true_boxes = np.array(true_boxes, dtype=np.float32) 11 | self.input_shape = np.array(input_shape, dtype=np.int32) 12 | self.anchors = np.array(COCO_ANCHORS, dtype=np.float32) 13 | self.batch_size = self.true_boxes.shape[0] 14 | 15 | def generate_label(self): 16 | center_xy = (self.true_boxes[..., 0:2] + self.true_boxes[..., 2:4]) // 2 # shape : [B, N, 2] 17 | box_wh = self.true_boxes[..., 2:4] - self.true_boxes[..., 0:2] # shape : [B, N, 2] 18 | self.true_boxes[..., 0:2] = center_xy / self.input_shape # Normalization 19 | self.true_boxes[..., 2:4] = box_wh / self.input_shape # Normalization 20 | true_label_1 = np.zeros((self.batch_size, SCALE_SIZE[0], SCALE_SIZE[0], ANCHOR_NUM_EACH_SCALE, CATEGORY_NUM + 5)) 21 | true_label_2 = np.zeros((self.batch_size, SCALE_SIZE[1], SCALE_SIZE[1], ANCHOR_NUM_EACH_SCALE, CATEGORY_NUM + 5)) 22 | true_label_3 = np.zeros((self.batch_size, SCALE_SIZE[2], SCALE_SIZE[2], ANCHOR_NUM_EACH_SCALE, CATEGORY_NUM + 5)) 23 | # true_label : list of 3 arrays of type numpy.ndarray(all elements are 0), which shapes are: 24 | # (self.batch_size, 13, 13, 3, 5 + C) 25 | # (self.batch_size, 26, 26, 3, 5 + C) 26 | # (self.batch_size, 52, 52, 3, 5 + C) 27 | true_label = [true_label_1, true_label_2, true_label_3] 28 | # shape : (9, 2) --> (1, 9, 2) 29 | anchors = np.expand_dims(self.anchors, axis=0) 30 | # valid_mask filters out the valid boxes. 31 | valid_mask = box_wh[..., 0] > 0 32 | 33 | for b in range(self.batch_size): 34 | wh = box_wh[b, valid_mask[b]] 35 | if len(wh) == 0: 36 | # For pictures without boxes, iou is not calculated. 37 | continue 38 | # shape of wh : [N, 1, 2], N is the actual number of boxes per picture 39 | wh = np.expand_dims(wh, axis=1) 40 | # Calculate the iou between the box and the anchor, both center points are (0, 0). 
41 | iou_value = iou.IOUSameXY(anchors=anchors, boxes=wh).calculate_iou() 42 | # shape of best_anchor : [N] 43 | best_anchor = np.argmax(iou_value, axis=-1) 44 | for i, n in enumerate(best_anchor): 45 | for s in range(ANCHOR_NUM_EACH_SCALE): 46 | if n in COCO_ANCHOR_INDEX[s]: 47 | x = np.floor(self.true_boxes[b, i, 0] * SCALE_SIZE[s]).astype('int32') 48 | y = np.floor(self.true_boxes[b, i, 1] * SCALE_SIZE[s]).astype('int32') 49 | anchor_id = COCO_ANCHOR_INDEX[s].index(n) 50 | class_id = self.true_boxes[b, i, 4].astype('int32') 51 | true_label[s][b, y, x, anchor_id, 0:4] = self.true_boxes[b, i, 0:4] 52 | true_label[s][b, y, x, anchor_id, 4] = 1 53 | true_label[s][b, y, x, anchor_id, 5 + class_id - 1] = 1 54 | 55 | return true_label 56 | -------------------------------------------------------------------------------- /yolo/yolo_v3.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | 4 | class DarkNetConv2D(tf.keras.layers.Layer): 5 | def __init__(self, filters, kernel_size, strides): 6 | super(DarkNetConv2D, self).__init__() 7 | self.conv = tf.keras.layers.Conv2D(filters=filters, 8 | kernel_size=kernel_size, 9 | strides=strides, 10 | padding="same") 11 | self.bn = tf.keras.layers.BatchNormalization() 12 | 13 | def call(self, inputs, training=None, **kwargs): 14 | x = self.conv(inputs) 15 | x = self.bn(x, training=training) 16 | x = tf.nn.leaky_relu(x, alpha=0.1) 17 | return x 18 | 19 | 20 | class ResidualBlock(tf.keras.layers.Layer): 21 | def __init__(self, filters): 22 | super(ResidualBlock, self).__init__() 23 | self.conv1 = DarkNetConv2D(filters=filters, kernel_size=(1, 1), strides=1) 24 | self.conv2 = DarkNetConv2D(filters=filters * 2, kernel_size=(3, 3), strides=1) 25 | 26 | def call(self, inputs, training=None, **kwargs): 27 | x = self.conv1(inputs, training=training) 28 | x = self.conv2(x, training=training) 29 | x = tf.keras.layers.add([x, inputs]) 30 | return x 31 | 32 | 33 | def make_residual_block(filters, num_blocks): 34 | x = tf.keras.Sequential() 35 | x.add(DarkNetConv2D(filters=2 * filters, kernel_size=(3, 3), strides=2)) 36 | for _ in range(num_blocks): 37 | x.add(ResidualBlock(filters=filters)) 38 | return x 39 | 40 | 41 | class DarkNet53(tf.keras.Model): 42 | def __init__(self): 43 | super(DarkNet53, self).__init__() 44 | self.conv1 = DarkNetConv2D(filters=32, kernel_size=(3, 3), strides=1) 45 | self.block1 = make_residual_block(filters=32, num_blocks=1) 46 | self.block2 = make_residual_block(filters=64, num_blocks=2) 47 | self.block3 = make_residual_block(filters=128, num_blocks=8) 48 | self.block4 = make_residual_block(filters=256, num_blocks=8) 49 | self.block5 = make_residual_block(filters=512, num_blocks=4) 50 | 51 | def call(self, inputs, training=None, **kwargs): 52 | x = self.conv1(inputs, training=training) 53 | x = self.block1(x, training=training) 54 | x = self.block2(x, training=training) 55 | output_1 = self.block3(x, training=training) 56 | output_2 = self.block4(output_1, training=training) 57 | output_3 = self.block5(output_2, training=training) 58 | # print(output_1.shape, output_2.shape, output_3.shape) 59 | return output_3, output_2, output_1 60 | 61 | 62 | class YOLOTail(tf.keras.layers.Layer): 63 | def __init__(self, in_channels, out_channels): 64 | super(YOLOTail, self).__init__() 65 | self.conv1 = DarkNetConv2D(filters=in_channels, kernel_size=(1, 1), strides=1) 66 | self.conv2 = DarkNetConv2D(filters=2 * in_channels, kernel_size=(3, 3), strides=1) 67 | self.conv3 = 
DarkNetConv2D(filters=in_channels, kernel_size=(1, 1), strides=1) 68 | self.conv4 = DarkNetConv2D(filters=2 * in_channels, kernel_size=(3, 3), strides=1) 69 | self.conv5 = DarkNetConv2D(filters=in_channels, kernel_size=(1, 1), strides=1) 70 | 71 | self.conv6 = DarkNetConv2D(filters=2 * in_channels, kernel_size=(3, 3), strides=1) 72 | self.normal_conv = tf.keras.layers.Conv2D(filters=out_channels, 73 | kernel_size=(1, 1), 74 | strides=1, 75 | padding="same") 76 | 77 | def call(self, inputs, training=None, **kwargs): 78 | x = self.conv1(inputs, training=training) 79 | x = self.conv2(x, training=training) 80 | x = self.conv3(x, training=training) 81 | x = self.conv4(x, training=training) 82 | branch = self.conv5(x, training=training) 83 | 84 | stem = self.conv6(branch, training=training) 85 | stem = self.normal_conv(stem) 86 | return stem, branch 87 | 88 | 89 | class YOLOV3(tf.keras.Model): 90 | def __init__(self, out_channels): 91 | super(YOLOV3, self).__init__() 92 | self.darknet = DarkNet53() 93 | self.tail_1 = YOLOTail(in_channels=512, out_channels=out_channels) 94 | self.upsampling_1 = self._make_upsampling(num_filter=256) 95 | self.tail_2 = YOLOTail(in_channels=256, out_channels=out_channels) 96 | self.upsampling_2 = self._make_upsampling(num_filter=128) 97 | self.tail_3 = YOLOTail(in_channels=128, out_channels=out_channels) 98 | 99 | def _make_upsampling(self, num_filter): 100 | layer = tf.keras.Sequential() 101 | layer.add(DarkNetConv2D(filters=num_filter, kernel_size=(1, 1), strides=1)) 102 | layer.add(tf.keras.layers.UpSampling2D(size=(2, 2))) 103 | return layer 104 | 105 | def call(self, inputs, training=None, mask=None): 106 | x_1, x_2, x_3 = self.darknet(inputs, training=training) 107 | stem_1, branch_1 = self.tail_1(x_1, training=training) 108 | branch_1 = self.upsampling_1(branch_1, training=training) 109 | x_2 = tf.keras.layers.concatenate([branch_1, x_2]) 110 | stem_2, branch_2 = self.tail_2(x_2, training=training) 111 | branch_2 = self.upsampling_2(branch_2, training=training) 112 | x_3 = tf.keras.layers.concatenate([branch_2, x_3]) 113 | stem_3, _ = self.tail_3(x_3, training=training) 114 | 115 | return [stem_1, stem_2, stem_3] 116 | 117 | --------------------------------------------------------------------------------
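A quick sanity check of the three-scale output described above (a minimal sketch; `CATEGORY_NUM = 80` matches the default in configuration.py, giving 3 * (80 + 5) = 255 output channels per scale):

```python
import tensorflow as tf
from yolo.yolo_v3 import YOLOV3

# Build the model and push one dummy 416x416 RGB image through it.
model = YOLOV3(out_channels=3 * (80 + 5))
outputs = model(tf.random.normal(shape=(1, 416, 416, 3)), training=False)
# The three stems correspond to strides 32, 16 and 8:
# (1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)
for stem in outputs:
    print(stem.shape)
```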