├── .spyproject
│   ├── codestyle.ini
│   ├── encoding.ini
│   ├── vcs.ini
│   └── workspace.ini
├── README.md
├── checkpoint
│   └── yolov3_tiny_Car
│       ├── checkpoint
│       ├── log.txt
│       ├── model-step_144000_loss_2.974138_lr_3.433207e-06.data-00000-of-00001
│       ├── model-step_144000_loss_2.974138_lr_3.433207e-06.index
│       └── model-step_144000_loss_2.974138_lr_3.433207e-06.meta
├── convert_weight.py
├── data
│   ├── COCO.names
│   ├── Car.names
│   ├── darknet_weights
│   │   └── readme
│   ├── demo_data
│   │   ├── car.jpg
│   │   ├── dog.jpg
│   │   ├── kite.jpg
│   │   ├── messi.jpg
│   │   └── results
│   │       ├── dog.jpg
│   │       ├── kite.jpg
│   │       └── messi.jpg
│   ├── logs
│   │   └── readme
│   ├── my_data
│   │   ├── readme
│   │   ├── train_COCO_tf.txt
│   │   └── val_COCO_tf.txt
│   ├── yolov3_anchors.txt
│   ├── yolov3_tiny_COCO_anchors.txt
│   └── yolov3_tiny_Car_anchors.txt
├── detection_result.jpg
├── docs
│   ├── backbone.png
│   └── yolo_v3_architecture.png
├── eval.py
├── get_kmeans.py
├── model
│   ├── __init__.py
│   ├── yolov3.py
│   └── yolov3_tiny.py
├── test_single_image.py
├── test_single_image_pb.py
├── train.py
├── utils
│   ├── __init__.py
│   ├── data_utils.py
│   ├── eval_utils.py
│   ├── layer_utils.py
│   ├── misc_utils.py
│   ├── nms_utils.py
│   └── plot_utils.py
└── video_test.py
--------------------------------------------------------------------------------
/.spyproject/codestyle.ini:
--------------------------------------------------------------------------------
1 | [codestyle]
2 | indentation = True
3 | 
4 | [main]
5 | version = 0.1.0
6 | 
7 | 
--------------------------------------------------------------------------------
/.spyproject/encoding.ini:
--------------------------------------------------------------------------------
1 | [encoding]
2 | text_encoding = utf-8
3 | 
4 | [main]
5 | version = 0.1.0
6 | 
7 | 
--------------------------------------------------------------------------------
/.spyproject/vcs.ini:
--------------------------------------------------------------------------------
1 | [vcs]
2 | version_control_system = 
3 | use_version_control = False
4 | 
5 | [main]
6 | version = 0.1.0
7 | 
8 | 
--------------------------------------------------------------------------------
/.spyproject/workspace.ini:
--------------------------------------------------------------------------------
1 | [workspace]
2 | save_data_on_exit = True
3 | save_history = True
4 | save_non_project_files = False
5 | restore_data_on_startup = True
6 | 
7 | [main]
8 | version = 0.1.0
9 | recent_files = ['D:\\Github\\DeepLearning\\YOLO\\YOLOv3_tiny_TensorFlow\\README.md', 'D:\\Github\\DeepLearning\\YOLO\\YOLOv3_tiny_TensorFlow\\utils\\data_utils.py', 'D:\\Github\\DeepLearning\\YOLO\\YOLOv3_tiny_TensorFlow\\test_single_image.py']
10 | 
11 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # YOLOv3 and YOLOv3_tiny for TensorFlow
2 | 
3 | ### 1. Introduction
4 | 
5 | Adds YOLOv3_tiny and data augmentation (clip, brighten, and saturation changes).
6 | 
7 | ### 2. Requirements
8 | 
9 | - tensorflow >= 1.8.0 (lower versions may work too)
10 | - opencv-python
11 | 
12 | 
13 | ### 3. Running demos
14 | 
15 | (1) Single image test demo using a ckpt file:
16 | 
17 | ```shell
18 | python test_single_image.py ./data/demo_data/car.jpg
19 | ```
20 | 
21 | (2) Single image test demo using a pb file:
22 | 
23 | ```shell
24 | python test_single_image_pb.py ./data/demo_data/car.jpg
25 | ```
26 | 
27 | ### 4. Training
28 | 
29 | #### 4.1 Data preparation
30 | 
31 | (1) annotation file
32 | 
33 | Generate `train.txt/val.txt/test.txt` files under the `./data/my_data/` directory.
34 | One line per image, in the format `image_absolute_path box_1 box_2 ... box_n`.
35 | Box format: `label_index x_min y_min x_max y_max`. (The origin of coordinates is at the top-left corner.)
36 | 
37 | For example:
38 | 
39 | ```
40 | xxx/xxx/1.jpg 0 453 369 473 391 1 588 245 608 268
41 | xxx/xxx/2.jpg 1 466 403 485 422 2 793 300 809 320
42 | ...
43 | ```
44 | 
45 | **NOTE**: **You should leave a blank line at the end of each txt file.**
46 | 
47 | (2) class_names file:
48 | 
49 | Generate the `data.names` file under the `./data/my_data/` directory. Each line represents a class name.
50 | 
51 | For example:
52 | 
53 | ```
54 | bird
55 | car
56 | bike
57 | ...
58 | ```
59 | 
60 | The COCO dataset class names file is placed at `./data/COCO.names`.
61 | 
62 | (3) prior anchor file:
63 | 
64 | Use the k-means algorithm to get the prior anchors:
65 | 
66 | ```
67 | python get_kmeans.py
68 | ```
69 | 
70 | Then you will get 9 anchors and the average IoU. Save the anchors to a txt file.
71 | 
72 | The COCO dataset anchors offered by the YOLOv3 author are placed at `./data/yolov3_anchors.txt`; you can use that one too.
73 | 
74 | **NOTE: The yolo anchors should be scaled to the rescaled input image size.
75 | Suppose your image size is [W, H] and the image will be rescaled to 416*416 as input; then for each generated anchor [anchor_w, anchor_h]
76 | you should apply the transformation anchor_w = anchor_w / W * 416, anchor_h = anchor_h / H * 416 (see the sketch at the end of section 4.2).**
77 | 
78 | #### 4.2 Training
79 | 
80 | Use `train.py`. The parameters are as follows:
81 | 
82 | ```shell
83 | $ python train.py -h
84 | usage: train.py
85 | 
86 | net_name = 'the yolo model'
87 | anchors_name = 'the anchors name'
88 | body_name = 'the yolo body net'
89 | data_name = 'the training data name'
90 | 
91 | 
92 | ```
93 | 
94 | Check `train.py` for more details. You should set the parameters yourself.
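Here is a minimal sketch of the anchor rescaling described in the NOTE of section 4.1 (the `rescale_anchors` helper is illustrative only, not a script in this repo, and assumes all source images share one original size [W, H]):

```python
# Illustrative helper (not part of this repo): rescale k-means anchors
# measured on [W, H] source images to the 416*416 network input, applying
# anchor_w = anchor_w / W * 416 and anchor_h = anchor_h / H * 416.
def rescale_anchors(anchors, W, H, input_size=416):
    return [(w / W * input_size, h / H * input_size) for (w, h) in anchors]

# e.g. a 120x90 anchor from 1000x800 images becomes roughly 49.9x46.8
print(rescale_anchors([(120, 90)], W=1000, H=800))
```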
95 | 
96 | Some training tricks in my experiment:
97 | 
98 | YOLOv3 uses `darknet53` as its backbone, while YOLOv3_tiny uses `darknet19`.
99 | 
100 | 
101 | 
102 | ### Credits:
103 | 
104 | I referred to this fantastic repo during the implementation:
105 | 
106 | https://github.com/wizyoung/YOLOv3_TensorFlow
107 | 
108 | 
109 | 
110 | 
111 | 
112 | 
--------------------------------------------------------------------------------
/checkpoint/yolov3_tiny_Car/checkpoint:
--------------------------------------------------------------------------------
1 | model_checkpoint_path: "model-step_144000_loss_2.974138_lr_3.433207e-06"
2 | all_model_checkpoint_paths: "model-step_136000_loss_4.582234_lr_4.75916e-06"
3 | all_model_checkpoint_paths: "model-step_138000_loss_3.534735_lr_4.386041e-06"
4 | all_model_checkpoint_paths: "model-step_140000_loss_4.178957_lr_4.042176e-06"
5 | all_model_checkpoint_paths: "model-step_142000_loss_2.621993_lr_3.725269e-06"
6 | all_model_checkpoint_paths: "model-step_144000_loss_2.974138_lr_3.433207e-06"
--------------------------------------------------------------------------------
/checkpoint/yolov3_tiny_Car/log.txt:
--------------------------------------------------------------------------------
1 | ===> batch recall: 0.871, batch precision: 0.864 <===
2 | ===> Epoch: 32, global_step: 142000, recall: 0.553, precision: 0.643, total_loss: 37.199, loss_xy: 1.078, loss_wh: 1.009, loss_conf: 31.234, loss_class: 3.878
3 | Epoch: 32, global_step: 142050, lr: 0.00000373, total_loss: 4.751, loss_xy: 0.814, loss_wh: 0.612, loss_conf: 3.294, loss_class: 0.031
4 | Epoch: 32, global_step: 142100, lr: 0.00000373, total_loss: 3.753, loss_xy: 0.600, loss_wh: 0.523, loss_conf: 2.387, loss_class: 0.243
5 | Epoch: 32, global_step: 142150, lr: 0.00000373, total_loss: 2.970, loss_xy: 0.620, loss_wh: 0.455, loss_conf: 1.796, loss_class: 0.099
6 | Epoch: 32, global_step: 142200, lr: 0.00000373, total_loss: 3.949, loss_xy: 0.836, loss_wh: 0.477, loss_conf: 2.631, loss_class: 0.004
7 | Epoch: 32, global_step: 142250, lr: 0.00000373, total_loss: 4.312, loss_xy: 0.855, loss_wh: 0.771, loss_conf: 2.665, loss_class: 0.021
8 | Epoch: 32, global_step: 142300, lr: 0.00000373, total_loss: 3.754, loss_xy: 0.703, loss_wh: 0.544, loss_conf: 2.499, loss_class: 0.007
9 | Epoch: 32, global_step: 142350, lr: 0.00000358, total_loss: 3.994, loss_xy: 0.756, loss_wh: 0.533, loss_conf: 2.641, loss_class: 0.064
10 | Epoch: 32, global_step: 142400, lr: 0.00000358, total_loss: 2.882, loss_xy: 0.567, loss_wh: 0.392, loss_conf: 1.883, loss_class: 0.040
11 | Epoch: 32, global_step: 142450, lr: 0.00000358, total_loss: 4.038, loss_xy: 0.650, loss_wh: 0.590, loss_conf: 2.733, loss_class: 0.065
12 | Epoch: 32, global_step: 142500, lr: 0.00000358, total_loss: 3.947, loss_xy: 0.767, loss_wh: 0.618, loss_conf: 2.494, loss_class: 0.068
13 | ===> batch recall: 0.789, batch precision: 0.793 <===
14 | Epoch: 32, global_step: 142550, lr: 0.00000358, total_loss: 2.653, loss_xy: 0.714, loss_wh: 0.633, loss_conf: 1.298, loss_class: 0.008
15 | Epoch: 32, global_step: 142600, lr: 0.00000358, total_loss: 3.421, loss_xy: 0.735, loss_wh: 0.547, loss_conf: 2.098, loss_class: 0.041
16 | Epoch: 32, global_step: 142650, lr: 0.00000358, total_loss: 5.391, loss_xy: 0.845, loss_wh: 0.732, loss_conf: 3.704, loss_class: 0.110
17 | Epoch: 32, global_step: 142700, lr: 0.00000358, total_loss: 2.978, loss_xy: 0.645, loss_wh: 0.454, loss_conf: 1.868, loss_class: 0.011
18 | Epoch: 32, global_step: 142750, lr: 0.00000358, total_loss: 5.921, loss_xy: 1.009, loss_wh: 0.837,
loss_conf: 3.879, loss_class: 0.195 19 | Epoch: 32, global_step: 142800, lr: 0.00000358, total_loss: 3.452, loss_xy: 0.587, loss_wh: 0.492, loss_conf: 2.370, loss_class: 0.002 20 | Epoch: 32, global_step: 142850, lr: 0.00000358, total_loss: 3.199, loss_xy: 0.490, loss_wh: 0.345, loss_conf: 2.341, loss_class: 0.023 21 | Epoch: 32, global_step: 142900, lr: 0.00000358, total_loss: 5.232, loss_xy: 0.837, loss_wh: 0.691, loss_conf: 3.569, loss_class: 0.135 22 | Epoch: 32, global_step: 142950, lr: 0.00000358, total_loss: 4.597, loss_xy: 0.833, loss_wh: 0.648, loss_conf: 3.099, loss_class: 0.017 23 | Epoch: 32, global_step: 143000, lr: 0.00000358, total_loss: 5.087, loss_xy: 0.818, loss_wh: 0.623, loss_conf: 3.624, loss_class: 0.022 24 | ===> batch recall: 0.752, batch precision: 0.756 <=== 25 | Epoch: 32, global_step: 143050, lr: 0.00000358, total_loss: 3.583, loss_xy: 0.679, loss_wh: 0.457, loss_conf: 2.438, loss_class: 0.009 26 | Epoch: 32, global_step: 143100, lr: 0.00000358, total_loss: 2.846, loss_xy: 0.587, loss_wh: 0.439, loss_conf: 1.814, loss_class: 0.006 27 | Epoch: 32, global_step: 143150, lr: 0.00000358, total_loss: 4.150, loss_xy: 0.744, loss_wh: 0.681, loss_conf: 2.655, loss_class: 0.070 28 | Epoch: 33, global_step: 143200, lr: 0.00000358, total_loss: 3.527, loss_xy: 0.616, loss_wh: 0.444, loss_conf: 2.453, loss_class: 0.014 29 | Epoch: 33, global_step: 143250, lr: 0.00000358, total_loss: 2.432, loss_xy: 0.571, loss_wh: 0.406, loss_conf: 1.454, loss_class: 0.001 30 | Epoch: 33, global_step: 143300, lr: 0.00000358, total_loss: 5.030, loss_xy: 0.716, loss_wh: 0.769, loss_conf: 3.488, loss_class: 0.056 31 | Epoch: 33, global_step: 143350, lr: 0.00000343, total_loss: 4.151, loss_xy: 0.885, loss_wh: 0.621, loss_conf: 2.626, loss_class: 0.019 32 | Epoch: 33, global_step: 143400, lr: 0.00000343, total_loss: 5.016, loss_xy: 0.822, loss_wh: 0.588, loss_conf: 3.554, loss_class: 0.052 33 | Epoch: 33, global_step: 143450, lr: 0.00000343, total_loss: 4.503, loss_xy: 0.683, loss_wh: 0.507, loss_conf: 3.309, loss_class: 0.003 34 | Epoch: 33, global_step: 143500, lr: 0.00000343, total_loss: 3.313, loss_xy: 0.680, loss_wh: 0.496, loss_conf: 2.125, loss_class: 0.012 35 | ===> batch recall: 0.832, batch precision: 0.864 <=== 36 | Epoch: 33, global_step: 143550, lr: 0.00000343, total_loss: 3.279, loss_xy: 0.620, loss_wh: 0.449, loss_conf: 2.201, loss_class: 0.008 37 | Epoch: 33, global_step: 143600, lr: 0.00000343, total_loss: 3.524, loss_xy: 0.560, loss_wh: 0.536, loss_conf: 2.406, loss_class: 0.022 38 | Epoch: 33, global_step: 143650, lr: 0.00000343, total_loss: 3.361, loss_xy: 0.635, loss_wh: 0.446, loss_conf: 2.277, loss_class: 0.003 39 | Epoch: 33, global_step: 143700, lr: 0.00000343, total_loss: 2.603, loss_xy: 0.615, loss_wh: 0.449, loss_conf: 1.514, loss_class: 0.025 40 | Epoch: 33, global_step: 143750, lr: 0.00000343, total_loss: 4.765, loss_xy: 1.000, loss_wh: 0.663, loss_conf: 3.073, loss_class: 0.029 41 | Epoch: 33, global_step: 143800, lr: 0.00000343, total_loss: 4.013, loss_xy: 0.789, loss_wh: 0.639, loss_conf: 2.550, loss_class: 0.036 42 | Epoch: 33, global_step: 143850, lr: 0.00000343, total_loss: 3.597, loss_xy: 0.699, loss_wh: 0.464, loss_conf: 2.431, loss_class: 0.002 43 | Epoch: 33, global_step: 143900, lr: 0.00000343, total_loss: 3.572, loss_xy: 0.887, loss_wh: 0.593, loss_conf: 2.077, loss_class: 0.016 44 | Epoch: 33, global_step: 143950, lr: 0.00000343, total_loss: 3.375, loss_xy: 0.590, loss_wh: 0.533, loss_conf: 2.230, loss_class: 0.022 45 | Epoch: 33, global_step: 144000, lr: 
0.00000343, total_loss: 2.974, loss_xy: 0.629, loss_wh: 0.550, loss_conf: 1.768, loss_class: 0.028 46 | ===> batch recall: 0.815, batch precision: 0.846 <=== -------------------------------------------------------------------------------- /checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.data-00000-of-00001: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.data-00000-of-00001 -------------------------------------------------------------------------------- /checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.index: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.index -------------------------------------------------------------------------------- /checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.meta: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.meta -------------------------------------------------------------------------------- /convert_weight.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | # for more details about the yolo darknet weights file, refer to 3 | # https://itnext.io/implementing-yolo-v3-in-tensorflow-tf-slim-c3c55ff59dbe 4 | 5 | from __future__ import division, print_function 6 | 7 | import os 8 | import sys 9 | import tensorflow as tf 10 | import numpy as np 11 | 12 | from model import yolov3 13 | from utils.misc_utils import parse_anchors, load_weights 14 | 15 | num_class = 80 16 | img_size = 416 17 | weight_path = './data/darknet_weights/yolov3.weights' 18 | save_path = './data/darknet_weights/yolov3.ckpt' 19 | anchors = parse_anchors('./data/yolo_anchors.txt') 20 | 21 | model = yolov3(80, anchors) 22 | with tf.Session() as sess: 23 | inputs = tf.placeholder(tf.float32, [1, img_size, img_size, 3]) 24 | 25 | with tf.variable_scope('yolov3'): 26 | feature_map = model.forward(inputs) 27 | 28 | saver = tf.train.Saver(var_list=tf.global_variables(scope='yolov3')) 29 | 30 | load_ops = load_weights(tf.global_variables(scope='yolov3'), weight_path) 31 | sess.run(load_ops) 32 | saver.save(sess, save_path=save_path) 33 | print('TensorFlow model checkpoint has been saved to {}'.format(save_path)) 34 | 35 | 36 | 37 | -------------------------------------------------------------------------------- /data/COCO.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 
42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /data/Car.names: -------------------------------------------------------------------------------- 1 | bicycle 2 | car 3 | person -------------------------------------------------------------------------------- /data/darknet_weights/readme: -------------------------------------------------------------------------------- 1 | place pretrained weights on COCO dataset here. -------------------------------------------------------------------------------- /data/demo_data/car.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/car.jpg -------------------------------------------------------------------------------- /data/demo_data/dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/dog.jpg -------------------------------------------------------------------------------- /data/demo_data/kite.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/kite.jpg -------------------------------------------------------------------------------- /data/demo_data/messi.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/messi.jpg -------------------------------------------------------------------------------- /data/demo_data/results/dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/results/dog.jpg -------------------------------------------------------------------------------- /data/demo_data/results/kite.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/results/kite.jpg -------------------------------------------------------------------------------- /data/demo_data/results/messi.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/results/messi.jpg -------------------------------------------------------------------------------- /data/logs/readme: -------------------------------------------------------------------------------- 1 | tensorboard event files will be here. 
-------------------------------------------------------------------------------- /data/my_data/readme: -------------------------------------------------------------------------------- 1 | place your data files here. 2 | 3 | image path and name, classID, xl, yl, xr, yr, classID, xl, yl, xr, yr ... -------------------------------------------------------------------------------- /data/yolov3_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,13,16,30,33,23,30,61,62,45,59,119,116,90,156,198,373,326 -------------------------------------------------------------------------------- /data/yolov3_tiny_COCO_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,14,23,27,37,58,81,82,135,169,344,319 -------------------------------------------------------------------------------- /data/yolov3_tiny_Car_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,18,15,25,27,36,42,68,79,134,171,272 -------------------------------------------------------------------------------- /detection_result.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/detection_result.jpg -------------------------------------------------------------------------------- /docs/backbone.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/docs/backbone.png -------------------------------------------------------------------------------- /docs/yolo_v3_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/docs/yolo_v3_architecture.png -------------------------------------------------------------------------------- /eval.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | from __future__ import division, print_function 4 | 5 | import tensorflow as tf 6 | import numpy as np 7 | import argparse 8 | 9 | from utils.data_utils import parse_data 10 | from utils.misc_utils import parse_anchors, read_class_names, shuffle_and_overwrite, update_dict, make_summary, config_learning_rate, config_optimizer, list_add 11 | from utils.eval_utils import evaluate_on_cpu, evaluate_on_gpu 12 | from utils.nms_utils import gpu_nms 13 | 14 | from model.yolov3 import yolov3 15 | 16 | ################# 17 | # ArgumentParser 18 | ################# 19 | parser = argparse.ArgumentParser(description="YOLO-V3 eval procedure.") 20 | # some paths 21 | parser.add_argument("--eval_file", type=str, default="./data/my_data/val.txt", 22 | help="The path of the validation or test txt file.") 23 | 24 | parser.add_argument("--restore_path", type=str, default="./data/darknet_weights/yolov3.ckpt", 25 | help="The path of the weights to restore.") 26 | 27 | parser.add_argument("--anchor_path", type=str, default="./data/yolo_anchors.txt", 28 | help="The path of the anchor txt file.") 29 | 30 | parser.add_argument("--class_name_path", type=str, default="./data/COCO.names", 31 | help="The path of the class names.") 32 | 33 | # some numbers 34 | parser.add_argument("--batch_size", type=int, default=20, 35 | help="The batch size for training.") 36 | 37 | 
parser.add_argument("--img_size", nargs='*', type=int, default=[416, 416], 38 | help="Resize the input image to `img_size`, size format: [width, height]") 39 | 40 | parser.add_argument("--num_threads", type=int, default=10, 41 | help="Number of threads for image processing used in tf.data pipeline.") 42 | 43 | parser.add_argument("--prefetech_buffer", type=int, default=3, 44 | help="Prefetech_buffer used in tf.data pipeline.") 45 | 46 | args = parser.parse_args() 47 | 48 | # args params 49 | args.anchors = parse_anchors(args.anchor_path) 50 | args.classes = read_class_names(args.class_name_path) 51 | args.class_num = len(args.classes) 52 | args.img_cnt = len(open(args.eval_file, 'r').readlines()) 53 | args.batch_num = int(np.ceil(float(args.img_cnt) / args.batch_size)) 54 | 55 | # setting placeholders 56 | is_training = tf.placeholder(dtype=tf.bool, name="phase_train") 57 | handle_flag = tf.placeholder(tf.string, [], name='iterator_handle_flag') 58 | 59 | ################## 60 | # tf.data pipeline 61 | ################## 62 | 63 | dataset = tf.data.TextLineDataset(args.eval_file) 64 | dataset = dataset.apply(tf.contrib.data.map_and_batch( 65 | lambda x: tf.py_func(parse_data, [x, args.class_num, args.img_size, args.anchors, 'val'], [tf.float32, tf.float32, tf.float32, tf.float32]), 66 | num_parallel_calls=args.num_threads, batch_size=args.batch_size)) 67 | dataset = dataset.prefetch(args.prefetech_buffer) 68 | 69 | iterator = dataset.make_one_shot_iterator() 70 | 71 | # get an element from the dataset iterator 72 | image, y_true_13, y_true_26, y_true_52 = iterator.get_next() 73 | y_true = [y_true_13, y_true_26, y_true_52] 74 | 75 | # tf.data pipeline will lose the data shape, so we need to set it manually 76 | image.set_shape([None, args.img_size[1], args.img_size[0], 3]) 77 | for y in y_true: 78 | y.set_shape([None, None, None, None, None]) 79 | 80 | ################## 81 | # Model definition 82 | ################## 83 | 84 | # define yolo-v3 model here 85 | yolo_model = yolov3(args.class_num, args.anchors) 86 | with tf.variable_scope('yolov3'): 87 | pred_feature_maps = yolo_model.forward(image, is_training=is_training) 88 | loss = yolo_model.compute_loss(pred_feature_maps, y_true) 89 | y_pred = yolo_model.predict(pred_feature_maps) 90 | 91 | ################ 92 | # register the gpu nms operation here for the following evaluation scheme 93 | pred_boxes_flag = tf.placeholder(tf.float32, [1, None, None]) 94 | pred_scores_flag = tf.placeholder(tf.float32, [1, None, None]) 95 | gpu_nms_op = gpu_nms(pred_boxes_flag, pred_scores_flag, args.class_num) 96 | ################ 97 | 98 | saver_to_restore = tf.train.Saver() 99 | 100 | with tf.Session() as sess: 101 | sess.run([tf.global_variables_initializer()]) 102 | saver_to_restore.restore(sess, args.restore_path) 103 | 104 | print('\n----------- start to eval -----------\n') 105 | 106 | true_positive_dict, true_labels_dict, pred_labels_dict = {}, {}, {} 107 | val_loss = [0., 0., 0., 0., 0.] 
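    # val_loss keeps five running sums matching the list returned by
    # compute_loss: [total_loss, loss_xy, loss_wh, loss_conf, loss_class];
    # each term is averaged over args.img_cnt after the loop below.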
108 | 
109 |     for j in range(args.batch_num):
110 |         y_pred_, y_true_, loss_ = sess.run([y_pred, y_true, loss], feed_dict={is_training: False})
111 |         true_positive_dict_tmp, true_labels_dict_tmp, pred_labels_dict_tmp = \
112 |             evaluate_on_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag,
113 |                             y_pred_, y_true_, args.class_num, calc_now=False)
114 |         true_positive_dict = update_dict(true_positive_dict, true_positive_dict_tmp)
115 |         true_labels_dict = update_dict(true_labels_dict, true_labels_dict_tmp)
116 |         pred_labels_dict = update_dict(pred_labels_dict, pred_labels_dict_tmp)
117 | 
118 |         val_loss = list_add(val_loss, loss_)
119 | 
120 |     # make sure there is at least one ground-truth object in each image
121 |     # to avoid division by zero
122 |     recall = float(sum(true_positive_dict.values())) / (sum(true_labels_dict.values()) + 1e-6)
123 |     precision = float(sum(true_positive_dict.values())) / (sum(pred_labels_dict.values()) + 1e-6)
124 | 
125 |     print("recall: {:.3f}, precision: {:.3f}".format(recall, precision))
126 |     print("total_loss: {:.3f}, loss_xy: {:.3f}, loss_wh: {:.3f}, loss_conf: {:.3f}, loss_class: {:.3f}".format(
127 |         val_loss[0] / args.img_cnt, val_loss[1] / args.img_cnt, val_loss[2] / args.img_cnt, val_loss[3] / args.img_cnt, val_loss[4] / args.img_cnt))
--------------------------------------------------------------------------------
/get_kmeans.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 | # This script is modified from https://github.com/lars76/kmeans-anchor-boxes
3 | 
4 | from __future__ import division, print_function
5 | 
6 | import numpy as np
7 | 
8 | def iou(box, clusters):
9 |     """
10 |     Calculates the Intersection over Union (IoU) between a box and k clusters.
11 |     param:
12 |         box: tuple or array, shifted to the origin (i.e. width and height)
13 |         clusters: numpy array of shape (k, 2) where k is the number of clusters
14 |     return:
15 |         numpy array of shape (k,) where k is the number of clusters
16 |     """
17 |     x = np.minimum(clusters[:, 0], box[0])
18 |     y = np.minimum(clusters[:, 1], box[1])
19 |     if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
20 |         raise ValueError("Box has no area")
21 | 
22 |     intersection = x * y
23 |     box_area = box[0] * box[1]
24 |     cluster_area = clusters[:, 0] * clusters[:, 1]
25 | 
26 |     iou_ = intersection / (box_area + cluster_area - intersection + 1e-10)
27 | 
28 |     return iou_
29 | 
30 | 
31 | def avg_iou(boxes, clusters):
32 |     """
33 |     Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
34 |     param:
35 |         boxes: numpy array of shape (r, 2), where r is the number of rows
36 |         clusters: numpy array of shape (k, 2) where k is the number of clusters
37 |     return:
38 |         average IoU as a single float
39 |     """
40 |     return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])
41 | 
42 | 
43 | def translate_boxes(boxes):
44 |     """
45 |     Translates all the boxes to the origin.
46 |     param:
47 |         boxes: numpy array of shape (r, 4)
48 |     return:
49 |         numpy array of shape (r, 2)
50 |     """
51 |     new_boxes = boxes.copy()
52 |     for row in range(new_boxes.shape[0]):
53 |         new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
54 |         new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
55 |     return np.delete(new_boxes, [0, 1], axis=1)
56 | 
57 | 
58 | def kmeans(boxes, k, dist=np.median):
59 |     """
60 |     Calculates k-means clustering with the Intersection over Union (IoU) metric.
61 | param: 62 | boxes: numpy array of shape (r, 2), where r is the number of rows 63 | k: number of clusters 64 | dist: distance function 65 | return: 66 | numpy array of shape (k, 2) 67 | """ 68 | rows = boxes.shape[0] 69 | 70 | distances = np.empty((rows, k)) 71 | last_clusters = np.zeros((rows,)) 72 | 73 | np.random.seed() 74 | 75 | # the Forgy method will fail if the whole array contains the same rows 76 | clusters = boxes[np.random.choice(rows, k, replace=False)] 77 | 78 | while True: 79 | for row in range(rows): 80 | distances[row] = 1 - iou(boxes[row], clusters) 81 | 82 | nearest_clusters = np.argmin(distances, axis=1) 83 | 84 | if (last_clusters == nearest_clusters).all(): 85 | break 86 | 87 | for cluster in range(k): 88 | clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0) 89 | 90 | last_clusters = nearest_clusters 91 | 92 | return clusters 93 | 94 | 95 | def parse_anno(annotation_path): 96 | anno = open(annotation_path, 'r') 97 | result = [] 98 | for line in anno: 99 | s = line.strip().split(' ') 100 | s = s[1:] 101 | box_cnt = len(s) // 5 102 | for i in range(box_cnt): 103 | x_min, y_min, x_max, y_max = float(s[i*5+1]), float(s[i*5+2]), float(s[i*5+3]), float(s[i*5+4]) 104 | width = x_max - x_min 105 | height = y_max - y_min 106 | assert width > 0 107 | assert height > 0 108 | result.append([width, height]) 109 | result = np.asarray(result) 110 | return result 111 | 112 | 113 | def get_kmeans(anno, cluster_num=9): 114 | 115 | anchors = kmeans(anno, cluster_num) 116 | ave_iou = avg_iou(anno, anchors) 117 | 118 | anchors = anchors.astype('int').tolist() 119 | 120 | anchors = sorted(anchors, key=lambda x: x[0] * x[1]) 121 | 122 | return anchors, ave_iou 123 | 124 | 125 | if __name__ == '__main__': 126 | annotation_path = "./data/my_data/train.txt" 127 | anno_result = parse_anno(annotation_path) 128 | anchors, ave_iou = get_kmeans(anno_result, 9) 129 | 130 | anchor_string = '' 131 | for anchor in anchors: 132 | anchor_string += '{},{}, '.format(anchor[0], anchor[1]) 133 | anchor_string = anchor_string[:-2] 134 | 135 | print('anchors are:') 136 | print(anchor_string) 137 | print('the average iou is:') 138 | print(ave_iou) 139 | 140 | -------------------------------------------------------------------------------- /model/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /model/yolov3.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # for better understanding about yolov3 architecture, refer to this website (in Chinese): 3 | # https://blog.csdn.net/leviopku/article/details/82660381 4 | 5 | from __future__ import division, print_function 6 | 7 | import numpy as np 8 | import tensorflow as tf 9 | slim = tf.contrib.slim 10 | 11 | from utils.layer_utils import conv2d, darknet53_body, yolo_block, upsample_layer 12 | 13 | class yolov3(object): 14 | 15 | def __init__(self, class_num, anchors, batch_norm_decay=0.9): 16 | 17 | # self.anchors = [[10, 13], [16, 30], [33, 23], 18 | # [30, 61], [62, 45], [59, 119], 19 | # [116, 90], [156, 198], [373,326]] 20 | self.class_num = class_num 21 | self.anchors = anchors 22 | self.batch_norm_decay = batch_norm_decay 23 | 24 | def forward(self, inputs, is_training=False, reuse=False): 25 | # the input img_size, form: [height, weight] 26 | # it will be used later 27 | self.img_size = tf.shape(inputs)[1:3] 28 | # set batch norm params 29 | 
batch_norm_params = { 30 | 'decay': self.batch_norm_decay, 31 | 'epsilon': 1e-05, 32 | 'scale': True, 33 | 'is_training': is_training, 34 | 'fused': None, # Use fused batch norm if possible. 35 | } 36 | 37 | with slim.arg_scope([slim.conv2d, slim.batch_norm],reuse=reuse): 38 | with slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm, 39 | normalizer_params=batch_norm_params, 40 | biases_initializer=None, 41 | activation_fn=lambda x: tf.nn.leaky_relu(x, alpha=0.1)): 42 | with tf.variable_scope('darknet53_body'): 43 | route_1, route_2, route_3 = darknet53_body(inputs) 44 | 45 | with tf.variable_scope('yolov3_head'): 46 | inter1, net = yolo_block(route_3, 512) 47 | feature_map_1 = slim.conv2d(net, 3 * (5 + self.class_num), 1, 48 | stride=1, normalizer_fn=None, 49 | activation_fn=None, biases_initializer=tf.zeros_initializer()) 50 | feature_map_1 = tf.identity(feature_map_1, name='feature_map_1') 51 | 52 | inter1 = conv2d(inter1, 256, 1) 53 | inter1 = upsample_layer(inter1, route_2.get_shape().as_list()) 54 | concat1 = tf.concat([inter1, route_2], axis=3) 55 | 56 | inter2, net = yolo_block(concat1, 256) 57 | feature_map_2 = slim.conv2d(net, 3 * (5 + self.class_num), 1, 58 | stride=1, normalizer_fn=None, 59 | activation_fn=None, biases_initializer=tf.zeros_initializer()) 60 | feature_map_2 = tf.identity(feature_map_2, name='feature_map_2') 61 | 62 | inter2 = conv2d(inter2, 128, 1) 63 | inter2 = upsample_layer(inter2, route_1.get_shape().as_list()) 64 | concat2 = tf.concat([inter2, route_1], axis=3) 65 | 66 | _, feature_map_3 = yolo_block(concat2, 128) 67 | feature_map_3 = slim.conv2d(feature_map_3, 3 * (5 + self.class_num), 1, 68 | stride=1, normalizer_fn=None, 69 | activation_fn=None, biases_initializer=tf.zeros_initializer()) 70 | feature_map_3 = tf.identity(feature_map_3, name='feature_map_3') 71 | 72 | return feature_map_1, feature_map_2, feature_map_3 73 | 74 | def reorg_layer(self, feature_map, anchors): 75 | ''' 76 | feature_map: a feature_map from [feature_map_1, feature_map_2, feature_map_3] returned 77 | from `forward` function 78 | anchors: shape: [3, 2] 79 | ''' 80 | # NOTE: size in [h, w] format! don't get messed up! 81 | grid_size = feature_map.shape.as_list()[1:3] # [13, 13] 82 | # the downscale ratio in height and weight 83 | ratio = tf.cast(self.img_size / grid_size, tf.float32) 84 | # rescale the anchors to the feature_map 85 | # NOTE: the anchor is in [w, h] format! 
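        # e.g. with a 416x416 input and a 13x13 feature map, ratio is 32,
        # so the [116, 90] anchor becomes (3.625, 2.8125) on the feature map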
86 | rescaled_anchors = [(anchor[0] / ratio[1], anchor[1] / ratio[0]) for anchor in anchors] 87 | 88 | feature_map = tf.reshape(feature_map, [-1, grid_size[0], grid_size[1], 3, 5 + self.class_num]) 89 | 90 | # split the feature_map along the last dimension 91 | # shape info: take 416x416 input image and the 13*13 feature_map for example: 92 | # box_centers: [N, 13, 13, 3, 2] last_dimension: [center_x, center_y] 93 | # box_sizes: [N, 13, 13, 3, 2] last_dimension: [width, height] 94 | # conf_logits: [N, 13, 13, 3, 1] 95 | # prob_logits: [N, 13, 13, 3, class_num] 96 | box_centers, box_sizes, conf_logits, prob_logits = tf.split(feature_map, [2, 2, 1, self.class_num], axis=-1) 97 | box_centers = tf.nn.sigmoid(box_centers) 98 | 99 | # use some broadcast tricks to get the mesh coordinates 100 | grid_x = tf.range(grid_size[1], dtype=tf.int32) 101 | grid_y = tf.range(grid_size[0], dtype=tf.int32) 102 | grid_x, grid_y = tf.meshgrid(grid_x, grid_y) 103 | x_offset = tf.reshape(grid_x, (-1, 1)) 104 | y_offset = tf.reshape(grid_y, (-1, 1)) 105 | x_y_offset = tf.concat([x_offset, y_offset], axis=-1) 106 | # shape: [13, 13, 1, 2] 107 | x_y_offset = tf.cast(tf.reshape(x_y_offset, [grid_size[0], grid_size[1], 1, 2]), tf.float32) 108 | 109 | # get the absolute box coordinates on the feature_map 110 | box_centers = box_centers + x_y_offset 111 | # rescale to the original image scale 112 | box_centers = box_centers * ratio[::-1] 113 | 114 | # avoid getting possible nan value with tf.clip_by_value 115 | box_sizes = tf.clip_by_value(tf.exp(box_sizes), 1e-9, 50) * rescaled_anchors 116 | # rescale to the original image scale 117 | box_sizes = box_sizes * ratio[::-1] 118 | 119 | # shape: [N, 13, 13, 3, 4] 120 | # last dimension: (center_x, center_y, w, h) 121 | boxes = tf.concat([box_centers, box_sizes], axis=-1) 122 | 123 | # shape: 124 | # x_y_offset: [13, 13, 1, 2] 125 | # boxes: [N, 13, 13, 3, 4], rescaled to the original image scale 126 | # conf_logits: [N, 13, 13, 3, 1] 127 | # prob_logits: [N, 13, 13, 3, class_num] 128 | return x_y_offset, boxes, conf_logits, prob_logits 129 | 130 | def reorg_layer_xx(self, feature_map, anchors): 131 | ''' 132 | feature_map: a feature_map from [feature_map_1, feature_map_2, feature_map_3] returned 133 | from `forward` function 134 | anchors: shape: [3, 2] 135 | ''' 136 | # NOTE: size in [h, w] format! don't get messed up! 137 | grid_size = feature_map.shape.as_list()[1:3] # [13, 13] 138 | # the downscale ratio in height and weight 139 | ratio = tf.cast(self.img_size / grid_size, tf.float32) 140 | # rescale the anchors to the feature_map 141 | # NOTE: the anchor is in [w, h] format! 
142 | rescaled_anchors = [(anchor[0] / ratio[1], anchor[1] / ratio[0]) for anchor in anchors] 143 | 144 | feature_map_1, feature_map_2 = tf.split(feature_map, [3*4, 3*(self.class_num+1)], axis=-1) 145 | feature_map_1 = tf.reshape(feature_map_1, [-1, grid_size[0], grid_size[1], 3, 4]) 146 | feature_map_2 = tf.reshape(feature_map_2, [-1, grid_size[0], grid_size[1], 3, 1 + self.class_num]) 147 | 148 | # split the feature_map along the last dimension 149 | # shape info: take 416x416 input image and the 13*13 feature_map for example: 150 | # box_centers: [N, 13, 13, 3, 2] last_dimension: [center_x, center_y] 151 | # box_sizes: [N, 13, 13, 3, 2] last_dimension: [width, height] 152 | # conf_logits: [N, 13, 13, 3, 1] 153 | # prob_logits: [N, 13, 13, 3, class_num] 154 | box_centers, box_sizes = tf.split(feature_map_1, [2, 2], axis=-1) 155 | conf_logits, prob_logits = tf.split(feature_map_2, [1, self.class_num], axis=-1) 156 | box_centers = tf.nn.sigmoid(box_centers) 157 | 158 | # use some broadcast tricks to get the mesh coordinates 159 | grid_x = tf.range(grid_size[1], dtype=tf.int32) 160 | grid_y = tf.range(grid_size[0], dtype=tf.int32) 161 | grid_x, grid_y = tf.meshgrid(grid_x, grid_y) 162 | x_offset = tf.reshape(grid_x, (-1, 1)) 163 | y_offset = tf.reshape(grid_y, (-1, 1)) 164 | x_y_offset = tf.concat([x_offset, y_offset], axis=-1) 165 | # shape: [13, 13, 1, 2] 166 | x_y_offset = tf.cast(tf.reshape(x_y_offset, [grid_size[0], grid_size[1], 1, 2]), tf.float32) 167 | 168 | # get the absolute box coordinates on the feature_map 169 | box_centers = box_centers + x_y_offset 170 | # rescale to the original image scale 171 | box_centers = box_centers * ratio[::-1] 172 | 173 | # avoid getting possible nan value with tf.clip_by_value 174 | box_sizes = tf.clip_by_value(tf.exp(box_sizes), 1e-9, 50) * rescaled_anchors 175 | # rescale to the original image scale 176 | box_sizes = box_sizes * ratio[::-1] 177 | 178 | # shape: [N, 13, 13, 3, 4] 179 | # last dimension: (center_x, center_y, w, h) 180 | boxes = tf.concat([box_centers, box_sizes], axis=-1) 181 | 182 | # shape: 183 | # x_y_offset: [13, 13, 1, 2] 184 | # boxes: [N, 13, 13, 3, 4], rescaled to the original image scale 185 | # conf_logits: [N, 13, 13, 3, 1] 186 | # prob_logits: [N, 13, 13, 3, class_num] 187 | return x_y_offset, boxes, conf_logits, prob_logits 188 | 189 | 190 | def predict(self, feature_maps): 191 | ''' 192 | Receive the returned feature_maps from `forward` function, 193 | the produce the output predictions at the test stage. 
194 | ''' 195 | feature_map_1, feature_map_2, feature_map_3 = feature_maps 196 | 197 | feature_map_anchors = [(feature_map_1, self.anchors[6:9]), 198 | (feature_map_2, self.anchors[3:6]), 199 | (feature_map_3, self.anchors[0:3])] 200 | reorg_results = [self.reorg_layer(feature_map, anchors) for (feature_map, anchors) in feature_map_anchors] 201 | 202 | def _reshape(result): 203 | x_y_offset, boxes, conf_logits, prob_logits = result 204 | grid_size = x_y_offset.shape.as_list()[:2] 205 | boxes = tf.reshape(boxes, [-1, grid_size[0] * grid_size[1] * 3, 4]) 206 | conf_logits = tf.reshape(conf_logits, [-1, grid_size[0] * grid_size[1] * 3, 1]) 207 | prob_logits = tf.reshape(prob_logits, [-1, grid_size[0] * grid_size[1] * 3, self.class_num]) 208 | # shape: (take 416*416 input image and feature_map_1 for example) 209 | # boxes: [N, 13*13*3, 4] 210 | # conf_logits: [N, 13*13*3, 1] 211 | # prob_logits: [N, 13*13*3, class_num] 212 | return boxes, conf_logits, prob_logits 213 | 214 | boxes_list, confs_list, probs_list = [], [], [] 215 | for result in reorg_results: 216 | boxes, conf_logits, prob_logits = _reshape(result) 217 | confs = tf.sigmoid(conf_logits) 218 | probs = tf.sigmoid(prob_logits) 219 | boxes_list.append(boxes) 220 | confs_list.append(confs) 221 | probs_list.append(probs) 222 | 223 | # collect results on three scales 224 | # take 416*416 input image for example: 225 | # shape: [N, (13*13+26*26+52*52)*3, 4] 226 | boxes = tf.concat(boxes_list, axis=1) 227 | # shape: [N, (13*13+26*26+52*52)*3, 1] 228 | confs = tf.concat(confs_list, axis=1) 229 | # shape: [N, (13*13+26*26+52*52)*3, class_num] 230 | probs = tf.concat(probs_list, axis=1) 231 | 232 | center_x, center_y, width, height = tf.split(boxes, [1, 1, 1, 1], axis=-1) 233 | x_min = center_x - width / 2 234 | y_min = center_y - height / 2 235 | x_max = center_x + width / 2 236 | y_max = center_y + height / 2 237 | 238 | boxes = tf.concat([x_min, y_min, x_max, y_max], axis=-1) 239 | 240 | return boxes, confs, probs 241 | 242 | def loss_layer(self, feature_map_i, y_true, anchors): 243 | ''' 244 | calc loss function from a certain scale 245 | ''' 246 | 247 | # size in [h, w] format! don't get messed up! 
248 | grid_size = tf.shape(feature_map_i)[1:3] 249 | # the downscale ratio in height and weight 250 | ratio = tf.cast(self.img_size / grid_size, tf.float32) 251 | # N: batch_size 252 | N = tf.cast(tf.shape(feature_map_i)[0], tf.float32) 253 | 254 | x_y_offset, pred_boxes, pred_conf_logits, pred_prob_logits = self.reorg_layer(feature_map_i, anchors) 255 | 256 | ########### 257 | # get mask 258 | ########### 259 | # shape: take 416x416 input image and 13*13 feature_map for example: 260 | # [N, 13, 13, 3, 1] 261 | object_mask = y_true[..., 4:5] 262 | # shape: [N, 13, 13, 3, 4] & [N, 13, 13, 3] ==> [V, 4] 263 | # V: num of true gt box 264 | valid_true_boxes = tf.boolean_mask(y_true[..., 0:4], tf.cast(object_mask[..., 0], 'bool')) 265 | 266 | # shape: [V, 2] 267 | valid_true_box_xy = valid_true_boxes[:, 0:2] 268 | valid_true_box_wh = valid_true_boxes[:, 2:4] 269 | # shape: [N, 13, 13, 3, 2] 270 | pred_box_xy = pred_boxes[..., 0:2] 271 | pred_box_wh = pred_boxes[..., 2:4] 272 | 273 | # calc iou 274 | # shape: [N, 13, 13, 3, V] 275 | iou = self.broadcast_iou(valid_true_box_xy, valid_true_box_wh, pred_box_xy, pred_box_wh) 276 | 277 | # shape: [N, 13, 13, 3] 278 | best_iou = tf.reduce_max(iou, axis=-1) 279 | 280 | # get_ignore_mask 281 | ignore_mask = tf.cast(best_iou < 0.5, tf.float32) 282 | # shape: [N, 13, 13, 3, 1] 283 | ignore_mask = tf.expand_dims(ignore_mask, -1) 284 | 285 | # get xy coordinates in one cell from the feature_map 286 | # numerical range: 0 ~ 1 287 | # shape: [N, 13, 13, 3, 2] 288 | true_xy = y_true[..., 0:2] / ratio[::-1] - x_y_offset 289 | pred_xy = pred_box_xy / ratio[::-1] - x_y_offset 290 | 291 | # get_tw_th 292 | # numerical range: 0 ~ 1 293 | # shape: [N, 13, 13, 3, 2] 294 | true_tw_th = y_true[..., 2:4] / anchors 295 | pred_tw_th = pred_box_wh / anchors 296 | # for numerical stability 297 | true_tw_th = tf.where(condition=tf.equal(true_tw_th, 0), 298 | x=tf.ones_like(true_tw_th), y=true_tw_th) 299 | pred_tw_th = tf.where(condition=tf.equal(pred_tw_th, 0), 300 | x=tf.ones_like(pred_tw_th), y=pred_tw_th) 301 | true_tw_th = tf.log(tf.clip_by_value(true_tw_th, 1e-9, 1e9)) 302 | pred_tw_th = tf.log(tf.clip_by_value(pred_tw_th, 1e-9, 1e9)) 303 | 304 | # box size punishment: 305 | # box with smaller area has bigger weight. This is taken from the yolo darknet C source code. 306 | # shape: [N, 13, 13, 3, 1] 307 | box_loss_scale = 2. 
- (y_true[..., 2:3] / tf.cast(self.img_size[1], tf.float32)) * (y_true[..., 3:4] / tf.cast(self.img_size[0], tf.float32)) 308 | 309 | ############ 310 | # loss_part 311 | ############ 312 | # shape: [N, 13, 13, 3, 1] 313 | xy_loss = tf.reduce_sum(tf.square(true_xy - pred_xy) * object_mask * box_loss_scale) / N 314 | wh_loss = tf.reduce_sum(tf.square(true_tw_th - pred_tw_th) * object_mask * box_loss_scale) / N 315 | 316 | # shape: [N, 13, 13, 3, 1] 317 | conf_pos_mask = object_mask 318 | conf_neg_mask = (1 - object_mask) * ignore_mask 319 | conf_loss_pos = conf_pos_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask, logits=pred_conf_logits) 320 | conf_loss_neg = conf_neg_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask, logits=pred_conf_logits) 321 | conf_loss = tf.reduce_sum(conf_loss_pos + conf_loss_neg) / N 322 | 323 | # shape: [N, 13, 13, 3, 1] 324 | class_loss = object_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true[..., 5:], logits=pred_prob_logits) 325 | class_loss = tf.reduce_sum(class_loss) / N 326 | 327 | return xy_loss, wh_loss, conf_loss, class_loss 328 | 329 | 330 | def compute_loss(self, y_pred, y_true): 331 | ''' 332 | param: 333 | y_pred: returned feature_map list by `forward` function: [feature_map_1, feature_map_2, feature_map_3] 334 | y_true: input y_true by the tf.data pipeline 335 | ''' 336 | loss_xy, loss_wh, loss_conf, loss_class = 0., 0., 0., 0. 337 | anchor_group = [self.anchors[6:9], self.anchors[3:6], self.anchors[0:3]] 338 | 339 | # calc loss in 3 scales 340 | for i in range(len(y_pred)): 341 | result = self.loss_layer(y_pred[i], y_true[i], anchor_group[i]) 342 | loss_xy += result[0] 343 | loss_wh += result[1] 344 | loss_conf += result[2] 345 | loss_class += result[3] 346 | total_loss = loss_xy + loss_wh + loss_conf + loss_class 347 | return [total_loss, loss_xy, loss_wh, loss_conf, loss_class] 348 | 349 | 350 | 351 | def broadcast_iou(self, true_box_xy, true_box_wh, pred_box_xy, pred_box_wh): 352 | ''' 353 | maintain an efficient way to calculate the ios matrix between ground truth true boxes and the predicted boxes 354 | note: here we only care about the size match 355 | ''' 356 | # shape: 357 | # true_box_??: [V, 2] 358 | # pred_box_??: [N, 13, 13, 3, 2] 359 | 360 | # shape: [N, 13, 13, 3, 1, 2] 361 | pred_box_xy = tf.expand_dims(pred_box_xy, -2) 362 | pred_box_wh = tf.expand_dims(pred_box_wh, -2) 363 | 364 | # shape: [1, V, 2] 365 | true_box_xy = tf.expand_dims(true_box_xy, 0) 366 | true_box_wh = tf.expand_dims(true_box_wh, 0) 367 | 368 | # [N, 13, 13, 3, 1, 2] & [1, V, 2] ==> [N, 13, 13, 3, V, 2] 369 | intersect_mins = tf.maximum(pred_box_xy - pred_box_wh / 2., 370 | true_box_xy - true_box_wh / 2.) 371 | intersect_maxs = tf.minimum(pred_box_xy + pred_box_wh / 2., 372 | true_box_xy + true_box_wh / 2.) 373 | intersect_wh = tf.maximum(intersect_maxs - intersect_mins, 0.) 
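        # clamping at zero makes the intersection area vanish for box pairs
        # that do not overlap, instead of going negative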
374 | 375 | # shape: [N, 13, 13, 3, V] 376 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 377 | # shape: [N, 13, 13, 3, 1] 378 | pred_box_area = pred_box_wh[..., 0] * pred_box_wh[..., 1] 379 | # shape: [1, V] 380 | true_box_area = true_box_wh[..., 0] * true_box_wh[..., 1] 381 | 382 | # [N, 13, 13, 3, V] 383 | iou = intersect_area / (pred_box_area + true_box_area - intersect_area + 1e-10) 384 | 385 | return iou 386 | -------------------------------------------------------------------------------- /model/yolov3_tiny.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # for better understanding about yolov3 architecture, refer to this website (in Chinese): 3 | # https://blog.csdn.net/leviopku/article/details/82660381 4 | 5 | from __future__ import division, print_function 6 | 7 | import numpy as np 8 | import tensorflow as tf 9 | slim = tf.contrib.slim 10 | 11 | from utils.layer_utils import conv2d, darknet19_body, yolo_tiny_block, upsample_layer 12 | 13 | class yolov3_tiny(object): 14 | 15 | def __init__(self, class_num, anchors, batch_norm_decay=0.9): 16 | 17 | # self.anchors = [[10, 18], [15, 25], [27, 36], 18 | # [42, 68], [79, 134], [171, 272], 19 | 20 | self.class_num = class_num 21 | self.anchors = anchors 22 | self.batch_norm_decay = batch_norm_decay 23 | 24 | def forward(self, inputs, is_training=False, reuse=False): 25 | # the input img_size, form: [height, weight] 26 | # it will be used later 27 | self.img_size = tf.shape(inputs)[1:3] 28 | # set batch norm params 29 | batch_norm_params = { 30 | 'decay': self.batch_norm_decay, 31 | 'epsilon': 1e-05, 32 | 'scale': True, 33 | 'is_training': is_training, 34 | 'fused': None, # Use fused batch norm if possible. 35 | } 36 | 37 | with slim.arg_scope([slim.conv2d, slim.batch_norm],reuse=reuse): 38 | with slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm, 39 | normalizer_params=batch_norm_params, 40 | biases_initializer=None, 41 | activation_fn=lambda x: tf.nn.leaky_relu(x, alpha=0.1)): 42 | 43 | with tf.variable_scope('darknet19_body'): 44 | route_1,_, route_2,_ = darknet19_body(inputs) 45 | 46 | with tf.variable_scope('yolov3_tiny_head'): 47 | inter1, net = yolo_tiny_block(route_2, 256) 48 | feature_map_1 = slim.conv2d(net, 3 * (5 + self.class_num), 1, 49 | stride=1, normalizer_fn=None, 50 | activation_fn=None, biases_initializer=tf.zeros_initializer()) 51 | feature_map_1 = tf.identity(feature_map_1, name='feature_map_1') 52 | 53 | inter1 = conv2d(inter1, 128, 1, strides=1) 54 | inter1 = upsample_layer(inter1, route_1.get_shape().as_list()) 55 | concat1 = tf.concat([inter1, route_1], axis=3) 56 | 57 | net = conv2d(concat1, 256, 3, strides=1) 58 | feature_map_2 = slim.conv2d(net, 3 * (5 + self.class_num), 1, 59 | stride=1, normalizer_fn=None, 60 | activation_fn=None, biases_initializer=tf.zeros_initializer()) 61 | feature_map_2 = tf.identity(feature_map_2, name='feature_map_2') 62 | 63 | 64 | return feature_map_1, feature_map_2 65 | 66 | def reorg_layer(self, feature_map, anchors): 67 | ''' 68 | feature_map: a feature_map from [feature_map_1, feature_map_2, feature_map_3] returned 69 | from `forward` function 70 | anchors: shape: [3, 2] 71 | ''' 72 | # NOTE: size in [h, w] format! don't get messed up! 73 | grid_size = feature_map.shape.as_list()[1:3] # [13, 13] 74 | # the downscale ratio in height and weight 75 | ratio = tf.cast(self.img_size / grid_size, tf.float32) 76 | # rescale the anchors to the feature_map 77 | # NOTE: the anchor is in [w, h] format! 
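        # e.g. with a 416x416 input and a 13x13 feature map, ratio is 32,
        # so the [135, 169] tiny-COCO anchor becomes (4.21875, 5.28125)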
78 | rescaled_anchors = [(anchor[0] / ratio[1], anchor[1] / ratio[0]) for anchor in anchors] 79 | 80 | feature_map = tf.reshape(feature_map, [-1, grid_size[0], grid_size[1], 3, 5 + self.class_num]) 81 | 82 | # split the feature_map along the last dimension 83 | # shape info: take 416x416 input image and the 13*13 feature_map for example: 84 | # box_centers: [N, 13, 13, 3, 2] last_dimension: [center_x, center_y] 85 | # box_sizes: [N, 13, 13, 3, 2] last_dimension: [width, height] 86 | # conf_logits: [N, 13, 13, 3, 1] 87 | # prob_logits: [N, 13, 13, 3, class_num] 88 | box_centers, box_sizes, conf_logits, prob_logits = tf.split(feature_map, [2, 2, 1, self.class_num], axis=-1) 89 | box_centers = tf.nn.sigmoid(box_centers) 90 | 91 | # use some broadcast tricks to get the mesh coordinates 92 | grid_x = tf.range(grid_size[1], dtype=tf.int32) 93 | grid_y = tf.range(grid_size[0], dtype=tf.int32) 94 | grid_x, grid_y = tf.meshgrid(grid_x, grid_y) 95 | x_offset = tf.reshape(grid_x, (-1, 1)) 96 | y_offset = tf.reshape(grid_y, (-1, 1)) 97 | x_y_offset = tf.concat([x_offset, y_offset], axis=-1) 98 | # shape: [13, 13, 1, 2] 99 | x_y_offset = tf.cast(tf.reshape(x_y_offset, [grid_size[0], grid_size[1], 1, 2]), tf.float32) 100 | 101 | # get the absolute box coordinates on the feature_map 102 | box_centers = box_centers + x_y_offset 103 | # rescale to the original image scale 104 | box_centers = box_centers * ratio[::-1] 105 | 106 | # avoid getting possible nan value with tf.clip_by_value 107 | box_sizes = tf.clip_by_value(tf.exp(box_sizes), 1e-9, 50) * rescaled_anchors 108 | # rescale to the original image scale 109 | box_sizes = box_sizes * ratio[::-1] 110 | 111 | # shape: [N, 13, 13, 3, 4] 112 | # last dimension: (center_x, center_y, w, h) 113 | boxes = tf.concat([box_centers, box_sizes], axis=-1) 114 | 115 | # shape: 116 | # x_y_offset: [13, 13, 1, 2] 117 | # boxes: [N, 13, 13, 3, 4], rescaled to the original image scale 118 | # conf_logits: [N, 13, 13, 3, 1] 119 | # prob_logits: [N, 13, 13, 3, class_num] 120 | return x_y_offset, boxes, conf_logits, prob_logits 121 | 122 | def reorg_layer_xx(self, feature_map, anchors): 123 | ''' 124 | feature_map: a feature_map from [feature_map_1, feature_map_2, feature_map_3] returned 125 | from `forward` function 126 | anchors: shape: [3, 2] 127 | ''' 128 | # NOTE: size in [h, w] format! don't get messed up! 129 | grid_size = feature_map.shape.as_list()[1:3] # [13, 13] 130 | # the downscale ratio in height and weight 131 | ratio = tf.cast(self.img_size / grid_size, tf.float32) 132 | # rescale the anchors to the feature_map 133 | # NOTE: the anchor is in [w, h] format! 
134 | rescaled_anchors = [(anchor[0] / ratio[1], anchor[1] / ratio[0]) for anchor in anchors] 135 | 136 | feature_map_1, feature_map_2 = tf.split(feature_map, [3*4, 3*(self.class_num+1)], axis=-1) 137 | feature_map_1 = tf.reshape(feature_map_1, [-1, grid_size[0], grid_size[1], 3, 4]) 138 | feature_map_2 = tf.reshape(feature_map_2, [-1, grid_size[0], grid_size[1], 3, 1 + self.class_num]) 139 | 140 | # split the feature_map along the last dimension 141 | # shape info: take 416x416 input image and the 13*13 feature_map for example: 142 | # box_centers: [N, 13, 13, 3, 2] last_dimension: [center_x, center_y] 143 | # box_sizes: [N, 13, 13, 3, 2] last_dimension: [width, height] 144 | # conf_logits: [N, 13, 13, 3, 1] 145 | # prob_logits: [N, 13, 13, 3, class_num] 146 | box_centers, box_sizes = tf.split(feature_map_1, [2, 2], axis=-1) 147 | conf_logits, prob_logits = tf.split(feature_map_2, [1, self.class_num], axis=-1) 148 | box_centers = tf.nn.sigmoid(box_centers) 149 | 150 | # use some broadcast tricks to get the mesh coordinates 151 | grid_x = tf.range(grid_size[1], dtype=tf.int32) 152 | grid_y = tf.range(grid_size[0], dtype=tf.int32) 153 | grid_x, grid_y = tf.meshgrid(grid_x, grid_y) 154 | x_offset = tf.reshape(grid_x, (-1, 1)) 155 | y_offset = tf.reshape(grid_y, (-1, 1)) 156 | x_y_offset = tf.concat([x_offset, y_offset], axis=-1) 157 | # shape: [13, 13, 1, 2] 158 | x_y_offset = tf.cast(tf.reshape(x_y_offset, [grid_size[0], grid_size[1], 1, 2]), tf.float32) 159 | 160 | # get the absolute box coordinates on the feature_map 161 | box_centers = box_centers + x_y_offset 162 | # rescale to the original image scale 163 | box_centers = box_centers * ratio[::-1] 164 | 165 | # avoid getting possible nan value with tf.clip_by_value 166 | box_sizes = tf.clip_by_value(tf.exp(box_sizes), 1e-9, 50) * rescaled_anchors 167 | # rescale to the original image scale 168 | box_sizes = box_sizes * ratio[::-1] 169 | 170 | # shape: [N, 13, 13, 3, 4] 171 | # last dimension: (center_x, center_y, w, h) 172 | boxes = tf.concat([box_centers, box_sizes], axis=-1) 173 | 174 | # shape: 175 | # x_y_offset: [13, 13, 1, 2] 176 | # boxes: [N, 13, 13, 3, 4], rescaled to the original image scale 177 | # conf_logits: [N, 13, 13, 3, 1] 178 | # prob_logits: [N, 13, 13, 3, class_num] 179 | return x_y_offset, boxes, conf_logits, prob_logits 180 | 181 | 182 | def predict(self, feature_maps): 183 | ''' 184 | Receive the returned feature_maps from `forward` function, 185 | the produce the output predictions at the test stage. 
186 | ''' 187 | feature_map_1, feature_map_2 = feature_maps 188 | 189 | feature_map_anchors = [(feature_map_1, self.anchors[3:6]), 190 | (feature_map_2, self.anchors[0:3])] 191 | reorg_results = [self.reorg_layer(feature_map, anchors) for (feature_map, anchors) in feature_map_anchors] 192 | 193 | def _reshape(result): 194 | x_y_offset, boxes, conf_logits, prob_logits = result 195 | grid_size = x_y_offset.shape.as_list()[:2] 196 | boxes = tf.reshape(boxes, [-1, grid_size[0] * grid_size[1] * 3, 4]) 197 | conf_logits = tf.reshape(conf_logits, [-1, grid_size[0] * grid_size[1] * 3, 1]) 198 | prob_logits = tf.reshape(prob_logits, [-1, grid_size[0] * grid_size[1] * 3, self.class_num]) 199 | # shape: (take 416*416 input image and feature_map_1 for example) 200 | # boxes: [N, 13*13*3, 4] 201 | # conf_logits: [N, 13*13*3, 1] 202 | # prob_logits: [N, 13*13*3, class_num] 203 | return boxes, conf_logits, prob_logits 204 | 205 | boxes_list, confs_list, probs_list = [], [], [] 206 | for result in reorg_results: 207 | boxes, conf_logits, prob_logits = _reshape(result) 208 | confs = tf.sigmoid(conf_logits) 209 | probs = tf.sigmoid(prob_logits) 210 | boxes_list.append(boxes) 211 | confs_list.append(confs) 212 | probs_list.append(probs) 213 | 214 | # collect results on three scales 215 | # take 416*416 input image for example: 216 | # shape: [N, (13*13+26*26+52*52)*3, 4] 217 | boxes = tf.concat(boxes_list, axis=1) 218 | # shape: [N, (13*13+26*26+52*52)*3, 1] 219 | confs = tf.concat(confs_list, axis=1) 220 | # shape: [N, (13*13+26*26+52*52)*3, class_num] 221 | probs = tf.concat(probs_list, axis=1) 222 | 223 | center_x, center_y, width, height = tf.split(boxes, [1, 1, 1, 1], axis=-1) 224 | x_min = center_x - width / 2 225 | y_min = center_y - height / 2 226 | x_max = center_x + width / 2 227 | y_max = center_y + height / 2 228 | 229 | boxes = tf.concat([x_min, y_min, x_max, y_max], axis=-1) 230 | 231 | return boxes, confs, probs 232 | 233 | def loss_layer(self, feature_map_i, y_true, anchors): 234 | ''' 235 | calc loss function from a certain scale 236 | ''' 237 | 238 | # size in [h, w] format! don't get messed up! 
239 | grid_size = tf.shape(feature_map_i)[1:3]
240 | # the downscale ratio in height and width
241 | ratio = tf.cast(self.img_size / grid_size, tf.float32)
242 | # N: batch_size
243 | N = tf.cast(tf.shape(feature_map_i)[0], tf.float32)
244 |
245 | x_y_offset, pred_boxes, pred_conf_logits, pred_prob_logits = self.reorg_layer(feature_map_i, anchors)
246 |
247 | ###########
248 | # get mask
249 | ###########
250 | # shape: take a 416x416 input image and the 13*13 feature_map for example:
251 | # [N, 13, 13, 3, 1]
252 | object_mask = y_true[..., 4:5]
253 | # shape: [N, 13, 13, 3, 4] & [N, 13, 13, 3] ==> [V, 4]
254 | # V: number of valid ground truth boxes
255 | valid_true_boxes = tf.boolean_mask(y_true[..., 0:4], tf.cast(object_mask[..., 0], 'bool'))
256 |
257 | # shape: [V, 2]
258 | valid_true_box_xy = valid_true_boxes[:, 0:2]
259 | valid_true_box_wh = valid_true_boxes[:, 2:4]
260 | # shape: [N, 13, 13, 3, 2]
261 | pred_box_xy = pred_boxes[..., 0:2]
262 | pred_box_wh = pred_boxes[..., 2:4]
263 |
264 | # calc iou
265 | # shape: [N, 13, 13, 3, V]
266 | iou = self.broadcast_iou(valid_true_box_xy, valid_true_box_wh, pred_box_xy, pred_box_wh)
267 |
268 | # shape: [N, 13, 13, 3]
269 | best_iou = tf.reduce_max(iou, axis=-1)
270 |
271 | # get the ignore mask: predictions that already match some gt box well are not penalized
272 | ignore_mask = tf.cast(best_iou < 0.5, tf.float32)
273 | # shape: [N, 13, 13, 3, 1]
274 | ignore_mask = tf.expand_dims(ignore_mask, -1)
275 |
276 | # get the xy coordinates within one cell on the feature_map
277 | # numerical range: 0 ~ 1
278 | # shape: [N, 13, 13, 3, 2]
279 | true_xy = y_true[..., 0:2] / ratio[::-1] - x_y_offset
280 | pred_xy = pred_box_xy / ratio[::-1] - x_y_offset
281 |
282 | # get tw and th, the box sizes relative to their anchors
283 | # numerical range: 0 ~ 1
284 | # shape: [N, 13, 13, 3, 2]
285 | true_tw_th = y_true[..., 2:4] / anchors
286 | pred_tw_th = pred_box_wh / anchors
287 | # for numerical stability
288 | true_tw_th = tf.where(condition=tf.equal(true_tw_th, 0),
289 | x=tf.ones_like(true_tw_th), y=true_tw_th)
290 | pred_tw_th = tf.where(condition=tf.equal(pred_tw_th, 0),
291 | x=tf.ones_like(pred_tw_th), y=pred_tw_th)
292 | true_tw_th = tf.log(tf.clip_by_value(true_tw_th, 1e-9, 1e9))
293 | pred_tw_th = tf.log(tf.clip_by_value(pred_tw_th, 1e-9, 1e9))
294 |
295 | # box size punishment:
296 | # a box with a smaller area gets a bigger weight. This is taken from the YOLO darknet C source code.
297 | # shape: [N, 13, 13, 3, 1]
298 | box_loss_scale = 2. - (y_true[..., 2:3] / tf.cast(self.img_size[1], tf.float32)) * (y_true[..., 3:4] / tf.cast(self.img_size[0], tf.float32))
299 |
300 | ############
301 | # loss_part
302 | ############
303 | # shape: [N, 13, 13, 3, 1]
304 | xy_loss = tf.reduce_sum(tf.square(true_xy - pred_xy) * object_mask * box_loss_scale) / N
305 | wh_loss = tf.reduce_sum(tf.square(true_tw_th - pred_tw_th) * object_mask * box_loss_scale) / N
306 |
307 | # shape: [N, 13, 13, 3, 1]
308 | conf_pos_mask = object_mask
309 | conf_neg_mask = (1 - object_mask) * ignore_mask
310 | conf_loss_pos = conf_pos_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask, logits=pred_conf_logits)
311 | conf_loss_neg = conf_neg_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask, logits=pred_conf_logits)
312 | conf_loss = tf.reduce_sum(conf_loss_pos + conf_loss_neg) / N
313 |
314 | # shape: [N, 13, 13, 3, 1]
315 | class_loss = object_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true[..., 5:], logits=pred_prob_logits)
316 | class_loss = tf.reduce_sum(class_loss) / N
317 |
318 | return xy_loss, wh_loss, conf_loss, class_loss
319 |
320 |
321 | def compute_loss(self, y_pred, y_true):
322 | '''
323 | param:
324 | y_pred: the feature_map list returned by the `forward` function: [feature_map_1, feature_map_2]
325 | y_true: the y_true tensors fed in by the tf.data pipeline
326 | '''
327 | loss_xy, loss_wh, loss_conf, loss_class = 0., 0., 0., 0.
328 | anchor_group = [self.anchors[3:6], self.anchors[0:3]]
329 |
330 | # calc the loss on the 2 scales
331 | for i in range(len(y_pred)):
332 | result = self.loss_layer(y_pred[i], y_true[i], anchor_group[i])
333 | loss_xy += result[0]
334 | loss_wh += result[1]
335 | loss_conf += result[2]
336 | loss_class += result[3]
337 | total_loss = loss_xy + loss_wh + loss_conf + loss_class
338 | return [total_loss, loss_xy, loss_wh, loss_conf, loss_class]
339 |
340 |
341 |
342 | def broadcast_iou(self, true_box_xy, true_box_wh, pred_box_xy, pred_box_wh):
343 | '''
344 | An efficient way to calculate the IoU matrix between the ground truth boxes and the predicted boxes.
345 | note: here we only care about the size match
346 | '''
347 | # shape:
348 | # true_box_??: [V, 2]
349 | # pred_box_??: [N, 13, 13, 3, 2]
350 |
351 | # shape: [N, 13, 13, 3, 1, 2]
352 | pred_box_xy = tf.expand_dims(pred_box_xy, -2)
353 | pred_box_wh = tf.expand_dims(pred_box_wh, -2)
354 |
355 | # shape: [1, V, 2]
356 | true_box_xy = tf.expand_dims(true_box_xy, 0)
357 | true_box_wh = tf.expand_dims(true_box_wh, 0)
358 |
359 | # [N, 13, 13, 3, 1, 2] & [1, V, 2] ==> [N, 13, 13, 3, V, 2]
360 | intersect_mins = tf.maximum(pred_box_xy - pred_box_wh / 2.,
361 | true_box_xy - true_box_wh / 2.)
362 | intersect_maxs = tf.minimum(pred_box_xy + pred_box_wh / 2.,
363 | true_box_xy + true_box_wh / 2.)
364 | intersect_wh = tf.maximum(intersect_maxs - intersect_mins, 0.)
365 |
366 | # shape: [N, 13, 13, 3, V]
367 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
368 | # shape: [N, 13, 13, 3, 1]
369 | pred_box_area = pred_box_wh[..., 0] * pred_box_wh[..., 1]
370 | # shape: [1, V]
371 | true_box_area = true_box_wh[..., 0] * true_box_wh[..., 1]
372 |
373 | # [N, 13, 13, 3, V]
374 | iou = intersect_area / (pred_box_area + true_box_area - intersect_area + 1e-10)
375 |
376 | return iou
377 |
378 |
379 |
--------------------------------------------------------------------------------
/test_single_image.py:
--------------------------------------------------------------------------------
1 | ####
2 |
3 | # edited by Huangdebo
4 | # test the model using a ckpt file
5 |
6 | # ***
7 |
8 | from __future__ import division, print_function
9 |
10 | import tensorflow as tf
11 | import numpy as np
12 | import argparse
13 | import cv2
14 |
15 | from utils.misc_utils import parse_anchors, read_class_names
16 | from utils.nms_utils import gpu_nms
17 | from utils.plot_utils import get_color_table, plot_one_box
18 |
19 | from model.yolov3 import yolov3
20 | from model.yolov3_tiny import yolov3_tiny
21 |
22 | net_name = 'yolov3_tiny'
23 | body_name = 'darknet19'
24 | data_name = 'Car'
25 | ckpt_name = 'model-step_144000_loss_2.974138_lr_3.433207e-06'
26 |
27 |
28 | parser = argparse.ArgumentParser(description="%s single image test procedure."%net_name)
29 | parser.add_argument("--input_image", type=str, default="./data/demo_data/car.jpg",
30 | help="The path of the input image.")
31 | parser.add_argument("--anchor_path", type=str, default="./data/%s_%s_anchors.txt"%(net_name,data_name),
32 | help="The path of the anchor txt file.")
33 | parser.add_argument("--new_size", nargs='*', type=int, default=[416, 416],
34 | help="Resize the input image with `new_size`, size format: [width, height]")
35 | parser.add_argument("--class_name_path", type=str, default="./data/%s.names"%data_name,
36 | help="The path of the class names.")
37 | parser.add_argument("--restore_path", type=str, default="./checkpoint/%s_%s/%s"%(net_name,data_name,ckpt_name),
38 | help="The path of the weights to restore.")
39 | args = parser.parse_args()
40 |
41 | args.anchors = parse_anchors(args.anchor_path)
42 | args.classes = read_class_names(args.class_name_path)
43 | args.num_class = len(args.classes)
44 |
45 | color_table = get_color_table(args.num_class)
46 | img_ori = cv2.imread(args.input_image)
47 | height_ori, width_ori = img_ori.shape[:2]
48 | img = cv2.resize(img_ori, tuple(args.new_size))
49 | img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
50 | img = np.asarray(img, np.float32)
51 | img = img[np.newaxis, :] / 255.
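# note: at this point img has shape [1, new_h, new_w, 3] -- an RGB float32
# batch scaled to 0~1, the same input format the training pipeline in
# utils/data_utils.py produces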
52 |
53 | with tf.Session() as sess:
54 | input_data = tf.placeholder(tf.float32, [1, args.new_size[1], args.new_size[0], 3], name='input_data')
55 | # yolo_model = yolov3(args.num_class, args.anchors)
56 | yolo_model = yolov3_tiny(args.num_class, args.anchors)
57 | with tf.variable_scope(net_name):
58 | pred_feature_maps = yolo_model.forward(input_data, False)
59 |
60 | pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)
61 |
62 | pred_scores = pred_confs * pred_probs
63 |
64 | boxes, scores, labels = gpu_nms(pred_boxes, pred_scores, args.num_class, max_boxes=200, score_thresh=0.4, iou_thresh=0.5)
65 |
66 | saver = tf.train.Saver()
67 | saver.restore(sess, args.restore_path)
68 |
69 | boxes_, scores_, labels_ = sess.run([boxes, scores, labels], feed_dict={input_data: img})
70 |
71 | # rescale the coordinates to the original image
72 | boxes_[:, 0] *= (width_ori/float(args.new_size[0]))
73 | boxes_[:, 2] *= (width_ori/float(args.new_size[0]))
74 | boxes_[:, 1] *= (height_ori/float(args.new_size[1]))
75 | boxes_[:, 3] *= (height_ori/float(args.new_size[1]))
76 |
77 | print("box coords:")
78 | print(boxes_)
79 | print('*' * 30)
80 | print("scores:")
81 | print(scores_)
82 | print('*' * 30)
83 | print("labels:")
84 | print(labels_)
85 |
86 | for i in range(len(boxes_)):
87 | x0, y0, x1, y1 = boxes_[i]
88 | plot_one_box(img_ori, [x0, y0, x1, y1], label=args.classes[labels_[i]], color=color_table[labels_[i]])
89 | cv2.imshow('Detection result', img_ori)
90 | cv2.imwrite('detection_result.jpg', img_ori)
91 | cv2.waitKey(0)
92 |
--------------------------------------------------------------------------------
/test_single_image_pb.py:
--------------------------------------------------------------------------------
1 | ####
2 |
3 | # edited by Huangdebo
4 | # test the model using a pb file
5 |
6 | # ***
7 |
8 | import numpy as np
9 | import tensorflow as tf
10 | from tensorflow.python.platform.gfile import GFile
11 | import cv2 as cv
12 |
13 | from utils.nms_utils import gpu_nms
14 | from utils.plot_utils import get_color_table, plot_one_box
15 |
16 | # the path of the frozen .pb graph to load
17 | save_file = 'xxx.pb'
18 |
19 | num_class = 3
20 | classes = ['bicycle', 'car', 'person']
21 | color_table = get_color_table(num_class)
22 |
23 |
24 | sess = tf.Session()
25 | with GFile(save_file, 'rb') as f:
26 | graph_def = tf.GraphDef()
27 | graph_def.ParseFromString(f.read())
28 | sess.graph.as_default()
29 | tf.import_graph_def(graph_def, name='') # import the net
30 |
31 |
32 | # initializer
33 | sess.run(tf.global_variables_initializer())
34 |
35 |
36 | # input and output tensors of the frozen graph
37 | image = sess.graph.get_tensor_by_name('inputs:0')
38 | phase_train = sess.graph.get_tensor_by_name('phase_train:0')
39 | boxes = sess.graph.get_tensor_by_name('boxes:0')
40 | confs = sess.graph.get_tensor_by_name('confs:0')
41 | probs = sess.graph.get_tensor_by_name('probs:0')
42 |
43 | pred_scores = confs * probs
44 | boxes, scores, labels = gpu_nms(boxes, pred_scores, num_class, max_boxes=50, score_thresh=0.4, iou_thresh=0.5)
45 |
46 | img_ori = cv.imread('./data/demo_data/1.jpg')
47 | height_ori, width_ori = img_ori.shape[:2]
48 | size = [416, 416]
49 | im = cv.resize(img_ori, tuple(size))
50 | im = np.asarray(cv.cvtColor(im, cv.COLOR_BGR2RGB), np.float32)[np.newaxis, :] / 255. # match the training input format
51 | boxes_, scores_, labels_ = sess.run([boxes, scores, labels], feed_dict={phase_train: False, image: im})
52 |
53 |
54 | # rescale the coordinates to the original image
55 | boxes_[:, 0] *= (width_ori/float(size[0]))
56 | boxes_[:, 2] *= (width_ori/float(size[0]))
57 | boxes_[:, 1] *= (height_ori/float(size[1]))
58 | boxes_[:, 3] *= (height_ori/float(size[1]))
59 |
60 | print("box coords:")
61 | print(boxes_)
62 | print('*' * 30)
63 | print("scores:")
64 | print(scores_)
65 | print('*' * 30)
66 | print("labels:")
67 | print(labels_)
68 |
69 | for i in range(len(boxes_)):
70 | x0, y0, x1, y1 = boxes_[i]
71 | plot_one_box(img_ori, [x0, y0, x1, y1], label=classes[labels_[i]], color=color_table[labels_[i]])
72 | cv.imshow('Detection result', img_ori)
73 | cv.imwrite('detection_result.jpg', img_ori)
74 | cv.waitKey(0)
75 |
76 |
77 |
78 |
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 |
3 | from __future__ import division, print_function
4 |
5 | import tensorflow as tf
6 | from tensorflow.python.framework import graph_util
7 | import numpy as np
8 | import argparse
9 | import logging
10 |
11 | from utils.data_utils import parse_data
12 | from utils.misc_utils import parse_anchors, read_class_names, shuffle_and_overwrite, update_dict, make_summary, config_learning_rate, config_optimizer, list_add
13 | from utils.eval_utils import evaluate_on_cpu, evaluate_on_gpu
14 | from utils.nms_utils import gpu_nms
15 |
16 | from model.yolov3 import yolov3
17 | from model.yolov3_tiny import yolov3_tiny
18 |
19 | net_name = 'yolov3_tiny'
20 | anchors_name = 'yolov3_tiny'
21 | body_name = 'darknet19'
22 | data_name = 'COCO'
23 |
24 | restore_path = ''
25 | is_train_scratch = True
26 |
27 | #################
28 | # ArgumentParser
29 | #################
30 | parser = argparse.ArgumentParser(description="%s training procedure."%net_name)
31 | # some paths
32 | parser.add_argument("--train_file", type=str, default="./data/my_data/train_%s_tf.txt"%data_name,
33 | help="The path of the training txt file.")
34 |
35 | parser.add_argument("--val_file", type=str, default="./data/my_data/val_%s_tf.txt"%data_name,
36 | help="The path of the validation txt file.")
37 |
38 | parser.add_argument("--restore_path", type=str, default=restore_path,
39 | help="The path of the weights to restore.")
40 |
41 | parser.add_argument("--save_dir", type=str, default="./checkpoint/%s_%s/"%(net_name,data_name),
42 | help="The directory of the weights to save.")
43 |
44 | parser.add_argument("--log_dir", type=str, default="./data/logs/%s_%s/"%(net_name,data_name),
45 | help="The directory to store the tensorboard log files.")
46 |
47 | parser.add_argument("--progress_log_path", type=str, default="./data/%s_progress.log"%data_name,
48 | help="The path to record the training progress.")
49 |
50 | parser.add_argument("--anchor_path", type=str, default="./data/%s_%s_anchors.txt"%(anchors_name,data_name),
51 | help="The path of the anchor txt file.")
52 |
53 | parser.add_argument("--class_name_path", type=str, default="./data/%s.names"%data_name,
54 | help="The path of the class names.")
55 |
56 | # some numbers
57 | parser.add_argument("--batch_size", type=int, default=32,
58 | help="The batch size for training.")
59 |
60 | parser.add_argument("--img_size", nargs='*', type=int, default=[416, 416],
61 | help="Resize the input image to `img_size`, size format: [width, height]")
62 |
63 | parser.add_argument("--total_epoches", type=int, default=100,
64 | help="Total number of epochs to train.")
65 |
66 | parser.add_argument("--train_evaluation_freq", type=int, default=500,
67 | help="Evaluate on the training batch every this many steps.")
68 |
69 | parser.add_argument("--print_evaluation_freq", type=int, default=50,
70 | help="Print the evaluation result on the training batch every this many steps.")
71 |
72 | parser.add_argument("--val_evaluation_freq", type=int, default=2000,
73 | help="Evaluate on the whole validation dataset every this many steps.")
74 |
75 | parser.add_argument("--save_freq", type=int, default=2000,
76 | help="Save the model every this many steps.")
77 |
78 | parser.add_argument("--num_threads", type=int, default=4,
79 | help="Number of threads for image processing used in the tf.data pipeline.")
80 |
81 | parser.add_argument("--prefetech_buffer", type=int, default=3,
82 | help="Prefetch buffer size used in the tf.data pipeline.")
83 |
84 | # learning rate and optimizer
85 | parser.add_argument("--optimizer_name", type=str, default='adam',
86 | help="The optimizer name. Chosen from [sgd, momentum, adam, rmsprop]")
87 |
88 | parser.add_argument("--save_optimizer", type=lambda x: (str(x).lower() == 'true'), default=False,
89 | help="Whether to save the optimizer parameters into the checkpoint file.")
90 |
91 | parser.add_argument("--learning_rate_init", type=float, default=1.0e-3,
92 | help="The initial learning rate.")
93 |
94 | parser.add_argument("--lr_type", type=str, default='exponential',
95 | help="The learning rate type. Chosen from [fixed, exponential]")
96 |
97 | parser.add_argument("--lr_decay_freq", type=int, default=1000,
98 | help="The learning rate decay frequency. Used when the exponential lr_type is chosen.")
99 |
100 | parser.add_argument("--lr_decay_factor", type=float, default=0.96,
101 | help="The learning rate decay factor. Used when the exponential lr_type is chosen.")
102 |
103 | parser.add_argument("--lr_lower_bound", type=float, default=1e-8,
104 | help="The minimum learning rate. Used when the exponential lr_type is chosen.")
105 |
106 | # finetune
107 | parser.add_argument("--restore_part", nargs='*', type=str, default=['%s/%s_body'%(net_name,body_name)],
108 | help="Partially restore part of the model for finetuning. Set [None] to restore the whole model.")
109 |
110 | parser.add_argument("--update_part", nargs='*', type=str, default=['%s/%s_head'%(net_name,net_name)],
111 | help="Partially train part of the model for finetuning. Set [None] to train the whole model.")
112 |
113 | # warm up strategy
114 | parser.add_argument("--use_warm_up", type=lambda x: (str(x).lower() == 'true'), default=True,
115 | help="Whether to use the warm up strategy.")
116 |
117 | parser.add_argument("--warm_up_lr", type=float, default=5e-5,
118 | help="Warm up learning rate.")
119 |
120 | parser.add_argument("--warm_up_epoch", type=int, default=1,
121 | help="Number of warm up training epochs.")
122 | args = parser.parse_args()
123 |
124 | # args params
125 | args.anchors = parse_anchors(args.anchor_path)
126 | args.classes = read_class_names(args.class_name_path)
127 | args.class_num = len(args.classes)
128 | args.train_img_cnt = len(open(args.train_file, 'r').readlines())
129 | args.val_img_cnt = len(open(args.val_file, 'r').readlines())
130 | args.train_batch_num = int(np.floor(float(args.train_img_cnt) / args.batch_size))
131 | args.val_batch_num = int(np.floor(float(args.val_img_cnt) / args.batch_size))
132 |
133 | # setting loggers
134 | logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s %(message)s',
135 | datefmt='%a, %d %b %Y %H:%M:%S', filename=args.progress_log_path, filemode='w')
136 |
137 | # setting placeholders
138 | is_training = tf.placeholder(dtype=tf.bool, name="phase_train")
139 | handle_flag = tf.placeholder(tf.string, [], name='iterator_handle_flag')
140 |
141 | ##################
142 | # tf.data pipeline
143 | ##################
144 | # Selecting a `feedable iterator` to switch between the training and validation datasets
145 |
146 | # manually shuffle the train txt file because tf.data.shuffle is very slow on large files!
147 | # you can google it for more details.
148 | shuffle_and_overwrite(args.train_file)
149 | train_dataset = tf.data.TextLineDataset(args.train_file)
150 | train_dataset = train_dataset.apply(tf.data.experimental.map_and_batch(
151 | lambda x: tf.py_func(parse_data, [x, args.class_num, args.img_size, args.anchors, 'train'], [tf.float32, tf.float32, tf.float32, tf.float32]),
152 | num_parallel_calls=args.num_threads, batch_size=args.batch_size))
153 | train_dataset = train_dataset.prefetch(args.prefetech_buffer)
154 |
155 | val_dataset = tf.data.TextLineDataset(args.val_file)
156 | val_dataset = val_dataset.apply(tf.data.experimental.map_and_batch(
157 | lambda x: tf.py_func(parse_data, [x, args.class_num, args.img_size, args.anchors, 'val'], [tf.float32, tf.float32, tf.float32, tf.float32]),
158 | num_parallel_calls=args.num_threads, batch_size=args.batch_size))
159 | val_dataset = val_dataset.prefetch(args.prefetech_buffer) # note: assign the result; prefetch() is not in-place
160 |
161 | # creating the two dataset iterators
162 | train_iterator = train_dataset.make_initializable_iterator()
163 | val_iterator = val_dataset.make_initializable_iterator()
164 |
165 | # creating the two dataset handles
166 | train_handle = train_iterator.string_handle()
167 | val_handle = val_iterator.string_handle()
168 | # select a specific iterator based on the passed handle
169 | dataset_iterator = tf.data.Iterator.from_string_handle(handle_flag, train_dataset.output_types,
170 | train_dataset.output_shapes)
171 |
172 | # get an element from the chosen dataset iterator
173 | image, y_true_13, y_true_26, y_true_52 = dataset_iterator.get_next()
174 |
175 | if (net_name == 'yolov3'):
176 | y_true = [y_true_13, y_true_26, y_true_52]
177 | else:
178 | y_true = [y_true_13, y_true_26]
179 |
180 | # the tf.data pipeline loses the static shape info, so we need to set it manually
181 | image.set_shape([None, args.img_size[1], args.img_size[0], 3])
182 | for y in y_true:
183 | y.set_shape([None, None,
None, None, None]) 184 | 185 | ################## 186 | # Model definition 187 | ################## 188 | 189 | # define yolo-v3 model here 190 | #yolo_model = yolov3(args.class_num, args.anchors) 191 | yolo_model = yolov3_tiny(args.class_num, args.anchors) 192 | 193 | 194 | # the input variables name of .pb 195 | image = tf.identity(image, name='inputs') 196 | with tf.variable_scope(net_name): 197 | pred_feature_maps = yolo_model.forward(image, is_training=is_training) 198 | loss = yolo_model.compute_loss(pred_feature_maps, y_true) 199 | y_pred = yolo_model.predict(pred_feature_maps) 200 | 201 | # the output variables name of .pb 202 | boxes, confs, probs = y_pred 203 | boxes = tf.identity(boxes, name='boxes') 204 | confs = tf.identity(confs, name='confs') 205 | probs = tf.identity(probs, name='probs') 206 | 207 | ################ 208 | # register the gpu nms operation here for the following evaluation scheme 209 | pred_boxes_flag = tf.placeholder(tf.float32, [1, None, None]) 210 | pred_scores_flag = tf.placeholder(tf.float32, [1, None, None]) 211 | gpu_nms_op = gpu_nms(pred_boxes_flag, pred_scores_flag, args.class_num) 212 | ################ 213 | 214 | # train the whole model from scratch 215 | if is_train_scratch == True: 216 | args.restore_part = ['None'] 217 | args.update_part = ['None'] 218 | 219 | if args.restore_part == ['None']: 220 | args.restore_part = [None] 221 | if args.update_part == ['None']: 222 | args.update_part = [None] 223 | saver_to_restore = tf.train.Saver(var_list=tf.contrib.framework.get_variables_to_restore(include=args.restore_part)) 224 | update_vars = tf.contrib.framework.get_variables_to_restore(include=args.update_part) 225 | 226 | tf.summary.scalar('train_batch_statistics/total_loss', loss[0]) 227 | tf.summary.scalar('train_batch_statistics/loss_xy', loss[1]) 228 | tf.summary.scalar('train_batch_statistics/loss_wh', loss[2]) 229 | tf.summary.scalar('train_batch_statistics/loss_conf', loss[3]) 230 | tf.summary.scalar('train_batch_statistics/loss_class', loss[4]) 231 | 232 | global_step = tf.Variable(0, trainable=False, collections=[tf.GraphKeys.LOCAL_VARIABLES]) 233 | if args.use_warm_up: 234 | learning_rate = tf.cond(tf.less(global_step, args.train_batch_num * args.warm_up_epoch), 235 | lambda: args.warm_up_lr, lambda: config_learning_rate(args, global_step - args.train_batch_num * args.warm_up_epoch)) 236 | else: 237 | learning_rate = config_learning_rate(args, global_step) 238 | tf.summary.scalar('learning_rate', learning_rate) 239 | 240 | if not args.save_optimizer: 241 | saver_to_save = tf.train.Saver() 242 | 243 | optimizer = config_optimizer(args.optimizer_name, learning_rate) 244 | 245 | if args.save_optimizer: 246 | saver_to_save = tf.train.Saver() 247 | 248 | # set dependencies for BN ops 249 | update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 250 | with tf.control_dependencies(update_ops): 251 | train_op = optimizer.minimize(loss[0], var_list=update_vars, global_step=global_step) 252 | 253 | 254 | with tf.Session() as sess: 255 | sess.run([tf.global_variables_initializer(), tf.local_variables_initializer(), train_iterator.initializer]) 256 | train_handle_value, val_handle_value = sess.run([train_handle, val_handle]) 257 | if args.restore_path: 258 | saver_to_restore.restore(sess, args.restore_path) 259 | 260 | merged = tf.summary.merge_all() 261 | writer = tf.summary.FileWriter(args.log_dir, sess.graph) 262 | 263 | print('\n----------- start to train -----------\n') 264 | 265 | for epoch in range(args.total_epoches): 266 | for i in 
range(args.train_batch_num):
267 | _, summary, y_pred_, y_true_, loss_, global_step_, lr = sess.run([train_op, merged, y_pred, y_true, loss, global_step, learning_rate],
268 | feed_dict={is_training: True, handle_flag: train_handle_value})
269 | writer.add_summary(summary, global_step=global_step_)
270 |
271 | # print the training status
272 | if global_step_ % args.print_evaluation_freq == 0 and global_step_ > 0:
273 | info = "Epoch: {}, global_step: {}, lr: {:.8f}, total_loss: {:.3f}, loss_xy: {:.3f}, loss_wh: {:.3f}, loss_conf: {:.3f}, loss_class: {:.3f}".format(
274 | epoch, global_step_, lr, loss_[0], loss_[1], loss_[2], loss_[3], loss_[4])
275 | print(info)
276 | logging.info(info)
277 |
278 | # evaluation on the training batch
279 | if global_step_ % args.train_evaluation_freq == 0 and global_step_ > 0:
280 | # recall, precision = evaluate_on_cpu(y_pred_, y_true_, args.class_num, calc_now=True)
281 | recall, precision = evaluate_on_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag, y_pred_, y_true_, args.class_num, calc_now=True)
282 | info = "===> batch recall: {:.3f}, batch precision: {:.3f} <===".format(recall, precision)
283 | print(info)
284 | logging.info(info)
285 |
286 | writer.add_summary(make_summary('evaluation/train_batch_recall', recall), global_step=global_step_)
287 | writer.add_summary(make_summary('evaluation/train_batch_precision', precision), global_step=global_step_)
288 |
289 | # start to save
290 | # NOTE: this is just a demo. You can set your own conditions for when to save the weights.
291 | if global_step_ % args.save_freq == 0 and global_step_ > 0:
292 | if loss_[0] <= 10. or (epoch > 20 and global_step_ % (args.save_freq * 4) == 0):
293 | saver_to_save.save(sess, args.save_dir + 'model-step_{}_loss_{:4f}_lr_{:.7g}'.format(global_step_, loss_[0], lr))
294 |
295 | # switch to the validation dataset for evaluation
296 | if global_step_ % args.val_evaluation_freq == 0 and global_step_ > 0:
297 | sess.run(val_iterator.initializer)
298 | true_positive_dict, true_labels_dict, pred_labels_dict = {}, {}, {}
299 | val_loss = [0., 0., 0., 0., 0.]
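# run through the whole validation set once: accumulate the per-class
# true-positive / ground-truth / prediction counts and the summed losses,
# then compute recall and precision from the totals below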
300 | for j in range(args.val_batch_num):
301 | y_pred_, y_true_, loss_ = sess.run([y_pred, y_true, loss],
302 | feed_dict={is_training: False, handle_flag: val_handle_value})
303 | true_positive_dict_tmp, true_labels_dict_tmp, pred_labels_dict_tmp = \
304 | evaluate_on_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag,
305 | y_pred_, y_true_, args.class_num, calc_now=False)
306 | true_positive_dict = update_dict(true_positive_dict, true_positive_dict_tmp)
307 | true_labels_dict = update_dict(true_labels_dict, true_labels_dict_tmp)
308 | pred_labels_dict = update_dict(pred_labels_dict, pred_labels_dict_tmp)
309 |
310 | val_loss = list_add(val_loss, loss_)
311 |
312 | # make sure there is at least one ground truth object in each image
313 | # add a small epsilon to avoid division by zero
314 | recall = float(sum(true_positive_dict.values())) / (sum(true_labels_dict.values()) + 1e-6)
315 | precision = float(sum(true_positive_dict.values())) / (sum(pred_labels_dict.values()) + 1e-6)
316 |
317 | info = "===> Epoch: {}, global_step: {}, recall: {:.3f}, precision: {:.3f}, total_loss: {:.3f}, loss_xy: {:.3f}, loss_wh: {:.3f}, loss_conf: {:.3f}, loss_class: {:.3f}".format(
318 | epoch, global_step_, recall, precision, val_loss[0] / args.val_batch_num, val_loss[1] / args.val_batch_num, val_loss[2] / args.val_batch_num, val_loss[3] / args.val_batch_num, val_loss[4] / args.val_batch_num)
319 | print(info)
320 | logging.info(info)
321 | writer.add_summary(make_summary('evaluation/val_recall', recall), global_step=epoch)
322 | writer.add_summary(make_summary('evaluation/val_precision', precision), global_step=epoch)
323 |
324 | writer.add_summary(make_summary('validation_statistics/total_loss', val_loss[0] / args.val_batch_num), global_step=epoch)
325 | writer.add_summary(make_summary('validation_statistics/loss_xy', val_loss[1] / args.val_batch_num), global_step=epoch)
326 | writer.add_summary(make_summary('validation_statistics/loss_wh', val_loss[2] / args.val_batch_num), global_step=epoch)
327 | writer.add_summary(make_summary('validation_statistics/loss_conf', val_loss[3] / args.val_batch_num), global_step=epoch)
328 | writer.add_summary(make_summary('validation_statistics/loss_class', val_loss[4] / args.val_batch_num), global_step=epoch)
329 |
330 | # manually shuffle the training data at the start of a new epoch
331 | shuffle_and_overwrite(args.train_file)
332 | sess.run(train_iterator.initializer)
333 |
334 | # save the .pb file; "boxes", "confs" and "probs" are the output nodes
335 | print('saving the pb ...')
336 | constant_graph = graph_util.convert_variables_to_constants(sess, sess.graph_def, ['boxes', 'confs', 'probs'])
337 | with tf.gfile.FastGFile(args.save_dir+'model.pb', mode='wb') as f:
338 | f.write(constant_graph.SerializeToString())
339 |
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/utils/__init__.py
--------------------------------------------------------------------------------
/utils/data_utils.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 |
3 | from __future__ import division, print_function
4 |
5 | import numpy as np
6 | import tensorflow as tf
7 | import cv2
8 | import random
9 |
10 | # operates on a single sample (one line of the annotation file)
11 | def parse_line(line):
12 | '''
13 | Given a line from the training/test txt file, return the parsed
14 | pic_path, boxes info, and label info.
15 | return:
16 | pic_path: string.
17 | boxes: shape [N, 4], N is the ground truth count, elements in the second
18 | dimension are [x_min, y_min, x_max, y_max]
19 | '''
20 | line = line.decode()
21 | s = line.strip().split(' ')
22 | pic_path = s[0]
23 | s = s[1:]
24 | box_cnt = len(s) // 5
25 | boxes = []
26 | labels = []
27 | for i in range(box_cnt):
28 | label, x_min, y_min, x_max, y_max = int(s[i*5]), float(s[i*5+1]), float(s[i*5+2]), float(s[i*5+3]), float(s[i*5+4])
29 | boxes.append([x_min, y_min, x_max, y_max])
30 | labels.append(label)
31 | boxes = np.asarray(boxes, np.float32)
32 | labels = np.asarray(labels, np.int64)
33 | return pic_path, boxes, labels
34 |
35 |
36 | def resize_image_and_correct_boxes(img, boxes, img_size):
37 | # convert a gray scale image to a 3-channel fake RGB image
38 | if len(np.shape(img)) == 2: # note: check the rank of the array, not len(img)
39 | img = np.expand_dims(img, -1)
40 | ori_height, ori_width = np.shape(img)[:2]
41 | new_width, new_height = img_size
42 | # resize to (new_height, new_width)
43 | img = cv2.resize(img, (new_width, new_height))
44 |
45 | # convert to float
46 | img = np.asarray(img, np.float32)
47 |
48 | # boxes
49 | # xmin, xmax
50 | boxes[:, 0] = boxes[:, 0] / ori_width * new_width
51 | boxes[:, 2] = boxes[:, 2] / ori_width * new_width
52 | # ymin, ymax
53 | boxes[:, 1] = boxes[:, 1] / ori_height * new_height
54 | boxes[:, 3] = boxes[:, 3] / ori_height * new_height
55 |
56 | return img, boxes
57 |
58 |
59 | def data_augmentation(img, boxes, label):
60 | '''
61 | Do your own data augmentation here.
62 | note: if clipping is used, data_augmentation() must be called before resize_image_and_correct_boxes()
63 | param:
64 | img: a [H, W, 3] shape RGB format image, float32 dtype
65 | boxes: [N, 4] shape boxes coordinate info, N is the ground truth box number,
66 | 4 elements in the second dimension are [x_min, y_min, x_max, y_max], float32 dtype
67 | label: [N] shape labels, int64 dtype (you should not convert to int32)
68 | '''
69 |
70 | # randomly clip the image
71 |
72 | # img, boxes = clip_image(img, boxes)
73 |
74 | # randomly change the brightness
75 | img = bright_image(img)
76 |
77 | # randomly change the saturation
78 | img = saturation_image(img)
79 |
80 | return img, boxes, label
81 |
82 | # randomly clip the image
83 | def clip_image(img, boxes):
84 |
85 | xmin = 10000
86 | ymin = 10000
87 | xmax = 0
88 | ymax = 0
89 | for i in range(len(boxes)):
90 | if boxes[i][0] < xmin:
91 | xmin = boxes[i][0]
92 |
93 | if boxes[i][1] < ymin:
94 | ymin = boxes[i][1]
95 |
96 | if boxes[i][2] > xmax:
97 | xmax = boxes[i][2]
98 |
99 | if boxes[i][3] > ymax:
100 | ymax = boxes[i][3]
101 |
102 | max_h = int(np.shape(img)[0]/6)
103 | max_w = int(np.shape(img)[1]/6)
104 | rand_l = random.randint(0, max_w)
105 | rand_l = min(rand_l, xmin)
106 | rand_r = random.randint(0, max_w)
107 | rand_r = min(rand_r, np.shape(img)[1]-xmax)
108 | rand_u = random.randint(0, max_h)
109 | rand_u = min(rand_u, ymin)
110 | rand_d = random.randint(0, max_h)
111 | rand_d = min(rand_d, np.shape(img)[0]-ymax)
112 |
113 | img_clip = img[rand_u:np.shape(img)[0]-rand_d, rand_l:np.shape(img)[1]-rand_r]
114 |
115 | for i in range(len(boxes)):
116 | boxes[i][0] = max(boxes[i][0]-rand_l, 0)
117 | boxes[i][1] = max(boxes[i][1]-rand_u, 0)
118 | boxes[i][2] = min(boxes[i][2]-rand_l, np.shape(img_clip)[1]-1)
119 | boxes[i][3] = min(boxes[i][3]-rand_u, np.shape(img_clip)[0]-1)
120 |
121 | return img_clip, boxes
122 |
123 | # randomly change the brightness
124 | def bright_image(img, adjust=0.25):
125 |
126 | a = random.uniform(0, adjust)
127 | flag = random.randint(-1, 1)
128 |
129 | img_u = (255-img)*a
130 | img_d = img*a
131 |
132 | delta = np.minimum(img_u, img_d)
133 | img_new = img + flag*delta
134 |
135 | return img_new
136 |
137 |
138 | # randomly change the saturation
139 | def saturation_image(img, adjust=0.1):
140 |
141 | a_0 = random.uniform(1-adjust, 1+adjust)
142 | a_1 = random.uniform(1-adjust, 1+adjust)
143 | a_2 = random.uniform(1-adjust, 1+adjust)
144 |
145 | img_new = np.ones(img.shape) * 255
146 | img_new[:,:,0] = np.minimum(img[:,:,0] * a_0, img_new[:,:,0])
147 | img_new[:,:,1] = np.minimum(img[:,:,1] * a_1, img_new[:,:,1])
148 | img_new[:,:,2] = np.minimum(img[:,:,2] * a_2, img_new[:,:,2])
149 |
150 | return img_new
151 |
152 |
153 | # operates on a single sample: generate the grid info corresponding to the ground truth boxes; grid cells without an object are all 0
154 | def process_box(boxes, labels, img_size, class_num, anchors):
155 | '''
156 | Generate the y_true label, i.e. the ground truth feature_maps in 3 different scales.
157 | '''
158 | anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]
159 |
160 | # convert boxes form:
161 | # shape: [N, 2]
162 | # (x_center, y_center)
163 | box_centers = (boxes[:, 0:2] + boxes[:, 2:4]) / 2
164 | # (width, height)
165 | box_sizes = boxes[:, 2:4] - boxes[:, 0:2]
166 |
167 | # [13, 13, 3, 5+num_class]
168 | y_true_13 = np.zeros((img_size[1] // 32, img_size[0] // 32, 3, 5 + class_num), np.float32)
169 | y_true_26 = np.zeros((img_size[1] // 16, img_size[0] // 16, 3, 5 + class_num), np.float32)
170 | y_true_52 = np.zeros((img_size[1] // 8, img_size[0] // 8, 3, 5 + class_num), np.float32)
171 |
172 | y_true = [y_true_13, y_true_26, y_true_52]
173 |
174 | # [N, 1, 2]
175 | box_sizes = np.expand_dims(box_sizes, 1)
176 | # broadcast tricks
177 | # [N, 1, 2] & [9, 2] ==> [N, 9, 2]
178 | mins = np.maximum(- box_sizes / 2, - anchors / 2)
179 | maxs = np.minimum(box_sizes / 2, anchors / 2)
180 | # [N, 9, 2]
181 | whs = maxs - mins
182 |
183 | # [N, 9] compute the iou between each ground truth box and the 9 anchors (every anchor is centered on the box, so x,y can be ignored in the iou)
184 | iou = (whs[:, :, 0] * whs[:, :, 1]) / (box_sizes[:, :, 0] * box_sizes[:, :, 1] + anchors[:, 0] * anchors[:, 1] - whs[:, :, 0] * whs[:, :, 1] + 1e-10)
185 | # [N] find the index of the anchor with the largest iou
186 | best_match_idx = np.argmax(iou, axis=1)
187 |
188 | ratio_dict = {1.: 8., 2.: 16., 3.: 32.}
189 | for i, idx in enumerate(best_match_idx):
190 | # idx: 0,1,2 ==> feature_map_group 2; 3,4,5 ==> 1; 6,7,8 ==> 0 (pick the matching scale)
191 | feature_map_group = 2 - idx // 3
192 | # scale ratio: 0,1,2 ==> 8; 3,4,5 ==> 16; 6,7,8 ==> 32
193 | ratio = ratio_dict[np.ceil((idx + 1) / 3.)]
194 | x = int(np.floor(box_centers[i, 0] / ratio)) # the grid cell that contains the ground truth box center
195 | y = int(np.floor(box_centers[i, 1] / ratio))
196 | k = anchors_mask[feature_map_group].index(idx)
197 | c = labels[i]
198 | # print feature_map_group, '|', y,x,k,c
199 |
200 | # write the info into the corresponding grid cell; cells without an object stay 0
201 | y_true[feature_map_group][y, x, k, :2] = box_centers[i]
202 | y_true[feature_map_group][y, x, k, 2:4] = box_sizes[i]
203 | y_true[feature_map_group][y, x, k, 4] = 1.
204 | y_true[feature_map_group][y, x, k, 5+c] = 1.
205 |
206 | return y_true_13, y_true_26, y_true_52
207 |
208 | # operates on a single sample: generate the grid info corresponding to the ground truth boxes; grid cells without an object are all 0
209 | def process_box_tiny(boxes, labels, img_size, class_num, anchors):
210 | '''
211 | Generate the y_true label, i.e. the ground truth feature_maps in 2 different scales.
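Each ground truth box is matched to the single anchor with the highest
width/height IoU (best_match_idx below), and its center, size, objectness
and one-hot class are written into the grid cell of the matching scale
that contains the box center.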
212 | '''
213 | anchors_mask = [[3,4,5], [0,1,2]]
214 |
215 | # convert boxes form:
216 | # shape: [N, 2]
217 | # (x_center, y_center)
218 | box_centers = (boxes[:, 0:2] + boxes[:, 2:4]) / 2
219 | # (width, height)
220 | box_sizes = boxes[:, 2:4] - boxes[:, 0:2]
221 |
222 | # [13, 13, 3, 5+num_class]
223 | y_true_13 = np.zeros((img_size[1] // 32, img_size[0] // 32, 3, 5 + class_num), np.float32)
224 | y_true_26 = np.zeros((img_size[1] // 16, img_size[0] // 16, 3, 5 + class_num), np.float32)
225 |
226 | y_true = [y_true_13, y_true_26]
227 |
228 | # [N, 1, 2]
229 | box_sizes = np.expand_dims(box_sizes, 1)
230 | # broadcast tricks
231 | # [N, 1, 2] & [6, 2] ==> [N, 6, 2]
232 | mins = np.maximum(- box_sizes / 2, - anchors / 2)
233 | maxs = np.minimum(box_sizes / 2, anchors / 2)
234 | # [N, 6, 2]
235 | whs = maxs - mins
236 |
237 | # [N, 6] compute the iou between each ground truth box and the 6 anchors (every anchor is centered on the box, so x,y can be ignored in the iou)
238 | iou = (whs[:, :, 0] * whs[:, :, 1]) / (box_sizes[:, :, 0] * box_sizes[:, :, 1] + anchors[:, 0] * anchors[:, 1] - whs[:, :, 0] * whs[:, :, 1] + 1e-10)
239 | # [N] find the index of the anchor with the largest iou
240 | best_match_idx = np.argmax(iou, axis=1)
241 |
242 | ratio_dict = {1.: 16., 2.: 32.}
243 | for i, idx in enumerate(best_match_idx):
244 | # idx: 0,1,2 ==> feature_map_group 1; 3,4,5 ==> 0 (pick the matching scale)
245 | feature_map_group = 1 - idx // 3
246 | # scale ratio: 0,1,2 ==> 16; 3,4,5 ==> 32
247 | ratio = ratio_dict[np.ceil((idx + 1) / 3.)]
248 | x = int(np.floor(box_centers[i, 0] / ratio)) # the grid cell that contains the ground truth box center
249 | y = int(np.floor(box_centers[i, 1] / ratio))
250 | k = anchors_mask[feature_map_group].index(idx)
251 | c = labels[i]
252 | # print feature_map_group, '|', y,x,k,c
253 |
254 | # write the info into the corresponding grid cell; cells without an object stay 0
255 | y_true[feature_map_group][y, x, k, :2] = box_centers[i]
256 | y_true[feature_map_group][y, x, k, 2:4] = box_sizes[i]
257 | y_true[feature_map_group][y, x, k, 4] = 1.
258 | y_true[feature_map_group][y, x, k, 5+c] = 1.
259 |
260 | return y_true_13, y_true_26
261 |
262 |
263 | def parse_data(line, class_num, img_size, anchors, mode):
264 | '''
265 | param:
266 | line: a line from the training/test txt file
267 | class_num / img_size / anchors: the corresponding settings from the main program
268 | mode: 'train' or 'val'. When set to 'train', data_augmentation will be applied.
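return:
the image (float32, RGB, scaled to 0~1) and the y_true feature maps;
for the tiny model (6 anchors) the 26x26 map is returned twice, as a
placeholder, so the tf.data pipeline always yields 4 tensors.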
269 | '''
270 | pic_path, boxes, labels = parse_line(line)
271 |
272 | img = cv2.imread(pic_path)
273 | img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
274 |
275 | img, boxes = resize_image_and_correct_boxes(img, boxes, img_size)
276 |
277 | # do data augmentation here
278 | # note: if clipping is used, data_augmentation() must be called before resize_image_and_correct_boxes()
279 | if mode == 'train':
280 | img, boxes, labels = data_augmentation(img, boxes, labels)
281 |
282 | # the input of yolo_v3 should be in range 0~1
283 | img = img / 255
284 |
285 | if (np.shape(anchors)[0] == 9):
286 | y_true_13, y_true_26, y_true_52 = process_box(boxes, labels, img_size, class_num, anchors)
287 | return img, y_true_13, y_true_26, y_true_52
288 |
289 | elif (np.shape(anchors)[0] == 6):
290 | y_true_13, y_true_26 = process_box_tiny(boxes, labels, img_size, class_num, anchors)
291 | y_true_xx = y_true_26 # placeholder so the pipeline always yields 4 tensors
292 | return img, y_true_13, y_true_26, y_true_xx
293 |
--------------------------------------------------------------------------------
/utils/eval_utils.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 |
3 | from __future__ import division, print_function
4 |
5 | import numpy as np
6 | from collections import Counter
7 |
8 | from utils.nms_utils import cpu_nms, gpu_nms
9 |
10 |
11 | def calc_iou(pred_boxes, true_boxes):
12 | '''
13 | An efficient way to calculate the IoU matrix using numpy broadcast tricks.
14 | shape_info: pred_boxes: [N, 4]
15 | true_boxes: [V, 4]
16 | '''
17 |
18 | # [N, 1, 4]
19 | pred_boxes = np.expand_dims(pred_boxes, -2)
20 | # [1, V, 4]
21 | true_boxes = np.expand_dims(true_boxes, 0)
22 |
23 | # [N, 1, 2] & [1, V, 2] ==> [N, V, 2]
24 | intersect_mins = np.maximum(pred_boxes[..., :2], true_boxes[..., :2])
25 | intersect_maxs = np.minimum(pred_boxes[..., 2:], true_boxes[..., 2:])
26 | intersect_wh = np.maximum(intersect_maxs - intersect_mins, 0.)
27 | 28 | # shape: [N, V] 29 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 30 | # shape: [N, 1, 2] 31 | pred_box_wh = pred_boxes[..., 2:] - pred_boxes[..., :2] 32 | # shape: [N, 1] 33 | pred_box_area = pred_box_wh[..., 0] * pred_box_wh[..., 1] 34 | # [1, V, 2] 35 | true_boxes_wh = true_boxes[..., 2:] - true_boxes[..., :2] 36 | # [1, V] 37 | true_boxes_area = true_boxes_wh[..., 0] * true_boxes_wh[..., 1] 38 | 39 | # shape: [N, V] 40 | iou = intersect_area / (pred_box_area + true_boxes_area - intersect_area + 1e-10) 41 | 42 | return iou 43 | 44 | 45 | def evaluate_on_cpu(y_pred, y_true, num_classes, calc_now=True, score_thresh=0.5, iou_thresh=0.5): 46 | # y_pred -> [None, 13, 13, 255], 47 | # [None, 26, 26, 255], 48 | # [None, 52, 52, 255], 49 | 50 | num_images = y_true[0].shape[0] 51 | true_labels_dict = {i: 0 for i in range(num_classes)} # {class: count} 52 | pred_labels_dict = {i: 0 for i in range(num_classes)} 53 | true_positive_dict = {i: 0 for i in range(num_classes)} 54 | 55 | for i in range(num_images): 56 | true_labels_list, true_boxes_list = [], [] 57 | for j in range(len(y_true)): # three feature maps 58 | # shape: [13, 13, 3, 80] 59 | true_probs_temp = y_true[j][i][..., 5:] 60 | # shape: [13, 13, 3, 4] (x_center, y_center, w, h) 61 | true_boxes_temp = y_true[j][i][..., 0:4] 62 | 63 | # [13, 13, 3] 64 | object_mask = true_probs_temp.sum(axis=-1) > 0 65 | 66 | # [V, 3] V: Ground truth number of the current image 67 | true_probs_temp = true_probs_temp[object_mask] 68 | # [V, 4] 69 | true_boxes_temp = true_boxes_temp[object_mask] 70 | 71 | # [V], labels 72 | true_labels_list += np.argmax(true_probs_temp, axis=-1).tolist() 73 | # [V, 4] (x_center, y_center, w, h) 74 | true_boxes_list += true_boxes_temp.tolist() 75 | 76 | if len(true_labels_list) != 0: 77 | for cls, count in Counter(true_labels_list).items(): 78 | true_labels_dict[cls] += count 79 | 80 | # [V, 4] (xmin, ymin, xmax, ymax) 81 | true_boxes = np.array(true_boxes_list) 82 | box_centers, box_sizes = true_boxes[:, 0:2], true_boxes[:, 2:4] 83 | true_boxes[:, 0:2] = box_centers - box_sizes / 2. 
84 | true_boxes[:, 2:4] = true_boxes[:, 0:2] + box_sizes 85 | 86 | # [1, xxx, 4] 87 | pred_boxes = y_pred[0][i:i + 1] 88 | pred_confs = y_pred[1][i:i + 1] 89 | pred_probs = y_pred[2][i:i + 1] 90 | 91 | # pred_boxes: [N, 4] 92 | # pred_confs: [N] 93 | # pred_labels: [N] 94 | # N: Detected box number of the current image 95 | pred_boxes, pred_confs, pred_labels = cpu_nms(pred_boxes, pred_confs * pred_probs, num_classes, 96 | score_thresh=score_thresh, iou_thresh=iou_thresh) 97 | 98 | # len: N 99 | pred_labels_list = [] if pred_labels is None else pred_labels.tolist() 100 | if pred_labels_list == []: 101 | continue 102 | 103 | # calc iou 104 | # [N, V] 105 | iou_matrix = calc_iou(pred_boxes, true_boxes) 106 | # [N] 107 | max_iou_idx = np.argmax(iou_matrix, axis=-1) 108 | 109 | correct_idx = [] 110 | correct_conf = [] 111 | for k in range(max_iou_idx.shape[0]): 112 | pred_labels_dict[pred_labels_list[k]] += 1 113 | match_idx = max_iou_idx[k] # V level 114 | if iou_matrix[k, match_idx] > iou_thresh and true_labels_list[match_idx] == pred_labels_list[k]: 115 | if not match_idx in correct_idx: 116 | correct_idx.append(match_idx) 117 | correct_conf.append(pred_confs[k]) 118 | else: 119 | same_idx = correct_idx.index(match_idx) 120 | if pred_confs[k] > correct_conf[same_idx]: 121 | correct_idx.pop(same_idx) 122 | correct_conf.pop(same_idx) 123 | correct_idx.append(match_idx) 124 | correct_conf.append(pred_confs[k]) 125 | 126 | for t in correct_idx: 127 | true_positive_dict[true_labels_list[t]] += 1 128 | 129 | if calc_now: 130 | # avoid divided by 0 131 | recall = sum(true_positive_dict.values()) / (sum(true_labels_dict.values()) + 1e-6) 132 | precision = sum(true_positive_dict.values()) / (sum(pred_labels_dict.values()) + 1e-6) 133 | 134 | return recall, precision 135 | else: 136 | return true_positive_dict, true_labels_dict, pred_labels_dict 137 | 138 | 139 | def evaluate_on_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag, y_pred, y_true, num_classes, calc_now=True, score_thresh=0.5, iou_thresh=0.5): 140 | # y_pred -> [None, 13, 13, 255], 141 | # [None, 26, 26, 255], 142 | # [None, 52, 52, 255], 143 | 144 | num_images = y_true[0].shape[0] 145 | true_labels_dict = {i: 0 for i in range(num_classes)} # {class: count} 146 | pred_labels_dict = {i: 0 for i in range(num_classes)} 147 | true_positive_dict = {i: 0 for i in range(num_classes)} 148 | 149 | for i in range(num_images): 150 | true_labels_list, true_boxes_list = [], [] 151 | for j in range(len(y_true)): # 2 or 3 feature maps 152 | # shape: [13, 13, 3, 80] 153 | true_probs_temp = y_true[j][i][..., 5:] 154 | # shape: [13, 13, 3, 4] (x_center, y_center, w, h) 155 | true_boxes_temp = y_true[j][i][..., 0:4] 156 | 157 | # [13, 13, 3] 158 | object_mask = true_probs_temp.sum(axis=-1) > 0 159 | 160 | # [V, 3] V: Ground truth number of the current image 161 | true_probs_temp = true_probs_temp[object_mask] 162 | # [V, 4] 163 | true_boxes_temp = true_boxes_temp[object_mask] 164 | 165 | # [V], labels 166 | true_labels_list += np.argmax(true_probs_temp, axis=-1).tolist() 167 | # [V, 4] (x_center, y_center, w, h) 168 | true_boxes_list += true_boxes_temp.tolist() 169 | 170 | if len(true_labels_list) != 0: 171 | for cls, count in Counter(true_labels_list).items(): 172 | true_labels_dict[cls] += count 173 | 174 | # [V, 4] (xmin, ymin, xmax, ymax) 175 | true_boxes = np.array(true_boxes_list) 176 | box_centers, box_sizes = true_boxes[:, 0:2], true_boxes[:, 2:4] 177 | true_boxes[:, 0:2] = box_centers - box_sizes / 2. 
178 | true_boxes[:, 2:4] = true_boxes[:, 0:2] + box_sizes 179 | 180 | # [1, xxx, 4] 181 | pred_boxes = y_pred[0][i:i + 1] 182 | pred_confs = y_pred[1][i:i + 1] 183 | pred_probs = y_pred[2][i:i + 1] 184 | 185 | # pred_boxes: [N, 4] 186 | # pred_confs: [N] 187 | # pred_labels: [N] 188 | # N: Detected box number of the current image 189 | pred_boxes, pred_confs, pred_labels = sess.run(gpu_nms_op, 190 | feed_dict={pred_boxes_flag: pred_boxes, 191 | pred_scores_flag: pred_confs * pred_probs}) 192 | # len: N 193 | pred_labels_list = [] if pred_labels is None else pred_labels.tolist() 194 | if pred_labels_list == []: 195 | continue 196 | 197 | # calc iou 198 | # [N, V] 199 | iou_matrix = calc_iou(pred_boxes, true_boxes) 200 | # [N] 201 | max_iou_idx = np.argmax(iou_matrix, axis=-1) 202 | 203 | correct_idx = [] 204 | correct_conf = [] 205 | for k in range(max_iou_idx.shape[0]): 206 | pred_labels_dict[pred_labels_list[k]] += 1 207 | match_idx = max_iou_idx[k] # V level 208 | if iou_matrix[k, match_idx] > iou_thresh and true_labels_list[match_idx] == pred_labels_list[k]: 209 | if not match_idx in correct_idx: 210 | correct_idx.append(match_idx) 211 | correct_conf.append(pred_confs[k]) 212 | else: 213 | same_idx = correct_idx.index(match_idx) 214 | if pred_confs[k] > correct_conf[same_idx]: 215 | correct_idx.pop(same_idx) 216 | correct_conf.pop(same_idx) 217 | correct_idx.append(match_idx) 218 | correct_conf.append(pred_confs[k]) 219 | 220 | for t in correct_idx: 221 | true_positive_dict[true_labels_list[t]] += 1 222 | 223 | if calc_now: 224 | # avoid divided by 0 225 | recall = sum(true_positive_dict.values()) / (sum(true_labels_dict.values()) + 1e-6) 226 | precision = sum(true_positive_dict.values()) / (sum(pred_labels_dict.values()) + 1e-6) 227 | 228 | return recall, precision 229 | else: 230 | return true_positive_dict, true_labels_dict, pred_labels_dict -------------------------------------------------------------------------------- /utils/layer_utils.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | from __future__ import division, print_function 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | slim = tf.contrib.slim 8 | 9 | def conv2d(inputs, filters, kernel_size, strides=1): 10 | def _fixed_padding(inputs, kernel_size): 11 | pad_total = kernel_size - 1 12 | pad_beg = pad_total // 2 13 | pad_end = pad_total - pad_beg 14 | 15 | padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end], 16 | [pad_beg, pad_end], [0, 0]], mode='CONSTANT') 17 | return padded_inputs 18 | if strides > 1: 19 | inputs = _fixed_padding(inputs, kernel_size) 20 | inputs = slim.conv2d(inputs, filters, kernel_size, stride=strides, 21 | padding=('SAME' if strides == 1 else 'VALID')) 22 | return inputs 23 | 24 | def darknet53_body(inputs): 25 | 26 | def res_block(inputs, filters): 27 | shortcut = inputs 28 | net = conv2d(inputs, filters * 1, 1) 29 | net = conv2d(net, filters * 2, 3) 30 | 31 | net = net + shortcut 32 | 33 | return net 34 | 35 | # first two conv2d layers 36 | net = conv2d(inputs, 32, 3, strides=1) 37 | net = conv2d(net, 64, 3, strides=2) 38 | 39 | # res_block * 1 40 | net = res_block(net, 32) 41 | 42 | net = conv2d(net, 128, 3, strides=2) 43 | 44 | # res_block * 2 45 | for i in range(2): 46 | net = res_block(net, 64) 47 | 48 | net = conv2d(net, 256, 3, strides=2) 49 | 50 | # res_block * 8 51 | for i in range(8): 52 | net = res_block(net, 128) 53 | 54 | route_1 = net 55 | net = conv2d(net, 512, 3, strides=2) 56 | 57 | # res_block * 8 58 | 
for i in range(8): 59 | net = res_block(net, 256) 60 | 61 | route_2 = net 62 | net = conv2d(net, 1024, 3, strides=2) 63 | 64 | # res_block * 4 65 | for i in range(4): 66 | net = res_block(net, 512) 67 | route_3 = net 68 | 69 | return route_1, route_2, route_3 70 | 71 | 72 | def darknet19_body(inputs): 73 | 74 | net = conv2d(inputs, 16, 3, strides=1) 75 | net = slim.max_pool2d(net, [2,2], stride=2, padding='SAME') 76 | # 208 x 208 77 | 78 | net = conv2d(net, 32, 3, strides=1) 79 | net = slim.max_pool2d(net, [2,2], stride=2, padding='SAME') 80 | # 104 x 104 81 | 82 | net = conv2d(net, 64, 3, strides=1) 83 | net = slim.max_pool2d(net, [2,2], stride=2, padding='SAME') 84 | route_1f = net 85 | # 52 x 52 86 | 87 | net = conv2d(net, 128, 3, strides=1) 88 | net = slim.max_pool2d(net, [2,2], stride=2, padding='SAME') 89 | route_1 = net 90 | # 26 x 26 91 | 92 | net = conv2d(net, 256, 3, strides=1) 93 | route_2f = net 94 | net = slim.max_pool2d(net, [2,2], stride=2, padding='SAME') 95 | # 13 x 13 96 | 97 | net = conv2d(net, 512, 3, strides=1) 98 | net = slim.max_pool2d(net, [2,2], stride=1, padding='SAME') 99 | # 13 x 13 100 | 101 | net = conv2d(net, 1024, 3, strides=1) 102 | route_2 = net 103 | # 13 x 13 104 | 105 | return route_1, route_1f, route_2, route_2f 106 | 107 | 108 | def yolo_block(inputs, filters): 109 | net = conv2d(inputs, filters * 1, 1) 110 | net = conv2d(net, filters * 2, 3) 111 | net = conv2d(net, filters * 1, 1) 112 | net = conv2d(net, filters * 2, 3) 113 | net = conv2d(net, filters * 1, 1) 114 | route = net 115 | net = conv2d(net, filters * 2, 3) 116 | return route, net 117 | 118 | def yolo_tiny_block(inputs, filters): 119 | net = conv2d(inputs, filters * 1, 1) 120 | route = net 121 | net = conv2d(net, filters * 2, 3) 122 | return route, net 123 | 124 | 125 | def upsample_layer(inputs, out_shape): 126 | new_height, new_width = out_shape[1], out_shape[2] 127 | # NOTE: here height is the first 128 | inputs = tf.image.resize_nearest_neighbor(inputs, (new_height, new_width), align_corners=True, name='upsampled') 129 | return inputs 130 | 131 | def PAM_layer(net): 132 | 133 | N, H, W, C = np.shape(net) 134 | 135 | net_b = slim.conv2d(net, C, 1, stride=1) 136 | net_c = slim.conv2d(net, C, 1, stride=1) 137 | net_d = slim.conv2d(net, C, 1, stride=1) 138 | 139 | net_b = tf.reshape(net_b, [-1, H*W, C]) 140 | net_c = tf.reshape(net_c, [-1, H*W, C]) 141 | net_c = tf.transpose(net_c, [0,2,1]) 142 | net_d = tf.reshape(net_d, [-1, H*W, C]) 143 | 144 | net_bc = tf.matmul(net_b,net_c) 145 | net_bcd = tf.matmul(net_bc,net_d) 146 | 147 | net_bcd = tf.reshape(net_bcd, [-1, H, W, C]) 148 | 149 | net = net_bcd * 0.2 + net 150 | 151 | return net 152 | 153 | 154 | -------------------------------------------------------------------------------- /utils/misc_utils.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | import numpy as np 4 | import tensorflow as tf 5 | import random 6 | 7 | from tensorflow.core.framework import summary_pb2 8 | 9 | 10 | def make_summary(name, val): 11 | return summary_pb2.Summary(value=[summary_pb2.Summary.Value(tag=name, simple_value=val)]) 12 | 13 | 14 | def parse_anchors(anchor_path): 15 | ''' 16 | parse anchors. 
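The anchor file is a single line of comma-separated width,height values;
for example, the standard yolov3-tiny anchors are commonly given as:
10,14, 23,27, 37,58, 81,82, 135,169, 344,319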
17 | returned data: shape [N, 2], dtype float32 18 | ''' 19 | anchors = np.reshape(np.asarray(open(anchor_path, 'r').read().split(','), np.float32), [-1, 2]) 20 | return anchors 21 | 22 | 23 | def read_class_names(class_name_path): 24 | names = {} 25 | with open(class_name_path, 'r') as data: 26 | for ID, name in enumerate(data): 27 | names[ID] = name.strip('\n') 28 | return names 29 | 30 | 31 | def shuffle_and_overwrite(file_name): 32 | content = open(file_name, 'r').readlines() 33 | random.shuffle(content) 34 | with open(file_name, 'w') as f: 35 | for line in content: 36 | f.write(line) 37 | 38 | 39 | def update_dict(ori_dict, new_dict): 40 | if not ori_dict: 41 | return new_dict 42 | for key in ori_dict: 43 | ori_dict[key] += new_dict[key] 44 | return ori_dict 45 | 46 | 47 | def list_add(ori_list, new_list): 48 | for i in range(len(ori_list)): 49 | ori_list[i] += new_list[i] 50 | return ori_list 51 | 52 | 53 | def load_weights(var_list, weights_file): 54 | """ 55 | Loads and converts pre-trained weights. 56 | param: 57 | var_list: list of network variables. 58 | weights_file: name of the binary file. 59 | """ 60 | with open(weights_file, "rb") as fp: 61 | np.fromfile(fp, dtype=np.int32, count=5) 62 | weights = np.fromfile(fp, dtype=np.float32) 63 | 64 | ptr = 0 65 | i = 0 66 | assign_ops = [] 67 | while i < len(var_list) - 1: 68 | var1 = var_list[i] 69 | var2 = var_list[i + 1] 70 | # do something only if we process conv layer 71 | if 'Conv' in var1.name.split('/')[-2]: 72 | # check type of next layer 73 | if 'BatchNorm' in var2.name.split('/')[-2]: 74 | # load batch norm params 75 | gamma, beta, mean, var = var_list[i + 1:i + 5] 76 | batch_norm_vars = [beta, gamma, mean, var] 77 | for var in batch_norm_vars: 78 | shape = var.shape.as_list() 79 | num_params = np.prod(shape) 80 | var_weights = weights[ptr:ptr + num_params].reshape(shape) 81 | ptr += num_params 82 | assign_ops.append(tf.assign(var, var_weights, validate_shape=True)) 83 | # we move the pointer by 4, because we loaded 4 variables 84 | i += 4 85 | elif 'Conv' in var2.name.split('/')[-2]: 86 | # load biases 87 | bias = var2 88 | bias_shape = bias.shape.as_list() 89 | bias_params = np.prod(bias_shape) 90 | bias_weights = weights[ptr:ptr + 91 | bias_params].reshape(bias_shape) 92 | ptr += bias_params 93 | assign_ops.append(tf.assign(bias, bias_weights, validate_shape=True)) 94 | # we loaded 1 variable 95 | i += 1 96 | # we can load weights of conv layer 97 | shape = var1.shape.as_list() 98 | num_params = np.prod(shape) 99 | 100 | var_weights = weights[ptr:ptr + num_params].reshape( 101 | (shape[3], shape[2], shape[0], shape[1])) 102 | # remember to transpose to column-major 103 | var_weights = np.transpose(var_weights, (2, 3, 1, 0)) 104 | ptr += num_params 105 | assign_ops.append( 106 | tf.assign(var1, var_weights, validate_shape=True)) 107 | i += 1 108 | 109 | return assign_ops 110 | 111 | 112 | def config_learning_rate(args, global_step): 113 | if args.lr_type == 'exponential': 114 | lr_tmp = tf.train.exponential_decay(args.learning_rate_init, global_step, args.lr_decay_freq, 115 | args.lr_decay_factor, staircase=True, name='exponential_learning_rate') 116 | return tf.maximum(lr_tmp, args.lr_lower_bound) 117 | elif args.lr_type == 'fixed': 118 | return tf.convert_to_tensor(args.learning_rate_init, name='fixed_learning_rate') 119 | else: 120 | raise ValueError('Unsupported learning rate type!') 121 | 122 | 123 | def config_optimizer(optimizer_name, learning_rate, decay=0.9, momentum=0.9): 124 | if optimizer_name == 'momentum': 
--------------------------------------------------------------------------------
/utils/nms_utils.py:
--------------------------------------------------------------------------------
# coding: utf-8

from __future__ import division, print_function

import numpy as np
import tensorflow as tf


def gpu_nms(boxes, scores, num_classes, max_boxes=50, score_thresh=0.5, iou_thresh=0.5):
    """
    Perform NMS on GPU using TensorFlow.

    params:
        boxes: tensor of shape [1, 10647, 4]  # 10647 = (13*13 + 26*26 + 52*52) * 3 for a 416*416 input
        scores: tensor of shape [1, 10647, num_classes], score = conf * prob
        num_classes: total number of classes
        max_boxes: integer, maximum number of predicted boxes you'd like, default is 50
        score_thresh: boxes whose highest class score falls below this threshold are discarded
        iou_thresh: real value, "intersection over union" threshold used for NMS filtering
    """

    boxes_list, label_list, score_list = [], [], []
    max_boxes = tf.constant(max_boxes, dtype='int32')

    # since we run NMS on a single image, flatten the batch dimension
    boxes = tf.reshape(boxes, [-1, 4])  # '-1' means we don't know the exact number of boxes
    score = tf.reshape(scores, [-1, num_classes])

    # Step 1: create a filtering mask from the class scores and the score threshold
    mask = tf.greater_equal(score, tf.constant(score_thresh))
    # Step 2: for each class, pick out the boxes that pass the mask and run NMS on them
    for i in range(num_classes):
        filter_boxes = tf.boolean_mask(boxes, mask[:, i])
        filter_score = tf.boolean_mask(score[:, i], mask[:, i])
        nms_indices = tf.image.non_max_suppression(boxes=filter_boxes,
                                                   scores=filter_score,
                                                   max_output_size=max_boxes,
                                                   iou_threshold=iou_thresh, name='nms_indices')
        label_list.append(tf.ones_like(tf.gather(filter_score, nms_indices), 'int32') * i)
        boxes_list.append(tf.gather(filter_boxes, nms_indices))
        score_list.append(tf.gather(filter_score, nms_indices))

    boxes = tf.concat(boxes_list, axis=0)
    score = tf.concat(score_list, axis=0)
    label = tf.concat(label_list, axis=0)

    return boxes, score, label
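As a quick sanity check, `gpu_nms` can be exercised on random data. This is a hypothetical smoke test, assuming TF 1.x and this repo's root on the Python path; shapes and thresholds are made up.

```python
# Hypothetical smoke test for gpu_nms: 3 classes, 100 random boxes, batch of one.
import numpy as np
import tensorflow as tf
from utils.nms_utils import gpu_nms

boxes_ph = tf.placeholder(tf.float32, [1, None, 4])
scores_ph = tf.placeholder(tf.float32, [1, None, 3])
boxes, scores, labels = gpu_nms(boxes_ph, scores_ph, num_classes=3,
                                max_boxes=10, score_thresh=0.4, iou_thresh=0.5)

with tf.Session() as sess:
    boxes_, scores_, labels_ = sess.run(
        [boxes, scores, labels],
        feed_dict={boxes_ph: np.random.rand(1, 100, 4) * 416,
                   scores_ph: np.random.rand(1, 100, 3)})
    print(boxes_.shape, labels_)  # at most max_boxes survivors per class
```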
def py_nms(boxes, scores, max_boxes=50, iou_thresh=0.5):
    """
    Pure Python NMS baseline.

    Arguments:
        boxes: shape [-1, 4] ('-1' means the number of boxes is not known in advance)
        scores: shape [-1,]
        max_boxes: the maximum number of boxes to be selected by non_max_suppression
        iou_thresh: the IoU threshold used to decide which overlapping boxes to drop
    """
    assert boxes.shape[1] == 4 and len(scores.shape) == 1

    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    areas = (x2 - x1) * (y2 - y1)
    # process boxes in descending order of score
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the current box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        # use float coordinates, consistent with the `areas` computation above
        # (the `+ 1` in the original stems from the integer-pixel convention
        # and made the IoU here inconsistent with `areas`)
        w = np.maximum(0.0, xx2 - xx1)
        h = np.maximum(0.0, yy2 - yy1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # keep only the boxes that overlap the current box by at most iou_thresh
        inds = np.where(ovr <= iou_thresh)[0]
        order = order[inds + 1]

    return keep[:max_boxes]


def cpu_nms(boxes, scores, num_classes, max_boxes=50, score_thresh=0.5, iou_thresh=0.5):
    """
    Perform NMS on CPU.
    Arguments:
        boxes: shape [1, 10647, 4]
        scores: shape [1, 10647, num_classes]
    """

    boxes = boxes.reshape(-1, 4)
    scores = scores.reshape(-1, num_classes)
    # picked bounding boxes
    picked_boxes, picked_score, picked_label = [], [], []

    for i in range(num_classes):
        # keep only the boxes whose score for class i passes the threshold
        indices = np.where(scores[:, i] >= score_thresh)
        filter_boxes = boxes[indices]
        filter_scores = scores[:, i][indices]
        if len(filter_boxes) == 0:
            continue
        # do non_max_suppression on the CPU
        indices = py_nms(filter_boxes, filter_scores,
                         max_boxes=max_boxes, iou_thresh=iou_thresh)
        picked_boxes.append(filter_boxes[indices])
        picked_score.append(filter_scores[indices])
        picked_label.append(np.ones(len(indices), dtype='int32') * i)
    if len(picked_boxes) == 0:
        return None, None, None

    boxes = np.concatenate(picked_boxes, axis=0)
    score = np.concatenate(picked_score, axis=0)
    label = np.concatenate(picked_label, axis=0)

    return boxes, score, label
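To see the per-class behavior concretely, here is a small toy run of `cpu_nms`; all values are made up for illustration. The two heavily overlapping class-0 boxes collapse to the higher-scoring one, while the class-1 box survives on its own.

```python
# Toy cpu_nms example with two classes and three hand-made boxes.
import numpy as np
from utils.nms_utils import cpu_nms  # assumes this repo's root is on the path

boxes = np.array([[[10, 10, 50, 50],        # class 0, score 0.9
                   [12, 12, 52, 52],        # class 0, score 0.8, overlaps the first box
                   [100, 100, 160, 160]]],  # class 1, score 0.7
                 dtype=np.float32)          # shape [1, 3, 4]
scores = np.array([[[0.9, 0.1],
                    [0.8, 0.2],
                    [0.05, 0.7]]], dtype=np.float32)  # shape [1, 3, 2]

out_boxes, out_scores, out_labels = cpu_nms(boxes, scores, num_classes=2)
print(out_labels)  # [0 1]: one box survives per class
```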
--------------------------------------------------------------------------------
/utils/plot_utils.py:
--------------------------------------------------------------------------------
# coding: utf-8

from __future__ import division, print_function

import cv2
import random


def get_color_table(class_num, seed=2):
    # fixed seed so every run assigns the same color to the same class
    random.seed(seed)
    color_table = {}
    for i in range(class_num):
        color_table[i] = [random.randint(0, 255) for _ in range(3)]
    return color_table


def plot_one_box(img, coord, label=None, color=None, line_thickness=None):
    '''
    coord: [x_min, y_min, x_max, y_max] format coordinates.
    img: img to plot on.
    label: str. The label name.
    color: list of 3 ints. The BGR color value.
    line_thickness: int. rectangle line thickness.
    '''
    tl = line_thickness or int(round(0.002 * max(img.shape[0:2])))  # line thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(coord[0]), int(coord[1])), (int(coord[2]), int(coord[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=float(tl) / 3, thickness=tf)[0]
        # draw a filled rectangle behind the label text, then the text itself
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1)  # filled
        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, float(tl) / 3, [0, 0, 0], thickness=tf, lineType=cv2.LINE_AA)
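A quick, self-contained way to try `plot_one_box` is to draw on a blank canvas; the output file name, box coordinates, and label below are made up.

```python
# Draw one labeled box on a white 416x416 canvas and save it (illustrative only).
import cv2
import numpy as np
from utils.plot_utils import get_color_table, plot_one_box

img = np.full((416, 416, 3), 255, dtype=np.uint8)   # white canvas
color_table = get_color_table(class_num=2)
plot_one_box(img, [60, 80, 220, 300], label='car 0.92', color=color_table[0])
cv2.imwrite('plot_demo.jpg', img)
```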
--------------------------------------------------------------------------------
/video_test.py:
--------------------------------------------------------------------------------
# coding: utf-8

from __future__ import division, print_function

import tensorflow as tf
import numpy as np
import argparse
import cv2
import time

from utils.misc_utils import parse_anchors, read_class_names
from utils.nms_utils import gpu_nms
from utils.plot_utils import get_color_table, plot_one_box

from model import yolov3

parser = argparse.ArgumentParser(description="YOLO-V3 video test procedure.")
parser.add_argument("input_video", type=str,
                    help="The path of the input video.")
parser.add_argument("--anchor_path", type=str, default="./data/yolo_anchors.txt",
                    help="The path of the anchor txt file.")
parser.add_argument("--new_size", nargs='*', type=int, default=[416, 416],
                    help="Resize the input image with `new_size`, size format: [width, height]")
parser.add_argument("--class_name_path", type=str, default="./data/coco.names",
                    help="The path of the class names.")
parser.add_argument("--restore_path", type=str, default="./data/darknet_weights/yolov3.ckpt",
                    help="The path of the weights to restore.")
parser.add_argument("--save_video", type=lambda x: (str(x).lower() == 'true'), default=False,
                    help="Whether to save the video detection results.")
args = parser.parse_args()

args.anchors = parse_anchors(args.anchor_path)
args.classes = read_class_names(args.class_name_path)
args.num_class = len(args.classes)

color_table = get_color_table(args.num_class)

vid = cv2.VideoCapture(args.input_video)
# use the named property constants instead of their raw integer indices
video_frame_cnt = int(vid.get(cv2.CAP_PROP_FRAME_COUNT))
video_width = int(vid.get(cv2.CAP_PROP_FRAME_WIDTH))
video_height = int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT))
video_fps = int(vid.get(cv2.CAP_PROP_FPS))

if args.save_video:
    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
    videoWriter = cv2.VideoWriter('video_result.mp4', fourcc, video_fps, (video_width, video_height))

with tf.Session() as sess:
    input_data = tf.placeholder(tf.float32, [1, args.new_size[1], args.new_size[0], 3], name='input_data')
    yolo_model = yolov3(args.num_class, args.anchors)
    with tf.variable_scope('yolov3'):
        pred_feature_maps = yolo_model.forward(input_data, False)
    pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)

    pred_scores = pred_confs * pred_probs

    boxes, scores, labels = gpu_nms(pred_boxes, pred_scores, args.num_class, max_boxes=30, score_thresh=0.5, iou_thresh=0.5)

    saver = tf.train.Saver()
    saver.restore(sess, args.restore_path)

    for i in range(video_frame_cnt):
        ret, img_ori = vid.read()
        if not ret:  # stop early if the reported frame count was inaccurate
            break

        height_ori, width_ori = img_ori.shape[:2]
        img = cv2.resize(img_ori, tuple(args.new_size))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = np.asarray(img, np.float32)
        img = img[np.newaxis, :] / 255.

        start_time = time.time()
        boxes_, scores_, labels_ = sess.run([boxes, scores, labels], feed_dict={input_data: img})
        end_time = time.time()

        # rescale the coordinates from network-input space back to the original frame
        boxes_[:, 0] *= (width_ori / float(args.new_size[0]))
        boxes_[:, 2] *= (width_ori / float(args.new_size[0]))
        boxes_[:, 1] *= (height_ori / float(args.new_size[1]))
        boxes_[:, 3] *= (height_ori / float(args.new_size[1]))

        for j in range(len(boxes_)):  # use a separate index to avoid shadowing the frame counter
            x0, y0, x1, y1 = boxes_[j]
            plot_one_box(img_ori, [x0, y0, x1, y1], label=args.classes[labels_[j]], color=color_table[labels_[j]])
        cv2.putText(img_ori, '{:.2f}ms'.format((end_time - start_time) * 1000), (40, 40), 0,
                    fontScale=1, color=(0, 255, 0), thickness=2)
        cv2.imshow('image', img_ori)
        if args.save_video:
            videoWriter.write(img_ori)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

vid.release()
if args.save_video:
    videoWriter.release()
--------------------------------------------------------------------------------
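The video demo is invoked like the single-image demos; the video path below is hypothetical:

```shell
python video_test.py ./data/demo_data/video.mp4 --save_video true
```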