├── .spyproject
│   ├── codestyle.ini
│   ├── encoding.ini
│   ├── vcs.ini
│   └── workspace.ini
├── README.md
├── checkpoint
│   └── yolov3_tiny_Car
│       ├── checkpoint
│       ├── log.txt
│       ├── model-step_144000_loss_2.974138_lr_3.433207e-06.data-00000-of-00001
│       ├── model-step_144000_loss_2.974138_lr_3.433207e-06.index
│       └── model-step_144000_loss_2.974138_lr_3.433207e-06.meta
├── convert_weight.py
├── data
│   ├── COCO.names
│   ├── Car.names
│   ├── darknet_weights
│   │   └── readme
│   ├── demo_data
│   │   ├── car.jpg
│   │   ├── dog.jpg
│   │   ├── kite.jpg
│   │   ├── messi.jpg
│   │   └── results
│   │       ├── dog.jpg
│   │       ├── kite.jpg
│   │       └── messi.jpg
│   ├── logs
│   │   └── readme
│   ├── my_data
│   │   ├── readme
│   │   ├── train_COCO_tf.txt
│   │   └── val_COCO_tf.txt
│   ├── yolov3_anchors.txt
│   ├── yolov3_tiny_COCO_anchors.txt
│   └── yolov3_tiny_Car_anchors.txt
├── detection_result.jpg
├── docs
│   ├── backbone.png
│   └── yolo_v3_architecture.png
├── eval.py
├── get_kmeans.py
├── model
│   ├── __init__.py
│   ├── yolov3.py
│   └── yolov3_tiny.py
├── test_single_image.py
├── test_single_image_pb.py
├── train.py
├── utils
│   ├── __init__.py
│   ├── data_utils.py
│   ├── eval_utils.py
│   ├── layer_utils.py
│   ├── misc_utils.py
│   ├── nms_utils.py
│   └── plot_utils.py
└── video_test.py
--------------------------------------------------------------------------------
/.spyproject/codestyle.ini:
--------------------------------------------------------------------------------
1 | [codestyle]
2 | indentation = True
3 | 
4 | [main]
5 | version = 0.1.0
6 | 
7 | 
--------------------------------------------------------------------------------
/.spyproject/encoding.ini:
--------------------------------------------------------------------------------
1 | [encoding]
2 | text_encoding = utf-8
3 | 
4 | [main]
5 | version = 0.1.0
6 | 
7 | 
--------------------------------------------------------------------------------
/.spyproject/vcs.ini:
--------------------------------------------------------------------------------
1 | [vcs]
2 | version_control_system = 
3 | use_version_control = False
4 | 
5 | [main]
6 | version = 0.1.0
7 | 
8 | 
--------------------------------------------------------------------------------
/.spyproject/workspace.ini:
--------------------------------------------------------------------------------
1 | [workspace]
2 | save_data_on_exit = True
3 | save_history = True
4 | save_non_project_files = False
5 | restore_data_on_startup = True
6 | 
7 | [main]
8 | version = 0.1.0
9 | recent_files = ['D:\\Github\\DeepLearning\\YOLO\\YOLOv3_tiny_TensorFlow\\README.md', 'D:\\Github\\DeepLearning\\YOLO\\YOLOv3_tiny_TensorFlow\\utils\\data_utils.py', 'D:\\Github\\DeepLearning\\YOLO\\YOLOv3_tiny_TensorFlow\\test_single_image.py']
10 | 
11 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # YOLOv3 and YOLOv3_tiny for TensorFlow
2 | 
3 | ### 1. Introduction
4 | 
5 | Adds YOLOv3_tiny and data augmentation (clip, brighten, and saturation changes).
6 | 
7 | ### 2. Requirements
8 | 
9 | - tensorflow >= 1.8.0 (lower versions may work too)
10 | - opencv-python
11 | 
12 | 
13 | ### 3. Running demos
14 | 
15 | (1) Single image test demo using a ckpt file:
16 | 
17 | ```shell
18 | python test_single_image.py ./data/demo_data/car.jpg
19 | ```
20 | 
21 | (2) Single image test demo using a pb file:
22 | 
23 | ```shell
24 | python test_single_image_pb.py ./data/demo_data/car.jpg
25 | ```
26 | 
27 | ### 4. Training
28 | 
29 | #### 4.1 Data preparation
30 | 
31 | (1) annotation file
32 | 
33 | Generate `train.txt/val.txt/test.txt` files under the `./data/my_data/` directory.
34 | One line per image, in the format `image_absolute_path box_1 box_2 ... box_n`.
35 | Box format: `label_index x_min y_min x_max y_max`. (The origin of coordinates is at the top-left corner.)
36 | 
37 | For example:
38 | 
39 | ```
40 | xxx/xxx/1.jpg 0 453 369 473 391 1 588 245 608 268
41 | xxx/xxx/2.jpg 1 466 403 485 422 2 793 300 809 320
42 | ...
43 | ```
44 | 
45 | **NOTE**: **You should leave a blank line at the end of each txt file.**
46 | 
47 | (2) class_names file:
48 | 
49 | Generate the `data.names` file under the `./data/my_data/` directory. Each line represents a class name.
50 | 
51 | For example:
52 | 
53 | ```
54 | bird
55 | car
56 | bike
57 | ...
58 | ```
59 | 
60 | The COCO dataset class names file is placed at `./data/COCO.names`.
61 | 
62 | (3) prior anchor file:
63 | 
64 | Use the k-means algorithm to get the prior anchors:
65 | 
66 | ```
67 | python get_kmeans.py
68 | ```
69 | 
70 | Then you will get 9 anchors and the average IoU. Save the anchors to a txt file.
71 | 
72 | The COCO dataset anchors offered by the YOLOv3 author are placed at `./data/yolov3_anchors.txt`; you can use that one too.
73 | 
74 | **NOTE: The yolo anchors should be scaled to the rescaled input image size.
75 | Suppose your image size is [W, H] and the image will be rescaled to 416*416 as input; then for each generated anchor [anchor_w, anchor_h]
76 | you should apply the transformation anchor_w = anchor_w / W * 416, anchor_h = anchor_h / H * 416 (see the sketch at the end of section 4.2).**
77 | 
78 | #### 4.2 Training
79 | 
80 | Use `train.py`. The parameters are as follows:
81 | 
82 | ```shell
83 | $ python train.py -h
84 | usage: train.py
85 | 
86 | net_name = 'the yolo model'
87 | anchors_name = 'the anchors name'
88 | body_name = 'the yolo body net'
89 | data_name = 'the training data name'
90 | 
91 | 
92 | ```
93 | 
94 | Check `train.py` for more details. You should set the parameters yourself.
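Here is a minimal sketch of the anchor rescaling described in the NOTE of section 4.1 (the `rescale_anchors` helper is illustrative only, not a script in this repo, and assumes all source images share one original size [W, H]):

```python
# Illustrative helper (not part of this repo): rescale k-means anchors
# measured on [W, H] source images to the 416*416 network input, applying
# anchor_w = anchor_w / W * 416 and anchor_h = anchor_h / H * 416.
def rescale_anchors(anchors, W, H, input_size=416):
    return [(w / W * input_size, h / H * input_size) for (w, h) in anchors]

# e.g. a 120x90 anchor from 1000x800 images becomes roughly 49.9x46.8
print(rescale_anchors([(120, 90)], W=1000, H=800))
```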
95 | 
96 | Some training tricks in my experiment:
97 | 
98 | YOLOv3 uses `darknet53` as its backbone, while YOLOv3_tiny uses `darknet19`.
99 | 
100 | 
101 | 
102 | ### Credits:
103 | 
104 | I referred to this fantastic repo during the implementation:
105 | 
106 | https://github.com/wizyoung/YOLOv3_TensorFlow
107 | 
108 | 
109 | 
110 | 
111 | 
112 | 
--------------------------------------------------------------------------------
/checkpoint/yolov3_tiny_Car/checkpoint:
--------------------------------------------------------------------------------
1 | model_checkpoint_path: "model-step_144000_loss_2.974138_lr_3.433207e-06"
2 | all_model_checkpoint_paths: "model-step_136000_loss_4.582234_lr_4.75916e-06"
3 | all_model_checkpoint_paths: "model-step_138000_loss_3.534735_lr_4.386041e-06"
4 | all_model_checkpoint_paths: "model-step_140000_loss_4.178957_lr_4.042176e-06"
5 | all_model_checkpoint_paths: "model-step_142000_loss_2.621993_lr_3.725269e-06"
6 | all_model_checkpoint_paths: "model-step_144000_loss_2.974138_lr_3.433207e-06"
--------------------------------------------------------------------------------
/checkpoint/yolov3_tiny_Car/log.txt:
--------------------------------------------------------------------------------
1 | ===> batch recall: 0.871, batch precision: 0.864 <===
2 | ===> Epoch: 32, global_step: 142000, recall: 0.553, precision: 0.643, total_loss: 37.199, loss_xy: 1.078, loss_wh: 1.009, loss_conf: 31.234, loss_class: 3.878
3 | Epoch: 32, global_step: 142050, lr: 0.00000373, total_loss: 4.751, loss_xy: 0.814, loss_wh: 0.612, loss_conf: 3.294, loss_class: 0.031
4 | Epoch: 32, global_step: 142100, lr: 0.00000373, total_loss: 3.753, loss_xy: 0.600, loss_wh: 0.523, loss_conf: 2.387, loss_class: 0.243
5 | Epoch: 32, global_step: 142150, lr: 0.00000373, total_loss: 2.970, loss_xy: 0.620, loss_wh: 0.455, loss_conf: 1.796, loss_class: 0.099
6 | Epoch: 32, global_step: 142200, lr: 0.00000373, total_loss: 3.949, loss_xy: 0.836, loss_wh: 0.477, loss_conf: 2.631, loss_class: 0.004
7 | Epoch: 32, global_step: 142250, lr: 0.00000373, total_loss: 4.312, loss_xy: 0.855, loss_wh: 0.771, loss_conf: 2.665, loss_class: 0.021
8 | Epoch: 32, global_step: 142300, lr: 0.00000373, total_loss: 3.754, loss_xy: 0.703, loss_wh: 0.544, loss_conf: 2.499, loss_class: 0.007
9 | Epoch: 32, global_step: 142350, lr: 0.00000358, total_loss: 3.994, loss_xy: 0.756, loss_wh: 0.533, loss_conf: 2.641, loss_class: 0.064
10 | Epoch: 32, global_step: 142400, lr: 0.00000358, total_loss: 2.882, loss_xy: 0.567, loss_wh: 0.392, loss_conf: 1.883, loss_class: 0.040
11 | Epoch: 32, global_step: 142450, lr: 0.00000358, total_loss: 4.038, loss_xy: 0.650, loss_wh: 0.590, loss_conf: 2.733, loss_class: 0.065
12 | Epoch: 32, global_step: 142500, lr: 0.00000358, total_loss: 3.947, loss_xy: 0.767, loss_wh: 0.618, loss_conf: 2.494, loss_class: 0.068
13 | ===> batch recall: 0.789, batch precision: 0.793 <===
14 | Epoch: 32, global_step: 142550, lr: 0.00000358, total_loss: 2.653, loss_xy: 0.714, loss_wh: 0.633, loss_conf: 1.298, loss_class: 0.008
15 | Epoch: 32, global_step: 142600, lr: 0.00000358, total_loss: 3.421, loss_xy: 0.735, loss_wh: 0.547, loss_conf: 2.098, loss_class: 0.041
16 | Epoch: 32, global_step: 142650, lr: 0.00000358, total_loss: 5.391, loss_xy: 0.845, loss_wh: 0.732, loss_conf: 3.704, loss_class: 0.110
17 | Epoch: 32, global_step: 142700, lr: 0.00000358, total_loss: 2.978, loss_xy: 0.645, loss_wh: 0.454, loss_conf: 1.868, loss_class: 0.011
18 | Epoch: 32, global_step: 142750, lr: 0.00000358, total_loss: 5.921, loss_xy: 1.009, loss_wh: 0.837,
loss_conf: 3.879, loss_class: 0.195 19 | Epoch: 32, global_step: 142800, lr: 0.00000358, total_loss: 3.452, loss_xy: 0.587, loss_wh: 0.492, loss_conf: 2.370, loss_class: 0.002 20 | Epoch: 32, global_step: 142850, lr: 0.00000358, total_loss: 3.199, loss_xy: 0.490, loss_wh: 0.345, loss_conf: 2.341, loss_class: 0.023 21 | Epoch: 32, global_step: 142900, lr: 0.00000358, total_loss: 5.232, loss_xy: 0.837, loss_wh: 0.691, loss_conf: 3.569, loss_class: 0.135 22 | Epoch: 32, global_step: 142950, lr: 0.00000358, total_loss: 4.597, loss_xy: 0.833, loss_wh: 0.648, loss_conf: 3.099, loss_class: 0.017 23 | Epoch: 32, global_step: 143000, lr: 0.00000358, total_loss: 5.087, loss_xy: 0.818, loss_wh: 0.623, loss_conf: 3.624, loss_class: 0.022 24 | ===> batch recall: 0.752, batch precision: 0.756 <=== 25 | Epoch: 32, global_step: 143050, lr: 0.00000358, total_loss: 3.583, loss_xy: 0.679, loss_wh: 0.457, loss_conf: 2.438, loss_class: 0.009 26 | Epoch: 32, global_step: 143100, lr: 0.00000358, total_loss: 2.846, loss_xy: 0.587, loss_wh: 0.439, loss_conf: 1.814, loss_class: 0.006 27 | Epoch: 32, global_step: 143150, lr: 0.00000358, total_loss: 4.150, loss_xy: 0.744, loss_wh: 0.681, loss_conf: 2.655, loss_class: 0.070 28 | Epoch: 33, global_step: 143200, lr: 0.00000358, total_loss: 3.527, loss_xy: 0.616, loss_wh: 0.444, loss_conf: 2.453, loss_class: 0.014 29 | Epoch: 33, global_step: 143250, lr: 0.00000358, total_loss: 2.432, loss_xy: 0.571, loss_wh: 0.406, loss_conf: 1.454, loss_class: 0.001 30 | Epoch: 33, global_step: 143300, lr: 0.00000358, total_loss: 5.030, loss_xy: 0.716, loss_wh: 0.769, loss_conf: 3.488, loss_class: 0.056 31 | Epoch: 33, global_step: 143350, lr: 0.00000343, total_loss: 4.151, loss_xy: 0.885, loss_wh: 0.621, loss_conf: 2.626, loss_class: 0.019 32 | Epoch: 33, global_step: 143400, lr: 0.00000343, total_loss: 5.016, loss_xy: 0.822, loss_wh: 0.588, loss_conf: 3.554, loss_class: 0.052 33 | Epoch: 33, global_step: 143450, lr: 0.00000343, total_loss: 4.503, loss_xy: 0.683, loss_wh: 0.507, loss_conf: 3.309, loss_class: 0.003 34 | Epoch: 33, global_step: 143500, lr: 0.00000343, total_loss: 3.313, loss_xy: 0.680, loss_wh: 0.496, loss_conf: 2.125, loss_class: 0.012 35 | ===> batch recall: 0.832, batch precision: 0.864 <=== 36 | Epoch: 33, global_step: 143550, lr: 0.00000343, total_loss: 3.279, loss_xy: 0.620, loss_wh: 0.449, loss_conf: 2.201, loss_class: 0.008 37 | Epoch: 33, global_step: 143600, lr: 0.00000343, total_loss: 3.524, loss_xy: 0.560, loss_wh: 0.536, loss_conf: 2.406, loss_class: 0.022 38 | Epoch: 33, global_step: 143650, lr: 0.00000343, total_loss: 3.361, loss_xy: 0.635, loss_wh: 0.446, loss_conf: 2.277, loss_class: 0.003 39 | Epoch: 33, global_step: 143700, lr: 0.00000343, total_loss: 2.603, loss_xy: 0.615, loss_wh: 0.449, loss_conf: 1.514, loss_class: 0.025 40 | Epoch: 33, global_step: 143750, lr: 0.00000343, total_loss: 4.765, loss_xy: 1.000, loss_wh: 0.663, loss_conf: 3.073, loss_class: 0.029 41 | Epoch: 33, global_step: 143800, lr: 0.00000343, total_loss: 4.013, loss_xy: 0.789, loss_wh: 0.639, loss_conf: 2.550, loss_class: 0.036 42 | Epoch: 33, global_step: 143850, lr: 0.00000343, total_loss: 3.597, loss_xy: 0.699, loss_wh: 0.464, loss_conf: 2.431, loss_class: 0.002 43 | Epoch: 33, global_step: 143900, lr: 0.00000343, total_loss: 3.572, loss_xy: 0.887, loss_wh: 0.593, loss_conf: 2.077, loss_class: 0.016 44 | Epoch: 33, global_step: 143950, lr: 0.00000343, total_loss: 3.375, loss_xy: 0.590, loss_wh: 0.533, loss_conf: 2.230, loss_class: 0.022 45 | Epoch: 33, global_step: 144000, lr: 
0.00000343, total_loss: 2.974, loss_xy: 0.629, loss_wh: 0.550, loss_conf: 1.768, loss_class: 0.028 46 | ===> batch recall: 0.815, batch precision: 0.846 <=== -------------------------------------------------------------------------------- /checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.data-00000-of-00001: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.data-00000-of-00001 -------------------------------------------------------------------------------- /checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.index: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.index -------------------------------------------------------------------------------- /checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.meta: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/checkpoint/yolov3_tiny_Car/model-step_144000_loss_2.974138_lr_3.433207e-06.meta -------------------------------------------------------------------------------- /convert_weight.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | # for more details about the yolo darknet weights file, refer to 3 | # https://itnext.io/implementing-yolo-v3-in-tensorflow-tf-slim-c3c55ff59dbe 4 | 5 | from __future__ import division, print_function 6 | 7 | import os 8 | import sys 9 | import tensorflow as tf 10 | import numpy as np 11 | 12 | from model import yolov3 13 | from utils.misc_utils import parse_anchors, load_weights 14 | 15 | num_class = 80 16 | img_size = 416 17 | weight_path = './data/darknet_weights/yolov3.weights' 18 | save_path = './data/darknet_weights/yolov3.ckpt' 19 | anchors = parse_anchors('./data/yolo_anchors.txt') 20 | 21 | model = yolov3(80, anchors) 22 | with tf.Session() as sess: 23 | inputs = tf.placeholder(tf.float32, [1, img_size, img_size, 3]) 24 | 25 | with tf.variable_scope('yolov3'): 26 | feature_map = model.forward(inputs) 27 | 28 | saver = tf.train.Saver(var_list=tf.global_variables(scope='yolov3')) 29 | 30 | load_ops = load_weights(tf.global_variables(scope='yolov3'), weight_path) 31 | sess.run(load_ops) 32 | saver.save(sess, save_path=save_path) 33 | print('TensorFlow model checkpoint has been saved to {}'.format(save_path)) 34 | 35 | 36 | 37 | -------------------------------------------------------------------------------- /data/COCO.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 
42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /data/Car.names: -------------------------------------------------------------------------------- 1 | bicycle 2 | car 3 | person -------------------------------------------------------------------------------- /data/darknet_weights/readme: -------------------------------------------------------------------------------- 1 | place pretrained weights on COCO dataset here. -------------------------------------------------------------------------------- /data/demo_data/car.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/car.jpg -------------------------------------------------------------------------------- /data/demo_data/dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/dog.jpg -------------------------------------------------------------------------------- /data/demo_data/kite.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/kite.jpg -------------------------------------------------------------------------------- /data/demo_data/messi.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/messi.jpg -------------------------------------------------------------------------------- /data/demo_data/results/dog.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/results/dog.jpg -------------------------------------------------------------------------------- /data/demo_data/results/kite.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/results/kite.jpg -------------------------------------------------------------------------------- /data/demo_data/results/messi.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/data/demo_data/results/messi.jpg -------------------------------------------------------------------------------- /data/logs/readme: -------------------------------------------------------------------------------- 1 | tensorboard event files will be here. 
-------------------------------------------------------------------------------- /data/my_data/readme: -------------------------------------------------------------------------------- 1 | place your data files here. 2 | 3 | image path and name, classID, xl, yl, xr, yr, classID, xl, yl, xr, yr ... -------------------------------------------------------------------------------- /data/yolov3_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,13,16,30,33,23,30,61,62,45,59,119,116,90,156,198,373,326 -------------------------------------------------------------------------------- /data/yolov3_tiny_COCO_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,14,23,27,37,58,81,82,135,169,344,319 -------------------------------------------------------------------------------- /data/yolov3_tiny_Car_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,18,15,25,27,36,42,68,79,134,171,272 -------------------------------------------------------------------------------- /detection_result.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/detection_result.jpg -------------------------------------------------------------------------------- /docs/backbone.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/docs/backbone.png -------------------------------------------------------------------------------- /docs/yolo_v3_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/docs/yolo_v3_architecture.png -------------------------------------------------------------------------------- /eval.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | from __future__ import division, print_function 4 | 5 | import tensorflow as tf 6 | import numpy as np 7 | import argparse 8 | 9 | from utils.data_utils import parse_data 10 | from utils.misc_utils import parse_anchors, read_class_names, shuffle_and_overwrite, update_dict, make_summary, config_learning_rate, config_optimizer, list_add 11 | from utils.eval_utils import evaluate_on_cpu, evaluate_on_gpu 12 | from utils.nms_utils import gpu_nms 13 | 14 | from model.yolov3 import yolov3 15 | 16 | ################# 17 | # ArgumentParser 18 | ################# 19 | parser = argparse.ArgumentParser(description="YOLO-V3 eval procedure.") 20 | # some paths 21 | parser.add_argument("--eval_file", type=str, default="./data/my_data/val.txt", 22 | help="The path of the validation or test txt file.") 23 | 24 | parser.add_argument("--restore_path", type=str, default="./data/darknet_weights/yolov3.ckpt", 25 | help="The path of the weights to restore.") 26 | 27 | parser.add_argument("--anchor_path", type=str, default="./data/yolo_anchors.txt", 28 | help="The path of the anchor txt file.") 29 | 30 | parser.add_argument("--class_name_path", type=str, default="./data/COCO.names", 31 | help="The path of the class names.") 32 | 33 | # some numbers 34 | parser.add_argument("--batch_size", type=int, default=20, 35 | help="The batch size for training.") 36 | 37 | 
parser.add_argument("--img_size", nargs='*', type=int, default=[416, 416], 38 | help="Resize the input image to `img_size`, size format: [width, height]") 39 | 40 | parser.add_argument("--num_threads", type=int, default=10, 41 | help="Number of threads for image processing used in tf.data pipeline.") 42 | 43 | parser.add_argument("--prefetech_buffer", type=int, default=3, 44 | help="Prefetech_buffer used in tf.data pipeline.") 45 | 46 | args = parser.parse_args() 47 | 48 | # args params 49 | args.anchors = parse_anchors(args.anchor_path) 50 | args.classes = read_class_names(args.class_name_path) 51 | args.class_num = len(args.classes) 52 | args.img_cnt = len(open(args.eval_file, 'r').readlines()) 53 | args.batch_num = int(np.ceil(float(args.img_cnt) / args.batch_size)) 54 | 55 | # setting placeholders 56 | is_training = tf.placeholder(dtype=tf.bool, name="phase_train") 57 | handle_flag = tf.placeholder(tf.string, [], name='iterator_handle_flag') 58 | 59 | ################## 60 | # tf.data pipeline 61 | ################## 62 | 63 | dataset = tf.data.TextLineDataset(args.eval_file) 64 | dataset = dataset.apply(tf.contrib.data.map_and_batch( 65 | lambda x: tf.py_func(parse_data, [x, args.class_num, args.img_size, args.anchors, 'val'], [tf.float32, tf.float32, tf.float32, tf.float32]), 66 | num_parallel_calls=args.num_threads, batch_size=args.batch_size)) 67 | dataset = dataset.prefetch(args.prefetech_buffer) 68 | 69 | iterator = dataset.make_one_shot_iterator() 70 | 71 | # get an element from the dataset iterator 72 | image, y_true_13, y_true_26, y_true_52 = iterator.get_next() 73 | y_true = [y_true_13, y_true_26, y_true_52] 74 | 75 | # tf.data pipeline will lose the data shape, so we need to set it manually 76 | image.set_shape([None, args.img_size[1], args.img_size[0], 3]) 77 | for y in y_true: 78 | y.set_shape([None, None, None, None, None]) 79 | 80 | ################## 81 | # Model definition 82 | ################## 83 | 84 | # define yolo-v3 model here 85 | yolo_model = yolov3(args.class_num, args.anchors) 86 | with tf.variable_scope('yolov3'): 87 | pred_feature_maps = yolo_model.forward(image, is_training=is_training) 88 | loss = yolo_model.compute_loss(pred_feature_maps, y_true) 89 | y_pred = yolo_model.predict(pred_feature_maps) 90 | 91 | ################ 92 | # register the gpu nms operation here for the following evaluation scheme 93 | pred_boxes_flag = tf.placeholder(tf.float32, [1, None, None]) 94 | pred_scores_flag = tf.placeholder(tf.float32, [1, None, None]) 95 | gpu_nms_op = gpu_nms(pred_boxes_flag, pred_scores_flag, args.class_num) 96 | ################ 97 | 98 | saver_to_restore = tf.train.Saver() 99 | 100 | with tf.Session() as sess: 101 | sess.run([tf.global_variables_initializer()]) 102 | saver_to_restore.restore(sess, args.restore_path) 103 | 104 | print('\n----------- start to eval -----------\n') 105 | 106 | true_positive_dict, true_labels_dict, pred_labels_dict = {}, {}, {} 107 | val_loss = [0., 0., 0., 0., 0.] 
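    # val_loss keeps five running sums matching the list returned by
    # compute_loss: [total_loss, loss_xy, loss_wh, loss_conf, loss_class];
    # each term is averaged over args.img_cnt after the loop below.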
108 | 
109 |     for j in range(args.batch_num):
110 |         y_pred_, y_true_, loss_ = sess.run([y_pred, y_true, loss], feed_dict={is_training: False})
111 |         true_positive_dict_tmp, true_labels_dict_tmp, pred_labels_dict_tmp = \
112 |             evaluate_on_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag,
113 |                             y_pred_, y_true_, args.class_num, calc_now=False)
114 |         true_positive_dict = update_dict(true_positive_dict, true_positive_dict_tmp)
115 |         true_labels_dict = update_dict(true_labels_dict, true_labels_dict_tmp)
116 |         pred_labels_dict = update_dict(pred_labels_dict, pred_labels_dict_tmp)
117 | 
118 |         val_loss = list_add(val_loss, loss_)
119 | 
120 |     # make sure there is at least one ground-truth object in each image
121 |     # to avoid division by zero
122 |     recall = float(sum(true_positive_dict.values())) / (sum(true_labels_dict.values()) + 1e-6)
123 |     precision = float(sum(true_positive_dict.values())) / (sum(pred_labels_dict.values()) + 1e-6)
124 | 
125 |     print("recall: {:.3f}, precision: {:.3f}".format(recall, precision))
126 |     print("total_loss: {:.3f}, loss_xy: {:.3f}, loss_wh: {:.3f}, loss_conf: {:.3f}, loss_class: {:.3f}".format(
127 |         val_loss[0] / args.img_cnt, val_loss[1] / args.img_cnt, val_loss[2] / args.img_cnt, val_loss[3] / args.img_cnt, val_loss[4] / args.img_cnt))
--------------------------------------------------------------------------------
/get_kmeans.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 | # This script is modified from https://github.com/lars76/kmeans-anchor-boxes
3 | 
4 | from __future__ import division, print_function
5 | 
6 | import numpy as np
7 | 
8 | def iou(box, clusters):
9 |     """
10 |     Calculates the Intersection over Union (IoU) between a box and k clusters.
11 |     param:
12 |         box: tuple or array, shifted to the origin (i.e. width and height)
13 |         clusters: numpy array of shape (k, 2) where k is the number of clusters
14 |     return:
15 |         numpy array of shape (k,) where k is the number of clusters
16 |     """
17 |     x = np.minimum(clusters[:, 0], box[0])
18 |     y = np.minimum(clusters[:, 1], box[1])
19 |     if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
20 |         raise ValueError("Box has no area")
21 | 
22 |     intersection = x * y
23 |     box_area = box[0] * box[1]
24 |     cluster_area = clusters[:, 0] * clusters[:, 1]
25 | 
26 |     iou_ = intersection / (box_area + cluster_area - intersection + 1e-10)
27 | 
28 |     return iou_
29 | 
30 | 
31 | def avg_iou(boxes, clusters):
32 |     """
33 |     Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
34 |     param:
35 |         boxes: numpy array of shape (r, 2), where r is the number of rows
36 |         clusters: numpy array of shape (k, 2) where k is the number of clusters
37 |     return:
38 |         average IoU as a single float
39 |     """
40 |     return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])
41 | 
42 | 
43 | def translate_boxes(boxes):
44 |     """
45 |     Translates all the boxes to the origin.
46 |     param:
47 |         boxes: numpy array of shape (r, 4)
48 |     return:
49 |         numpy array of shape (r, 2)
50 |     """
51 |     new_boxes = boxes.copy()
52 |     for row in range(new_boxes.shape[0]):
53 |         new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
54 |         new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
55 |     return np.delete(new_boxes, [0, 1], axis=1)
56 | 
57 | 
58 | def kmeans(boxes, k, dist=np.median):
59 |     """
60 |     Calculates k-means clustering with the Intersection over Union (IoU) metric.
61 | param: 62 | boxes: numpy array of shape (r, 2), where r is the number of rows 63 | k: number of clusters 64 | dist: distance function 65 | return: 66 | numpy array of shape (k, 2) 67 | """ 68 | rows = boxes.shape[0] 69 | 70 | distances = np.empty((rows, k)) 71 | last_clusters = np.zeros((rows,)) 72 | 73 | np.random.seed() 74 | 75 | # the Forgy method will fail if the whole array contains the same rows 76 | clusters = boxes[np.random.choice(rows, k, replace=False)] 77 | 78 | while True: 79 | for row in range(rows): 80 | distances[row] = 1 - iou(boxes[row], clusters) 81 | 82 | nearest_clusters = np.argmin(distances, axis=1) 83 | 84 | if (last_clusters == nearest_clusters).all(): 85 | break 86 | 87 | for cluster in range(k): 88 | clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0) 89 | 90 | last_clusters = nearest_clusters 91 | 92 | return clusters 93 | 94 | 95 | def parse_anno(annotation_path): 96 | anno = open(annotation_path, 'r') 97 | result = [] 98 | for line in anno: 99 | s = line.strip().split(' ') 100 | s = s[1:] 101 | box_cnt = len(s) // 5 102 | for i in range(box_cnt): 103 | x_min, y_min, x_max, y_max = float(s[i*5+1]), float(s[i*5+2]), float(s[i*5+3]), float(s[i*5+4]) 104 | width = x_max - x_min 105 | height = y_max - y_min 106 | assert width > 0 107 | assert height > 0 108 | result.append([width, height]) 109 | result = np.asarray(result) 110 | return result 111 | 112 | 113 | def get_kmeans(anno, cluster_num=9): 114 | 115 | anchors = kmeans(anno, cluster_num) 116 | ave_iou = avg_iou(anno, anchors) 117 | 118 | anchors = anchors.astype('int').tolist() 119 | 120 | anchors = sorted(anchors, key=lambda x: x[0] * x[1]) 121 | 122 | return anchors, ave_iou 123 | 124 | 125 | if __name__ == '__main__': 126 | annotation_path = "./data/my_data/train.txt" 127 | anno_result = parse_anno(annotation_path) 128 | anchors, ave_iou = get_kmeans(anno_result, 9) 129 | 130 | anchor_string = '' 131 | for anchor in anchors: 132 | anchor_string += '{},{}, '.format(anchor[0], anchor[1]) 133 | anchor_string = anchor_string[:-2] 134 | 135 | print('anchors are:') 136 | print(anchor_string) 137 | print('the average iou is:') 138 | print(ave_iou) 139 | 140 | -------------------------------------------------------------------------------- /model/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /model/yolov3.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # for better understanding about yolov3 architecture, refer to this website (in Chinese): 3 | # https://blog.csdn.net/leviopku/article/details/82660381 4 | 5 | from __future__ import division, print_function 6 | 7 | import numpy as np 8 | import tensorflow as tf 9 | slim = tf.contrib.slim 10 | 11 | from utils.layer_utils import conv2d, darknet53_body, yolo_block, upsample_layer 12 | 13 | class yolov3(object): 14 | 15 | def __init__(self, class_num, anchors, batch_norm_decay=0.9): 16 | 17 | # self.anchors = [[10, 13], [16, 30], [33, 23], 18 | # [30, 61], [62, 45], [59, 119], 19 | # [116, 90], [156, 198], [373,326]] 20 | self.class_num = class_num 21 | self.anchors = anchors 22 | self.batch_norm_decay = batch_norm_decay 23 | 24 | def forward(self, inputs, is_training=False, reuse=False): 25 | # the input img_size, form: [height, weight] 26 | # it will be used later 27 | self.img_size = tf.shape(inputs)[1:3] 28 | # set batch norm params 29 | 
batch_norm_params = { 30 | 'decay': self.batch_norm_decay, 31 | 'epsilon': 1e-05, 32 | 'scale': True, 33 | 'is_training': is_training, 34 | 'fused': None, # Use fused batch norm if possible. 35 | } 36 | 37 | with slim.arg_scope([slim.conv2d, slim.batch_norm],reuse=reuse): 38 | with slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm, 39 | normalizer_params=batch_norm_params, 40 | biases_initializer=None, 41 | activation_fn=lambda x: tf.nn.leaky_relu(x, alpha=0.1)): 42 | with tf.variable_scope('darknet53_body'): 43 | route_1, route_2, route_3 = darknet53_body(inputs) 44 | 45 | with tf.variable_scope('yolov3_head'): 46 | inter1, net = yolo_block(route_3, 512) 47 | feature_map_1 = slim.conv2d(net, 3 * (5 + self.class_num), 1, 48 | stride=1, normalizer_fn=None, 49 | activation_fn=None, biases_initializer=tf.zeros_initializer()) 50 | feature_map_1 = tf.identity(feature_map_1, name='feature_map_1') 51 | 52 | inter1 = conv2d(inter1, 256, 1) 53 | inter1 = upsample_layer(inter1, route_2.get_shape().as_list()) 54 | concat1 = tf.concat([inter1, route_2], axis=3) 55 | 56 | inter2, net = yolo_block(concat1, 256) 57 | feature_map_2 = slim.conv2d(net, 3 * (5 + self.class_num), 1, 58 | stride=1, normalizer_fn=None, 59 | activation_fn=None, biases_initializer=tf.zeros_initializer()) 60 | feature_map_2 = tf.identity(feature_map_2, name='feature_map_2') 61 | 62 | inter2 = conv2d(inter2, 128, 1) 63 | inter2 = upsample_layer(inter2, route_1.get_shape().as_list()) 64 | concat2 = tf.concat([inter2, route_1], axis=3) 65 | 66 | _, feature_map_3 = yolo_block(concat2, 128) 67 | feature_map_3 = slim.conv2d(feature_map_3, 3 * (5 + self.class_num), 1, 68 | stride=1, normalizer_fn=None, 69 | activation_fn=None, biases_initializer=tf.zeros_initializer()) 70 | feature_map_3 = tf.identity(feature_map_3, name='feature_map_3') 71 | 72 | return feature_map_1, feature_map_2, feature_map_3 73 | 74 | def reorg_layer(self, feature_map, anchors): 75 | ''' 76 | feature_map: a feature_map from [feature_map_1, feature_map_2, feature_map_3] returned 77 | from `forward` function 78 | anchors: shape: [3, 2] 79 | ''' 80 | # NOTE: size in [h, w] format! don't get messed up! 81 | grid_size = feature_map.shape.as_list()[1:3] # [13, 13] 82 | # the downscale ratio in height and weight 83 | ratio = tf.cast(self.img_size / grid_size, tf.float32) 84 | # rescale the anchors to the feature_map 85 | # NOTE: the anchor is in [w, h] format! 
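        # e.g. with a 416x416 input and a 13x13 feature map, ratio is 32,
        # so the [116, 90] anchor becomes (3.625, 2.8125) on the feature map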
86 | rescaled_anchors = [(anchor[0] / ratio[1], anchor[1] / ratio[0]) for anchor in anchors] 87 | 88 | feature_map = tf.reshape(feature_map, [-1, grid_size[0], grid_size[1], 3, 5 + self.class_num]) 89 | 90 | # split the feature_map along the last dimension 91 | # shape info: take 416x416 input image and the 13*13 feature_map for example: 92 | # box_centers: [N, 13, 13, 3, 2] last_dimension: [center_x, center_y] 93 | # box_sizes: [N, 13, 13, 3, 2] last_dimension: [width, height] 94 | # conf_logits: [N, 13, 13, 3, 1] 95 | # prob_logits: [N, 13, 13, 3, class_num] 96 | box_centers, box_sizes, conf_logits, prob_logits = tf.split(feature_map, [2, 2, 1, self.class_num], axis=-1) 97 | box_centers = tf.nn.sigmoid(box_centers) 98 | 99 | # use some broadcast tricks to get the mesh coordinates 100 | grid_x = tf.range(grid_size[1], dtype=tf.int32) 101 | grid_y = tf.range(grid_size[0], dtype=tf.int32) 102 | grid_x, grid_y = tf.meshgrid(grid_x, grid_y) 103 | x_offset = tf.reshape(grid_x, (-1, 1)) 104 | y_offset = tf.reshape(grid_y, (-1, 1)) 105 | x_y_offset = tf.concat([x_offset, y_offset], axis=-1) 106 | # shape: [13, 13, 1, 2] 107 | x_y_offset = tf.cast(tf.reshape(x_y_offset, [grid_size[0], grid_size[1], 1, 2]), tf.float32) 108 | 109 | # get the absolute box coordinates on the feature_map 110 | box_centers = box_centers + x_y_offset 111 | # rescale to the original image scale 112 | box_centers = box_centers * ratio[::-1] 113 | 114 | # avoid getting possible nan value with tf.clip_by_value 115 | box_sizes = tf.clip_by_value(tf.exp(box_sizes), 1e-9, 50) * rescaled_anchors 116 | # rescale to the original image scale 117 | box_sizes = box_sizes * ratio[::-1] 118 | 119 | # shape: [N, 13, 13, 3, 4] 120 | # last dimension: (center_x, center_y, w, h) 121 | boxes = tf.concat([box_centers, box_sizes], axis=-1) 122 | 123 | # shape: 124 | # x_y_offset: [13, 13, 1, 2] 125 | # boxes: [N, 13, 13, 3, 4], rescaled to the original image scale 126 | # conf_logits: [N, 13, 13, 3, 1] 127 | # prob_logits: [N, 13, 13, 3, class_num] 128 | return x_y_offset, boxes, conf_logits, prob_logits 129 | 130 | def reorg_layer_xx(self, feature_map, anchors): 131 | ''' 132 | feature_map: a feature_map from [feature_map_1, feature_map_2, feature_map_3] returned 133 | from `forward` function 134 | anchors: shape: [3, 2] 135 | ''' 136 | # NOTE: size in [h, w] format! don't get messed up! 137 | grid_size = feature_map.shape.as_list()[1:3] # [13, 13] 138 | # the downscale ratio in height and weight 139 | ratio = tf.cast(self.img_size / grid_size, tf.float32) 140 | # rescale the anchors to the feature_map 141 | # NOTE: the anchor is in [w, h] format! 
142 | rescaled_anchors = [(anchor[0] / ratio[1], anchor[1] / ratio[0]) for anchor in anchors] 143 | 144 | feature_map_1, feature_map_2 = tf.split(feature_map, [3*4, 3*(self.class_num+1)], axis=-1) 145 | feature_map_1 = tf.reshape(feature_map_1, [-1, grid_size[0], grid_size[1], 3, 4]) 146 | feature_map_2 = tf.reshape(feature_map_2, [-1, grid_size[0], grid_size[1], 3, 1 + self.class_num]) 147 | 148 | # split the feature_map along the last dimension 149 | # shape info: take 416x416 input image and the 13*13 feature_map for example: 150 | # box_centers: [N, 13, 13, 3, 2] last_dimension: [center_x, center_y] 151 | # box_sizes: [N, 13, 13, 3, 2] last_dimension: [width, height] 152 | # conf_logits: [N, 13, 13, 3, 1] 153 | # prob_logits: [N, 13, 13, 3, class_num] 154 | box_centers, box_sizes = tf.split(feature_map_1, [2, 2], axis=-1) 155 | conf_logits, prob_logits = tf.split(feature_map_2, [1, self.class_num], axis=-1) 156 | box_centers = tf.nn.sigmoid(box_centers) 157 | 158 | # use some broadcast tricks to get the mesh coordinates 159 | grid_x = tf.range(grid_size[1], dtype=tf.int32) 160 | grid_y = tf.range(grid_size[0], dtype=tf.int32) 161 | grid_x, grid_y = tf.meshgrid(grid_x, grid_y) 162 | x_offset = tf.reshape(grid_x, (-1, 1)) 163 | y_offset = tf.reshape(grid_y, (-1, 1)) 164 | x_y_offset = tf.concat([x_offset, y_offset], axis=-1) 165 | # shape: [13, 13, 1, 2] 166 | x_y_offset = tf.cast(tf.reshape(x_y_offset, [grid_size[0], grid_size[1], 1, 2]), tf.float32) 167 | 168 | # get the absolute box coordinates on the feature_map 169 | box_centers = box_centers + x_y_offset 170 | # rescale to the original image scale 171 | box_centers = box_centers * ratio[::-1] 172 | 173 | # avoid getting possible nan value with tf.clip_by_value 174 | box_sizes = tf.clip_by_value(tf.exp(box_sizes), 1e-9, 50) * rescaled_anchors 175 | # rescale to the original image scale 176 | box_sizes = box_sizes * ratio[::-1] 177 | 178 | # shape: [N, 13, 13, 3, 4] 179 | # last dimension: (center_x, center_y, w, h) 180 | boxes = tf.concat([box_centers, box_sizes], axis=-1) 181 | 182 | # shape: 183 | # x_y_offset: [13, 13, 1, 2] 184 | # boxes: [N, 13, 13, 3, 4], rescaled to the original image scale 185 | # conf_logits: [N, 13, 13, 3, 1] 186 | # prob_logits: [N, 13, 13, 3, class_num] 187 | return x_y_offset, boxes, conf_logits, prob_logits 188 | 189 | 190 | def predict(self, feature_maps): 191 | ''' 192 | Receive the returned feature_maps from `forward` function, 193 | the produce the output predictions at the test stage. 
194 | ''' 195 | feature_map_1, feature_map_2, feature_map_3 = feature_maps 196 | 197 | feature_map_anchors = [(feature_map_1, self.anchors[6:9]), 198 | (feature_map_2, self.anchors[3:6]), 199 | (feature_map_3, self.anchors[0:3])] 200 | reorg_results = [self.reorg_layer(feature_map, anchors) for (feature_map, anchors) in feature_map_anchors] 201 | 202 | def _reshape(result): 203 | x_y_offset, boxes, conf_logits, prob_logits = result 204 | grid_size = x_y_offset.shape.as_list()[:2] 205 | boxes = tf.reshape(boxes, [-1, grid_size[0] * grid_size[1] * 3, 4]) 206 | conf_logits = tf.reshape(conf_logits, [-1, grid_size[0] * grid_size[1] * 3, 1]) 207 | prob_logits = tf.reshape(prob_logits, [-1, grid_size[0] * grid_size[1] * 3, self.class_num]) 208 | # shape: (take 416*416 input image and feature_map_1 for example) 209 | # boxes: [N, 13*13*3, 4] 210 | # conf_logits: [N, 13*13*3, 1] 211 | # prob_logits: [N, 13*13*3, class_num] 212 | return boxes, conf_logits, prob_logits 213 | 214 | boxes_list, confs_list, probs_list = [], [], [] 215 | for result in reorg_results: 216 | boxes, conf_logits, prob_logits = _reshape(result) 217 | confs = tf.sigmoid(conf_logits) 218 | probs = tf.sigmoid(prob_logits) 219 | boxes_list.append(boxes) 220 | confs_list.append(confs) 221 | probs_list.append(probs) 222 | 223 | # collect results on three scales 224 | # take 416*416 input image for example: 225 | # shape: [N, (13*13+26*26+52*52)*3, 4] 226 | boxes = tf.concat(boxes_list, axis=1) 227 | # shape: [N, (13*13+26*26+52*52)*3, 1] 228 | confs = tf.concat(confs_list, axis=1) 229 | # shape: [N, (13*13+26*26+52*52)*3, class_num] 230 | probs = tf.concat(probs_list, axis=1) 231 | 232 | center_x, center_y, width, height = tf.split(boxes, [1, 1, 1, 1], axis=-1) 233 | x_min = center_x - width / 2 234 | y_min = center_y - height / 2 235 | x_max = center_x + width / 2 236 | y_max = center_y + height / 2 237 | 238 | boxes = tf.concat([x_min, y_min, x_max, y_max], axis=-1) 239 | 240 | return boxes, confs, probs 241 | 242 | def loss_layer(self, feature_map_i, y_true, anchors): 243 | ''' 244 | calc loss function from a certain scale 245 | ''' 246 | 247 | # size in [h, w] format! don't get messed up! 
248 | grid_size = tf.shape(feature_map_i)[1:3] 249 | # the downscale ratio in height and weight 250 | ratio = tf.cast(self.img_size / grid_size, tf.float32) 251 | # N: batch_size 252 | N = tf.cast(tf.shape(feature_map_i)[0], tf.float32) 253 | 254 | x_y_offset, pred_boxes, pred_conf_logits, pred_prob_logits = self.reorg_layer(feature_map_i, anchors) 255 | 256 | ########### 257 | # get mask 258 | ########### 259 | # shape: take 416x416 input image and 13*13 feature_map for example: 260 | # [N, 13, 13, 3, 1] 261 | object_mask = y_true[..., 4:5] 262 | # shape: [N, 13, 13, 3, 4] & [N, 13, 13, 3] ==> [V, 4] 263 | # V: num of true gt box 264 | valid_true_boxes = tf.boolean_mask(y_true[..., 0:4], tf.cast(object_mask[..., 0], 'bool')) 265 | 266 | # shape: [V, 2] 267 | valid_true_box_xy = valid_true_boxes[:, 0:2] 268 | valid_true_box_wh = valid_true_boxes[:, 2:4] 269 | # shape: [N, 13, 13, 3, 2] 270 | pred_box_xy = pred_boxes[..., 0:2] 271 | pred_box_wh = pred_boxes[..., 2:4] 272 | 273 | # calc iou 274 | # shape: [N, 13, 13, 3, V] 275 | iou = self.broadcast_iou(valid_true_box_xy, valid_true_box_wh, pred_box_xy, pred_box_wh) 276 | 277 | # shape: [N, 13, 13, 3] 278 | best_iou = tf.reduce_max(iou, axis=-1) 279 | 280 | # get_ignore_mask 281 | ignore_mask = tf.cast(best_iou < 0.5, tf.float32) 282 | # shape: [N, 13, 13, 3, 1] 283 | ignore_mask = tf.expand_dims(ignore_mask, -1) 284 | 285 | # get xy coordinates in one cell from the feature_map 286 | # numerical range: 0 ~ 1 287 | # shape: [N, 13, 13, 3, 2] 288 | true_xy = y_true[..., 0:2] / ratio[::-1] - x_y_offset 289 | pred_xy = pred_box_xy / ratio[::-1] - x_y_offset 290 | 291 | # get_tw_th 292 | # numerical range: 0 ~ 1 293 | # shape: [N, 13, 13, 3, 2] 294 | true_tw_th = y_true[..., 2:4] / anchors 295 | pred_tw_th = pred_box_wh / anchors 296 | # for numerical stability 297 | true_tw_th = tf.where(condition=tf.equal(true_tw_th, 0), 298 | x=tf.ones_like(true_tw_th), y=true_tw_th) 299 | pred_tw_th = tf.where(condition=tf.equal(pred_tw_th, 0), 300 | x=tf.ones_like(pred_tw_th), y=pred_tw_th) 301 | true_tw_th = tf.log(tf.clip_by_value(true_tw_th, 1e-9, 1e9)) 302 | pred_tw_th = tf.log(tf.clip_by_value(pred_tw_th, 1e-9, 1e9)) 303 | 304 | # box size punishment: 305 | # box with smaller area has bigger weight. This is taken from the yolo darknet C source code. 306 | # shape: [N, 13, 13, 3, 1] 307 | box_loss_scale = 2. 
- (y_true[..., 2:3] / tf.cast(self.img_size[1], tf.float32)) * (y_true[..., 3:4] / tf.cast(self.img_size[0], tf.float32)) 308 | 309 | ############ 310 | # loss_part 311 | ############ 312 | # shape: [N, 13, 13, 3, 1] 313 | xy_loss = tf.reduce_sum(tf.square(true_xy - pred_xy) * object_mask * box_loss_scale) / N 314 | wh_loss = tf.reduce_sum(tf.square(true_tw_th - pred_tw_th) * object_mask * box_loss_scale) / N 315 | 316 | # shape: [N, 13, 13, 3, 1] 317 | conf_pos_mask = object_mask 318 | conf_neg_mask = (1 - object_mask) * ignore_mask 319 | conf_loss_pos = conf_pos_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask, logits=pred_conf_logits) 320 | conf_loss_neg = conf_neg_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask, logits=pred_conf_logits) 321 | conf_loss = tf.reduce_sum(conf_loss_pos + conf_loss_neg) / N 322 | 323 | # shape: [N, 13, 13, 3, 1] 324 | class_loss = object_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true[..., 5:], logits=pred_prob_logits) 325 | class_loss = tf.reduce_sum(class_loss) / N 326 | 327 | return xy_loss, wh_loss, conf_loss, class_loss 328 | 329 | 330 | def compute_loss(self, y_pred, y_true): 331 | ''' 332 | param: 333 | y_pred: returned feature_map list by `forward` function: [feature_map_1, feature_map_2, feature_map_3] 334 | y_true: input y_true by the tf.data pipeline 335 | ''' 336 | loss_xy, loss_wh, loss_conf, loss_class = 0., 0., 0., 0. 337 | anchor_group = [self.anchors[6:9], self.anchors[3:6], self.anchors[0:3]] 338 | 339 | # calc loss in 3 scales 340 | for i in range(len(y_pred)): 341 | result = self.loss_layer(y_pred[i], y_true[i], anchor_group[i]) 342 | loss_xy += result[0] 343 | loss_wh += result[1] 344 | loss_conf += result[2] 345 | loss_class += result[3] 346 | total_loss = loss_xy + loss_wh + loss_conf + loss_class 347 | return [total_loss, loss_xy, loss_wh, loss_conf, loss_class] 348 | 349 | 350 | 351 | def broadcast_iou(self, true_box_xy, true_box_wh, pred_box_xy, pred_box_wh): 352 | ''' 353 | maintain an efficient way to calculate the ios matrix between ground truth true boxes and the predicted boxes 354 | note: here we only care about the size match 355 | ''' 356 | # shape: 357 | # true_box_??: [V, 2] 358 | # pred_box_??: [N, 13, 13, 3, 2] 359 | 360 | # shape: [N, 13, 13, 3, 1, 2] 361 | pred_box_xy = tf.expand_dims(pred_box_xy, -2) 362 | pred_box_wh = tf.expand_dims(pred_box_wh, -2) 363 | 364 | # shape: [1, V, 2] 365 | true_box_xy = tf.expand_dims(true_box_xy, 0) 366 | true_box_wh = tf.expand_dims(true_box_wh, 0) 367 | 368 | # [N, 13, 13, 3, 1, 2] & [1, V, 2] ==> [N, 13, 13, 3, V, 2] 369 | intersect_mins = tf.maximum(pred_box_xy - pred_box_wh / 2., 370 | true_box_xy - true_box_wh / 2.) 371 | intersect_maxs = tf.minimum(pred_box_xy + pred_box_wh / 2., 372 | true_box_xy + true_box_wh / 2.) 373 | intersect_wh = tf.maximum(intersect_maxs - intersect_mins, 0.) 
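        # clamping at zero makes the intersection area vanish for box pairs
        # that do not overlap, instead of going negative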
374 | 375 | # shape: [N, 13, 13, 3, V] 376 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 377 | # shape: [N, 13, 13, 3, 1] 378 | pred_box_area = pred_box_wh[..., 0] * pred_box_wh[..., 1] 379 | # shape: [1, V] 380 | true_box_area = true_box_wh[..., 0] * true_box_wh[..., 1] 381 | 382 | # [N, 13, 13, 3, V] 383 | iou = intersect_area / (pred_box_area + true_box_area - intersect_area + 1e-10) 384 | 385 | return iou 386 | -------------------------------------------------------------------------------- /model/yolov3_tiny.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # for better understanding about yolov3 architecture, refer to this website (in Chinese): 3 | # https://blog.csdn.net/leviopku/article/details/82660381 4 | 5 | from __future__ import division, print_function 6 | 7 | import numpy as np 8 | import tensorflow as tf 9 | slim = tf.contrib.slim 10 | 11 | from utils.layer_utils import conv2d, darknet19_body, yolo_tiny_block, upsample_layer 12 | 13 | class yolov3_tiny(object): 14 | 15 | def __init__(self, class_num, anchors, batch_norm_decay=0.9): 16 | 17 | # self.anchors = [[10, 18], [15, 25], [27, 36], 18 | # [42, 68], [79, 134], [171, 272], 19 | 20 | self.class_num = class_num 21 | self.anchors = anchors 22 | self.batch_norm_decay = batch_norm_decay 23 | 24 | def forward(self, inputs, is_training=False, reuse=False): 25 | # the input img_size, form: [height, weight] 26 | # it will be used later 27 | self.img_size = tf.shape(inputs)[1:3] 28 | # set batch norm params 29 | batch_norm_params = { 30 | 'decay': self.batch_norm_decay, 31 | 'epsilon': 1e-05, 32 | 'scale': True, 33 | 'is_training': is_training, 34 | 'fused': None, # Use fused batch norm if possible. 35 | } 36 | 37 | with slim.arg_scope([slim.conv2d, slim.batch_norm],reuse=reuse): 38 | with slim.arg_scope([slim.conv2d], normalizer_fn=slim.batch_norm, 39 | normalizer_params=batch_norm_params, 40 | biases_initializer=None, 41 | activation_fn=lambda x: tf.nn.leaky_relu(x, alpha=0.1)): 42 | 43 | with tf.variable_scope('darknet19_body'): 44 | route_1,_, route_2,_ = darknet19_body(inputs) 45 | 46 | with tf.variable_scope('yolov3_tiny_head'): 47 | inter1, net = yolo_tiny_block(route_2, 256) 48 | feature_map_1 = slim.conv2d(net, 3 * (5 + self.class_num), 1, 49 | stride=1, normalizer_fn=None, 50 | activation_fn=None, biases_initializer=tf.zeros_initializer()) 51 | feature_map_1 = tf.identity(feature_map_1, name='feature_map_1') 52 | 53 | inter1 = conv2d(inter1, 128, 1, strides=1) 54 | inter1 = upsample_layer(inter1, route_1.get_shape().as_list()) 55 | concat1 = tf.concat([inter1, route_1], axis=3) 56 | 57 | net = conv2d(concat1, 256, 3, strides=1) 58 | feature_map_2 = slim.conv2d(net, 3 * (5 + self.class_num), 1, 59 | stride=1, normalizer_fn=None, 60 | activation_fn=None, biases_initializer=tf.zeros_initializer()) 61 | feature_map_2 = tf.identity(feature_map_2, name='feature_map_2') 62 | 63 | 64 | return feature_map_1, feature_map_2 65 | 66 | def reorg_layer(self, feature_map, anchors): 67 | ''' 68 | feature_map: a feature_map from [feature_map_1, feature_map_2, feature_map_3] returned 69 | from `forward` function 70 | anchors: shape: [3, 2] 71 | ''' 72 | # NOTE: size in [h, w] format! don't get messed up! 73 | grid_size = feature_map.shape.as_list()[1:3] # [13, 13] 74 | # the downscale ratio in height and weight 75 | ratio = tf.cast(self.img_size / grid_size, tf.float32) 76 | # rescale the anchors to the feature_map 77 | # NOTE: the anchor is in [w, h] format! 
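        # e.g. with a 416x416 input and a 13x13 feature map, ratio is 32,
        # so the [135, 169] tiny-COCO anchor becomes (4.21875, 5.28125)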
78 | rescaled_anchors = [(anchor[0] / ratio[1], anchor[1] / ratio[0]) for anchor in anchors] 79 | 80 | feature_map = tf.reshape(feature_map, [-1, grid_size[0], grid_size[1], 3, 5 + self.class_num]) 81 | 82 | # split the feature_map along the last dimension 83 | # shape info: take 416x416 input image and the 13*13 feature_map for example: 84 | # box_centers: [N, 13, 13, 3, 2] last_dimension: [center_x, center_y] 85 | # box_sizes: [N, 13, 13, 3, 2] last_dimension: [width, height] 86 | # conf_logits: [N, 13, 13, 3, 1] 87 | # prob_logits: [N, 13, 13, 3, class_num] 88 | box_centers, box_sizes, conf_logits, prob_logits = tf.split(feature_map, [2, 2, 1, self.class_num], axis=-1) 89 | box_centers = tf.nn.sigmoid(box_centers) 90 | 91 | # use some broadcast tricks to get the mesh coordinates 92 | grid_x = tf.range(grid_size[1], dtype=tf.int32) 93 | grid_y = tf.range(grid_size[0], dtype=tf.int32) 94 | grid_x, grid_y = tf.meshgrid(grid_x, grid_y) 95 | x_offset = tf.reshape(grid_x, (-1, 1)) 96 | y_offset = tf.reshape(grid_y, (-1, 1)) 97 | x_y_offset = tf.concat([x_offset, y_offset], axis=-1) 98 | # shape: [13, 13, 1, 2] 99 | x_y_offset = tf.cast(tf.reshape(x_y_offset, [grid_size[0], grid_size[1], 1, 2]), tf.float32) 100 | 101 | # get the absolute box coordinates on the feature_map 102 | box_centers = box_centers + x_y_offset 103 | # rescale to the original image scale 104 | box_centers = box_centers * ratio[::-1] 105 | 106 | # avoid getting possible nan value with tf.clip_by_value 107 | box_sizes = tf.clip_by_value(tf.exp(box_sizes), 1e-9, 50) * rescaled_anchors 108 | # rescale to the original image scale 109 | box_sizes = box_sizes * ratio[::-1] 110 | 111 | # shape: [N, 13, 13, 3, 4] 112 | # last dimension: (center_x, center_y, w, h) 113 | boxes = tf.concat([box_centers, box_sizes], axis=-1) 114 | 115 | # shape: 116 | # x_y_offset: [13, 13, 1, 2] 117 | # boxes: [N, 13, 13, 3, 4], rescaled to the original image scale 118 | # conf_logits: [N, 13, 13, 3, 1] 119 | # prob_logits: [N, 13, 13, 3, class_num] 120 | return x_y_offset, boxes, conf_logits, prob_logits 121 | 122 | def reorg_layer_xx(self, feature_map, anchors): 123 | ''' 124 | feature_map: a feature_map from [feature_map_1, feature_map_2, feature_map_3] returned 125 | from `forward` function 126 | anchors: shape: [3, 2] 127 | ''' 128 | # NOTE: size in [h, w] format! don't get messed up! 129 | grid_size = feature_map.shape.as_list()[1:3] # [13, 13] 130 | # the downscale ratio in height and weight 131 | ratio = tf.cast(self.img_size / grid_size, tf.float32) 132 | # rescale the anchors to the feature_map 133 | # NOTE: the anchor is in [w, h] format! 
134 | rescaled_anchors = [(anchor[0] / ratio[1], anchor[1] / ratio[0]) for anchor in anchors] 135 | 136 | feature_map_1, feature_map_2 = tf.split(feature_map, [3*4, 3*(self.class_num+1)], axis=-1) 137 | feature_map_1 = tf.reshape(feature_map_1, [-1, grid_size[0], grid_size[1], 3, 4]) 138 | feature_map_2 = tf.reshape(feature_map_2, [-1, grid_size[0], grid_size[1], 3, 1 + self.class_num]) 139 | 140 | # split the feature_map along the last dimension 141 | # shape info: take 416x416 input image and the 13*13 feature_map for example: 142 | # box_centers: [N, 13, 13, 3, 2] last_dimension: [center_x, center_y] 143 | # box_sizes: [N, 13, 13, 3, 2] last_dimension: [width, height] 144 | # conf_logits: [N, 13, 13, 3, 1] 145 | # prob_logits: [N, 13, 13, 3, class_num] 146 | box_centers, box_sizes = tf.split(feature_map_1, [2, 2], axis=-1) 147 | conf_logits, prob_logits = tf.split(feature_map_2, [1, self.class_num], axis=-1) 148 | box_centers = tf.nn.sigmoid(box_centers) 149 | 150 | # use some broadcast tricks to get the mesh coordinates 151 | grid_x = tf.range(grid_size[1], dtype=tf.int32) 152 | grid_y = tf.range(grid_size[0], dtype=tf.int32) 153 | grid_x, grid_y = tf.meshgrid(grid_x, grid_y) 154 | x_offset = tf.reshape(grid_x, (-1, 1)) 155 | y_offset = tf.reshape(grid_y, (-1, 1)) 156 | x_y_offset = tf.concat([x_offset, y_offset], axis=-1) 157 | # shape: [13, 13, 1, 2] 158 | x_y_offset = tf.cast(tf.reshape(x_y_offset, [grid_size[0], grid_size[1], 1, 2]), tf.float32) 159 | 160 | # get the absolute box coordinates on the feature_map 161 | box_centers = box_centers + x_y_offset 162 | # rescale to the original image scale 163 | box_centers = box_centers * ratio[::-1] 164 | 165 | # avoid getting possible nan value with tf.clip_by_value 166 | box_sizes = tf.clip_by_value(tf.exp(box_sizes), 1e-9, 50) * rescaled_anchors 167 | # rescale to the original image scale 168 | box_sizes = box_sizes * ratio[::-1] 169 | 170 | # shape: [N, 13, 13, 3, 4] 171 | # last dimension: (center_x, center_y, w, h) 172 | boxes = tf.concat([box_centers, box_sizes], axis=-1) 173 | 174 | # shape: 175 | # x_y_offset: [13, 13, 1, 2] 176 | # boxes: [N, 13, 13, 3, 4], rescaled to the original image scale 177 | # conf_logits: [N, 13, 13, 3, 1] 178 | # prob_logits: [N, 13, 13, 3, class_num] 179 | return x_y_offset, boxes, conf_logits, prob_logits 180 | 181 | 182 | def predict(self, feature_maps): 183 | ''' 184 | Receive the returned feature_maps from `forward` function, 185 | the produce the output predictions at the test stage. 
186 | ''' 187 | feature_map_1, feature_map_2 = feature_maps 188 | 189 | feature_map_anchors = [(feature_map_1, self.anchors[3:6]), 190 | (feature_map_2, self.anchors[0:3])] 191 | reorg_results = [self.reorg_layer(feature_map, anchors) for (feature_map, anchors) in feature_map_anchors] 192 | 193 | def _reshape(result): 194 | x_y_offset, boxes, conf_logits, prob_logits = result 195 | grid_size = x_y_offset.shape.as_list()[:2] 196 | boxes = tf.reshape(boxes, [-1, grid_size[0] * grid_size[1] * 3, 4]) 197 | conf_logits = tf.reshape(conf_logits, [-1, grid_size[0] * grid_size[1] * 3, 1]) 198 | prob_logits = tf.reshape(prob_logits, [-1, grid_size[0] * grid_size[1] * 3, self.class_num]) 199 | # shape: (take 416*416 input image and feature_map_1 for example) 200 | # boxes: [N, 13*13*3, 4] 201 | # conf_logits: [N, 13*13*3, 1] 202 | # prob_logits: [N, 13*13*3, class_num] 203 | return boxes, conf_logits, prob_logits 204 | 205 | boxes_list, confs_list, probs_list = [], [], [] 206 | for result in reorg_results: 207 | boxes, conf_logits, prob_logits = _reshape(result) 208 | confs = tf.sigmoid(conf_logits) 209 | probs = tf.sigmoid(prob_logits) 210 | boxes_list.append(boxes) 211 | confs_list.append(confs) 212 | probs_list.append(probs) 213 | 214 | # collect results on three scales 215 | # take 416*416 input image for example: 216 | # shape: [N, (13*13+26*26+52*52)*3, 4] 217 | boxes = tf.concat(boxes_list, axis=1) 218 | # shape: [N, (13*13+26*26+52*52)*3, 1] 219 | confs = tf.concat(confs_list, axis=1) 220 | # shape: [N, (13*13+26*26+52*52)*3, class_num] 221 | probs = tf.concat(probs_list, axis=1) 222 | 223 | center_x, center_y, width, height = tf.split(boxes, [1, 1, 1, 1], axis=-1) 224 | x_min = center_x - width / 2 225 | y_min = center_y - height / 2 226 | x_max = center_x + width / 2 227 | y_max = center_y + height / 2 228 | 229 | boxes = tf.concat([x_min, y_min, x_max, y_max], axis=-1) 230 | 231 | return boxes, confs, probs 232 | 233 | def loss_layer(self, feature_map_i, y_true, anchors): 234 | ''' 235 | calc loss function from a certain scale 236 | ''' 237 | 238 | # size in [h, w] format! don't get messed up! 
239 | grid_size = tf.shape(feature_map_i)[1:3]
240 | # the downscale ratio in height and width
241 | ratio = tf.cast(self.img_size / grid_size, tf.float32)
242 | # N: batch_size
243 | N = tf.cast(tf.shape(feature_map_i)[0], tf.float32)
244 |
245 | x_y_offset, pred_boxes, pred_conf_logits, pred_prob_logits = self.reorg_layer(feature_map_i, anchors)
246 |
247 | ###########
248 | # get mask
249 | ###########
250 | # shape: take a 416x416 input image and the 13*13 feature_map for example:
251 | # [N, 13, 13, 3, 1]
252 | object_mask = y_true[..., 4:5]
253 | # shape: [N, 13, 13, 3, 4] & [N, 13, 13, 3] ==> [V, 4]
254 | # V: number of valid ground truth boxes
255 | valid_true_boxes = tf.boolean_mask(y_true[..., 0:4], tf.cast(object_mask[..., 0], 'bool'))
256 |
257 | # shape: [V, 2]
258 | valid_true_box_xy = valid_true_boxes[:, 0:2]
259 | valid_true_box_wh = valid_true_boxes[:, 2:4]
260 | # shape: [N, 13, 13, 3, 2]
261 | pred_box_xy = pred_boxes[..., 0:2]
262 | pred_box_wh = pred_boxes[..., 2:4]
263 |
264 | # calc iou
265 | # shape: [N, 13, 13, 3, V]
266 | iou = self.broadcast_iou(valid_true_box_xy, valid_true_box_wh, pred_box_xy, pred_box_wh)
267 |
268 | # shape: [N, 13, 13, 3]
269 | best_iou = tf.reduce_max(iou, axis=-1)
270 |
271 | # get the ignore mask: predictions that already match some gt box well are not penalized
272 | ignore_mask = tf.cast(best_iou < 0.5, tf.float32)
273 | # shape: [N, 13, 13, 3, 1]
274 | ignore_mask = tf.expand_dims(ignore_mask, -1)
275 |
276 | # get the xy coordinates within one cell on the feature_map
277 | # numerical range: 0 ~ 1
278 | # shape: [N, 13, 13, 3, 2]
279 | true_xy = y_true[..., 0:2] / ratio[::-1] - x_y_offset
280 | pred_xy = pred_box_xy / ratio[::-1] - x_y_offset
281 |
282 | # get tw and th, the box sizes relative to their anchors
283 | # numerical range: 0 ~ 1
284 | # shape: [N, 13, 13, 3, 2]
285 | true_tw_th = y_true[..., 2:4] / anchors
286 | pred_tw_th = pred_box_wh / anchors
287 | # for numerical stability
288 | true_tw_th = tf.where(condition=tf.equal(true_tw_th, 0),
289 | x=tf.ones_like(true_tw_th), y=true_tw_th)
290 | pred_tw_th = tf.where(condition=tf.equal(pred_tw_th, 0),
291 | x=tf.ones_like(pred_tw_th), y=pred_tw_th)
292 | true_tw_th = tf.log(tf.clip_by_value(true_tw_th, 1e-9, 1e9))
293 | pred_tw_th = tf.log(tf.clip_by_value(pred_tw_th, 1e-9, 1e9))
294 |
295 | # box size punishment:
296 | # a box with a smaller area gets a bigger weight. This is taken from the YOLO darknet C source code.
297 | # shape: [N, 13, 13, 3, 1]
298 | box_loss_scale = 2. - (y_true[..., 2:3] / tf.cast(self.img_size[1], tf.float32)) * (y_true[..., 3:4] / tf.cast(self.img_size[0], tf.float32))
299 |
300 | ############
301 | # loss_part
302 | ############
303 | # shape: [N, 13, 13, 3, 1]
304 | xy_loss = tf.reduce_sum(tf.square(true_xy - pred_xy) * object_mask * box_loss_scale) / N
305 | wh_loss = tf.reduce_sum(tf.square(true_tw_th - pred_tw_th) * object_mask * box_loss_scale) / N
306 |
307 | # shape: [N, 13, 13, 3, 1]
308 | conf_pos_mask = object_mask
309 | conf_neg_mask = (1 - object_mask) * ignore_mask
310 | conf_loss_pos = conf_pos_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask, logits=pred_conf_logits)
311 | conf_loss_neg = conf_neg_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=object_mask, logits=pred_conf_logits)
312 | conf_loss = tf.reduce_sum(conf_loss_pos + conf_loss_neg) / N
313 |
314 | # shape: [N, 13, 13, 3, 1]
315 | class_loss = object_mask * tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true[..., 5:], logits=pred_prob_logits)
316 | class_loss = tf.reduce_sum(class_loss) / N
317 |
318 | return xy_loss, wh_loss, conf_loss, class_loss
319 |
320 |
321 | def compute_loss(self, y_pred, y_true):
322 | '''
323 | param:
324 | y_pred: the feature_map list returned by the `forward` function: [feature_map_1, feature_map_2]
325 | y_true: the y_true tensors fed in by the tf.data pipeline
326 | '''
327 | loss_xy, loss_wh, loss_conf, loss_class = 0., 0., 0., 0.
328 | anchor_group = [self.anchors[3:6], self.anchors[0:3]]
329 |
330 | # calc the loss on the 2 scales
331 | for i in range(len(y_pred)):
332 | result = self.loss_layer(y_pred[i], y_true[i], anchor_group[i])
333 | loss_xy += result[0]
334 | loss_wh += result[1]
335 | loss_conf += result[2]
336 | loss_class += result[3]
337 | total_loss = loss_xy + loss_wh + loss_conf + loss_class
338 | return [total_loss, loss_xy, loss_wh, loss_conf, loss_class]
339 |
340 |
341 |
342 | def broadcast_iou(self, true_box_xy, true_box_wh, pred_box_xy, pred_box_wh):
343 | '''
344 | An efficient way to calculate the IoU matrix between the ground truth boxes and the predicted boxes.
345 | note: here we only care about the size match
346 | '''
347 | # shape:
348 | # true_box_??: [V, 2]
349 | # pred_box_??: [N, 13, 13, 3, 2]
350 |
351 | # shape: [N, 13, 13, 3, 1, 2]
352 | pred_box_xy = tf.expand_dims(pred_box_xy, -2)
353 | pred_box_wh = tf.expand_dims(pred_box_wh, -2)
354 |
355 | # shape: [1, V, 2]
356 | true_box_xy = tf.expand_dims(true_box_xy, 0)
357 | true_box_wh = tf.expand_dims(true_box_wh, 0)
358 |
359 | # [N, 13, 13, 3, 1, 2] & [1, V, 2] ==> [N, 13, 13, 3, V, 2]
360 | intersect_mins = tf.maximum(pred_box_xy - pred_box_wh / 2.,
361 | true_box_xy - true_box_wh / 2.)
362 | intersect_maxs = tf.minimum(pred_box_xy + pred_box_wh / 2.,
363 | true_box_xy + true_box_wh / 2.)
364 | intersect_wh = tf.maximum(intersect_maxs - intersect_mins, 0.)
365 |
366 | # shape: [N, 13, 13, 3, V]
367 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
368 | # shape: [N, 13, 13, 3, 1]
369 | pred_box_area = pred_box_wh[..., 0] * pred_box_wh[..., 1]
370 | # shape: [1, V]
371 | true_box_area = true_box_wh[..., 0] * true_box_wh[..., 1]
372 |
373 | # [N, 13, 13, 3, V]
374 | iou = intersect_area / (pred_box_area + true_box_area - intersect_area + 1e-10)
375 |
376 | return iou
377 |
378 |
379 |
--------------------------------------------------------------------------------
/test_single_image.py:
--------------------------------------------------------------------------------
1 | ####
2 |
3 | # edited by Huangdebo
4 | # test the model using a ckpt file
5 |
6 | # ***
7 |
8 | from __future__ import division, print_function
9 |
10 | import tensorflow as tf
11 | import numpy as np
12 | import argparse
13 | import cv2
14 |
15 | from utils.misc_utils import parse_anchors, read_class_names
16 | from utils.nms_utils import gpu_nms
17 | from utils.plot_utils import get_color_table, plot_one_box
18 |
19 | from model.yolov3 import yolov3
20 | from model.yolov3_tiny import yolov3_tiny
21 |
22 | net_name = 'yolov3_tiny'
23 | body_name = 'darknet19'
24 | data_name = 'Car'
25 | ckpt_name = 'model-step_144000_loss_2.974138_lr_3.433207e-06'
26 |
27 |
28 | parser = argparse.ArgumentParser(description="%s single image test procedure."%net_name)
29 | parser.add_argument("--input_image", type=str, default="./data/demo_data/car.jpg",
30 | help="The path of the input image.")
31 | parser.add_argument("--anchor_path", type=str, default="./data/%s_%s_anchors.txt"%(net_name,data_name),
32 | help="The path of the anchor txt file.")
33 | parser.add_argument("--new_size", nargs='*', type=int, default=[416, 416],
34 | help="Resize the input image with `new_size`, size format: [width, height]")
35 | parser.add_argument("--class_name_path", type=str, default="./data/%s.names"%data_name,
36 | help="The path of the class names.")
37 | parser.add_argument("--restore_path", type=str, default="./checkpoint/%s_%s/%s"%(net_name,data_name,ckpt_name),
38 | help="The path of the weights to restore.")
39 | args = parser.parse_args()
40 |
41 | args.anchors = parse_anchors(args.anchor_path)
42 | args.classes = read_class_names(args.class_name_path)
43 | args.num_class = len(args.classes)
44 |
45 | color_table = get_color_table(args.num_class)
46 | img_ori = cv2.imread(args.input_image)
47 | height_ori, width_ori = img_ori.shape[:2]
48 | img = cv2.resize(img_ori, tuple(args.new_size))
49 | img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
50 | img = np.asarray(img, np.float32)
51 | img = img[np.newaxis, :] / 255.
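# note: at this point img has shape [1, new_h, new_w, 3] -- an RGB float32
# batch scaled to 0~1, the same input format the training pipeline in
# utils/data_utils.py produces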
52 |
53 | with tf.Session() as sess:
54 | input_data = tf.placeholder(tf.float32, [1, args.new_size[1], args.new_size[0], 3], name='input_data')
55 | # yolo_model = yolov3(args.num_class, args.anchors)
56 | yolo_model = yolov3_tiny(args.num_class, args.anchors)
57 | with tf.variable_scope(net_name):
58 | pred_feature_maps = yolo_model.forward(input_data, False)
59 |
60 | pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)
61 |
62 | pred_scores = pred_confs * pred_probs
63 |
64 | boxes, scores, labels = gpu_nms(pred_boxes, pred_scores, args.num_class, max_boxes=200, score_thresh=0.4, iou_thresh=0.5)
65 |
66 | saver = tf.train.Saver()
67 | saver.restore(sess, args.restore_path)
68 |
69 | boxes_, scores_, labels_ = sess.run([boxes, scores, labels], feed_dict={input_data: img})
70 |
71 | # rescale the coordinates to the original image
72 | boxes_[:, 0] *= (width_ori/float(args.new_size[0]))
73 | boxes_[:, 2] *= (width_ori/float(args.new_size[0]))
74 | boxes_[:, 1] *= (height_ori/float(args.new_size[1]))
75 | boxes_[:, 3] *= (height_ori/float(args.new_size[1]))
76 |
77 | print("box coords:")
78 | print(boxes_)
79 | print('*' * 30)
80 | print("scores:")
81 | print(scores_)
82 | print('*' * 30)
83 | print("labels:")
84 | print(labels_)
85 |
86 | for i in range(len(boxes_)):
87 | x0, y0, x1, y1 = boxes_[i]
88 | plot_one_box(img_ori, [x0, y0, x1, y1], label=args.classes[labels_[i]], color=color_table[labels_[i]])
89 | cv2.imshow('Detection result', img_ori)
90 | cv2.imwrite('detection_result.jpg', img_ori)
91 | cv2.waitKey(0)
92 |
--------------------------------------------------------------------------------
/test_single_image_pb.py:
--------------------------------------------------------------------------------
1 | ####
2 |
3 | # edited by Huangdebo
4 | # test the model using a pb file
5 |
6 | # ***
7 |
8 | import numpy as np
9 | import tensorflow as tf
10 | from tensorflow.python.platform.gfile import GFile
11 | import cv2 as cv
12 |
13 | from utils.nms_utils import gpu_nms
14 | from utils.plot_utils import get_color_table, plot_one_box
15 |
16 | # the path of the frozen .pb graph to load
17 | save_file = 'xxx.pb'
18 |
19 | num_class = 3
20 | classes = ['bicycle', 'car', 'person']
21 | color_table = get_color_table(num_class)
22 |
23 |
24 | sess = tf.Session()
25 | with GFile(save_file, 'rb') as f:
26 | graph_def = tf.GraphDef()
27 | graph_def.ParseFromString(f.read())
28 | sess.graph.as_default()
29 | tf.import_graph_def(graph_def, name='') # import the net
30 |
31 |
32 | # initializer
33 | sess.run(tf.global_variables_initializer())
34 |
35 |
36 | # input and output tensors of the frozen graph
37 | image = sess.graph.get_tensor_by_name('inputs:0')
38 | phase_train = sess.graph.get_tensor_by_name('phase_train:0')
39 | boxes = sess.graph.get_tensor_by_name('boxes:0')
40 | confs = sess.graph.get_tensor_by_name('confs:0')
41 | probs = sess.graph.get_tensor_by_name('probs:0')
42 |
43 | pred_scores = confs * probs
44 | boxes, scores, labels = gpu_nms(boxes, pred_scores, num_class, max_boxes=50, score_thresh=0.4, iou_thresh=0.5)
45 |
46 | img_ori = cv.imread('./data/demo_data/1.jpg')
47 | height_ori, width_ori = img_ori.shape[:2]
48 | size = [416, 416]
49 | im = cv.resize(img_ori, tuple(size))
50 | im = np.asarray(cv.cvtColor(im, cv.COLOR_BGR2RGB), np.float32)[np.newaxis, :] / 255. # match the training input format
51 | boxes_, scores_, labels_ = sess.run([boxes, scores, labels], feed_dict={phase_train: False, image: im})
52 |
53 |
54 | # rescale the coordinates to the original image
55 | boxes_[:, 0] *= (width_ori/float(size[0]))
56 | boxes_[:, 2] *= (width_ori/float(size[0]))
57 | boxes_[:, 1] *= (height_ori/float(size[1]))
58 | boxes_[:, 3] *= (height_ori/float(size[1]))
59 |
60 | print("box coords:")
61 | print(boxes_)
62 | print('*' * 30)
63 | print("scores:")
64 | print(scores_)
65 | print('*' * 30)
66 | print("labels:")
67 | print(labels_)
68 |
69 | for i in range(len(boxes_)):
70 | x0, y0, x1, y1 = boxes_[i]
71 | plot_one_box(img_ori, [x0, y0, x1, y1], label=classes[labels_[i]], color=color_table[labels_[i]])
72 | cv.imshow('Detection result', img_ori)
73 | cv.imwrite('detection_result.jpg', img_ori)
74 | cv.waitKey(0)
75 |
76 |
77 |
78 |
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 |
3 | from __future__ import division, print_function
4 |
5 | import tensorflow as tf
6 | from tensorflow.python.framework import graph_util
7 | import numpy as np
8 | import argparse
9 | import logging
10 |
11 | from utils.data_utils import parse_data
12 | from utils.misc_utils import parse_anchors, read_class_names, shuffle_and_overwrite, update_dict, make_summary, config_learning_rate, config_optimizer, list_add
13 | from utils.eval_utils import evaluate_on_cpu, evaluate_on_gpu
14 | from utils.nms_utils import gpu_nms
15 |
16 | from model.yolov3 import yolov3
17 | from model.yolov3_tiny import yolov3_tiny
18 |
19 | net_name = 'yolov3_tiny'
20 | anchors_name = 'yolov3_tiny'
21 | body_name = 'darknet19'
22 | data_name = 'COCO'
23 |
24 | restore_path = ''
25 | is_train_scratch = True
26 |
27 | #################
28 | # ArgumentParser
29 | #################
30 | parser = argparse.ArgumentParser(description="%s training procedure."%net_name)
31 | # some paths
32 | parser.add_argument("--train_file", type=str, default="./data/my_data/train_%s_tf.txt"%data_name,
33 | help="The path of the training txt file.")
34 |
35 | parser.add_argument("--val_file", type=str, default="./data/my_data/val_%s_tf.txt"%data_name,
36 | help="The path of the validation txt file.")
37 |
38 | parser.add_argument("--restore_path", type=str, default=restore_path,
39 | help="The path of the weights to restore.")
40 |
41 | parser.add_argument("--save_dir", type=str, default="./checkpoint/%s_%s/"%(net_name,data_name),
42 | help="The directory of the weights to save.")
43 |
44 | parser.add_argument("--log_dir", type=str, default="./data/logs/%s_%s/"%(net_name,data_name),
45 | help="The directory to store the tensorboard log files.")
46 |
47 | parser.add_argument("--progress_log_path", type=str, default="./data/%s_progress.log"%data_name,
48 | help="The path to record the training progress.")
49 |
50 | parser.add_argument("--anchor_path", type=str, default="./data/%s_%s_anchors.txt"%(anchors_name,data_name),
51 | help="The path of the anchor txt file.")
52 |
53 | parser.add_argument("--class_name_path", type=str, default="./data/%s.names"%data_name,
54 | help="The path of the class names.")
55 |
56 | # some numbers
57 | parser.add_argument("--batch_size", type=int, default=32,
58 | help="The batch size for training.")
59 |
60 | parser.add_argument("--img_size", nargs='*', type=int, default=[416, 416],
61 | help="Resize the input image to `img_size`, size format: [width, height]")
62 |
63 | parser.add_argument("--total_epoches", type=int, default=100,
64 | help="Total number of epochs to train.")
65 |
66 | parser.add_argument("--train_evaluation_freq", type=int, default=500,
67 | help="Evaluate on the training batch every this many steps.")
68 |
69 | parser.add_argument("--print_evaluation_freq", type=int, default=50,
70 | help="Print the evaluation result on the training batch every this many steps.")
71 |
72 | parser.add_argument("--val_evaluation_freq", type=int, default=2000,
73 | help="Evaluate on the whole validation dataset every this many steps.")
74 |
75 | parser.add_argument("--save_freq", type=int, default=2000,
76 | help="Save the model every this many steps.")
77 |
78 | parser.add_argument("--num_threads", type=int, default=4,
79 | help="Number of threads for image processing used in the tf.data pipeline.")
80 |
81 | parser.add_argument("--prefetech_buffer", type=int, default=3,
82 | help="Prefetch buffer size used in the tf.data pipeline.")
83 |
84 | # learning rate and optimizer
85 | parser.add_argument("--optimizer_name", type=str, default='adam',
86 | help="The optimizer name. Chosen from [sgd, momentum, adam, rmsprop]")
87 |
88 | parser.add_argument("--save_optimizer", type=lambda x: (str(x).lower() == 'true'), default=False,
89 | help="Whether to save the optimizer parameters into the checkpoint file.")
90 |
91 | parser.add_argument("--learning_rate_init", type=float, default=1.0e-3,
92 | help="The initial learning rate.")
93 |
94 | parser.add_argument("--lr_type", type=str, default='exponential',
95 | help="The learning rate type. Chosen from [fixed, exponential]")
96 |
97 | parser.add_argument("--lr_decay_freq", type=int, default=1000,
98 | help="The learning rate decay frequency. Used when the exponential lr_type is chosen.")
99 |
100 | parser.add_argument("--lr_decay_factor", type=float, default=0.96,
101 | help="The learning rate decay factor. Used when the exponential lr_type is chosen.")
102 |
103 | parser.add_argument("--lr_lower_bound", type=float, default=1e-8,
104 | help="The minimum learning rate. Used when the exponential lr_type is chosen.")
105 |
106 | # finetune
107 | parser.add_argument("--restore_part", nargs='*', type=str, default=['%s/%s_body'%(net_name,body_name)],
108 | help="Partially restore part of the model for finetuning. Set [None] to restore the whole model.")
109 |
110 | parser.add_argument("--update_part", nargs='*', type=str, default=['%s/%s_head'%(net_name,net_name)],
111 | help="Partially train part of the model for finetuning. Set [None] to train the whole model.")
112 |
113 | # warm up strategy
114 | parser.add_argument("--use_warm_up", type=lambda x: (str(x).lower() == 'true'), default=True,
115 | help="Whether to use the warm up strategy.")
116 |
117 | parser.add_argument("--warm_up_lr", type=float, default=5e-5,
118 | help="Warm up learning rate.")
119 |
120 | parser.add_argument("--warm_up_epoch", type=int, default=1,
121 | help="Number of warm up training epochs.")
122 | args = parser.parse_args()
123 |
124 | # args params
125 | args.anchors = parse_anchors(args.anchor_path)
126 | args.classes = read_class_names(args.class_name_path)
127 | args.class_num = len(args.classes)
128 | args.train_img_cnt = len(open(args.train_file, 'r').readlines())
129 | args.val_img_cnt = len(open(args.val_file, 'r').readlines())
130 | args.train_batch_num = int(np.floor(float(args.train_img_cnt) / args.batch_size))
131 | args.val_batch_num = int(np.floor(float(args.val_img_cnt) / args.batch_size))
132 |
133 | # setting loggers
134 | logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s %(message)s',
135 | datefmt='%a, %d %b %Y %H:%M:%S', filename=args.progress_log_path, filemode='w')
136 |
137 | # setting placeholders
138 | is_training = tf.placeholder(dtype=tf.bool, name="phase_train")
139 | handle_flag = tf.placeholder(tf.string, [], name='iterator_handle_flag')
140 |
141 | ##################
142 | # tf.data pipeline
143 | ##################
144 | # Selecting a `feedable iterator` to switch between the training and validation datasets
145 |
146 | # manually shuffle the train txt file because tf.data.shuffle is very slow on large files!
147 | # you can google it for more details.
148 | shuffle_and_overwrite(args.train_file)
149 | train_dataset = tf.data.TextLineDataset(args.train_file)
150 | train_dataset = train_dataset.apply(tf.data.experimental.map_and_batch(
151 | lambda x: tf.py_func(parse_data, [x, args.class_num, args.img_size, args.anchors, 'train'], [tf.float32, tf.float32, tf.float32, tf.float32]),
152 | num_parallel_calls=args.num_threads, batch_size=args.batch_size))
153 | train_dataset = train_dataset.prefetch(args.prefetech_buffer)
154 |
155 | val_dataset = tf.data.TextLineDataset(args.val_file)
156 | val_dataset = val_dataset.apply(tf.data.experimental.map_and_batch(
157 | lambda x: tf.py_func(parse_data, [x, args.class_num, args.img_size, args.anchors, 'val'], [tf.float32, tf.float32, tf.float32, tf.float32]),
158 | num_parallel_calls=args.num_threads, batch_size=args.batch_size))
159 | val_dataset = val_dataset.prefetch(args.prefetech_buffer) # note: assign the result; prefetch() is not in-place
160 |
161 | # creating the two dataset iterators
162 | train_iterator = train_dataset.make_initializable_iterator()
163 | val_iterator = val_dataset.make_initializable_iterator()
164 |
165 | # creating the two dataset handles
166 | train_handle = train_iterator.string_handle()
167 | val_handle = val_iterator.string_handle()
168 | # select a specific iterator based on the passed handle
169 | dataset_iterator = tf.data.Iterator.from_string_handle(handle_flag, train_dataset.output_types,
170 | train_dataset.output_shapes)
171 |
172 | # get an element from the chosen dataset iterator
173 | image, y_true_13, y_true_26, y_true_52 = dataset_iterator.get_next()
174 |
175 | if (net_name == 'yolov3'):
176 | y_true = [y_true_13, y_true_26, y_true_52]
177 | else:
178 | y_true = [y_true_13, y_true_26]
179 |
180 | # the tf.data pipeline loses the static shape info, so we need to set it manually
181 | image.set_shape([None, args.img_size[1], args.img_size[0], 3])
182 | for y in y_true:
183 | y.set_shape([None, None,
None, None, None]) 184 | 185 | ################## 186 | # Model definition 187 | ################## 188 | 189 | # define yolo-v3 model here 190 | #yolo_model = yolov3(args.class_num, args.anchors) 191 | yolo_model = yolov3_tiny(args.class_num, args.anchors) 192 | 193 | 194 | # the input variables name of .pb 195 | image = tf.identity(image, name='inputs') 196 | with tf.variable_scope(net_name): 197 | pred_feature_maps = yolo_model.forward(image, is_training=is_training) 198 | loss = yolo_model.compute_loss(pred_feature_maps, y_true) 199 | y_pred = yolo_model.predict(pred_feature_maps) 200 | 201 | # the output variables name of .pb 202 | boxes, confs, probs = y_pred 203 | boxes = tf.identity(boxes, name='boxes') 204 | confs = tf.identity(confs, name='confs') 205 | probs = tf.identity(probs, name='probs') 206 | 207 | ################ 208 | # register the gpu nms operation here for the following evaluation scheme 209 | pred_boxes_flag = tf.placeholder(tf.float32, [1, None, None]) 210 | pred_scores_flag = tf.placeholder(tf.float32, [1, None, None]) 211 | gpu_nms_op = gpu_nms(pred_boxes_flag, pred_scores_flag, args.class_num) 212 | ################ 213 | 214 | # train the whole model from scratch 215 | if is_train_scratch == True: 216 | args.restore_part = ['None'] 217 | args.update_part = ['None'] 218 | 219 | if args.restore_part == ['None']: 220 | args.restore_part = [None] 221 | if args.update_part == ['None']: 222 | args.update_part = [None] 223 | saver_to_restore = tf.train.Saver(var_list=tf.contrib.framework.get_variables_to_restore(include=args.restore_part)) 224 | update_vars = tf.contrib.framework.get_variables_to_restore(include=args.update_part) 225 | 226 | tf.summary.scalar('train_batch_statistics/total_loss', loss[0]) 227 | tf.summary.scalar('train_batch_statistics/loss_xy', loss[1]) 228 | tf.summary.scalar('train_batch_statistics/loss_wh', loss[2]) 229 | tf.summary.scalar('train_batch_statistics/loss_conf', loss[3]) 230 | tf.summary.scalar('train_batch_statistics/loss_class', loss[4]) 231 | 232 | global_step = tf.Variable(0, trainable=False, collections=[tf.GraphKeys.LOCAL_VARIABLES]) 233 | if args.use_warm_up: 234 | learning_rate = tf.cond(tf.less(global_step, args.train_batch_num * args.warm_up_epoch), 235 | lambda: args.warm_up_lr, lambda: config_learning_rate(args, global_step - args.train_batch_num * args.warm_up_epoch)) 236 | else: 237 | learning_rate = config_learning_rate(args, global_step) 238 | tf.summary.scalar('learning_rate', learning_rate) 239 | 240 | if not args.save_optimizer: 241 | saver_to_save = tf.train.Saver() 242 | 243 | optimizer = config_optimizer(args.optimizer_name, learning_rate) 244 | 245 | if args.save_optimizer: 246 | saver_to_save = tf.train.Saver() 247 | 248 | # set dependencies for BN ops 249 | update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 250 | with tf.control_dependencies(update_ops): 251 | train_op = optimizer.minimize(loss[0], var_list=update_vars, global_step=global_step) 252 | 253 | 254 | with tf.Session() as sess: 255 | sess.run([tf.global_variables_initializer(), tf.local_variables_initializer(), train_iterator.initializer]) 256 | train_handle_value, val_handle_value = sess.run([train_handle, val_handle]) 257 | if args.restore_path: 258 | saver_to_restore.restore(sess, args.restore_path) 259 | 260 | merged = tf.summary.merge_all() 261 | writer = tf.summary.FileWriter(args.log_dir, sess.graph) 262 | 263 | print('\n----------- start to train -----------\n') 264 | 265 | for epoch in range(args.total_epoches): 266 | for i in 
range(args.train_batch_num):
267 | _, summary, y_pred_, y_true_, loss_, global_step_, lr = sess.run([train_op, merged, y_pred, y_true, loss, global_step, learning_rate],
268 | feed_dict={is_training: True, handle_flag: train_handle_value})
269 | writer.add_summary(summary, global_step=global_step_)
270 |
271 | # print the training status
272 | if global_step_ % args.print_evaluation_freq == 0 and global_step_ > 0:
273 | info = "Epoch: {}, global_step: {}, lr: {:.8f}, total_loss: {:.3f}, loss_xy: {:.3f}, loss_wh: {:.3f}, loss_conf: {:.3f}, loss_class: {:.3f}".format(
274 | epoch, global_step_, lr, loss_[0], loss_[1], loss_[2], loss_[3], loss_[4])
275 | print(info)
276 | logging.info(info)
277 |
278 | # evaluation on the training batch
279 | if global_step_ % args.train_evaluation_freq == 0 and global_step_ > 0:
280 | # recall, precision = evaluate_on_cpu(y_pred_, y_true_, args.class_num, calc_now=True)
281 | recall, precision = evaluate_on_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag, y_pred_, y_true_, args.class_num, calc_now=True)
282 | info = "===> batch recall: {:.3f}, batch precision: {:.3f} <===".format(recall, precision)
283 | print(info)
284 | logging.info(info)
285 |
286 | writer.add_summary(make_summary('evaluation/train_batch_recall', recall), global_step=global_step_)
287 | writer.add_summary(make_summary('evaluation/train_batch_precision', precision), global_step=global_step_)
288 |
289 | # start to save
290 | # NOTE: this is just a demo. You can set your own conditions for when to save the weights.
291 | if global_step_ % args.save_freq == 0 and global_step_ > 0:
292 | if loss_[0] <= 10. or (epoch > 20 and global_step_ % (args.save_freq * 4) == 0):
293 | saver_to_save.save(sess, args.save_dir + 'model-step_{}_loss_{:4f}_lr_{:.7g}'.format(global_step_, loss_[0], lr))
294 |
295 | # switch to the validation dataset for evaluation
296 | if global_step_ % args.val_evaluation_freq == 0 and global_step_ > 0:
297 | sess.run(val_iterator.initializer)
298 | true_positive_dict, true_labels_dict, pred_labels_dict = {}, {}, {}
299 | val_loss = [0., 0., 0., 0., 0.]
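# run through the whole validation set once: accumulate the per-class
# true-positive / ground-truth / prediction counts and the summed losses,
# then compute recall and precision from the totals below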
300 | for j in range(args.val_batch_num):
301 | y_pred_, y_true_, loss_ = sess.run([y_pred, y_true, loss],
302 | feed_dict={is_training: False, handle_flag: val_handle_value})
303 | true_positive_dict_tmp, true_labels_dict_tmp, pred_labels_dict_tmp = \
304 | evaluate_on_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag,
305 | y_pred_, y_true_, args.class_num, calc_now=False)
306 | true_positive_dict = update_dict(true_positive_dict, true_positive_dict_tmp)
307 | true_labels_dict = update_dict(true_labels_dict, true_labels_dict_tmp)
308 | pred_labels_dict = update_dict(pred_labels_dict, pred_labels_dict_tmp)
309 |
310 | val_loss = list_add(val_loss, loss_)
311 |
312 | # make sure there is at least one ground truth object in each image
313 | # add a small epsilon to avoid division by zero
314 | recall = float(sum(true_positive_dict.values())) / (sum(true_labels_dict.values()) + 1e-6)
315 | precision = float(sum(true_positive_dict.values())) / (sum(pred_labels_dict.values()) + 1e-6)
316 |
317 | info = "===> Epoch: {}, global_step: {}, recall: {:.3f}, precision: {:.3f}, total_loss: {:.3f}, loss_xy: {:.3f}, loss_wh: {:.3f}, loss_conf: {:.3f}, loss_class: {:.3f}".format(
318 | epoch, global_step_, recall, precision, val_loss[0] / args.val_batch_num, val_loss[1] / args.val_batch_num, val_loss[2] / args.val_batch_num, val_loss[3] / args.val_batch_num, val_loss[4] / args.val_batch_num)
319 | print(info)
320 | logging.info(info)
321 | writer.add_summary(make_summary('evaluation/val_recall', recall), global_step=epoch)
322 | writer.add_summary(make_summary('evaluation/val_precision', precision), global_step=epoch)
323 |
324 | writer.add_summary(make_summary('validation_statistics/total_loss', val_loss[0] / args.val_batch_num), global_step=epoch)
325 | writer.add_summary(make_summary('validation_statistics/loss_xy', val_loss[1] / args.val_batch_num), global_step=epoch)
326 | writer.add_summary(make_summary('validation_statistics/loss_wh', val_loss[2] / args.val_batch_num), global_step=epoch)
327 | writer.add_summary(make_summary('validation_statistics/loss_conf', val_loss[3] / args.val_batch_num), global_step=epoch)
328 | writer.add_summary(make_summary('validation_statistics/loss_class', val_loss[4] / args.val_batch_num), global_step=epoch)
329 |
330 | # manually shuffle the training data at the start of a new epoch
331 | shuffle_and_overwrite(args.train_file)
332 | sess.run(train_iterator.initializer)
333 |
334 | # save the .pb file; "boxes", "confs" and "probs" are the output nodes
335 | print('saving the pb ...')
336 | constant_graph = graph_util.convert_variables_to_constants(sess, sess.graph_def, ['boxes', 'confs', 'probs'])
337 | with tf.gfile.FastGFile(args.save_dir+'model.pb', mode='wb') as f:
338 | f.write(constant_graph.SerializeToString())
339 |
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huangdebo/YOLOv3_tiny_TensorFlow/6b4676a33b0431760a4ea7382eba7bd2673ffbb2/utils/__init__.py
--------------------------------------------------------------------------------
/utils/data_utils.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 |
3 | from __future__ import division, print_function
4 |
5 | import numpy as np
6 | import tensorflow as tf
7 | import cv2
8 | import random
9 |
10 | # operates on a single sample (one line of the annotation file)
11 | def parse_line(line):
12 | '''
13 | Given a line from the training/test txt file, return the parsed
14 | pic_path, boxes info, and label info.
15 | return:
16 | pic_path: string.
17 | boxes: shape [N, 4], N is the ground truth count, elements in the second
18 | dimension are [x_min, y_min, x_max, y_max]
19 | '''
20 | line = line.decode()
21 | s = line.strip().split(' ')
22 | pic_path = s[0]
23 | s = s[1:]
24 | box_cnt = len(s) // 5
25 | boxes = []
26 | labels = []
27 | for i in range(box_cnt):
28 | label, x_min, y_min, x_max, y_max = int(s[i*5]), float(s[i*5+1]), float(s[i*5+2]), float(s[i*5+3]), float(s[i*5+4])
29 | boxes.append([x_min, y_min, x_max, y_max])
30 | labels.append(label)
31 | boxes = np.asarray(boxes, np.float32)
32 | labels = np.asarray(labels, np.int64)
33 | return pic_path, boxes, labels
34 |
35 |
36 | def resize_image_and_correct_boxes(img, boxes, img_size):
37 | # convert a gray scale image to a 3-channel fake RGB image
38 | if len(np.shape(img)) == 2: # note: check the rank of the array, not len(img)
39 | img = np.expand_dims(img, -1)
40 | ori_height, ori_width = np.shape(img)[:2]
41 | new_width, new_height = img_size
42 | # resize to (new_height, new_width)
43 | img = cv2.resize(img, (new_width, new_height))
44 |
45 | # convert to float
46 | img = np.asarray(img, np.float32)
47 |
48 | # boxes
49 | # xmin, xmax
50 | boxes[:, 0] = boxes[:, 0] / ori_width * new_width
51 | boxes[:, 2] = boxes[:, 2] / ori_width * new_width
52 | # ymin, ymax
53 | boxes[:, 1] = boxes[:, 1] / ori_height * new_height
54 | boxes[:, 3] = boxes[:, 3] / ori_height * new_height
55 |
56 | return img, boxes
57 |
58 |
59 | def data_augmentation(img, boxes, label):
60 | '''
61 | Do your own data augmentation here.
62 | note: if clipping is used, data_augmentation() must be called before resize_image_and_correct_boxes()
63 | param:
64 | img: a [H, W, 3] shape RGB format image, float32 dtype
65 | boxes: [N, 4] shape boxes coordinate info, N is the ground truth box number,
66 | 4 elements in the second dimension are [x_min, y_min, x_max, y_max], float32 dtype
67 | label: [N] shape labels, int64 dtype (you should not convert to int32)
68 | '''
69 |
70 | # randomly clip the image
71 |
72 | # img, boxes = clip_image(img, boxes)
73 |
74 | # randomly change the brightness
75 | img = bright_image(img)
76 |
77 | # randomly change the saturation
78 | img = saturation_image(img)
79 |
80 | return img, boxes, label
81 |
82 | # randomly clip the image
83 | def clip_image(img, boxes):
84 |
85 | xmin = 10000
86 | ymin = 10000
87 | xmax = 0
88 | ymax = 0
89 | for i in range(len(boxes)):
90 | if boxes[i][0] < xmin:
91 | xmin = boxes[i][0]
92 |
93 | if boxes[i][1] < ymin:
94 | ymin = boxes[i][1]
95 |
96 | if boxes[i][2] > xmax:
97 | xmax = boxes[i][2]
98 |
99 | if boxes[i][3] > ymax:
100 | ymax = boxes[i][3]
101 |
102 | max_h = int(np.shape(img)[0]/6)
103 | max_w = int(np.shape(img)[1]/6)
104 | rand_l = random.randint(0, max_w)
105 | rand_l = min(rand_l, xmin)
106 | rand_r = random.randint(0, max_w)
107 | rand_r = min(rand_r, np.shape(img)[1]-xmax)
108 | rand_u = random.randint(0, max_h)
109 | rand_u = min(rand_u, ymin)
110 | rand_d = random.randint(0, max_h)
111 | rand_d = min(rand_d, np.shape(img)[0]-ymax)
112 |
113 | img_clip = img[rand_u:np.shape(img)[0]-rand_d, rand_l:np.shape(img)[1]-rand_r]
114 |
115 | for i in range(len(boxes)):
116 | boxes[i][0] = max(boxes[i][0]-rand_l, 0)
117 | boxes[i][1] = max(boxes[i][1]-rand_u, 0)
118 | boxes[i][2] = min(boxes[i][2]-rand_l, np.shape(img_clip)[1]-1)
119 | boxes[i][3] = min(boxes[i][3]-rand_u, np.shape(img_clip)[0]-1)
120 |
121 | return img_clip, boxes
122 |
123 | # randomly change the brightness
124 | def bright_image(img, adjust=0.25):
125 |
126 | a = random.uniform(0, adjust)
127 | flag = random.randint(-1, 1)
128 |
129 | img_u = (255-img)*a
130 | img_d = img*a
131 |
132 | delta = np.minimum(img_u, img_d)
133 | img_new = img + flag*delta
134 |
135 | return img_new
136 |
137 |
138 | # randomly change the saturation
139 | def saturation_image(img, adjust=0.1):
140 |
141 | a_0 = random.uniform(1-adjust, 1+adjust)
142 | a_1 = random.uniform(1-adjust, 1+adjust)
143 | a_2 = random.uniform(1-adjust, 1+adjust)
144 |
145 | img_new = np.ones(img.shape) * 255
146 | img_new[:,:,0] = np.minimum(img[:,:,0] * a_0, img_new[:,:,0])
147 | img_new[:,:,1] = np.minimum(img[:,:,1] * a_1, img_new[:,:,1])
148 | img_new[:,:,2] = np.minimum(img[:,:,2] * a_2, img_new[:,:,2])
149 |
150 | return img_new
151 |
152 |
153 | # operates on a single sample: generate the grid info corresponding to the ground truth boxes; grid cells without an object are all 0
154 | def process_box(boxes, labels, img_size, class_num, anchors):
155 | '''
156 | Generate the y_true label, i.e. the ground truth feature_maps in 3 different scales.
157 | '''
158 | anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]
159 |
160 | # convert boxes form:
161 | # shape: [N, 2]
162 | # (x_center, y_center)
163 | box_centers = (boxes[:, 0:2] + boxes[:, 2:4]) / 2
164 | # (width, height)
165 | box_sizes = boxes[:, 2:4] - boxes[:, 0:2]
166 |
167 | # [13, 13, 3, 5+num_class]
168 | y_true_13 = np.zeros((img_size[1] // 32, img_size[0] // 32, 3, 5 + class_num), np.float32)
169 | y_true_26 = np.zeros((img_size[1] // 16, img_size[0] // 16, 3, 5 + class_num), np.float32)
170 | y_true_52 = np.zeros((img_size[1] // 8, img_size[0] // 8, 3, 5 + class_num), np.float32)
171 |
172 | y_true = [y_true_13, y_true_26, y_true_52]
173 |
174 | # [N, 1, 2]
175 | box_sizes = np.expand_dims(box_sizes, 1)
176 | # broadcast tricks
177 | # [N, 1, 2] & [9, 2] ==> [N, 9, 2]
178 | mins = np.maximum(- box_sizes / 2, - anchors / 2)
179 | maxs = np.minimum(box_sizes / 2, anchors / 2)
180 | # [N, 9, 2]
181 | whs = maxs - mins
182 |
183 | # [N, 9] compute the iou between each ground truth box and the 9 anchors (every anchor is centered on the box, so x,y can be ignored in the iou)
184 | iou = (whs[:, :, 0] * whs[:, :, 1]) / (box_sizes[:, :, 0] * box_sizes[:, :, 1] + anchors[:, 0] * anchors[:, 1] - whs[:, :, 0] * whs[:, :, 1] + 1e-10)
185 | # [N] find the index of the anchor with the largest iou
186 | best_match_idx = np.argmax(iou, axis=1)
187 |
188 | ratio_dict = {1.: 8., 2.: 16., 3.: 32.}
189 | for i, idx in enumerate(best_match_idx):
190 | # idx: 0,1,2 ==> feature_map_group 2; 3,4,5 ==> 1; 6,7,8 ==> 0 (pick the matching scale)
191 | feature_map_group = 2 - idx // 3
192 | # scale ratio: 0,1,2 ==> 8; 3,4,5 ==> 16; 6,7,8 ==> 32
193 | ratio = ratio_dict[np.ceil((idx + 1) / 3.)]
194 | x = int(np.floor(box_centers[i, 0] / ratio)) # the grid cell that contains the ground truth box center
195 | y = int(np.floor(box_centers[i, 1] / ratio))
196 | k = anchors_mask[feature_map_group].index(idx)
197 | c = labels[i]
198 | # print feature_map_group, '|', y,x,k,c
199 |
200 | # write the info into the corresponding grid cell; cells without an object stay 0
201 | y_true[feature_map_group][y, x, k, :2] = box_centers[i]
202 | y_true[feature_map_group][y, x, k, 2:4] = box_sizes[i]
203 | y_true[feature_map_group][y, x, k, 4] = 1.
204 | y_true[feature_map_group][y, x, k, 5+c] = 1.
205 |
206 | return y_true_13, y_true_26, y_true_52
207 |
208 | # operates on a single sample: generate the grid info corresponding to the ground truth boxes; grid cells without an object are all 0
209 | def process_box_tiny(boxes, labels, img_size, class_num, anchors):
210 | '''
211 | Generate the y_true label, i.e. the ground truth feature_maps in 2 different scales.
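Each ground truth box is matched to the single anchor with the highest
width/height IoU (best_match_idx below), and its center, size, objectness
and one-hot class are written into the grid cell of the matching scale
that contains the box center.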
212 | '''
213 | anchors_mask = [[3,4,5], [0,1,2]]
214 |
215 | # convert boxes form:
216 | # shape: [N, 2]
217 | # (x_center, y_center)
218 | box_centers = (boxes[:, 0:2] + boxes[:, 2:4]) / 2
219 | # (width, height)
220 | box_sizes = boxes[:, 2:4] - boxes[:, 0:2]
221 |
222 | # [13, 13, 3, 5+num_class]
223 | y_true_13 = np.zeros((img_size[1] // 32, img_size[0] // 32, 3, 5 + class_num), np.float32)
224 | y_true_26 = np.zeros((img_size[1] // 16, img_size[0] // 16, 3, 5 + class_num), np.float32)
225 |
226 | y_true = [y_true_13, y_true_26]
227 |
228 | # [N, 1, 2]
229 | box_sizes = np.expand_dims(box_sizes, 1)
230 | # broadcast tricks
231 | # [N, 1, 2] & [6, 2] ==> [N, 6, 2]
232 | mins = np.maximum(- box_sizes / 2, - anchors / 2)
233 | maxs = np.minimum(box_sizes / 2, anchors / 2)
234 | # [N, 6, 2]
235 | whs = maxs - mins
236 |
237 | # [N, 6] compute the iou between each ground truth box and the 6 anchors (every anchor is centered on the box, so x,y can be ignored in the iou)
238 | iou = (whs[:, :, 0] * whs[:, :, 1]) / (box_sizes[:, :, 0] * box_sizes[:, :, 1] + anchors[:, 0] * anchors[:, 1] - whs[:, :, 0] * whs[:, :, 1] + 1e-10)
239 | # [N] find the index of the anchor with the largest iou
240 | best_match_idx = np.argmax(iou, axis=1)
241 |
242 | ratio_dict = {1.: 16., 2.: 32.}
243 | for i, idx in enumerate(best_match_idx):
244 | # idx: 0,1,2 ==> feature_map_group 1; 3,4,5 ==> 0 (pick the matching scale)
245 | feature_map_group = 1 - idx // 3
246 | # scale ratio: 0,1,2 ==> 16; 3,4,5 ==> 32
247 | ratio = ratio_dict[np.ceil((idx + 1) / 3.)]
248 | x = int(np.floor(box_centers[i, 0] / ratio)) # the grid cell that contains the ground truth box center
249 | y = int(np.floor(box_centers[i, 1] / ratio))
250 | k = anchors_mask[feature_map_group].index(idx)
251 | c = labels[i]
252 | # print feature_map_group, '|', y,x,k,c
253 |
254 | # write the info into the corresponding grid cell; cells without an object stay 0
255 | y_true[feature_map_group][y, x, k, :2] = box_centers[i]
256 | y_true[feature_map_group][y, x, k, 2:4] = box_sizes[i]
257 | y_true[feature_map_group][y, x, k, 4] = 1.
258 | y_true[feature_map_group][y, x, k, 5+c] = 1.
259 |
260 | return y_true_13, y_true_26
261 |
262 |
263 | def parse_data(line, class_num, img_size, anchors, mode):
264 | '''
265 | param:
266 | line: a line from the training/test txt file
267 | class_num / img_size / anchors: the corresponding settings from the main program
268 | mode: 'train' or 'val'. When set to 'train', data_augmentation will be applied.
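return:
the image (float32, RGB, scaled to 0~1) and the y_true feature maps;
for the tiny model (6 anchors) the 26x26 map is returned twice, as a
placeholder, so the tf.data pipeline always yields 4 tensors.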
269 | '''
270 | pic_path, boxes, labels = parse_line(line)
271 |
272 | img = cv2.imread(pic_path)
273 | img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
274 |
275 | img, boxes = resize_image_and_correct_boxes(img, boxes, img_size)
276 |
277 | # do data augmentation here
278 | # note: if clipping is used, data_augmentation() must be called before resize_image_and_correct_boxes()
279 | if mode == 'train':
280 | img, boxes, labels = data_augmentation(img, boxes, labels)
281 |
282 | # the input of yolo_v3 should be in range 0~1
283 | img = img / 255
284 |
285 | if (np.shape(anchors)[0] == 9):
286 | y_true_13, y_true_26, y_true_52 = process_box(boxes, labels, img_size, class_num, anchors)
287 | return img, y_true_13, y_true_26, y_true_52
288 |
289 | elif (np.shape(anchors)[0] == 6):
290 | y_true_13, y_true_26 = process_box_tiny(boxes, labels, img_size, class_num, anchors)
291 | y_true_xx = y_true_26 # placeholder so the pipeline always yields 4 tensors
292 | return img, y_true_13, y_true_26, y_true_xx
293 |
--------------------------------------------------------------------------------
/utils/eval_utils.py:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 |
3 | from __future__ import division, print_function
4 |
5 | import numpy as np
6 | from collections import Counter
7 |
8 | from utils.nms_utils import cpu_nms, gpu_nms
9 |
10 |
11 | def calc_iou(pred_boxes, true_boxes):
12 | '''
13 | An efficient way to calculate the IoU matrix using numpy broadcast tricks.
14 | shape_info: pred_boxes: [N, 4]
15 | true_boxes: [V, 4]
16 | '''
17 |
18 | # [N, 1, 4]
19 | pred_boxes = np.expand_dims(pred_boxes, -2)
20 | # [1, V, 4]
21 | true_boxes = np.expand_dims(true_boxes, 0)
22 |
23 | # [N, 1, 2] & [1, V, 2] ==> [N, V, 2]
24 | intersect_mins = np.maximum(pred_boxes[..., :2], true_boxes[..., :2])
25 | intersect_maxs = np.minimum(pred_boxes[..., 2:], true_boxes[..., 2:])
26 | intersect_wh = np.maximum(intersect_maxs - intersect_mins, 0.)
27 | 28 | # shape: [N, V] 29 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 30 | # shape: [N, 1, 2] 31 | pred_box_wh = pred_boxes[..., 2:] - pred_boxes[..., :2] 32 | # shape: [N, 1] 33 | pred_box_area = pred_box_wh[..., 0] * pred_box_wh[..., 1] 34 | # [1, V, 2] 35 | true_boxes_wh = true_boxes[..., 2:] - true_boxes[..., :2] 36 | # [1, V] 37 | true_boxes_area = true_boxes_wh[..., 0] * true_boxes_wh[..., 1] 38 | 39 | # shape: [N, V] 40 | iou = intersect_area / (pred_box_area + true_boxes_area - intersect_area + 1e-10) 41 | 42 | return iou 43 | 44 | 45 | def evaluate_on_cpu(y_pred, y_true, num_classes, calc_now=True, score_thresh=0.5, iou_thresh=0.5): 46 | # y_pred -> [None, 13, 13, 255], 47 | # [None, 26, 26, 255], 48 | # [None, 52, 52, 255], 49 | 50 | num_images = y_true[0].shape[0] 51 | true_labels_dict = {i: 0 for i in range(num_classes)} # {class: count} 52 | pred_labels_dict = {i: 0 for i in range(num_classes)} 53 | true_positive_dict = {i: 0 for i in range(num_classes)} 54 | 55 | for i in range(num_images): 56 | true_labels_list, true_boxes_list = [], [] 57 | for j in range(len(y_true)): # three feature maps 58 | # shape: [13, 13, 3, 80] 59 | true_probs_temp = y_true[j][i][..., 5:] 60 | # shape: [13, 13, 3, 4] (x_center, y_center, w, h) 61 | true_boxes_temp = y_true[j][i][..., 0:4] 62 | 63 | # [13, 13, 3] 64 | object_mask = true_probs_temp.sum(axis=-1) > 0 65 | 66 | # [V, 3] V: Ground truth number of the current image 67 | true_probs_temp = true_probs_temp[object_mask] 68 | # [V, 4] 69 | true_boxes_temp = true_boxes_temp[object_mask] 70 | 71 | # [V], labels 72 | true_labels_list += np.argmax(true_probs_temp, axis=-1).tolist() 73 | # [V, 4] (x_center, y_center, w, h) 74 | true_boxes_list += true_boxes_temp.tolist() 75 | 76 | if len(true_labels_list) != 0: 77 | for cls, count in Counter(true_labels_list).items(): 78 | true_labels_dict[cls] += count 79 | 80 | # [V, 4] (xmin, ymin, xmax, ymax) 81 | true_boxes = np.array(true_boxes_list) 82 | box_centers, box_sizes = true_boxes[:, 0:2], true_boxes[:, 2:4] 83 | true_boxes[:, 0:2] = box_centers - box_sizes / 2. 
84 | true_boxes[:, 2:4] = true_boxes[:, 0:2] + box_sizes 85 | 86 | # [1, xxx, 4] 87 | pred_boxes = y_pred[0][i:i + 1] 88 | pred_confs = y_pred[1][i:i + 1] 89 | pred_probs = y_pred[2][i:i + 1] 90 | 91 | # pred_boxes: [N, 4] 92 | # pred_confs: [N] 93 | # pred_labels: [N] 94 | # N: Detected box number of the current image 95 | pred_boxes, pred_confs, pred_labels = cpu_nms(pred_boxes, pred_confs * pred_probs, num_classes, 96 | score_thresh=score_thresh, iou_thresh=iou_thresh) 97 | 98 | # len: N 99 | pred_labels_list = [] if pred_labels is None else pred_labels.tolist() 100 | if pred_labels_list == []: 101 | continue 102 | 103 | # calc iou 104 | # [N, V] 105 | iou_matrix = calc_iou(pred_boxes, true_boxes) 106 | # [N] 107 | max_iou_idx = np.argmax(iou_matrix, axis=-1) 108 | 109 | correct_idx = [] 110 | correct_conf = [] 111 | for k in range(max_iou_idx.shape[0]): 112 | pred_labels_dict[pred_labels_list[k]] += 1 113 | match_idx = max_iou_idx[k] # V level 114 | if iou_matrix[k, match_idx] > iou_thresh and true_labels_list[match_idx] == pred_labels_list[k]: 115 | if not match_idx in correct_idx: 116 | correct_idx.append(match_idx) 117 | correct_conf.append(pred_confs[k]) 118 | else: 119 | same_idx = correct_idx.index(match_idx) 120 | if pred_confs[k] > correct_conf[same_idx]: 121 | correct_idx.pop(same_idx) 122 | correct_conf.pop(same_idx) 123 | correct_idx.append(match_idx) 124 | correct_conf.append(pred_confs[k]) 125 | 126 | for t in correct_idx: 127 | true_positive_dict[true_labels_list[t]] += 1 128 | 129 | if calc_now: 130 | # avoid divided by 0 131 | recall = sum(true_positive_dict.values()) / (sum(true_labels_dict.values()) + 1e-6) 132 | precision = sum(true_positive_dict.values()) / (sum(pred_labels_dict.values()) + 1e-6) 133 | 134 | return recall, precision 135 | else: 136 | return true_positive_dict, true_labels_dict, pred_labels_dict 137 | 138 | 139 | def evaluate_on_gpu(sess, gpu_nms_op, pred_boxes_flag, pred_scores_flag, y_pred, y_true, num_classes, calc_now=True, score_thresh=0.5, iou_thresh=0.5): 140 | # y_pred -> [None, 13, 13, 255], 141 | # [None, 26, 26, 255], 142 | # [None, 52, 52, 255], 143 | 144 | num_images = y_true[0].shape[0] 145 | true_labels_dict = {i: 0 for i in range(num_classes)} # {class: count} 146 | pred_labels_dict = {i: 0 for i in range(num_classes)} 147 | true_positive_dict = {i: 0 for i in range(num_classes)} 148 | 149 | for i in range(num_images): 150 | true_labels_list, true_boxes_list = [], [] 151 | for j in range(len(y_true)): # 2 or 3 feature maps 152 | # shape: [13, 13, 3, 80] 153 | true_probs_temp = y_true[j][i][..., 5:] 154 | # shape: [13, 13, 3, 4] (x_center, y_center, w, h) 155 | true_boxes_temp = y_true[j][i][..., 0:4] 156 | 157 | # [13, 13, 3] 158 | object_mask = true_probs_temp.sum(axis=-1) > 0 159 | 160 | # [V, 3] V: Ground truth number of the current image 161 | true_probs_temp = true_probs_temp[object_mask] 162 | # [V, 4] 163 | true_boxes_temp = true_boxes_temp[object_mask] 164 | 165 | # [V], labels 166 | true_labels_list += np.argmax(true_probs_temp, axis=-1).tolist() 167 | # [V, 4] (x_center, y_center, w, h) 168 | true_boxes_list += true_boxes_temp.tolist() 169 | 170 | if len(true_labels_list) != 0: 171 | for cls, count in Counter(true_labels_list).items(): 172 | true_labels_dict[cls] += count 173 | 174 | # [V, 4] (xmin, ymin, xmax, ymax) 175 | true_boxes = np.array(true_boxes_list) 176 | box_centers, box_sizes = true_boxes[:, 0:2], true_boxes[:, 2:4] 177 | true_boxes[:, 0:2] = box_centers - box_sizes / 2. 
178 | true_boxes[:, 2:4] = true_boxes[:, 0:2] + box_sizes 179 | 180 | # [1, xxx, 4] 181 | pred_boxes = y_pred[0][i:i + 1] 182 | pred_confs = y_pred[1][i:i + 1] 183 | pred_probs = y_pred[2][i:i + 1] 184 | 185 | # pred_boxes: [N, 4] 186 | # pred_confs: [N] 187 | # pred_labels: [N] 188 | # N: Detected box number of the current image 189 | pred_boxes, pred_confs, pred_labels = sess.run(gpu_nms_op, 190 | feed_dict={pred_boxes_flag: pred_boxes, 191 | pred_scores_flag: pred_confs * pred_probs}) 192 | # len: N 193 | pred_labels_list = [] if pred_labels is None else pred_labels.tolist() 194 | if pred_labels_list == []: 195 | continue 196 | 197 | # calc iou 198 | # [N, V] 199 | iou_matrix = calc_iou(pred_boxes, true_boxes) 200 | # [N] 201 | max_iou_idx = np.argmax(iou_matrix, axis=-1) 202 | 203 | correct_idx = [] 204 | correct_conf = [] 205 | for k in range(max_iou_idx.shape[0]): 206 | pred_labels_dict[pred_labels_list[k]] += 1 207 | match_idx = max_iou_idx[k] # V level 208 | if iou_matrix[k, match_idx] > iou_thresh and true_labels_list[match_idx] == pred_labels_list[k]: 209 | if not match_idx in correct_idx: 210 | correct_idx.append(match_idx) 211 | correct_conf.append(pred_confs[k]) 212 | else: 213 | same_idx = correct_idx.index(match_idx) 214 | if pred_confs[k] > correct_conf[same_idx]: 215 | correct_idx.pop(same_idx) 216 | correct_conf.pop(same_idx) 217 | correct_idx.append(match_idx) 218 | correct_conf.append(pred_confs[k]) 219 | 220 | for t in correct_idx: 221 | true_positive_dict[true_labels_list[t]] += 1 222 | 223 | if calc_now: 224 | # avoid divided by 0 225 | recall = sum(true_positive_dict.values()) / (sum(true_labels_dict.values()) + 1e-6) 226 | precision = sum(true_positive_dict.values()) / (sum(pred_labels_dict.values()) + 1e-6) 227 | 228 | return recall, precision 229 | else: 230 | return true_positive_dict, true_labels_dict, pred_labels_dict -------------------------------------------------------------------------------- /utils/layer_utils.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | from __future__ import division, print_function 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | slim = tf.contrib.slim 8 | 9 | def conv2d(inputs, filters, kernel_size, strides=1): 10 | def _fixed_padding(inputs, kernel_size): 11 | pad_total = kernel_size - 1 12 | pad_beg = pad_total // 2 13 | pad_end = pad_total - pad_beg 14 | 15 | padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end], 16 | [pad_beg, pad_end], [0, 0]], mode='CONSTANT') 17 | return padded_inputs 18 | if strides > 1: 19 | inputs = _fixed_padding(inputs, kernel_size) 20 | inputs = slim.conv2d(inputs, filters, kernel_size, stride=strides, 21 | padding=('SAME' if strides == 1 else 'VALID')) 22 | return inputs 23 | 24 | def darknet53_body(inputs): 25 | 26 | def res_block(inputs, filters): 27 | shortcut = inputs 28 | net = conv2d(inputs, filters * 1, 1) 29 | net = conv2d(net, filters * 2, 3) 30 | 31 | net = net + shortcut 32 | 33 | return net 34 | 35 | # first two conv2d layers 36 | net = conv2d(inputs, 32, 3, strides=1) 37 | net = conv2d(net, 64, 3, strides=2) 38 | 39 | # res_block * 1 40 | net = res_block(net, 32) 41 | 42 | net = conv2d(net, 128, 3, strides=2) 43 | 44 | # res_block * 2 45 | for i in range(2): 46 | net = res_block(net, 64) 47 | 48 | net = conv2d(net, 256, 3, strides=2) 49 | 50 | # res_block * 8 51 | for i in range(8): 52 | net = res_block(net, 128) 53 | 54 | route_1 = net 55 | net = conv2d(net, 512, 3, strides=2) 56 | 57 | # res_block * 8 58 | 
for i in range(8): 59 | net = res_block(net, 256) 60 | 61 | route_2 = net 62 | net = conv2d(net, 1024, 3, strides=2) 63 | 64 | # res_block * 4 65 | for i in range(4): 66 | net = res_block(net, 512) 67 | route_3 = net 68 | 69 | return route_1, route_2, route_3 70 | 71 | 72 | def darknet19_body(inputs): 73 | 74 | net = conv2d(inputs, 16, 3, strides=1) 75 | net = slim.max_pool2d(net, [2,2], stride=2, padding='SAME') 76 | # 208 x 208 77 | 78 | net = conv2d(net, 32, 3, strides=1) 79 | net = slim.max_pool2d(net, [2,2], stride=2, padding='SAME') 80 | # 104 x 104 81 | 82 | net = conv2d(net, 64, 3, strides=1) 83 | net = slim.max_pool2d(net, [2,2], stride=2, padding='SAME') 84 | route_1f = net 85 | # 52 x 52 86 | 87 | net = conv2d(net, 128, 3, strides=1) 88 | net = slim.max_pool2d(net, [2,2], stride=2, padding='SAME') 89 | route_1 = net 90 | # 26 x 26 91 | 92 | net = conv2d(net, 256, 3, strides=1) 93 | route_2f = net 94 | net = slim.max_pool2d(net, [2,2], stride=2, padding='SAME') 95 | # 13 x 13 96 | 97 | net = conv2d(net, 512, 3, strides=1) 98 | net = slim.max_pool2d(net, [2,2], stride=1, padding='SAME') 99 | # 13 x 13 100 | 101 | net = conv2d(net, 1024, 3, strides=1) 102 | route_2 = net 103 | # 13 x 13 104 | 105 | return route_1, route_1f, route_2, route_2f 106 | 107 | 108 | def yolo_block(inputs, filters): 109 | net = conv2d(inputs, filters * 1, 1) 110 | net = conv2d(net, filters * 2, 3) 111 | net = conv2d(net, filters * 1, 1) 112 | net = conv2d(net, filters * 2, 3) 113 | net = conv2d(net, filters * 1, 1) 114 | route = net 115 | net = conv2d(net, filters * 2, 3) 116 | return route, net 117 | 118 | def yolo_tiny_block(inputs, filters): 119 | net = conv2d(inputs, filters * 1, 1) 120 | route = net 121 | net = conv2d(net, filters * 2, 3) 122 | return route, net 123 | 124 | 125 | def upsample_layer(inputs, out_shape): 126 | new_height, new_width = out_shape[1], out_shape[2] 127 | # NOTE: here height is the first 128 | inputs = tf.image.resize_nearest_neighbor(inputs, (new_height, new_width), align_corners=True, name='upsampled') 129 | return inputs 130 | 131 | def PAM_layer(net): 132 | 133 | N, H, W, C = np.shape(net) 134 | 135 | net_b = slim.conv2d(net, C, 1, stride=1) 136 | net_c = slim.conv2d(net, C, 1, stride=1) 137 | net_d = slim.conv2d(net, C, 1, stride=1) 138 | 139 | net_b = tf.reshape(net_b, [-1, H*W, C]) 140 | net_c = tf.reshape(net_c, [-1, H*W, C]) 141 | net_c = tf.transpose(net_c, [0,2,1]) 142 | net_d = tf.reshape(net_d, [-1, H*W, C]) 143 | 144 | net_bc = tf.matmul(net_b,net_c) 145 | net_bcd = tf.matmul(net_bc,net_d) 146 | 147 | net_bcd = tf.reshape(net_bcd, [-1, H, W, C]) 148 | 149 | net = net_bcd * 0.2 + net 150 | 151 | return net 152 | 153 | 154 | -------------------------------------------------------------------------------- /utils/misc_utils.py: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | 3 | import numpy as np 4 | import tensorflow as tf 5 | import random 6 | 7 | from tensorflow.core.framework import summary_pb2 8 | 9 | 10 | def make_summary(name, val): 11 | return summary_pb2.Summary(value=[summary_pb2.Summary.Value(tag=name, simple_value=val)]) 12 | 13 | 14 | def parse_anchors(anchor_path): 15 | ''' 16 | parse anchors. 
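The anchor file is a single line of comma-separated width,height values;
for example, the standard yolov3-tiny anchors are commonly given as:
10,14, 23,27, 37,58, 81,82, 135,169, 344,319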
17 | returned data: shape [N, 2], dtype float32 18 | ''' 19 | anchors = np.reshape(np.asarray(open(anchor_path, 'r').read().split(','), np.float32), [-1, 2]) 20 | return anchors 21 | 22 | 23 | def read_class_names(class_name_path): 24 | names = {} 25 | with open(class_name_path, 'r') as data: 26 | for ID, name in enumerate(data): 27 | names[ID] = name.strip('\n') 28 | return names 29 | 30 | 31 | def shuffle_and_overwrite(file_name): 32 | content = open(file_name, 'r').readlines() 33 | random.shuffle(content) 34 | with open(file_name, 'w') as f: 35 | for line in content: 36 | f.write(line) 37 | 38 | 39 | def update_dict(ori_dict, new_dict): 40 | if not ori_dict: 41 | return new_dict 42 | for key in ori_dict: 43 | ori_dict[key] += new_dict[key] 44 | return ori_dict 45 | 46 | 47 | def list_add(ori_list, new_list): 48 | for i in range(len(ori_list)): 49 | ori_list[i] += new_list[i] 50 | return ori_list 51 | 52 | 53 | def load_weights(var_list, weights_file): 54 | """ 55 | Loads and converts pre-trained weights. 56 | param: 57 | var_list: list of network variables. 58 | weights_file: name of the binary file. 59 | """ 60 | with open(weights_file, "rb") as fp: 61 | np.fromfile(fp, dtype=np.int32, count=5) 62 | weights = np.fromfile(fp, dtype=np.float32) 63 | 64 | ptr = 0 65 | i = 0 66 | assign_ops = [] 67 | while i < len(var_list) - 1: 68 | var1 = var_list[i] 69 | var2 = var_list[i + 1] 70 | # do something only if we process conv layer 71 | if 'Conv' in var1.name.split('/')[-2]: 72 | # check type of next layer 73 | if 'BatchNorm' in var2.name.split('/')[-2]: 74 | # load batch norm params 75 | gamma, beta, mean, var = var_list[i + 1:i + 5] 76 | batch_norm_vars = [beta, gamma, mean, var] 77 | for var in batch_norm_vars: 78 | shape = var.shape.as_list() 79 | num_params = np.prod(shape) 80 | var_weights = weights[ptr:ptr + num_params].reshape(shape) 81 | ptr += num_params 82 | assign_ops.append(tf.assign(var, var_weights, validate_shape=True)) 83 | # we move the pointer by 4, because we loaded 4 variables 84 | i += 4 85 | elif 'Conv' in var2.name.split('/')[-2]: 86 | # load biases 87 | bias = var2 88 | bias_shape = bias.shape.as_list() 89 | bias_params = np.prod(bias_shape) 90 | bias_weights = weights[ptr:ptr + 91 | bias_params].reshape(bias_shape) 92 | ptr += bias_params 93 | assign_ops.append(tf.assign(bias, bias_weights, validate_shape=True)) 94 | # we loaded 1 variable 95 | i += 1 96 | # we can load weights of conv layer 97 | shape = var1.shape.as_list() 98 | num_params = np.prod(shape) 99 | 100 | var_weights = weights[ptr:ptr + num_params].reshape( 101 | (shape[3], shape[2], shape[0], shape[1])) 102 | # remember to transpose to column-major 103 | var_weights = np.transpose(var_weights, (2, 3, 1, 0)) 104 | ptr += num_params 105 | assign_ops.append( 106 | tf.assign(var1, var_weights, validate_shape=True)) 107 | i += 1 108 | 109 | return assign_ops 110 | 111 | 112 | def config_learning_rate(args, global_step): 113 | if args.lr_type == 'exponential': 114 | lr_tmp = tf.train.exponential_decay(args.learning_rate_init, global_step, args.lr_decay_freq, 115 | args.lr_decay_factor, staircase=True, name='exponential_learning_rate') 116 | return tf.maximum(lr_tmp, args.lr_lower_bound) 117 | elif args.lr_type == 'fixed': 118 | return tf.convert_to_tensor(args.learning_rate_init, name='fixed_learning_rate') 119 | else: 120 | raise ValueError('Unsupported learning rate type!') 121 | 122 | 123 | def config_optimizer(optimizer_name, learning_rate, decay=0.9, momentum=0.9): 124 | if optimizer_name == 'momentum': 
--------------------------------------------------------------------------------
/utils/nms_utils.py:
--------------------------------------------------------------------------------
# coding: utf-8

from __future__ import division, print_function

import numpy as np
import tensorflow as tf


def gpu_nms(boxes, scores, num_classes, max_boxes=50, score_thresh=0.5, iou_thresh=0.5):
    """
    Perform NMS on GPU using TensorFlow.

    params:
        boxes: tensor of shape [1, 10647, 4]  # 10647 = (13*13 + 26*26 + 52*52) * 3 for a 416*416 input
        scores: tensor of shape [1, 10647, num_classes], score = conf * prob
        num_classes: total number of classes
        max_boxes: integer, maximum number of predicted boxes you'd like, default is 50
        score_thresh: boxes whose highest class score falls below this threshold are discarded
        iou_thresh: real value, "intersection over union" threshold used for NMS filtering
    """

    boxes_list, label_list, score_list = [], [], []
    max_boxes = tf.constant(max_boxes, dtype='int32')

    # since we run NMS on a single image, flatten the batch dimension
    boxes = tf.reshape(boxes, [-1, 4])  # '-1' means we don't know the exact number of boxes
    score = tf.reshape(scores, [-1, num_classes])

    # Step 1: create a filtering mask from the class scores and the score threshold
    mask = tf.greater_equal(score, tf.constant(score_thresh))
    # Step 2: for each class, pick out the boxes that pass the mask and run NMS on them
    for i in range(num_classes):
        filter_boxes = tf.boolean_mask(boxes, mask[:, i])
        filter_score = tf.boolean_mask(score[:, i], mask[:, i])
        nms_indices = tf.image.non_max_suppression(boxes=filter_boxes,
                                                   scores=filter_score,
                                                   max_output_size=max_boxes,
                                                   iou_threshold=iou_thresh, name='nms_indices')
        label_list.append(tf.ones_like(tf.gather(filter_score, nms_indices), 'int32') * i)
        boxes_list.append(tf.gather(filter_boxes, nms_indices))
        score_list.append(tf.gather(filter_score, nms_indices))

    boxes = tf.concat(boxes_list, axis=0)
    score = tf.concat(score_list, axis=0)
    label = tf.concat(label_list, axis=0)

    return boxes, score, label
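As a quick sanity check, `gpu_nms` can be exercised on random data. This is a hypothetical smoke test, assuming TF 1.x and this repo's root on the Python path; shapes and thresholds are made up.

```python
# Hypothetical smoke test for gpu_nms: 3 classes, 100 random boxes, batch of one.
import numpy as np
import tensorflow as tf
from utils.nms_utils import gpu_nms

boxes_ph = tf.placeholder(tf.float32, [1, None, 4])
scores_ph = tf.placeholder(tf.float32, [1, None, 3])
boxes, scores, labels = gpu_nms(boxes_ph, scores_ph, num_classes=3,
                                max_boxes=10, score_thresh=0.4, iou_thresh=0.5)

with tf.Session() as sess:
    boxes_, scores_, labels_ = sess.run(
        [boxes, scores, labels],
        feed_dict={boxes_ph: np.random.rand(1, 100, 4) * 416,
                   scores_ph: np.random.rand(1, 100, 3)})
    print(boxes_.shape, labels_)  # at most max_boxes survivors per class
```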
def py_nms(boxes, scores, max_boxes=50, iou_thresh=0.5):
    """
    Pure Python NMS baseline.

    Arguments:
        boxes: shape [-1, 4] ('-1' means the number of boxes is not known in advance)
        scores: shape [-1,]
        max_boxes: the maximum number of boxes to be selected by non_max_suppression
        iou_thresh: the IoU threshold used to decide which overlapping boxes to drop
    """
    assert boxes.shape[1] == 4 and len(scores.shape) == 1

    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    areas = (x2 - x1) * (y2 - y1)
    # process boxes in descending order of score
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the current box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        # use float coordinates, consistent with the `areas` computation above
        # (the `+ 1` in the original stems from the integer-pixel convention
        # and made the IoU here inconsistent with `areas`)
        w = np.maximum(0.0, xx2 - xx1)
        h = np.maximum(0.0, yy2 - yy1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # keep only the boxes that overlap the current box by at most iou_thresh
        inds = np.where(ovr <= iou_thresh)[0]
        order = order[inds + 1]

    return keep[:max_boxes]


def cpu_nms(boxes, scores, num_classes, max_boxes=50, score_thresh=0.5, iou_thresh=0.5):
    """
    Perform NMS on CPU.
    Arguments:
        boxes: shape [1, 10647, 4]
        scores: shape [1, 10647, num_classes]
    """

    boxes = boxes.reshape(-1, 4)
    scores = scores.reshape(-1, num_classes)
    # picked bounding boxes
    picked_boxes, picked_score, picked_label = [], [], []

    for i in range(num_classes):
        # keep only the boxes whose score for class i passes the threshold
        indices = np.where(scores[:, i] >= score_thresh)
        filter_boxes = boxes[indices]
        filter_scores = scores[:, i][indices]
        if len(filter_boxes) == 0:
            continue
        # do non_max_suppression on the CPU
        indices = py_nms(filter_boxes, filter_scores,
                         max_boxes=max_boxes, iou_thresh=iou_thresh)
        picked_boxes.append(filter_boxes[indices])
        picked_score.append(filter_scores[indices])
        picked_label.append(np.ones(len(indices), dtype='int32') * i)
    if len(picked_boxes) == 0:
        return None, None, None

    boxes = np.concatenate(picked_boxes, axis=0)
    score = np.concatenate(picked_score, axis=0)
    label = np.concatenate(picked_label, axis=0)

    return boxes, score, label
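To see the per-class behavior concretely, here is a small toy run of `cpu_nms`; all values are made up for illustration. The two heavily overlapping class-0 boxes collapse to the higher-scoring one, while the class-1 box survives on its own.

```python
# Toy cpu_nms example with two classes and three hand-made boxes.
import numpy as np
from utils.nms_utils import cpu_nms  # assumes this repo's root is on the path

boxes = np.array([[[10, 10, 50, 50],        # class 0, score 0.9
                   [12, 12, 52, 52],        # class 0, score 0.8, overlaps the first box
                   [100, 100, 160, 160]]],  # class 1, score 0.7
                 dtype=np.float32)          # shape [1, 3, 4]
scores = np.array([[[0.9, 0.1],
                    [0.8, 0.2],
                    [0.05, 0.7]]], dtype=np.float32)  # shape [1, 3, 2]

out_boxes, out_scores, out_labels = cpu_nms(boxes, scores, num_classes=2)
print(out_labels)  # [0 1]: one box survives per class
```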
--------------------------------------------------------------------------------
/utils/plot_utils.py:
--------------------------------------------------------------------------------
# coding: utf-8

from __future__ import division, print_function

import cv2
import random


def get_color_table(class_num, seed=2):
    # fixed seed so every run assigns the same color to the same class
    random.seed(seed)
    color_table = {}
    for i in range(class_num):
        color_table[i] = [random.randint(0, 255) for _ in range(3)]
    return color_table


def plot_one_box(img, coord, label=None, color=None, line_thickness=None):
    '''
    coord: [x_min, y_min, x_max, y_max] format coordinates.
    img: img to plot on.
    label: str. The label name.
    color: list of 3 ints. The BGR color value.
    line_thickness: int. rectangle line thickness.
    '''
    tl = line_thickness or int(round(0.002 * max(img.shape[0:2])))  # line thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(coord[0]), int(coord[1])), (int(coord[2]), int(coord[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=float(tl) / 3, thickness=tf)[0]
        # draw a filled rectangle behind the label text, then the text itself
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1)  # filled
        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, float(tl) / 3, [0, 0, 0], thickness=tf, lineType=cv2.LINE_AA)
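A quick, self-contained way to try `plot_one_box` is to draw on a blank canvas; the output file name, box coordinates, and label below are made up.

```python
# Draw one labeled box on a white 416x416 canvas and save it (illustrative only).
import cv2
import numpy as np
from utils.plot_utils import get_color_table, plot_one_box

img = np.full((416, 416, 3), 255, dtype=np.uint8)   # white canvas
color_table = get_color_table(class_num=2)
plot_one_box(img, [60, 80, 220, 300], label='car 0.92', color=color_table[0])
cv2.imwrite('plot_demo.jpg', img)
```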
--------------------------------------------------------------------------------
/video_test.py:
--------------------------------------------------------------------------------
# coding: utf-8

from __future__ import division, print_function

import tensorflow as tf
import numpy as np
import argparse
import cv2
import time

from utils.misc_utils import parse_anchors, read_class_names
from utils.nms_utils import gpu_nms
from utils.plot_utils import get_color_table, plot_one_box

from model import yolov3

parser = argparse.ArgumentParser(description="YOLO-V3 video test procedure.")
parser.add_argument("input_video", type=str,
                    help="The path of the input video.")
parser.add_argument("--anchor_path", type=str, default="./data/yolo_anchors.txt",
                    help="The path of the anchor txt file.")
parser.add_argument("--new_size", nargs='*', type=int, default=[416, 416],
                    help="Resize the input image with `new_size`, size format: [width, height]")
parser.add_argument("--class_name_path", type=str, default="./data/coco.names",
                    help="The path of the class names.")
parser.add_argument("--restore_path", type=str, default="./data/darknet_weights/yolov3.ckpt",
                    help="The path of the weights to restore.")
parser.add_argument("--save_video", type=lambda x: (str(x).lower() == 'true'), default=False,
                    help="Whether to save the video detection results.")
args = parser.parse_args()

args.anchors = parse_anchors(args.anchor_path)
args.classes = read_class_names(args.class_name_path)
args.num_class = len(args.classes)

color_table = get_color_table(args.num_class)

vid = cv2.VideoCapture(args.input_video)
# use the named property constants instead of their raw integer indices
video_frame_cnt = int(vid.get(cv2.CAP_PROP_FRAME_COUNT))
video_width = int(vid.get(cv2.CAP_PROP_FRAME_WIDTH))
video_height = int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT))
video_fps = int(vid.get(cv2.CAP_PROP_FPS))

if args.save_video:
    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
    videoWriter = cv2.VideoWriter('video_result.mp4', fourcc, video_fps, (video_width, video_height))

with tf.Session() as sess:
    input_data = tf.placeholder(tf.float32, [1, args.new_size[1], args.new_size[0], 3], name='input_data')
    yolo_model = yolov3(args.num_class, args.anchors)
    with tf.variable_scope('yolov3'):
        pred_feature_maps = yolo_model.forward(input_data, False)
    pred_boxes, pred_confs, pred_probs = yolo_model.predict(pred_feature_maps)

    pred_scores = pred_confs * pred_probs

    boxes, scores, labels = gpu_nms(pred_boxes, pred_scores, args.num_class, max_boxes=30, score_thresh=0.5, iou_thresh=0.5)

    saver = tf.train.Saver()
    saver.restore(sess, args.restore_path)

    for i in range(video_frame_cnt):
        ret, img_ori = vid.read()
        if not ret:  # stop early if the reported frame count was inaccurate
            break

        height_ori, width_ori = img_ori.shape[:2]
        img = cv2.resize(img_ori, tuple(args.new_size))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = np.asarray(img, np.float32)
        img = img[np.newaxis, :] / 255.

        start_time = time.time()
        boxes_, scores_, labels_ = sess.run([boxes, scores, labels], feed_dict={input_data: img})
        end_time = time.time()

        # rescale the coordinates from network-input space back to the original frame
        boxes_[:, 0] *= (width_ori / float(args.new_size[0]))
        boxes_[:, 2] *= (width_ori / float(args.new_size[0]))
        boxes_[:, 1] *= (height_ori / float(args.new_size[1]))
        boxes_[:, 3] *= (height_ori / float(args.new_size[1]))

        for j in range(len(boxes_)):  # use a separate index to avoid shadowing the frame counter
            x0, y0, x1, y1 = boxes_[j]
            plot_one_box(img_ori, [x0, y0, x1, y1], label=args.classes[labels_[j]], color=color_table[labels_[j]])
        cv2.putText(img_ori, '{:.2f}ms'.format((end_time - start_time) * 1000), (40, 40), 0,
                    fontScale=1, color=(0, 255, 0), thickness=2)
        cv2.imshow('image', img_ori)
        if args.save_video:
            videoWriter.write(img_ori)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

vid.release()
if args.save_video:
    videoWriter.release()
--------------------------------------------------------------------------------
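The video demo is invoked like the single-image demos; the video path below is hypothetical:

```shell
python video_test.py ./data/demo_data/video.mp4 --save_video true
```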