├── LICENSE
├── README.md
├── main.py
├── session_params
│   └── README.md
├── ssd300.py
├── ssd300_resnet.py
└── train_datasets
    ├── README.md
    ├── voc2007
    │   └── README.md
    └── voc2012
        └── README.md

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.
      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# SSD_for_Tensorflow

A TensorFlow implementation of the Single Shot MultiBox Detector (SSD) object detection algorithm.<br>
The original paper is here.<br>
<br>
There are quite a few TensorFlow implementations of SSD available online, but most are coded in an unnecessarily complicated way.<br>
After reading several of them, I felt a strong urge to write one that is as simple and direct as possible: partly to understand SSD's internals better, and partly to give beginners a simple reference implementation to start from.<br>
<br>
Code layout:<br>
ssd300.py - the core SSD implementation, for 300 * 300 input images.<br>
main.py - usage examples for ssd300, covering both the training and the detection calls. Training is described against the VOC2012 dataset (note that the current main.py reads from train_datasets/voc2007); the data can be downloaded here and unpacked into the \train_datasets\voc2012 directory.<br>
<br>
For the sake of simplicity, these are the only 2 files.<br>
<br>
Differences from the original paper (see the sketch after this list):<br>
1. The paper describes a box location as [center_X, center_Y, width, height]; for better compatibility and readability, this implementation uses [top_X, top_Y, width, height] throughout.<br>
<br>
2. The paper's default box sizing width=scale*sqrt(aspect_ratio), height=scale/sqrt(aspect_ratio) is wrong; it is changed to width=sqrt(scale * aspect_ratio), height=sqrt(scale / aspect_ratio), so that width*height equals scale (scale acting as an area ratio) while width/height equals aspect_ratio. Interested readers can verify this by multiplying and dividing the two expressions.<br>
<br>
3. As described in the paper, the extra box for aspect ratio = 1 uses scale=sqrt(scale0 * scale1), i.e. sqrt(1.0 * 2.0)=1.414, which is too close to scale4=1.5 to keep the default boxes distinguishable; it is therefore changed to the midpoint of scale0 and scale4: (scale0+scale4)/2=(1.0+1.5)/2=1.25.<br>
<br>
4. The paper generates default_box_scale from the formula s_k=s_min+(s_max-s_min) * (k-1)/(m-1); the source uses np.linspace to generate the same arithmetic progression, with identical results.<br>
<br>
5. The box scale range is changed from [ 0.2 , 0.9 ] to [ 0.1 , 0.9 ], because a minimum box area of 0.2 makes small objects hard to detect.<br>
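To make differences 2 and 4 concrete, here is a minimal sketch (not part of the repository) that reproduces the default-box computations, using the constants actually found in ssd300.py (min_box_scale = 0.05, max_box_scale = 0.9, 6 scales):

```python
import numpy as np

min_box_scale = 0.05   # values from ssd300.py; the 0.1 mentioned above differs slightly
max_box_scale = 0.9
num_scales = 6         # np.amax(default_box_size), one scale per feature layer

# Paper formula: s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), k = 1..m
paper_scales = [min_box_scale + (max_box_scale - min_box_scale) * k / (num_scales - 1)
                for k in range(num_scales)]
# Equivalent arithmetic progression via np.linspace (difference 4)
linspace_scales = np.linspace(min_box_scale, max_box_scale, num=num_scales)
assert np.allclose(paper_scales, linspace_scales)

# Modified box shape (difference 2): width*height == scale, width/height == ratio
scale = linspace_scales[0]
for ratio in [1.0, 1.25, 2.0, 3.0]:
    width = np.sqrt(scale * ratio)
    height = np.sqrt(scale / ratio)
    print(ratio, width * height, width / height)  # area stays equal to scale
```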

Minimal usage examples (the return values below match the signatures of run() in ssd300.py)<br>
1. Detection<br>
    ssd_model = ssd300.SSD300(tf_sess=sess, isTraining=False)<br>
    pred_class, pred_class_val, pred_location = ssd_model.run(input_img, None)<br>
2. Training<br>
    ssd_model = ssd300.SSD300(tf_sess=sess, isTraining=True)<br>
    loss_all, loss_class, loss_location, pred_class, pred_location = ssd_model.run(train_data, actual_data)<br>
<br>
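A slightly fuller detection sketch, distilled from testing() in main.py; it assumes a TensorFlow 1.x environment, a checkpoint already saved under ./session_params/, and input_img as a list of 300x300 images preprocessed the same way as in main.py:

```python
import tensorflow as tf
import ssd300

# input_img: list/array of 300x300x3 images, mean-subtracted as in main.py (assumption)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.9)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    ssd_model = ssd300.SSD300(sess, False)  # isTraining=False selects the detection branch
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver(var_list=tf.trainable_variables())
    saver.restore(sess, './session_params/session.ckpt')
    # the detection branch returns three parallel per-image lists
    pred_class, pred_class_val, pred_location = ssd_model.run(input_img, None)
```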
[The overall framework is complete and can be used for reference and study. Training has not been finished yet, so problems may remain; if you find one, please tell me: jasonli8848@qq.com]<br>
<br>
[Notes]<br>
1. [Experiments show that top_x, top_y is not well suited to convolutional regression and lowers accuracy; it should be changed to center_x, center_y (see the conversion sketch below)];<br>
2. [The VGG base network in this source is not ideal; it would be better to switch to ResNet + Inception v2];<br>
3. [The default boxes should be configured for the concrete application, to avoid wasting resources and hurting accuracy];<br>
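The encoding change in note 1 is a two-line transform either way. A minimal sketch (hypothetical helper names, not part of the repository):

```python
def topleft_to_center(box):
    # [top_x, top_y, width, height] -> [center_x, center_y, width, height]
    top_x, top_y, w, h = box
    return [top_x + w / 2.0, top_y + h / 2.0, w, h]

def center_to_topleft(box):
    # [center_x, center_y, width, height] -> [top_x, top_y, width, height]
    center_x, center_y, w, h = box
    return [center_x - w / 2.0, center_y - h / 2.0, w, h]
```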
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------

"""
date: 2017/11/10
author: lslcode [jasonli8848@qq.com]
"""

import os
import gc
import xml.etree.ElementTree as etxml
import random
import skimage.io
import skimage.transform
import numpy as np
import tensorflow as tf
import ssd300
import time

'''
SSD detection
'''
def testing():
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.9)
    with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
        ssd_model = ssd300.SSD300(sess, False)
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver(var_list=tf.trainable_variables())
        if os.path.exists('./session_params/session.ckpt.index'):
            saver.restore(sess, './session_params/session.ckpt')
            image, actual, file_list = get_traindata_voc2007(1)
            pred_class, pred_class_val, pred_location = ssd_model.run(image, None)
            print('file_list:' + str(file_list))

            for index, act in zip(range(len(image)), actual):
                for a in act:
                    print('[img-' + str(index) + ' actual]:' + str(a))
                print('pred_class:' + str(pred_class[index]))
                print('pred_class_val:' + str(pred_class_val[index]))
                print('pred_location:' + str(pred_location[index]))

        else:
            print('No Data Exists!')
        sess.close()

'''
SSD training
'''
def training():
    batch_size = 15
    running_count = 0

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.9)
    with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
        ssd_model = ssd300.SSD300(sess, True)
        sess.run(tf.global_variables_initializer())
        saver = tf.train.Saver(var_list=tf.trainable_variables())
        if os.path.exists('./session_params/session.ckpt.index'):
            print('\nStart Restore')
            saver.restore(sess, './session_params/session.ckpt')
            print('\nEnd Restore')

        print('\nStart Training')
        min_loss_location = 100000.
        min_loss_class = 100000.
        while (min_loss_location + min_loss_class) > 0.001 and running_count < 100000:
            running_count += 1

            train_data, actual_data, _ = get_traindata_voc2007(batch_size)
            if len(train_data) > 0:
                loss_all, loss_class, loss_location, pred_class, pred_location = ssd_model.run(train_data, actual_data)
                l = np.sum(loss_location)
                c = np.sum(loss_class)
                if min_loss_location > l:
                    min_loss_location = l
                if min_loss_class > c:
                    min_loss_class = c

                print('Running:[' + str(running_count) + '] | Loss All:[' + str(min_loss_location + min_loss_class) + '|' + str(loss_all) + '] | Location:[' + str(np.sum(loss_location)) + '] | Class:[' + str(np.sum(loss_class)) + '] | pred_class:[' + str(np.sum(pred_class)) + '|' + str(np.amax(pred_class)) + '|' + str(np.min(pred_class)) + '] | pred_location:[' + str(np.sum(pred_location)) + '|' + str(np.amax(pred_location)) + '|' + str(np.min(pred_location)) + ']')

                # save the checkpoint periodically
                if running_count % 100 == 0:
                    saver.save(sess, './session_params/session.ckpt')
                    print('session.ckpt has been saved.')
                    gc.collect()
            else:
                print('No Data Exists!')
                break

        saver.save(sess, './session_params/session.ckpt')
        sess.close()
        gc.collect()

    print('End Training')

'''
Fetch VOC2007 training data
train_data: batch of training images, format [None, width, height, 3]
actual_data: image annotations, format [None, [None, center_x, center_y, width, height, label]]
'''
file_name_list = os.listdir('./train_datasets/voc2007/JPEGImages/')
label_arr = ['background','aeroplane','bicycle','bird','boat','bottle','bus','car','cat','chair','cow','diningtable','dog','horse','motorbike','person','pottedplant','sheep','sofa','train','tvmonitor']
# per-channel mean for image whitening, format: [R, G, B]
whitened_RGB_mean = [123.68, 116.78, 103.94]
def get_traindata_voc2007(batch_size):
    def get_actual_data_from_xml(xml_path):
        actual_item = []
        try:
            annotation_node = etxml.parse(xml_path).getroot()
            img_width = float(annotation_node.find('size').find('width').text.strip())
            img_height = float(annotation_node.find('size').find('height').text.strip())
            object_node_list = annotation_node.findall('object')
            for obj_node in object_node_list:
                label = label_arr.index(obj_node.find('name').text.strip())
                bndbox = obj_node.find('bndbox')
                x_min = float(bndbox.find('xmin').text.strip())
                y_min = float(bndbox.find('ymin').text.strip())
                x_max = float(bndbox.find('xmax').text.strip())
                y_max = float(bndbox.find('ymax').text.strip())
                # box coordinates are stored as ratios of the image size, format [center_x, center_y, width, height, label]
                actual_item.append([((x_min + x_max) / 2 / img_width), ((y_min + y_max) / 2 / img_height), ((x_max - x_min) / img_width), ((y_max - y_min) / img_height), label])
            return actual_item
        except:
            return None

    train_data = []
    actual_data = []

    file_list = random.sample(file_name_list, batch_size)

    for f_name in file_list:
        img_path = './train_datasets/voc2007/JPEGImages/' + f_name
        xml_path = './train_datasets/voc2007/Annotations/' + f_name.replace('.jpg', '.xml')
        if os.path.splitext(img_path)[1].lower() == '.jpg':
            actual_item = get_actual_data_from_xml(xml_path)
            if actual_item is not None:
                actual_data.append(actual_item)
            else:
                print('Error : ' + xml_path)
                continue
            img = skimage.io.imread(img_path)
            img = skimage.transform.resize(img, (300, 300))
            # image whitening: skimage returns floats in [0, 1], so rescale to [0, 255] before subtracting the per-channel mean
            img = img * 255.0 - whitened_RGB_mean
            train_data.append(img)

    return train_data, actual_data, file_list


'''
Main entry point
'''
if __name__ == '__main__':
    print('\nStart Running')
    # detection
    #testing()
    # training
    training()
    print('\nEnd Running')

--------------------------------------------------------------------------------
/session_params/README.md:
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
/ssd300.py:
--------------------------------------------------------------------------------

"""
date: 2017/11/10
author: lslcode [jasonli8848@qq.com]
"""

import numpy as np
import tensorflow as tf
from tensorflow.python.training.moving_averages import assign_moving_average

class SSD300:
    def __init__(self, tf_sess, isTraining):
        # tensorflow session
        self.sess = tf_sess
        # whether the network is being trained
        self.isTraining = isTraining
        # accepted input image size
        self.img_size = [300, 300]
        # total number of classes
        self.classes_size = 21
        # class value reserved for the background
        self.background_classes_val = 0
        # number of default boxes per feature map cell
        self.default_box_size = [4, 6, 6, 6, 4, 4]
        # default box aspect ratios (width/height)
        self.box_aspect_ratio = [
            [1.0, 1.25, 2.0, 3.0],
            [1.0, 1.25, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0],
            [1.0, 1.25, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0],
            [1.0, 1.25, 2.0, 3.0, 1.0 / 2.0, 1.0 / 3.0],
            [1.0, 1.25, 2.0, 3.0],
            [1.0, 1.25, 2.0, 3.0]
        ]
        # minimum default box area ratio
        self.min_box_scale = 0.05
        # maximum default box area ratio
        self.max_box_scale = 0.9
        # area ratio of each feature layer
        # np.linspace generates an arithmetic progression, equivalent to the paper's s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)
        self.default_box_scale = np.linspace(self.min_box_scale, self.max_box_scale, num = np.amax(self.default_box_size))
        print('## default_box_scale:' + str(self.default_box_scale))
        # convolution strides
        self.conv_strides_1 = [1, 1, 1, 1]
        self.conv_strides_2 = [1, 2, 2, 1]
        self.conv_strides_3 = [1, 3, 3, 1]
        # pooling window
        self.pool_size = [1, 2, 2, 1]
        # pooling strides
        self.pool_strides = [1, 2, 2, 1]
        # decay parameter of the Batch Normalization algorithm
        self.conv_bn_decay = 0.99999
        # variance_epsilon parameter of the Batch Normalization algorithm
        self.conv_bn_epsilon = 0.00001
        # Jaccard similarity matching threshold
        self.jaccard_value = 0.6

        # build the Tensorflow graph
        self.generate_graph()

    def generate_graph(self):
        # input data
        self.input = tf.placeholder(shape=[None, self.img_size[0], self.img_size[1], 3], dtype=tf.float32, name='input_image')

        # vgg16 convolution block 1
        self.conv_1_1 = self.convolution(self.input, [3, 3, 3, 32], self.conv_strides_1, 'conv_1_1')
        self.conv_1_2 = self.convolution(self.conv_1_1, [3, 3, 32, 32], self.conv_strides_1, 'conv_1_2')
        self.conv_1_2 = tf.nn.avg_pool(self.conv_1_2, self.pool_size, self.pool_strides, padding='SAME', name='pool_1_2')
        print('## conv_1_2 shape: ' + str(self.conv_1_2.get_shape().as_list()))
        # vgg16 convolution block 2
        self.conv_2_1 = self.convolution(self.conv_1_2, [3, 3, 32, 64], self.conv_strides_1, 'conv_2_1')
        self.conv_2_2 = self.convolution(self.conv_2_1, [3, 3, 64, 64], self.conv_strides_1, 'conv_2_2')
        #self.conv_2_2 = tf.nn.avg_pool(self.conv_2_2, self.pool_size, self.pool_strides, padding='SAME', name='pool_2_2')
        print('## conv_2_2 shape: ' + str(self.conv_2_2.get_shape().as_list()))
        # vgg16 convolution block 3
        self.conv_3_1 = self.convolution(self.conv_2_2, [3, 3, 64, 128], self.conv_strides_1, 'conv_3_1')
        self.conv_3_2 = self.convolution(self.conv_3_1, [3, 3, 128, 128], self.conv_strides_1, 'conv_3_2')
        self.conv_3_3 = self.convolution(self.conv_3_2, [3, 3, 128, 128], self.conv_strides_1, 'conv_3_3')
        self.conv_3_3 = tf.nn.avg_pool(self.conv_3_3, self.pool_size, self.pool_strides, padding='SAME', name='pool_3_3')
        print('## conv_3_3 shape: ' + str(self.conv_3_3.get_shape().as_list()))
        # vgg16 convolution block 4
        self.conv_4_1 = self.convolution(self.conv_3_3, [3, 3, 128, 256], self.conv_strides_1, 'conv_4_1')
        self.conv_4_2 = self.convolution(self.conv_4_1, [3, 3, 256, 256], self.conv_strides_1, 'conv_4_2')
        self.conv_4_3 = self.convolution(self.conv_4_2, [3, 3, 256, 256], self.conv_strides_1, 'conv_4_3')
        self.conv_4_3 = tf.nn.avg_pool(self.conv_4_3, self.pool_size, self.pool_strides, padding='SAME', name='pool_4_3')
        print('## conv_4_3 shape: ' + str(self.conv_4_3.get_shape().as_list()))
        # vgg16 convolution block 5
        self.conv_5_1 = self.convolution(self.conv_4_3, [3, 3, 256, 256], self.conv_strides_1, 'conv_5_1')
        self.conv_5_2 = self.convolution(self.conv_5_1, [3, 3, 256, 256], self.conv_strides_1, 'conv_5_2')
        self.conv_5_3 = self.convolution(self.conv_5_2, [3, 3, 256, 256], self.conv_strides_1, 'conv_5_3')
        self.conv_5_3 = tf.nn.avg_pool(self.conv_5_3, self.pool_size, self.pool_strides, padding='SAME', name='pool_5_3')
        print('## conv_5_3 shape: ' + str(self.conv_5_3.get_shape().as_list()))
        # ssd convolution block 6
        self.conv_6_1 = self.convolution(self.conv_5_3, [3, 3, 256, 512], self.conv_strides_1, 'conv_6_1')
        print('## conv_6_1 shape: ' + str(self.conv_6_1.get_shape().as_list()))
        # ssd convolution block 7
        self.conv_7_1 = self.convolution(self.conv_6_1, [1, 1, 512, 512], self.conv_strides_1, 'conv_7_1')
        print('## conv_7_1 shape: ' + str(self.conv_7_1.get_shape().as_list()))
        # ssd convolution block 8
        self.conv_8_1 = self.convolution(self.conv_7_1, [1, 1, 512, 128], self.conv_strides_1, 'conv_8_1')
        self.conv_8_2 = self.convolution(self.conv_8_1, [3, 3, 128, 256], self.conv_strides_2, 'conv_8_2')
        print('## conv_8_2 shape: ' + str(self.conv_8_2.get_shape().as_list()))
        # ssd convolution block 9
        self.conv_9_1 = self.convolution(self.conv_8_2, [1, 1, 256, 64], self.conv_strides_1, 'conv_9_1')
        self.conv_9_2 = self.convolution(self.conv_9_1, [3, 3, 64, 128], self.conv_strides_2, 'conv_9_2')
        print('## conv_9_2 shape: ' + str(self.conv_9_2.get_shape().as_list()))
        # ssd convolution block 10
        self.conv_10_1 = self.convolution(self.conv_9_2, [1, 1, 128, 64], self.conv_strides_1, 'conv_10_1')
        self.conv_10_2 = self.convolution(self.conv_10_1, [3, 3, 64, 128], self.conv_strides_2, 'conv_10_2')
        print('## conv_10_2 shape: ' + str(self.conv_10_2.get_shape().as_list()))
        # ssd layer 11
        self.conv_11 = tf.nn.avg_pool(self.conv_10_2, self.pool_size, self.pool_strides, "VALID")
        print('## conv_11 shape: ' + str(self.conv_11.get_shape().as_list()))

        # feature layer 1, taken from conv_4_3
        self.features_1 = self.convolution(self.conv_4_3, [3, 3, 256, self.default_box_size[0] * (self.classes_size + 4)], self.conv_strides_1, 'features_1')
        print('## features_1 shape: ' + str(self.features_1.get_shape().as_list()))
        # feature layer 2, taken from conv_7_1
        self.features_2 = self.convolution(self.conv_7_1, [3, 3, 512, self.default_box_size[1] * (self.classes_size + 4)], self.conv_strides_1, 'features_2')
        print('## features_2 shape: ' + str(self.features_2.get_shape().as_list()))
        # feature layer 3, taken from conv_8_2
        self.features_3 = self.convolution(self.conv_8_2, [3, 3, 256, self.default_box_size[2] * (self.classes_size + 4)], self.conv_strides_1, 'features_3')
        print('## features_3 shape: ' + str(self.features_3.get_shape().as_list()))
        # feature layer 4, taken from conv_9_2
        self.features_4 = self.convolution(self.conv_9_2, [3, 3, 128, self.default_box_size[3] * (self.classes_size + 4)], self.conv_strides_1, 'features_4')
        print('## features_4 shape: ' + str(self.features_4.get_shape().as_list()))
        # feature layer 5, taken from conv_10_2
        self.features_5 = self.convolution(self.conv_10_2, [3, 3, 128, self.default_box_size[4] * (self.classes_size + 4)], self.conv_strides_1, 'features_5')
        print('## features_5 shape: ' + str(self.features_5.get_shape().as_list()))
        # feature layer 6, taken from conv_11
        self.features_6 = self.convolution(self.conv_11, [1, 1, 128, self.default_box_size[5] * (self.classes_size + 4)], self.conv_strides_1, 'features_6')
        print('## features_6 shape: ' + str(self.features_6.get_shape().as_list()))

        # collection of feature layers
        self.feature_maps = [self.features_1, self.features_2, self.features_3, self.features_4, self.features_5, self.features_6]
        # record the shape of every feature layer, so the groundtruth can be generated in the same layout as the features
        self.feature_maps_shape = [m.get_shape().as_list() for m in self.feature_maps]

        # arrange the feature data
        self.tmp_all_feature = []
        for i, fmap in zip(range(len(self.feature_maps)), self.feature_maps):
            width = self.feature_maps_shape[i][1]
            height = self.feature_maps_shape[i][2]
            # the reshape prepares for the two regression heads, localization and classification
            # before reshape: shape=[None, width, height, default_box*(classes+4)]
            # after reshape:  shape=[None, width*height*default_box, (classes+4)]
            self.tmp_all_feature.append(tf.reshape(fmap, [-1, (width * height * self.default_box_size[i]), (self.classes_size + 4)]))
        # concatenate all features produced by one image
        self.tmp_all_feature = tf.concat(self.tmp_all_feature, axis=1)
        # split into the localization and classification parts
        self.feature_class = self.tmp_all_feature[:, :, :self.classes_size]
        self.feature_location = self.tmp_all_feature[:, :, self.classes_size:]

        print('## feature_class shape : ' + str(self.feature_class.get_shape().as_list()))
        print('## feature_location shape : ' + str(self.feature_location.get_shape().as_list()))
        # generate all default boxes
        self.all_default_boxs = self.generate_all_default_boxs()
        self.all_default_boxs_len = len(self.all_default_boxs)
        print('## all default boxs : ' + str(self.all_default_boxs_len))

        # ground-truth inputs
        self.groundtruth_class = tf.placeholder(shape=[None, self.all_default_boxs_len], dtype=tf.int32, name='groundtruth_class')
        self.groundtruth_location = tf.placeholder(shape=[None, self.all_default_boxs_len, 4], dtype=tf.float32, name='groundtruth_location')
        self.groundtruth_positives = tf.placeholder(shape=[None, self.all_default_boxs_len], dtype=tf.float32, name='groundtruth_positives')
        self.groundtruth_negatives = tf.placeholder(shape=[None, self.all_default_boxs_len], dtype=tf.float32, name='groundtruth_negatives')

        # loss function
        self.groundtruth_count = tf.add(self.groundtruth_positives, self.groundtruth_negatives)
        self.softmax_cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=self.feature_class, labels=self.groundtruth_class)
        self.loss_location = tf.div(tf.reduce_sum(tf.multiply(tf.reduce_sum(self.smooth_L1(tf.subtract(self.groundtruth_location, self.feature_location)), reduction_indices=2), self.groundtruth_positives), reduction_indices=1), tf.reduce_sum(self.groundtruth_positives, reduction_indices=1))
        self.loss_class = tf.div(tf.reduce_sum(tf.multiply(self.softmax_cross_entropy, self.groundtruth_count), reduction_indices=1), tf.reduce_sum(self.groundtruth_count, reduction_indices=1))
        self.loss_all = tf.reduce_sum(tf.add(self.loss_class, self.loss_location))

        # loss optimizer
        self.optimizer = tf.train.AdamOptimizer(0.001)
        #self.optimizer = tf.train.GradientDescentOptimizer(0.001)
        self.train = self.optimizer.minimize(self.loss_all)

    # image detection and training
    # input_images : input image data, format: [None, width, height, channel]
    # actual_data : annotation data, format: [None, [None, center_X, center_Y, width, height, classes]], where classes is in [0, classes_size)
    def run(self, input_images, actual_data):
        # training branch
        if self.isTraining:
            if actual_data is None:
                raise Exception('The actual_data parameter is missing!')
            if len(input_images) != len(actual_data):
                raise Exception('input_images and actual_data do not have matching lengths!')

            f_class, f_location = self.sess.run([self.feature_class, self.feature_location], feed_dict={self.input: input_images})

            with tf.control_dependencies([self.feature_class, self.feature_location]):
                # check the data for numeric errors
                f_class = self.check_numerics(f_class, 'prediction f_class')
                f_location = self.check_numerics(f_location, 'prediction f_location')

            gt_class, gt_location, gt_positives, gt_negatives = self.generate_groundtruth_data(actual_data, f_class)
            #print('gt_positives : [' + str(np.sum(gt_positives)) + '|' + str(np.amax(gt_positives)) + '|' + str(np.amin(gt_positives)) + '] | gt_negatives : [' + str(np.sum(gt_negatives)) + '|' + str(np.amax(gt_negatives)) + '|' + str(np.amin(gt_negatives)) + ']')
            self.sess.run(self.train, feed_dict={
                self.input: input_images,
                self.groundtruth_class: gt_class,
                self.groundtruth_location: gt_location,
                self.groundtruth_positives: gt_positives,
                self.groundtruth_negatives: gt_negatives
            })
            with tf.control_dependencies([self.train]):
                loss_all, loss_location, loss_class = self.sess.run([self.loss_all, self.loss_location, self.loss_class], feed_dict={
                    self.input: input_images,
                    self.groundtruth_class: gt_class,
                    self.groundtruth_location: gt_location,
                    self.groundtruth_positives: gt_positives,
                    self.groundtruth_negatives: gt_negatives
                })
                # check the data for numeric errors
                loss_all = self.check_numerics(loss_all, 'loss value loss_all')
                return loss_all, loss_class, loss_location, f_class, f_location

        # detection branch
        else:
            # normalize the predictions with softmax
            feature_class_softmax = tf.nn.softmax(logits=self.feature_class, dim=-1)
            # filter out the background predictions
            background_filter = np.ones(self.classes_size, dtype=np.float32)
            background_filter[self.background_classes_val] = 0
            background_filter = tf.constant(background_filter)
            feature_class_softmax = tf.multiply(feature_class_softmax, background_filter)
            # maximum predicted value of each box
            feature_class_softmax = tf.reduce_max(feature_class_softmax, 2)
            # filter out redundant predictions
            box_top_set = tf.nn.top_k(feature_class_softmax, int(self.all_default_boxs_len / 20))
            box_top_index = box_top_set.indices
            box_top_value = box_top_set.values
            f_class, f_location, f_class_softmax, box_top_index, box_top_value = self.sess.run(
                [self.feature_class, self.feature_location, feature_class_softmax, box_top_index, box_top_value],
                feed_dict={self.input: input_images}
            )
            top_shape = np.shape(box_top_index)
            pred_class = []
            pred_class_val = []
            pred_location = []
            for i in range(top_shape[0]):
                item_img_class = []
                item_img_class_val = []
                item_img_location = []
                for j in range(top_shape[1]):
                    p_class_val = f_class_softmax[i][box_top_index[i][j]]
                    if p_class_val < 0.5:
                        continue
                    p_class = np.argmax(f_class[i][box_top_index[i][j]])
                    if p_class == self.background_classes_val:
                        continue
                    p_location = f_location[i][box_top_index[i][j]]
                    if p_location[0] < 0 or p_location[1] < 0 or p_location[2] < 0 or p_location[3] < 0 or p_location[2] == 0 or p_location[3] == 0:
                        continue
                    is_box_filter = False
                    for f_index in range(len(item_img_class)):
                        if self.jaccard(p_location, item_img_location[f_index]) > 0.3 and p_class == item_img_class[f_index]:
                            is_box_filter = True
                            break
                    if not is_box_filter:
                        item_img_class.append(p_class)
                        item_img_class_val.append(p_class_val)
                        item_img_location.append(p_location)
                pred_class.append(item_img_class)
                pred_class_val.append(item_img_class_val)
                pred_location.append(item_img_location)
            return pred_class, pred_class_val, pred_location

    # convolution op
    def convolution(self, input, shape, strides, name):
        with tf.variable_scope(name):
            weight = tf.get_variable(initializer=tf.truncated_normal(shape, 0, 1), dtype=tf.float32, name=name + '_weight')
            bias = tf.get_variable(initializer=tf.truncated_normal(shape[-1:], 0, 1), dtype=tf.float32, name=name + '_bias')
            result = tf.nn.conv2d(input, weight, strides, padding='SAME', name=name + '_conv')
            result = tf.nn.bias_add(result, bias)
            result = self.batch_normalization(result, name=name + '_bn')
            result = tf.nn.relu(result, name=name + '_relu')
            return result

    # fully connected op
    def fc(self, input, out_shape, name):
        with tf.variable_scope(name + '_fc'):
            in_shape = 1
            for d in input.get_shape().as_list()[1:]:
                in_shape *= d
            weight = tf.get_variable(initializer=tf.truncated_normal([in_shape, out_shape], 0, 1), dtype=tf.float32, name=name + '_fc_weight')
            bias = tf.get_variable(initializer=tf.truncated_normal([out_shape], 0, 1), dtype=tf.float32, name=name + '_fc_bias')
            result = tf.reshape(input, [-1, in_shape])
            result = tf.nn.xw_plus_b(result, weight, bias, name=name + '_fc_do')
            return result

    # Batch Normalization
    def batch_normalization(self, input, name):
        with tf.variable_scope(name):
            bn_input_shape = input.get_shape()
            moving_mean = tf.get_variable(name + '_mean', bn_input_shape[-1:], initializer=tf.zeros_initializer, trainable=False)
            moving_variance = tf.get_variable(name + '_variance', bn_input_shape[-1:], initializer=tf.ones_initializer, trainable=False)
            def mean_var_with_update():
                mean, variance = tf.nn.moments(input, list(range(len(bn_input_shape) - 1)), name=name + '_moments')
                with tf.control_dependencies([assign_moving_average(moving_mean, mean, self.conv_bn_decay), assign_moving_average(moving_variance, variance, self.conv_bn_decay)]):
                    return tf.identity(mean), tf.identity(variance)
            # NOTE: tf.cast(True, ...) means the batch statistics (and moving-average updates) are always used;
            # the commented line below would switch on self.isTraining instead.
            #mean, variance = tf.cond(tf.cast(self.isTraining, tf.bool), mean_var_with_update, lambda: (moving_mean, moving_variance))
            mean, variance = tf.cond(tf.cast(True, tf.bool), mean_var_with_update, lambda: (moving_mean, moving_variance))
            beta = tf.get_variable(name + '_beta', bn_input_shape[-1:], initializer=tf.zeros_initializer)
            gamma = tf.get_variable(name + '_gamma', bn_input_shape[-1:], initializer=tf.ones_initializer)
            return tf.nn.batch_normalization(input, mean, variance, beta, gamma, self.conv_bn_epsilon, name + '_bn_opt')

    # smooth L1 function
    def smooth_L1(self, x):
        return tf.where(tf.less_equal(tf.abs(x), 1.0), tf.multiply(0.5, tf.pow(x, 2.0)), tf.subtract(tf.abs(x), 0.5))

    # initialize and arrange the training data
    def generate_all_default_boxs(self):
        # compute, as ratios, the coordinates and size of every default box an image produces
        # used later for jaccard matching
        all_default_boxes = []
        for index, map_shape in zip(range(len(self.feature_maps_shape)), self.feature_maps_shape):
            width = int(map_shape[1])
            height = int(map_shape[2])
            cell_scale = self.default_box_scale[index]
            for x in range(width):
                for y in range(height):
                    for ratio in self.box_aspect_ratio[index]:
                        center_x = (x / float(width)) + (0.5 / float(width))
                        center_y = (y / float(height)) + (0.5 / float(height))
                        box_width = np.sqrt(cell_scale * ratio)
                        box_height = np.sqrt(cell_scale / ratio)
                        all_default_boxes.append([center_x, center_y, box_width, box_height])
        all_default_boxes = np.array(all_default_boxes)
        # check the data for numeric errors
        all_default_boxes = self.check_numerics(all_default_boxes, 'all_default_boxes')
        return all_default_boxes

    # build the groundtruth data
    def generate_groundtruth_data(self, input_actual_data, f_class):
        # allocate empty arrays to hold the groundtruth
        input_actual_data_len = len(input_actual_data)
        gt_class = np.zeros((input_actual_data_len, self.all_default_boxs_len))
        gt_location = np.zeros((input_actual_data_len, self.all_default_boxs_len, 4))
        gt_positives_jacc = np.zeros((input_actual_data_len, self.all_default_boxs_len))
        gt_positives = np.zeros((input_actual_data_len, self.all_default_boxs_len))
        gt_negatives = np.zeros((input_actual_data_len, self.all_default_boxs_len))
        background_jacc = max(0, (self.jaccard_value - 0.2))
        # initialize the positive training examples
        for img_index in range(input_actual_data_len):
            for pre_actual in input_actual_data[img_index]:
                gt_class_val = pre_actual[-1:][0]
                gt_box_val = pre_actual[:-1]
                for boxe_index in range(self.all_default_boxs_len):
                    jacc = self.jaccard(gt_box_val, self.all_default_boxs[boxe_index])
                    if jacc >= self.jaccard_value:
                        gt_class[img_index][boxe_index] = gt_class_val
                        gt_location[img_index][boxe_index] = gt_box_val
                        gt_positives_jacc[img_index][boxe_index] = jacc
                        gt_positives[img_index][boxe_index] = 1
                        gt_negatives[img_index][boxe_index] = 0
            # if there is no positive example, create one at random to guard against nan
            if np.sum(gt_positives[img_index]) == 0:
                #print('[no jaccard match] : ' + str(input_actual_data[img_index]))
                random_pos_index = np.random.randint(low=0, high=self.all_default_boxs_len, size=1)[0]
                gt_class[img_index][random_pos_index] = self.background_classes_val
                gt_location[img_index][random_pos_index] = [0, 0, 0, 0]
                gt_positives_jacc[img_index][random_pos_index] = self.jaccard_value
                gt_positives[img_index][random_pos_index] = 1
                gt_negatives[img_index][random_pos_index] = 0
            # keep a positive:negative ratio of 1:3
            gt_neg_end_count = int(np.sum(gt_positives[img_index]) * 3)
            if (gt_neg_end_count + np.sum(gt_positives[img_index])) > self.all_default_boxs_len:
                gt_neg_end_count = self.all_default_boxs_len - np.sum(gt_positives[img_index])
            # pick the negative examples at random
            gt_neg_index = np.random.randint(low=0, high=self.all_default_boxs_len, size=gt_neg_end_count)
            for r_index in gt_neg_index:
                if gt_positives_jacc[img_index][r_index] < background_jacc:
                    gt_class[img_index][r_index] = self.background_classes_val
                    gt_positives[img_index][r_index] = 0
                    gt_negatives[img_index][r_index] = 1
        return gt_class, gt_location, gt_positives, gt_negatives

    # jaccard algorithm
    # computes the IOU; rect1 and rect2 are in the format [center_x, center_y, width, height]
    def jaccard(self, rect1, rect2):
        x_overlap = max(0, (min(rect1[0] + (rect1[2] / 2), rect2[0] + (rect2[2] / 2)) - max(rect1[0] - (rect1[2] / 2), rect2[0] - (rect2[2] / 2))))
        y_overlap = max(0, (min(rect1[1] + (rect1[3] / 2), rect2[1] + (rect2[3] / 2)) - max(rect1[1] - (rect1[3] / 2), rect2[1] - (rect2[3] / 2))))
        intersection = x_overlap * y_overlap
        # trim away the parts of each box that fall outside the image
        rect1_width_sub = 0
        rect1_height_sub = 0
        rect2_width_sub = 0
        rect2_height_sub = 0
        if (rect1[0] - rect1[2] / 2) < 0: rect1_width_sub += 0 - (rect1[0] - rect1[2] / 2)
        if (rect1[0] + rect1[2] / 2) > 1: rect1_width_sub += (rect1[0] + rect1[2] / 2) - 1
        if (rect1[1] - rect1[3] / 2) < 0: rect1_height_sub += 0 - (rect1[1] - rect1[3] / 2)
        if (rect1[1] + rect1[3] / 2) > 1: rect1_height_sub += (rect1[1] + rect1[3] / 2) - 1
        if (rect2[0] - rect2[2] / 2) < 0: rect2_width_sub += 0 - (rect2[0] - rect2[2] / 2)
        if (rect2[0] + rect2[2] / 2) > 1: rect2_width_sub += (rect2[0] + rect2[2] / 2) - 1
        if (rect2[1] - rect2[3] / 2) < 0: rect2_height_sub += 0 - (rect2[1] - rect2[3] / 2)
        if (rect2[1] + rect2[3] / 2) > 1: rect2_height_sub += (rect2[1] + rect2[3] / 2) - 1
        area_box_a = (rect1[2] - rect1_width_sub) * (rect1[3] - rect1_height_sub)
        area_box_b = (rect2[2] - rect2_width_sub) * (rect2[3] - rect2_height_sub)
        union = area_box_a + area_box_b - intersection
        if intersection > 0 and union > 0:
            return intersection / union
        else:
            return 0

    # check the data for nan/inf
    def check_numerics(self, input_dataset, message):
        if str(input_dataset).find('Tensor') == 0:
            input_dataset = tf.check_numerics(input_dataset, message)
        else:
            dataset = np.array(input_dataset)
            nan_count = np.count_nonzero(dataset != dataset)
            inf_count = len(dataset[dataset == float("inf")])
            n_inf_count = len(dataset[dataset == float("-inf")])
            if nan_count > 0 or inf_count > 0 or n_inf_count > 0:
                data_error = '[' + message + '] contains invalid values! [nan:' + str(nan_count) + '|inf:' + str(inf_count) + '|-inf:' + str(n_inf_count) + ']'
                raise Exception(data_error)
        return input_dataset

--------------------------------------------------------------------------------
/ssd300_resnet.py:
--------------------------------------------------------------------------------

"""
date: 2018/01/17
author: lslcode [jasonli8848@qq.com]
"""

--------------------------------------------------------------------------------
/train_datasets/README.md:
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
/train_datasets/voc2007/README.md:
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
/train_datasets/voc2012/README.md:
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------