├── .asserts
│   ├── detect.jpg
│   ├── image-20201103233753811.png
│   └── 下载.jpeg
├── LICENSE
├── README.md
├── setup.py
├── tests
│   ├── eval_default.py
│   ├── eval_mydata.py
│   ├── specify_pre_training_weight.py
│   ├── test_voc2xyolo.py
│   ├── train.py
│   └── use_proxies.py
└── xyolo
    ├── __init__.py
    ├── config.py
    ├── convert.py
    ├── init_yolo.py
    ├── preprocessing.py
    ├── xyolo_data
    │   ├── coco_classes.txt
    │   ├── yolo_anchors.txt
    │   └── yolov3.cfg
    └── yolo3
        ├── __init__.py
        ├── model.py
        ├── utils.py
        └── yolo.py

/.asserts/detect.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AaronJny/xyolo/cc249687a848272a873296d0dbae86b4fb33c951/.asserts/detect.jpg
--------------------------------------------------------------------------------

/.asserts/image-20201103233753811.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AaronJny/xyolo/cc249687a848272a873296d0dbae86b4fb33c951/.asserts/image-20201103233753811.png
--------------------------------------------------------------------------------

/.asserts/下载.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AaronJny/xyolo/cc249687a848272a873296d0dbae86b4fb33c951/.asserts/下载.jpeg
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 qqwweee

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# xyolo

`xyolo` is a highly encapsulated YOLO v3 library implemented in Python.

With xyolo, you can easily complete the training and invocation of a YOLOv3 object-detection task with just a few lines of Python code.

Note:

> I use Anaconda's Python 3.7 distribution, initialized in my shell (so python and pip point to the currently activated environment rather than the system python2). Decide for yourself whether the python and pip commands below need to be replaced with python3 and pip3.

PS:

> This project is a refactoring and encapsulation of the [tf2-keras-yolo3](https://github.com/AaronJny/tf2-keras-yolo3) project.


## 1. Installation

### 1.1 General installation

Installing `xyolo` is very simple: one pip command. Note that `xyolo` requires TensorFlow >= 2.2 (if TensorFlow is not installed, it will be installed automatically).

```
pip install --user xyolo
```

The `--user` flag is recommended to avoid permission problems.

If you can, it's even better to create a fresh environment with conda and install there; this avoids unexpected dependency conflicts.

### 1.2 GPU installation

If you want the GPU build of TensorFlow, install tensorflow-gpu >= 2.2 before installing xyolo. Taking conda as the example, the steps for GPU support are as follows.

#### Step 1. Create a virtual environment named xyolo

```
conda create -n xyolo python=3.7
```

#### Step 2. Switch into the freshly created environment

```
conda activate xyolo
```

After switching, you can inspect the installed packages with pip:

```
pip list
```

As expected, the environment is clean; it's brand new after all:

```
Package    Version
---------- -------------------
certifi    2020.6.20
pip        20.2.4
setuptools 50.3.0.post20201006
wheel      0.35.1
```

#### Step 3. Install tensorflow-gpu in the new environment

> Note: before this step you need graphics drivers matching your TensorFlow version. That part is a bit involved, so I won't cover it here; I suspect anyone choosing to run xyolo on a GPU has already mastered it.
>
> After all, if you have never touched tensorflow-gpu, the CPU build is the easier way to get started with xyolo.

We install tensorflow-gpu through conda: the GPU build of TensorFlow depends on CUDA and cuDNN, and conda resolves the version dependencies and configuration of both automatically.

```
conda install tensorflow-gpu=2.2
```

Wait quietly for a while and the tensorflow-gpu installation completes.

#### Step 4. Install xyolo with pip

```
pip install --user xyolo
```

Running `pip list` again shows xyolo and its dependencies successfully installed:

```
Package                Version
---------------------- -------------------
absl-py                0.11.0
aiohttp                3.6.3
astunparse             1.6.3
async-timeout          3.0.1
attrs                  20.2.0
blinker                1.4
brotlipy               0.7.0
cachetools             4.1.1
certifi                2020.6.20
cffi                   1.14.3
chardet                3.0.4
click                  7.1.2
cryptography           3.1.1
cycler                 0.10.0
gast                   0.3.3
google-auth            1.23.0
google-auth-oauthlib   0.4.2
google-pasta           0.2.0
grpcio                 1.31.0
h5py                   2.10.0
idna                   2.10
importlib-metadata     2.0.0
Keras-Preprocessing    1.1.0
kiwisolver             1.3.1
loguru                 0.5.3
lxml                   4.6.1
Markdown               3.3.2
matplotlib             3.3.2
mkl-fft                1.2.0
mkl-random             1.1.1
mkl-service            2.3.0
multidict              4.7.6
numpy                  1.18.5
oauthlib               3.1.0
opencv-python          4.4.0.46
opt-einsum             3.1.0
Pillow                 8.0.1
pip                    20.2.4
protobuf               3.13.0
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pycparser              2.20
PyJWT                  1.7.1
pyOpenSSL              19.1.0
pyparsing              2.4.7
PySocks                1.7.1
python-dateutil        2.8.1
requests               2.24.0
requests-oauthlib      1.3.0
rsa                    4.6
scipy                  1.4.1
setuptools             50.3.0.post20201006
six                    1.15.0
tensorboard            2.2.2
tensorboard-plugin-wit 1.6.0
tensorflow             2.2.0
tensorflow-estimator   2.2.0
termcolor              1.1.0
tqdm                   4.51.0
urllib3                1.25.11
Werkzeug               1.0.1
wheel                  0.35.1
wrapt                  1.12.1
xyolo                  0.1.3
yarl                   1.6.2
zipp                   3.4.0
```
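
To quickly confirm that the environment picked up a usable TensorFlow build, a check like the following helps (a minimal sketch; whether the GPU list is non-empty depends on your drivers and on which build you installed):

```python
import tensorflow as tf

print(tf.__version__)  # xyolo needs >= 2.2
# A non-empty list means the GPU build sees at least one usable device
print(tf.config.list_physical_devices('GPU'))
```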

## 2. Usage

### 2.1 Object detection with the official pretrained weights

The YOLOv3 authors provide official pretrained weights. If the images we want to detect come from the same or a similar distribution as the COCO dataset (the default class list is COCO's 80 classes), the official pretrained weights can be used directly.

xyolo's logic for calling the official pretrained model is:

- 1. Download the pretrained weights from the official site (Darknet output format)
- 2. Convert the Darknet weights file into a Keras weights file (TensorFlow and Keras are basically one family by now)
- 3. Build the model and load the pretrained weights
- 4. Run detection on the chosen image

Sounds like a hassle, right? Surely lots of code? Of course not~ smug-grin.jpg

![下载](.asserts/下载.jpeg)



First, prepare an image for detection, say at `./xyolo_data/detect.jpg`, with the following content:

![detect](.asserts/detect.jpg)

A simple example looks like this:

```python
# Import packages
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3


# Create a default config object
config = DefaultYolo3Config()
# Initialize xyolo (downloading and converting the pretrained weights happens here)
# Download and conversion only run on the first call; later calls use the cached files
init_yolo_v3(config)
# Create a YOLO object, which exposes the YOLOv3 detection and training interfaces
yolo = YOLO(config)

# Detect objects and draw them on the image
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
# Show the annotated image
img.show()
```

The output:

```
2020-11-03 23:33:49.645 | DEBUG | xyolo.yolo3.yolo:detect_image:273 - Found 3 boxes for img
2020-11-03 23:33:49.648 | DEBUG | xyolo.yolo3.yolo:detect_image:289 - Class dog 0.99,Position (402, 176), (586, 310)
2020-11-03 23:33:49.650 | DEBUG | xyolo.yolo3.yolo:detect_image:289 - Class bicycle 0.96,Position (0, 80), (383, 412)
2020-11-03 23:33:49.652 | DEBUG | xyolo.yolo3.yolo:detect_image:289 - Class person 1.00,Position (9, 1), (200, 410)
2020-11-03 23:33:49.652 | DEBUG | xyolo.yolo3.yolo:detect_image:292 - Cost time 6.65432205500838s
```

A window with the annotated image also opened (the inconsistent image sizes are an artifact of my screenshots and layout):

![image-20201103233753811](.asserts/image-20201103233753811.png)

And that's a complete detection run. Satisfying~

Of course, I'm sure quite a few readers hit problems or questions while running this code, so let me answer the likely ones in one place.

1. The pretrained weights download slowly or not at all

> The official yolov3 site is hosted abroad, so slow downloads from China are normal. Use a proxy, or download from the backup address and then point xyolo at the weights file. See section 2.2 for details.

2. Permission errors

> xyolo automatically downloads the pretrained weights into its installation directory, which can hit permission problems in some setups. The fix is what the installation section said: pass `--user` so the package is installed under your user directory, and the problem usually disappears.

3. Detection is slow

> Sharp-eyed readers may have noticed that the single detection above took over 6 seconds! Isn't that way too slow?
>
> Actually, no. TensorFlow 2.x runs in eager mode by default, which is somewhat slower than the static graphs of 1.x. I therefore optimize the hot path with tf.function: on the first run the model traces a static graph, which takes a while, but every subsequent run is drastically faster.
>
> If we keep running detections, the per-run time drops sharply, usually to tenths of a second or a few tens of milliseconds.
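
You can observe the warm-up effect directly. The sketch below reuses the `yolo` object from the example above and calls `detect_image`, the raw-results method documented in section 2.5:

```python
import time

for i in range(3):
    start = time.time()
    yolo.detect_image('./xyolo_data/detect.jpg')
    print('run {}: {:.2f}s'.format(i + 1, time.time() - start))
# The first run includes tf.function tracing; the later runs are much faster.
```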

### 2.2 When the pretrained weights can't be downloaded or download too slowly

There are two main solutions; let's look at both.

#### Option 1. Set a proxy

If you have a network proxy that speeds up access (you know what I'm talking about~), you can configure it to accelerate the download, like this:

```python
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3

# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Replace this with your proxy address
        self.requests_proxies = {'https': 'http://localhost:7890'}

# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
yolo = YOLO(config)

# Detect
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
```

#### Option 2. Download manually from the backup link

If you have no proxy, you can download from the backup address instead.

I uploaded the pretrained weights to Baidu Netdisk:

> Link: https://pan.baidu.com/s/1jXpoXHQHlp6Ra0jImruPXg Password: 48ed

After downloading the weights file from the share page, there are again two ways to set it up.

① Copy the file into the xyolo_data directory under xyolo's installation directory. This is identical to the auto-download case; nothing further needs to be done.

② Save the file anywhere you like and set its path in the config class.

The first needs no more explanation; here is an example of the second:

```python
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3

# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Replace this with your own file path; to be safe, prefer an absolute path
        self._pre_training_weights_darknet_path = '/Users/aaron/data/darknet_yolo.weights'

# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
yolo = YOLO(config)

# Detect
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
```

### 2.3 Training a model on your own data

First, xyolo's dataset input format. An xyolo input dataset can be described as follows:

> The dataset is a plain txt file containing one sample per line.
>
> Each line has the format: image_path box1 box2 ... boxN
>
> Each box has the format: top-left x,top-left y,bottom-right x,bottom-right y,class id
>
> An example:
>
> > ```
> > path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
> > path/to/img2.jpg 120,300,250,600,2
> > ...
> > ```
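
To make the format concrete, here is a tiny parser for a single annotation line (a hypothetical helper, not part of xyolo, and it assumes image paths contain no spaces):

```python
def parse_line(line):
    parts = line.strip().split(' ')
    image_path, boxes = parts[0], []
    for box in parts[1:]:
        x_min, y_min, x_max, y_max, class_id = map(int, box.split(','))
        boxes.append((x_min, y_min, x_max, y_max, class_id))
    return image_path, boxes


print(parse_line('path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3'))
# ('path/to/img1.jpg', [(50, 100, 150, 200, 0), (30, 50, 200, 120, 3)])
```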

Among image-annotation tools, labelImg is one of the most widely used. Its annotation files default to the VOC format (xml files), which is not xyolo's input format. Don't worry: xyolo provides a format-conversion script, and we only need to call it.

```python
# Import the conversion script
from xyolo import voc2xyolo

# Glob pattern matching the VOC-format annotation files
input_path = '/Users/aaron/data/labels_voc/*.xml'
# classes is a txt file listing every valid class name we want to detect, one per line
classes_path = '/Users/aaron/code/xyolo/tests/xyolo_data/classes.txt'
# Where to write the converted xyolo dataset
output_path = '/Users/aaron/code/xyolo/tests/xyolo_data/xyolo_label.txt'
# Run the conversion
voc2xyolo(input_path=input_path, classes_path=classes_path, output_path=output_path)
```

While the script runs, a progress bar reports the progress; when it reaches 100%, the conversion is done.

```
100%|██████████| 106/106 [00:00<00:00, 3076.05it/s]
```

With the dataset ready, we can prepare to train. Before that, double-check what we need:

- a dataset in the format xyolo accepts
- a txt file listing all the valid class names to detect (`classes.txt`)

As an example, I'll detect the characters on a website's captcha images.

First decide the classes to detect. I only need the positions of the characters, not which character each one is, so there is a single class, text. We create a `classes.txt` file containing:

```
text
```

Then I annotated the images and converted the dataset with xyolo (`xyolo_label.txt`):

```
/home/aaron/tmp/test_xyolo/images/162.png 47,105,75,141,0 157,52,181,80,0 197,85,229,120,0 265,85,296,117,0 257,131,293,166,0 355,63,386,90,0
/home/aaron/tmp/test_xyolo/images/88.png 93,46,129,86,0 79,139,114,174,0 209,42,237,72,0 200,68,226,98,0 256,53,295,86,0 209,134,247,171,0
/home/aaron/tmp/test_xyolo/images/176.png 43,88,76,120,0 123,91,153,127,0 98,155,127,184,0 189,117,224,152,0 289,54,319,86,0 348,123,374,151,0
/home/aaron/tmp/test_xyolo/images/63.png 36,128,72,161,0 79,130,104,161,0 127,120,160,153,0 305,111,329,138,0 302,125,334,153,0 342,81,380,119,0
/home/aaron/tmp/test_xyolo/images/77.png 164,114,200,150,0 193,147,225,182,0 309,90,336,120,0 349,89,382,121,0 321,126,352,155,0 298,150,327,177,0
/home/aaron/tmp/test_xyolo/images/189.png 119,90,148,118,0 122,132,150,159,0 208,44,240,76,0 279,60,314,97,0 299,65,334,98,0 331,93,364,129,0
/home/aaron/tmp/test_xyolo/images/200.png 232,58,265,91,0 288,58,316,85,0 49,118,78,148,0 55,134,83,163,0 75,148,103,175,0 312,131,343,163,0
/home/aaron/tmp/test_xyolo/images/76.png 20,61,56,97,0 29,108,57,139,0 76,117,111,154,0 139,117,167,147,0 204,116,242,157,0 336,153,376,191,0
...
```
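
Before training, it can save time to sanity-check the converted file, for instance confirming that every referenced image actually exists (a hypothetical helper, not part of xyolo; adjust the path to your own converted file):

```python
import os

with open('/Users/aaron/code/xyolo/tests/xyolo_data/xyolo_label.txt', encoding='utf8') as f:
    for line in f:
        image_path = line.split(' ')[0]
        if not os.path.exists(image_path):
            print('missing image:', image_path)
```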

Start training:

```python
# Import packages
from xyolo import DefaultYolo3Config, YOLO
from xyolo import init_yolo_v3

# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Dataset path; an absolute path is recommended
        self._dataset_path = '/home/aaron/tmp/test_xyolo/xyolo_data/yolo_label.txt'
        # Path of the class-names file; an absolute path is recommended
        self._classes_path = '/home/aaron/tmp/test_xyolo/xyolo_data/classes.txt'
        # Where to save the model. It defaults to the xyolo_data directory under
        # the current path, but can be changed; an absolute path is recommended
        self._output_model_path = '/home/aaron/tmp/test_xyolo/output_model.h5'


# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
# For training, pass train=True when creating the yolo object
yolo = YOLO(config, train=True)
# Start training; the model is saved automatically when training finishes
yolo.fit()
```
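
Since `use_tensorboard` is enabled by default (see the full configuration in section 2.4), the training curves can be watched live. With the default `outer_xyolo_data_dir` and log path, the launch command would be something like:

```
tensorboard --logdir ./xyolo_data/tensorboard/logs
```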

If you want to run predictions with the trained model, you can write:

```python
# Import packages
from xyolo import DefaultYolo3Config, YOLO
from xyolo import init_yolo_v3

# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Path of the class-names file; an absolute path is recommended
        self._classes_path = '/home/aaron/tmp/test_xyolo/xyolo_data/classes.txt'
        # Model path used by the yolo object, i.e. the model just trained;
        # an absolute path is recommended
        self._model_path = '/home/aaron/tmp/test_xyolo/output_model.h5'


# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
yolo = YOLO(config)
# Detect
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
```

Does configuring the Config class over and over feel tedious? In practice, a project needs only a single Config class that holds all the settings the project uses and is then referenced everywhere; I split them up here purely for demonstration. The example above could simply keep the training and inference settings together:

```python
# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Dataset path; an absolute path is recommended
        self._dataset_path = '/home/aaron/tmp/test_xyolo/xyolo_data/yolo_label.txt'
        # Path of the class-names file; an absolute path is recommended
        self._classes_path = '/home/aaron/tmp/test_xyolo/xyolo_data/classes.txt'
        # Where to save the model. It defaults to the xyolo_data directory under
        # the current path, but can be changed; an absolute path is recommended
        self._output_model_path = '/home/aaron/tmp/test_xyolo/output_model.h5'
        # Model path used by the yolo object, i.e. the model just trained;
        # an absolute path is recommended
        self._model_path = '/home/aaron/tmp/test_xyolo/output_model.h5'
```

### 2.4 The full set of configuration options

I decided to give users plenty of room for customization in `xyolo`, so there are quite a few parameters; this makes usage more flexible.

Don't worry about the configuration being tedious, though: as shown above, only a handful of options are commonly needed, and mastering those is enough to get started. If you want more flexibility, you'll need to know the remaining parameters. Here is the complete set of options (that is, xyolo's default configuration):

```python
from os.path import abspath, join, dirname, exists
from os import mkdir


class DefaultYolo3Config:
    """
    Default settings of the yolo3 model
    """

    def __init__(self):
        # Directory for xyolo's data, bundled inside the package
        self.inner_xyolo_data_dir = abspath(join(dirname(__file__), './xyolo_data'))
        # Directory for xyolo's data outside the package, per project
        self.outer_xyolo_data_dir = abspath('./xyolo_data')
        # Download URL of the yolo3 pretrained weights
        self.pre_training_weights_url = 'https://pjreddie.com/media/files/yolov3.weights'
        # HTTP proxy used when downloading files.
        # If needed, the format is {'https_proxy': 'host:port'}, e.g. {'https_proxy': 'http://127.0.0.1:7890'};
        # see https://requests.readthedocs.io/en/master/user/advanced/#proxies for details
        self.requests_proxies = None
        # Path of the Darknet-format pretrained weights, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_darknet_path = 'darknet_yolo.weights'
        # MD5 hash of the Darknet pretrained weights, used to detect corrupted data
        self.pre_training_weights_darknet_md5 = 'c84e5b99d0e52cd466ae710cadf6d84c'
        # Path of the converted, Keras-format pretrained weights, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_keras_path = 'keras_weights.h5'
        # Path of the pretrained-weights config file, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_config_path = 'yolov3.cfg'
        # Default anchor-box file path, relative to inner_xyolo_data_dir, or absolute
        self._anchors_path = 'yolo_anchors.txt'
        # Default class-names file path, relative to inner_xyolo_data_dir, or absolute
        self._classes_path = 'coco_classes.txt'
        # Output path of the trained model, relative to outer_xyolo_data_dir, or absolute
        self._output_model_path = 'output_model.h5'
        # Dataset path, relative to outer_xyolo_data_dir, or absolute
        self._dataset_path = 'dataset.txt'
        # Whether to enable TensorBoard (enabled by default)
        self.use_tensorboard = True
        # TensorBoard log path during training, relative to outer_xyolo_data_dir, or absolute
        self._tensorboard_log_path = './tensorboard/logs'
        # Whether to enable checkpoints (enabled by default)
        self.use_checkpoint = True
        # Whether to enable learning-rate decay
        self.use_reduce_lr = True
        # Metric monitored for LR decay; defaults to the validation loss
        self.reduce_lr_monitor = 'val_loss'
        # LR decay factor, new_lr = lr * factor
        self.reduce_lr_factor = 0.1
        # Decay the LR when the metric has not improved for this many consecutive epochs
        self.reduce_lr_patience = 3
        # Whether to enable early stopping
        self.use_early_stopping = True
        # Metric monitored for early stopping; defaults to the validation loss
        self.early_stopping_monitor = 'val_loss'
        # Minimum change in the metric that counts as an improvement
        self.early_stopping_min_delta = 0
        # Stop training early when the metric has not improved for this many consecutive epochs
        self.early_stopping_patience = 10
        # Model path loaded by YOLO by default (preferably absolute); see the model_path property below for the priority rules
        self._model_path = ''
        # Detection score threshold
        self.score = 0.3
        # IoU (intersection over union) threshold
        self.iou = 0.45
        # Model input image size
        self.model_image_size = (416, 416)
        # Number of GPUs
        self.gpu_num = 1
        # Validation split used during training; defaults to 0.1,
        # i.e. 90% of the dataset is used for training and 10% for validation
        self.val_split = 0.1
        # Training happens in two stages: first most layers are frozen, then everything is unfrozen for fine-tuning
        # Whether to enable the frozen stage (recommended)
        self.frozen_train = True
        # Number of epochs in the frozen stage
        self.frozen_train_epochs = 50
        # Batch size in the frozen stage
        self.frozen_batch_size = 32
        # Initial learning rate in the frozen stage
        self.frozen_lr = 1e-3
        # Whether to enable the unfrozen stage (recommended)
        self.unfreeze_train = True
        # Number of epochs in the unfrozen stage
        self.unfreeze_train_epochs = 50
        # Batch size in the unfrozen stage. Note: unfrozen training needs a lot of GPU memory, so keep this small
        self.unfreeze_batch_size = 1
        # Initial learning rate in the unfrozen stage
        self.unfreeze_lr = 1e-4

    def __setattr__(self, key, value):
        _key = '_{}'.format(key)
        if key not in self.__dict__ and _key in self.__dict__:
            self.__dict__[_key] = value
        else:
            self.__dict__[key] = value
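
    # Note on __setattr__ above: assigning to a public name whose underscored
    # twin already exists is redirected to the underscored attribute. So, e.g.,
    #   config = DefaultYolo3Config()
    #   config.model_path = '/tmp/my_model.h5'  # actually sets _model_path
    # which is why both spellings work when overriding options.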

    @classmethod
    def make_dir(cls, path):
        if not exists(path):
            mkdir(path)

    @classmethod
    def join_and_abspath(cls, path1, path2):
        return abspath(join(path1, path2))

    def inner_abspath(self, filename):
        self.make_dir(self.inner_xyolo_data_dir)
        return self.join_and_abspath(self.inner_xyolo_data_dir, filename)

    def outer_abspath(self, filename):
        self.make_dir(self.outer_xyolo_data_dir)
        return self.join_and_abspath(self.outer_xyolo_data_dir, filename)

    @property
    def pre_training_weights_darknet_path(self):
        return self.inner_abspath(self._pre_training_weights_darknet_path)

    @property
    def pre_training_weights_config_path(self):
        return self.inner_abspath(self._pre_training_weights_config_path)

    @property
    def pre_training_weights_keras_path(self):
        return self.inner_abspath(self._pre_training_weights_keras_path)

    @property
    def anchors_path(self):
        return self.inner_abspath(self._anchors_path)

    @property
    def classes_path(self):
        return self.inner_abspath(self._classes_path)

    @property
    def output_model_path(self):
        return self.outer_abspath(self._output_model_path)

    @property
    def dataset_path(self):
        return self.outer_abspath(self._dataset_path)

    @property
    def tensorboard_log_path(self):
        return self.outer_abspath(self._tensorboard_log_path)

    @property
    def model_path(self):
        """
        Path of the weights the Yolo model loads by default.
        Chosen with priority _model_path > output_model_path > pre_training_weights_keras_path, i.e.:
        if _model_path is set, use _model_path;
        otherwise, if output_model_path is set and the path exists, use output_model_path;
        otherwise, use pre_training_weights_keras_path
        """
        _model_path = getattr(self, '_model_path', '')
        if _model_path:
            return abspath(_model_path)
        if self._output_model_path and exists(self.output_model_path):
            return self.output_model_path
        return self.pre_training_weights_keras_path
```

Every option is explained by the comment next to it, so I won't go through them again. I do want to highlight the `model_path` property here, though.



`model_path` decides which model file xyolo loads when it runs detection. The selection logic:

It chooses with the priority `_model_path` > `output_model_path` > `pre_training_weights_keras_path`, that is:

- If `_model_path` is set, use `_model_path` (this is the case in the detection part of section 2.3)
- Otherwise, if `output_model_path` is set and the path exists, use `output_model_path` (in other words, because `_output_model_path` was set, the code in section 2.3 would still load the right model at detection time even without setting `model_path`)
- Otherwise, use `pre_training_weights_keras_path` (the converted official pretrained model, i.e. the case in section 2.1)



### 2.5 Common methods of the yolo object

The detection examples above all call the same method:

```python
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
```

Its input is a PIL.Image.Image instance or a string holding an image path, and it returns the image with the detection boxes drawn on it, again a PIL.Image.Image instance.

A look at its code:

```python
def detect_and_draw_image(self, image: typing.Union[Image.Image, str], draw_label=True) -> Image.Image:
    """
    Run object detection on the given image and draw the resulting boxes and labels onto it

    Args:
        image: the image to detect on, an object (PIL.Image.Image) or a path (str)
        draw_label: whether to annotate each box with its class and confidence

    Returns:
        the image object with the detection results added
    """
    predicted_results = self.detect_image(image)
    img = self.draw_image(image, predicted_results, draw_label=draw_label)
    return img
```

As you can see, this method actually calls two others: `detect_image` obtains the detection results, and `draw_image` draws the detection info onto the image.

When needed, we can also call these two methods directly. For example, when I only want the detection results for an image, I just call `detect_image`, because I don't care about the drawing.

The interface documentation of the two methods:

```python
def detect_image(self, img: typing.Union[Image.Image, str]) -> typing.List[
    typing.Tuple[str, int, float, int, int, int, int]]:
    """
    Run object detection on the given image and return the results

    Args:
        img: the image to detect on, an object (PIL.Image.Image) or a path (str)

    Returns:
        [[class name, class id, confidence, top-left x, top-left y, bottom-right x, bottom-right y], ...]
    """
    pass


def draw_image(self, img: typing.Union[Image.Image, str], predicted_results: typing.List[
    typing.Tuple[str, int, float, int, int, int, int]], draw_label=True) -> Image.Image:
    """
    Given an image and detection results, draw the results onto the image and return it

    Args:
        img: the image to draw on, an object (PIL.Image.Image) or a path (str)
        predicted_results: the detection results, [[class name, class id, confidence, top-left x, top-left y, bottom-right x, bottom-right y], ...]
        draw_label: whether to annotate each box with its class and confidence

    Returns:
        the image object with the detection results added
    """
    pass
```
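
For instance, when only the raw results matter, a few lines suffice (a minimal sketch reusing a `yolo` object from the examples above; the tuple layout follows the `detect_image` docstring):

```python
results = yolo.detect_image('./xyolo_data/detect.jpg')
# Keep only confident 'person' detections
persons = [r for r in results if r[0] == 'person' and r[2] > 0.5]
print('found {} person(s)'.format(len(persons)))
for class_name, class_id, score, x1, y1, x2, y2 in results:
    print(class_name, class_id, score, (x1, y1), (x2, y2))
```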


## Afterword

The xyolo package has not been tested extensively yet (I've tested my own usage, but that obviously can't cover every case), so the odd bug is understandable, right (don't hit me, don't hit me, covers head)? If you run into one, you're welcome to contact me.

PS:

> The lowest TensorFlow version xyolo supports is 2.2; lower versions are untested and unadapted, and not guaranteed to work. My energy is limited, so I don't plan to adapt to lower versions either; if a problem is caused by a low version, I probably won't handle it, sorry~

Finally, if you like this project, how about a star~

Thanks for your support~
--------------------------------------------------------------------------------

/setup.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Date         : 2020-10-30 23:54:59
# @Author       : AaronJny
# @LastEditTime : 2021-01-03 20:40:19
# @FilePath     : /xyolo/setup.py
# @Desc         :
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="xyolo",
    version="0.1.6",
    author="AaronJny",
    author_email="aaronjny7@gmail.com",
    description="xyolo is a highly encapsulated YOLO v3 library implemented in Python."
                "With xyolo, you can easily complete the training and calling of the yolo3 "
                "target detection task with just a few lines of Python code.",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/AaronJny/xyolo",
    packages=setuptools.find_packages(),
    package_data={
        'xyolo': ['xyolo_data/*.txt',
                  'xyolo_data/*.cfg']
    },
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
    install_requires=[
        'tensorflow>=2.2',
        'numpy>=1.18.1,<1.19.0',
        'pillow>=7.0.0',
        'matplotlib>=3.1.3',
        'loguru>=0.5.1',
        'requests>=2.22.0',
        'tqdm>=4.42.1',
        'lxml>=4.5.0',
        'opencv-python>=4.2.0'
    ]
)
--------------------------------------------------------------------------------

/tests/eval_default.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/31 15:20
# @Author  : AaronJny
# @File    : eval_default.py
# @Desc    : Run object detection on an image with the pretrained weights
# Import packages
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3


# Create a default config object
config = DefaultYolo3Config()
# Initialize xyolo (downloading and converting the pretrained weights happens here)
# Download and conversion only run on the first call; later calls use the cached files
init_yolo_v3(config)
# Create a YOLO object, which exposes the YOLOv3 detection and training interfaces
yolo = YOLO(config)

# Detect objects and draw them on the image
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
# Show the annotated image
img.show()
--------------------------------------------------------------------------------

/tests/eval_mydata.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/31 19:49
# @Author  : AaronJny
# @File    : eval_mydata.py
# @Desc    :
from xyolo import DefaultYolo3Config
from xyolo import YOLO


class MyConfig(DefaultYolo3Config):

    def __init__(self):
        super(MyConfig, self).__init__()
        self._classes_path = '/Users/aaron/code/xyolo/tests/xyolo_data/classes.txt'
        self._model_path = '/Users/aaron/code/xyolo/tests/xyolo_data/output_model.h5'


config = MyConfig()
yolo = YOLO(config)
image_path = '/Users/aaron/code/bctt/spider/captcha_detection/soopat/images/232.png'
img = yolo.detect_and_draw_image(image_path)
img.show()
--------------------------------------------------------------------------------

/tests/specify_pre_training_weight.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/11/6 22:03
# @Author  : AaronJny
# @File    : specify_pre_training_weight.py
# @Desc    : Specify the path of the pretrained weights
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3


# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Replace this with your own file path; to be safe, prefer an absolute path
        self._pre_training_weights_darknet_path = '/Users/aaron/data/darknet_yolo.weights'


# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
yolo = YOLO(config)

# Detect
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
--------------------------------------------------------------------------------

/tests/test_voc2xyolo.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/11/3 14:44
# @Author  : AaronJny
# @File    : test_voc2xyolo.py
# @Desc    : Test converting VOC-format annotations into the format xyolo needs
# Import the conversion script
from xyolo import voc2xyolo

# Glob pattern matching the VOC-format annotation files
input_path = '/Users/aaron/data/labels_voc/*.xml'
# classes is a txt file listing every valid class name we want to detect, one per line
classes_path = '/Users/aaron/code/xyolo/tests/xyolo_data/classes.txt'
# Where to write the converted xyolo dataset
output_path = '/Users/aaron/code/xyolo/tests/xyolo_data/xyolo_label.txt'
# Run the conversion
voc2xyolo(input_path=input_path, classes_path=classes_path, output_path=output_path)
--------------------------------------------------------------------------------

/tests/train.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/31 15:22
# @Author  : AaronJny
# @File    : train.py
# @Desc    : Train your own model with xyolo
# Import packages
from xyolo import DefaultYolo3Config, YOLO
from xyolo import init_yolo_v3


# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Dataset path; an absolute path is recommended
        self._dataset_path = '/home/aaron/tmp/test_xyolo/xyolo_data/yolo_label.txt'
        # Path of the class-names file; an absolute path is recommended
        self._classes_path = '/home/aaron/tmp/test_xyolo/xyolo_data/classes.txt'
        # Where to save the model. It defaults to the xyolo_data directory under
        # the current path, but can be changed; an absolute path is recommended
        self._output_model_path = '/home/aaron/tmp/test_xyolo/output_model.h5'


# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
# For training, pass train=True when creating the yolo object
yolo = YOLO(config, train=True)
# Start training; the model is saved automatically when training finishes
yolo.fit()
--------------------------------------------------------------------------------

/tests/use_proxies.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/31 15:20
# @Author  : AaronJny
# @File    : use_proxies.py
# @Desc    : Run object detection on an image with the pretrained weights, through a proxy
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3


# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Replace this with your proxy address
        self.requests_proxies = {'https': 'http://localhost:7890'}


# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
yolo = YOLO(config)

# Detect
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
--------------------------------------------------------------------------------

/xyolo/__init__.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/30 23:54
# @Author  : AaronJny
# @File    : __init__.py
# @Desc    :
from .config import DefaultYolo3Config
from .init_yolo import init_yolo_v3
from .preprocessing import voc2xyolo
from .yolo3.yolo import YOLO
--------------------------------------------------------------------------------

/xyolo/config.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/31 10:50
# @Author  : AaronJny
# @File    : config.py
# @Desc    :
from os.path import abspath, join, dirname, exists
from os import mkdir


class DefaultYolo3Config:
    """
    Default settings of the yolo3 model
    """

    def __init__(self):
        # Directory for xyolo's data, bundled inside the package
        self.inner_xyolo_data_dir = abspath(join(dirname(__file__), './xyolo_data'))
        # Directory for xyolo's data outside the package, per project
        self.outer_xyolo_data_dir = abspath('./xyolo_data')
        # Download URL of the yolo3 pretrained weights
        self.pre_training_weights_url = 'https://pjreddie.com/media/files/yolov3.weights'
        # HTTP proxy used when downloading files.
        # If needed, the format is {'https_proxy': 'host:port'}, e.g. {'https_proxy': 'http://127.0.0.1:7890'};
        # see https://requests.readthedocs.io/en/master/user/advanced/#proxies for details
        self.requests_proxies = None
        # Path of the Darknet-format pretrained weights, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_darknet_path = 'darknet_yolo.weights'
        # MD5 hash of the Darknet pretrained weights, used to detect corrupted data
        self.pre_training_weights_darknet_md5 = 'c84e5b99d0e52cd466ae710cadf6d84c'
        # Path of the converted, Keras-format pretrained weights, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_keras_path = 'keras_weights.h5'
        # Path of the pretrained-weights config file, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_config_path = 'yolov3.cfg'
        # Default anchor-box file path, relative to inner_xyolo_data_dir, or absolute
        self._anchors_path = 'yolo_anchors.txt'
        # Default class-names file path, relative to inner_xyolo_data_dir, or absolute
        self._classes_path = 'coco_classes.txt'
        # Output path of the trained model, relative to outer_xyolo_data_dir, or absolute
        self._output_model_path = 'output_model.h5'
        # Dataset path, relative to outer_xyolo_data_dir, or absolute
        self._dataset_path = 'dataset.txt'
        # Whether to enable TensorBoard (enabled by default)
        self.use_tensorboard = True
        # TensorBoard log path during training, relative to outer_xyolo_data_dir, or absolute
        self._tensorboard_log_path = './tensorboard/logs'
        # Whether to enable checkpoints (enabled by default)
        self.use_checkpoint = True
        # Whether to enable learning-rate decay
        self.use_reduce_lr = True
        # Metric monitored for LR decay; defaults to the validation loss
        self.reduce_lr_monitor = 'val_loss'
        # LR decay factor, new_lr = lr * factor
        self.reduce_lr_factor = 0.1
        # Decay the LR when the metric has not improved for this many consecutive epochs
        self.reduce_lr_patience = 3
        # Whether to enable early stopping
        self.use_early_stopping = True
        # Metric monitored for early stopping; defaults to the validation loss
        self.early_stopping_monitor = 'val_loss'
        # Minimum change in the metric that counts as an improvement
        self.early_stopping_min_delta = 0
        # Stop training early when the metric has not improved for this many consecutive epochs
        self.early_stopping_patience = 10
        # Model path loaded by YOLO by default (preferably absolute); see the model_path property below for the priority rules
        self._model_path = ''
        # Detection score threshold
        self.score = 0.3
        # IoU (intersection over union) threshold
        self.iou = 0.45
        # Model input image size
        self.model_image_size = (416, 416)
        # Number of GPUs
        self.gpu_num = 1
        # Validation split used during training; defaults to 0.1,
        # i.e. 90% of the dataset is used for training and 10% for validation
        self.val_split = 0.1
        # Training happens in two stages: first most layers are frozen, then everything is unfrozen for fine-tuning
        # Whether to enable the frozen stage (recommended)
        self.frozen_train = True
        # Number of epochs in the frozen stage
        self.frozen_train_epochs = 50
        # Batch size in the frozen stage
        self.frozen_batch_size = 32
        # Initial learning rate in the frozen stage
        self.frozen_lr = 1e-3
        # Whether to enable the unfrozen stage (recommended)
        self.unfreeze_train = True
        # Number of epochs in the unfrozen stage
        self.unfreeze_train_epochs = 50
        # Batch size in the unfrozen stage. Note: unfrozen training needs a lot of GPU memory, so keep this small
        self.unfreeze_batch_size = 1
        # Initial learning rate in the unfrozen stage
        self.unfreeze_lr = 1e-4

    def __setattr__(self, key, value):
        _key = '_{}'.format(key)
        if key not in self.__dict__ and _key in self.__dict__:
            self.__dict__[_key] = value
        else:
            self.__dict__[key] = value

    @classmethod
    def make_dir(cls, path):
        if not exists(path):
            mkdir(path)

    @classmethod
    def join_and_abspath(cls, path1, path2):
        return abspath(join(path1, path2))

    def inner_abspath(self, filename):
        self.make_dir(self.inner_xyolo_data_dir)
        return self.join_and_abspath(self.inner_xyolo_data_dir, filename)

    def outer_abspath(self, filename):
        self.make_dir(self.outer_xyolo_data_dir)
        return self.join_and_abspath(self.outer_xyolo_data_dir, filename)

    @property
    def pre_training_weights_darknet_path(self):
        return self.inner_abspath(self._pre_training_weights_darknet_path)

    @property
    def pre_training_weights_config_path(self):
        return self.inner_abspath(self._pre_training_weights_config_path)

    @property
    def pre_training_weights_keras_path(self):
        return self.inner_abspath(self._pre_training_weights_keras_path)

    @property
    def anchors_path(self):
        return self.inner_abspath(self._anchors_path)

    @property
    def classes_path(self):
        return self.inner_abspath(self._classes_path)

    @property
    def output_model_path(self):
        return self.outer_abspath(self._output_model_path)

    @property
    def dataset_path(self):
        return self.outer_abspath(self._dataset_path)

    @property
    def tensorboard_log_path(self):
        return self.outer_abspath(self._tensorboard_log_path)

    @property
    def model_path(self):
        """
        Path of the weights the Yolo model loads by default.
        Chosen with priority _model_path > output_model_path > pre_training_weights_keras_path, i.e.:
        if _model_path is set, use _model_path;
        otherwise, if output_model_path is set and the path exists, use output_model_path;
        otherwise, use pre_training_weights_keras_path
        """
        _model_path = getattr(self, '_model_path', '')
        if _model_path:
            return abspath(_model_path)
        if self._output_model_path and exists(self.output_model_path):
            return self.output_model_path
        return self.pre_training_weights_keras_path
--------------------------------------------------------------------------------

/xyolo/convert.py:
--------------------------------------------------------------------------------
#! /usr/bin/env python
"""
Reads Darknet config and weights and creates Keras model with TF backend.
"""

import argparse
import configparser
import io
import os
from collections import defaultdict

import numpy as np
import tensorflow.keras.backend as K
from loguru import logger
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import (Conv2D, Input, ZeroPadding2D, Add,
                                     UpSampling2D, MaxPooling2D, Concatenate)
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from tensorflow.keras.utils import plot_model as plot

parser = argparse.ArgumentParser(description='Darknet To Keras Converter.')
parser.add_argument('config_path', help='Path to Darknet cfg file.')
parser.add_argument('weights_path', help='Path to Darknet weights file.')
parser.add_argument('output_path', help='Path to output Keras model file.')
parser.add_argument(
    '-p',
    '--plot_model',
    help='Plot generated Keras model and save as image.',
    action='store_true')
parser.add_argument(
    '-w',
    '--weights_only',
    help='Save as Keras weights file instead of model file.',
    action='store_true')


def unique_config_sections(config_file):
    """Convert all config sections to have unique names.

    Adds unique suffixes to config sections for compatibility with configparser.
    """
    section_counters = defaultdict(int)
    output_stream = io.StringIO()
    with open(config_file) as fin:
        for line in fin:
            if line.startswith('['):
                section = line.strip().strip('[]')
                _section = section + '_' + str(section_counters[section])
                section_counters[section] += 1
                line = line.replace(section, _section)
            output_stream.write(line)
    output_stream.seek(0)
    return output_stream


def convert(config_path, weights_path, output_path, weights_only=None, plot_model=None):
    output_root = os.path.splitext(output_path)[0]

    # Load weights and config.
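    # (Darknet .weights layout, as parsed below: a 12-byte header holding three
    # int32 values, major / minor / revision, then a "seen" image counter that
    # is int64 for header versions >= 0.2 and int32 before that, followed by
    # the raw float32 parameters of each layer in file order.)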
    logger.info('Loading weights.')
    weights_file = open(weights_path, 'rb')
    major, minor, revision = np.ndarray(
        shape=(3,), dtype='int32', buffer=weights_file.read(12))
    if (major * 10 + minor) >= 2 and major < 1000 and minor < 1000:
        seen = np.ndarray(shape=(1,), dtype='int64', buffer=weights_file.read(8))
    else:
        seen = np.ndarray(shape=(1,), dtype='int32', buffer=weights_file.read(4))
    # loguru formats positional args into '{}' placeholders, so they must be
    # embedded explicitly (print-style trailing args would be dropped silently)
    logger.info('Weights Header: {} {} {} {}'.format(major, minor, revision, seen))

    logger.info('Parsing Darknet config.')
    unique_config_file = unique_config_sections(config_path)
    cfg_parser = configparser.ConfigParser()
    cfg_parser.read_file(unique_config_file)

    logger.info('Creating Keras model.')
    input_layer = Input(shape=(None, None, 3))
    prev_layer = input_layer
    all_layers = []

    weight_decay = float(cfg_parser['net_0']['decay']
                         ) if 'net_0' in cfg_parser.sections() else 5e-4
    count = 0
    out_index = []
    for section in cfg_parser.sections():
        logger.debug('Parsing section {}'.format(section))
        if section.startswith('convolutional'):
            filters = int(cfg_parser[section]['filters'])
            size = int(cfg_parser[section]['size'])
            stride = int(cfg_parser[section]['stride'])
            pad = int(cfg_parser[section]['pad'])
            activation = cfg_parser[section]['activation']
            batch_normalize = 'batch_normalize' in cfg_parser[section]

            padding = 'same' if pad == 1 and stride == 1 else 'valid'

            # Setting weights.
            # Darknet serializes convolutional weights as:
            # [bias/beta, [gamma, mean, variance], conv_weights]
            prev_layer_shape = K.int_shape(prev_layer)

            weights_shape = (size, size, prev_layer_shape[-1], filters)
            darknet_w_shape = (filters, weights_shape[2], size, size)
            weights_size = np.product(weights_shape)

            logger.debug(' '.join(['conv2d', 'bn' if batch_normalize else '  ',
                                   activation, str(weights_shape)]))

            conv_bias = np.ndarray(
                shape=(filters,),
                dtype='float32',
                buffer=weights_file.read(filters * 4))
            count += filters

            if batch_normalize:
                bn_weights = np.ndarray(
                    shape=(3, filters),
                    dtype='float32',
                    buffer=weights_file.read(filters * 12))
                count += 3 * filters

                bn_weight_list = [
                    bn_weights[0],  # scale gamma
                    conv_bias,  # shift beta
                    bn_weights[1],  # running mean
                    bn_weights[2]  # running var
                ]

            conv_weights = np.ndarray(
                shape=darknet_w_shape,
                dtype='float32',
                buffer=weights_file.read(weights_size * 4))
            count += weights_size

            # DarkNet conv_weights are serialized Caffe-style:
            # (out_dim, in_dim, height, width)
            # We would like to set these to Tensorflow order:
            # (height, width, in_dim, out_dim)
            conv_weights = np.transpose(conv_weights, [2, 3, 1, 0])
            conv_weights = [conv_weights] if batch_normalize else [
                conv_weights, conv_bias
            ]

            # Handle activation.
            act_fn = None
            if activation == 'leaky':
                pass  # Add advanced activation later.
150 | elif activation != 'linear': 151 | raise ValueError( 152 | 'Unknown activation function `{}` in section {}'.format( 153 | activation, section)) 154 | 155 | # Create Conv2D layer 156 | if stride > 1: 157 | # Darknet uses left and top padding instead of 'same' mode 158 | prev_layer = ZeroPadding2D(((1, 0), (1, 0)))(prev_layer) 159 | conv_layer = (Conv2D( 160 | filters, (size, size), 161 | strides=(stride, stride), 162 | kernel_regularizer=l2(weight_decay), 163 | use_bias=not batch_normalize, 164 | weights=conv_weights, 165 | activation=act_fn, 166 | padding=padding))(prev_layer) 167 | 168 | if batch_normalize: 169 | conv_layer = (BatchNormalization( 170 | weights=bn_weight_list))(conv_layer) 171 | prev_layer = conv_layer 172 | 173 | if activation == 'linear': 174 | all_layers.append(prev_layer) 175 | elif activation == 'leaky': 176 | act_layer = LeakyReLU(alpha=0.1)(prev_layer) 177 | prev_layer = act_layer 178 | all_layers.append(act_layer) 179 | 180 | elif section.startswith('route'): 181 | ids = [int(i) for i in cfg_parser[section]['layers'].split(',')] 182 | layers = [all_layers[i] for i in ids] 183 | if len(layers) > 1: 184 | logger.debug('Concatenating route layers: {}'.format(layers)) 185 | concatenate_layer = Concatenate()(layers) 186 | all_layers.append(concatenate_layer) 187 | prev_layer = concatenate_layer 188 | else: 189 | skip_layer = layers[0] # only one layer to route 190 | all_layers.append(skip_layer) 191 | prev_layer = skip_layer 192 | 193 | elif section.startswith('maxpool'): 194 | size = int(cfg_parser[section]['size']) 195 | stride = int(cfg_parser[section]['stride']) 196 | all_layers.append( 197 | MaxPooling2D( 198 | pool_size=(size, size), 199 | strides=(stride, stride), 200 | padding='same')(prev_layer)) 201 | prev_layer = all_layers[-1] 202 | 203 | elif section.startswith('shortcut'): 204 | index = int(cfg_parser[section]['from']) 205 | activation = cfg_parser[section]['activation'] 206 | assert activation == 'linear', 'Only linear activation supported.' 207 | all_layers.append(Add()([all_layers[index], prev_layer])) 208 | prev_layer = all_layers[-1] 209 | 210 | elif section.startswith('upsample'): 211 | stride = int(cfg_parser[section]['stride']) 212 | assert stride == 2, 'Only stride=2 supported.' 213 | all_layers.append(UpSampling2D(stride)(prev_layer)) 214 | prev_layer = all_layers[-1] 215 | 216 | elif section.startswith('yolo'): 217 | out_index.append(len(all_layers) - 1) 218 | all_layers.append(None) 219 | prev_layer = all_layers[-1] 220 | 221 | elif section.startswith('net'): 222 | pass 223 | 224 | else: 225 | raise ValueError( 226 | 'Unsupported section header type: {}'.format(section)) 227 | 228 | # Create and save model. 229 | if len(out_index) == 0: out_index.append(len(all_layers) - 1) 230 | model = Model(inputs=input_layer, outputs=[all_layers[i] for i in out_index]) 231 | model.summary() 232 | if weights_only: 233 | model.save_weights('{}'.format(output_path)) 234 | logger.info('Saved Keras weights to {}'.format(output_path)) 235 | else: 236 | model.save('{}'.format(output_path)) 237 | logger.info('Saved Keras model to {}'.format(output_path)) 238 | 239 | # Check to see if all weights have been read. 
240 | remaining_weights = len(weights_file.read()) / 4 241 | weights_file.close() 242 | logger.info('Read {} of {} from Darknet weights.'.format(count, count + 243 | remaining_weights)) 244 | if remaining_weights > 0: 245 | logger.info('Warning: {} unused weights'.format(remaining_weights)) 246 | 247 | if plot_model: 248 | plot(model, to_file='{}.png'.format(output_root), show_shapes=True) 249 | logger.info('Saved model plot to {}.png'.format(output_root)) 250 | 251 | 252 | # %% 253 | def _main(args): 254 | config_path = os.path.expanduser(args.config_path) 255 | weights_path = os.path.expanduser(args.weights_path) 256 | assert config_path.endswith('.cfg'), '{} is not a .cfg file'.format( 257 | config_path) 258 | assert weights_path.endswith( 259 | '.weights'), '{} is not a .weights file'.format(weights_path) 260 | 261 | output_path = os.path.expanduser(args.output_path) 262 | assert output_path.endswith( 263 | '.h5'), 'output path {} is not a .h5 file'.format(output_path) 264 | convert(config_path, weights_path, output_path, weights_only=args.weights_only, plot_model=args.plot_model) 265 | 266 | 267 | if __name__ == '__main__': 268 | _main(parser.parse_args()) 269 | -------------------------------------------------------------------------------- /xyolo/init_yolo.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Time : 2020/10/31 00:18 3 | # @Author : AaronJny 4 | # @File : init_yolo.py 5 | # @Desc : 6 | import os 7 | 8 | import requests 9 | from loguru import logger 10 | from tqdm import tqdm 11 | 12 | from xyolo.config import DefaultYolo3Config 13 | from xyolo.convert import convert 14 | from hashlib import md5 15 | 16 | 17 | def compute_hash_code(filepath): 18 | """ 19 | 读取并计算给定文件的md5 hash值 20 | """ 21 | with open(filepath, 'rb') as f: 22 | data = f.read() 23 | return md5(data).hexdigest() 24 | 25 | 26 | def download_weights(config): 27 | darknet_path = config.pre_training_weights_darknet_path 28 | if os.path.exists(darknet_path): 29 | # 如果已经存在,先校验md5哈希值 30 | current_hash_code = compute_hash_code(darknet_path) 31 | # md5相同才说明已经下载了,否则重新下载 32 | if current_hash_code == config.pre_training_weights_darknet_md5: 33 | logger.info('Pre-training weights already exists! Skip!') 34 | return 35 | weights_url = config.pre_training_weights_url 36 | r = requests.get(weights_url, stream=True, proxies=config.requests_proxies) 37 | filename = weights_url.split('/')[-1] 38 | with tqdm.wrapattr(open(darknet_path, "wb"), "write", 39 | miniters=1, desc=filename, 40 | total=int(r.headers.get('content-length', 0))) as f: 41 | for chunk in r.iter_content(chunk_size=1024 * 100): 42 | if chunk: 43 | f.write(chunk) 44 | logger.info('Saved Darknet model to {}'.format(darknet_path)) 45 | 46 | 47 | def init_yolo_v3(config=None): 48 | if not config: 49 | config = DefaultYolo3Config() 50 | logger.info('Downloading Pre-training weights of yolo v3 ...') 51 | download_weights(config) 52 | logger.info('Convert Darknet -> Keras ...') 53 | if os.path.exists(config.pre_training_weights_keras_path): 54 | logger.info('Keras model already exists! 
Skip!') 55 | else: 56 | convert(config_path=config.pre_training_weights_config_path, 57 | weights_path=config.pre_training_weights_darknet_path, 58 | output_path=config.pre_training_weights_keras_path) 59 | logger.info('Init completed.') 60 | 61 | 62 | if __name__ == '__main__': 63 | init_yolo_v3() 64 | -------------------------------------------------------------------------------- /xyolo/preprocessing.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Time : 2020/11/3 14:26 3 | # @Author : AaronJny 4 | # @File : preprocessing.py 5 | # @Desc : 6 | import xml.etree.ElementTree as ET 7 | from glob import glob 8 | 9 | from tqdm import tqdm 10 | 11 | 12 | def _voc2xyolo(xml_path, classes): 13 | in_file = open(xml_path) 14 | tree = ET.parse(in_file) 15 | root = tree.getroot() 16 | image_path = root.find('path').text 17 | ret = [image_path, ] 18 | for obj in root.iter('object'): 19 | difficult = obj.find('difficult').text 20 | cls = obj.find('name').text 21 | if cls not in classes or int(difficult) == 1: 22 | continue 23 | cls_id = classes[cls] 24 | xmlbox = obj.find('bndbox') 25 | b = (int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text), int(xmlbox.find('xmax').text), 26 | int(xmlbox.find('ymax').text)) 27 | ret.append(",".join([str(a) for a in b]) + ',' + str(cls_id)) 28 | return ' '.join(ret) 29 | 30 | 31 | def voc2xyolo(input_path, classes_path, output_path): 32 | """ 33 | 将voc格式的标注数据转换成xyolo接受的类型 34 | 35 | Args: 36 | input_path: 输入文件路径的正则表达式。这里是使用labelImg标注的图片label文件路径 37 | classes_path: 保存实体类别的文件路径 38 | output_path: 转换后的数据集保存路径 39 | """ 40 | with open(classes_path, 'r', encoding='utf8') as f: 41 | lines = [line.strip() for line in f.readlines()] 42 | classes = dict(zip(lines, range(len(lines)))) 43 | files = glob(input_path) 44 | xyolo_lines = [] 45 | for xml_path in tqdm(files): 46 | xyolo_line = _voc2xyolo(xml_path, classes) 47 | xyolo_lines.append(xyolo_line) 48 | with open(output_path, 'w', encoding='utf8') as f: 49 | f.write('\n'.join(xyolo_lines)) 50 | -------------------------------------------------------------------------------- /xyolo/xyolo_data/coco_classes.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /xyolo/xyolo_data/yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 
373,326 2 | -------------------------------------------------------------------------------- /xyolo/xyolo_data/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 
212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 
| pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .5 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 
649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .5 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .5 787 | truth_thresh = 1 788 | random=1 789 | 790 | -------------------------------------------------------------------------------- /xyolo/yolo3/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Time : 2020/10/31 00:16 3 | # @Author : AaronJny 4 | # @File : __init__.py.py 5 | # @Desc : 6 | -------------------------------------------------------------------------------- /xyolo/yolo3/model.py: -------------------------------------------------------------------------------- 1 | """YOLO_v3 Model Defined in Keras.""" 2 | 3 | from functools import wraps 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | import tensorflow.keras.backend as K 8 | from tensorflow.keras.layers import BatchNormalization 9 | from tensorflow.keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D, Input, Lambda 10 | from tensorflow.keras.layers import LeakyReLU 11 | from tensorflow.keras.models import Model 12 | from tensorflow.keras.regularizers import l2 13 | 14 | from xyolo.yolo3.utils import compose 15 | 16 | 17 | @wraps(Conv2D) 18 | def DarknetConv2D(*args, **kwargs): 19 | """Wrapper to set Darknet parameters for Convolution2D.""" 
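    # Darknet convention, as implemented below: L2 weight decay (5e-4) on conv
    # kernels; the stride-2 downsampling convs use 'valid' padding (resblock_body
    # adds the explicit top/left ZeroPadding2D), all other convs use 'same'.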
20 | darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
21 | darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides') == (2, 2) else 'same'
22 | darknet_conv_kwargs.update(kwargs)
23 | return Conv2D(*args, **darknet_conv_kwargs)
24 | 
25 | 
26 | def DarknetConv2D_BN_Leaky(*args, **kwargs):
27 | """Darknet Convolution2D followed by BatchNormalization and LeakyReLU."""
28 | no_bias_kwargs = {'use_bias': False}
29 | no_bias_kwargs.update(kwargs)
30 | return compose(
31 | DarknetConv2D(*args, **no_bias_kwargs),
32 | BatchNormalization(),
33 | LeakyReLU(alpha=0.1))
34 | 
35 | 
36 | def resblock_body(x, num_filters, num_blocks):
37 | '''A series of resblocks starting with a downsampling Convolution2D'''
38 | # Darknet uses left and top padding instead of 'same' mode
39 | x = ZeroPadding2D(((1, 0), (1, 0)))(x)
40 | x = DarknetConv2D_BN_Leaky(num_filters, (3, 3), strides=(2, 2))(x)
41 | for i in range(num_blocks):
42 | y = compose(
43 | DarknetConv2D_BN_Leaky(num_filters // 2, (1, 1)),
44 | DarknetConv2D_BN_Leaky(num_filters, (3, 3)))(x)
45 | x = Add()([x, y])
46 | return x
47 | 
48 | 
49 | def darknet_body(x):
50 | '''Darknet body having 52 Convolution2D layers'''
51 | x = DarknetConv2D_BN_Leaky(32, (3, 3))(x)
52 | x = resblock_body(x, 64, 1)
53 | x = resblock_body(x, 128, 2)
54 | x = resblock_body(x, 256, 8)
55 | x = resblock_body(x, 512, 8)
56 | x = resblock_body(x, 1024, 4)
57 | return x
58 | 
59 | 
60 | def make_last_layers(x, num_filters, out_filters):
61 | '''6 Conv2D_BN_Leaky layers followed by a Conv2D_linear layer'''
62 | x = compose(
63 | DarknetConv2D_BN_Leaky(num_filters, (1, 1)),
64 | DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
65 | DarknetConv2D_BN_Leaky(num_filters, (1, 1)),
66 | DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
67 | DarknetConv2D_BN_Leaky(num_filters, (1, 1)))(x)
68 | y = compose(
69 | DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
70 | DarknetConv2D(out_filters, (1, 1)))(x)
71 | return x, y
72 | 
73 | 
74 | def yolo_body(inputs, num_anchors, num_classes):
75 | """Create YOLO_V3 model CNN body in Keras."""
76 | darknet = Model(inputs, darknet_body(inputs))
77 | x, y1 = make_last_layers(darknet.output, 512, num_anchors * (num_classes + 5))
78 | 
79 | x = compose(
80 | DarknetConv2D_BN_Leaky(256, (1, 1)),
81 | UpSampling2D(2))(x)
82 | x = Concatenate()([x, darknet.layers[152].output])
83 | x, y2 = make_last_layers(x, 256, num_anchors * (num_classes + 5))
84 | 
85 | x = compose(
86 | DarknetConv2D_BN_Leaky(128, (1, 1)),
87 | UpSampling2D(2))(x)
88 | x = Concatenate()([x, darknet.layers[92].output])
89 | x, y3 = make_last_layers(x, 128, num_anchors * (num_classes + 5))
90 | 
91 | return Model(inputs, [y1, y2, y3])
92 | 
93 | 
94 | def tiny_yolo_body(inputs, num_anchors, num_classes):
95 | '''Create Tiny YOLO_v3 model CNN body in Keras.'''
96 | x1 = compose(
97 | DarknetConv2D_BN_Leaky(16, (3, 3)),
98 | MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
99 | DarknetConv2D_BN_Leaky(32, (3, 3)),
100 | MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
101 | DarknetConv2D_BN_Leaky(64, (3, 3)),
102 | MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
103 | DarknetConv2D_BN_Leaky(128, (3, 3)),
104 | MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
105 | DarknetConv2D_BN_Leaky(256, (3, 3)))(inputs)
106 | x2 = compose(
107 | MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
108 | DarknetConv2D_BN_Leaky(512, (3, 3)),
109 | MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='same'),
110 | 
DarknetConv2D_BN_Leaky(1024, (3, 3)),
111 | DarknetConv2D_BN_Leaky(256, (1, 1)))(x1)
112 | y1 = compose(
113 | DarknetConv2D_BN_Leaky(512, (3, 3)),
114 | DarknetConv2D(num_anchors * (num_classes + 5), (1, 1)))(x2)
115 | 
116 | x2 = compose(
117 | DarknetConv2D_BN_Leaky(128, (1, 1)),
118 | UpSampling2D(2))(x2)
119 | y2 = compose(
120 | Concatenate(),
121 | DarknetConv2D_BN_Leaky(256, (3, 3)),
122 | DarknetConv2D(num_anchors * (num_classes + 5), (1, 1)))([x2, x1])
123 | 
124 | return Model(inputs, [y1, y2])
125 | 
126 | 
127 | def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
128 | """Convert final layer features to bounding box parameters."""
129 | num_anchors = len(anchors)
130 | # Reshape to batch, height, width, num_anchors, box_params.
131 | anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])
132 | 
133 | grid_shape = K.shape(feats)[1:3] # height, width
134 | grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
135 | [1, grid_shape[1], 1, 1])
136 | grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
137 | [grid_shape[0], 1, 1, 1])
138 | grid = K.concatenate([grid_x, grid_y])
139 | grid = K.cast(grid, K.dtype(feats))
140 | 
141 | feats = K.reshape(
142 | feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])
143 | 
144 | # Adjust predictions to each spatial grid point and anchor size.
145 | box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
146 | box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
147 | box_confidence = K.sigmoid(feats[..., 4:5])
148 | box_class_probs = K.sigmoid(feats[..., 5:])
149 | 
150 | if calc_loss:
151 | return grid, feats, box_xy, box_wh
152 | return box_xy, box_wh, box_confidence, box_class_probs
153 | 
154 | 
155 | def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):
156 | '''Get corrected boxes'''
157 | box_yx = box_xy[..., ::-1]
158 | box_hw = box_wh[..., ::-1]
159 | input_shape = K.cast(input_shape, K.dtype(box_yx))
160 | image_shape = K.cast(image_shape, K.dtype(box_yx))
161 | new_shape = K.round(image_shape * K.min(input_shape / image_shape))
162 | offset = (input_shape - new_shape) / 2. / input_shape
163 | scale = input_shape / new_shape
164 | box_yx = (box_yx - offset) * scale
165 | box_hw *= scale
166 | 
167 | box_mins = box_yx - (box_hw / 2.)
168 | box_maxes = box_yx + (box_hw / 2.)
169 | boxes = K.concatenate([
170 | box_mins[..., 0:1], # y_min
171 | box_mins[..., 1:2], # x_min
172 | box_maxes[..., 0:1], # y_max
173 | box_maxes[..., 1:2] # x_max
174 | ])
175 | 
176 | # Scale boxes back to original image shape.
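    # At this point the coordinates are normalized to the original image (the
    # letterbox offset and scale have been removed). Illustrative numbers: for a
    # 416x416 model input and a 640x480 source image, letterbox_image scales by
    # min(416/640, 416/480) = 0.65 to 416x312 with 52 px of gray padding top and
    # bottom, so offset = (0.125, 0) and scale = (4/3, 1) in normalized (y, x).
    # The multiply below converts the normalized values back to source pixels.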
177 | boxes *= K.concatenate([image_shape, image_shape]) 178 | return boxes 179 | 180 | 181 | def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape): 182 | '''Process Conv layer output''' 183 | box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats, 184 | anchors, num_classes, input_shape) 185 | boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape) 186 | boxes = K.reshape(boxes, [-1, 4]) 187 | box_scores = box_confidence * box_class_probs 188 | box_scores = K.reshape(box_scores, [-1, num_classes]) 189 | return boxes, box_scores 190 | 191 | 192 | def yolo_eval(yolo_outputs, 193 | anchors, 194 | num_classes, 195 | image_shape, 196 | max_boxes=20, 197 | score_threshold=.6, 198 | iou_threshold=.5): 199 | """Evaluate YOLO model on given input and return filtered boxes.""" 200 | num_layers = len(yolo_outputs) 201 | anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]] # default setting 202 | input_shape = K.shape(yolo_outputs[0])[1:3] * 32 203 | boxes = [] 204 | box_scores = [] 205 | for l in range(num_layers): 206 | _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l], 207 | anchors[anchor_mask[l]], num_classes, input_shape, image_shape) 208 | boxes.append(_boxes) 209 | box_scores.append(_box_scores) 210 | boxes = K.concatenate(boxes, axis=0) 211 | box_scores = K.concatenate(box_scores, axis=0) 212 | 213 | mask = box_scores >= score_threshold 214 | max_boxes_tensor = K.constant(max_boxes, dtype='int32') 215 | boxes_ = [] 216 | scores_ = [] 217 | classes_ = [] 218 | for c in range(num_classes): 219 | class_boxes = tf.boolean_mask(boxes, mask[:, c]) 220 | class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c]) 221 | nms_index = tf.image.non_max_suppression( 222 | class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold) 223 | class_boxes = K.gather(class_boxes, nms_index) 224 | class_box_scores = K.gather(class_box_scores, nms_index) 225 | classes = K.ones_like(class_box_scores, 'int32') * c 226 | boxes_.append(class_boxes) 227 | scores_.append(class_box_scores) 228 | classes_.append(classes) 229 | boxes_ = K.concatenate(boxes_, axis=0) 230 | scores_ = K.concatenate(scores_, axis=0) 231 | classes_ = K.concatenate(classes_, axis=0) 232 | 233 | return boxes_, scores_, classes_ 234 | 235 | 236 | def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes): 237 | '''Preprocess true boxes to training input format 238 | 239 | Parameters 240 | ---------- 241 | true_boxes: array, shape=(m, T, 5) 242 | Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape. 
243 | input_shape: array-like, hw, multiples of 32
244 | anchors: array, shape=(N, 2), wh
245 | num_classes: integer
246 | 
247 | Returns
248 | -------
249 | y_true: list of array, shape like yolo_outputs, xywh are relative value
250 | 
251 | '''
252 | assert (true_boxes[..., 4] < num_classes).all(), 'class id must be less than num_classes'
253 | num_layers = len(anchors) // 3 # default setting
254 | anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]
255 | 
256 | true_boxes = np.array(true_boxes, dtype='float32')
257 | input_shape = np.array(input_shape, dtype='int32')
258 | boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
259 | boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
260 | true_boxes[..., 0:2] = boxes_xy / input_shape[::-1]
261 | true_boxes[..., 2:4] = boxes_wh / input_shape[::-1]
262 | 
263 | m = true_boxes.shape[0]
264 | grid_shapes = [input_shape // {0: 32, 1: 16, 2: 8}[l] for l in range(num_layers)]
265 | y_true = [np.zeros((m, grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]), 5 + num_classes),
266 | dtype='float32') for l in range(num_layers)]
267 | 
268 | # Expand dim to apply broadcasting.
269 | anchors = np.expand_dims(anchors, 0)
270 | anchor_maxes = anchors / 2.
271 | anchor_mins = -anchor_maxes
272 | valid_mask = boxes_wh[..., 0] > 0
273 | 
274 | for b in range(m):
275 | # Discard zero rows.
276 | wh = boxes_wh[b, valid_mask[b]]
277 | if len(wh) == 0: continue
278 | # Expand dim to apply broadcasting.
279 | wh = np.expand_dims(wh, -2)
280 | box_maxes = wh / 2.
281 | box_mins = -box_maxes
282 | 
283 | intersect_mins = np.maximum(box_mins, anchor_mins)
284 | intersect_maxes = np.minimum(box_maxes, anchor_maxes)
285 | intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
286 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
287 | box_area = wh[..., 0] * wh[..., 1]
288 | anchor_area = anchors[..., 0] * anchors[..., 1]
289 | iou = intersect_area / (box_area + anchor_area - intersect_area)
290 | 
291 | # Find best anchor for each true box
292 | best_anchor = np.argmax(iou, axis=-1)
293 | 
294 | for t, n in enumerate(best_anchor):
295 | for l in range(num_layers):
296 | if n in anchor_mask[l]:
297 | i = np.floor(true_boxes[b, t, 0] * grid_shapes[l][1]).astype('int32')
298 | j = np.floor(true_boxes[b, t, 1] * grid_shapes[l][0]).astype('int32')
299 | k = anchor_mask[l].index(n)
300 | c = true_boxes[b, t, 4].astype('int32')
301 | y_true[l][b, j, i, k, 0:4] = true_boxes[b, t, 0:4]
302 | y_true[l][b, j, i, k, 4] = 1
303 | y_true[l][b, j, i, k, 5 + c] = 1
304 | 
305 | return y_true
306 | 
307 | 
308 | def box_iou(b1, b2):
309 | '''Return iou tensor
310 | 
311 | Parameters
312 | ----------
313 | b1: tensor, shape=(i1,...,iN, 4), xywh
314 | b2: tensor, shape=(j, 4), xywh
315 | 
316 | Returns
317 | -------
318 | iou: tensor, shape=(i1,...,iN, j)
319 | 
320 | '''
321 | 
322 | # Expand dim to apply broadcasting.
323 | b1 = K.expand_dims(b1, -2)
324 | b1_xy = b1[..., :2]
325 | b1_wh = b1[..., 2:4]
326 | b1_wh_half = b1_wh / 2.
327 | b1_mins = b1_xy - b1_wh_half
328 | b1_maxes = b1_xy + b1_wh_half
329 | 
330 | # Expand dim to apply broadcasting.
331 | b2 = K.expand_dims(b2, 0)
332 | b2_xy = b2[..., :2]
333 | b2_wh = b2[..., 2:4]
334 | b2_wh_half = b2_wh / 2.
335 | b2_mins = b2_xy - b2_wh_half
336 | b2_maxes = b2_xy + b2_wh_half
337 | 
338 | intersect_mins = K.maximum(b1_mins, b2_mins)
339 | intersect_maxes = K.minimum(b1_maxes, b2_maxes)
340 | intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.) 
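    # Broadcasting (i1,...,iN, 1, 2) against (1, j, 2) yields an IoU matrix of
    # shape (i1,...,iN, j): every b1 box is compared against every b2 box.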
341 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 342 | b1_area = b1_wh[..., 0] * b1_wh[..., 1] 343 | b2_area = b2_wh[..., 0] * b2_wh[..., 1] 344 | iou = intersect_area / (b1_area + b2_area - intersect_area) 345 | 346 | return iou 347 | 348 | 349 | def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False): 350 | '''Return yolo_loss tensor 351 | 352 | Parameters 353 | ---------- 354 | yolo_outputs: list of tensor, the output of yolo_body or tiny_yolo_body 355 | y_true: list of array, the output of preprocess_true_boxes 356 | anchors: array, shape=(N, 2), wh 357 | num_classes: integer 358 | ignore_thresh: float, the iou threshold whether to ignore object confidence loss 359 | 360 | Returns 361 | ------- 362 | loss: tensor, shape=(1,) 363 | 364 | ''' 365 | num_layers = len(anchors) // 3 # default setting 366 | yolo_outputs = args[:num_layers] 367 | y_true = args[num_layers:] 368 | anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]] 369 | input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0])) 370 | grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)] 371 | loss = 0 372 | m = K.shape(yolo_outputs[0])[0] # batch size, tensor 373 | mf = K.cast(m, K.dtype(yolo_outputs[0])) 374 | 375 | for l in range(num_layers): 376 | object_mask = y_true[l][..., 4:5] 377 | true_class_probs = y_true[l][..., 5:] 378 | 379 | grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l], 380 | anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True) 381 | pred_box = K.concatenate([pred_xy, pred_wh]) 382 | 383 | # Darknet raw box to calculate loss. 384 | raw_true_xy = y_true[l][..., :2] * grid_shapes[l][::-1] - grid 385 | raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1]) 386 | raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh)) # avoid log(0)=-inf 387 | box_loss_scale = 2 - y_true[l][..., 2:3] * y_true[l][..., 3:4] 388 | 389 | # Find ignore mask, iterate over each of batch. 390 | ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True) 391 | object_mask_bool = K.cast(object_mask, 'bool') 392 | 393 | def loop_body(b, ignore_mask): 394 | true_box = tf.boolean_mask(y_true[l][b, ..., 0:4], object_mask_bool[b, ..., 0]) 395 | iou = box_iou(pred_box[b], true_box) 396 | best_iou = K.max(iou, axis=-1) 397 | ignore_mask = ignore_mask.write(b, K.cast(best_iou < ignore_thresh, K.dtype(true_box))) 398 | return b + 1, ignore_mask 399 | 400 | _, ignore_mask = tf.while_loop(lambda b, *args: b < m, loop_body, [0, ignore_mask]) 401 | ignore_mask = ignore_mask.stack() 402 | ignore_mask = K.expand_dims(ignore_mask, -1) 403 | 404 | # K.binary_crossentropy is helpful to avoid exp overflow. 
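        # Loss terms: xy/wh are counted only where an object exists (object_mask);
        # confidence is counted everywhere, except that no-object cells whose best
        # IoU with any ground-truth box exceeds ignore_thresh are masked out via
        # ignore_mask; class loss is per-class binary cross-entropy on object
        # cells. box_loss_scale = 2 - w*h gives small boxes a larger weight.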
405 | xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[..., 0:2],
406 | from_logits=True)
407 | wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh - raw_pred[..., 2:4])
408 | confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) + \
409 | (1 - object_mask) * K.binary_crossentropy(object_mask, raw_pred[..., 4:5],
410 | from_logits=True) * ignore_mask
411 | class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[..., 5:], from_logits=True)
412 | 
413 | xy_loss = K.sum(xy_loss) / mf
414 | wh_loss = K.sum(wh_loss) / mf
415 | confidence_loss = K.sum(confidence_loss) / mf
416 | class_loss = K.sum(class_loss) / mf
417 | loss += xy_loss + wh_loss + confidence_loss + class_loss
418 | if print_loss:
419 | # tf.print in TF2 prints as a side effect and returns nothing (unlike the old tf.Print)
420 | tf.print('loss:', loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask))
421 | return loss
422 | 
423 | 
424 | def create_model(input_shape, anchors, num_classes, weights_path, load_pretrained=True, freeze_body=2):
425 | '''create the training model'''
426 | K.clear_session() # get a new session
427 | image_input = Input(shape=(None, None, 3))
428 | h, w = input_shape
429 | num_anchors = len(anchors)
430 | 
431 | y_true = [Input(shape=(h // {0: 32, 1: 16, 2: 8}[l], w // {0: 32, 1: 16, 2: 8}[l], \
432 | num_anchors // 3, num_classes + 5)) for l in range(3)]
433 | 
434 | model_body = yolo_body(image_input, num_anchors // 3, num_classes)
435 | print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))
436 | 
437 | if load_pretrained:
438 | model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
439 | print('Load weights {}.'.format(weights_path))
440 | if freeze_body in [1, 2]:
441 | # Freeze darknet53 body or freeze all but 3 output layers.
442 | num = (185, len(model_body.layers) - 3)[freeze_body - 1]
443 | for i in range(num): model_body.layers[i].trainable = False
444 | print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))
445 | 
446 | model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
447 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})(
448 | [*model_body.output, *y_true])
449 | model = Model([model_body.input, *y_true], model_loss)
450 | 
451 | return model
452 | 
453 | 
454 | def create_tiny_model(input_shape, anchors, num_classes, weights_path, load_pretrained=True, freeze_body=2):
455 | '''create the training model, for Tiny YOLOv3'''
456 | K.clear_session() # get a new session
457 | image_input = Input(shape=(None, None, 3))
458 | h, w = input_shape
459 | num_anchors = len(anchors)
460 | 
461 | y_true = [Input(shape=(h // {0: 32, 1: 16}[l], w // {0: 32, 1: 16}[l], \
462 | num_anchors // 2, num_classes + 5)) for l in range(2)]
463 | 
464 | model_body = tiny_yolo_body(image_input, num_anchors // 2, num_classes)
465 | print('Create Tiny YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))
466 | 
467 | if load_pretrained:
468 | model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
469 | print('Load weights {}.'.format(weights_path))
470 | if freeze_body in [1, 2]:
471 | # Freeze the darknet body or freeze all but 2 output layers.
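        # freeze_body=1 freezes the first 20 layers (the tiny backbone);
        # freeze_body=2 freezes everything except the last 2 output conv layers.
        # The tuple indexing below picks between the two counts.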
472 | num = (20, len(model_body.layers) - 2)[freeze_body - 1] 473 | for i in range(num): model_body.layers[i].trainable = False 474 | print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers))) 475 | 476 | model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss', 477 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.7})( 478 | [*model_body.output, *y_true]) 479 | model = Model([model_body.input, *y_true], model_loss) 480 | 481 | return model 482 | -------------------------------------------------------------------------------- /xyolo/yolo3/utils.py: -------------------------------------------------------------------------------- 1 | """Miscellaneous utility functions.""" 2 | 3 | from functools import reduce 4 | 5 | import numpy as np 6 | from PIL import Image 7 | from matplotlib.colors import rgb_to_hsv, hsv_to_rgb 8 | 9 | 10 | def compose(*funcs): 11 | """Compose arbitrarily many functions, evaluated left to right. 12 | 13 | Reference: https://mathieularose.com/function-composition-in-python/ 14 | """ 15 | # return lambda x: reduce(lambda v, f: f(v), funcs, x) 16 | if funcs: 17 | return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs) 18 | else: 19 | raise ValueError('Composition of empty sequence not supported.') 20 | 21 | 22 | def letterbox_image(image, size): 23 | '''resize image with unchanged aspect ratio using padding''' 24 | iw, ih = image.size 25 | w, h = size 26 | scale = min(w / iw, h / ih) 27 | nw = int(iw * scale) 28 | nh = int(ih * scale) 29 | 30 | image = image.resize((nw, nh), Image.BICUBIC) 31 | new_image = Image.new('RGB', size, (128, 128, 128)) 32 | new_image.paste(image, ((w - nw) // 2, (h - nh) // 2)) 33 | return new_image 34 | 35 | 36 | def rand(a=0, b=1): 37 | return np.random.rand() * (b - a) + a 38 | 39 | 40 | def get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.3, hue=.1, sat=1.5, val=1.5, 41 | proc_img=True): 42 | '''random preprocessing for real-time data augmentation''' 43 | line = annotation_line.split() 44 | image = Image.open(line[0]) 45 | iw, ih = image.size 46 | h, w = input_shape 47 | box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]]) 48 | 49 | if not random: 50 | # resize image 51 | scale = min(w / iw, h / ih) 52 | nw = int(iw * scale) 53 | nh = int(ih * scale) 54 | dx = (w - nw) // 2 55 | dy = (h - nh) // 2 56 | image_data = 0 57 | if proc_img: 58 | image = image.resize((nw, nh), Image.BICUBIC) 59 | new_image = Image.new('RGB', (w, h), (128, 128, 128)) 60 | new_image.paste(image, (dx, dy)) 61 | image_data = np.array(new_image) / 255. 
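            # image_data is now the letterboxed image as an (h, w, 3) float array in [0, 1]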
62 | 63 | # correct boxes 64 | box_data = np.zeros((max_boxes, 5)) 65 | if len(box) > 0: 66 | np.random.shuffle(box) 67 | if len(box) > max_boxes: box = box[:max_boxes] 68 | box[:, [0, 2]] = box[:, [0, 2]] * scale + dx 69 | box[:, [1, 3]] = box[:, [1, 3]] * scale + dy 70 | box_data[:len(box)] = box 71 | 72 | return image_data, box_data 73 | 74 | # resize image 75 | new_ar = w / h * rand(1 - jitter, 1 + jitter) / rand(1 - jitter, 1 + jitter) 76 | scale = rand(.25, 2) 77 | if new_ar < 1: 78 | nh = int(scale * h) 79 | nw = int(nh * new_ar) 80 | else: 81 | nw = int(scale * w) 82 | nh = int(nw / new_ar) 83 | image = image.resize((nw, nh), Image.BICUBIC) 84 | 85 | # place image 86 | dx = int(rand(0, w - nw)) 87 | dy = int(rand(0, h - nh)) 88 | new_image = Image.new('RGB', (w, h), (128, 128, 128)) 89 | new_image.paste(image, (dx, dy)) 90 | image = new_image 91 | 92 | # flip image or not 93 | flip = rand() < .5 94 | if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT) 95 | 96 | # distort image 97 | hue = rand(-hue, hue) 98 | sat = rand(1, sat) if rand() < .5 else 1 / rand(1, sat) 99 | val = rand(1, val) if rand() < .5 else 1 / rand(1, val) 100 | x = rgb_to_hsv(np.array(image) / 255.) 101 | x[..., 0] += hue 102 | x[..., 0][x[..., 0] > 1] -= 1 103 | x[..., 0][x[..., 0] < 0] += 1 104 | x[..., 1] *= sat 105 | x[..., 2] *= val 106 | x[x > 1] = 1 107 | x[x < 0] = 0 108 | image_data = hsv_to_rgb(x) # numpy array, 0 to 1 109 | 110 | # correct boxes 111 | box_data = np.zeros((max_boxes, 5)) 112 | if len(box) > 0: 113 | np.random.shuffle(box) 114 | box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx 115 | box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy 116 | if flip: box[:, [0, 2]] = w - box[:, [2, 0]] 117 | box[:, 0:2][box[:, 0:2] < 0] = 0 118 | box[:, 2][box[:, 2] > w] = w 119 | box[:, 3][box[:, 3] > h] = h 120 | box_w = box[:, 2] - box[:, 0] 121 | box_h = box[:, 3] - box[:, 1] 122 | box = box[np.logical_and(box_w > 1, box_h > 1)] # discard invalid box 123 | if len(box) > max_boxes: box = box[:max_boxes] 124 | box_data[:len(box)] = box 125 | 126 | return image_data, box_data -------------------------------------------------------------------------------- /xyolo/yolo3/yolo.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Class definition of YOLO_v3 style detection model on image and video 4 | """ 5 | 6 | import colorsys 7 | import os 8 | import typing 9 | from timeit import default_timer as timer 10 | 11 | import numpy as np 12 | import tensorflow as tf 13 | from PIL import Image 14 | from loguru import logger 15 | from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping 16 | from tensorflow.keras.layers import Input 17 | from tensorflow.keras.models import load_model 18 | from tensorflow.keras.optimizers import Adam 19 | from tensorflow.keras.utils import multi_gpu_model 20 | 21 | from xyolo.config import DefaultYolo3Config 22 | from xyolo.yolo3.model import create_model, create_tiny_model 23 | from xyolo.yolo3.model import yolo_eval, yolo_body, tiny_yolo_body, preprocess_true_boxes 24 | from xyolo.yolo3.utils import get_random_data 25 | from xyolo.yolo3.utils import letterbox_image 26 | 27 | 28 | class YOLO(object): 29 | 30 | def __init__(self, config=None, train=False, **kwargs): 31 | if not config: 32 | config = DefaultYolo3Config() 33 | self.config = config 34 | self.model_path = config.model_path 35 | self.anchors_path = config.anchors_path 36 | self.classes_path = config.classes_path 
37 | self.score = config.score 38 | self.iou = config.iou 39 | self.model_image_size = config.model_image_size 40 | self.gpu_num = config.gpu_num 41 | self.dataset_path = config.dataset_path 42 | self.__dict__.update(kwargs) # update with user overrides 43 | self.class_names = self._get_class() 44 | self.anchors = self._get_anchors() 45 | if not train: 46 | self.load_yolo_model() 47 | 48 | def _get_class(self): 49 | classes_path = os.path.expanduser(self.classes_path) 50 | with open(classes_path) as f: 51 | class_names = f.readlines() 52 | class_names = [c.strip() for c in class_names] 53 | return class_names 54 | 55 | def _get_anchors(self): 56 | anchors_path = os.path.expanduser(self.anchors_path) 57 | with open(anchors_path) as f: 58 | anchors = f.readline() 59 | anchors = [float(x) for x in anchors.split(',')] 60 | return np.array(anchors).reshape(-1, 2) 61 | 62 | def load_yolo_model(self): 63 | self.model_path = self.config.model_path 64 | model_path = os.path.expanduser(self.model_path) 65 | assert model_path.endswith( 66 | '.h5'), 'Keras model or weights must be a .h5 file.' 67 | 68 | # Load model, or construct model and load weights. 69 | num_anchors = len(self.anchors) 70 | num_classes = len(self.class_names) 71 | is_tiny_version = num_anchors == 6 # default setting 72 | try: 73 | self.yolo_model = load_model(model_path, compile=False) 74 | except: 75 | self.yolo_model = tiny_yolo_body(Input(shape=(None, None, 3)), num_anchors // 2, num_classes) \ 76 | if is_tiny_version else yolo_body(Input(shape=(None, None, 3)), num_anchors // 3, num_classes) 77 | # make sure model, anchors and classes match 78 | self.yolo_model.load_weights(self.model_path) 79 | else: 80 | assert self.yolo_model.layers[-1].output_shape[-1] == \ 81 | num_anchors / len(self.yolo_model.output) * (num_classes + 5), \ 82 | 'Mismatch between model and given anchor and class sizes' 83 | 84 | print('{} model, anchors, and classes loaded.'.format(model_path)) 85 | 86 | # Generate colors for drawing bounding boxes. 87 | hsv_tuples = [(x / len(self.class_names), 1., 1.) 88 | for x in range(len(self.class_names))] 89 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 90 | self.colors = list( 91 | map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), 92 | self.colors)) 93 | np.random.seed(10101) # Fixed seed for consistent colors across runs. 94 | # Shuffle colors to decorrelate adjacent classes. 95 | np.random.shuffle(self.colors) 96 | np.random.seed(None) # Reset seed to default. 97 | 98 | @tf.function 99 | def compute_output(self, image_data, image_shape): 100 | # Generate output tensor targets for filtered bounding boxes. 
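        # Wrapped in @tf.function above, so the forward pass plus the NMS
        # post-processing in yolo_eval run as one compiled graph; image_shape is
        # the original (height, width), used to undo the letterbox transform.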
101 | # self.input_image_shape = K.placeholder(shape=(2,))
102 | self.input_image_shape = tf.constant(image_shape)
103 | if self.gpu_num >= 2:
104 | self.yolo_model = multi_gpu_model(
105 | self.yolo_model, gpus=self.gpu_num)
106 | 
107 | boxes, scores, classes = yolo_eval(self.yolo_model(image_data), self.anchors,
108 | len(self.class_names), self.input_image_shape,
109 | score_threshold=self.score, iou_threshold=self.iou)
110 | return boxes, scores, classes
111 | 
112 | @classmethod
113 | def data_generator(cls, annotation_lines, batch_size, input_shape, anchors, num_classes):
114 | '''data generator for fit_generator'''
115 | n = len(annotation_lines)
116 | i = 0
117 | while True:
118 | image_data = []
119 | box_data = []
120 | for b in range(batch_size):
121 | if i == 0:
122 | np.random.shuffle(annotation_lines)
123 | image, box = get_random_data(
124 | annotation_lines[i], input_shape, random=True)
125 | image_data.append(image)
126 | box_data.append(box)
127 | i = (i + 1) % n
128 | image_data = np.array(image_data)
129 | box_data = np.array(box_data)
130 | y_true = preprocess_true_boxes(
131 | box_data, input_shape, anchors, num_classes)
132 | yield [image_data, *y_true], np.zeros(batch_size)
133 | 
134 | @classmethod
135 | def data_generator_wrapper(cls, annotation_lines, batch_size, input_shape, anchors, num_classes):
136 | n = len(annotation_lines)
137 | if n == 0 or batch_size <= 0:
138 | return None
139 | return cls.data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes)
140 | 
141 | def fit(self, **kwargs):
142 | # The following parameters can be passed in to override the config values for more flexibility;
143 | # if they are not given, the defaults from config are used
144 | dataset_path = kwargs.get('dataset_path', self.config.dataset_path)
145 | tensorboard_log_path = kwargs.get(
146 | 'tensorboard_log_path', self.config.tensorboard_log_path)
147 | output_model_path = kwargs.get(
148 | 'output_model_path', self.config.output_model_path)
149 | frozen_train = kwargs.get('frozen_train', self.config.frozen_train)
150 | frozen_train_epochs = kwargs.get(
151 | 'frozen_train_epochs', self.config.frozen_train_epochs)
152 | frozen_batch_size = kwargs.get(
153 | 'frozen_batch_size', self.config.frozen_batch_size)
154 | frozen_lr = kwargs.get('frozen_lr', self.config.frozen_lr)
155 | unfreeze_train = kwargs.get(
156 | 'unfreeze_train', self.config.unfreeze_train)
157 | unfreeze_train_epochs = kwargs.get(
158 | 'unfreeze_train_epochs', self.config.unfreeze_train_epochs)
159 | unfreeze_batch_size = kwargs.get(
160 | 'unfreeze_batch_size', self.config.unfreeze_batch_size)
161 | unfreeze_lr = kwargs.get('unfreeze_lr', self.config.unfreeze_lr)
162 | initial_weight_path = kwargs.get(
163 | 'initial_weight_path', self.config.pre_training_weights_keras_path)
164 | use_tensorboard = kwargs.get(
165 | 'use_tensorboard', self.config.use_tensorboard)
166 | use_checkpoint = kwargs.get(
167 | 'use_checkpoint', self.config.use_checkpoint)
168 | val_split = kwargs.get('val_split', self.config.val_split)
169 | use_reduce_lr = kwargs.get('use_reduce_lr', self.config.use_reduce_lr)
170 | reduce_lr_monitor = kwargs.get(
171 | 'reduce_lr_monitor', self.config.reduce_lr_monitor)
172 | reduce_lr_factor = kwargs.get(
173 | 'reduce_lr_factor', self.config.reduce_lr_factor)
174 | reduce_lr_patience = kwargs.get(
175 | 'reduce_lr_patience', self.config.reduce_lr_patience)
176 | use_early_stopping = kwargs.get(
177 | 'use_early_stopping', self.config.use_early_stopping)
178 | early_stopping_monitor = kwargs.get(
179 | 'early_stopping_monitor', self.config.early_stopping_monitor)
180 | 
early_stopping_min_delta = kwargs.get( 181 | 'early_stopping_min_delta', self.config.early_stopping_min_delta) 182 | early_stopping_patience = kwargs.get( 183 | 'early_stopping_patience', self.config.early_stopping_patience) 184 | 185 | is_tiny_version = len(self.anchors) == 6 # default setting 186 | num_classes = len(self.class_names) 187 | if is_tiny_version: 188 | model = create_tiny_model(self.model_image_size, self.anchors, num_classes, 189 | freeze_body=2, weights_path=initial_weight_path) 190 | else: 191 | model = create_model(self.model_image_size, self.anchors, num_classes, 192 | freeze_body=2, 193 | weights_path=initial_weight_path) # make sure you know what you freeze 194 | 195 | logger.info('Prepare to train the model...') 196 | 197 | callbacks = [] 198 | if use_tensorboard: 199 | logging = TensorBoard(log_dir=tensorboard_log_path) 200 | callbacks.append(logging) 201 | if use_checkpoint: 202 | checkpoint = ModelCheckpoint( 203 | tensorboard_log_path + 204 | 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5', 205 | monitor='val_loss', save_weights_only=True, save_best_only=True) 206 | callbacks.append(checkpoint) 207 | 208 | logger.info('Split dataset for validate...') 209 | with open(dataset_path) as f: 210 | lines = f.readlines() 211 | np.random.seed(10101) 212 | np.random.shuffle(lines) 213 | np.random.seed(None) 214 | num_val = int(len(lines) * val_split) 215 | num_train = len(lines) - num_val 216 | 217 | logger.info('The first step training begins({} epochs).'.format( 218 | frozen_train_epochs)) 219 | # Train with frozen layers first, to get a stable loss. 220 | # Adjust num epochs to your dataset. This step is enough to obtain a not bad model. 221 | if frozen_train: 222 | model.compile(optimizer=Adam(lr=frozen_lr), loss={ 223 | # use custom yolo_loss Lambda layer. 224 | 'yolo_loss': lambda y_true, y_pred: y_pred}) 225 | 226 | batch_size = frozen_batch_size 227 | logger.info( 228 | 'Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 229 | model.fit( 230 | self.data_generator_wrapper(lines[:num_train], batch_size, self.model_image_size, self.anchors, 231 | num_classes), 232 | steps_per_epoch=max(1, num_train // batch_size), 233 | validation_data=self.data_generator_wrapper(lines[num_train:], batch_size, self.model_image_size, 234 | self.anchors, 235 | num_classes), 236 | validation_steps=max(1, num_val // batch_size), 237 | epochs=frozen_train_epochs, 238 | initial_epoch=0, 239 | callbacks=callbacks) 240 | 241 | logger.info('The second step training begins({} epochs).'.format( 242 | unfreeze_train_epochs)) 243 | # Unfreeze and continue training, to fine-tune. 244 | # Train longer if the result is not good. 
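        # ReduceLROnPlateau / EarlyStopping are only attached for this second
        # stage, so the frozen warm-up above runs at a fixed learning rate while
        # the fine-tuning below can decay its LR and stop early when the
        # monitored metric stalls.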
245 | if use_reduce_lr:
246 | reduce_lr = ReduceLROnPlateau(monitor=reduce_lr_monitor, factor=reduce_lr_factor,
247 | patience=reduce_lr_patience, verbose=1)
248 | callbacks.append(reduce_lr)
249 | if use_early_stopping:
250 | early_stopping = EarlyStopping(monitor=early_stopping_monitor, min_delta=early_stopping_min_delta,
251 | patience=early_stopping_patience, verbose=1)
252 | callbacks.append(early_stopping)
253 | if unfreeze_train:
254 | for i in range(len(model.layers)):
255 | model.layers[i].trainable = True
256 | model.compile(optimizer=Adam(lr=unfreeze_lr),
257 | loss={'yolo_loss': lambda y_true, y_pred: y_pred}) # recompile to apply the change
258 | logger.info('Unfreeze all of the layers.')
259 | 
260 | # note that more GPU memory is required after unfreezing the body
261 | batch_size = unfreeze_batch_size
262 | logger.info('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val,
263 | batch_size))
264 | model.fit(
265 | self.data_generator_wrapper(lines[:num_train], batch_size, self.model_image_size, self.anchors,
266 | num_classes),
267 | steps_per_epoch=max(1, num_train // batch_size),
268 | validation_data=self.data_generator_wrapper(lines[num_train:], batch_size, self.model_image_size,
269 | self.anchors,
270 | num_classes),
271 | validation_steps=max(1, num_val // batch_size),
272 | epochs=frozen_train_epochs + unfreeze_train_epochs,
273 | initial_epoch=frozen_train_epochs,
274 | callbacks=callbacks)
275 | model.save_weights(output_model_path)
276 | logger.info('Training completed!')
277 | 
278 | def detect_image(self, img: typing.Union[Image.Image, str]) -> typing.List[
279 | typing.Tuple[str, int, float, int, int, int, int]]:
280 | """
281 | Run object detection on the given image and return the detection results
282 | 
283 | Args:
284 | img: the image to detect, as a PIL.Image.Image object or a file path (str)
285 | 
286 | Returns:
287 | [(class name, class id, score, top-left x, top-left y, bottom-right x, bottom-right y), ...]
288 | """
289 | # Accept both a str image path and a PIL.Image.Image object
290 | if isinstance(img, str):
291 | image = Image.open(img)
292 | else:
293 | image = img
294 | assert isinstance(image, Image.Image)
295 | start = timer()
296 | if self.model_image_size != (None, None):
297 | assert self.model_image_size[0] % 32 == 0, 'Multiples of 32 required'
298 | assert self.model_image_size[1] % 32 == 0, 'Multiples of 32 required'
299 | boxed_image = letterbox_image(
300 | image, tuple(reversed(self.model_image_size)))
301 | else:
302 | new_image_size = (image.width - (image.width % 32),
303 | image.height - (image.height % 32))
304 | boxed_image = letterbox_image(image, new_image_size)
305 | image_data = np.array(boxed_image, dtype='float32')
306 | 
307 | image_data /= 255.
308 | image_data = np.expand_dims(image_data, 0) # Add batch dimension.
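        # image_data is now a (1, height, width, 3) float batch in [0, 1]; the
        # original image's (height, width) is passed below so that the predicted
        # boxes can be mapped back to source-image pixels.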
309 | 
310 | out_boxes, out_scores, out_classes = self.compute_output(
311 | image_data, [image.size[1], image.size[0]])
312 | 
313 | logger.debug('Found {} boxes for {}'.format(len(out_boxes), 'img'))
314 | 
315 | results = []
316 | for i, c in reversed(list(enumerate(out_classes))):
317 | predicted_class = self.class_names[c]
318 | box = out_boxes[i]
319 | score = out_scores[i]
320 | 
321 | label = '{} {:.2f}'.format(predicted_class, score)
322 | 
323 | top, left, bottom, right = box
324 | top = max(0, np.floor(top + 0.5).astype('int32'))
325 | left = max(0, np.floor(left + 0.5).astype('int32'))
326 | bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
327 | right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
328 | results.append((predicted_class, int(c), float(
329 | score), left, top, right, bottom))
330 | logger.debug('Class {}, Position {}, {}'.format(
331 | label, (left, top), (right, bottom)))
332 | 
333 | end = timer()
334 | logger.debug('Cost time {}s'.format(end - start))
335 | return results
336 | 
337 | def draw_image(self, img: typing.Union[Image.Image, str], predicted_results: typing.List[
338 | typing.Tuple[str, int, float, int, int, int, int]], draw_label=True) -> Image.Image:
339 | """
340 | Draw the given detection results onto the image and return the annotated image
341 | 
342 | Args:
343 | img: the image to draw on, as a PIL.Image.Image object or a file path (str)
344 | predicted_results: detection results, [(class name, class id, score, top-left x, top-left y, bottom-right x, bottom-right y), ...]
345 | draw_label: whether to annotate each box with its class and score
346 | 
347 | Returns:
348 | the image object with the detection results drawn on it
349 | """
350 | import cv2
351 | # Accept both a str image path and a PIL.Image.Image object
352 | if isinstance(img, str):
353 | image = Image.open(img)
354 | else:
355 | image = img
356 | assert isinstance(image, Image.Image)
357 | 
358 | img_array = np.asarray(image.convert('RGB'))
359 | for predicted_class, c, score, x1, y1, x2, y2 in predicted_results:
360 | color = self.colors[c]
361 | cv2.rectangle(img_array, (x1, y1), (x2, y2), color, 2)
362 | if draw_label:
363 | label = '{} {:.2f}'.format(predicted_class, score)
364 | cv2.putText(img_array, text=label, org=(x2 + 3, y1 + 10), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
365 | fontScale=0.50, color=color, thickness=2)
366 | image = Image.fromarray(img_array)
367 | return image
368 | 
369 | def detect_and_draw_image(self, image: typing.Union[Image.Image, str], draw_label=True) -> Image.Image:
370 | """
371 | Run object detection on the given image, then draw the resulting boxes and labels on it
372 | 
373 | Args:
374 | image: the image to detect, as a PIL.Image.Image object or a file path (str)
375 | draw_label: whether to annotate each box with its class and score
376 | 
377 | Returns:
378 | the image object with the detection results drawn on it
379 | """
380 | predicted_results = self.detect_image(image)
381 | img = self.draw_image(image, predicted_results, draw_label=draw_label)
382 | return img
383 | 
384 | def detect_video(self, video_path, output_path=""):
385 | import cv2
386 | vid = cv2.VideoCapture(video_path)
387 | if not vid.isOpened():
388 | raise IOError("Couldn't open webcam or video")
389 | video_FourCC = int(vid.get(cv2.CAP_PROP_FOURCC))
390 | video_fps = vid.get(cv2.CAP_PROP_FPS)
391 | video_size = (int(vid.get(cv2.CAP_PROP_FRAME_WIDTH)),
392 | int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT)))
393 | isOutput = output_path != ""
394 | if isOutput:
395 | print("!!! TYPE:", type(output_path), type(
396 | video_FourCC), type(video_fps), type(video_size))
397 | out = cv2.VideoWriter(
398 | output_path, video_FourCC, video_fps, video_size)
399 | accum_time = 0
400 | curr_fps = 0
401 | fps = "FPS: ??"
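        # The loop below renders detections frame by frame; curr_fps counts the
        # frames drawn in the current ~1-second window and fps keeps the last
        # completed count.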
402 | prev_time = timer()
403 | while True:
404 | return_value, frame = vid.read()
405 | if not return_value:
406 | break # stop when the stream ends; frame would be None below otherwise
407 | b, g, r = cv2.split(frame)
408 | frame = cv2.merge([r, g, b])
409 | image = Image.fromarray(frame)
410 | image = self.detect_and_draw_image(image)
411 | result = np.asarray(image)
412 | r, g, b = cv2.split(result)
413 | result = cv2.merge([b, g, r])
414 | curr_time = timer()
415 | exec_time = curr_time - prev_time
416 | prev_time = curr_time
417 | accum_time = accum_time + exec_time
418 | curr_fps = curr_fps + 1
419 | if accum_time > 1:
420 | accum_time = accum_time - 1
421 | fps = "FPS: " + str(curr_fps)
422 | curr_fps = 0
423 | cv2.putText(result, text=fps, org=(3, 15), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
424 | fontScale=0.50, color=(255, 0, 0), thickness=2)
425 | cv2.namedWindow("result", cv2.WINDOW_NORMAL)
426 | cv2.imshow("result", result)
427 | if isOutput:
428 | out.write(result)
429 | if cv2.waitKey(1) & 0xFF == ord('q'):
430 | break
431 | vid.release()
432 | if isOutput:
433 | out.release()
434 | cv2.destroyAllWindows()
435 | --------------------------------------------------------------------------------
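
A minimal end-to-end usage sketch of the YOLO class defined above (illustrative only: it assumes a DefaultYolo3Config whose model/anchors/classes paths point at valid files, and 'test.jpg' / 'detect_result.jpg' are placeholder paths):

    from xyolo.config import DefaultYolo3Config
    from xyolo.yolo3.yolo import YOLO

    # Build a detector from the default config; train=False loads the weights for inference.
    config = DefaultYolo3Config()
    yolo = YOLO(config=config)

    # detect_image returns [(class_name, class_id, score, x1, y1, x2, y2), ...],
    # and draw_image renders those boxes (and optional labels) onto the image.
    results = yolo.detect_image('test.jpg')
    img = yolo.draw_image('test.jpg', results)
    img.save('detect_result.jpg')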