├── .asserts
│   ├── detect.jpg
│   ├── image-20201103233753811.png
│   └── 下载.jpeg
├── LICENSE
├── README.md
├── setup.py
├── tests
│   ├── eval_default.py
│   ├── eval_mydata.py
│   ├── specify_pre_training_weight.py
│   ├── test_voc2xyolo.py
│   ├── train.py
│   └── use_proxies.py
└── xyolo
    ├── __init__.py
    ├── config.py
    ├── convert.py
    ├── init_yolo.py
    ├── preprocessing.py
    ├── xyolo_data
    │   ├── coco_classes.txt
    │   ├── yolo_anchors.txt
    │   └── yolov3.cfg
    └── yolo3
        ├── __init__.py
        ├── model.py
        ├── utils.py
        └── yolo.py

/.asserts/detect.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AaronJny/xyolo/cc249687a848272a873296d0dbae86b4fb33c951/.asserts/detect.jpg
--------------------------------------------------------------------------------

/.asserts/image-20201103233753811.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AaronJny/xyolo/cc249687a848272a873296d0dbae86b4fb33c951/.asserts/image-20201103233753811.png
--------------------------------------------------------------------------------

/.asserts/下载.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AaronJny/xyolo/cc249687a848272a873296d0dbae86b4fb33c951/.asserts/下载.jpeg
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 qqwweee

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# xyolo

`xyolo` is a highly encapsulated YOLO v3 library implemented in Python.

With xyolo, you can easily complete the training and invocation of a YOLOv3 object-detection task with just a few lines of Python code.

Note:

> I use Anaconda's Python 3.7 distribution, initialized in my shell (so python and pip point to the currently activated environment rather than the system python2). Decide for yourself whether the python and pip commands below need to be replaced with python3 and pip3.

PS:

> This project is a refactoring and encapsulation of the [tf2-keras-yolo3](https://github.com/AaronJny/tf2-keras-yolo3) project.


## 1. Installation

### 1.1 General installation

Installing `xyolo` is very simple: one pip command. Note that `xyolo` requires TensorFlow >= 2.2 (if TensorFlow is not installed, it will be installed automatically).

```
pip install --user xyolo
```

The `--user` flag is recommended to avoid permission problems.

If you can, it's even better to create a fresh environment with conda and install there; this avoids unexpected dependency conflicts.

### 1.2 GPU installation

If you want the GPU build of TensorFlow, install tensorflow-gpu >= 2.2 before installing xyolo. Taking conda as the example, the steps for GPU support are as follows.

#### Step 1. Create a virtual environment named xyolo

```
conda create -n xyolo python=3.7
```

#### Step 2. Switch into the freshly created environment

```
conda activate xyolo
```

After switching, you can inspect the installed packages with pip:

```
pip list
```

As expected, the environment is clean; it's brand new after all:

```
Package    Version
---------- -------------------
certifi    2020.6.20
pip        20.2.4
setuptools 50.3.0.post20201006
wheel      0.35.1
```

#### Step 3. Install tensorflow-gpu in the new environment

> Note: before this step you need graphics drivers matching your TensorFlow version. That part is a bit involved, so I won't cover it here; I suspect anyone choosing to run xyolo on a GPU has already mastered it.
>
> After all, if you have never touched tensorflow-gpu, the CPU build is the easier way to get started with xyolo.

We install tensorflow-gpu through conda: the GPU build of TensorFlow depends on CUDA and cuDNN, and conda resolves the version dependencies and configuration of both automatically.

```
conda install tensorflow-gpu=2.2
```

Wait quietly for a while and the tensorflow-gpu installation completes.

#### Step 4. Install xyolo with pip

```
pip install --user xyolo
```

Running `pip list` again shows xyolo and its dependencies successfully installed:

```
Package                Version
---------------------- -------------------
absl-py                0.11.0
aiohttp                3.6.3
astunparse             1.6.3
async-timeout          3.0.1
attrs                  20.2.0
blinker                1.4
brotlipy               0.7.0
cachetools             4.1.1
certifi                2020.6.20
cffi                   1.14.3
chardet                3.0.4
click                  7.1.2
cryptography           3.1.1
cycler                 0.10.0
gast                   0.3.3
google-auth            1.23.0
google-auth-oauthlib   0.4.2
google-pasta           0.2.0
grpcio                 1.31.0
h5py                   2.10.0
idna                   2.10
importlib-metadata     2.0.0
Keras-Preprocessing    1.1.0
kiwisolver             1.3.1
loguru                 0.5.3
lxml                   4.6.1
Markdown               3.3.2
matplotlib             3.3.2
mkl-fft                1.2.0
mkl-random             1.1.1
mkl-service            2.3.0
multidict              4.7.6
numpy                  1.18.5
oauthlib               3.1.0
opencv-python          4.4.0.46
opt-einsum             3.1.0
Pillow                 8.0.1
pip                    20.2.4
protobuf               3.13.0
pyasn1                 0.4.8
pyasn1-modules         0.2.8
pycparser              2.20
PyJWT                  1.7.1
pyOpenSSL              19.1.0
pyparsing              2.4.7
PySocks                1.7.1
python-dateutil        2.8.1
requests               2.24.0
requests-oauthlib      1.3.0
rsa                    4.6
scipy                  1.4.1
setuptools             50.3.0.post20201006
six                    1.15.0
tensorboard            2.2.2
tensorboard-plugin-wit 1.6.0
tensorflow             2.2.0
tensorflow-estimator   2.2.0
termcolor              1.1.0
tqdm                   4.51.0
urllib3                1.25.11
Werkzeug               1.0.1
wheel                  0.35.1
wrapt                  1.12.1
xyolo                  0.1.3
yarl                   1.6.2
zipp                   3.4.0
```
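
To quickly confirm that the environment picked up a usable TensorFlow build, a check like the following helps (a minimal sketch; whether the GPU list is non-empty depends on your drivers and on which build you installed):

```python
import tensorflow as tf

print(tf.__version__)  # xyolo needs >= 2.2
# A non-empty list means the GPU build sees at least one usable device
print(tf.config.list_physical_devices('GPU'))
```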

## 2. Usage

### 2.1 Object detection with the official pretrained weights

The YOLOv3 authors provide official pretrained weights. If the images we want to detect come from the same or a similar distribution as the COCO dataset (the default class list is COCO's 80 classes), the official pretrained weights can be used directly.

xyolo's logic for calling the official pretrained model is:

- 1. Download the pretrained weights from the official site (Darknet output format)
- 2. Convert the Darknet weights file into a Keras weights file (TensorFlow and Keras are basically one family by now)
- 3. Build the model and load the pretrained weights
- 4. Run detection on the chosen image

Sounds like a hassle, right? Surely lots of code? Of course not~ smug-grin.jpg

![下载](.asserts/下载.jpeg)



First, prepare an image for detection, say at `./xyolo_data/detect.jpg`, with the following content:

![detect](.asserts/detect.jpg)

A simple example looks like this:

```python
# Import packages
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3


# Create a default config object
config = DefaultYolo3Config()
# Initialize xyolo (downloading and converting the pretrained weights happens here)
# Download and conversion only run on the first call; later calls use the cached files
init_yolo_v3(config)
# Create a YOLO object, which exposes the YOLOv3 detection and training interfaces
yolo = YOLO(config)

# Detect objects and draw them on the image
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
# Show the annotated image
img.show()
```

The output:

```
2020-11-03 23:33:49.645 | DEBUG | xyolo.yolo3.yolo:detect_image:273 - Found 3 boxes for img
2020-11-03 23:33:49.648 | DEBUG | xyolo.yolo3.yolo:detect_image:289 - Class dog 0.99,Position (402, 176), (586, 310)
2020-11-03 23:33:49.650 | DEBUG | xyolo.yolo3.yolo:detect_image:289 - Class bicycle 0.96,Position (0, 80), (383, 412)
2020-11-03 23:33:49.652 | DEBUG | xyolo.yolo3.yolo:detect_image:289 - Class person 1.00,Position (9, 1), (200, 410)
2020-11-03 23:33:49.652 | DEBUG | xyolo.yolo3.yolo:detect_image:292 - Cost time 6.65432205500838s
```

A window with the annotated image also opened (the inconsistent image sizes are an artifact of my screenshots and layout):

![image-20201103233753811](.asserts/image-20201103233753811.png)

And that's a complete detection run. Satisfying~

Of course, I'm sure quite a few readers hit problems or questions while running this code, so let me answer the likely ones in one place.

1. The pretrained weights download slowly or not at all

> The official yolov3 site is hosted abroad, so slow downloads from China are normal. Use a proxy, or download from the backup address and then point xyolo at the weights file. See section 2.2 for details.

2. Permission errors

> xyolo automatically downloads the pretrained weights into its installation directory, which can hit permission problems in some setups. The fix is what the installation section said: pass `--user` so the package is installed under your user directory, and the problem usually disappears.

3. Detection is slow

> Sharp-eyed readers may have noticed that the single detection above took over 6 seconds! Isn't that way too slow?
>
> Actually, no. TensorFlow 2.x runs in eager mode by default, which is somewhat slower than the static graphs of 1.x. I therefore optimize the hot path with tf.function: on the first run the model traces a static graph, which takes a while, but every subsequent run is drastically faster.
>
> If we keep running detections, the per-run time drops sharply, usually to tenths of a second or a few tens of milliseconds.
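
You can observe the warm-up effect directly. The sketch below reuses the `yolo` object from the example above and calls `detect_image`, the raw-results method documented in section 2.5:

```python
import time

for i in range(3):
    start = time.time()
    yolo.detect_image('./xyolo_data/detect.jpg')
    print('run {}: {:.2f}s'.format(i + 1, time.time() - start))
# The first run includes tf.function tracing; the later runs are much faster.
```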

### 2.2 When the pretrained weights can't be downloaded or download too slowly

There are two main solutions; let's look at both.

#### Option 1. Set a proxy

If you have a network proxy that speeds up access (you know what I'm talking about~), you can configure it to accelerate the download, like this:

```python
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3

# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Replace this with your proxy address
        self.requests_proxies = {'https': 'http://localhost:7890'}

# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
yolo = YOLO(config)

# Detect
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
```

#### Option 2. Download manually from the backup link

If you have no proxy, you can download from the backup address instead.

I uploaded the pretrained weights to Baidu Netdisk:

> Link: https://pan.baidu.com/s/1jXpoXHQHlp6Ra0jImruPXg Password: 48ed

After downloading the weights file from the share page, there are again two ways to set it up.

① Copy the file into the xyolo_data directory under xyolo's installation directory. This is identical to the auto-download case; nothing further needs to be done.

② Save the file anywhere you like and set its path in the config class.

The first needs no more explanation; here is an example of the second:

```python
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3

# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Replace this with your own file path; to be safe, prefer an absolute path
        self._pre_training_weights_darknet_path = '/Users/aaron/data/darknet_yolo.weights'

# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
yolo = YOLO(config)

# Detect
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
```

### 2.3 Training a model on your own data

First, xyolo's dataset input format. An xyolo input dataset can be described as follows:

> The dataset is a plain txt file containing one sample per line.
>
> Each line has the format: image_path box1 box2 ... boxN
>
> Each box has the format: top-left x,top-left y,bottom-right x,bottom-right y,class id
>
> An example:
>
> > ```
> > path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3
> > path/to/img2.jpg 120,300,250,600,2
> > ...
> > ```
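
To make the format concrete, here is a tiny parser for a single annotation line (a hypothetical helper, not part of xyolo, and it assumes image paths contain no spaces):

```python
def parse_line(line):
    parts = line.strip().split(' ')
    image_path, boxes = parts[0], []
    for box in parts[1:]:
        x_min, y_min, x_max, y_max, class_id = map(int, box.split(','))
        boxes.append((x_min, y_min, x_max, y_max, class_id))
    return image_path, boxes


print(parse_line('path/to/img1.jpg 50,100,150,200,0 30,50,200,120,3'))
# ('path/to/img1.jpg', [(50, 100, 150, 200, 0), (30, 50, 200, 120, 3)])
```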

Among image-annotation tools, labelImg is one of the most widely used. Its annotation files default to the VOC format (xml files), which is not xyolo's input format. Don't worry: xyolo provides a format-conversion script, and we only need to call it.

```python
# Import the conversion script
from xyolo import voc2xyolo

# Glob pattern matching the VOC-format annotation files
input_path = '/Users/aaron/data/labels_voc/*.xml'
# classes is a txt file listing every valid class name we want to detect, one per line
classes_path = '/Users/aaron/code/xyolo/tests/xyolo_data/classes.txt'
# Where to write the converted xyolo dataset
output_path = '/Users/aaron/code/xyolo/tests/xyolo_data/xyolo_label.txt'
# Run the conversion
voc2xyolo(input_path=input_path, classes_path=classes_path, output_path=output_path)
```

While the script runs, a progress bar reports the progress; when it reaches 100%, the conversion is done.

```
100%|██████████| 106/106 [00:00<00:00, 3076.05it/s]
```

With the dataset ready, we can prepare to train. Before that, double-check what we need:

- a dataset in the format xyolo accepts
- a txt file listing all the valid class names to detect (`classes.txt`)

As an example, I'll detect the characters on a website's captcha images.

First decide the classes to detect. I only need the positions of the characters, not which character each one is, so there is a single class, text. We create a `classes.txt` file containing:

```
text
```

Then I annotated the images and converted the dataset with xyolo (`xyolo_label.txt`):

```
/home/aaron/tmp/test_xyolo/images/162.png 47,105,75,141,0 157,52,181,80,0 197,85,229,120,0 265,85,296,117,0 257,131,293,166,0 355,63,386,90,0
/home/aaron/tmp/test_xyolo/images/88.png 93,46,129,86,0 79,139,114,174,0 209,42,237,72,0 200,68,226,98,0 256,53,295,86,0 209,134,247,171,0
/home/aaron/tmp/test_xyolo/images/176.png 43,88,76,120,0 123,91,153,127,0 98,155,127,184,0 189,117,224,152,0 289,54,319,86,0 348,123,374,151,0
/home/aaron/tmp/test_xyolo/images/63.png 36,128,72,161,0 79,130,104,161,0 127,120,160,153,0 305,111,329,138,0 302,125,334,153,0 342,81,380,119,0
/home/aaron/tmp/test_xyolo/images/77.png 164,114,200,150,0 193,147,225,182,0 309,90,336,120,0 349,89,382,121,0 321,126,352,155,0 298,150,327,177,0
/home/aaron/tmp/test_xyolo/images/189.png 119,90,148,118,0 122,132,150,159,0 208,44,240,76,0 279,60,314,97,0 299,65,334,98,0 331,93,364,129,0
/home/aaron/tmp/test_xyolo/images/200.png 232,58,265,91,0 288,58,316,85,0 49,118,78,148,0 55,134,83,163,0 75,148,103,175,0 312,131,343,163,0
/home/aaron/tmp/test_xyolo/images/76.png 20,61,56,97,0 29,108,57,139,0 76,117,111,154,0 139,117,167,147,0 204,116,242,157,0 336,153,376,191,0
...
```
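
Before training, it can save time to sanity-check the converted file, for instance confirming that every referenced image actually exists (a hypothetical helper, not part of xyolo; adjust the path to your own converted file):

```python
import os

with open('/Users/aaron/code/xyolo/tests/xyolo_data/xyolo_label.txt', encoding='utf8') as f:
    for line in f:
        image_path = line.split(' ')[0]
        if not os.path.exists(image_path):
            print('missing image:', image_path)
```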

Start training:

```python
# Import packages
from xyolo import DefaultYolo3Config, YOLO
from xyolo import init_yolo_v3

# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Dataset path; an absolute path is recommended
        self._dataset_path = '/home/aaron/tmp/test_xyolo/xyolo_data/yolo_label.txt'
        # Path of the class-names file; an absolute path is recommended
        self._classes_path = '/home/aaron/tmp/test_xyolo/xyolo_data/classes.txt'
        # Where to save the model. It defaults to the xyolo_data directory under
        # the current path, but can be changed; an absolute path is recommended
        self._output_model_path = '/home/aaron/tmp/test_xyolo/output_model.h5'


# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
# For training, pass train=True when creating the yolo object
yolo = YOLO(config, train=True)
# Start training; the model is saved automatically when training finishes
yolo.fit()
```
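
Since `use_tensorboard` is enabled by default (see the full configuration in section 2.4), the training curves can be watched live. With the default `outer_xyolo_data_dir` and log path, the launch command would be something like:

```
tensorboard --logdir ./xyolo_data/tensorboard/logs
```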

If you want to run predictions with the trained model, you can write:

```python
# Import packages
from xyolo import DefaultYolo3Config, YOLO
from xyolo import init_yolo_v3

# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Path of the class-names file; an absolute path is recommended
        self._classes_path = '/home/aaron/tmp/test_xyolo/xyolo_data/classes.txt'
        # Model path used by the yolo object, i.e. the model just trained;
        # an absolute path is recommended
        self._model_path = '/home/aaron/tmp/test_xyolo/output_model.h5'


# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
yolo = YOLO(config)
# Detect
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
```

Does configuring the Config class over and over feel tedious? In practice, a project needs only a single Config class that holds all the settings the project uses and is then referenced everywhere; I split them up here purely for demonstration. The example above could simply keep the training and inference settings together:

```python
# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Dataset path; an absolute path is recommended
        self._dataset_path = '/home/aaron/tmp/test_xyolo/xyolo_data/yolo_label.txt'
        # Path of the class-names file; an absolute path is recommended
        self._classes_path = '/home/aaron/tmp/test_xyolo/xyolo_data/classes.txt'
        # Where to save the model. It defaults to the xyolo_data directory under
        # the current path, but can be changed; an absolute path is recommended
        self._output_model_path = '/home/aaron/tmp/test_xyolo/output_model.h5'
        # Model path used by the yolo object, i.e. the model just trained;
        # an absolute path is recommended
        self._model_path = '/home/aaron/tmp/test_xyolo/output_model.h5'
```

### 2.4 The full set of configuration options

I decided to give users plenty of room for customization in `xyolo`, so there are quite a few parameters; this makes usage more flexible.

Don't worry about the configuration being tedious, though: as shown above, only a handful of options are commonly needed, and mastering those is enough to get started. If you want more flexibility, you'll need to know the remaining parameters. Here is the complete set of options (that is, xyolo's default configuration):

```python
from os.path import abspath, join, dirname, exists
from os import mkdir


class DefaultYolo3Config:
    """
    Default settings of the yolo3 model
    """

    def __init__(self):
        # Directory for xyolo's data, bundled inside the package
        self.inner_xyolo_data_dir = abspath(join(dirname(__file__), './xyolo_data'))
        # Directory for xyolo's data outside the package, per project
        self.outer_xyolo_data_dir = abspath('./xyolo_data')
        # Download URL of the yolo3 pretrained weights
        self.pre_training_weights_url = 'https://pjreddie.com/media/files/yolov3.weights'
        # HTTP proxy used when downloading files.
        # If needed, the format is {'https_proxy': 'host:port'}, e.g. {'https_proxy': 'http://127.0.0.1:7890'};
        # see https://requests.readthedocs.io/en/master/user/advanced/#proxies for details
        self.requests_proxies = None
        # Path of the Darknet-format pretrained weights, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_darknet_path = 'darknet_yolo.weights'
        # MD5 hash of the Darknet pretrained weights, used to detect corrupted data
        self.pre_training_weights_darknet_md5 = 'c84e5b99d0e52cd466ae710cadf6d84c'
        # Path of the converted, Keras-format pretrained weights, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_keras_path = 'keras_weights.h5'
        # Path of the pretrained-weights config file, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_config_path = 'yolov3.cfg'
        # Default anchor-box file path, relative to inner_xyolo_data_dir, or absolute
        self._anchors_path = 'yolo_anchors.txt'
        # Default class-names file path, relative to inner_xyolo_data_dir, or absolute
        self._classes_path = 'coco_classes.txt'
        # Output path of the trained model, relative to outer_xyolo_data_dir, or absolute
        self._output_model_path = 'output_model.h5'
        # Dataset path, relative to outer_xyolo_data_dir, or absolute
        self._dataset_path = 'dataset.txt'
        # Whether to enable TensorBoard (enabled by default)
        self.use_tensorboard = True
        # TensorBoard log path during training, relative to outer_xyolo_data_dir, or absolute
        self._tensorboard_log_path = './tensorboard/logs'
        # Whether to enable checkpoints (enabled by default)
        self.use_checkpoint = True
        # Whether to enable learning-rate decay
        self.use_reduce_lr = True
        # Metric monitored for LR decay; defaults to the validation loss
        self.reduce_lr_monitor = 'val_loss'
        # LR decay factor, new_lr = lr * factor
        self.reduce_lr_factor = 0.1
        # Decay the LR when the metric has not improved for this many consecutive epochs
        self.reduce_lr_patience = 3
        # Whether to enable early stopping
        self.use_early_stopping = True
        # Metric monitored for early stopping; defaults to the validation loss
        self.early_stopping_monitor = 'val_loss'
        # Minimum change in the metric that counts as an improvement
        self.early_stopping_min_delta = 0
        # Stop training early when the metric has not improved for this many consecutive epochs
        self.early_stopping_patience = 10
        # Model path loaded by YOLO by default (preferably absolute); see the model_path property below for the priority rules
        self._model_path = ''
        # Detection score threshold
        self.score = 0.3
        # IoU (intersection over union) threshold
        self.iou = 0.45
        # Model input image size
        self.model_image_size = (416, 416)
        # Number of GPUs
        self.gpu_num = 1
        # Validation split used during training; defaults to 0.1,
        # i.e. 90% of the dataset is used for training and 10% for validation
        self.val_split = 0.1
        # Training happens in two stages: first most layers are frozen, then everything is unfrozen for fine-tuning
        # Whether to enable the frozen stage (recommended)
        self.frozen_train = True
        # Number of epochs in the frozen stage
        self.frozen_train_epochs = 50
        # Batch size in the frozen stage
        self.frozen_batch_size = 32
        # Initial learning rate in the frozen stage
        self.frozen_lr = 1e-3
        # Whether to enable the unfrozen stage (recommended)
        self.unfreeze_train = True
        # Number of epochs in the unfrozen stage
        self.unfreeze_train_epochs = 50
        # Batch size in the unfrozen stage. Note: unfrozen training needs a lot of GPU memory, so keep this small
        self.unfreeze_batch_size = 1
        # Initial learning rate in the unfrozen stage
        self.unfreeze_lr = 1e-4

    def __setattr__(self, key, value):
        _key = '_{}'.format(key)
        if key not in self.__dict__ and _key in self.__dict__:
            self.__dict__[_key] = value
        else:
            self.__dict__[key] = value
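
    # Note on __setattr__ above: assigning to a public name whose underscored
    # twin already exists is redirected to the underscored attribute. So, e.g.,
    #   config = DefaultYolo3Config()
    #   config.model_path = '/tmp/my_model.h5'  # actually sets _model_path
    # which is why both spellings work when overriding options.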

    @classmethod
    def make_dir(cls, path):
        if not exists(path):
            mkdir(path)

    @classmethod
    def join_and_abspath(cls, path1, path2):
        return abspath(join(path1, path2))

    def inner_abspath(self, filename):
        self.make_dir(self.inner_xyolo_data_dir)
        return self.join_and_abspath(self.inner_xyolo_data_dir, filename)

    def outer_abspath(self, filename):
        self.make_dir(self.outer_xyolo_data_dir)
        return self.join_and_abspath(self.outer_xyolo_data_dir, filename)

    @property
    def pre_training_weights_darknet_path(self):
        return self.inner_abspath(self._pre_training_weights_darknet_path)

    @property
    def pre_training_weights_config_path(self):
        return self.inner_abspath(self._pre_training_weights_config_path)

    @property
    def pre_training_weights_keras_path(self):
        return self.inner_abspath(self._pre_training_weights_keras_path)

    @property
    def anchors_path(self):
        return self.inner_abspath(self._anchors_path)

    @property
    def classes_path(self):
        return self.inner_abspath(self._classes_path)

    @property
    def output_model_path(self):
        return self.outer_abspath(self._output_model_path)

    @property
    def dataset_path(self):
        return self.outer_abspath(self._dataset_path)

    @property
    def tensorboard_log_path(self):
        return self.outer_abspath(self._tensorboard_log_path)

    @property
    def model_path(self):
        """
        Path of the weights the Yolo model loads by default.
        Chosen with priority _model_path > output_model_path > pre_training_weights_keras_path, i.e.:
        if _model_path is set, use _model_path;
        otherwise, if output_model_path is set and the path exists, use output_model_path;
        otherwise, use pre_training_weights_keras_path
        """
        _model_path = getattr(self, '_model_path', '')
        if _model_path:
            return abspath(_model_path)
        if self._output_model_path and exists(self.output_model_path):
            return self.output_model_path
        return self.pre_training_weights_keras_path
```

Every option is explained by the comment next to it, so I won't go through them again. I do want to highlight the `model_path` property here, though.



`model_path` decides which model file xyolo loads when it runs detection. The selection logic:

It chooses with the priority `_model_path` > `output_model_path` > `pre_training_weights_keras_path`, that is:

- If `_model_path` is set, use `_model_path` (this is the case in the detection part of section 2.3)
- Otherwise, if `output_model_path` is set and the path exists, use `output_model_path` (in other words, because `_output_model_path` was set, the code in section 2.3 would still load the right model at detection time even without setting `model_path`)
- Otherwise, use `pre_training_weights_keras_path` (the converted official pretrained model, i.e. the case in section 2.1)



### 2.5 Common methods of the yolo object

The detection examples above all call the same method:

```python
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
```

Its input is a PIL.Image.Image instance or a string holding an image path, and it returns the image with the detection boxes drawn on it, again a PIL.Image.Image instance.

A look at its code:

```python
def detect_and_draw_image(self, image: typing.Union[Image.Image, str], draw_label=True) -> Image.Image:
    """
    Run object detection on the given image and draw the resulting boxes and labels onto it

    Args:
        image: the image to detect on, an object (PIL.Image.Image) or a path (str)
        draw_label: whether to annotate each box with its class and confidence

    Returns:
        the image object with the detection results added
    """
    predicted_results = self.detect_image(image)
    img = self.draw_image(image, predicted_results, draw_label=draw_label)
    return img
```

As you can see, this method actually calls two others: `detect_image` obtains the detection results, and `draw_image` draws the detection info onto the image.

When needed, we can also call these two methods directly. For example, when I only want the detection results for an image, I just call `detect_image`, because I don't care about the drawing.

The interface documentation of the two methods:

```python
def detect_image(self, img: typing.Union[Image.Image, str]) -> typing.List[
    typing.Tuple[str, int, float, int, int, int, int]]:
    """
    Run object detection on the given image and return the results

    Args:
        img: the image to detect on, an object (PIL.Image.Image) or a path (str)

    Returns:
        [[class name, class id, confidence, top-left x, top-left y, bottom-right x, bottom-right y], ...]
    """
    pass


def draw_image(self, img: typing.Union[Image.Image, str], predicted_results: typing.List[
    typing.Tuple[str, int, float, int, int, int, int]], draw_label=True) -> Image.Image:
    """
    Given an image and detection results, draw the results onto the image and return it

    Args:
        img: the image to draw on, an object (PIL.Image.Image) or a path (str)
        predicted_results: the detection results, [[class name, class id, confidence, top-left x, top-left y, bottom-right x, bottom-right y], ...]
        draw_label: whether to annotate each box with its class and confidence

    Returns:
        the image object with the detection results added
    """
    pass
```
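
For instance, when only the raw results matter, a few lines suffice (a minimal sketch reusing a `yolo` object from the examples above; the tuple layout follows the `detect_image` docstring):

```python
results = yolo.detect_image('./xyolo_data/detect.jpg')
# Keep only confident 'person' detections
persons = [r for r in results if r[0] == 'person' and r[2] > 0.5]
print('found {} person(s)'.format(len(persons)))
for class_name, class_id, score, x1, y1, x2, y2 in results:
    print(class_name, class_id, score, (x1, y1), (x2, y2))
```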


## Afterword

The xyolo package has not been tested extensively yet (I've tested my own usage, but that obviously can't cover every case), so the odd bug is understandable, right (don't hit me, don't hit me, covers head)? If you run into one, you're welcome to contact me.

PS:

> The lowest TensorFlow version xyolo supports is 2.2; lower versions are untested and unadapted, and not guaranteed to work. My energy is limited, so I don't plan to adapt to lower versions either; if a problem is caused by a low version, I probably won't handle it, sorry~

Finally, if you like this project, how about a star~

Thanks for your support~
--------------------------------------------------------------------------------

/setup.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Date         : 2020-10-30 23:54:59
# @Author       : AaronJny
# @LastEditTime : 2021-01-03 20:40:19
# @FilePath     : /xyolo/setup.py
# @Desc         :
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="xyolo",
    version="0.1.6",
    author="AaronJny",
    author_email="aaronjny7@gmail.com",
    description="xyolo is a highly encapsulated YOLO v3 library implemented in Python."
                "With xyolo, you can easily complete the training and calling of the yolo3 "
                "target detection task with just a few lines of Python code.",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/AaronJny/xyolo",
    packages=setuptools.find_packages(),
    package_data={
        'xyolo': ['xyolo_data/*.txt',
                  'xyolo_data/*.cfg']
    },
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
    install_requires=[
        'tensorflow>=2.2',
        'numpy>=1.18.1,<1.19.0',
        'pillow>=7.0.0',
        'matplotlib>=3.1.3',
        'loguru>=0.5.1',
        'requests>=2.22.0',
        'tqdm>=4.42.1',
        'lxml>=4.5.0',
        'opencv-python>=4.2.0'
    ]
)
--------------------------------------------------------------------------------

/tests/eval_default.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/31 15:20
# @Author  : AaronJny
# @File    : eval_default.py
# @Desc    : Run object detection on an image with the pretrained weights
# Import packages
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3


# Create a default config object
config = DefaultYolo3Config()
# Initialize xyolo (downloading and converting the pretrained weights happens here)
# Download and conversion only run on the first call; later calls use the cached files
init_yolo_v3(config)
# Create a YOLO object, which exposes the YOLOv3 detection and training interfaces
yolo = YOLO(config)

# Detect objects and draw them on the image
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
# Show the annotated image
img.show()
--------------------------------------------------------------------------------

/tests/eval_mydata.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/31 19:49
# @Author  : AaronJny
# @File    : eval_mydata.py
# @Desc    :
from xyolo import DefaultYolo3Config
from xyolo import YOLO


class MyConfig(DefaultYolo3Config):

    def __init__(self):
        super(MyConfig, self).__init__()
        self._classes_path = '/Users/aaron/code/xyolo/tests/xyolo_data/classes.txt'
        self._model_path = '/Users/aaron/code/xyolo/tests/xyolo_data/output_model.h5'


config = MyConfig()
yolo = YOLO(config)
image_path = '/Users/aaron/code/bctt/spider/captcha_detection/soopat/images/232.png'
img = yolo.detect_and_draw_image(image_path)
img.show()
--------------------------------------------------------------------------------

/tests/specify_pre_training_weight.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/11/6 22:03
# @Author  : AaronJny
# @File    : specify_pre_training_weight.py
# @Desc    : Specify the path of the pretrained weights
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3


# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Replace this with your own file path; to be safe, prefer an absolute path
        self._pre_training_weights_darknet_path = '/Users/aaron/data/darknet_yolo.weights'


# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
yolo = YOLO(config)

# Detect
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
--------------------------------------------------------------------------------

/tests/test_voc2xyolo.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/11/3 14:44
# @Author  : AaronJny
# @File    : test_voc2xyolo.py
# @Desc    : Test converting VOC-format annotations into the format xyolo needs
# Import the conversion script
from xyolo import voc2xyolo

# Glob pattern matching the VOC-format annotation files
input_path = '/Users/aaron/data/labels_voc/*.xml'
# classes is a txt file listing every valid class name we want to detect, one per line
classes_path = '/Users/aaron/code/xyolo/tests/xyolo_data/classes.txt'
# Where to write the converted xyolo dataset
output_path = '/Users/aaron/code/xyolo/tests/xyolo_data/xyolo_label.txt'
# Run the conversion
voc2xyolo(input_path=input_path, classes_path=classes_path, output_path=output_path)
--------------------------------------------------------------------------------

/tests/train.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/31 15:22
# @Author  : AaronJny
# @File    : train.py
# @Desc    : Train your own model with xyolo
# Import packages
from xyolo import DefaultYolo3Config, YOLO
from xyolo import init_yolo_v3


# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Dataset path; an absolute path is recommended
        self._dataset_path = '/home/aaron/tmp/test_xyolo/xyolo_data/yolo_label.txt'
        # Path of the class-names file; an absolute path is recommended
        self._classes_path = '/home/aaron/tmp/test_xyolo/xyolo_data/classes.txt'
        # Where to save the model. It defaults to the xyolo_data directory under
        # the current path, but can be changed; an absolute path is recommended
        self._output_model_path = '/home/aaron/tmp/test_xyolo/output_model.h5'


# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
# For training, pass train=True when creating the yolo object
yolo = YOLO(config, train=True)
# Start training; the model is saved automatically when training finishes
yolo.fit()
--------------------------------------------------------------------------------

/tests/use_proxies.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/31 15:20
# @Author  : AaronJny
# @File    : use_proxies.py
# @Desc    : Run object detection on an image with the pretrained weights, through a proxy
from xyolo import YOLO, DefaultYolo3Config
from xyolo import init_yolo_v3


# Create a subclass of DefaultYolo3Config and override the defaults in it
class MyConfig(DefaultYolo3Config):
    def __init__(self):
        super(MyConfig, self).__init__()
        # Replace this with your proxy address
        self.requests_proxies = {'https': 'http://localhost:7890'}


# Create the yolo object with the modified config
config = MyConfig()
init_yolo_v3(config)
yolo = YOLO(config)

# Detect
img = yolo.detect_and_draw_image('./xyolo_data/detect.jpg')
img.show()
--------------------------------------------------------------------------------

/xyolo/__init__.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/30 23:54
# @Author  : AaronJny
# @File    : __init__.py
# @Desc    :
from .config import DefaultYolo3Config
from .init_yolo import init_yolo_v3
from .preprocessing import voc2xyolo
from .yolo3.yolo import YOLO
--------------------------------------------------------------------------------

/xyolo/config.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
# @Time    : 2020/10/31 10:50
# @Author  : AaronJny
# @File    : config.py
# @Desc    :
from os.path import abspath, join, dirname, exists
from os import mkdir


class DefaultYolo3Config:
    """
    Default settings of the yolo3 model
    """

    def __init__(self):
        # Directory for xyolo's data, bundled inside the package
        self.inner_xyolo_data_dir = abspath(join(dirname(__file__), './xyolo_data'))
        # Directory for xyolo's data outside the package, per project
        self.outer_xyolo_data_dir = abspath('./xyolo_data')
        # Download URL of the yolo3 pretrained weights
        self.pre_training_weights_url = 'https://pjreddie.com/media/files/yolov3.weights'
        # HTTP proxy used when downloading files.
        # If needed, the format is {'https_proxy': 'host:port'}, e.g. {'https_proxy': 'http://127.0.0.1:7890'};
        # see https://requests.readthedocs.io/en/master/user/advanced/#proxies for details
        self.requests_proxies = None
        # Path of the Darknet-format pretrained weights, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_darknet_path = 'darknet_yolo.weights'
        # MD5 hash of the Darknet pretrained weights, used to detect corrupted data
        self.pre_training_weights_darknet_md5 = 'c84e5b99d0e52cd466ae710cadf6d84c'
        # Path of the converted, Keras-format pretrained weights, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_keras_path = 'keras_weights.h5'
        # Path of the pretrained-weights config file, relative to inner_xyolo_data_dir, or absolute
        self._pre_training_weights_config_path = 'yolov3.cfg'
        # Default anchor-box file path, relative to inner_xyolo_data_dir, or absolute
        self._anchors_path = 'yolo_anchors.txt'
        # Default class-names file path, relative to inner_xyolo_data_dir, or absolute
        self._classes_path = 'coco_classes.txt'
        # Output path of the trained model, relative to outer_xyolo_data_dir, or absolute
        self._output_model_path = 'output_model.h5'
        # Dataset path, relative to outer_xyolo_data_dir, or absolute
        self._dataset_path = 'dataset.txt'
        # Whether to enable TensorBoard (enabled by default)
        self.use_tensorboard = True
        # TensorBoard log path during training, relative to outer_xyolo_data_dir, or absolute
        self._tensorboard_log_path = './tensorboard/logs'
        # Whether to enable checkpoints (enabled by default)
        self.use_checkpoint = True
        # Whether to enable learning-rate decay
        self.use_reduce_lr = True
        # Metric monitored for LR decay; defaults to the validation loss
        self.reduce_lr_monitor = 'val_loss'
        # LR decay factor, new_lr = lr * factor
        self.reduce_lr_factor = 0.1
        # Decay the LR when the metric has not improved for this many consecutive epochs
        self.reduce_lr_patience = 3
        # Whether to enable early stopping
        self.use_early_stopping = True
        # Metric monitored for early stopping; defaults to the validation loss
        self.early_stopping_monitor = 'val_loss'
        # Minimum change in the metric that counts as an improvement
        self.early_stopping_min_delta = 0
        # Stop training early when the metric has not improved for this many consecutive epochs
        self.early_stopping_patience = 10
        # Model path loaded by YOLO by default (preferably absolute); see the model_path property below for the priority rules
        self._model_path = ''
        # Detection score threshold
        self.score = 0.3
        # IoU (intersection over union) threshold
        self.iou = 0.45
        # Model input image size
        self.model_image_size = (416, 416)
        # Number of GPUs
        self.gpu_num = 1
        # Validation split used during training; defaults to 0.1,
        # i.e. 90% of the dataset is used for training and 10% for validation
        self.val_split = 0.1
        # Training happens in two stages: first most layers are frozen, then everything is unfrozen for fine-tuning
        # Whether to enable the frozen stage (recommended)
        self.frozen_train = True
        # Number of epochs in the frozen stage
        self.frozen_train_epochs = 50
        # Batch size in the frozen stage
        self.frozen_batch_size = 32
        # Initial learning rate in the frozen stage
        self.frozen_lr = 1e-3
        # Whether to enable the unfrozen stage (recommended)
        self.unfreeze_train = True
        # Number of epochs in the unfrozen stage
        self.unfreeze_train_epochs = 50
        # Batch size in the unfrozen stage. Note: unfrozen training needs a lot of GPU memory, so keep this small
        self.unfreeze_batch_size = 1
        # Initial learning rate in the unfrozen stage
        self.unfreeze_lr = 1e-4

    def __setattr__(self, key, value):
        _key = '_{}'.format(key)
        if key not in self.__dict__ and _key in self.__dict__:
            self.__dict__[_key] = value
        else:
            self.__dict__[key] = value

    @classmethod
    def make_dir(cls, path):
        if not exists(path):
            mkdir(path)

    @classmethod
    def join_and_abspath(cls, path1, path2):
        return abspath(join(path1, path2))

    def inner_abspath(self, filename):
        self.make_dir(self.inner_xyolo_data_dir)
        return self.join_and_abspath(self.inner_xyolo_data_dir, filename)

    def outer_abspath(self, filename):
        self.make_dir(self.outer_xyolo_data_dir)
        return self.join_and_abspath(self.outer_xyolo_data_dir, filename)

    @property
    def pre_training_weights_darknet_path(self):
        return self.inner_abspath(self._pre_training_weights_darknet_path)

    @property
    def pre_training_weights_config_path(self):
        return self.inner_abspath(self._pre_training_weights_config_path)

    @property
    def pre_training_weights_keras_path(self):
        return self.inner_abspath(self._pre_training_weights_keras_path)

    @property
    def anchors_path(self):
        return self.inner_abspath(self._anchors_path)

    @property
    def classes_path(self):
        return self.inner_abspath(self._classes_path)

    @property
    def output_model_path(self):
        return self.outer_abspath(self._output_model_path)

    @property
    def dataset_path(self):
        return self.outer_abspath(self._dataset_path)

    @property
    def tensorboard_log_path(self):
        return self.outer_abspath(self._tensorboard_log_path)

    @property
    def model_path(self):
        """
        Path of the weights the Yolo model loads by default.
        Chosen with priority _model_path > output_model_path > pre_training_weights_keras_path, i.e.:
        if _model_path is set, use _model_path;
        otherwise, if output_model_path is set and the path exists, use output_model_path;
        otherwise, use pre_training_weights_keras_path
        """
        _model_path = getattr(self, '_model_path', '')
        if _model_path:
            return abspath(_model_path)
        if self._output_model_path and exists(self.output_model_path):
            return self.output_model_path
        return self.pre_training_weights_keras_path
--------------------------------------------------------------------------------

/xyolo/convert.py:
--------------------------------------------------------------------------------
#! /usr/bin/env python
"""
Reads Darknet config and weights and creates Keras model with TF backend.
"""

import argparse
import configparser
import io
import os
from collections import defaultdict

import numpy as np
import tensorflow.keras.backend as K
from loguru import logger
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import (Conv2D, Input, ZeroPadding2D, Add,
                                     UpSampling2D, MaxPooling2D, Concatenate)
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from tensorflow.keras.utils import plot_model as plot

parser = argparse.ArgumentParser(description='Darknet To Keras Converter.')
parser.add_argument('config_path', help='Path to Darknet cfg file.')
parser.add_argument('weights_path', help='Path to Darknet weights file.')
parser.add_argument('output_path', help='Path to output Keras model file.')
parser.add_argument(
    '-p',
    '--plot_model',
    help='Plot generated Keras model and save as image.',
    action='store_true')
parser.add_argument(
    '-w',
    '--weights_only',
    help='Save as Keras weights file instead of model file.',
    action='store_true')


def unique_config_sections(config_file):
    """Convert all config sections to have unique names.

    Adds unique suffixes to config sections for compatibility with configparser.
    """
    section_counters = defaultdict(int)
    output_stream = io.StringIO()
    with open(config_file) as fin:
        for line in fin:
            if line.startswith('['):
                section = line.strip().strip('[]')
                _section = section + '_' + str(section_counters[section])
                section_counters[section] += 1
                line = line.replace(section, _section)
            output_stream.write(line)
    output_stream.seek(0)
    return output_stream


def convert(config_path, weights_path, output_path, weights_only=None, plot_model=None):
    output_root = os.path.splitext(output_path)[0]

    # Load weights and config.
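    # (Darknet .weights layout, as parsed below: a 12-byte header holding three
    # int32 values, major / minor / revision, then a "seen" image counter that
    # is int64 for header versions >= 0.2 and int32 before that, followed by
    # the raw float32 parameters of each layer in file order.)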
    logger.info('Loading weights.')
    weights_file = open(weights_path, 'rb')
    major, minor, revision = np.ndarray(
        shape=(3,), dtype='int32', buffer=weights_file.read(12))
    if (major * 10 + minor) >= 2 and major < 1000 and minor < 1000:
        seen = np.ndarray(shape=(1,), dtype='int64', buffer=weights_file.read(8))
    else:
        seen = np.ndarray(shape=(1,), dtype='int32', buffer=weights_file.read(4))
    # loguru formats positional args into '{}' placeholders, so they must be
    # embedded explicitly (print-style trailing args would be dropped silently)
    logger.info('Weights Header: {} {} {} {}'.format(major, minor, revision, seen))

    logger.info('Parsing Darknet config.')
    unique_config_file = unique_config_sections(config_path)
    cfg_parser = configparser.ConfigParser()
    cfg_parser.read_file(unique_config_file)

    logger.info('Creating Keras model.')
    input_layer = Input(shape=(None, None, 3))
    prev_layer = input_layer
    all_layers = []

    weight_decay = float(cfg_parser['net_0']['decay']
                         ) if 'net_0' in cfg_parser.sections() else 5e-4
    count = 0
    out_index = []
    for section in cfg_parser.sections():
        logger.debug('Parsing section {}'.format(section))
        if section.startswith('convolutional'):
            filters = int(cfg_parser[section]['filters'])
            size = int(cfg_parser[section]['size'])
            stride = int(cfg_parser[section]['stride'])
            pad = int(cfg_parser[section]['pad'])
            activation = cfg_parser[section]['activation']
            batch_normalize = 'batch_normalize' in cfg_parser[section]

            padding = 'same' if pad == 1 and stride == 1 else 'valid'

            # Setting weights.
            # Darknet serializes convolutional weights as:
            # [bias/beta, [gamma, mean, variance], conv_weights]
            prev_layer_shape = K.int_shape(prev_layer)

            weights_shape = (size, size, prev_layer_shape[-1], filters)
            darknet_w_shape = (filters, weights_shape[2], size, size)
            weights_size = np.product(weights_shape)

            logger.debug(' '.join(['conv2d', 'bn' if batch_normalize else '  ',
                                   activation, str(weights_shape)]))

            conv_bias = np.ndarray(
                shape=(filters,),
                dtype='float32',
                buffer=weights_file.read(filters * 4))
            count += filters

            if batch_normalize:
                bn_weights = np.ndarray(
                    shape=(3, filters),
                    dtype='float32',
                    buffer=weights_file.read(filters * 12))
                count += 3 * filters

                bn_weight_list = [
                    bn_weights[0],  # scale gamma
                    conv_bias,  # shift beta
                    bn_weights[1],  # running mean
                    bn_weights[2]  # running var
                ]

            conv_weights = np.ndarray(
                shape=darknet_w_shape,
                dtype='float32',
                buffer=weights_file.read(weights_size * 4))
            count += weights_size

            # DarkNet conv_weights are serialized Caffe-style:
            # (out_dim, in_dim, height, width)
            # We would like to set these to Tensorflow order:
            # (height, width, in_dim, out_dim)
            conv_weights = np.transpose(conv_weights, [2, 3, 1, 0])
            conv_weights = [conv_weights] if batch_normalize else [
                conv_weights, conv_bias
            ]

            # Handle activation.
            act_fn = None
            if activation == 'leaky':
                pass  # Add advanced activation later.
150 | elif activation != 'linear': 151 | raise ValueError( 152 | 'Unknown activation function `{}` in section {}'.format( 153 | activation, section)) 154 | 155 | # Create Conv2D layer 156 | if stride > 1: 157 | # Darknet uses left and top padding instead of 'same' mode 158 | prev_layer = ZeroPadding2D(((1, 0), (1, 0)))(prev_layer) 159 | conv_layer = (Conv2D( 160 | filters, (size, size), 161 | strides=(stride, stride), 162 | kernel_regularizer=l2(weight_decay), 163 | use_bias=not batch_normalize, 164 | weights=conv_weights, 165 | activation=act_fn, 166 | padding=padding))(prev_layer) 167 | 168 | if batch_normalize: 169 | conv_layer = (BatchNormalization( 170 | weights=bn_weight_list))(conv_layer) 171 | prev_layer = conv_layer 172 | 173 | if activation == 'linear': 174 | all_layers.append(prev_layer) 175 | elif activation == 'leaky': 176 | act_layer = LeakyReLU(alpha=0.1)(prev_layer) 177 | prev_layer = act_layer 178 | all_layers.append(act_layer) 179 | 180 | elif section.startswith('route'): 181 | ids = [int(i) for i in cfg_parser[section]['layers'].split(',')] 182 | layers = [all_layers[i] for i in ids] 183 | if len(layers) > 1: 184 | logger.debug('Concatenating route layers: {}'.format(layers)) 185 | concatenate_layer = Concatenate()(layers) 186 | all_layers.append(concatenate_layer) 187 | prev_layer = concatenate_layer 188 | else: 189 | skip_layer = layers[0] # only one layer to route 190 | all_layers.append(skip_layer) 191 | prev_layer = skip_layer 192 | 193 | elif section.startswith('maxpool'): 194 | size = int(cfg_parser[section]['size']) 195 | stride = int(cfg_parser[section]['stride']) 196 | all_layers.append( 197 | MaxPooling2D( 198 | pool_size=(size, size), 199 | strides=(stride, stride), 200 | padding='same')(prev_layer)) 201 | prev_layer = all_layers[-1] 202 | 203 | elif section.startswith('shortcut'): 204 | index = int(cfg_parser[section]['from']) 205 | activation = cfg_parser[section]['activation'] 206 | assert activation == 'linear', 'Only linear activation supported.' 207 | all_layers.append(Add()([all_layers[index], prev_layer])) 208 | prev_layer = all_layers[-1] 209 | 210 | elif section.startswith('upsample'): 211 | stride = int(cfg_parser[section]['stride']) 212 | assert stride == 2, 'Only stride=2 supported.' 213 | all_layers.append(UpSampling2D(stride)(prev_layer)) 214 | prev_layer = all_layers[-1] 215 | 216 | elif section.startswith('yolo'): 217 | out_index.append(len(all_layers) - 1) 218 | all_layers.append(None) 219 | prev_layer = all_layers[-1] 220 | 221 | elif section.startswith('net'): 222 | pass 223 | 224 | else: 225 | raise ValueError( 226 | 'Unsupported section header type: {}'.format(section)) 227 | 228 | # Create and save model. 229 | if len(out_index) == 0: out_index.append(len(all_layers) - 1) 230 | model = Model(inputs=input_layer, outputs=[all_layers[i] for i in out_index]) 231 | model.summary() 232 | if weights_only: 233 | model.save_weights('{}'.format(output_path)) 234 | logger.info('Saved Keras weights to {}'.format(output_path)) 235 | else: 236 | model.save('{}'.format(output_path)) 237 | logger.info('Saved Keras model to {}'.format(output_path)) 238 | 239 | # Check to see if all weights have been read. 
240 | remaining_weights = len(weights_file.read()) / 4 241 | weights_file.close() 242 | logger.info('Read {} of {} from Darknet weights.'.format(count, count + 243 | remaining_weights)) 244 | if remaining_weights > 0: 245 | logger.info('Warning: {} unused weights'.format(remaining_weights)) 246 | 247 | if plot_model: 248 | plot(model, to_file='{}.png'.format(output_root), show_shapes=True) 249 | logger.info('Saved model plot to {}.png'.format(output_root)) 250 | 251 | 252 | # %% 253 | def _main(args): 254 | config_path = os.path.expanduser(args.config_path) 255 | weights_path = os.path.expanduser(args.weights_path) 256 | assert config_path.endswith('.cfg'), '{} is not a .cfg file'.format( 257 | config_path) 258 | assert weights_path.endswith( 259 | '.weights'), '{} is not a .weights file'.format(weights_path) 260 | 261 | output_path = os.path.expanduser(args.output_path) 262 | assert output_path.endswith( 263 | '.h5'), 'output path {} is not a .h5 file'.format(output_path) 264 | convert(config_path, weights_path, output_path, weights_only=args.weights_only, plot_model=args.plot_model) 265 | 266 | 267 | if __name__ == '__main__': 268 | _main(parser.parse_args()) 269 | -------------------------------------------------------------------------------- /xyolo/init_yolo.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Time : 2020/10/31 00:18 3 | # @Author : AaronJny 4 | # @File : init_yolo.py 5 | # @Desc : 6 | import os 7 | 8 | import requests 9 | from loguru import logger 10 | from tqdm import tqdm 11 | 12 | from xyolo.config import DefaultYolo3Config 13 | from xyolo.convert import convert 14 | from hashlib import md5 15 | 16 | 17 | def compute_hash_code(filepath): 18 | """ 19 | 读取并计算给定文件的md5 hash值 20 | """ 21 | with open(filepath, 'rb') as f: 22 | data = f.read() 23 | return md5(data).hexdigest() 24 | 25 | 26 | def download_weights(config): 27 | darknet_path = config.pre_training_weights_darknet_path 28 | if os.path.exists(darknet_path): 29 | # 如果已经存在,先校验md5哈希值 30 | current_hash_code = compute_hash_code(darknet_path) 31 | # md5相同才说明已经下载了,否则重新下载 32 | if current_hash_code == config.pre_training_weights_darknet_md5: 33 | logger.info('Pre-training weights already exists! Skip!') 34 | return 35 | weights_url = config.pre_training_weights_url 36 | r = requests.get(weights_url, stream=True, proxies=config.requests_proxies) 37 | filename = weights_url.split('/')[-1] 38 | with tqdm.wrapattr(open(darknet_path, "wb"), "write", 39 | miniters=1, desc=filename, 40 | total=int(r.headers.get('content-length', 0))) as f: 41 | for chunk in r.iter_content(chunk_size=1024 * 100): 42 | if chunk: 43 | f.write(chunk) 44 | logger.info('Saved Darknet model to {}'.format(darknet_path)) 45 | 46 | 47 | def init_yolo_v3(config=None): 48 | if not config: 49 | config = DefaultYolo3Config() 50 | logger.info('Downloading Pre-training weights of yolo v3 ...') 51 | download_weights(config) 52 | logger.info('Convert Darknet -> Keras ...') 53 | if os.path.exists(config.pre_training_weights_keras_path): 54 | logger.info('Keras model already exists! 
Skip!') 55 | else: 56 | convert(config_path=config.pre_training_weights_config_path, 57 | weights_path=config.pre_training_weights_darknet_path, 58 | output_path=config.pre_training_weights_keras_path) 59 | logger.info('Init completed.') 60 | 61 | 62 | if __name__ == '__main__': 63 | init_yolo_v3() 64 | -------------------------------------------------------------------------------- /xyolo/preprocessing.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Time : 2020/11/3 14:26 3 | # @Author : AaronJny 4 | # @File : preprocessing.py 5 | # @Desc : 6 | import xml.etree.ElementTree as ET 7 | from glob import glob 8 | 9 | from tqdm import tqdm 10 | 11 | 12 | def _voc2xyolo(xml_path, classes): 13 | in_file = open(xml_path) 14 | tree = ET.parse(in_file) 15 | root = tree.getroot() 16 | image_path = root.find('path').text 17 | ret = [image_path, ] 18 | for obj in root.iter('object'): 19 | difficult = obj.find('difficult').text 20 | cls = obj.find('name').text 21 | if cls not in classes or int(difficult) == 1: 22 | continue 23 | cls_id = classes[cls] 24 | xmlbox = obj.find('bndbox') 25 | b = (int(xmlbox.find('xmin').text), int(xmlbox.find('ymin').text), int(xmlbox.find('xmax').text), 26 | int(xmlbox.find('ymax').text)) 27 | ret.append(",".join([str(a) for a in b]) + ',' + str(cls_id)) 28 | return ' '.join(ret) 29 | 30 | 31 | def voc2xyolo(input_path, classes_path, output_path): 32 | """ 33 | 将voc格式的标注数据转换成xyolo接受的类型 34 | 35 | Args: 36 | input_path: 输入文件路径的正则表达式。这里是使用labelImg标注的图片label文件路径 37 | classes_path: 保存实体类别的文件路径 38 | output_path: 转换后的数据集保存路径 39 | """ 40 | with open(classes_path, 'r', encoding='utf8') as f: 41 | lines = [line.strip() for line in f.readlines()] 42 | classes = dict(zip(lines, range(len(lines)))) 43 | files = glob(input_path) 44 | xyolo_lines = [] 45 | for xml_path in tqdm(files): 46 | xyolo_line = _voc2xyolo(xml_path, classes) 47 | xyolo_lines.append(xyolo_line) 48 | with open(output_path, 'w', encoding='utf8') as f: 49 | f.write('\n'.join(xyolo_lines)) 50 | -------------------------------------------------------------------------------- /xyolo/xyolo_data/coco_classes.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /xyolo/xyolo_data/yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 
373,326 2 | -------------------------------------------------------------------------------- /xyolo/xyolo_data/yolov3.cfg: -------------------------------------------------------------------------------- 1 | [net] 2 | # Testing 3 | batch=1 4 | subdivisions=1 5 | # Training 6 | # batch=64 7 | # subdivisions=16 8 | width=416 9 | height=416 10 | channels=3 11 | momentum=0.9 12 | decay=0.0005 13 | angle=0 14 | saturation = 1.5 15 | exposure = 1.5 16 | hue=.1 17 | 18 | learning_rate=0.001 19 | burn_in=1000 20 | max_batches = 500200 21 | policy=steps 22 | steps=400000,450000 23 | scales=.1,.1 24 | 25 | [convolutional] 26 | batch_normalize=1 27 | filters=32 28 | size=3 29 | stride=1 30 | pad=1 31 | activation=leaky 32 | 33 | # Downsample 34 | 35 | [convolutional] 36 | batch_normalize=1 37 | filters=64 38 | size=3 39 | stride=2 40 | pad=1 41 | activation=leaky 42 | 43 | [convolutional] 44 | batch_normalize=1 45 | filters=32 46 | size=1 47 | stride=1 48 | pad=1 49 | activation=leaky 50 | 51 | [convolutional] 52 | batch_normalize=1 53 | filters=64 54 | size=3 55 | stride=1 56 | pad=1 57 | activation=leaky 58 | 59 | [shortcut] 60 | from=-3 61 | activation=linear 62 | 63 | # Downsample 64 | 65 | [convolutional] 66 | batch_normalize=1 67 | filters=128 68 | size=3 69 | stride=2 70 | pad=1 71 | activation=leaky 72 | 73 | [convolutional] 74 | batch_normalize=1 75 | filters=64 76 | size=1 77 | stride=1 78 | pad=1 79 | activation=leaky 80 | 81 | [convolutional] 82 | batch_normalize=1 83 | filters=128 84 | size=3 85 | stride=1 86 | pad=1 87 | activation=leaky 88 | 89 | [shortcut] 90 | from=-3 91 | activation=linear 92 | 93 | [convolutional] 94 | batch_normalize=1 95 | filters=64 96 | size=1 97 | stride=1 98 | pad=1 99 | activation=leaky 100 | 101 | [convolutional] 102 | batch_normalize=1 103 | filters=128 104 | size=3 105 | stride=1 106 | pad=1 107 | activation=leaky 108 | 109 | [shortcut] 110 | from=-3 111 | activation=linear 112 | 113 | # Downsample 114 | 115 | [convolutional] 116 | batch_normalize=1 117 | filters=256 118 | size=3 119 | stride=2 120 | pad=1 121 | activation=leaky 122 | 123 | [convolutional] 124 | batch_normalize=1 125 | filters=128 126 | size=1 127 | stride=1 128 | pad=1 129 | activation=leaky 130 | 131 | [convolutional] 132 | batch_normalize=1 133 | filters=256 134 | size=3 135 | stride=1 136 | pad=1 137 | activation=leaky 138 | 139 | [shortcut] 140 | from=-3 141 | activation=linear 142 | 143 | [convolutional] 144 | batch_normalize=1 145 | filters=128 146 | size=1 147 | stride=1 148 | pad=1 149 | activation=leaky 150 | 151 | [convolutional] 152 | batch_normalize=1 153 | filters=256 154 | size=3 155 | stride=1 156 | pad=1 157 | activation=leaky 158 | 159 | [shortcut] 160 | from=-3 161 | activation=linear 162 | 163 | [convolutional] 164 | batch_normalize=1 165 | filters=128 166 | size=1 167 | stride=1 168 | pad=1 169 | activation=leaky 170 | 171 | [convolutional] 172 | batch_normalize=1 173 | filters=256 174 | size=3 175 | stride=1 176 | pad=1 177 | activation=leaky 178 | 179 | [shortcut] 180 | from=-3 181 | activation=linear 182 | 183 | [convolutional] 184 | batch_normalize=1 185 | filters=128 186 | size=1 187 | stride=1 188 | pad=1 189 | activation=leaky 190 | 191 | [convolutional] 192 | batch_normalize=1 193 | filters=256 194 | size=3 195 | stride=1 196 | pad=1 197 | activation=leaky 198 | 199 | [shortcut] 200 | from=-3 201 | activation=linear 202 | 203 | 204 | [convolutional] 205 | batch_normalize=1 206 | filters=128 207 | size=1 208 | stride=1 209 | pad=1 210 | activation=leaky 211 | 
212 | [convolutional] 213 | batch_normalize=1 214 | filters=256 215 | size=3 216 | stride=1 217 | pad=1 218 | activation=leaky 219 | 220 | [shortcut] 221 | from=-3 222 | activation=linear 223 | 224 | [convolutional] 225 | batch_normalize=1 226 | filters=128 227 | size=1 228 | stride=1 229 | pad=1 230 | activation=leaky 231 | 232 | [convolutional] 233 | batch_normalize=1 234 | filters=256 235 | size=3 236 | stride=1 237 | pad=1 238 | activation=leaky 239 | 240 | [shortcut] 241 | from=-3 242 | activation=linear 243 | 244 | [convolutional] 245 | batch_normalize=1 246 | filters=128 247 | size=1 248 | stride=1 249 | pad=1 250 | activation=leaky 251 | 252 | [convolutional] 253 | batch_normalize=1 254 | filters=256 255 | size=3 256 | stride=1 257 | pad=1 258 | activation=leaky 259 | 260 | [shortcut] 261 | from=-3 262 | activation=linear 263 | 264 | [convolutional] 265 | batch_normalize=1 266 | filters=128 267 | size=1 268 | stride=1 269 | pad=1 270 | activation=leaky 271 | 272 | [convolutional] 273 | batch_normalize=1 274 | filters=256 275 | size=3 276 | stride=1 277 | pad=1 278 | activation=leaky 279 | 280 | [shortcut] 281 | from=-3 282 | activation=linear 283 | 284 | # Downsample 285 | 286 | [convolutional] 287 | batch_normalize=1 288 | filters=512 289 | size=3 290 | stride=2 291 | pad=1 292 | activation=leaky 293 | 294 | [convolutional] 295 | batch_normalize=1 296 | filters=256 297 | size=1 298 | stride=1 299 | pad=1 300 | activation=leaky 301 | 302 | [convolutional] 303 | batch_normalize=1 304 | filters=512 305 | size=3 306 | stride=1 307 | pad=1 308 | activation=leaky 309 | 310 | [shortcut] 311 | from=-3 312 | activation=linear 313 | 314 | 315 | [convolutional] 316 | batch_normalize=1 317 | filters=256 318 | size=1 319 | stride=1 320 | pad=1 321 | activation=leaky 322 | 323 | [convolutional] 324 | batch_normalize=1 325 | filters=512 326 | size=3 327 | stride=1 328 | pad=1 329 | activation=leaky 330 | 331 | [shortcut] 332 | from=-3 333 | activation=linear 334 | 335 | 336 | [convolutional] 337 | batch_normalize=1 338 | filters=256 339 | size=1 340 | stride=1 341 | pad=1 342 | activation=leaky 343 | 344 | [convolutional] 345 | batch_normalize=1 346 | filters=512 347 | size=3 348 | stride=1 349 | pad=1 350 | activation=leaky 351 | 352 | [shortcut] 353 | from=-3 354 | activation=linear 355 | 356 | 357 | [convolutional] 358 | batch_normalize=1 359 | filters=256 360 | size=1 361 | stride=1 362 | pad=1 363 | activation=leaky 364 | 365 | [convolutional] 366 | batch_normalize=1 367 | filters=512 368 | size=3 369 | stride=1 370 | pad=1 371 | activation=leaky 372 | 373 | [shortcut] 374 | from=-3 375 | activation=linear 376 | 377 | [convolutional] 378 | batch_normalize=1 379 | filters=256 380 | size=1 381 | stride=1 382 | pad=1 383 | activation=leaky 384 | 385 | [convolutional] 386 | batch_normalize=1 387 | filters=512 388 | size=3 389 | stride=1 390 | pad=1 391 | activation=leaky 392 | 393 | [shortcut] 394 | from=-3 395 | activation=linear 396 | 397 | 398 | [convolutional] 399 | batch_normalize=1 400 | filters=256 401 | size=1 402 | stride=1 403 | pad=1 404 | activation=leaky 405 | 406 | [convolutional] 407 | batch_normalize=1 408 | filters=512 409 | size=3 410 | stride=1 411 | pad=1 412 | activation=leaky 413 | 414 | [shortcut] 415 | from=-3 416 | activation=linear 417 | 418 | 419 | [convolutional] 420 | batch_normalize=1 421 | filters=256 422 | size=1 423 | stride=1 424 | pad=1 425 | activation=leaky 426 | 427 | [convolutional] 428 | batch_normalize=1 429 | filters=512 430 | size=3 431 | stride=1 432 
| pad=1 433 | activation=leaky 434 | 435 | [shortcut] 436 | from=-3 437 | activation=linear 438 | 439 | [convolutional] 440 | batch_normalize=1 441 | filters=256 442 | size=1 443 | stride=1 444 | pad=1 445 | activation=leaky 446 | 447 | [convolutional] 448 | batch_normalize=1 449 | filters=512 450 | size=3 451 | stride=1 452 | pad=1 453 | activation=leaky 454 | 455 | [shortcut] 456 | from=-3 457 | activation=linear 458 | 459 | # Downsample 460 | 461 | [convolutional] 462 | batch_normalize=1 463 | filters=1024 464 | size=3 465 | stride=2 466 | pad=1 467 | activation=leaky 468 | 469 | [convolutional] 470 | batch_normalize=1 471 | filters=512 472 | size=1 473 | stride=1 474 | pad=1 475 | activation=leaky 476 | 477 | [convolutional] 478 | batch_normalize=1 479 | filters=1024 480 | size=3 481 | stride=1 482 | pad=1 483 | activation=leaky 484 | 485 | [shortcut] 486 | from=-3 487 | activation=linear 488 | 489 | [convolutional] 490 | batch_normalize=1 491 | filters=512 492 | size=1 493 | stride=1 494 | pad=1 495 | activation=leaky 496 | 497 | [convolutional] 498 | batch_normalize=1 499 | filters=1024 500 | size=3 501 | stride=1 502 | pad=1 503 | activation=leaky 504 | 505 | [shortcut] 506 | from=-3 507 | activation=linear 508 | 509 | [convolutional] 510 | batch_normalize=1 511 | filters=512 512 | size=1 513 | stride=1 514 | pad=1 515 | activation=leaky 516 | 517 | [convolutional] 518 | batch_normalize=1 519 | filters=1024 520 | size=3 521 | stride=1 522 | pad=1 523 | activation=leaky 524 | 525 | [shortcut] 526 | from=-3 527 | activation=linear 528 | 529 | [convolutional] 530 | batch_normalize=1 531 | filters=512 532 | size=1 533 | stride=1 534 | pad=1 535 | activation=leaky 536 | 537 | [convolutional] 538 | batch_normalize=1 539 | filters=1024 540 | size=3 541 | stride=1 542 | pad=1 543 | activation=leaky 544 | 545 | [shortcut] 546 | from=-3 547 | activation=linear 548 | 549 | ###################### 550 | 551 | [convolutional] 552 | batch_normalize=1 553 | filters=512 554 | size=1 555 | stride=1 556 | pad=1 557 | activation=leaky 558 | 559 | [convolutional] 560 | batch_normalize=1 561 | size=3 562 | stride=1 563 | pad=1 564 | filters=1024 565 | activation=leaky 566 | 567 | [convolutional] 568 | batch_normalize=1 569 | filters=512 570 | size=1 571 | stride=1 572 | pad=1 573 | activation=leaky 574 | 575 | [convolutional] 576 | batch_normalize=1 577 | size=3 578 | stride=1 579 | pad=1 580 | filters=1024 581 | activation=leaky 582 | 583 | [convolutional] 584 | batch_normalize=1 585 | filters=512 586 | size=1 587 | stride=1 588 | pad=1 589 | activation=leaky 590 | 591 | [convolutional] 592 | batch_normalize=1 593 | size=3 594 | stride=1 595 | pad=1 596 | filters=1024 597 | activation=leaky 598 | 599 | [convolutional] 600 | size=1 601 | stride=1 602 | pad=1 603 | filters=255 604 | activation=linear 605 | 606 | 607 | [yolo] 608 | mask = 6,7,8 609 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 610 | classes=80 611 | num=9 612 | jitter=.3 613 | ignore_thresh = .5 614 | truth_thresh = 1 615 | random=1 616 | 617 | 618 | [route] 619 | layers = -4 620 | 621 | [convolutional] 622 | batch_normalize=1 623 | filters=256 624 | size=1 625 | stride=1 626 | pad=1 627 | activation=leaky 628 | 629 | [upsample] 630 | stride=2 631 | 632 | [route] 633 | layers = -1, 61 634 | 635 | 636 | 637 | [convolutional] 638 | batch_normalize=1 639 | filters=256 640 | size=1 641 | stride=1 642 | pad=1 643 | activation=leaky 644 | 645 | [convolutional] 646 | batch_normalize=1 647 | size=3 648 | stride=1 
649 | pad=1 650 | filters=512 651 | activation=leaky 652 | 653 | [convolutional] 654 | batch_normalize=1 655 | filters=256 656 | size=1 657 | stride=1 658 | pad=1 659 | activation=leaky 660 | 661 | [convolutional] 662 | batch_normalize=1 663 | size=3 664 | stride=1 665 | pad=1 666 | filters=512 667 | activation=leaky 668 | 669 | [convolutional] 670 | batch_normalize=1 671 | filters=256 672 | size=1 673 | stride=1 674 | pad=1 675 | activation=leaky 676 | 677 | [convolutional] 678 | batch_normalize=1 679 | size=3 680 | stride=1 681 | pad=1 682 | filters=512 683 | activation=leaky 684 | 685 | [convolutional] 686 | size=1 687 | stride=1 688 | pad=1 689 | filters=255 690 | activation=linear 691 | 692 | 693 | [yolo] 694 | mask = 3,4,5 695 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 696 | classes=80 697 | num=9 698 | jitter=.3 699 | ignore_thresh = .5 700 | truth_thresh = 1 701 | random=1 702 | 703 | 704 | 705 | [route] 706 | layers = -4 707 | 708 | [convolutional] 709 | batch_normalize=1 710 | filters=128 711 | size=1 712 | stride=1 713 | pad=1 714 | activation=leaky 715 | 716 | [upsample] 717 | stride=2 718 | 719 | [route] 720 | layers = -1, 36 721 | 722 | 723 | 724 | [convolutional] 725 | batch_normalize=1 726 | filters=128 727 | size=1 728 | stride=1 729 | pad=1 730 | activation=leaky 731 | 732 | [convolutional] 733 | batch_normalize=1 734 | size=3 735 | stride=1 736 | pad=1 737 | filters=256 738 | activation=leaky 739 | 740 | [convolutional] 741 | batch_normalize=1 742 | filters=128 743 | size=1 744 | stride=1 745 | pad=1 746 | activation=leaky 747 | 748 | [convolutional] 749 | batch_normalize=1 750 | size=3 751 | stride=1 752 | pad=1 753 | filters=256 754 | activation=leaky 755 | 756 | [convolutional] 757 | batch_normalize=1 758 | filters=128 759 | size=1 760 | stride=1 761 | pad=1 762 | activation=leaky 763 | 764 | [convolutional] 765 | batch_normalize=1 766 | size=3 767 | stride=1 768 | pad=1 769 | filters=256 770 | activation=leaky 771 | 772 | [convolutional] 773 | size=1 774 | stride=1 775 | pad=1 776 | filters=255 777 | activation=linear 778 | 779 | 780 | [yolo] 781 | mask = 0,1,2 782 | anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 783 | classes=80 784 | num=9 785 | jitter=.3 786 | ignore_thresh = .5 787 | truth_thresh = 1 788 | random=1 789 | 790 | -------------------------------------------------------------------------------- /xyolo/yolo3/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # @Time : 2020/10/31 00:16 3 | # @Author : AaronJny 4 | # @File : __init__.py.py 5 | # @Desc : 6 | -------------------------------------------------------------------------------- /xyolo/yolo3/model.py: -------------------------------------------------------------------------------- 1 | """YOLO_v3 Model Defined in Keras.""" 2 | 3 | from functools import wraps 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | import tensorflow.keras.backend as K 8 | from tensorflow.keras.layers import BatchNormalization 9 | from tensorflow.keras.layers import Conv2D, Add, ZeroPadding2D, UpSampling2D, Concatenate, MaxPooling2D, Input, Lambda 10 | from tensorflow.keras.layers import LeakyReLU 11 | from tensorflow.keras.models import Model 12 | from tensorflow.keras.regularizers import l2 13 | 14 | from xyolo.yolo3.utils import compose 15 | 16 | 17 | @wraps(Conv2D) 18 | def DarknetConv2D(*args, **kwargs): 19 | """Wrapper to set Darknet parameters for Convolution2D.""" 
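    # Darknet convention, as implemented below: L2 weight decay (5e-4) on conv
    # kernels; the stride-2 downsampling convs use 'valid' padding (resblock_body
    # adds the explicit top/left ZeroPadding2D), all other convs use 'same'.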
20 | darknet_conv_kwargs = {'kernel_regularizer': l2(5e-4)}
21 | darknet_conv_kwargs['padding'] = 'valid' if kwargs.get('strides') == (2, 2) else 'same'
22 | darknet_conv_kwargs.update(kwargs)
23 | return Conv2D(*args, **darknet_conv_kwargs)
24 | 
25 | 
26 | def DarknetConv2D_BN_Leaky(*args, **kwargs):
27 | """Darknet Convolution2D followed by BatchNormalization and LeakyReLU."""
28 | no_bias_kwargs = {'use_bias': False}
29 | no_bias_kwargs.update(kwargs)
30 | return compose(
31 | DarknetConv2D(*args, **no_bias_kwargs),
32 | BatchNormalization(),
33 | LeakyReLU(alpha=0.1))
34 | 
35 | 
36 | def resblock_body(x, num_filters, num_blocks):
37 | '''A series of resblocks starting with a downsampling Convolution2D'''
38 | # Darknet uses left and top padding instead of 'same' mode
39 | x = ZeroPadding2D(((1, 0), (1, 0)))(x)
40 | x = DarknetConv2D_BN_Leaky(num_filters, (3, 3), strides=(2, 2))(x)
41 | for i in range(num_blocks):
42 | y = compose(
43 | DarknetConv2D_BN_Leaky(num_filters // 2, (1, 1)),
44 | DarknetConv2D_BN_Leaky(num_filters, (3, 3)))(x)
45 | x = Add()([x, y])
46 | return x
47 | 
48 | 
49 | def darknet_body(x):
50 | '''Darknet body having 52 Convolution2D layers'''
51 | x = DarknetConv2D_BN_Leaky(32, (3, 3))(x)
52 | x = resblock_body(x, 64, 1)
53 | x = resblock_body(x, 128, 2)
54 | x = resblock_body(x, 256, 8)
55 | x = resblock_body(x, 512, 8)
56 | x = resblock_body(x, 1024, 4)
57 | return x
58 | 
59 | 
60 | def make_last_layers(x, num_filters, out_filters):
61 | '''6 Conv2D_BN_Leaky layers followed by a Conv2D_linear layer'''
62 | x = compose(
63 | DarknetConv2D_BN_Leaky(num_filters, (1, 1)),
64 | DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
65 | DarknetConv2D_BN_Leaky(num_filters, (1, 1)),
66 | DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
67 | DarknetConv2D_BN_Leaky(num_filters, (1, 1)))(x)
68 | y = compose(
69 | DarknetConv2D_BN_Leaky(num_filters * 2, (3, 3)),
70 | DarknetConv2D(out_filters, (1, 1)))(x)
71 | return x, y
72 | 
73 | 
74 | def yolo_body(inputs, num_anchors, num_classes):
75 | """Create YOLO_V3 model CNN body in Keras."""
76 | darknet = Model(inputs, darknet_body(inputs))
77 | x, y1 = make_last_layers(darknet.output, 512, num_anchors * (num_classes + 5))
78 | 
79 | x = compose(
80 | DarknetConv2D_BN_Leaky(256, (1, 1)),
81 | UpSampling2D(2))(x)
82 | x = Concatenate()([x, darknet.layers[152].output])
83 | x, y2 = make_last_layers(x, 256, num_anchors * (num_classes + 5))
84 | 
85 | x = compose(
86 | DarknetConv2D_BN_Leaky(128, (1, 1)),
87 | UpSampling2D(2))(x)
88 | x = Concatenate()([x, darknet.layers[92].output])
89 | x, y3 = make_last_layers(x, 128, num_anchors * (num_classes + 5))
90 | 
91 | return Model(inputs, [y1, y2, y3])
92 | 
93 | 
94 | def tiny_yolo_body(inputs, num_anchors, num_classes):
95 | '''Create Tiny YOLO_v3 model CNN body in Keras.'''
96 | x1 = compose(
97 | DarknetConv2D_BN_Leaky(16, (3, 3)),
98 | MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
99 | DarknetConv2D_BN_Leaky(32, (3, 3)),
100 | MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
101 | DarknetConv2D_BN_Leaky(64, (3, 3)),
102 | MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
103 | DarknetConv2D_BN_Leaky(128, (3, 3)),
104 | MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
105 | DarknetConv2D_BN_Leaky(256, (3, 3)))(inputs)
106 | x2 = compose(
107 | MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same'),
108 | DarknetConv2D_BN_Leaky(512, (3, 3)),
109 | MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='same'),
110 | 
DarknetConv2D_BN_Leaky(1024, (3, 3)),
111 | DarknetConv2D_BN_Leaky(256, (1, 1)))(x1)
112 | y1 = compose(
113 | DarknetConv2D_BN_Leaky(512, (3, 3)),
114 | DarknetConv2D(num_anchors * (num_classes + 5), (1, 1)))(x2)
115 | 
116 | x2 = compose(
117 | DarknetConv2D_BN_Leaky(128, (1, 1)),
118 | UpSampling2D(2))(x2)
119 | y2 = compose(
120 | Concatenate(),
121 | DarknetConv2D_BN_Leaky(256, (3, 3)),
122 | DarknetConv2D(num_anchors * (num_classes + 5), (1, 1)))([x2, x1])
123 | 
124 | return Model(inputs, [y1, y2])
125 | 
126 | 
127 | def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
128 | """Convert final layer features to bounding box parameters."""
129 | num_anchors = len(anchors)
130 | # Reshape to batch, height, width, num_anchors, box_params.
131 | anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])
132 | 
133 | grid_shape = K.shape(feats)[1:3] # height, width
134 | grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
135 | [1, grid_shape[1], 1, 1])
136 | grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
137 | [grid_shape[0], 1, 1, 1])
138 | grid = K.concatenate([grid_x, grid_y])
139 | grid = K.cast(grid, K.dtype(feats))
140 | 
141 | feats = K.reshape(
142 | feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])
143 | 
144 | # Adjust predictions to each spatial grid point and anchor size.
145 | box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
146 | box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
147 | box_confidence = K.sigmoid(feats[..., 4:5])
148 | box_class_probs = K.sigmoid(feats[..., 5:])
149 | 
150 | if calc_loss:
151 | return grid, feats, box_xy, box_wh
152 | return box_xy, box_wh, box_confidence, box_class_probs
153 | 
154 | 
155 | def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):
156 | '''Get corrected boxes'''
157 | box_yx = box_xy[..., ::-1]
158 | box_hw = box_wh[..., ::-1]
159 | input_shape = K.cast(input_shape, K.dtype(box_yx))
160 | image_shape = K.cast(image_shape, K.dtype(box_yx))
161 | new_shape = K.round(image_shape * K.min(input_shape / image_shape))
162 | offset = (input_shape - new_shape) / 2. / input_shape
163 | scale = input_shape / new_shape
164 | box_yx = (box_yx - offset) * scale
165 | box_hw *= scale
166 | 
167 | box_mins = box_yx - (box_hw / 2.)
168 | box_maxes = box_yx + (box_hw / 2.)
169 | boxes = K.concatenate([
170 | box_mins[..., 0:1], # y_min
171 | box_mins[..., 1:2], # x_min
172 | box_maxes[..., 0:1], # y_max
173 | box_maxes[..., 1:2] # x_max
174 | ])
175 | 
176 | # Scale boxes back to original image shape.
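    # At this point the coordinates are normalized to the original image (the
    # letterbox offset and scale have been removed). Illustrative numbers: for a
    # 416x416 model input and a 640x480 source image, letterbox_image scales by
    # min(416/640, 416/480) = 0.65 to 416x312 with 52 px of gray padding top and
    # bottom, so offset = (0.125, 0) and scale = (4/3, 1) in normalized (y, x).
    # The multiply below converts the normalized values back to source pixels.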
177 | boxes *= K.concatenate([image_shape, image_shape]) 178 | return boxes 179 | 180 | 181 | def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape): 182 | '''Process Conv layer output''' 183 | box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats, 184 | anchors, num_classes, input_shape) 185 | boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape) 186 | boxes = K.reshape(boxes, [-1, 4]) 187 | box_scores = box_confidence * box_class_probs 188 | box_scores = K.reshape(box_scores, [-1, num_classes]) 189 | return boxes, box_scores 190 | 191 | 192 | def yolo_eval(yolo_outputs, 193 | anchors, 194 | num_classes, 195 | image_shape, 196 | max_boxes=20, 197 | score_threshold=.6, 198 | iou_threshold=.5): 199 | """Evaluate YOLO model on given input and return filtered boxes.""" 200 | num_layers = len(yolo_outputs) 201 | anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]] # default setting 202 | input_shape = K.shape(yolo_outputs[0])[1:3] * 32 203 | boxes = [] 204 | box_scores = [] 205 | for l in range(num_layers): 206 | _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l], 207 | anchors[anchor_mask[l]], num_classes, input_shape, image_shape) 208 | boxes.append(_boxes) 209 | box_scores.append(_box_scores) 210 | boxes = K.concatenate(boxes, axis=0) 211 | box_scores = K.concatenate(box_scores, axis=0) 212 | 213 | mask = box_scores >= score_threshold 214 | max_boxes_tensor = K.constant(max_boxes, dtype='int32') 215 | boxes_ = [] 216 | scores_ = [] 217 | classes_ = [] 218 | for c in range(num_classes): 219 | class_boxes = tf.boolean_mask(boxes, mask[:, c]) 220 | class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c]) 221 | nms_index = tf.image.non_max_suppression( 222 | class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold) 223 | class_boxes = K.gather(class_boxes, nms_index) 224 | class_box_scores = K.gather(class_box_scores, nms_index) 225 | classes = K.ones_like(class_box_scores, 'int32') * c 226 | boxes_.append(class_boxes) 227 | scores_.append(class_box_scores) 228 | classes_.append(classes) 229 | boxes_ = K.concatenate(boxes_, axis=0) 230 | scores_ = K.concatenate(scores_, axis=0) 231 | classes_ = K.concatenate(classes_, axis=0) 232 | 233 | return boxes_, scores_, classes_ 234 | 235 | 236 | def preprocess_true_boxes(true_boxes, input_shape, anchors, num_classes): 237 | '''Preprocess true boxes to training input format 238 | 239 | Parameters 240 | ---------- 241 | true_boxes: array, shape=(m, T, 5) 242 | Absolute x_min, y_min, x_max, y_max, class_id relative to input_shape. 
243 | input_shape: array-like, hw, multiples of 32
244 | anchors: array, shape=(N, 2), wh
245 | num_classes: integer
246 | 
247 | Returns
248 | -------
249 | y_true: list of array, shape like yolo_outputs, xywh are relative value
250 | 
251 | '''
252 | assert (true_boxes[..., 4] < num_classes).all(), 'class id must be less than num_classes'
253 | num_layers = len(anchors) // 3 # default setting
254 | anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]
255 | 
256 | true_boxes = np.array(true_boxes, dtype='float32')
257 | input_shape = np.array(input_shape, dtype='int32')
258 | boxes_xy = (true_boxes[..., 0:2] + true_boxes[..., 2:4]) // 2
259 | boxes_wh = true_boxes[..., 2:4] - true_boxes[..., 0:2]
260 | true_boxes[..., 0:2] = boxes_xy / input_shape[::-1]
261 | true_boxes[..., 2:4] = boxes_wh / input_shape[::-1]
262 | 
263 | m = true_boxes.shape[0]
264 | grid_shapes = [input_shape // {0: 32, 1: 16, 2: 8}[l] for l in range(num_layers)]
265 | y_true = [np.zeros((m, grid_shapes[l][0], grid_shapes[l][1], len(anchor_mask[l]), 5 + num_classes),
266 | dtype='float32') for l in range(num_layers)]
267 | 
268 | # Expand dim to apply broadcasting.
269 | anchors = np.expand_dims(anchors, 0)
270 | anchor_maxes = anchors / 2.
271 | anchor_mins = -anchor_maxes
272 | valid_mask = boxes_wh[..., 0] > 0
273 | 
274 | for b in range(m):
275 | # Discard zero rows.
276 | wh = boxes_wh[b, valid_mask[b]]
277 | if len(wh) == 0: continue
278 | # Expand dim to apply broadcasting.
279 | wh = np.expand_dims(wh, -2)
280 | box_maxes = wh / 2.
281 | box_mins = -box_maxes
282 | 
283 | intersect_mins = np.maximum(box_mins, anchor_mins)
284 | intersect_maxes = np.minimum(box_maxes, anchor_maxes)
285 | intersect_wh = np.maximum(intersect_maxes - intersect_mins, 0.)
286 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
287 | box_area = wh[..., 0] * wh[..., 1]
288 | anchor_area = anchors[..., 0] * anchors[..., 1]
289 | iou = intersect_area / (box_area + anchor_area - intersect_area)
290 | 
291 | # Find best anchor for each true box
292 | best_anchor = np.argmax(iou, axis=-1)
293 | 
294 | for t, n in enumerate(best_anchor):
295 | for l in range(num_layers):
296 | if n in anchor_mask[l]:
297 | i = np.floor(true_boxes[b, t, 0] * grid_shapes[l][1]).astype('int32')
298 | j = np.floor(true_boxes[b, t, 1] * grid_shapes[l][0]).astype('int32')
299 | k = anchor_mask[l].index(n)
300 | c = true_boxes[b, t, 4].astype('int32')
301 | y_true[l][b, j, i, k, 0:4] = true_boxes[b, t, 0:4]
302 | y_true[l][b, j, i, k, 4] = 1
303 | y_true[l][b, j, i, k, 5 + c] = 1
304 | 
305 | return y_true
306 | 
307 | 
308 | def box_iou(b1, b2):
309 | '''Return iou tensor
310 | 
311 | Parameters
312 | ----------
313 | b1: tensor, shape=(i1,...,iN, 4), xywh
314 | b2: tensor, shape=(j, 4), xywh
315 | 
316 | Returns
317 | -------
318 | iou: tensor, shape=(i1,...,iN, j)
319 | 
320 | '''
321 | 
322 | # Expand dim to apply broadcasting.
323 | b1 = K.expand_dims(b1, -2)
324 | b1_xy = b1[..., :2]
325 | b1_wh = b1[..., 2:4]
326 | b1_wh_half = b1_wh / 2.
327 | b1_mins = b1_xy - b1_wh_half
328 | b1_maxes = b1_xy + b1_wh_half
329 | 
330 | # Expand dim to apply broadcasting.
331 | b2 = K.expand_dims(b2, 0)
332 | b2_xy = b2[..., :2]
333 | b2_wh = b2[..., 2:4]
334 | b2_wh_half = b2_wh / 2.
335 | b2_mins = b2_xy - b2_wh_half
336 | b2_maxes = b2_xy + b2_wh_half
337 | 
338 | intersect_mins = K.maximum(b1_mins, b2_mins)
339 | intersect_maxes = K.minimum(b1_maxes, b2_maxes)
340 | intersect_wh = K.maximum(intersect_maxes - intersect_mins, 0.) 
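    # Broadcasting (i1,...,iN, 1, 2) against (1, j, 2) yields an IoU matrix of
    # shape (i1,...,iN, j): every b1 box is compared against every b2 box.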
341 | intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1] 342 | b1_area = b1_wh[..., 0] * b1_wh[..., 1] 343 | b2_area = b2_wh[..., 0] * b2_wh[..., 1] 344 | iou = intersect_area / (b1_area + b2_area - intersect_area) 345 | 346 | return iou 347 | 348 | 349 | def yolo_loss(args, anchors, num_classes, ignore_thresh=.5, print_loss=False): 350 | '''Return yolo_loss tensor 351 | 352 | Parameters 353 | ---------- 354 | yolo_outputs: list of tensor, the output of yolo_body or tiny_yolo_body 355 | y_true: list of array, the output of preprocess_true_boxes 356 | anchors: array, shape=(N, 2), wh 357 | num_classes: integer 358 | ignore_thresh: float, the iou threshold whether to ignore object confidence loss 359 | 360 | Returns 361 | ------- 362 | loss: tensor, shape=(1,) 363 | 364 | ''' 365 | num_layers = len(anchors) // 3 # default setting 366 | yolo_outputs = args[:num_layers] 367 | y_true = args[num_layers:] 368 | anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]] 369 | input_shape = K.cast(K.shape(yolo_outputs[0])[1:3] * 32, K.dtype(y_true[0])) 370 | grid_shapes = [K.cast(K.shape(yolo_outputs[l])[1:3], K.dtype(y_true[0])) for l in range(num_layers)] 371 | loss = 0 372 | m = K.shape(yolo_outputs[0])[0] # batch size, tensor 373 | mf = K.cast(m, K.dtype(yolo_outputs[0])) 374 | 375 | for l in range(num_layers): 376 | object_mask = y_true[l][..., 4:5] 377 | true_class_probs = y_true[l][..., 5:] 378 | 379 | grid, raw_pred, pred_xy, pred_wh = yolo_head(yolo_outputs[l], 380 | anchors[anchor_mask[l]], num_classes, input_shape, calc_loss=True) 381 | pred_box = K.concatenate([pred_xy, pred_wh]) 382 | 383 | # Darknet raw box to calculate loss. 384 | raw_true_xy = y_true[l][..., :2] * grid_shapes[l][::-1] - grid 385 | raw_true_wh = K.log(y_true[l][..., 2:4] / anchors[anchor_mask[l]] * input_shape[::-1]) 386 | raw_true_wh = K.switch(object_mask, raw_true_wh, K.zeros_like(raw_true_wh)) # avoid log(0)=-inf 387 | box_loss_scale = 2 - y_true[l][..., 2:3] * y_true[l][..., 3:4] 388 | 389 | # Find ignore mask, iterate over each of batch. 390 | ignore_mask = tf.TensorArray(K.dtype(y_true[0]), size=1, dynamic_size=True) 391 | object_mask_bool = K.cast(object_mask, 'bool') 392 | 393 | def loop_body(b, ignore_mask): 394 | true_box = tf.boolean_mask(y_true[l][b, ..., 0:4], object_mask_bool[b, ..., 0]) 395 | iou = box_iou(pred_box[b], true_box) 396 | best_iou = K.max(iou, axis=-1) 397 | ignore_mask = ignore_mask.write(b, K.cast(best_iou < ignore_thresh, K.dtype(true_box))) 398 | return b + 1, ignore_mask 399 | 400 | _, ignore_mask = tf.while_loop(lambda b, *args: b < m, loop_body, [0, ignore_mask]) 401 | ignore_mask = ignore_mask.stack() 402 | ignore_mask = K.expand_dims(ignore_mask, -1) 403 | 404 | # K.binary_crossentropy is helpful to avoid exp overflow. 
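        # Loss terms: xy/wh are counted only where an object exists (object_mask);
        # confidence is counted everywhere, except that no-object cells whose best
        # IoU with any ground-truth box exceeds ignore_thresh are masked out via
        # ignore_mask; class loss is per-class binary cross-entropy on object
        # cells. box_loss_scale = 2 - w*h gives small boxes a larger weight.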
405 | xy_loss = object_mask * box_loss_scale * K.binary_crossentropy(raw_true_xy, raw_pred[..., 0:2],
406 | from_logits=True)
407 | wh_loss = object_mask * box_loss_scale * 0.5 * K.square(raw_true_wh - raw_pred[..., 2:4])
408 | confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[..., 4:5], from_logits=True) + \
409 | (1 - object_mask) * K.binary_crossentropy(object_mask, raw_pred[..., 4:5],
410 | from_logits=True) * ignore_mask
411 | class_loss = object_mask * K.binary_crossentropy(true_class_probs, raw_pred[..., 5:], from_logits=True)
412 | 
413 | xy_loss = K.sum(xy_loss) / mf
414 | wh_loss = K.sum(wh_loss) / mf
415 | confidence_loss = K.sum(confidence_loss) / mf
416 | class_loss = K.sum(class_loss) / mf
417 | loss += xy_loss + wh_loss + confidence_loss + class_loss
418 | if print_loss:
419 | # tf.print in TF2 prints as a side effect and returns nothing (unlike the old tf.Print)
420 | tf.print('loss:', loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask))
421 | return loss
422 | 
423 | 
424 | def create_model(input_shape, anchors, num_classes, weights_path, load_pretrained=True, freeze_body=2):
425 | '''create the training model'''
426 | K.clear_session() # get a new session
427 | image_input = Input(shape=(None, None, 3))
428 | h, w = input_shape
429 | num_anchors = len(anchors)
430 | 
431 | y_true = [Input(shape=(h // {0: 32, 1: 16, 2: 8}[l], w // {0: 32, 1: 16, 2: 8}[l], \
432 | num_anchors // 3, num_classes + 5)) for l in range(3)]
433 | 
434 | model_body = yolo_body(image_input, num_anchors // 3, num_classes)
435 | print('Create YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))
436 | 
437 | if load_pretrained:
438 | model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
439 | print('Load weights {}.'.format(weights_path))
440 | if freeze_body in [1, 2]:
441 | # Freeze darknet53 body or freeze all but 3 output layers.
442 | num = (185, len(model_body.layers) - 3)[freeze_body - 1]
443 | for i in range(num): model_body.layers[i].trainable = False
444 | print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers)))
445 | 
446 | model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss',
447 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.5})(
448 | [*model_body.output, *y_true])
449 | model = Model([model_body.input, *y_true], model_loss)
450 | 
451 | return model
452 | 
453 | 
454 | def create_tiny_model(input_shape, anchors, num_classes, weights_path, load_pretrained=True, freeze_body=2):
455 | '''create the training model, for Tiny YOLOv3'''
456 | K.clear_session() # get a new session
457 | image_input = Input(shape=(None, None, 3))
458 | h, w = input_shape
459 | num_anchors = len(anchors)
460 | 
461 | y_true = [Input(shape=(h // {0: 32, 1: 16}[l], w // {0: 32, 1: 16}[l], \
462 | num_anchors // 2, num_classes + 5)) for l in range(2)]
463 | 
464 | model_body = tiny_yolo_body(image_input, num_anchors // 2, num_classes)
465 | print('Create Tiny YOLOv3 model with {} anchors and {} classes.'.format(num_anchors, num_classes))
466 | 
467 | if load_pretrained:
468 | model_body.load_weights(weights_path, by_name=True, skip_mismatch=True)
469 | print('Load weights {}.'.format(weights_path))
470 | if freeze_body in [1, 2]:
471 | # Freeze the darknet body or freeze all but 2 output layers.
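        # freeze_body=1 freezes the first 20 layers (the tiny backbone);
        # freeze_body=2 freezes everything except the last 2 output conv layers.
        # The tuple indexing below picks between the two counts.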
472 | num = (20, len(model_body.layers) - 2)[freeze_body - 1] 473 | for i in range(num): model_body.layers[i].trainable = False 474 | print('Freeze the first {} layers of total {} layers.'.format(num, len(model_body.layers))) 475 | 476 | model_loss = Lambda(yolo_loss, output_shape=(1,), name='yolo_loss', 477 | arguments={'anchors': anchors, 'num_classes': num_classes, 'ignore_thresh': 0.7})( 478 | [*model_body.output, *y_true]) 479 | model = Model([model_body.input, *y_true], model_loss) 480 | 481 | return model 482 | -------------------------------------------------------------------------------- /xyolo/yolo3/utils.py: -------------------------------------------------------------------------------- 1 | """Miscellaneous utility functions.""" 2 | 3 | from functools import reduce 4 | 5 | import numpy as np 6 | from PIL import Image 7 | from matplotlib.colors import rgb_to_hsv, hsv_to_rgb 8 | 9 | 10 | def compose(*funcs): 11 | """Compose arbitrarily many functions, evaluated left to right. 12 | 13 | Reference: https://mathieularose.com/function-composition-in-python/ 14 | """ 15 | # return lambda x: reduce(lambda v, f: f(v), funcs, x) 16 | if funcs: 17 | return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), funcs) 18 | else: 19 | raise ValueError('Composition of empty sequence not supported.') 20 | 21 | 22 | def letterbox_image(image, size): 23 | '''resize image with unchanged aspect ratio using padding''' 24 | iw, ih = image.size 25 | w, h = size 26 | scale = min(w / iw, h / ih) 27 | nw = int(iw * scale) 28 | nh = int(ih * scale) 29 | 30 | image = image.resize((nw, nh), Image.BICUBIC) 31 | new_image = Image.new('RGB', size, (128, 128, 128)) 32 | new_image.paste(image, ((w - nw) // 2, (h - nh) // 2)) 33 | return new_image 34 | 35 | 36 | def rand(a=0, b=1): 37 | return np.random.rand() * (b - a) + a 38 | 39 | 40 | def get_random_data(annotation_line, input_shape, random=True, max_boxes=20, jitter=.3, hue=.1, sat=1.5, val=1.5, 41 | proc_img=True): 42 | '''random preprocessing for real-time data augmentation''' 43 | line = annotation_line.split() 44 | image = Image.open(line[0]) 45 | iw, ih = image.size 46 | h, w = input_shape 47 | box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]]) 48 | 49 | if not random: 50 | # resize image 51 | scale = min(w / iw, h / ih) 52 | nw = int(iw * scale) 53 | nh = int(ih * scale) 54 | dx = (w - nw) // 2 55 | dy = (h - nh) // 2 56 | image_data = 0 57 | if proc_img: 58 | image = image.resize((nw, nh), Image.BICUBIC) 59 | new_image = Image.new('RGB', (w, h), (128, 128, 128)) 60 | new_image.paste(image, (dx, dy)) 61 | image_data = np.array(new_image) / 255. 
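            # image_data is now the letterboxed image as an (h, w, 3) float array in [0, 1]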
62 | 63 | # correct boxes 64 | box_data = np.zeros((max_boxes, 5)) 65 | if len(box) > 0: 66 | np.random.shuffle(box) 67 | if len(box) > max_boxes: box = box[:max_boxes] 68 | box[:, [0, 2]] = box[:, [0, 2]] * scale + dx 69 | box[:, [1, 3]] = box[:, [1, 3]] * scale + dy 70 | box_data[:len(box)] = box 71 | 72 | return image_data, box_data 73 | 74 | # resize image 75 | new_ar = w / h * rand(1 - jitter, 1 + jitter) / rand(1 - jitter, 1 + jitter) 76 | scale = rand(.25, 2) 77 | if new_ar < 1: 78 | nh = int(scale * h) 79 | nw = int(nh * new_ar) 80 | else: 81 | nw = int(scale * w) 82 | nh = int(nw / new_ar) 83 | image = image.resize((nw, nh), Image.BICUBIC) 84 | 85 | # place image 86 | dx = int(rand(0, w - nw)) 87 | dy = int(rand(0, h - nh)) 88 | new_image = Image.new('RGB', (w, h), (128, 128, 128)) 89 | new_image.paste(image, (dx, dy)) 90 | image = new_image 91 | 92 | # flip image or not 93 | flip = rand() < .5 94 | if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT) 95 | 96 | # distort image 97 | hue = rand(-hue, hue) 98 | sat = rand(1, sat) if rand() < .5 else 1 / rand(1, sat) 99 | val = rand(1, val) if rand() < .5 else 1 / rand(1, val) 100 | x = rgb_to_hsv(np.array(image) / 255.) 101 | x[..., 0] += hue 102 | x[..., 0][x[..., 0] > 1] -= 1 103 | x[..., 0][x[..., 0] < 0] += 1 104 | x[..., 1] *= sat 105 | x[..., 2] *= val 106 | x[x > 1] = 1 107 | x[x < 0] = 0 108 | image_data = hsv_to_rgb(x) # numpy array, 0 to 1 109 | 110 | # correct boxes 111 | box_data = np.zeros((max_boxes, 5)) 112 | if len(box) > 0: 113 | np.random.shuffle(box) 114 | box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx 115 | box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy 116 | if flip: box[:, [0, 2]] = w - box[:, [2, 0]] 117 | box[:, 0:2][box[:, 0:2] < 0] = 0 118 | box[:, 2][box[:, 2] > w] = w 119 | box[:, 3][box[:, 3] > h] = h 120 | box_w = box[:, 2] - box[:, 0] 121 | box_h = box[:, 3] - box[:, 1] 122 | box = box[np.logical_and(box_w > 1, box_h > 1)] # discard invalid box 123 | if len(box) > max_boxes: box = box[:max_boxes] 124 | box_data[:len(box)] = box 125 | 126 | return image_data, box_data -------------------------------------------------------------------------------- /xyolo/yolo3/yolo.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Class definition of YOLO_v3 style detection model on image and video 4 | """ 5 | 6 | import colorsys 7 | import os 8 | import typing 9 | from timeit import default_timer as timer 10 | 11 | import numpy as np 12 | import tensorflow as tf 13 | from PIL import Image 14 | from loguru import logger 15 | from tensorflow.keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping 16 | from tensorflow.keras.layers import Input 17 | from tensorflow.keras.models import load_model 18 | from tensorflow.keras.optimizers import Adam 19 | from tensorflow.keras.utils import multi_gpu_model 20 | 21 | from xyolo.config import DefaultYolo3Config 22 | from xyolo.yolo3.model import create_model, create_tiny_model 23 | from xyolo.yolo3.model import yolo_eval, yolo_body, tiny_yolo_body, preprocess_true_boxes 24 | from xyolo.yolo3.utils import get_random_data 25 | from xyolo.yolo3.utils import letterbox_image 26 | 27 | 28 | class YOLO(object): 29 | 30 | def __init__(self, config=None, train=False, **kwargs): 31 | if not config: 32 | config = DefaultYolo3Config() 33 | self.config = config 34 | self.model_path = config.model_path 35 | self.anchors_path = config.anchors_path 36 | self.classes_path = config.classes_path 
37 | self.score = config.score 38 | self.iou = config.iou 39 | self.model_image_size = config.model_image_size 40 | self.gpu_num = config.gpu_num 41 | self.dataset_path = config.dataset_path 42 | self.__dict__.update(kwargs) # update with user overrides 43 | self.class_names = self._get_class() 44 | self.anchors = self._get_anchors() 45 | if not train: 46 | self.load_yolo_model() 47 | 48 | def _get_class(self): 49 | classes_path = os.path.expanduser(self.classes_path) 50 | with open(classes_path) as f: 51 | class_names = f.readlines() 52 | class_names = [c.strip() for c in class_names] 53 | return class_names 54 | 55 | def _get_anchors(self): 56 | anchors_path = os.path.expanduser(self.anchors_path) 57 | with open(anchors_path) as f: 58 | anchors = f.readline() 59 | anchors = [float(x) for x in anchors.split(',')] 60 | return np.array(anchors).reshape(-1, 2) 61 | 62 | def load_yolo_model(self): 63 | self.model_path = self.config.model_path 64 | model_path = os.path.expanduser(self.model_path) 65 | assert model_path.endswith( 66 | '.h5'), 'Keras model or weights must be a .h5 file.' 67 | 68 | # Load model, or construct model and load weights. 69 | num_anchors = len(self.anchors) 70 | num_classes = len(self.class_names) 71 | is_tiny_version = num_anchors == 6 # default setting 72 | try: 73 | self.yolo_model = load_model(model_path, compile=False) 74 | except: 75 | self.yolo_model = tiny_yolo_body(Input(shape=(None, None, 3)), num_anchors // 2, num_classes) \ 76 | if is_tiny_version else yolo_body(Input(shape=(None, None, 3)), num_anchors // 3, num_classes) 77 | # make sure model, anchors and classes match 78 | self.yolo_model.load_weights(self.model_path) 79 | else: 80 | assert self.yolo_model.layers[-1].output_shape[-1] == \ 81 | num_anchors / len(self.yolo_model.output) * (num_classes + 5), \ 82 | 'Mismatch between model and given anchor and class sizes' 83 | 84 | print('{} model, anchors, and classes loaded.'.format(model_path)) 85 | 86 | # Generate colors for drawing bounding boxes. 87 | hsv_tuples = [(x / len(self.class_names), 1., 1.) 88 | for x in range(len(self.class_names))] 89 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 90 | self.colors = list( 91 | map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), 92 | self.colors)) 93 | np.random.seed(10101) # Fixed seed for consistent colors across runs. 94 | # Shuffle colors to decorrelate adjacent classes. 95 | np.random.shuffle(self.colors) 96 | np.random.seed(None) # Reset seed to default. 97 | 98 | @tf.function 99 | def compute_output(self, image_data, image_shape): 100 | # Generate output tensor targets for filtered bounding boxes. 
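        # Wrapped in @tf.function above, so the forward pass plus the NMS
        # post-processing in yolo_eval run as one compiled graph; image_shape is
        # the original (height, width), used to undo the letterbox transform.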
101 | # self.input_image_shape = K.placeholder(shape=(2,))
102 | self.input_image_shape = tf.constant(image_shape)
103 | if self.gpu_num >= 2:
104 | self.yolo_model = multi_gpu_model(
105 | self.yolo_model, gpus=self.gpu_num)
106 | 
107 | boxes, scores, classes = yolo_eval(self.yolo_model(image_data), self.anchors,
108 | len(self.class_names), self.input_image_shape,
109 | score_threshold=self.score, iou_threshold=self.iou)
110 | return boxes, scores, classes
111 | 
112 | @classmethod
113 | def data_generator(cls, annotation_lines, batch_size, input_shape, anchors, num_classes):
114 | '''data generator for fit_generator'''
115 | n = len(annotation_lines)
116 | i = 0
117 | while True:
118 | image_data = []
119 | box_data = []
120 | for b in range(batch_size):
121 | if i == 0:
122 | np.random.shuffle(annotation_lines)
123 | image, box = get_random_data(
124 | annotation_lines[i], input_shape, random=True)
125 | image_data.append(image)
126 | box_data.append(box)
127 | i = (i + 1) % n
128 | image_data = np.array(image_data)
129 | box_data = np.array(box_data)
130 | y_true = preprocess_true_boxes(
131 | box_data, input_shape, anchors, num_classes)
132 | yield [image_data, *y_true], np.zeros(batch_size)
133 | 
134 | @classmethod
135 | def data_generator_wrapper(cls, annotation_lines, batch_size, input_shape, anchors, num_classes):
136 | n = len(annotation_lines)
137 | if n == 0 or batch_size <= 0:
138 | return None
139 | return cls.data_generator(annotation_lines, batch_size, input_shape, anchors, num_classes)
140 | 
141 | def fit(self, **kwargs):
142 | # The following parameters can be passed in to override the config values for more flexibility;
143 | # if they are not given, the defaults from config are used
144 | dataset_path = kwargs.get('dataset_path', self.config.dataset_path)
145 | tensorboard_log_path = kwargs.get(
146 | 'tensorboard_log_path', self.config.tensorboard_log_path)
147 | output_model_path = kwargs.get(
148 | 'output_model_path', self.config.output_model_path)
149 | frozen_train = kwargs.get('frozen_train', self.config.frozen_train)
150 | frozen_train_epochs = kwargs.get(
151 | 'frozen_train_epochs', self.config.frozen_train_epochs)
152 | frozen_batch_size = kwargs.get(
153 | 'frozen_batch_size', self.config.frozen_batch_size)
154 | frozen_lr = kwargs.get('frozen_lr', self.config.frozen_lr)
155 | unfreeze_train = kwargs.get(
156 | 'unfreeze_train', self.config.unfreeze_train)
157 | unfreeze_train_epochs = kwargs.get(
158 | 'unfreeze_train_epochs', self.config.unfreeze_train_epochs)
159 | unfreeze_batch_size = kwargs.get(
160 | 'unfreeze_batch_size', self.config.unfreeze_batch_size)
161 | unfreeze_lr = kwargs.get('unfreeze_lr', self.config.unfreeze_lr)
162 | initial_weight_path = kwargs.get(
163 | 'initial_weight_path', self.config.pre_training_weights_keras_path)
164 | use_tensorboard = kwargs.get(
165 | 'use_tensorboard', self.config.use_tensorboard)
166 | use_checkpoint = kwargs.get(
167 | 'use_checkpoint', self.config.use_checkpoint)
168 | val_split = kwargs.get('val_split', self.config.val_split)
169 | use_reduce_lr = kwargs.get('use_reduce_lr', self.config.use_reduce_lr)
170 | reduce_lr_monitor = kwargs.get(
171 | 'reduce_lr_monitor', self.config.reduce_lr_monitor)
172 | reduce_lr_factor = kwargs.get(
173 | 'reduce_lr_factor', self.config.reduce_lr_factor)
174 | reduce_lr_patience = kwargs.get(
175 | 'reduce_lr_patience', self.config.reduce_lr_patience)
176 | use_early_stopping = kwargs.get(
177 | 'use_early_stopping', self.config.use_early_stopping)
178 | early_stopping_monitor = kwargs.get(
179 | 'early_stopping_monitor', self.config.early_stopping_monitor)
180 | 
early_stopping_min_delta = kwargs.get( 181 | 'early_stopping_min_delta', self.config.early_stopping_min_delta) 182 | early_stopping_patience = kwargs.get( 183 | 'early_stopping_patience', self.config.early_stopping_patience) 184 | 185 | is_tiny_version = len(self.anchors) == 6 # default setting 186 | num_classes = len(self.class_names) 187 | if is_tiny_version: 188 | model = create_tiny_model(self.model_image_size, self.anchors, num_classes, 189 | freeze_body=2, weights_path=initial_weight_path) 190 | else: 191 | model = create_model(self.model_image_size, self.anchors, num_classes, 192 | freeze_body=2, 193 | weights_path=initial_weight_path) # make sure you know what you freeze 194 | 195 | logger.info('Prepare to train the model...') 196 | 197 | callbacks = [] 198 | if use_tensorboard: 199 | logging = TensorBoard(log_dir=tensorboard_log_path) 200 | callbacks.append(logging) 201 | if use_checkpoint: 202 | checkpoint = ModelCheckpoint( 203 | tensorboard_log_path + 204 | 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5', 205 | monitor='val_loss', save_weights_only=True, save_best_only=True) 206 | callbacks.append(checkpoint) 207 | 208 | logger.info('Split dataset for validate...') 209 | with open(dataset_path) as f: 210 | lines = f.readlines() 211 | np.random.seed(10101) 212 | np.random.shuffle(lines) 213 | np.random.seed(None) 214 | num_val = int(len(lines) * val_split) 215 | num_train = len(lines) - num_val 216 | 217 | logger.info('The first step training begins({} epochs).'.format( 218 | frozen_train_epochs)) 219 | # Train with frozen layers first, to get a stable loss. 220 | # Adjust num epochs to your dataset. This step is enough to obtain a not bad model. 221 | if frozen_train: 222 | model.compile(optimizer=Adam(lr=frozen_lr), loss={ 223 | # use custom yolo_loss Lambda layer. 224 | 'yolo_loss': lambda y_true, y_pred: y_pred}) 225 | 226 | batch_size = frozen_batch_size 227 | logger.info( 228 | 'Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 229 | model.fit( 230 | self.data_generator_wrapper(lines[:num_train], batch_size, self.model_image_size, self.anchors, 231 | num_classes), 232 | steps_per_epoch=max(1, num_train // batch_size), 233 | validation_data=self.data_generator_wrapper(lines[num_train:], batch_size, self.model_image_size, 234 | self.anchors, 235 | num_classes), 236 | validation_steps=max(1, num_val // batch_size), 237 | epochs=frozen_train_epochs, 238 | initial_epoch=0, 239 | callbacks=callbacks) 240 | 241 | logger.info('The second step training begins({} epochs).'.format( 242 | unfreeze_train_epochs)) 243 | # Unfreeze and continue training, to fine-tune. 244 | # Train longer if the result is not good. 
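        # ReduceLROnPlateau / EarlyStopping are only attached for this second
        # stage, so the frozen warm-up above runs at a fixed learning rate while
        # the fine-tuning below can decay its LR and stop early when the
        # monitored metric stalls.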
245 | if use_reduce_lr:
246 | reduce_lr = ReduceLROnPlateau(monitor=reduce_lr_monitor, factor=reduce_lr_factor,
247 | patience=reduce_lr_patience, verbose=1)
248 | callbacks.append(reduce_lr)
249 | if use_early_stopping:
250 | early_stopping = EarlyStopping(monitor=early_stopping_monitor, min_delta=early_stopping_min_delta,
251 | patience=early_stopping_patience, verbose=1)
252 | callbacks.append(early_stopping)
253 | if unfreeze_train:
254 | for i in range(len(model.layers)):
255 | model.layers[i].trainable = True
256 | model.compile(optimizer=Adam(lr=unfreeze_lr),
257 | loss={'yolo_loss': lambda y_true, y_pred: y_pred}) # recompile to apply the change
258 | logger.info('Unfreeze all of the layers.')
259 | 
260 | # note that more GPU memory is required after unfreezing the body
261 | batch_size = unfreeze_batch_size
262 | logger.info('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val,
263 | batch_size))
264 | model.fit(
265 | self.data_generator_wrapper(lines[:num_train], batch_size, self.model_image_size, self.anchors,
266 | num_classes),
267 | steps_per_epoch=max(1, num_train // batch_size),
268 | validation_data=self.data_generator_wrapper(lines[num_train:], batch_size, self.model_image_size,
269 | self.anchors,
270 | num_classes),
271 | validation_steps=max(1, num_val // batch_size),
272 | epochs=frozen_train_epochs + unfreeze_train_epochs,
273 | initial_epoch=frozen_train_epochs,
274 | callbacks=callbacks)
275 | model.save_weights(output_model_path)
276 | logger.info('Training completed!')
277 | 
278 | def detect_image(self, img: typing.Union[Image.Image, str]) -> typing.List[
279 | typing.Tuple[str, int, float, int, int, int, int]]:
280 | """
281 | Run object detection on the given image and return the detection results
282 | 
283 | Args:
284 | img: the image to detect, as a PIL.Image.Image object or a file path (str)
285 | 
286 | Returns:
287 | [(class name, class id, score, top-left x, top-left y, bottom-right x, bottom-right y), ...]
288 | """
289 | # Accept both a str image path and a PIL.Image.Image object
290 | if isinstance(img, str):
291 | image = Image.open(img)
292 | else:
293 | image = img
294 | assert isinstance(image, Image.Image)
295 | start = timer()
296 | if self.model_image_size != (None, None):
297 | assert self.model_image_size[0] % 32 == 0, 'Multiples of 32 required'
298 | assert self.model_image_size[1] % 32 == 0, 'Multiples of 32 required'
299 | boxed_image = letterbox_image(
300 | image, tuple(reversed(self.model_image_size)))
301 | else:
302 | new_image_size = (image.width - (image.width % 32),
303 | image.height - (image.height % 32))
304 | boxed_image = letterbox_image(image, new_image_size)
305 | image_data = np.array(boxed_image, dtype='float32')
306 | 
307 | image_data /= 255.
308 | image_data = np.expand_dims(image_data, 0) # Add batch dimension.
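        # image_data is now a (1, height, width, 3) float batch in [0, 1]; the
        # original image's (height, width) is passed below so that the predicted
        # boxes can be mapped back to source-image pixels.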
309 | 
310 | out_boxes, out_scores, out_classes = self.compute_output(
311 | image_data, [image.size[1], image.size[0]])
312 | 
313 | logger.debug('Found {} boxes for {}'.format(len(out_boxes), 'img'))
314 | 
315 | results = []
316 | for i, c in reversed(list(enumerate(out_classes))):
317 | predicted_class = self.class_names[c]
318 | box = out_boxes[i]
319 | score = out_scores[i]
320 | 
321 | label = '{} {:.2f}'.format(predicted_class, score)
322 | 
323 | top, left, bottom, right = box
324 | top = max(0, np.floor(top + 0.5).astype('int32'))
325 | left = max(0, np.floor(left + 0.5).astype('int32'))
326 | bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
327 | right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
328 | results.append((predicted_class, int(c), float(
329 | score), left, top, right, bottom))
330 | logger.debug('Class {}, Position {}, {}'.format(
331 | label, (left, top), (right, bottom)))
332 | 
333 | end = timer()
334 | logger.debug('Cost time {}s'.format(end - start))
335 | return results
336 | 
337 | def draw_image(self, img: typing.Union[Image.Image, str], predicted_results: typing.List[
338 | typing.Tuple[str, int, float, int, int, int, int]], draw_label=True) -> Image.Image:
339 | """
340 | Draw the given detection results onto the image and return the annotated image
341 | 
342 | Args:
343 | img: the image to draw on, as a PIL.Image.Image object or a file path (str)
344 | predicted_results: detection results, [(class name, class id, score, top-left x, top-left y, bottom-right x, bottom-right y), ...]
345 | draw_label: whether to annotate each box with its class and score
346 | 
347 | Returns:
348 | the image object with the detection results drawn on it
349 | """
350 | import cv2
351 | # Accept both a str image path and a PIL.Image.Image object
352 | if isinstance(img, str):
353 | image = Image.open(img)
354 | else:
355 | image = img
356 | assert isinstance(image, Image.Image)
357 | 
358 | img_array = np.asarray(image.convert('RGB'))
359 | for predicted_class, c, score, x1, y1, x2, y2 in predicted_results:
360 | color = self.colors[c]
361 | cv2.rectangle(img_array, (x1, y1), (x2, y2), color, 2)
362 | if draw_label:
363 | label = '{} {:.2f}'.format(predicted_class, score)
364 | cv2.putText(img_array, text=label, org=(x2 + 3, y1 + 10), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
365 | fontScale=0.50, color=color, thickness=2)
366 | image = Image.fromarray(img_array)
367 | return image
368 | 
369 | def detect_and_draw_image(self, image: typing.Union[Image.Image, str], draw_label=True) -> Image.Image:
370 | """
371 | Run object detection on the given image, then draw the resulting boxes and labels on it
372 | 
373 | Args:
374 | image: the image to detect, as a PIL.Image.Image object or a file path (str)
375 | draw_label: whether to annotate each box with its class and score
376 | 
377 | Returns:
378 | the image object with the detection results drawn on it
379 | """
380 | predicted_results = self.detect_image(image)
381 | img = self.draw_image(image, predicted_results, draw_label=draw_label)
382 | return img
383 | 
384 | def detect_video(self, video_path, output_path=""):
385 | import cv2
386 | vid = cv2.VideoCapture(video_path)
387 | if not vid.isOpened():
388 | raise IOError("Couldn't open webcam or video")
389 | video_FourCC = int(vid.get(cv2.CAP_PROP_FOURCC))
390 | video_fps = vid.get(cv2.CAP_PROP_FPS)
391 | video_size = (int(vid.get(cv2.CAP_PROP_FRAME_WIDTH)),
392 | int(vid.get(cv2.CAP_PROP_FRAME_HEIGHT)))
393 | isOutput = output_path != ""
394 | if isOutput:
395 | print("!!! TYPE:", type(output_path), type(
396 | video_FourCC), type(video_fps), type(video_size))
397 | out = cv2.VideoWriter(
398 | output_path, video_FourCC, video_fps, video_size)
399 | accum_time = 0
400 | curr_fps = 0
401 | fps = "FPS: ??"
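        # The loop below renders detections frame by frame; curr_fps counts the
        # frames drawn in the current ~1-second window and fps keeps the last
        # completed count.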
402 | prev_time = timer()
403 | while True:
404 | return_value, frame = vid.read()
405 | if not return_value:
406 | break # stop when the stream ends; frame would be None below otherwise
407 | b, g, r = cv2.split(frame)
408 | frame = cv2.merge([r, g, b])
409 | image = Image.fromarray(frame)
410 | image = self.detect_and_draw_image(image)
411 | result = np.asarray(image)
412 | r, g, b = cv2.split(result)
413 | result = cv2.merge([b, g, r])
414 | curr_time = timer()
415 | exec_time = curr_time - prev_time
416 | prev_time = curr_time
417 | accum_time = accum_time + exec_time
418 | curr_fps = curr_fps + 1
419 | if accum_time > 1:
420 | accum_time = accum_time - 1
421 | fps = "FPS: " + str(curr_fps)
422 | curr_fps = 0
423 | cv2.putText(result, text=fps, org=(3, 15), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
424 | fontScale=0.50, color=(255, 0, 0), thickness=2)
425 | cv2.namedWindow("result", cv2.WINDOW_NORMAL)
426 | cv2.imshow("result", result)
427 | if isOutput:
428 | out.write(result)
429 | if cv2.waitKey(1) & 0xFF == ord('q'):
430 | break
431 | vid.release()
432 | if isOutput:
433 | out.release()
434 | cv2.destroyAllWindows()
435 | --------------------------------------------------------------------------------
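
A minimal end-to-end usage sketch of the YOLO class defined above (illustrative only: it assumes a DefaultYolo3Config whose model/anchors/classes paths point at valid files, and 'test.jpg' / 'detect_result.jpg' are placeholder paths):

    from xyolo.config import DefaultYolo3Config
    from xyolo.yolo3.yolo import YOLO

    # Build a detector from the default config; train=False loads the weights for inference.
    config = DefaultYolo3Config()
    yolo = YOLO(config=config)

    # detect_image returns [(class_name, class_id, score, x1, y1, x2, y2), ...],
    # and draw_image renders those boxes (and optional labels) onto the image.
    results = yolo.detect_image('test.jpg')
    img = yolo.draw_image('test.jpg', results)
    img.save('detect_result.jpg')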