├── .gitignore
├── LICENSE
├── README.md
├── VOCdevkit
│   └── VOC2007
│       ├── Annotations
│       │   └── README.md
│       ├── ImageSets
│       │   └── Main
│       │       └── README.md
│       └── JPEGImages
│           └── README.md
├── efficientdet.py
├── get_map.py
├── img
│   └── street.jpg
├── logs
│   └── README.md
├── model_data
│   ├── coco_classes.txt
│   ├── simhei.ttf
│   └── voc_classes.txt
├── nets
│   ├── __init__.py
│   ├── efficientdet.py
│   ├── efficientdet_training.py
│   └── efficientnet.py
├── predict.py
├── requirements.txt
├── summary.py
├── train.py
├── utils
│   ├── __init__.py
│   ├── anchors.py
│   ├── callbacks.py
│   ├── dataloader.py
│   ├── utils.py
│   ├── utils_bbox.py
│   └── utils_map.py
├── vision_for_anchors.py
├── voc_annotation.py
└── 常见问题汇总.md

/.gitignore:
--------------------------------------------------------------------------------
1 | # ignore map, miou, datasets
2 | map_out/
3 | miou_out/
4 | VOCdevkit/
5 | datasets/
6 | Medical_Datasets/
7 | lfw/
8 | logs/
9 | model_data/
10 | .temp_map_out/
11 |
12 | # Byte-compiled / optimized / DLL files
13 | __pycache__/
14 | *.py[cod]
15 | *$py.class
16 |
17 | # C extensions
18 | *.so
19 |
20 | # Distribution / packaging
21 | .Python
22 | build/
23 | develop-eggs/
24 | dist/
25 | downloads/
26 | eggs/
27 | .eggs/
28 | lib/
29 | lib64/
30 | parts/
31 | sdist/
32 | var/
33 | wheels/
34 | pip-wheel-metadata/
35 | share/python-wheels/
36 | *.egg-info/
37 | .installed.cfg
38 | *.egg
39 | MANIFEST
40 |
41 | # PyInstaller
42 | # Usually these files are written by a python script from a template
43 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
44 | *.manifest
45 | *.spec
46 |
47 | # Installer logs
48 | pip-log.txt
49 | pip-delete-this-directory.txt
50 |
51 | # Unit test / coverage reports
52 | htmlcov/
53 | .tox/
54 | .nox/
55 | .coverage
56 | .coverage.*
57 | .cache
58 | nosetests.xml
59 | coverage.xml
60 | *.cover
61 | *.py,cover
62 | .hypothesis/
63 | .pytest_cache/
64 |
65 | # Translations
66 | *.mo
67 | *.pot
68 |
69 | # Django stuff:
70 | *.log
71 | local_settings.py
72 | db.sqlite3
73 | db.sqlite3-journal
74 |
75 | # Flask stuff:
76 | instance/
77 | .webassets-cache
78 |
79 | # Scrapy stuff:
80 | .scrapy
81 |
82 | # Sphinx documentation
83 | docs/_build/
84 |
85 | # PyBuilder
86 | target/
87 |
88 | # Jupyter Notebook
89 | .ipynb_checkpoints
90 |
91 | # IPython
92 | profile_default/
93 | ipython_config.py
94 |
95 | # pyenv
96 | .python-version
97 |
98 | # pipenv
99 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
100 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
101 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
102 | # install all needed dependencies.
103 | #Pipfile.lock
104 |
105 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
106 | __pypackages__/
107 |
108 | # Celery stuff
109 | celerybeat-schedule
110 | celerybeat.pid
111 |
112 | # SageMath parsed files
113 | *.sage.py
114 |
115 | # Environments
116 | .env
117 | .venv
118 | env/
119 | venv/
120 | ENV/
121 | env.bak/
122 | venv.bak/
123 |
124 | # Spyder project settings
125 | .spyderproject
126 | .spyproject
127 |
128 | # Rope project settings
129 | .ropeproject
130 |
131 | # mkdocs documentation
132 | /site
133 |
134 | # mypy
135 | .mypy_cache/
136 | .dmypy.json
137 | dmypy.json
138 |
139 | # Pyre type checker
140 | .pyre/
141 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Bubbliiiing
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Efficientdet: Scalable and Efficient Object Detection implemented in Keras
2 | ---
3 |
4 | ## Contents
5 | 1. [Top News](#top-news)
6 | 2. [Performance](#performance)
7 | 3. [Environment](#environment)
8 | 4. [Attention](#attention)
9 | 5. [Download](#download)
10 | 6. [How2train](#how2train)
11 | 7. [How2predict](#how2predict)
12 | 8. [How2eval](#how2eval)
13 | 9. [Reference](#reference)
14 |
15 | ## Top News
16 | **`2022-04`**: **Major update: added step and cos learning-rate schedules, a choice of adam and sgd optimizers, learning-rate adaptation to batch_size, and image cropping. Added multi-GPU training and per-class object counting.**
17 | The original repository from the BiliBili videos is at: https://github.com/bubbliiiing/efficientdet-keras/tree/bilibili
18 |
19 | **`2021-10`**: **Major update: added extensive comments and many adjustable parameters, reorganized the code modules, and added FPS testing, video prediction, batch prediction, and other features.**
20 |
21 | ### Performance
22 | | Training dataset | Weight file | Test dataset | Input size | mAP 0.5:0.95 | mAP 0.5 |
23 | | :-----: | :-----: | :------: | :------: | :------: | :-----: |
24 | | VOC07+12 | [efficientdet-d0-voc.h5](https://github.com/bubbliiiing/efficientdet-keras/releases/download/v1.0/efficientdet-d0-voc.h5) | VOC-Test07 | 512x512 | - | 83.2
25 | | VOC07+12 | [efficientdet-d1-voc.h5](https://github.com/bubbliiiing/efficientdet-keras/releases/download/v1.0/efficientdet-d1-voc.h5) | VOC-Test07 | 640x640 | - | 84.2
26 |
27 | ### Environment
28 | tensorflow-gpu==1.13.1
29 | keras==2.1.5
30 |
31 | ### Download
32 | The h5 weights required for training can be downloaded from Baidu Netdisk. They include the VOC weights for Efficientdet-D0 and Efficientdet-D1, which can be used directly for prediction, and the Efficientnet-b0 to Efficientnet-b7 weights, which can be used for transfer learning.
33 | Link: https://pan.baidu.com/s/1i8Xf5gis4_64ZGLdkUYztw
34 | Extraction code: tcpw
35 |
36 | The VOC dataset can be downloaded from the link below. It already contains the training set, test set, and validation set (identical to the test set), so no further splitting is needed:
37 | Link: https://pan.baidu.com/s/1-1Ej6dayrx3g0iAA88uY5A
38 | Extraction code: ph32
39 |
40 | ## How2train
41 | ### a. Training on the VOC07+12 dataset
42 | 1. Dataset preparation
43 | **Training uses data in VOC format. Download the VOC07+12 dataset and unzip it into the root directory before training.**
44 |
45 | 2. Dataset processing
46 | Set annotation_mode=2 in voc_annotation.py, then run voc_annotation.py to generate 2007_train.txt and 2007_val.txt in the root directory.
47 |
48 | 3. Start training
49 | The default parameters of train.py are set up for the VOC dataset, so simply running train.py starts training.
50 |
51 | 4. Predicting with training results
52 | Prediction uses two files, efficientdet.py and predict.py. First modify model_path and classes_path in efficientdet.py; these two parameters must be changed.
53 | **model_path points to the trained weight file in the logs folder.
54 | classes_path points to the txt listing the detection classes.**
55 | After making these changes, run predict.py for detection; enter an image path when prompted.
56 |
57 | ### b. Training on your own dataset
58 | 1. Dataset preparation
59 | **Training uses data in VOC format, so build your dataset before training.**
60 | Before training, put the annotation files in the Annotations folder under VOCdevkit/VOC2007.
61 | Before training, put the image files in the JPEGImages folder under VOCdevkit/VOC2007.
62 |
63 | 2. Dataset processing
64 | After arranging the dataset, use voc_annotation.py to generate the 2007_train.txt and 2007_val.txt used for training.
65 | Modify the parameters in voc_annotation.py. For a first training run only classes_path needs changing; it points to the txt listing the detection classes.
66 | When training on your own dataset, create a cls_classes.txt and list the classes you want to distinguish.
67 | The content of model_data/cls_classes.txt would be:
68 | ```python
69 | cat
70 | dog
71 | ...
72 | ```
73 | Set classes_path in voc_annotation.py to point to cls_classes.txt and run voc_annotation.py.
74 |
75 | 3. Start training
76 | **There are many training parameters, all in train.py; read their comments carefully after downloading the repository. The most important one is still classes_path in train.py.**
77 | **classes_path points to the txt listing the detection classes, the same txt referenced by voc_annotation.py! It must be changed when training your own dataset!**
78 | After modifying classes_path, run train.py to start training; after several epochs the weights are written to the logs folder.
79 |
80 | 4. Predicting with training results
81 | Prediction uses two files, efficientdet.py and predict.py. Modify model_path and classes_path in efficientdet.py.
82 | **model_path points to the trained weight file in the logs folder.
83 | classes_path points to the txt listing the detection classes.**
84 | After making these changes, run predict.py for detection; enter an image path when prompted.
85 |
86 | ## How2predict
87 | ### a. Using pretrained weights
88 | 1. After downloading and unzipping the repository, download the weights from Baidu Netdisk, put them in model_data, run predict.py, and enter
89 | ```python
90 | img/street.jpg
91 | ```
92 | 2. Settings in predict.py enable FPS testing and video detection.
93 | ### b. Using your own trained weights
94 | 1. Train following the training steps.
95 | 2. In efficientdet.py, modify model_path and classes_path in the section below so they match the trained files; **model_path points to the weight file under the logs folder, and classes_path lists the classes that model_path was trained to detect**.
96 | ```python
97 | _defaults = {
98 |     #--------------------------------------------------------------------------#
99 |     #   To predict with your own trained model, model_path and classes_path must be modified!
100 |     #   model_path points to the weight file under logs; classes_path points to the txt under model_data
101 |     #   If a shape mismatch occurs, also check the model_path and classes_path used during training
102 |     #--------------------------------------------------------------------------#
103 |     "model_path"        : 'model_data/efficientdet-d0-voc.h5',
104 |     "classes_path"      : 'model_data/voc_classes.txt',
105 |     #---------------------------------------------------------------------#
106 |     #   Selects the version of the model to use, 0-7
107 |     #---------------------------------------------------------------------#
108 |     "phi"               : 0,
109 |     #---------------------------------------------------------------------#
110 |     #   Only prediction boxes with a score above this confidence are kept
111 |     #---------------------------------------------------------------------#
112 |     "confidence"        : 0.3,
113 |     #---------------------------------------------------------------------#
114 |     #   nms_iou threshold used for non-maximum suppression
115 |     #---------------------------------------------------------------------#
116 |     "nms_iou"           : 0.3,
117 |     #---------------------------------------------------------------------#
118 |     #   Sizes of the anchors (prior boxes) used
119 |     #---------------------------------------------------------------------#
120 |     'anchors_size'      : [32, 64, 128, 256, 512],
121 |     #---------------------------------------------------------------------#
122 |     #   Controls whether letterbox_image is used for distortion-free resizing;
123 |     #   repeated tests showed that a plain resize with letterbox_image off works better
124 |     #---------------------------------------------------------------------#
125 |     "letterbox_image"   : False,
126 | }
127 | ```
128 | 3. Run predict.py and enter
129 | ```python
130 | img/street.jpg
131 | ```
132 | 4. Settings in predict.py enable FPS testing and video detection.
133 |
134 | ## How2eval
135 | ### a. Evaluating on the VOC07+12 test set
136 | 1. Evaluation uses the VOC format. VOC07+12 already provides a test split, so there is no need to run voc_annotation.py to generate the txt files under ImageSets.
137 | 2. Modify model_path and classes_path in efficientdet.py. **model_path points to the trained weight file in the logs folder. classes_path points to the txt listing the detection classes.**
138 | 3. Run get_map.py to obtain the evaluation results, which are saved in the map_out folder.
139 |
140 | ### b. Evaluating on your own dataset
141 | 1. Evaluation uses the VOC format.
142 | 2. If voc_annotation.py was run before training, the code has already split the dataset into training, validation, and test sets. To change the test-set ratio, modify trainval_percent in voc_annotation.py. trainval_percent sets the ratio of (training + validation) to test, 9:1 by default. train_percent sets the ratio of training to validation within (training + validation), also 9:1 by default.
143 | 3. After splitting the test set with voc_annotation.py, edit classes_path in get_map.py; it points to the txt listing the detection classes, the same txt used for training, and must be changed when evaluating your own dataset.
144 | 4. Modify model_path and classes_path in efficientdet.py. **model_path points to the trained weight file in the logs folder. classes_path points to the txt listing the detection classes.**
145 | 5. Run get_map.py to obtain the evaluation results, which are saved in the map_out folder.
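
Beyond the scripts above, the `Efficientdet` class in efficientdet.py can also be used programmatically. A minimal sketch (the weight and class paths are placeholders; predict.py itself additionally offers FPS testing and video detection):
```python
from PIL import Image

from efficientdet import Efficientdet

# Instantiated with the _defaults shown above; keyword arguments override them,
# e.g. Efficientdet(model_path='logs/your_weights.h5', classes_path='model_data/cls_classes.txt')
efficientdet = Efficientdet()

image   = Image.open('img/street.jpg')
r_image = efficientdet.detect_image(image)
r_image.show()
```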
146 |
147 | ### Reference
148 | https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch
149 | https://github.com/Cartucho/mAP
150 |
--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/Annotations/README.md:
--------------------------------------------------------------------------------
1 | Stores the annotation files
--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/ImageSets/Main/README.md:
--------------------------------------------------------------------------------
1 | Stores the training index files
--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/JPEGImages/README.md:
--------------------------------------------------------------------------------
1 | Stores the image files
--------------------------------------------------------------------------------
/efficientdet.py:
--------------------------------------------------------------------------------
1 | import colorsys
2 | import os
3 | import time
4 |
5 | import numpy as np
6 | from PIL import ImageDraw, ImageFont
7 |
8 | from nets.efficientdet import efficientdet
9 | from utils.anchors import get_anchors
10 | from utils.utils import (cvtColor, get_classes, image_sizes, preprocess_input,
11 |                          resize_image, show_config)
12 | from utils.utils_bbox import BBoxUtility
13 |
14 | '''
15 | 训练自己的数据集必看!
16 | '''
17 | class Efficientdet(object):
18 |     _defaults = {
19 |         #--------------------------------------------------------------------------#
20 |         #   使用自己训练好的模型进行预测一定要修改model_path和classes_path!
21 |         #   model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt
22 |         #
23 |         #   训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。
24 |         #   验证集损失较低不代表mAP较高,仅代表该权值在验证集上泛化性能较好。
25 |         #   如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改
26 |         #--------------------------------------------------------------------------#
27 |         "model_path"        : 'model_data/efficientdet-d0-voc.h5',
28 |         "classes_path"      : 'model_data/voc_classes.txt',
29 |         #---------------------------------------------------------------------#
30 |         #   用于选择所使用的模型的版本,0-7
31 |         #---------------------------------------------------------------------#
32 |         "phi"               : 0,
33 |         #---------------------------------------------------------------------#
34 |         #   只有得分大于置信度的预测框会被保留下来
35 |         #---------------------------------------------------------------------#
36 |         "confidence"        : 0.3,
37 |         #---------------------------------------------------------------------#
38 |         #   非极大抑制所用到的nms_iou大小
39 |         #---------------------------------------------------------------------#
40 |         "nms_iou"           : 0.3,
41 |         #---------------------------------------------------------------------#
42 |         #   使用到的先验框的大小
43 |         #---------------------------------------------------------------------#
44 |         'anchors_size'      : [32, 64, 128, 256, 512],
45 |         #---------------------------------------------------------------------#
46 |         #   该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize,
47 |         #   在多次测试后,发现关闭letterbox_image直接resize的效果更好
48 |         #---------------------------------------------------------------------#
49 |         "letterbox_image"   : False,
50 |     }
51 |
52 |     @classmethod
53 |     def get_defaults(cls, n):
54 |         if n in cls._defaults:
55 |             return cls._defaults[n]
56 |         else:
57 |             return "Unrecognized attribute name '" + n + "'"
58 |
59 |     #---------------------------------------------------#
60 |     #   初始化efficientdet
61 |     #---------------------------------------------------#
62 |     def __init__(self, **kwargs):
63 |         self.__dict__.update(self._defaults)
64 |         for name, value in
kwargs.items(): 65 | setattr(self, name, value) 66 | self._defaults[name] = value 67 | 68 | self.input_shape = [image_sizes[self.phi], image_sizes[self.phi]] 69 | #---------------------------------------------------# 70 | # 计算总的类的数量 71 | #---------------------------------------------------# 72 | self.class_names, self.num_classes = get_classes(self.classes_path) 73 | self.anchors = get_anchors(self.input_shape, self.anchors_size) 74 | 75 | #---------------------------------------------------# 76 | # 画框设置不同的颜色 77 | #---------------------------------------------------# 78 | hsv_tuples = [(x / self.num_classes, 1., 1.) for x in range(self.num_classes)] 79 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 80 | self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors)) 81 | 82 | self.bbox_util = BBoxUtility(self.num_classes, nms_thresh=self.nms_iou) 83 | self.generate() 84 | 85 | show_config(**self._defaults) 86 | 87 | #---------------------------------------------------# 88 | # 载入模型 89 | #---------------------------------------------------# 90 | def generate(self): 91 | model_path = os.path.expanduser(self.model_path) 92 | assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.' 93 | 94 | #-------------------------------# 95 | # 载入模型与权值 96 | #-------------------------------# 97 | self.efficientdet = efficientdet([self.input_shape[0], self.input_shape[1], 3], self.phi, self.num_classes) 98 | self.efficientdet.load_weights(self.model_path) 99 | print('{} model, anchors, and classes loaded.'.format(model_path)) 100 | 101 | #---------------------------------------------------# 102 | # 检测图片 103 | #---------------------------------------------------# 104 | def detect_image(self, image, crop = False, count = False): 105 | #---------------------------------------------------# 106 | # 获得输入图片的高和宽 107 | #---------------------------------------------------# 108 | image_shape = np.array(np.shape(image)[0:2]) 109 | #---------------------------------------------------------# 110 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 111 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 112 | #---------------------------------------------------------# 113 | image = cvtColor(image) 114 | #---------------------------------------------------------# 115 | # 给图像增加灰条,实现不失真的resize 116 | # 也可以直接resize进行识别 117 | #---------------------------------------------------------# 118 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 119 | #---------------------------------------------------------# 120 | # 添加上batch_size维度,图片预处理,归一化。 121 | #---------------------------------------------------------# 122 | image_data = preprocess_input(np.expand_dims(np.array(image_data, dtype='float32'), 0)) 123 | 124 | preds = self.efficientdet.predict(image_data) 125 | #-----------------------------------------------------------# 126 | # 将预测结果进行解码 127 | #-----------------------------------------------------------# 128 | results = self.bbox_util.decode_box(preds, self.anchors, image_shape, 129 | self.input_shape, self.letterbox_image, confidence=self.confidence) 130 | #--------------------------------------# 131 | # 如果没有检测到物体,则返回原图 132 | #--------------------------------------# 133 | if results[0] is None: 134 | return image 135 | 136 | top_label = np.array(results[0][:, 5], dtype = 'int32') 137 | top_conf = results[0][:, 4] 138 | top_boxes = results[0][:, :4] 139 | #---------------------------------------------------------# 140 | # 设置字体与边框厚度 
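#   (note: the font size below is set to roughly 3% of the image width, and the
#   rectangle thickness scales with (height + width) // input size, minimum 1)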
141 | #---------------------------------------------------------# 142 | font = ImageFont.truetype(font='model_data/simhei.ttf', size=np.floor(3e-2 * np.shape(image)[1] + 0.5).astype('int32')) 143 | thickness = max((np.shape(image)[0] + np.shape(image)[1]) // self.input_shape[0], 1) 144 | #---------------------------------------------------------# 145 | # 计数 146 | #---------------------------------------------------------# 147 | if count: 148 | print("top_label:", top_label) 149 | classes_nums = np.zeros([self.num_classes]) 150 | for i in range(self.num_classes): 151 | num = np.sum(top_label == i) 152 | if num > 0: 153 | print(self.class_names[i], " : ", num) 154 | classes_nums[i] = num 155 | print("classes_nums:", classes_nums) 156 | #---------------------------------------------------------# 157 | # 是否进行目标的裁剪 158 | #---------------------------------------------------------# 159 | if crop: 160 | for i, c in list(enumerate(top_label)): 161 | top, left, bottom, right = top_boxes[i] 162 | top = max(0, np.floor(top).astype('int32')) 163 | left = max(0, np.floor(left).astype('int32')) 164 | bottom = min(image.size[1], np.floor(bottom).astype('int32')) 165 | right = min(image.size[0], np.floor(right).astype('int32')) 166 | 167 | dir_save_path = "img_crop" 168 | if not os.path.exists(dir_save_path): 169 | os.makedirs(dir_save_path) 170 | crop_image = image.crop([left, top, right, bottom]) 171 | crop_image.save(os.path.join(dir_save_path, "crop_" + str(i) + ".png"), quality=95, subsampling=0) 172 | print("save crop_" + str(i) + ".png to " + dir_save_path) 173 | #---------------------------------------------------------# 174 | # 图像绘制 175 | #---------------------------------------------------------# 176 | for i, c in list(enumerate(top_label)): 177 | predicted_class = self.class_names[int(c)] 178 | box = top_boxes[i] 179 | score = top_conf[i] 180 | 181 | top, left, bottom, right = box 182 | 183 | top = max(0, np.floor(top).astype('int32')) 184 | left = max(0, np.floor(left).astype('int32')) 185 | bottom = min(image.size[1], np.floor(bottom).astype('int32')) 186 | right = min(image.size[0], np.floor(right).astype('int32')) 187 | 188 | label = '{} {:.2f}'.format(predicted_class, score) 189 | draw = ImageDraw.Draw(image) 190 | label_size = draw.textsize(label, font) 191 | label = label.encode('utf-8') 192 | print(label, top, left, bottom, right) 193 | 194 | if top - label_size[1] >= 0: 195 | text_origin = np.array([left, top - label_size[1]]) 196 | else: 197 | text_origin = np.array([left, top + 1]) 198 | 199 | for i in range(thickness): 200 | draw.rectangle([left + i, top + i, right - i, bottom - i], outline=self.colors[c]) 201 | draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[c]) 202 | draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font) 203 | del draw 204 | 205 | return image 206 | 207 | def get_FPS(self, image, test_interval): 208 | image_shape = np.array(np.shape(image)[0:2]) 209 | #---------------------------------------------------------# 210 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 211 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 212 | #---------------------------------------------------------# 213 | image = cvtColor(image) 214 | #---------------------------------------------------------# 215 | # 给图像增加灰条,实现不失真的resize 216 | # 也可以直接resize进行识别 217 | #---------------------------------------------------------# 218 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 219 | 
#---------------------------------------------------------# 220 | # 添加上batch_size维度,图片预处理,归一化。 221 | #---------------------------------------------------------# 222 | image_data = preprocess_input(np.expand_dims(np.array(image_data, dtype='float32'), 0)) 223 | 224 | preds = self.efficientdet.predict(image_data) 225 | #-----------------------------------------------------------# 226 | # 将预测结果进行解码 227 | #-----------------------------------------------------------# 228 | results = self.bbox_util.decode_box(preds, self.anchors, image_shape, 229 | self.input_shape, self.letterbox_image, confidence=self.confidence) 230 | t1 = time.time() 231 | for _ in range(test_interval): 232 | preds = self.efficientdet.predict(image_data) 233 | #-----------------------------------------------------------# 234 | # 将预测结果进行解码 235 | #-----------------------------------------------------------# 236 | results = self.bbox_util.decode_box(preds, self.anchors, image_shape, 237 | self.input_shape, self.letterbox_image, confidence=self.confidence) 238 | t2 = time.time() 239 | tact_time = (t2 - t1) / test_interval 240 | return tact_time 241 | 242 | def get_map_txt(self, image_id, image, class_names, map_out_path): 243 | f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 244 | image_shape = np.array(np.shape(image)[0:2]) 245 | #---------------------------------------------------------# 246 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 247 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 248 | #---------------------------------------------------------# 249 | image = cvtColor(image) 250 | #---------------------------------------------------------# 251 | # 给图像增加灰条,实现不失真的resize 252 | # 也可以直接resize进行识别 253 | #---------------------------------------------------------# 254 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 255 | #---------------------------------------------------------# 256 | # 添加上batch_size维度,图片预处理,归一化。 257 | #---------------------------------------------------------# 258 | image_data = preprocess_input(np.expand_dims(np.array(image_data, dtype='float32'), 0)) 259 | 260 | preds = self.efficientdet.predict(image_data) 261 | #-----------------------------------------------------------# 262 | # 将预测结果进行解码 263 | #-----------------------------------------------------------# 264 | results = self.bbox_util.decode_box(preds, self.anchors, image_shape, 265 | self.input_shape, self.letterbox_image, confidence=self.confidence) 266 | #--------------------------------------# 267 | # 如果没有检测到物体,则返回原图 268 | #--------------------------------------# 269 | if results[0] is None: 270 | return 271 | 272 | top_label = results[0][:, 5] 273 | top_conf = results[0][:, 4] 274 | top_boxes = results[0][:, :4] 275 | 276 | for i, c in list(enumerate(top_label)): 277 | predicted_class = self.class_names[int(c)] 278 | box = top_boxes[i] 279 | score = str(top_conf[i]) 280 | 281 | top, left, bottom, right = box 282 | 283 | if predicted_class not in class_names: 284 | continue 285 | 286 | f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom)))) 287 | 288 | f.close() 289 | return 290 | -------------------------------------------------------------------------------- /get_map.py: -------------------------------------------------------------------------------- 1 | import os 2 | import xml.etree.ElementTree as ET 3 | 4 | from PIL import Image 5 | from tqdm import tqdm 6 | 7 | from efficientdet import Efficientdet 8 | from 
utils.utils import get_classes 9 | from utils.utils_map import get_coco_map, get_map 10 | 11 | if __name__ == "__main__": 12 | ''' 13 | Recall和Precision不像AP是一个面积的概念,因此在门限值(Confidence)不同时,网络的Recall和Precision值是不同的。 14 | 默认情况下,本代码计算的Recall和Precision代表的是当门限值(Confidence)为0.5时,所对应的Recall和Precision值。 15 | 16 | 受到mAP计算原理的限制,网络在计算mAP时需要获得近乎所有的预测框,这样才可以计算不同门限条件下的Recall和Precision值 17 | 因此,本代码获得的map_out/detection-results/里面的txt的框的数量一般会比直接predict多一些,目的是列出所有可能的预测框, 18 | ''' 19 | #------------------------------------------------------------------------------------------------------------------# 20 | # map_mode用于指定该文件运行时计算的内容 21 | # map_mode为0代表整个map计算流程,包括获得预测结果、获得真实框、计算VOC_map。 22 | # map_mode为1代表仅仅获得预测结果。 23 | # map_mode为2代表仅仅获得真实框。 24 | # map_mode为3代表仅仅计算VOC_map。 25 | # map_mode为4代表利用COCO工具箱计算当前数据集的0.50:0.95map。需要获得预测结果、获得真实框后并安装pycocotools才行 26 | #-------------------------------------------------------------------------------------------------------------------# 27 | map_mode = 0 28 | #--------------------------------------------------------------------------------------# 29 | # 此处的classes_path用于指定需要测量VOC_map的类别 30 | # 一般情况下与训练和预测所用的classes_path一致即可 31 | #--------------------------------------------------------------------------------------# 32 | classes_path = 'model_data/voc_classes.txt' 33 | #--------------------------------------------------------------------------------------# 34 | # MINOVERLAP用于指定想要获得的mAP0.x,mAP0.x的意义是什么请同学们百度一下。 35 | # 比如计算mAP0.75,可以设定MINOVERLAP = 0.75。 36 | # 37 | # 当某一预测框与真实框重合度大于MINOVERLAP时,该预测框被认为是正样本,否则为负样本。 38 | # 因此MINOVERLAP的值越大,预测框要预测的越准确才能被认为是正样本,此时算出来的mAP值越低, 39 | #--------------------------------------------------------------------------------------# 40 | MINOVERLAP = 0.5 41 | #--------------------------------------------------------------------------------------# 42 | # 受到mAP计算原理的限制,网络在计算mAP时需要获得近乎所有的预测框,这样才可以计算mAP 43 | # 因此,confidence的值应当设置的尽量小进而获得全部可能的预测框。 44 | # 45 | # 该值一般不调整。因为计算mAP需要获得近乎所有的预测框,此处的confidence不能随便更改。 46 | # 想要获得不同门限值下的Recall和Precision值,请修改下方的score_threhold。 47 | #--------------------------------------------------------------------------------------# 48 | confidence = 0.02 49 | #--------------------------------------------------------------------------------------# 50 | # 预测时使用到的非极大抑制值的大小,越大表示非极大抑制越不严格。 51 | # 52 | # 该值一般不调整。 53 | #--------------------------------------------------------------------------------------# 54 | nms_iou = 0.5 55 | #---------------------------------------------------------------------------------------------------------------# 56 | # Recall和Precision不像AP是一个面积的概念,因此在门限值不同时,网络的Recall和Precision值是不同的。 57 | # 58 | # 默认情况下,本代码计算的Recall和Precision代表的是当门限值为0.5(此处定义为score_threhold)时所对应的Recall和Precision值。 59 | # 因为计算mAP需要获得近乎所有的预测框,上面定义的confidence不能随便更改。 60 | # 这里专门定义一个score_threhold用于代表门限值,进而在计算mAP时找到门限值对应的Recall和Precision值。 61 | #---------------------------------------------------------------------------------------------------------------# 62 | score_threhold = 0.5 63 | #-------------------------------------------------------# 64 | # map_vis用于指定是否开启VOC_map计算的可视化 65 | #-------------------------------------------------------# 66 | map_vis = False 67 | #-------------------------------------------------------# 68 | # 指向VOC数据集所在的文件夹 69 | # 默认指向根目录下的VOC数据集 70 | #-------------------------------------------------------# 71 | VOCdevkit_path = 'VOCdevkit' 72 | #-------------------------------------------------------# 73 | # 结果输出的文件夹,默认为map_out 74 | #-------------------------------------------------------# 75 | map_out_path = 'map_out' 
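#--------------------------------------------------------------------------------------#
#   Files produced under map_out_path, as written by the code below and by
#   Efficientdet.get_map_txt:
#     ground-truth/<image_id>.txt      : <class> <left> <top> <right> <bottom> [difficult]
#     detection-results/<image_id>.txt : <class> <score> <left> <top> <right> <bottom>
#     images-optional/<image_id>.jpg   : copies of the input images, only when map_vis is True
#--------------------------------------------------------------------------------------#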
76 |
77 |     image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Main/test.txt")).read().strip().split()
78 |
79 |     if not os.path.exists(map_out_path):
80 |         os.makedirs(map_out_path)
81 |     if not os.path.exists(os.path.join(map_out_path, 'ground-truth')):
82 |         os.makedirs(os.path.join(map_out_path, 'ground-truth'))
83 |     if not os.path.exists(os.path.join(map_out_path, 'detection-results')):
84 |         os.makedirs(os.path.join(map_out_path, 'detection-results'))
85 |     if not os.path.exists(os.path.join(map_out_path, 'images-optional')):
86 |         os.makedirs(os.path.join(map_out_path, 'images-optional'))
87 |
88 |     class_names, _ = get_classes(classes_path)
89 |
90 |     if map_mode == 0 or map_mode == 1:
91 |         print("Load model.")
92 |         efficientdet = Efficientdet(confidence = confidence, nms_iou = nms_iou)
93 |         print("Load model done.")
94 |
95 |         print("Get predict result.")
96 |         for image_id in tqdm(image_ids):
97 |             image_path  = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg")
98 |             image       = Image.open(image_path)
99 |             if map_vis:
100 |                 image.save(os.path.join(map_out_path, "images-optional/" + image_id + ".jpg"))
101 |             efficientdet.get_map_txt(image_id, image, class_names, map_out_path)
102 |         print("Get predict result done.")
103 |
104 |     if map_mode == 0 or map_mode == 2:
105 |         print("Get ground truth result.")
106 |         for image_id in tqdm(image_ids):
107 |             with open(os.path.join(map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f:
108 |                 root = ET.parse(os.path.join(VOCdevkit_path, "VOC2007/Annotations/"+image_id+".xml")).getroot()
109 |                 for obj in root.findall('object'):
110 |                     difficult_flag = False
111 |                     if obj.find('difficult')!=None:
112 |                         difficult = obj.find('difficult').text
113 |                         if int(difficult)==1:
114 |                             difficult_flag = True
115 |                     obj_name = obj.find('name').text
116 |                     if obj_name not in class_names:
117 |                         continue
118 |                     bndbox  = obj.find('bndbox')
119 |                     left    = bndbox.find('xmin').text
120 |                     top     = bndbox.find('ymin').text
121 |                     right   = bndbox.find('xmax').text
122 |                     bottom  = bndbox.find('ymax').text
123 |
124 |                     if difficult_flag:
125 |                         new_f.write("%s %s %s %s %s difficult\n" % (obj_name, left, top, right, bottom))
126 |                     else:
127 |                         new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom))
128 |         print("Get ground truth result done.")
129 |
130 |     if map_mode == 0 or map_mode == 3:
131 |         print("Get map.")
132 |         get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path)
133 |         print("Get map done.")
134 |
135 |     if map_mode == 4:
136 |         print("Get map.")
137 |         get_coco_map(class_names = class_names, path = map_out_path)
138 |         print("Get map done.")
139 |
--------------------------------------------------------------------------------
/img/street.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/efficientdet-keras/16a5139bad0ebdbff704c00b942d19238db1ead3/img/street.jpg
--------------------------------------------------------------------------------
/logs/README.md:
--------------------------------------------------------------------------------
1 | Stores the weight files
--------------------------------------------------------------------------------
/model_data/coco_classes.txt:
--------------------------------------------------------------------------------
1 | person
2 | bicycle
3 | car
4 | motorbike
5 | aeroplane
6 | bus
7 | train
8 | truck
9 | boat
10 | traffic light
11 | fire hydrant
12 |
13 | stop sign
14 | parking meter
15 | bench
16 | bird
17 | cat
18 | dog
19
| horse 20 | sheep 21 | cow 22 | elephant 23 | bear 24 | zebra 25 | giraffe 26 | 27 | backpack 28 | umbrella 29 | 30 | 31 | handbag 32 | tie 33 | suitcase 34 | frisbee 35 | skis 36 | snowboard 37 | sports ball 38 | kite 39 | baseball bat 40 | baseball glove 41 | skateboard 42 | surfboard 43 | tennis racket 44 | bottle 45 | 46 | wine glass 47 | cup 48 | fork 49 | knife 50 | spoon 51 | bowl 52 | banana 53 | apple 54 | sandwich 55 | orange 56 | broccoli 57 | carrot 58 | hot dog 59 | pizza 60 | donut 61 | cake 62 | chair 63 | sofa 64 | pottedplant 65 | bed 66 | 67 | diningtable 68 | 69 | 70 | toilet 71 | 72 | tvmonitor 73 | laptop 74 | mouse 75 | remote 76 | keyboard 77 | cell phone 78 | microwave 79 | oven 80 | toaster 81 | sink 82 | refrigerator 83 | 84 | book 85 | clock 86 | vase 87 | scissors 88 | teddy bear 89 | hair drier 90 | toothbrush 91 | -------------------------------------------------------------------------------- /model_data/simhei.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bubbliiiing/efficientdet-keras/16a5139bad0ebdbff704c00b942d19238db1ead3/model_data/simhei.ttf -------------------------------------------------------------------------------- /model_data/voc_classes.txt: -------------------------------------------------------------------------------- 1 | aeroplane 2 | bicycle 3 | bird 4 | boat 5 | bottle 6 | bus 7 | car 8 | cat 9 | chair 10 | cow 11 | diningtable 12 | dog 13 | horse 14 | motorbike 15 | person 16 | pottedplant 17 | sheep 18 | sofa 19 | train 20 | tvmonitor -------------------------------------------------------------------------------- /nets/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /nets/efficientdet.py: -------------------------------------------------------------------------------- 1 | import math 2 | from functools import reduce 3 | 4 | import keras 5 | import numpy as np 6 | import tensorflow as tf 7 | from keras import initializers, layers, models 8 | 9 | from nets.efficientnet import (EPSILON, MOMENTUM, EfficientNetB0, 10 | EfficientNetB1, EfficientNetB2, EfficientNetB3, 11 | EfficientNetB4, EfficientNetB5, EfficientNetB6, 12 | EfficientNetB7) 13 | 14 | 15 | class PriorProbability(keras.initializers.Initializer): 16 | def __init__(self, probability=0.01): 17 | self.probability = probability 18 | 19 | def get_config(self): 20 | return { 21 | 'probability': self.probability 22 | } 23 | 24 | def __call__(self, shape, dtype=None): 25 | result = np.ones(shape) * -math.log((1 - self.probability) / self.probability) 26 | return result 27 | 28 | class wBiFPNAdd(keras.layers.Layer): 29 | def __init__(self, epsilon=1e-4, **kwargs): 30 | super(wBiFPNAdd, self).__init__(**kwargs) 31 | self.epsilon = epsilon 32 | 33 | def build(self, input_shape): 34 | num_in = len(input_shape) 35 | self.w = self.add_weight(name=self.name, 36 | shape=(num_in,), 37 | initializer=keras.initializers.constant(1 / num_in), 38 | trainable=True, 39 | dtype=tf.float32) 40 | 41 | def call(self, inputs, **kwargs): 42 | w = keras.activations.relu(self.w) 43 | x = tf.reduce_sum([w[i] * inputs[i] for i in range(len(inputs))], axis=0) 44 | x = x / (tf.reduce_sum(w) + self.epsilon) 45 | return x 46 | 47 | def compute_output_shape(self, input_shape): 48 | return input_shape[0] 49 | 50 | def get_config(self): 51 | config = super(wBiFPNAdd, self).get_config() 52 | config.update({ 53 | 
'epsilon': self.epsilon 54 | }) 55 | return config 56 | 57 | def SeparableConvBlock(num_channels, kernel_size, strides, name): 58 | f1 = layers.SeparableConv2D(num_channels, kernel_size=kernel_size, strides=strides, padding='same', 59 | use_bias=True, name=f'{name}/conv') 60 | f2 = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, name=f'{name}/bn') 61 | return reduce(lambda f, g: lambda *args, **kwargs: g(f(*args, **kwargs)), (f1, f2)) 62 | 63 | 64 | def build_wBiFPN(features, num_channels, id): 65 | if id == 0: 66 | #-------------------------------------------# 67 | # 获得三个shape的有效特征层 68 | # 分别是C3 64, 64, 40 69 | # C4 32, 32, 112 70 | # C5 16, 16, 320 71 | #-------------------------------------------# 72 | _, _, C3, C4, C5 = features 73 | 74 | #------------------------------------------------------------------------# 75 | # 第一次BIFPN需要 下采样 与 调整通道 获得 p3_in p4_in p5_in p6_in p7_in 76 | #------------------------------------------------------------------------# 77 | 78 | #-------------------------------------------# 79 | # 首先对通道数进行调整 80 | # C3 64, 64, 40 -> 64, 64, 64 81 | #-------------------------------------------# 82 | P3_in = C3 83 | P3_in = layers.Conv2D(num_channels, kernel_size=1, padding='same', 84 | name=f'fpn_cells/cell_{id}/fnode3/resample_0_0_8/conv2d')(P3_in) 85 | P3_in = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, 86 | name=f'fpn_cells/cell_{id}/fnode3/resample_0_0_8/bn')(P3_in) 87 | 88 | #-------------------------------------------# 89 | # 首先对通道数进行调整 90 | # C4 32, 32, 112 -> 32, 32, 64 91 | # -> 32, 32, 64 92 | #-------------------------------------------# 93 | P4_in = C4 94 | P4_in_1 = layers.Conv2D(num_channels, kernel_size=1, padding='same', 95 | name=f'fpn_cells/cell_{id}/fnode2/resample_0_1_7/conv2d')(P4_in) 96 | P4_in_1 = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, 97 | name=f'fpn_cells/cell_{id}/fnode2/resample_0_1_7/bn')(P4_in_1) 98 | P4_in_2 = layers.Conv2D(num_channels, kernel_size=1, padding='same', 99 | name=f'fpn_cells/cell_{id}/fnode4/resample_0_1_9/conv2d')(P4_in) 100 | P4_in_2 = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, 101 | name=f'fpn_cells/cell_{id}/fnode4/resample_0_1_9/bn')(P4_in_2) 102 | 103 | #-------------------------------------------# 104 | # 首先对通道数进行调整 105 | # C5 16, 16, 320 -> 16, 16, 64 106 | # -> 16, 16, 64 107 | #-------------------------------------------# 108 | P5_in = C5 109 | P5_in_1 = layers.Conv2D(num_channels, kernel_size=1, padding='same', 110 | name=f'fpn_cells/cell_{id}/fnode1/resample_0_2_6/conv2d')(P5_in) 111 | P5_in_1 = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, 112 | name=f'fpn_cells/cell_{id}/fnode1/resample_0_2_6/bn')(P5_in_1) 113 | P5_in_2 = layers.Conv2D(num_channels, kernel_size=1, padding='same', 114 | name=f'fpn_cells/cell_{id}/fnode5/resample_0_2_10/conv2d')(P5_in) 115 | P5_in_2 = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, 116 | name=f'fpn_cells/cell_{id}/fnode5/resample_0_2_10/bn')(P5_in_2) 117 | 118 | #-------------------------------------------# 119 | # 对C5进行下采样,调整通道数与宽高 120 | # C5 16, 16, 320 -> 8, 8, 64 121 | #-------------------------------------------# 122 | P6_in = layers.Conv2D(num_channels, kernel_size=1, padding='same', name='resample_p6/conv2d')(C5) 123 | P6_in = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, name='resample_p6/bn')(P6_in) 124 | P6_in = layers.MaxPooling2D(pool_size=3, strides=2, padding='same', name='resample_p6/maxpool')(P6_in) 125 | 126 | 
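#   (wBiFPNAdd above implements the fast normalized fusion from the
#   EfficientDet paper: out = sum(relu(w_i) * x_i) / (sum(relu(w_i)) + eps),
#   with one learnable weight per fused input, rather than a plain Add)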
#-------------------------------------------# 127 | # 对P6_in进行下采样,调整宽高 128 | # P6_in 8, 8, 64 -> 4, 4, 64 129 | #-------------------------------------------# 130 | P7_in = layers.MaxPooling2D(pool_size=3, strides=2, padding='same', name='resample_p7/maxpool')(P6_in) 131 | #-------------------------------------------------------------------------# 132 | 133 | #--------------------------构建BIFPN的上下采样循环-------------------------# 134 | P7_U = layers.UpSampling2D()(P7_in) 135 | P6_td = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode0/add')([P6_in, P7_U]) 136 | P6_td = layers.Activation(lambda x: tf.nn.swish(x))(P6_td) 137 | P6_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 138 | name=f'fpn_cells/cell_{id}/fnode0/op_after_combine5')(P6_td) 139 | 140 | P6_U = layers.UpSampling2D()(P6_td) 141 | P5_td = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode1/add')([P5_in_1, P6_U]) 142 | P5_td = layers.Activation(lambda x: tf.nn.swish(x))(P5_td) 143 | P5_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 144 | name=f'fpn_cells/cell_{id}/fnode1/op_after_combine6')(P5_td) 145 | 146 | P5_U = layers.UpSampling2D()(P5_td) 147 | P4_td = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode2/add')([P4_in_1, P5_U]) 148 | P4_td = layers.Activation(lambda x: tf.nn.swish(x))(P4_td) 149 | P4_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 150 | name=f'fpn_cells/cell_{id}/fnode2/op_after_combine7')(P4_td) 151 | 152 | P4_U = layers.UpSampling2D()(P4_td) 153 | P3_out = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode3/add')([P3_in, P4_U]) 154 | P3_out = layers.Activation(lambda x: tf.nn.swish(x))(P3_out) 155 | P3_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 156 | name=f'fpn_cells/cell_{id}/fnode3/op_after_combine8')(P3_out) 157 | 158 | P3_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P3_out) 159 | P4_out = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode4/add')([P4_in_2, P4_td, P3_D]) 160 | P4_out = layers.Activation(lambda x: tf.nn.swish(x))(P4_out) 161 | P4_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 162 | name=f'fpn_cells/cell_{id}/fnode4/op_after_combine9')(P4_out) 163 | 164 | P4_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P4_out) 165 | P5_out = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode5/add')([P5_in_2, P5_td, P4_D]) 166 | P5_out = layers.Activation(lambda x: tf.nn.swish(x))(P5_out) 167 | P5_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 168 | name=f'fpn_cells/cell_{id}/fnode5/op_after_combine10')(P5_out) 169 | 170 | P5_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P5_out) 171 | P6_out = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode6/add')([P6_in, P6_td, P5_D]) 172 | P6_out = layers.Activation(lambda x: tf.nn.swish(x))(P6_out) 173 | P6_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 174 | name=f'fpn_cells/cell_{id}/fnode6/op_after_combine11')(P6_out) 175 | 176 | P6_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P6_out) 177 | P7_out = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode7/add')([P7_in, P6_D]) 178 | P7_out = layers.Activation(lambda x: tf.nn.swish(x))(P7_out) 179 | P7_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 180 | name=f'fpn_cells/cell_{id}/fnode7/op_after_combine12')(P7_out) 181 | 182 | else: 183 | P3_in, P4_in, P5_in, P6_in, P7_in = features 184 | P7_U = layers.UpSampling2D()(P7_in) 185 | P6_td = 
wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode0/add')([P6_in, P7_U]) 186 | P6_td = layers.Activation(lambda x: tf.nn.swish(x))(P6_td) 187 | P6_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 188 | name=f'fpn_cells/cell_{id}/fnode0/op_after_combine5')(P6_td) 189 | 190 | P6_U = layers.UpSampling2D()(P6_td) 191 | P5_td = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode1/add')([P5_in, P6_U]) 192 | P5_td = layers.Activation(lambda x: tf.nn.swish(x))(P5_td) 193 | P5_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 194 | name=f'fpn_cells/cell_{id}/fnode1/op_after_combine6')(P5_td) 195 | 196 | P5_U = layers.UpSampling2D()(P5_td) 197 | P4_td = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode2/add')([P4_in, P5_U]) 198 | P4_td = layers.Activation(lambda x: tf.nn.swish(x))(P4_td) 199 | P4_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 200 | name=f'fpn_cells/cell_{id}/fnode2/op_after_combine7')(P4_td) 201 | 202 | P4_U = layers.UpSampling2D()(P4_td) 203 | P3_out = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode3/add')([P3_in, P4_U]) 204 | P3_out = layers.Activation(lambda x: tf.nn.swish(x))(P3_out) 205 | P3_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 206 | name=f'fpn_cells/cell_{id}/fnode3/op_after_combine8')(P3_out) 207 | 208 | P3_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P3_out) 209 | P4_out = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode4/add')([P4_in, P4_td, P3_D]) 210 | P4_out = layers.Activation(lambda x: tf.nn.swish(x))(P4_out) 211 | P4_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 212 | name=f'fpn_cells/cell_{id}/fnode4/op_after_combine9')(P4_out) 213 | 214 | P4_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P4_out) 215 | P5_out = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode5/add')([P5_in, P5_td, P4_D]) 216 | P5_out = layers.Activation(lambda x: tf.nn.swish(x))(P5_out) 217 | P5_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 218 | name=f'fpn_cells/cell_{id}/fnode5/op_after_combine10')(P5_out) 219 | 220 | P5_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P5_out) 221 | P6_out = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode6/add')([P6_in, P6_td, P5_D]) 222 | P6_out = layers.Activation(lambda x: tf.nn.swish(x))(P6_out) 223 | P6_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 224 | name=f'fpn_cells/cell_{id}/fnode6/op_after_combine11')(P6_out) 225 | 226 | P6_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P6_out) 227 | P7_out = wBiFPNAdd(name=f'fpn_cells/cell_{id}/fnode7/add')([P7_in, P6_D]) 228 | P7_out = layers.Activation(lambda x: tf.nn.swish(x))(P7_out) 229 | P7_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 230 | name=f'fpn_cells/cell_{id}/fnode7/op_after_combine12')(P7_out) 231 | 232 | return [P3_out, P4_out, P5_out, P6_out, P7_out] 233 | 234 | def build_BiFPN(features, num_channels, id): 235 | if id == 0: 236 | # 第一次BIFPN需要 下采样 与 降通道 获得 p3_in p4_in p5_in p6_in p7_in 237 | #-----------------------------下采样 与 降通道----------------------------# 238 | _, _, C3, C4, C5 = features 239 | P3_in = C3 240 | P3_in = layers.Conv2D(num_channels, kernel_size=1, padding='same', 241 | name=f'fpn_cells/cell_{id}/fnode3/resample_0_0_8/conv2d')(P3_in) 242 | P3_in = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, 243 | name=f'fpn_cells/cell_{id}/fnode3/resample_0_0_8/bn')(P3_in) 244 | 245 | P4_in = C4 246 | P4_in_1 = 
layers.Conv2D(num_channels, kernel_size=1, padding='same', 247 | name=f'fpn_cells/cell_{id}/fnode2/resample_0_1_7/conv2d')(P4_in) 248 | P4_in_1 = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, 249 | name=f'fpn_cells/cell_{id}/fnode2/resample_0_1_7/bn')(P4_in_1) 250 | P4_in_2 = layers.Conv2D(num_channels, kernel_size=1, padding='same', 251 | name=f'fpn_cells/cell_{id}/fnode4/resample_0_1_9/conv2d')(P4_in) 252 | P4_in_2 = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, 253 | name=f'fpn_cells/cell_{id}/fnode4/resample_0_1_9/bn')(P4_in_2) 254 | 255 | P5_in = C5 256 | P5_in_1 = layers.Conv2D(num_channels, kernel_size=1, padding='same', 257 | name=f'fpn_cells/cell_{id}/fnode1/resample_0_2_6/conv2d')(P5_in) 258 | P5_in_1 = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, 259 | name=f'fpn_cells/cell_{id}/fnode1/resample_0_2_6/bn')(P5_in_1) 260 | P5_in_2 = layers.Conv2D(num_channels, kernel_size=1, padding='same', 261 | name=f'fpn_cells/cell_{id}/fnode5/resample_0_2_10/conv2d')(P5_in) 262 | P5_in_2 = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, 263 | name=f'fpn_cells/cell_{id}/fnode5/resample_0_2_10/bn')(P5_in_2) 264 | 265 | P6_in = layers.Conv2D(num_channels, kernel_size=1, padding='same', name='resample_p6/conv2d')(C5) 266 | P6_in = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, name='resample_p6/bn')(P6_in) 267 | P6_in = layers.MaxPooling2D(pool_size=3, strides=2, padding='same', name='resample_p6/maxpool')(P6_in) 268 | 269 | P7_in = layers.MaxPooling2D(pool_size=3, strides=2, padding='same', name='resample_p7/maxpool')(P6_in) 270 | #-------------------------------------------------------------------------# 271 | 272 | #--------------------------构建BIFPN的上下采样循环-------------------------# 273 | P7_U = layers.UpSampling2D()(P7_in) 274 | P6_td = layers.Add(name=f'fpn_cells/cell_{id}/fnode0/add')([P6_in, P7_U]) 275 | P6_td = layers.Activation(lambda x: tf.nn.swish(x))(P6_td) 276 | P6_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 277 | name=f'fpn_cells/cell_{id}/fnode0/op_after_combine5')(P6_td) 278 | 279 | P6_U = layers.UpSampling2D()(P6_td) 280 | P5_td = layers.Add(name=f'fpn_cells/cell_{id}/fnode1/add')([P5_in_1, P6_U]) 281 | P5_td = layers.Activation(lambda x: tf.nn.swish(x))(P5_td) 282 | P5_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 283 | name=f'fpn_cells/cell_{id}/fnode1/op_after_combine6')(P5_td) 284 | 285 | P5_U = layers.UpSampling2D()(P5_td) 286 | P4_td = layers.Add(name=f'fpn_cells/cell_{id}/fnode2/add')([P4_in_1, P5_U]) 287 | P4_td = layers.Activation(lambda x: tf.nn.swish(x))(P4_td) 288 | P4_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 289 | name=f'fpn_cells/cell_{id}/fnode2/op_after_combine7')(P4_td) 290 | 291 | P4_U = layers.UpSampling2D()(P4_td) 292 | P3_out = layers.Add(name=f'fpn_cells/cell_{id}/fnode3/add')([P3_in, P4_U]) 293 | P3_out = layers.Activation(lambda x: tf.nn.swish(x))(P3_out) 294 | P3_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 295 | name=f'fpn_cells/cell_{id}/fnode3/op_after_combine8')(P3_out) 296 | 297 | P3_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P3_out) 298 | P4_out = layers.Add(name=f'fpn_cells/cell_{id}/fnode4/add')([P4_in_2, P4_td, P3_D]) 299 | P4_out = layers.Activation(lambda x: tf.nn.swish(x))(P4_out) 300 | P4_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 301 | 
name=f'fpn_cells/cell_{id}/fnode4/op_after_combine9')(P4_out) 302 | 303 | P4_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P4_out) 304 | P5_out = layers.Add(name=f'fpn_cells/cell_{id}/fnode5/add')([P5_in_2, P5_td, P4_D]) 305 | P5_out = layers.Activation(lambda x: tf.nn.swish(x))(P5_out) 306 | P5_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 307 | name=f'fpn_cells/cell_{id}/fnode5/op_after_combine10')(P5_out) 308 | 309 | P5_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P5_out) 310 | P6_out = layers.Add(name=f'fpn_cells/cell_{id}/fnode6/add')([P6_in, P6_td, P5_D]) 311 | P6_out = layers.Activation(lambda x: tf.nn.swish(x))(P6_out) 312 | P6_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 313 | name=f'fpn_cells/cell_{id}/fnode6/op_after_combine11')(P6_out) 314 | 315 | P6_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P6_out) 316 | P7_out = layers.Add(name=f'fpn_cells/cell_{id}/fnode7/add')([P7_in, P6_D]) 317 | P7_out = layers.Activation(lambda x: tf.nn.swish(x))(P7_out) 318 | P7_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 319 | name=f'fpn_cells/cell_{id}/fnode7/op_after_combine12')(P7_out) 320 | 321 | else: 322 | P3_in, P4_in, P5_in, P6_in, P7_in = features 323 | P7_U = layers.UpSampling2D()(P7_in) 324 | P6_td = layers.Add(name=f'fpn_cells/cell_{id}/fnode0/add')([P6_in, P7_U]) 325 | P6_td = layers.Activation(lambda x: tf.nn.swish(x))(P6_td) 326 | P6_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 327 | name=f'fpn_cells/cell_{id}/fnode0/op_after_combine5')(P6_td) 328 | 329 | P6_U = layers.UpSampling2D()(P6_td) 330 | P5_td = layers.Add(name=f'fpn_cells/cell_{id}/fnode1/add')([P5_in, P6_U]) 331 | P5_td = layers.Activation(lambda x: tf.nn.swish(x))(P5_td) 332 | P5_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 333 | name=f'fpn_cells/cell_{id}/fnode1/op_after_combine6')(P5_td) 334 | 335 | P5_U = layers.UpSampling2D()(P5_td) 336 | P4_td = layers.Add(name=f'fpn_cells/cell_{id}/fnode2/add')([P4_in, P5_U]) 337 | P4_td = layers.Activation(lambda x: tf.nn.swish(x))(P4_td) 338 | P4_td = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 339 | name=f'fpn_cells/cell_{id}/fnode2/op_after_combine7')(P4_td) 340 | 341 | P4_U = layers.UpSampling2D()(P4_td) 342 | P3_out = layers.Add(name=f'fpn_cells/cell_{id}/fnode3/add')([P3_in, P4_U]) 343 | P3_out = layers.Activation(lambda x: tf.nn.swish(x))(P3_out) 344 | P3_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 345 | name=f'fpn_cells/cell_{id}/fnode3/op_after_combine8')(P3_out) 346 | 347 | P3_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P3_out) 348 | P4_out = layers.Add(name=f'fpn_cells/cell_{id}/fnode4/add')([P4_in, P4_td, P3_D]) 349 | P4_out = layers.Activation(lambda x: tf.nn.swish(x))(P4_out) 350 | P4_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 351 | name=f'fpn_cells/cell_{id}/fnode4/op_after_combine9')(P4_out) 352 | 353 | P4_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P4_out) 354 | P5_out = layers.Add(name=f'fpn_cells/cell_{id}/fnode5/add')([P5_in, P5_td, P4_D]) 355 | P5_out = layers.Activation(lambda x: tf.nn.swish(x))(P5_out) 356 | P5_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 357 | name=f'fpn_cells/cell_{id}/fnode5/op_after_combine10')(P5_out) 358 | 359 | P5_D = layers.MaxPooling2D(pool_size=3, 
strides=2, padding='same')(P5_out) 360 | P6_out = layers.Add(name=f'fpn_cells/cell_{id}/fnode6/add')([P6_in, P6_td, P5_D]) 361 | P6_out = layers.Activation(lambda x: tf.nn.swish(x))(P6_out) 362 | P6_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 363 | name=f'fpn_cells/cell_{id}/fnode6/op_after_combine11')(P6_out) 364 | 365 | P6_D = layers.MaxPooling2D(pool_size=3, strides=2, padding='same')(P6_out) 366 | P7_out = layers.Add(name=f'fpn_cells/cell_{id}/fnode7/add')([P7_in, P6_D]) 367 | P7_out = layers.Activation(lambda x: tf.nn.swish(x))(P7_out) 368 | P7_out = SeparableConvBlock(num_channels=num_channels, kernel_size=3, strides=1, 369 | name=f'fpn_cells/cell_{id}/fnode7/op_after_combine12')(P7_out) 370 | return [P3_out, P4_out, P5_out, P6_out, P7_out] 371 | 372 | #------------------------------------------# 373 | # 获得回归预测结果 374 | # 该部分会对先验框进行调整获得预测框 375 | #------------------------------------------# 376 | class BoxNet: 377 | def __init__(self, width, depth, num_anchors=9, name='box_net', **kwargs): 378 | self.name = name 379 | self.width = width 380 | self.depth = depth 381 | self.num_anchors = num_anchors 382 | options = { 383 | 'kernel_size': 3, 384 | 'strides': 1, 385 | 'padding': 'same', 386 | 'bias_initializer': 'zeros', 387 | 'depthwise_initializer': initializers.RandomNormal(stddev=0.01), 388 | 'pointwise_initializer': initializers.RandomNormal(stddev=0.01), 389 | } 390 | 391 | self.convs = [layers.SeparableConv2D(filters=width, name=f'{self.name}/box-{i}', **options) for i in range(depth)] 392 | self.head = layers.SeparableConv2D(filters=num_anchors * 4, name=f'{self.name}/box-predict', **options) 393 | 394 | self.bns = [[layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, name=f'{self.name}/box-{i}-bn-{j}') for j in 395 | range(3, 8)] for i in range(depth)] 396 | 397 | self.relu = layers.Lambda(lambda x: tf.nn.swish(x)) 398 | self.reshape = layers.Reshape((-1, 4)) 399 | 400 | def call(self, inputs): 401 | feature, level = inputs 402 | for i in range(self.depth): 403 | feature = self.convs[i](feature) 404 | feature = self.bns[i][level](feature) 405 | feature = self.relu(feature) 406 | outputs = self.head(feature) 407 | outputs = self.reshape(outputs) 408 | return outputs 409 | 410 | #------------------------------------------# 411 | # 获得分类预测结果 412 | # 该部分会判断先验框对应的物体种类 413 | #------------------------------------------# 414 | class ClassNet: 415 | def __init__(self, width, depth, num_classes=20, num_anchors=9, name='class_net', **kwargs): 416 | self.name = name 417 | self.width = width 418 | self.depth = depth 419 | self.num_classes = num_classes 420 | self.num_anchors = num_anchors 421 | options = { 422 | 'kernel_size': 3, 423 | 'strides': 1, 424 | 'padding': 'same', 425 | 'depthwise_initializer': initializers.RandomNormal(stddev=0.01), 426 | 'pointwise_initializer': initializers.RandomNormal(stddev=0.01), 427 | } 428 | 429 | self.convs = [layers.SeparableConv2D(filters=width, bias_initializer='zeros', name=f'{self.name}/class-{i}', **options) for i in range(depth)] 430 | self.head = layers.SeparableConv2D(filters=num_classes * num_anchors, bias_initializer=PriorProbability(probability=0.01), name=f'{self.name}/class-predict', **options) 431 | 432 | self.bns = [[layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, name=f'{self.name}/class-{i}-bn-{j}') for j 433 | in range(3, 8)] for i in range(depth)] 434 | 435 | self.relu = layers.Lambda(lambda x: tf.nn.swish(x)) 436 | self.reshape = layers.Reshape((-1, num_classes)) 437 | 
self.activation = layers.Activation('sigmoid') 438 | 439 | def call(self, inputs): 440 | feature, level = inputs 441 | for i in range(self.depth): 442 | feature = self.convs[i](feature) 443 | feature = self.bns[i][level](feature) 444 | feature = self.relu(feature) 445 | outputs = self.head(feature) 446 | outputs = self.reshape(outputs) 447 | outputs = self.activation(outputs) 448 | return outputs 449 | 450 | def efficientdet(input_shape, phi, num_classes=20, num_anchors=9): 451 | assert phi in range(8) 452 | # 不同版本的Efficientdet使用的参数不同。 453 | fpn_num_filters = [64, 88, 112, 160, 224, 288, 384, 384] 454 | fpn_cell_repeats = [3, 4, 5, 6, 7, 7, 8, 8] 455 | box_class_repeats = [3, 3, 3, 4, 4, 4, 5, 5] 456 | 457 | backbones = [EfficientNetB0, EfficientNetB1, EfficientNetB2, 458 | EfficientNetB3, EfficientNetB4, EfficientNetB5, 459 | EfficientNetB6, EfficientNetB7] 460 | 461 | #------------------------------------------------------# 462 | # 神经网络的输入 463 | # efficientdet-D0 512,512,3 464 | #------------------------------------------------------# 465 | inputs = layers.Input(input_shape) 466 | 467 | #------------------------------------------------------# 468 | # 在经过多次BiFPN模块的堆叠后,我们获得的fpn_features 469 | # 包括五个有效特征层: 470 | # P3_out 64,64,64 471 | # P4_out 32,32,64 472 | # P5_out 16,16,64 473 | # P6_out 8,8,64 474 | # P7_out 4,4,64 475 | #------------------------------------------------------# 476 | fpn_features = backbones[phi](inputs=inputs) 477 | if phi < 6: 478 | for i in range(fpn_cell_repeats[phi]): 479 | fpn_features = build_wBiFPN(fpn_features, fpn_num_filters[phi], i) 480 | else: 481 | for i in range(fpn_cell_repeats[phi]): 482 | fpn_features = build_BiFPN(fpn_features, fpn_num_filters[phi], i) 483 | 484 | #------------------------------------------------------# 485 | # 创建efficient head 486 | # 可以将特征层转换成预测结果 487 | #------------------------------------------------------# 488 | box_net = BoxNet(fpn_num_filters[phi], box_class_repeats[phi], num_anchors=num_anchors, name='box_net') 489 | class_net = ClassNet(fpn_num_filters[phi], box_class_repeats[phi], num_classes=num_classes, num_anchors=num_anchors, name='class_net') 490 | 491 | #------------------------------------------------------# 492 | # 利用efficient head获得各个特征层的预测结果 493 | # 并且将预测结果进行堆叠。 494 | #------------------------------------------------------# 495 | classification = [class_net.call([feature, i]) for i, feature in enumerate(fpn_features)] 496 | classification = layers.Concatenate(axis=1, name='classification')(classification) 497 | regression = [box_net.call([feature, i]) for i, feature in enumerate(fpn_features)] 498 | regression = layers.Concatenate(axis=1, name='regression')(regression) 499 | 500 | model = models.Model(inputs=[inputs], outputs=[regression, classification], name='efficientdet') 501 | 502 | return model 503 | -------------------------------------------------------------------------------- /nets/efficientdet_training.py: -------------------------------------------------------------------------------- 1 | import math 2 | from functools import partial 3 | 4 | import tensorflow as tf 5 | from keras import backend as K 6 | 7 | 8 | def focal(alpha=0.25, gamma=2.0): 9 | def _focal(y_true, y_pred): 10 | #---------------------------------------------------# 11 | # y_true [batch_size, num_anchor, num_classes+1] 12 | # y_pred [batch_size, num_anchor, num_classes] 13 | #---------------------------------------------------# 14 | labels = y_true[:, :, :-1] 15 | #---------------------------------------------------# 16 |
# -1 是需要忽略的, 0 是背景, 1 是存在目标 17 | #---------------------------------------------------# 18 | anchor_state = y_true[:, :, -1] 19 | classification = y_pred 20 | 21 | # 找出不需要忽略的先验框(正样本与背景) 22 | indices = tf.where(K.not_equal(anchor_state, -1)) 23 | labels = tf.gather_nd(labels, indices) 24 | classification = tf.gather_nd(classification, indices) 25 | 26 | # 计算每一个先验框应该有的权重 27 | alpha_factor = K.ones_like(labels) * alpha 28 | alpha_factor = tf.where(K.equal(labels, 1), alpha_factor, 1 - alpha_factor) 29 | focal_weight = tf.where(K.equal(labels, 1), 1 - classification, classification) 30 | focal_weight = alpha_factor * focal_weight ** gamma 31 | 32 | # 将权重乘上所求得的交叉熵 33 | cls_loss = focal_weight * K.binary_crossentropy(labels, classification) 34 | 35 | # 标准化,实际上是正样本的数量 36 | normalizer = tf.where(K.equal(anchor_state, 1)) 37 | normalizer = K.cast(K.shape(normalizer)[0], K.floatx()) 38 | normalizer = K.maximum(K.cast_to_floatx(1.0), normalizer) 39 | 40 | # 将所获得的loss除上正样本的数量 41 | loss = K.sum(cls_loss) / normalizer 42 | return loss 43 | return _focal 44 | 45 | def smooth_l1(sigma=3.0): 46 | sigma_squared = sigma ** 2 47 | def _smooth_l1(y_true, y_pred): 48 | #---------------------------------------------------# 49 | # y_true [batch_size, num_anchor, 4+1] 50 | # y_pred [batch_size, num_anchor, 4] 51 | #---------------------------------------------------# 52 | regression = y_pred 53 | regression_target = y_true[:, :, :-1] 54 | anchor_state = y_true[:, :, -1] 55 | 56 | # 找出存在目标的先验框 57 | indices = tf.where(K.equal(anchor_state, 1)) 58 | regression = tf.gather_nd(regression, indices) 59 | regression_target = tf.gather_nd(regression_target, indices) 60 | 61 | # 计算smooth L1损失 62 | regression_diff = regression - regression_target 63 | regression_diff = K.abs(regression_diff) 64 | regression_loss = tf.where( 65 | K.less(regression_diff, 1.0 / sigma_squared), 66 | 0.5 * sigma_squared * K.pow(regression_diff, 2), 67 | regression_diff - 0.5 / sigma_squared 68 | ) 69 | 70 | # 将所获得的loss除上正样本的数量 71 | normalizer = K.maximum(1, K.shape(indices)[0]) 72 | normalizer = K.cast(normalizer, dtype=K.floatx()) 73 | return K.sum(regression_loss) / normalizer / 4 74 | return _smooth_l1 75 | 76 | def get_lr_scheduler(lr_decay_type, lr, min_lr, total_iters, warmup_iters_ratio = 0.05, warmup_lr_ratio = 0.1, no_aug_iter_ratio = 0.05, step_num = 10): 77 | def yolox_warm_cos_lr(lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter, iters): 78 | if iters <= warmup_total_iters: 79 | # lr = (lr - warmup_lr_start) * iters / float(warmup_total_iters) + warmup_lr_start 80 | lr = (lr - warmup_lr_start) * pow(iters / float(warmup_total_iters), 2 81 | ) + warmup_lr_start 82 | elif iters >= total_iters - no_aug_iter: 83 | lr = min_lr 84 | else: 85 | lr = min_lr + 0.5 * (lr - min_lr) * ( 86 | 1.0 87 | + math.cos( 88 | math.pi 89 | * (iters - warmup_total_iters) 90 | / (total_iters - warmup_total_iters - no_aug_iter) 91 | ) 92 | ) 93 | return lr 94 | 95 | def step_lr(lr, decay_rate, step_size, iters): 96 | if step_size < 1: 97 | raise ValueError("step_size must be above 1.") 98 | n = iters // step_size 99 | out_lr = lr * decay_rate ** n 100 | return out_lr 101 | 102 | if lr_decay_type == "cos": 103 | warmup_total_iters = min(max(warmup_iters_ratio * total_iters, 1), 3) 104 | warmup_lr_start = max(warmup_lr_ratio * lr, 1e-6) 105 | no_aug_iter = min(max(no_aug_iter_ratio * total_iters, 1), 15) 106 | func = partial(yolox_warm_cos_lr, lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter) 107 | else: 108 | decay_rate =
(min_lr / lr) ** (1 / (step_num - 1)) 109 | step_size = total_iters / step_num 110 | func = partial(step_lr, lr, decay_rate, step_size) 111 | 112 | return func 113 | 114 | -------------------------------------------------------------------------------- /nets/efficientnet.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import math 3 | import string 4 | 5 | from keras import backend, layers 6 | 7 | MOMENTUM = 0.99 8 | EPSILON = 1e-3 9 | 10 | #-------------------------------------------------# 11 | # 一共七个大结构块,每个大结构块都有特定的参数 12 | #-------------------------------------------------# 13 | BlockArgs = collections.namedtuple('BlockArgs', [ 14 | 'kernel_size', 'num_repeat', 'input_filters', 'output_filters', 15 | 'expand_ratio', 'id_skip', 'strides', 'se_ratio' 16 | ]) 17 | 18 | BlockArgs.__new__.__defaults__ = (None,) * len(BlockArgs._fields) 19 | 20 | DEFAULT_BLOCKS_ARGS = [ 21 | BlockArgs(kernel_size=3, num_repeat=1, input_filters=32, output_filters=16, 22 | expand_ratio=1, id_skip=True, strides=[1, 1], se_ratio=0.25), 23 | BlockArgs(kernel_size=3, num_repeat=2, input_filters=16, output_filters=24, 24 | expand_ratio=6, id_skip=True, strides=[2, 2], se_ratio=0.25), 25 | BlockArgs(kernel_size=5, num_repeat=2, input_filters=24, output_filters=40, 26 | expand_ratio=6, id_skip=True, strides=[2, 2], se_ratio=0.25), 27 | BlockArgs(kernel_size=3, num_repeat=3, input_filters=40, output_filters=80, 28 | expand_ratio=6, id_skip=True, strides=[2, 2], se_ratio=0.25), 29 | BlockArgs(kernel_size=5, num_repeat=3, input_filters=80, output_filters=112, 30 | expand_ratio=6, id_skip=True, strides=[1, 1], se_ratio=0.25), 31 | BlockArgs(kernel_size=5, num_repeat=4, input_filters=112, output_filters=192, 32 | expand_ratio=6, id_skip=True, strides=[2, 2], se_ratio=0.25), 33 | BlockArgs(kernel_size=3, num_repeat=1, input_filters=192, output_filters=320, 34 | expand_ratio=6, id_skip=True, strides=[1, 1], se_ratio=0.25) 35 | ] 36 | 37 | #-------------------------------------------------# 38 | # Kernel的初始化器 39 | #-------------------------------------------------# 40 | CONV_KERNEL_INITIALIZER = { 41 | 'class_name': 'VarianceScaling', 42 | 'config': { 43 | 'scale': 2.0, 44 | 'mode': 'fan_out', 45 | 'distribution': 'normal' 46 | } 47 | } 48 | 49 | #-------------------------------------------------# 50 | # Swish激活函数 51 | #-------------------------------------------------# 52 | def get_swish(): 53 | def swish(x): 54 | return x * backend.sigmoid(x) 55 | return swish 56 | 57 | #-------------------------------------------------# 58 | # Dropout层 59 | #-------------------------------------------------# 60 | def get_dropout(): 61 | class FixedDropout(layers.Dropout): 62 | def _get_noise_shape(self, inputs): 63 | if self.noise_shape is None: 64 | return self.noise_shape 65 | 66 | symbolic_shape = backend.shape(inputs) 67 | noise_shape = [symbolic_shape[axis] if shape is None else shape 68 | for axis, shape in enumerate(self.noise_shape)] 69 | return tuple(noise_shape) 70 | 71 | return FixedDropout 72 | 73 | #-------------------------------------------------# 74 | # 该函数的目的是保证filter的大小可以被8整除 75 | #-------------------------------------------------# 76 | def round_filters(filters, width_coefficient, depth_divisor): 77 | filters *= width_coefficient 78 | new_filters = int(filters + depth_divisor / 2) // depth_divisor * depth_divisor 79 | new_filters = max(depth_divisor, new_filters) 80 | if new_filters < 0.9 * filters: 81 | new_filters += depth_divisor 82 | return 
int(new_filters) 83 | 84 | #-------------------------------------------------# 85 | # 计算模块的重复次数 86 | #-------------------------------------------------# 87 | def round_repeats(repeats, depth_coefficient): 88 | return int(math.ceil(depth_coefficient * repeats)) 89 | 90 | 91 | def mb_conv_block(inputs, block_args, activation, drop_rate=None, prefix=''): 92 | Dropout = get_dropout() 93 | 94 | #-------------------------------------------------# 95 | # 利用Inverted residuals 96 | # part1 利用1x1卷积进行通道数上升 97 | #-------------------------------------------------# 98 | filters = block_args.input_filters * block_args.expand_ratio 99 | if block_args.expand_ratio != 1: 100 | x = layers.Conv2D(filters, 1, 101 | padding='same', 102 | use_bias=False, 103 | kernel_initializer=CONV_KERNEL_INITIALIZER, 104 | name=prefix + 'expand_conv')(inputs) 105 | x = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, name=prefix + 'expand_bn')(x) 106 | x = layers.Activation(activation, name=prefix + 'expand_activation')(x) 107 | else: 108 | x = inputs 109 | 110 | #------------------------------------------------------# 111 | # 如果步长为2x2的话,利用深度可分离卷积进行高宽压缩 112 | # part2 利用3x3卷积对每一个channel进行卷积 113 | #------------------------------------------------------# 114 | x = layers.DepthwiseConv2D(block_args.kernel_size, 115 | strides=block_args.strides, 116 | padding='same', 117 | use_bias=False, 118 | depthwise_initializer=CONV_KERNEL_INITIALIZER, 119 | name=prefix + 'dwconv')(x) 120 | x = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, name=prefix + 'bn')(x) 121 | x = layers.Activation(activation, name=prefix + 'activation')(x) 122 | 123 | #------------------------------------------------------# 124 | # 完成深度可分离卷积后 125 | # 对深度可分离卷积的结果施加注意力机制 126 | #------------------------------------------------------# 127 | if 0 < block_args.se_ratio <= 1: 128 | num_reduced_filters = max(1, int(block_args.input_filters * block_args.se_ratio)) 129 | se_tensor = layers.GlobalAveragePooling2D(name=prefix + 'se_squeeze')(x) 130 | se_tensor = layers.Reshape((1, 1, filters), name=prefix + 'se_reshape')(se_tensor) 131 | #------------------------------------------------------# 132 | # 通道先压缩后上升,最后利用sigmoid将值固定到0-1之间 133 | #------------------------------------------------------# 134 | se_tensor = layers.Conv2D(num_reduced_filters, 1, 135 | activation=activation, 136 | padding='same', 137 | use_bias=True, 138 | kernel_initializer=CONV_KERNEL_INITIALIZER, 139 | name=prefix + 'se_reduce')(se_tensor) 140 | se_tensor = layers.Conv2D(filters, 1, 141 | activation='sigmoid', 142 | padding='same', 143 | use_bias=True, 144 | kernel_initializer=CONV_KERNEL_INITIALIZER, 145 | name=prefix + 'se_expand')(se_tensor) 146 | x = layers.multiply([x, se_tensor], name=prefix + 'se_excite') 147 | 148 | #------------------------------------------------------# 149 | # part3 利用1x1卷积进行通道下降 150 | #------------------------------------------------------# 151 | x = layers.Conv2D(block_args.output_filters, 1, 152 | padding='same', 153 | use_bias=False, 154 | kernel_initializer=CONV_KERNEL_INITIALIZER, 155 | name=prefix + 'project_conv')(x) 156 | x = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, name=prefix + 'project_bn')(x) 157 | 158 | #------------------------------------------------------# 159 | # part4 如果满足残差条件,那么就增加残差边 160 | #------------------------------------------------------# 161 | if block_args.id_skip and all(s == 1 for s in block_args.strides) and block_args.input_filters == block_args.output_filters: 162 | if drop_rate and (drop_rate > 
0): 163 | x = Dropout(drop_rate, 164 | noise_shape=(None, 1, 1, 1), 165 | name=prefix + 'drop')(x) 166 | x = layers.add([x, inputs], name=prefix + 'add') 167 | 168 | return x 169 | 170 | 171 | def EfficientNet(width_coefficient, 172 | depth_coefficient, 173 | drop_connect_rate=0.2, 174 | depth_divisor=8, 175 | blocks_args=DEFAULT_BLOCKS_ARGS, 176 | inputs=None, 177 | **kwargs): 178 | activation = get_swish(**kwargs) 179 | 180 | img_input = inputs 181 | #-------------------------------------------------# 182 | # 创建stem部分 183 | #-------------------------------------------------# 184 | x = img_input 185 | x = layers.Conv2D(round_filters(32, width_coefficient, depth_divisor), 3, 186 | strides=(2, 2), 187 | padding='same', 188 | use_bias=False, 189 | kernel_initializer=CONV_KERNEL_INITIALIZER, 190 | name='stem_conv')(x) 191 | x = layers.BatchNormalization(momentum=MOMENTUM, epsilon=EPSILON, name='stem_bn')(x) 192 | x = layers.Activation(activation, name='stem_activation')(x) 193 | 194 | features = [] 195 | #-------------------------------------------------# 196 | # 计算总的efficient_block的数量 197 | #-------------------------------------------------# 198 | block_num = 0 199 | num_blocks_total = sum(block_args.num_repeat for block_args in blocks_args) 200 | #------------------------------------------------------------------------------# 201 | # 对结构块参数进行循环、一共进行7个大的结构块。 202 | # 每个大结构块下会重复小的efficient_block 203 | #------------------------------------------------------------------------------# 204 | for idx, block_args in enumerate(blocks_args): 205 | assert block_args.num_repeat > 0 206 | #-------------------------------------------------# 207 | # 对使用到的参数进行更新 208 | #-------------------------------------------------# 209 | block_args = block_args._replace( 210 | input_filters = round_filters(block_args.input_filters, width_coefficient, depth_divisor), 211 | output_filters = round_filters(block_args.output_filters, width_coefficient, depth_divisor), 212 | num_repeat = round_repeats(block_args.num_repeat, depth_coefficient)) 213 | 214 | # 计算drop_rate 215 | drop_rate = drop_connect_rate * float(block_num) / num_blocks_total 216 | x = mb_conv_block(x, block_args, 217 | activation=activation, 218 | drop_rate=drop_rate, 219 | prefix='block{}a_'.format(idx + 1)) 220 | block_num += 1 221 | if block_args.num_repeat > 1: 222 | #-------------------------------------------------# 223 | # 对使用到的参数进行更新 224 | #-------------------------------------------------# 225 | block_args = block_args._replace(input_filters=block_args.output_filters, strides=[1, 1]) 226 | for bidx in range(block_args.num_repeat - 1): 227 | # 计算drop_rate 228 | drop_rate = drop_connect_rate * float(block_num) / num_blocks_total 229 | x = mb_conv_block(x, block_args, 230 | activation = activation, 231 | drop_rate = drop_rate, 232 | prefix = 'block{}{}_'.format(idx + 1, string.ascii_lowercase[bidx + 1])) 233 | block_num += 1 234 | 235 | if idx < len(blocks_args) - 1 and blocks_args[idx + 1].strides[0] == 2: 236 | features.append(x) 237 | elif idx == len(blocks_args) - 1: 238 | features.append(x) 239 | return features 240 | 241 | def EfficientNetB0(inputs=None, **kwargs): 242 | return EfficientNet(1.0, 1.0, inputs=inputs, **kwargs) 243 | 244 | 245 | def EfficientNetB1(inputs=None, **kwargs): 246 | return EfficientNet(1.0, 1.1, inputs=inputs, **kwargs) 247 | 248 | 249 | def EfficientNetB2(inputs=None, **kwargs): 250 | return EfficientNet(1.1, 1.2, inputs=inputs, **kwargs) 251 | 252 | 253 | def EfficientNetB3(inputs=None, **kwargs): 254 | return 
EfficientNet(1.2, 1.4, inputs=inputs, **kwargs) 255 | 256 | 257 | def EfficientNetB4(inputs=None, **kwargs): 258 | return EfficientNet(1.4, 1.8, inputs=inputs, **kwargs) 259 | 260 | 261 | def EfficientNetB5(inputs=None, **kwargs): 262 | return EfficientNet(1.6, 2.2, inputs=inputs, **kwargs) 263 | 264 | 265 | def EfficientNetB6(inputs=None, **kwargs): 266 | return EfficientNet(1.8, 2.6, inputs=inputs, **kwargs) 267 | 268 | 269 | def EfficientNetB7(inputs=None, **kwargs): 270 | return EfficientNet(2.0, 3.1, inputs=inputs, **kwargs) 271 | 272 | -------------------------------------------------------------------------------- /predict.py: -------------------------------------------------------------------------------- 1 | #-----------------------------------------------------------------------# 2 | # predict.py将单张图片预测、摄像头检测、FPS测试和目录遍历检测等功能 3 | # 整合到了一个py文件中,通过指定mode进行模式的修改。 4 | #-----------------------------------------------------------------------# 5 | import time 6 | 7 | import cv2 8 | import numpy as np 9 | from PIL import Image 10 | 11 | from efficientdet import Efficientdet 12 | 13 | if __name__ == "__main__": 14 | efficientdet = Efficientdet() 15 | #----------------------------------------------------------------------------------------------------------# 16 | # mode用于指定测试的模式: 17 | # 'predict' 表示单张图片预测,如果想对预测过程进行修改,如保存图片,截取对象等,可以先看下方详细的注释 18 | # 'video' 表示视频检测,可调用摄像头或者视频进行检测,详情查看下方注释。 19 | # 'fps' 表示测试fps,使用的图片是img里面的street.jpg,详情查看下方注释。 20 | # 'dir_predict' 表示遍历文件夹进行检测并保存。默认遍历img文件夹,保存img_out文件夹,详情查看下方注释。 21 | #----------------------------------------------------------------------------------------------------------# 22 | mode = "predict" 23 | #-------------------------------------------------------------------------# 24 | # crop 指定了是否在单张图片预测后对目标进行截取 25 | # count 指定了是否进行目标的计数 26 | # crop、count仅在mode='predict'时有效 27 | #-------------------------------------------------------------------------# 28 | crop = False 29 | count = False 30 | #----------------------------------------------------------------------------------------------------------# 31 | # video_path 用于指定视频的路径,当video_path=0时表示检测摄像头 32 | # 想要检测视频,则设置如video_path = "xxx.mp4"即可,代表读取出根目录下的xxx.mp4文件。 33 | # video_save_path 表示视频保存的路径,当video_save_path=""时表示不保存 34 | # 想要保存视频,则设置如video_save_path = "yyy.mp4"即可,代表保存为根目录下的yyy.mp4文件。 35 | # video_fps 用于保存的视频的fps 36 | # 37 | # video_path、video_save_path和video_fps仅在mode='video'时有效 38 | # 保存视频时需要ctrl+c退出或者运行到最后一帧才会完成完整的保存步骤。 39 | #----------------------------------------------------------------------------------------------------------# 40 | video_path = 0 41 | video_save_path = "" 42 | video_fps = 25.0 43 | #----------------------------------------------------------------------------------------------------------# 44 | # test_interval 用于指定测量fps的时候,图片检测的次数。理论上test_interval越大,fps越准确。 45 | # fps_image_path 用于指定测试的fps图片 46 | # 47 | # test_interval和fps_image_path仅在mode='fps'有效 48 | #----------------------------------------------------------------------------------------------------------# 49 | test_interval = 100 50 | fps_image_path = "img/street.jpg" 51 | #-------------------------------------------------------------------------# 52 | # dir_origin_path 指定了用于检测的图片的文件夹路径 53 | # dir_save_path 指定了检测完图片的保存路径 54 | # 55 | # dir_origin_path和dir_save_path仅在mode='dir_predict'时有效 56 | #-------------------------------------------------------------------------# 57 | dir_origin_path = "img/" 58 | dir_save_path = "img_out/" 59 | 60 | if mode == "predict": 61 | ''' 62 | 
1、如果想要进行检测完的图片的保存,利用r_image.save("img.jpg")即可保存,直接在predict.py里进行修改即可。 63 | 2、如果想要获得预测框的坐标,可以进入efficientdet.detect_image函数,在绘图部分读取top,left,bottom,right这四个值。 64 | 3、如果想要利用预测框截取下目标,可以进入efficientdet.detect_image函数,在绘图部分利用获取到的top,left,bottom,right这四个值 65 | 在原图上利用矩阵的方式进行截取。 66 | 4、如果想要在预测图上写额外的字,比如检测到的特定目标的数量,可以进入efficientdet.detect_image函数,在绘图部分对predicted_class进行判断, 67 | 比如判断if predicted_class == 'car': 即可判断当前目标是否为车,然后记录数量即可。利用draw.text即可写字。 68 | ''' 69 | while True: 70 | img = input('Input image filename:') 71 | try: 72 | image = Image.open(img) 73 | except: 74 | print('Open Error! Try again!') 75 | continue 76 | else: 77 | r_image = efficientdet.detect_image(image, crop = crop, count=count) 78 | r_image.show() 79 | 80 | elif mode == "video": 81 | capture = cv2.VideoCapture(video_path) 82 | if video_save_path!="": 83 | fourcc = cv2.VideoWriter_fourcc(*'XVID') 84 | size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))) 85 | out = cv2.VideoWriter(video_save_path, fourcc, video_fps, size) 86 | 87 | ref, frame = capture.read() 88 | if not ref: 89 | raise ValueError("未能正确读取摄像头(视频),请注意是否正确安装摄像头(是否正确填写视频路径)。") 90 | 91 | fps = 0.0 92 | while(True): 93 | t1 = time.time() 94 | # 读取某一帧 95 | ref, frame = capture.read() 96 | if not ref: 97 | break 98 | # 格式转变,BGRtoRGB 99 | frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB) 100 | # 转变成Image 101 | frame = Image.fromarray(np.uint8(frame)) 102 | # 进行检测 103 | frame = np.array(efficientdet.detect_image(frame)) 104 | # RGBtoBGR满足opencv显示格式 105 | frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR) 106 | 107 | fps = ( fps + (1./(time.time()-t1)) ) / 2 108 | print("fps= %.2f"%(fps)) 109 | frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 110 | 111 | cv2.imshow("video",frame) 112 | c= cv2.waitKey(1) & 0xff 113 | if video_save_path!="": 114 | out.write(frame) 115 | 116 | if c==27: 117 | capture.release() 118 | break 119 | 120 | print("Video Detection Done!") 121 | capture.release() 122 | if video_save_path!="": 123 | print("Save processed video to the path :" + video_save_path) 124 | out.release() 125 | cv2.destroyAllWindows() 126 | 127 | elif mode == "fps": 128 | img = Image.open(fps_image_path) 129 | tact_time = efficientdet.get_FPS(img, test_interval) 130 | print(str(tact_time) + ' seconds, ' + str(1/tact_time) + 'FPS, @batch_size 1') 131 | 132 | elif mode == "dir_predict": 133 | import os 134 | 135 | from tqdm import tqdm 136 | 137 | img_names = os.listdir(dir_origin_path) 138 | for img_name in tqdm(img_names): 139 | if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')): 140 | image_path = os.path.join(dir_origin_path, img_name) 141 | image = Image.open(image_path) 142 | r_image = efficientdet.detect_image(image) 143 | if not os.path.exists(dir_save_path): 144 | os.makedirs(dir_save_path) 145 | r_image.save(os.path.join(dir_save_path, img_name.replace(".jpg", ".png")), quality=95, subsampling=0) 146 | 147 | else: 148 | raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps' or 'dir_predict'.") 149 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | scipy==1.2.1 2 | numpy==1.17.0 3 | Keras==2.1.5 4 | matplotlib==3.1.2 5 | opencv_python==4.1.2.30 6 | tensorflow_gpu==1.13.2 7 | tqdm==4.60.0 8 | Pillow==8.2.0 9 | h5py==2.10.0 10 | 
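一个单张图片检测的最小调用示例(示意用法:假设已按README准备好权值文件,接口以仓库根目录下的efficientdet.py为准,流程与predict.py中mode = "predict"等价):

# minimal_predict.py —— 最小预测示例(示意)
from PIL import Image

from efficientdet import Efficientdet

if __name__ == "__main__":
    efficientdet = Efficientdet()
    image = Image.open("img/street.jpg")
    # crop、count参数的含义见predict.py中的注释
    r_image = efficientdet.detect_image(image, crop=False, count=False)
    r_image.show()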
-------------------------------------------------------------------------------- /summary.py: -------------------------------------------------------------------------------- 1 | #--------------------------------------------# 2 | # 该部分代码用于查看网络结构 3 | #--------------------------------------------# 4 | from nets.efficientdet import efficientdet 5 | from utils.utils import image_sizes, net_flops 6 | 7 | if __name__ == "__main__": 8 | phi = 0 9 | input_shape = [image_sizes[phi], image_sizes[phi], 3] 10 | num_classes = 80 11 | 12 | model = efficientdet(input_shape, phi, num_classes) 13 | #--------------------------------------------# 14 | # 查看网络结构 15 | #--------------------------------------------# 16 | model.summary() 17 | #--------------------------------------------# 18 | # 计算网络的FLOPS 19 | #--------------------------------------------# 20 | net_flops(model, table=False) 21 | 22 | #--------------------------------------------# 23 | # 获得网络每个层的名称与序号 24 | #--------------------------------------------# 25 | # for i,layer in enumerate(model.layers): 26 | # print(i,layer.name) 27 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import os 3 | 4 | import tensorflow as tf 5 | from keras.callbacks import (EarlyStopping, LearningRateScheduler, 6 | ModelCheckpoint, TensorBoard) 7 | from keras.layers import Conv2D, Dense, DepthwiseConv2D 8 | from keras.optimizers import SGD, Adam 9 | from keras.regularizers import l2 10 | from keras.utils.multi_gpu_utils import multi_gpu_model 11 | 12 | from nets.efficientdet import efficientdet 13 | from nets.efficientdet_training import focal, get_lr_scheduler, smooth_l1 14 | from utils.anchors import get_anchors 15 | from utils.callbacks import (ExponentDecayScheduler, LossHistory, 16 | ParallelModelCheckpoint, EvalCallback) 17 | from utils.dataloader import EfficientdetDatasets 18 | from utils.utils import get_classes, image_sizes, show_config 19 | 20 | tf.logging.set_verbosity(tf.logging.ERROR) 21 | 22 | ''' 23 | 训练自己的目标检测模型一定需要注意以下几点: 24 | 1、训练前仔细检查自己的格式是否满足要求,该库要求数据集格式为VOC格式,需要准备好的内容有输入图片和标签 25 | 输入图片为.jpg图片,无需固定大小,传入训练前会自动进行resize。 26 | 灰度图会自动转成RGB图片进行训练,无需自己修改。 27 | 输入图片如果后缀非jpg,需要自己批量转成jpg后再开始训练。 28 | 29 | 标签为.xml格式,文件中会有需要检测的目标信息,标签文件和输入图片文件相对应。 30 | 31 | 2、损失值的大小用于判断是否收敛,比较重要的是有收敛的趋势,即验证集损失不断下降,如果验证集损失基本上不改变的话,模型基本上就收敛了。 32 | 损失值的具体大小并没有什么意义,大和小只在于损失的计算方式,并不是接近于0才好。如果想要让损失好看点,可以直接到对应的损失函数里面除上10000。 33 | 训练过程中的损失值会保存在logs文件夹下的loss_%Y_%m_%d_%H_%M_%S文件夹中 34 | 35 | 3、训练好的权值文件保存在logs文件夹中,每个训练世代(Epoch)包含若干训练步长(Step),每个训练步长(Step)进行一次梯度下降。 36 | 如果只是训练了几个Step是不会保存的,Epoch和Step的概念要捋清楚一下。 37 | ''' 38 | if __name__ == "__main__": 39 | #---------------------------------------------------------------------# 40 | # train_gpu 训练用到的GPU 41 | # 默认为第一张卡、双卡为[0, 1]、三卡为[0, 1, 2] 42 | # 在使用多GPU时,每个卡上的batch为总batch除以卡的数量。 43 | #---------------------------------------------------------------------# 44 | train_gpu = [0,] 45 | #---------------------------------------------------------------------# 46 | # classes_path 指向model_data下的txt,与自己训练的数据集相关 47 | # 训练前一定要修改classes_path,使其对应自己的数据集 48 | #---------------------------------------------------------------------# 49 | classes_path = 'model_data/voc_classes.txt' 50 | #----------------------------------------------------------------------------------------------------------------------------# 51 | # 权值文件的下载请看README,可以通过网盘下载。模型的 预训练权重 对不同数据集是通用的,因为特征是通用的。 52 | # 模型的 预训练权重 比较重要的部分是
主干特征提取网络的权值部分,用于进行特征提取。 53 | # 预训练权重对于99%的情况都必须要用,不用的话主干部分的权值太过随机,特征提取效果不明显,网络训练的结果也不会好 54 | # 55 | # 如果训练过程中存在中断训练的操作,可以将model_path设置成logs文件夹下的权值文件,将已经训练了一部分的权值再次载入。 56 | # 同时修改下方的 冻结阶段 或者 解冻阶段 的参数,来保证模型epoch的连续性。 57 | # 58 | # 当model_path = ''的时候不加载整个模型的权值。 59 | # 60 | # 此处使用的是整个模型的权重,因此是在train.py进行加载的。 61 | # 如果想要让模型从主干的预训练权值开始训练,则设置model_path为主干网络的权值,此时仅加载主干。 62 | # 如果想要让模型从0开始训练,则设置model_path = '',Freeze_Train = False,此时从0开始训练,且没有冻结主干的过程。 63 | # 64 | # 一般来讲,网络从0开始的训练效果会很差,因为权值太过随机,特征提取效果不明显,因此非常、非常、非常不建议大家从0开始训练! 65 | # 如果一定要从0开始,可以了解imagenet数据集,首先训练分类模型,获得网络的主干部分权值,分类模型的 主干部分 和该模型通用,基于此进行训练。 66 | #----------------------------------------------------------------------------------------------------------------------------# 67 | model_path = 'model_data/efficientdet-d0-voc.h5' 68 | #---------------------------------------------------------------------# 69 | # phi 用于选择所使用的模型的版本,0-7 70 | #---------------------------------------------------------------------# 71 | phi = 0 72 | #---------------------------------------------------------------------# 73 | # input_shape 输入的shape大小 74 | #---------------------------------------------------------------------# 75 | input_shape = [image_sizes[phi], image_sizes[phi]] 76 | #---------------------------------------------------------------------# 77 | # anchors_size 可用于设定先验框的大小。 78 | # 默认的anchors_size大多数情况下都是通用的! 79 | # 如果想要检测小物体,可以修改anchors_size 80 | # 一般调小浅层先验框的大小就行了!因为浅层负责小物体检测! 81 | #---------------------------------------------------------------------# 82 | anchors_size = [32, 64, 128, 256, 512] 83 | 84 | #----------------------------------------------------------------------------------------------------------------------------# 85 | # 训练分为两个阶段,分别是冻结阶段和解冻阶段。设置冻结阶段是为了满足机器性能不足的同学的训练需求。 86 | # 冻结训练需要的显存较小,显卡非常差的情况下,可设置Freeze_Epoch等于UnFreeze_Epoch,此时仅仅进行冻结训练。 87 | # 88 | # 在此提供若干参数设置建议,各位训练者根据自己的需求进行灵活调整: 89 | # (一)从整个模型的预训练权重开始训练: 90 | # Adam: 91 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 100,Freeze_Train = True,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(冻结) 92 | # Init_Epoch = 0,UnFreeze_Epoch = 100,Freeze_Train = False,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(不冻结) 93 | # SGD: 94 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 200,Freeze_Train = True,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(冻结) 95 | # Init_Epoch = 0,UnFreeze_Epoch = 200,Freeze_Train = False,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(不冻结) 96 | # 其中:UnFreeze_Epoch可以在100-300之间调整。 97 | # (二)从主干网络的预训练权重开始训练: 98 | # Adam: 99 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 100,Freeze_Train = True,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(冻结) 100 | # Init_Epoch = 0,UnFreeze_Epoch = 100,Freeze_Train = False,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(不冻结) 101 | # SGD: 102 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 200,Freeze_Train = True,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(冻结) 103 | # Init_Epoch = 0,UnFreeze_Epoch = 200,Freeze_Train = False,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(不冻结) 104 | # 其中:由于从主干网络的预训练权重开始训练,主干的权值不一定适合目标检测,需要更多的训练跳出局部最优解。 105 | # UnFreeze_Epoch可以在200-300之间调整,YOLOV5和YOLOX均推荐使用300。 106 | # Adam相较于SGD收敛的快一些。因此UnFreeze_Epoch理论上可以小一点,但依然推荐更多的Epoch。 107 | # (三)batch_size的设置: 108 | # 在显卡能够接受的范围内,以大为好。显存不足与数据集大小无关,提示显存不足(OOM或者CUDA out of memory)请调小batch_size。 109 | # 受到BatchNorm层影响,batch_size最小为2,不能为1。 110 | # 正常情况下Freeze_batch_size建议为Unfreeze_batch_size的1-2倍。不建议设置的差距过大,因为关系到学习率的自动调整。 111 |
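#   (四)学习率的自适应调整举例:
#       下方代码会将实际使用的学习率按batch_size / nbs(nbs = 16)缩放,并截断在[lr_limit_min, lr_limit_max]之间。
#       例如optimizer_type = 'adam'、batch_size = 4、Init_lr = 3e-4时:Init_lr_fit = min(max(4 / 16 * 3e-4, 3e-4), 5e-4) = 3e-4。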
#----------------------------------------------------------------------------------------------------------------------------# 112 | #------------------------------------------------------------------# 113 | # 冻结阶段训练参数 114 | # 此时模型的主干被冻结了,特征提取网络不发生改变 115 | # 占用的显存较小,仅对网络进行微调 116 | # Init_Epoch 模型当前开始的训练世代,其值可以大于Freeze_Epoch,如设置: 117 | # Init_Epoch = 60、Freeze_Epoch = 50、UnFreeze_Epoch = 100 118 | # 会跳过冻结阶段,直接从60代开始,并调整对应的学习率。 119 | # (断点续练时使用) 120 | # Freeze_Epoch 模型冻结训练的Freeze_Epoch 121 | # (当Freeze_Train=False时失效) 122 | # Freeze_batch_size 模型冻结训练的batch_size 123 | # (当Freeze_Train=False时失效) 124 | #------------------------------------------------------------------# 125 | Init_Epoch = 0 126 | Freeze_Epoch = 50 127 | Freeze_batch_size = 8 128 | #------------------------------------------------------------------# 129 | # 解冻阶段训练参数 130 | # 此时模型的主干不被冻结了,特征提取网络会发生改变 131 | # 占用的显存较大,网络所有的参数都会发生改变 132 | # UnFreeze_Epoch 模型总共训练的epoch 133 | # SGD需要更长的时间收敛,因此设置较大的UnFreeze_Epoch 134 | # Adam可以使用相对较小的UnFreeze_Epoch 135 | # Unfreeze_batch_size 模型在解冻后的batch_size 136 | #------------------------------------------------------------------# 137 | UnFreeze_Epoch = 100 138 | Unfreeze_batch_size = 4 139 | #------------------------------------------------------------------# 140 | # Freeze_Train 是否进行冻结训练 141 | # 默认先冻结主干训练后解冻训练。 142 | #------------------------------------------------------------------# 143 | Freeze_Train = True 144 | 145 | #------------------------------------------------------------------# 146 | # 其它训练参数:学习率、优化器、学习率下降有关 147 | #------------------------------------------------------------------# 148 | #------------------------------------------------------------------# 149 | # Init_lr 模型的最大学习率 150 | # 当使用Adam优化器时建议设置 Init_lr=3e-4 151 | # 当使用SGD优化器时建议设置 Init_lr=1e-2 152 | # Min_lr 模型的最小学习率,默认为最大学习率的0.01 153 | #------------------------------------------------------------------# 154 | Init_lr = 3e-4 155 | Min_lr = Init_lr * 0.01 156 | #------------------------------------------------------------------# 157 | # optimizer_type 使用到的优化器种类,可选的有adam、sgd 158 | # 当使用Adam优化器时建议设置 Init_lr=3e-4 159 | # 当使用SGD优化器时建议设置 Init_lr=1e-2 160 | # momentum 优化器内部使用到的momentum参数 161 | # weight_decay 权值衰减,可防止过拟合 162 | # adam会导致weight_decay错误,使用adam时建议设置为0。 163 | #------------------------------------------------------------------# 164 | optimizer_type = "adam" 165 | momentum = 0.9 166 | weight_decay = 0 167 | #------------------------------------------------------------------# 168 | # lr_decay_type 使用到的学习率下降方式,可选的有'step'、'cos' 169 | #------------------------------------------------------------------# 170 | lr_decay_type = 'cos' 171 | #------------------------------------------------------------------# 172 | # save_period 多少个epoch保存一次权值 173 | #------------------------------------------------------------------# 174 | save_period = 5 175 | #------------------------------------------------------------------# 176 | # save_dir 权值与日志文件保存的文件夹 177 | #------------------------------------------------------------------# 178 | save_dir = 'logs' 179 | #------------------------------------------------------------------# 180 | # eval_flag 是否在训练时进行评估,评估对象为验证集 181 | # 安装pycocotools库后,评估体验更佳。 182 | # eval_period 代表多少个epoch评估一次,不建议频繁的评估 183 | # 评估需要消耗较多的时间,频繁评估会导致训练非常慢 184 | # 此处获得的mAP会与get_map.py获得的会有所不同,原因有二: 185 | # (一)此处获得的mAP为验证集的mAP。 186 | # (二)此处设置评估参数较为保守,目的是加快评估速度。 187 | #------------------------------------------------------------------# 188 | eval_flag = True 189 | eval_period = 5 190 | 
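#------------------------------------------------------------------#
#   关于上方的lr_decay_type = 'cos':实现见nets/efficientdet_training.py
#   中的get_lr_scheduler,学习率先在预热阶段按二次方曲线从warmup_lr_start
#   上升到最大学习率,随后按余弦退火下降,最后若干个世代保持Min_lr不变。
#------------------------------------------------------------------#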
#------------------------------------------------------------------# 191 | # num_workers 用于设置是否使用多线程读取数据,1代表关闭多线程 192 | # 开启后会加快数据读取速度,但是会占用更多内存 193 | # 在IO为瓶颈的时候再开启多线程,即GPU运算速度远大于读取图片的速度。 194 | #------------------------------------------------------------------# 195 | num_workers = 1 196 | 197 | #------------------------------------------------------# 198 | # train_annotation_path 训练图片路径和标签 199 | # val_annotation_path 验证图片路径和标签 200 | #------------------------------------------------------# 201 | train_annotation_path = '2007_train.txt' 202 | val_annotation_path = '2007_val.txt' 203 | 204 | #------------------------------------------------------# 205 | # 设置用到的显卡 206 | #------------------------------------------------------# 207 | os.environ["CUDA_VISIBLE_DEVICES"] = ','.join(str(x) for x in train_gpu) 208 | ngpus_per_node = len(train_gpu) 209 | print('Number of devices: {}'.format(ngpus_per_node)) 210 | 211 | #----------------------------------------------------# 212 | # 获取classes和anchor 213 | #----------------------------------------------------# 214 | class_names, num_classes = get_classes(classes_path) 215 | anchors = get_anchors(input_shape, anchors_size) 216 | 217 | model_body = efficientdet([input_shape[0], input_shape[1], 3], phi, num_classes) 218 | if model_path != '': 219 | #------------------------------------------------------# 220 | # 载入预训练权重 221 | #------------------------------------------------------# 222 | print('Load weights {}.'.format(model_path)) 223 | model_body.load_weights(model_path, by_name=True, skip_mismatch=True) 224 | 225 | if ngpus_per_node > 1: 226 | model = multi_gpu_model(model_body, gpus=ngpus_per_node) 227 | else: 228 | model = model_body 229 | 230 | #---------------------------# 231 | # 读取数据集对应的txt 232 | #---------------------------# 233 | with open(train_annotation_path, encoding='utf-8') as f: 234 | train_lines = f.readlines() 235 | with open(val_annotation_path, encoding='utf-8') as f: 236 | val_lines = f.readlines() 237 | num_train = len(train_lines) 238 | num_val = len(val_lines) 239 | 240 | show_config( 241 | classes_path = classes_path, model_path = model_path, input_shape = input_shape, \ 242 | Init_Epoch = Init_Epoch, Freeze_Epoch = Freeze_Epoch, UnFreeze_Epoch = UnFreeze_Epoch, Freeze_batch_size = Freeze_batch_size, Unfreeze_batch_size = Unfreeze_batch_size, Freeze_Train = Freeze_Train, \ 243 | Init_lr = Init_lr, Min_lr = Min_lr, optimizer_type = optimizer_type, momentum = momentum, lr_decay_type = lr_decay_type, \ 244 | save_period = save_period, save_dir = save_dir, num_workers = num_workers, num_train = num_train, num_val = num_val 245 | ) 246 | #---------------------------------------------------------# 247 | # 总训练世代指的是遍历全部数据的总次数 248 | # 总训练步长指的是梯度下降的总次数 249 | # 每个训练世代包含若干训练步长,每个训练步长进行一次梯度下降。 250 | # 此处仅建议最低训练世代,上不封顶,计算时只考虑了解冻部分 251 | #----------------------------------------------------------# 252 | wanted_step = 5e4 if optimizer_type == "sgd" else 1.5e4 253 | total_step = num_train // Unfreeze_batch_size * UnFreeze_Epoch 254 | if total_step <= wanted_step: 255 | if num_train // Unfreeze_batch_size == 0: 256 | raise ValueError('数据集过小,无法进行训练,请扩充数据集。') 257 | wanted_epoch = wanted_step // (num_train // Unfreeze_batch_size) + 1 258 | print("\n\033[1;33;44m[Warning] 使用%s优化器时,建议将训练总步长设置到%d以上。\033[0m"%(optimizer_type, wanted_step)) 259 | print("\033[1;33;44m[Warning] 本次运行的总训练数据量为%d,Unfreeze_batch_size为%d,共训练%d个Epoch,计算出总训练步长为%d。\033[0m"%(num_train, Unfreeze_batch_size, UnFreeze_Epoch, total_step)) 260 | print("\033[1;33;44m[Warning] 
由于总训练步长为%d,小于建议总步长%d,建议设置总世代为%d。\033[0m"%(total_step, wanted_step, wanted_epoch)) 261 | 262 | for layer in model_body.layers: 263 | if isinstance(layer, DepthwiseConv2D): 264 | layer.add_loss(l2(weight_decay)(layer.depthwise_kernel)) 265 | elif isinstance(layer, Conv2D) or isinstance(layer, Dense): 266 | layer.add_loss(l2(weight_decay)(layer.kernel)) 267 | 268 | #------------------------------------------------------# 269 | # 主干特征提取网络特征通用,冻结训练可以加快训练速度 270 | # 也可以在训练初期防止权值被破坏。 271 | # Init_Epoch为起始世代 272 | # Freeze_Epoch为冻结训练的世代 273 | # UnFreeze_Epoch总训练世代 274 | # 提示OOM或者显存不足请调小Batch_size 275 | #------------------------------------------------------# 276 | if True: 277 | if Freeze_Train: 278 | freeze_layers = [226, 328, 328, 373, 463, 565, 655, 802][phi] 279 | for i in range(freeze_layers): model_body.layers[i].trainable = False 280 | print('Freeze the first {} layers of total {} layers.'.format(freeze_layers, len(model_body.layers))) 281 | 282 | #-------------------------------------------------------------------# 283 | # 如果不冻结训练的话,直接设置batch_size为Unfreeze_batch_size 284 | #-------------------------------------------------------------------# 285 | batch_size = Freeze_batch_size if Freeze_Train else Unfreeze_batch_size 286 | start_epoch = Init_Epoch 287 | end_epoch = Freeze_Epoch if Freeze_Train else UnFreeze_Epoch 288 | 289 | #-------------------------------------------------------------------# 290 | # 判断当前batch_size,自适应调整学习率 291 | #-------------------------------------------------------------------# 292 | nbs = 16 293 | lr_limit_max = 5e-4 if optimizer_type == 'adam' else 1e-1 294 | lr_limit_min = 3e-4 if optimizer_type == 'adam' else 5e-4 295 | Init_lr_fit = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max) 296 | Min_lr_fit = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2) 297 | 298 | optimizer = { 299 | 'adam' : Adam(lr = Init_lr_fit, beta_1 = momentum), 300 | 'sgd' : SGD(lr = Init_lr_fit, momentum = momentum, nesterov=True) 301 | }[optimizer_type] 302 | model.compile( 303 | loss = { 304 | 'regression' : smooth_l1(), 305 | 'classification': focal() 306 | }, 307 | optimizer = optimizer 308 | ) 309 | #---------------------------------------# 310 | # 获得学习率下降的公式 311 | #---------------------------------------# 312 | lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch) 313 | 314 | epoch_step = num_train // batch_size 315 | epoch_step_val = num_val // batch_size 316 | 317 | if epoch_step == 0 or epoch_step_val == 0: 318 | raise ValueError('数据集过小,无法进行训练,请扩充数据集。') 319 | 320 | train_dataloader = EfficientdetDatasets(train_lines, input_shape, anchors, batch_size, num_classes, train = True) 321 | val_dataloader = EfficientdetDatasets(val_lines, input_shape, anchors, batch_size, num_classes, train = False) 322 | 323 | #-------------------------------------------------------------------------------# 324 | # 训练参数的设置 325 | # logging 用于设置tensorboard的保存地址 326 | # checkpoint 用于设置权值保存的细节,period用于修改多少epoch保存一次 327 | # lr_scheduler 用于设置学习率下降的方式 328 | # early_stopping 用于设定早停,val_loss多次不下降自动结束训练,表示模型基本收敛 329 | #-------------------------------------------------------------------------------# 330 | time_str = datetime.datetime.strftime(datetime.datetime.now(),'%Y_%m_%d_%H_%M_%S') 331 | log_dir = os.path.join(save_dir, "loss_" + str(time_str)) 332 | logging = TensorBoard(log_dir) 333 | loss_history = LossHistory(log_dir) 334 | if ngpus_per_node > 1: 335 | checkpoint = ParallelModelCheckpoint(model_body, os.path.join(save_dir, 
"ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5"), 336 | monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = save_period) 337 | checkpoint_last = ParallelModelCheckpoint(model_body, os.path.join(save_dir, "last_epoch_weights.h5"), 338 | monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = 1) 339 | checkpoint_best = ParallelModelCheckpoint(model_body, os.path.join(save_dir, "best_epoch_weights.h5"), 340 | monitor = 'val_loss', save_weights_only = True, save_best_only = True, period = 1) 341 | else: 342 | checkpoint = ModelCheckpoint(os.path.join(save_dir, "ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5"), 343 | monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = save_period) 344 | checkpoint_last = ModelCheckpoint(os.path.join(save_dir, "last_epoch_weights.h5"), 345 | monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = 1) 346 | checkpoint_best = ModelCheckpoint(os.path.join(save_dir, "best_epoch_weights.h5"), 347 | monitor = 'val_loss', save_weights_only = True, save_best_only = True, period = 1) 348 | early_stopping = EarlyStopping(monitor='val_loss', min_delta = 0, patience = 10, verbose = 1) 349 | lr_scheduler = LearningRateScheduler(lr_scheduler_func, verbose = 1) 350 | eval_callback = EvalCallback(model_body, input_shape, anchors, class_names, num_classes, val_lines, log_dir, \ 351 | eval_flag=eval_flag, period=eval_period) 352 | callbacks = [logging, loss_history, checkpoint, checkpoint_last, checkpoint_best, lr_scheduler, eval_callback] 353 | 354 | if start_epoch < end_epoch: 355 | print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 356 | model.fit_generator( 357 | generator = train_dataloader, 358 | steps_per_epoch = epoch_step, 359 | validation_data = val_dataloader, 360 | validation_steps = epoch_step_val, 361 | epochs = end_epoch, 362 | initial_epoch = start_epoch, 363 | use_multiprocessing = True if num_workers > 1 else False, 364 | workers = num_workers, 365 | callbacks = callbacks 366 | ) 367 | #---------------------------------------# 368 | # 如果模型有冻结学习部分 369 | # 则解冻,并设置参数 370 | #---------------------------------------# 371 | if Freeze_Train: 372 | batch_size = Unfreeze_batch_size 373 | start_epoch = Freeze_Epoch if start_epoch < Freeze_Epoch else start_epoch 374 | end_epoch = UnFreeze_Epoch 375 | 376 | #-------------------------------------------------------------------# 377 | # 判断当前batch_size,自适应调整学习率 378 | #-------------------------------------------------------------------# 379 | nbs = 16 380 | lr_limit_max = 5e-4 if optimizer_type == 'adam' else 1e-1 381 | lr_limit_min = 3e-4 if optimizer_type == 'adam' else 5e-4 382 | Init_lr_fit = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max) 383 | Min_lr_fit = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2) 384 | #---------------------------------------# 385 | # 获得学习率下降的公式 386 | #---------------------------------------# 387 | lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch) 388 | lr_scheduler = LearningRateScheduler(lr_scheduler_func, verbose = 1) 389 | callbacks = [logging, loss_history, checkpoint, checkpoint_last, checkpoint_best, lr_scheduler, eval_callback] 390 | 391 | for i in range(len(model_body.layers)): 392 | model_body.layers[i].trainable = True 393 | model.compile( 394 | loss = { 395 | 'regression' : smooth_l1(), 396 | 'classification': focal() 397 
| }, 398 | optimizer = optimizer 399 | ) 400 | 401 | epoch_step = num_train // batch_size 402 | epoch_step_val = num_val // batch_size 403 | 404 | if epoch_step == 0 or epoch_step_val == 0: 405 | raise ValueError("数据集过小,无法继续进行训练,请扩充数据集。") 406 | 407 | train_dataloader.batch_size = Unfreeze_batch_size 408 | val_dataloader.batch_size = Unfreeze_batch_size 409 | 410 | print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size)) 411 | model.fit_generator( 412 | generator = train_dataloader, 413 | steps_per_epoch = epoch_step, 414 | validation_data = val_dataloader, 415 | validation_steps = epoch_step_val, 416 | epochs = end_epoch, 417 | initial_epoch = start_epoch, 418 | use_multiprocessing = True if num_workers > 1 else False, 419 | workers = num_workers, 420 | callbacks = callbacks 421 | ) 422 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /utils/anchors.py: -------------------------------------------------------------------------------- 1 | import keras 2 | import numpy as np 3 | 4 | 5 | class AnchorBox: 6 | def __init__(self, ratios, scales): 7 | self.ratios = ratios 8 | self.scales = scales 9 | self.num_anchors = len(self.ratios) * len(self.scales) 10 | 11 | def generate_anchors(self, base_size = 16): 12 | # anchors - 9,4 13 | anchors = np.zeros((self.num_anchors, 4)) 14 | anchors[:, 2:] = base_size * np.tile(self.scales, (2, len(self.scales))).T 15 | 16 | # 计算先验框的面积 17 | areas = anchors[:, 2] * anchors[:, 3] 18 | 19 | # np.repeat(ratios, len(scales)) [0.5 0.5 0.5 1. 1. 1. 2. 2. 2. ] 20 | anchors[:, 2] = np.sqrt(areas / np.repeat(self.ratios, len(self.scales))) 21 | anchors[:, 3] = np.sqrt(areas * np.repeat(self.ratios, len(self.scales))) 22 | 23 | anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T 24 | anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T 25 | return anchors 26 | 27 | def shift(self, shape, stride, anchors): 28 | # 生成特征层的网格中心 29 | shift_x = (np.arange(0, shape[1], dtype=keras.backend.floatx()) + 0.5) * stride 30 | shift_y = (np.arange(0, shape[0], dtype=keras.backend.floatx()) + 0.5) * stride 31 | shift_x, shift_y = np.meshgrid(shift_x, shift_y) 32 | 33 | shift_x = np.reshape(shift_x, [-1]) 34 | shift_y = np.reshape(shift_y, [-1]) 35 | 36 | # 将网格中心进行堆叠 37 | shifts = np.stack([ 38 | shift_x, 39 | shift_y, 40 | shift_x, 41 | shift_y 42 | ], axis=0) 43 | 44 | shifts = np.transpose(shifts) 45 | number_of_anchors = np.shape(anchors)[0] 46 | 47 | k = np.shape(shifts)[0] 48 | # shifted_anchors k, 9, 4 -> k * 9, 4 49 | shifted_anchors = np.reshape(anchors, [1, number_of_anchors, 4]) + np.array(np.reshape(shifts, [k, 1, 4])) 50 | shifted_anchors = np.reshape(shifted_anchors, [k * number_of_anchors, 4]) 51 | return shifted_anchors 52 | 53 | #---------------------------------------------------# 54 | # 用于各个特征层的大小 55 | #---------------------------------------------------# 56 | def get_img_output_length(height, width): 57 | filter_sizes = [7, 3, 3, 3, 3, 3, 3] 58 | padding = [3, 1, 1, 1, 1, 1, 1] 59 | stride = [2, 2, 2, 2, 2, 2, 2] 60 | feature_heights = [] 61 | feature_widths = [] 62 | 63 | for i in range(len(filter_sizes)): 64 | height = (height + 2*padding[i] - filter_sizes[i]) // stride[i] + 1 65 | width = (width + 2*padding[i] - filter_sizes[i]) // stride[i] + 1 66 | feature_heights.append(height) 67 | 
feature_widths.append(width) 68 | return np.array(feature_heights)[-5:], np.array(feature_widths)[-5:] 69 | 70 | def get_anchors(input_shape, anchors_size = [32, 64, 128, 256, 512], strides = [8, 16, 32, 64, 128], \ 71 | ratios = [0.5, 1, 2], scales = [2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]): 72 | feature_heights, feature_widths = get_img_output_length(input_shape[0], input_shape[1]) 73 | 74 | all_anchors = [] 75 | anchor_box = AnchorBox(ratios, scales) 76 | for i in range(len(anchors_size)): 77 | #------------------------------# 78 | # 先生成每个特征点的9个先验框 79 | # anchors 9, 4 80 | #------------------------------# 81 | anchors = anchor_box.generate_anchors(anchors_size[i]) 82 | shifted_anchors = anchor_box.shift([feature_heights[i], feature_widths[i]], strides[i], anchors) 83 | all_anchors.append(shifted_anchors) 84 | 85 | # 将每个特征层的先验框进行堆叠。 86 | all_anchors = np.concatenate(all_anchors,axis=0) 87 | all_anchors = all_anchors / np.array([input_shape[1], input_shape[0], input_shape[1], input_shape[0]]) 88 | return all_anchors 89 | -------------------------------------------------------------------------------- /utils/callbacks.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import keras 4 | import matplotlib 5 | matplotlib.use('Agg') 6 | from matplotlib import pyplot as plt 7 | import scipy.signal 8 | 9 | import shutil 10 | import numpy as np 11 | 12 | from keras import backend as K 13 | from PIL import Image 14 | from tqdm import tqdm 15 | from .utils import cvtColor, preprocess_input, resize_image 16 | from .utils_bbox import BBoxUtility 17 | from .utils_map import get_coco_map, get_map 18 | 19 | 20 | class LossHistory(keras.callbacks.Callback): 21 | def __init__(self, log_dir): 22 | self.log_dir = log_dir 23 | self.losses = [] 24 | self.val_loss = [] 25 | 26 | os.makedirs(self.log_dir) 27 | 28 | def on_epoch_end(self, epoch, logs={}): 29 | if not os.path.exists(self.log_dir): 30 | os.makedirs(self.log_dir) 31 | 32 | self.losses.append(logs.get('loss')) 33 | self.val_loss.append(logs.get('val_loss')) 34 | 35 | with open(os.path.join(self.log_dir, "epoch_loss.txt"), 'a') as f: 36 | f.write(str(logs.get('loss'))) 37 | f.write("\n") 38 | with open(os.path.join(self.log_dir, "epoch_val_loss.txt"), 'a') as f: 39 | f.write(str(logs.get('val_loss'))) 40 | f.write("\n") 41 | self.loss_plot() 42 | 43 | def loss_plot(self): 44 | iters = range(len(self.losses)) 45 | 46 | plt.figure() 47 | plt.plot(iters, self.losses, 'red', linewidth = 2, label='train loss') 48 | plt.plot(iters, self.val_loss, 'coral', linewidth = 2, label='val loss') 49 | try: 50 | if len(self.losses) < 25: 51 | num = 5 52 | else: 53 | num = 15 54 | 55 | plt.plot(iters, scipy.signal.savgol_filter(self.losses, num, 3), 'green', linestyle = '--', linewidth = 2, label='smooth train loss') 56 | plt.plot(iters, scipy.signal.savgol_filter(self.val_loss, num, 3), '#8B4513', linestyle = '--', linewidth = 2, label='smooth val loss') 57 | except: 58 | pass 59 | 60 | plt.grid(True) 61 | plt.xlabel('Epoch') 62 | plt.ylabel('Loss') 63 | plt.title('A Loss Curve') 64 | plt.legend(loc="upper right") 65 | 66 | plt.savefig(os.path.join(self.log_dir, "epoch_loss.png")) 67 | 68 | plt.cla() 69 | plt.close("all") 70 | 71 | class ExponentDecayScheduler(keras.callbacks.Callback): 72 | def __init__(self, 73 | decay_rate, 74 | verbose=0): 75 | super(ExponentDecayScheduler, self).__init__() 76 | self.decay_rate = decay_rate 77 | self.verbose = verbose 78 | self.learning_rates = [] 79 | 80 | def 
on_epoch_end(self, batch, logs=None): 81 | learning_rate = K.get_value(self.model.optimizer.lr) * self.decay_rate 82 | K.set_value(self.model.optimizer.lr, learning_rate) 83 | if self.verbose > 0: 84 | print('Setting learning rate to %s.' % (learning_rate)) 85 | 86 | class ParallelModelCheckpoint(keras.callbacks.ModelCheckpoint): 87 | def __init__(self, model, filepath, monitor='val_loss', verbose=0, 88 | save_best_only=False, save_weights_only=False, 89 | mode='auto', period=1): 90 | self.single_model = model 91 | super(ParallelModelCheckpoint,self).__init__(filepath, monitor, verbose,save_best_only, save_weights_only,mode, period) 92 | 93 | def set_model(self, model): 94 | super(ParallelModelCheckpoint,self).set_model(self.single_model) 95 | 96 | class EvalCallback(keras.callbacks.Callback): 97 | def __init__(self, model_body, input_shape, anchors, class_names, num_classes, val_lines, log_dir,\ 98 | map_out_path=".temp_map_out", max_boxes=100, confidence=0.05, nms_iou=0.5, letterbox_image=True, MINOVERLAP=0.5, eval_flag=True, period=1): 99 | super(EvalCallback, self).__init__() 100 | 101 | self.model_body = model_body 102 | self.input_shape = input_shape 103 | self.anchors = anchors 104 | self.class_names = class_names 105 | self.num_classes = num_classes 106 | self.val_lines = val_lines 107 | self.log_dir = log_dir 108 | self.map_out_path = map_out_path 109 | self.max_boxes = max_boxes 110 | self.confidence = confidence 111 | self.nms_iou = nms_iou 112 | self.letterbox_image = letterbox_image 113 | self.MINOVERLAP = MINOVERLAP 114 | self.eval_flag = eval_flag 115 | self.period = period 116 | 117 | self.bbox_util = BBoxUtility(self.num_classes, nms_thresh=self.nms_iou) 118 | 119 | self.maps = [0] 120 | self.epoches = [0] 121 | if self.eval_flag: 122 | with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f: 123 | f.write(str(0)) 124 | f.write("\n") 125 | 126 | def get_map_txt(self, image_id, image, class_names, map_out_path): 127 | f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 128 | image_shape = np.array(np.shape(image)[0:2]) 129 | #---------------------------------------------------------# 130 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 131 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 132 | #---------------------------------------------------------# 133 | image = cvtColor(image) 134 | #---------------------------------------------------------# 135 | # 给图像增加灰条,实现不失真的resize 136 | # 也可以直接resize进行识别 137 | #---------------------------------------------------------# 138 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 139 | #---------------------------------------------------------# 140 | # 添加上batch_size维度,图片预处理,归一化。 141 | #---------------------------------------------------------# 142 | image_data = preprocess_input(np.expand_dims(np.array(image_data, dtype='float32'), 0)) 143 | 144 | preds = self.model_body.predict(image_data) 145 | #-----------------------------------------------------------# 146 | # 将预测结果进行解码 147 | #-----------------------------------------------------------# 148 | results = self.bbox_util.decode_box(preds, self.anchors, image_shape, 149 | self.input_shape, self.letterbox_image, confidence=self.confidence) 150 | #--------------------------------------# 151 | # 如果没有检测到物体,则返回原图 152 | #--------------------------------------# 153 | if results[0] is None: 154 | return 155 | 156 | top_label = results[0][:, 5] 157 | top_conf = results[0][:, 4] 158 | top_boxes = results[0][:, :4] 159 | 160 | top_100 = 
np.argsort(top_conf)[::-1][:self.max_boxes] 161 | top_boxes = top_boxes[top_100] 162 | top_conf = top_conf[top_100] 163 | top_label = top_label[top_100] 164 | 165 | for i, c in list(enumerate(top_label)): 166 | predicted_class = self.class_names[int(c)] 167 | box = top_boxes[i] 168 | score = str(top_conf[i]) 169 | 170 | top, left, bottom, right = box 171 | 172 | if predicted_class not in class_names: 173 | continue 174 | 175 | f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom)))) 176 | 177 | f.close() 178 | return 179 | 180 | def on_epoch_end(self, epoch, logs=None): 181 | temp_epoch = epoch + 1 182 | if temp_epoch % self.period == 0 and self.eval_flag: 183 | if not os.path.exists(self.map_out_path): 184 | os.makedirs(self.map_out_path) 185 | if not os.path.exists(os.path.join(self.map_out_path, "ground-truth")): 186 | os.makedirs(os.path.join(self.map_out_path, "ground-truth")) 187 | if not os.path.exists(os.path.join(self.map_out_path, "detection-results")): 188 | os.makedirs(os.path.join(self.map_out_path, "detection-results")) 189 | print("Get map.") 190 | for annotation_line in tqdm(self.val_lines): 191 | line = annotation_line.split() 192 | image_id = os.path.basename(line[0]).split('.')[0] 193 | #------------------------------# 194 | # 读取图像并转换成RGB图像 195 | #------------------------------# 196 | image = Image.open(line[0]) 197 | #------------------------------# 198 | # 获得真实框 199 | #------------------------------# 200 | gt_boxes = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]]) 201 | #------------------------------# 202 | # 获得预测txt 203 | #------------------------------# 204 | self.get_map_txt(image_id, image, self.class_names, self.map_out_path) 205 | 206 | #------------------------------# 207 | # 获得真实框txt 208 | #------------------------------# 209 | with open(os.path.join(self.map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f: 210 | for box in gt_boxes: 211 | left, top, right, bottom, obj = box 212 | obj_name = self.class_names[obj] 213 | new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom)) 214 | 215 | print("Calculate Map.") 216 | try: 217 | temp_map = get_coco_map(class_names = self.class_names, path = self.map_out_path)[1] 218 | except: 219 | temp_map = get_map(self.MINOVERLAP, False, path = self.map_out_path) 220 | self.maps.append(temp_map) 221 | self.epoches.append(temp_epoch) 222 | 223 | with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f: 224 | f.write(str(temp_map)) 225 | f.write("\n") 226 | 227 | plt.figure() 228 | plt.plot(self.epoches, self.maps, 'red', linewidth = 2, label='train map') 229 | 230 | plt.grid(True) 231 | plt.xlabel('Epoch') 232 | plt.ylabel('Map %s'%str(self.MINOVERLAP)) 233 | plt.title('A Map Curve') 234 | plt.legend(loc="upper right") 235 | 236 | plt.savefig(os.path.join(self.log_dir, "epoch_map.png")) 237 | plt.cla() 238 | plt.close("all") 239 | 240 | print("Get map done.") 241 | shutil.rmtree(self.map_out_path) 242 | -------------------------------------------------------------------------------- /utils/dataloader.py: -------------------------------------------------------------------------------- 1 | import math 2 | from random import shuffle 3 | 4 | import cv2 5 | import keras 6 | import numpy as np 7 | from PIL import Image 8 | 9 | from utils.utils import cvtColor, preprocess_input 10 | 11 | 12 | class EfficientdetDatasets(keras.utils.Sequence): 13 | def __init__(self, annotation_lines, input_shape, anchors,
batch_size, num_classes, train, ignore_threshold = 0.4, overlap_threshold = 0.5): 14 | self.annotation_lines = annotation_lines 15 | self.length = len(self.annotation_lines) 16 | 17 | self.input_shape = input_shape 18 | self.anchors = anchors 19 | self.num_anchors = len(anchors) 20 | self.batch_size = batch_size 21 | self.num_classes = num_classes 22 | self.train = train 23 | self.ignore_threshold = ignore_threshold 24 | self.overlap_threshold = overlap_threshold 25 | 26 | def __len__(self): 27 | return math.ceil(len(self.annotation_lines) / float(self.batch_size)) 28 | 29 | def __getitem__(self, index): 30 | image_data = [] 31 | regressions = [] 32 | classifications = [] 33 | for i in range(index * self.batch_size, (index + 1) * self.batch_size): 34 | i = i % self.length 35 | #---------------------------------------------------# 36 | # 训练时进行数据的随机增强 37 | # 验证时不进行数据的随机增强 38 | #---------------------------------------------------# 39 | image, box = self.get_random_data(self.annotation_lines[i], self.input_shape, random = self.train) 40 | if len(box)!=0: 41 | boxes = np.array(box[:,:4] , dtype=np.float32) 42 | boxes[:, [0, 2]] = boxes[:,[0, 2]] / self.input_shape[1] 43 | boxes[:, [1, 3]] = boxes[:,[1, 3]] / self.input_shape[0] 44 | one_hot_label = np.eye(self.num_classes)[np.array(box[:,4], np.int32)] 45 | box = np.concatenate([boxes, one_hot_label], axis=-1) 46 | assignment = self.assign_boxes(box) 47 | regression = assignment[:,:5] 48 | classification = assignment[:,5:] 49 | 50 | image_data.append(preprocess_input(np.array(image, np.float32))) 51 | regressions.append(regression) 52 | classifications.append(classification) 53 | 54 | return np.array(image_data), [np.array(regressions,dtype=np.float32), np.array(classifications,dtype=np.float32)] 55 | 56 | def on_epoch_end(self): 57 | shuffle(self.annotation_lines) 58 | 59 | def rand(self, a=0, b=1): 60 | return np.random.rand()*(b-a) + a 61 | 62 | def get_random_data(self, annotation_line, input_shape, jitter=.3, hue=.1, sat=0.7, val=0.4, random=True): 63 | line = annotation_line.split() 64 | #------------------------------# 65 | # 读取图像并转换成RGB图像 66 | #------------------------------# 67 | image = Image.open(line[0]) 68 | image = cvtColor(image) 69 | #------------------------------# 70 | # 获得图像的高宽与目标高宽 71 | #------------------------------# 72 | iw, ih = image.size 73 | h, w = input_shape 74 | #------------------------------# 75 | # 获得预测框 76 | #------------------------------# 77 | box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]]) 78 | 79 | if not random: 80 | scale = min(w/iw, h/ih) 81 | nw = int(iw*scale) 82 | nh = int(ih*scale) 83 | dx = (w-nw)//2 84 | dy = (h-nh)//2 85 | 86 | #---------------------------------# 87 | # 将图像多余的部分加上灰条 88 | #---------------------------------# 89 | image = image.resize((nw,nh), Image.BICUBIC) 90 | new_image = Image.new('RGB', (w,h), (128,128,128)) 91 | new_image.paste(image, (dx, dy)) 92 | image_data = np.array(new_image, np.float32) 93 | 94 | #---------------------------------# 95 | # 对真实框进行调整 96 | #---------------------------------# 97 | if len(box)>0: 98 | np.random.shuffle(box) 99 | box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx 100 | box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy 101 | box[:, 0:2][box[:, 0:2]<0] = 0 102 | box[:, 2][box[:, 2]>w] = w 103 | box[:, 3][box[:, 3]>h] = h 104 | box_w = box[:, 2] - box[:, 0] 105 | box_h = box[:, 3] - box[:, 1] 106 | box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box 107 | 108 | return image_data, box 109 | 110 | 
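        #   Training branch (random=True): the steps below apply, in order, a
        #   jittered aspect-ratio resize, a paste onto a gray canvas at a random
        #   offset, a 50% horizontal flip and an HSV color jitter controlled by
        #   the hue/sat/val arguments.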
#------------------------------------------# 111 | # 对图像进行缩放并且进行长和宽的扭曲 112 | #------------------------------------------# 113 | new_ar = iw/ih * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter) 114 | scale = self.rand(.25, 2) 115 | if new_ar < 1: 116 | nh = int(scale*h) 117 | nw = int(nh*new_ar) 118 | else: 119 | nw = int(scale*w) 120 | nh = int(nw/new_ar) 121 | image = image.resize((nw,nh), Image.BICUBIC) 122 | 123 | #------------------------------------------# 124 | # 将图像多余的部分加上灰条 125 | #------------------------------------------# 126 | dx = int(self.rand(0, w-nw)) 127 | dy = int(self.rand(0, h-nh)) 128 | new_image = Image.new('RGB', (w,h), (128,128,128)) 129 | new_image.paste(image, (dx, dy)) 130 | image = new_image 131 | 132 | #------------------------------------------# 133 | # 翻转图像 134 | #------------------------------------------# 135 | flip = self.rand()<.5 136 | if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT) 137 | 138 | image_data = np.array(image, np.uint8) 139 | #---------------------------------# 140 | # 对图像进行色域变换 141 | # 计算色域变换的参数 142 | #---------------------------------# 143 | r = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1 144 | #---------------------------------# 145 | # 将图像转到HSV上 146 | #---------------------------------# 147 | hue, sat, val = cv2.split(cv2.cvtColor(image_data, cv2.COLOR_RGB2HSV)) 148 | dtype = image_data.dtype 149 | #---------------------------------# 150 | # 应用变换 151 | #---------------------------------# 152 | x = np.arange(0, 256, dtype=r.dtype) 153 | lut_hue = ((x * r[0]) % 180).astype(dtype) 154 | lut_sat = np.clip(x * r[1], 0, 255).astype(dtype) 155 | lut_val = np.clip(x * r[2], 0, 255).astype(dtype) 156 | 157 | image_data = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))) 158 | image_data = cv2.cvtColor(image_data, cv2.COLOR_HSV2RGB) 159 | 160 | #---------------------------------# 161 | # 对真实框进行调整 162 | #---------------------------------# 163 | if len(box)>0: 164 | np.random.shuffle(box) 165 | box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx 166 | box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy 167 | if flip: box[:, [0,2]] = w - box[:, [2,0]] 168 | box[:, 0:2][box[:, 0:2]<0] = 0 169 | box[:, 2][box[:, 2]>w] = w 170 | box[:, 3][box[:, 3]>h] = h 171 | box_w = box[:, 2] - box[:, 0] 172 | box_h = box[:, 3] - box[:, 1] 173 | box = box[np.logical_and(box_w>1, box_h>1)] 174 | 175 | return image_data, box 176 | 177 | def iou(self, box): 178 | #---------------------------------------------# 179 | # 计算出每个真实框与所有的先验框的iou 180 | # 判断真实框与先验框的重合情况 181 | #---------------------------------------------# 182 | inter_upleft = np.maximum(self.anchors[:, :2], box[:2]) 183 | inter_botright = np.minimum(self.anchors[:, 2:4], box[2:]) 184 | 185 | inter_wh = inter_botright - inter_upleft 186 | inter_wh = np.maximum(inter_wh, 0) 187 | inter = inter_wh[:, 0] * inter_wh[:, 1] 188 | #---------------------------------------------# 189 | # 真实框的面积 190 | #---------------------------------------------# 191 | area_true = (box[2] - box[0]) * (box[3] - box[1]) 192 | #---------------------------------------------# 193 | # 先验框的面积 194 | #---------------------------------------------# 195 | area_gt = (self.anchors[:, 2] - self.anchors[:, 0])*(self.anchors[:, 3] - self.anchors[:, 1]) 196 | #---------------------------------------------# 197 | # 计算iou 198 | #---------------------------------------------# 199 | union = area_true + area_gt - inter 200 | 201 | iou = inter / union 202 | return iou 203 | 204 | def encode_box(self, box, return_iou=True): 205 
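        #   box: a single ground-truth box [xmin, ymin, xmax, ymax] in 0-1 coordinates;
        #   returns flattened per-anchor regression targets (plus IoU) and ignore IoUs.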
| #---------------------------------------------# 206 | # 计算当前真实框和先验框的重合情况 207 | #---------------------------------------------# 208 | iou = self.iou(box) 209 | ignored_box = np.zeros((self.num_anchors, 1)) 210 | #---------------------------------------------------# 211 | # 找到处于忽略门限值范围内的先验框 212 | #---------------------------------------------------# 213 | assign_mask_ignore = (iou > self.ignore_threshold) & (iou < self.overlap_threshold) 214 | ignored_box[:, 0][assign_mask_ignore] = iou[assign_mask_ignore] 215 | 216 | encoded_box = np.zeros((self.num_anchors, 4 + return_iou)) 217 | #---------------------------------------------# 218 | # 找到每一个真实框,重合程度较高的先验框 219 | #---------------------------------------------# 220 | assign_mask = iou > self.overlap_threshold 221 | 222 | #---------------------------------------------# 223 | # 如果没有一个先验框重合度大于self.overlap_threshold 224 | # 则选择重合度最大的为正样本 225 | #---------------------------------------------# 226 | if not assign_mask.any(): 227 | assign_mask[iou.argmax()] = True 228 | 229 | #---------------------------------------------# 230 | # 利用iou进行赋值 231 | #---------------------------------------------# 232 | if return_iou: 233 | encoded_box[:, -1][assign_mask] = iou[assign_mask] 234 | 235 | #---------------------------------------------# 236 | # 找到对应的先验框 237 | #---------------------------------------------# 238 | assigned_anchors = self.anchors[assign_mask] 239 | 240 | #---------------------------------------------# 241 | # 逆向编码,将真实框转化为efficientdet预测结果的格式 242 | # 先计算真实框的中心与长宽 243 | #---------------------------------------------# 244 | box_center = 0.5 * (box[:2] + box[2:]) 245 | box_wh = box[2:] - box[:2] 246 | #---------------------------------------------# 247 | # 再计算重合度较高的先验框的中心与长宽 248 | #---------------------------------------------# 249 | assigned_anchors_center = (assigned_anchors[:, 0:2] + assigned_anchors[:, 2:4]) * 0.5 250 | assigned_anchors_wh = (assigned_anchors[:, 2:4] - assigned_anchors[:, 0:2]) 251 | 252 | #------------------------------------------------# 253 | # 逆向求取efficientdet应该有的预测结果 254 | # 先求取中心的预测结果,再求取宽高的预测结果 255 | #------------------------------------------------# 256 | encoded_box[:, :2][assign_mask] = box_center - assigned_anchors_center 257 | encoded_box[:, :2][assign_mask] /= assigned_anchors_wh 258 | 259 | encoded_box[:, 2:4][assign_mask] = np.log(box_wh / assigned_anchors_wh) 260 | 261 | return encoded_box.ravel(), ignored_box.ravel() 262 | 263 | def assign_boxes(self, boxes): 264 | #---------------------------------------------------# 265 | # assignment分为3个部分 266 | # :4 的内容为网络应该有的回归预测结果 267 | # 4:-1 的内容为先验框所对应的种类,默认为背景 268 | # -1 的内容为当前先验框是否包含目标 269 | #---------------------------------------------------# 270 | assignment = np.zeros((self.num_anchors, 4 + 1 + self.num_classes + 1)) 271 | assignment[:, 4] = 0.0 272 | assignment[:, -1] = 0.0 273 | if len(boxes) == 0: 274 | return assignment 275 | 276 | #---------------------------------------------------# 277 | # 对每一个真实框都进行iou计算 278 | #---------------------------------------------------# 279 | apply_along_axis_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4]) 280 | encoded_boxes = np.array([apply_along_axis_boxes[i, 0] for i in range(len(apply_along_axis_boxes))]) 281 | ingored_boxes = np.array([apply_along_axis_boxes[i, 1] for i in range(len(apply_along_axis_boxes))]) 282 | 283 | #---------------------------------------------------# 284 | # 在reshape后,获得的ingored_boxes的shape为: 285 | # [num_true_box, num_priors, 1] 其中1为iou 286 | 
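        #   Taking the max over axis 0 below yields, for every anchor, the
        #   largest IoU among the ground-truth boxes in its ignore band.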
#---------------------------------------------------# 287 | ingored_boxes = ingored_boxes.reshape(-1, self.num_anchors, 1) 288 | ignore_iou = ingored_boxes[:, :, 0].max(axis=0) 289 | ignore_iou_mask = ignore_iou > 0 290 | 291 | assignment[:, 4][ignore_iou_mask] = -1 292 | assignment[:, -1][ignore_iou_mask] = -1 293 | 294 | 295 | #---------------------------------------------------# 296 | # 在reshape后,获得的encoded_boxes的shape为: 297 | # [num_true_box, num_anchors, 4+1] 298 | # 4是编码后的结果,1为iou 299 | #---------------------------------------------------# 300 | encoded_boxes = encoded_boxes.reshape(-1, self.num_anchors, 5) 301 | 302 | #---------------------------------------------------# 303 | # [num_anchors]求取每一个先验框重合度最大的真实框 304 | #---------------------------------------------------# 305 | best_iou = encoded_boxes[:, :, -1].max(axis=0) 306 | best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0) 307 | best_iou_mask = best_iou > 0 308 | best_iou_idx = best_iou_idx[best_iou_mask] 309 | 310 | #---------------------------------------------------# 311 | # 计算一共有多少先验框满足需求 312 | #---------------------------------------------------# 313 | assign_num = len(best_iou_idx) 314 | 315 | # 将编码后的真实框取出 316 | encoded_boxes = encoded_boxes[:, best_iou_mask, :] 317 | assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx,np.arange(assign_num),:4] 318 | #----------------------------------------------------------# 319 | # 4代表为背景的概率,设定为0,因为这些先验框有对应的物体 320 | #----------------------------------------------------------# 321 | assignment[:, 4][best_iou_mask] = 1 322 | assignment[:, 5:-1][best_iou_mask] = boxes[best_iou_idx, 4:] 323 | #----------------------------------------------------------# 324 | # -8表示先验框是否有对应的物体 325 | #----------------------------------------------------------# 326 | assignment[:, -1][best_iou_mask] = 1 327 | # 通过assign_boxes我们就获得了,输入进来的这张图片,应该有的预测结果是什么样子的 328 | return assignment 329 | -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from PIL import Image 3 | 4 | #---------------------------------------------------------------------# 5 | # 用于预测的图像大小,无需修改,由phi选择 6 | #---------------------------------------------------------------------# 7 | image_sizes = [512, 640, 768, 896, 1024, 1280, 1408, 1536] 8 | 9 | #---------------------------------------------------------# 10 | # 将图像转换成RGB图像,防止灰度图在预测时报错。 11 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 12 | #---------------------------------------------------------# 13 | def cvtColor(image): 14 | if len(np.shape(image)) == 3 and np.shape(image)[2] == 3: 15 | return image 16 | else: 17 | image = image.convert('RGB') 18 | return image 19 | 20 | #---------------------------------------------------# 21 | # 对输入图像进行resize 22 | #---------------------------------------------------# 23 | def resize_image(image, size, letterbox_image): 24 | iw, ih = image.size 25 | w, h = size 26 | if letterbox_image: 27 | scale = min(w/iw, h/ih) 28 | nw = int(iw*scale) 29 | nh = int(ih*scale) 30 | 31 | image = image.resize((nw,nh), Image.BICUBIC) 32 | new_image = Image.new('RGB', size, (128,128,128)) 33 | new_image.paste(image, ((w-nw)//2, (h-nh)//2)) 34 | else: 35 | new_image = image.resize((w, h), Image.BICUBIC) 36 | return new_image 37 | 38 | #---------------------------------------------------# 39 | # 获得类 40 | #---------------------------------------------------# 41 | def get_classes(classes_path): 42 | with open(classes_path, 
encoding='utf-8') as f: 43 | class_names = f.readlines() 44 | class_names = [c.strip() for c in class_names] 45 | return class_names, len(class_names) 46 | 47 | def preprocess_input(image): 48 | image /= 255 49 | mean = (0.485, 0.456, 0.406) 50 | std = (0.229, 0.224, 0.225) 51 | image -= mean 52 | image /= std 53 | return image 54 | 55 | def show_config(**kwargs): 56 | print('Configurations:') 57 | print('-' * 70) 58 | print('|%25s | %40s|' % ('keys', 'values')) 59 | print('-' * 70) 60 | for key, value in kwargs.items(): 61 | print('|%25s | %40s|' % (str(key), str(value))) 62 | print('-' * 70) 63 | 64 | #-------------------------------------------------------------------------------------------------------------------------------# 65 | # From https://github.com/ckyrkou/Keras_FLOP_Estimator 66 | # Fix lots of bugs 67 | #-------------------------------------------------------------------------------------------------------------------------------# 68 | def net_flops(model, table=False, print_result=True): 69 | if (table == True): 70 | print("\n") 71 | print('%25s | %16s | %16s | %16s | %16s | %6s | %6s' % ( 72 | 'Layer Name', 'Input Shape', 'Output Shape', 'Kernel Size', 'Filters', 'Strides', 'FLOPS')) 73 | print('=' * 120) 74 | 75 | #---------------------------------------------------# 76 | # 总的FLOPs 77 | #---------------------------------------------------# 78 | t_flops = 0 79 | factor = 1e9 80 | 81 | for l in model.layers: 82 | try: 83 | #--------------------------------------# 84 | # 所需参数的初始化定义 85 | #--------------------------------------# 86 | o_shape, i_shape, strides, ks, filters = ('', '', ''), ('', '', ''), (1, 1), (0, 0), 0 87 | flops = 0 88 | #--------------------------------------# 89 | # 获得层的名字 90 | #--------------------------------------# 91 | name = l.name 92 | 93 | if ('InputLayer' in str(l)): 94 | i_shape = l.get_input_shape_at(0)[1:4] 95 | o_shape = l.get_output_shape_at(0)[1:4] 96 | 97 | #--------------------------------------# 98 | # Reshape层 99 | #--------------------------------------# 100 | elif ('Reshape' in str(l)): 101 | i_shape = l.get_input_shape_at(0)[1:4] 102 | o_shape = l.get_output_shape_at(0)[1:4] 103 | 104 | #--------------------------------------# 105 | # 填充层 106 | #--------------------------------------# 107 | elif ('Padding' in str(l)): 108 | i_shape = l.get_input_shape_at(0)[1:4] 109 | o_shape = l.get_output_shape_at(0)[1:4] 110 | 111 | #--------------------------------------# 112 | # 平铺层 113 | #--------------------------------------# 114 | elif ('Flatten' in str(l)): 115 | i_shape = l.get_input_shape_at(0)[1:4] 116 | o_shape = l.get_output_shape_at(0)[1:4] 117 | 118 | #--------------------------------------# 119 | # 激活函数层 120 | #--------------------------------------# 121 | elif 'Activation' in str(l): 122 | i_shape = l.get_input_shape_at(0)[1:4] 123 | o_shape = l.get_output_shape_at(0)[1:4] 124 | 125 | #--------------------------------------# 126 | # LeakyReLU 127 | #--------------------------------------# 128 | elif 'LeakyReLU' in str(l): 129 | for i in range(len(l._inbound_nodes)): 130 | i_shape = l.get_input_shape_at(i)[1:4] 131 | o_shape = l.get_output_shape_at(i)[1:4] 132 | 133 | flops += i_shape[0] * i_shape[1] * i_shape[2] 134 | 135 | #--------------------------------------# 136 | # 池化层 137 | #--------------------------------------# 138 | elif 'MaxPooling' in str(l): 139 | i_shape = l.get_input_shape_at(0)[1:4] 140 | o_shape = l.get_output_shape_at(0)[1:4] 141 | 142 | #--------------------------------------# 143 | # 池化层 144 | 
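            #   (average pooling is counted as one accumulate per output
            #   element; max pooling above contributes zero FLOPs here)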
#--------------------------------------# 145 | elif ('AveragePooling' in str(l) and 'Global' not in str(l)): 146 | strides = l.strides 147 | ks = l.pool_size 148 | 149 | for i in range(len(l._inbound_nodes)): 150 | i_shape = l.get_input_shape_at(i)[1:4] 151 | o_shape = l.get_output_shape_at(i)[1:4] 152 | 153 | flops += o_shape[0] * o_shape[1] * o_shape[2] 154 | 155 | #--------------------------------------# 156 | # 全局池化层 157 | #--------------------------------------# 158 | elif ('AveragePooling' in str(l) and 'Global' in str(l)): 159 | for i in range(len(l._inbound_nodes)): 160 | i_shape = l.get_input_shape_at(i)[1:4] 161 | o_shape = l.get_output_shape_at(i)[1:4] 162 | 163 | flops += (i_shape[0] * i_shape[1] + 1) * i_shape[2] 164 | 165 | #--------------------------------------# 166 | # 标准化层 167 | #--------------------------------------# 168 | elif ('BatchNormalization' in str(l)): 169 | for i in range(len(l._inbound_nodes)): 170 | i_shape = l.get_input_shape_at(i)[1:4] 171 | o_shape = l.get_output_shape_at(i)[1:4] 172 | 173 | temp_flops = 1 174 | for i in range(len(i_shape)): 175 | temp_flops *= i_shape[i] 176 | temp_flops *= 2 177 | 178 | flops += temp_flops 179 | 180 | #--------------------------------------# 181 | # 全连接层 182 | #--------------------------------------# 183 | elif ('Dense' in str(l)): 184 | for i in range(len(l._inbound_nodes)): 185 | i_shape = l.get_input_shape_at(i)[1:4] 186 | o_shape = l.get_output_shape_at(i)[1:4] 187 | 188 | temp_flops = 1 189 | for i in range(len(o_shape)): 190 | temp_flops *= o_shape[i] 191 | 192 | if (i_shape[-1] == None): 193 | temp_flops = temp_flops * o_shape[-1] 194 | else: 195 | temp_flops = temp_flops * i_shape[-1] 196 | flops += temp_flops 197 | 198 | #--------------------------------------# 199 | # 普通卷积层 200 | #--------------------------------------# 201 | elif ('Conv2D' in str(l) and 'DepthwiseConv2D' not in str(l) and 'SeparableConv2D' not in str(l)): 202 | strides = l.strides 203 | ks = l.kernel_size 204 | filters = l.filters 205 | bias = 1 if l.use_bias else 0 206 | 207 | for i in range(len(l._inbound_nodes)): 208 | i_shape = l.get_input_shape_at(i)[1:4] 209 | o_shape = l.get_output_shape_at(i)[1:4] 210 | 211 | if (filters == None): 212 | filters = i_shape[2] 213 | flops += filters * o_shape[0] * o_shape[1] * (ks[0] * ks[1] * i_shape[2] + bias) 214 | 215 | #--------------------------------------# 216 | # 逐层卷积层 217 | #--------------------------------------# 218 | elif ('Conv2D' in str(l) and 'DepthwiseConv2D' in str(l) and 'SeparableConv2D' not in str(l)): 219 | strides = l.strides 220 | ks = l.kernel_size 221 | filters = l.filters 222 | bias = 1 if l.use_bias else 0 223 | 224 | for i in range(len(l._inbound_nodes)): 225 | i_shape = l.get_input_shape_at(i)[1:4] 226 | o_shape = l.get_output_shape_at(i)[1:4] 227 | 228 | if (filters == None): 229 | filters = i_shape[2] 230 | flops += filters * o_shape[0] * o_shape[1] * (ks[0] * ks[1] + bias) 231 | 232 | #--------------------------------------# 233 | # 深度可分离卷积层 234 | #--------------------------------------# 235 | elif ('Conv2D' in str(l) and 'DepthwiseConv2D' not in str(l) and 'SeparableConv2D' in str(l)): 236 | strides = l.strides 237 | ks = l.kernel_size 238 | filters = l.filters 239 | 240 | for i in range(len(l._inbound_nodes)): 241 | i_shape = l.get_input_shape_at(i)[1:4] 242 | o_shape = l.get_output_shape_at(i)[1:4] 243 | 244 | if (filters == None): 245 | filters = i_shape[2] 246 | flops += i_shape[2] * o_shape[0] * o_shape[1] * (ks[0] * ks[1] + bias) + \ 247 | filters * o_shape[0] * 
o_shape[1] * (1 * 1 * i_shape[2] + bias) 248 | #--------------------------------------# 249 | # 模型中有模型时 250 | #--------------------------------------# 251 | elif 'Model' in str(l): 252 | flops = net_flops(l, print_result=False) 253 | 254 | t_flops += flops 255 | 256 | if (table == True): 257 | print('%25s | %16s | %16s | %16s | %16s | %6s | %5.4f' % ( 258 | name[:25], str(i_shape), str(o_shape), str(ks), str(filters), str(strides), flops)) 259 | 260 | except: 261 | pass 262 | 263 | t_flops = t_flops * 2 264 | if print_result: 265 | show_flops = t_flops / factor 266 | print('Total GFLOPs: %.3fG' % (show_flops)) 267 | return t_flops -------------------------------------------------------------------------------- /utils/utils_bbox.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | import keras.backend as K 4 | 5 | class BBoxUtility(object): 6 | def __init__(self, num_classes, nms_thresh=0.45, top_k=300): 7 | self.num_classes = num_classes 8 | self._nms_thresh = nms_thresh 9 | self._top_k = top_k 10 | self.boxes = K.placeholder(dtype='float32', shape=(None, 4)) 11 | self.scores = K.placeholder(dtype='float32', shape=(None,)) 12 | self.nms = tf.image.non_max_suppression(self.boxes, self.scores, self._top_k, iou_threshold=self._nms_thresh) 13 | self.sess = K.get_session() 14 | 15 | def bbox_iou(self, b1, b2): 16 | b1_x1, b1_y1, b1_x2, b1_y2 = b1[0], b1[1], b1[2], b1[3] 17 | b2_x1, b2_y1, b2_x2, b2_y2 = b2[:, 0], b2[:, 1], b2[:, 2], b2[:, 3] 18 | 19 | inter_rect_x1 = np.maximum(b1_x1, b2_x1) 20 | inter_rect_y1 = np.maximum(b1_y1, b2_y1) 21 | inter_rect_x2 = np.minimum(b1_x2, b2_x2) 22 | inter_rect_y2 = np.minimum(b1_y2, b2_y2) 23 | 24 | inter_area = np.maximum(inter_rect_x2 - inter_rect_x1, 0) * \ 25 | np.maximum(inter_rect_y2 - inter_rect_y1, 0) 26 | 27 | area_b1 = (b1_x2-b1_x1)*(b1_y2-b1_y1) 28 | area_b2 = (b2_x2-b2_x1)*(b2_y2-b2_y1) 29 | 30 | iou = inter_area/np.maximum((area_b1+area_b2-inter_area),1e-6) 31 | return iou 32 | 33 | def efficientdet_correct_boxes(self, box_xy, box_wh, input_shape, image_shape, letterbox_image): 34 | #-----------------------------------------------------------------# 35 | # 把y轴放前面是因为方便预测框和图像的宽高进行相乘 36 | #-----------------------------------------------------------------# 37 | box_yx = box_xy[..., ::-1] 38 | box_hw = box_wh[..., ::-1] 39 | input_shape = np.array(input_shape) 40 | image_shape = np.array(image_shape) 41 | 42 | if letterbox_image: 43 | #-----------------------------------------------------------------# 44 | # 这里求出来的offset是图像有效区域相对于图像左上角的偏移情况 45 | # new_shape指的是宽高缩放情况 46 | #-----------------------------------------------------------------# 47 | new_shape = np.round(image_shape * np.min(input_shape/image_shape)) 48 | offset = (input_shape - new_shape)/2./input_shape 49 | scale = input_shape/new_shape 50 | 51 | box_yx = (box_yx - offset) * scale 52 | box_hw *= scale 53 | 54 | box_mins = box_yx - (box_hw / 2.) 55 | box_maxes = box_yx + (box_hw / 2.) 
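        #   Stack back to (ymin, xmin, ymax, xmax) and rescale from normalized
        #   coordinates to pixel coordinates on the original image.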
56 | boxes = np.concatenate([box_mins[..., 0:1], box_mins[..., 1:2], box_maxes[..., 0:1], box_maxes[..., 1:2]], axis=-1) 57 | boxes *= np.concatenate([image_shape, image_shape], axis=-1) 58 | return boxes 59 | 60 | def decode_boxes(self, mbox_loc, anchors): 61 | # 获得先验框的宽与高 62 | anchor_width = anchors[:, 2] - anchors[:, 0] 63 | anchor_height = anchors[:, 3] - anchors[:, 1] 64 | # 获得先验框的中心点 65 | anchor_center_x = 0.5 * (anchors[:, 2] + anchors[:, 0]) 66 | anchor_center_y = 0.5 * (anchors[:, 3] + anchors[:, 1]) 67 | 68 | # 真实框距离先验框中心的xy轴偏移情况 69 | decode_bbox_center_x = mbox_loc[:, 0] * anchor_width 70 | decode_bbox_center_x += anchor_center_x 71 | decode_bbox_center_y = mbox_loc[:, 1] * anchor_height 72 | decode_bbox_center_y += anchor_center_y 73 | 74 | # 真实框的宽与高的求取 75 | decode_bbox_width = np.exp(mbox_loc[:, 2]) 76 | decode_bbox_width *= anchor_width 77 | decode_bbox_height = np.exp(mbox_loc[:, 3]) 78 | decode_bbox_height *= anchor_height 79 | 80 | # 获取真实框的左上角与右下角 81 | decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width 82 | decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height 83 | decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width 84 | decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height 85 | 86 | # 真实框的左上角与右下角进行堆叠 87 | decode_bbox = np.concatenate((decode_bbox_xmin[:, None], 88 | decode_bbox_ymin[:, None], 89 | decode_bbox_xmax[:, None], 90 | decode_bbox_ymax[:, None]), axis=-1) 91 | # 防止超出0与1 92 | decode_bbox = np.minimum(np.maximum(decode_bbox, 0.0), 1.0) 93 | return decode_bbox 94 | 95 | def decode_box(self, predictions, anchors, image_shape, input_shape, letterbox_image, confidence=0.5): 96 | #---------------------------------------------------# 97 | # 获得回归预测结果 98 | #---------------------------------------------------# 99 | mbox_loc = predictions[0] 100 | #---------------------------------------------------# 101 | # 获得种类的置信度 102 | #---------------------------------------------------# 103 | mbox_conf = predictions[1] 104 | 105 | results = [None for _ in range(len(mbox_loc))] 106 | #----------------------------------------------------------------------------------------------------------------# 107 | # 对每一张图片进行处理,由于在predict.py的时候,我们只输入一张图片,所以for i in range(len(mbox_loc))只进行一次 108 | #----------------------------------------------------------------------------------------------------------------# 109 | for i in range(len(mbox_loc)): 110 | #--------------------------------# 111 | # 利用回归结果对先验框进行解码 112 | #--------------------------------# 113 | decode_bbox = self.decode_boxes(mbox_loc[i], anchors) 114 | 115 | #--------------------------------------------------# 116 | # 判断置信度与非极大抑制的过程与视频有一定的差距 117 | # 整体思想相差不大,可以参考注释进行阅读 118 | #--------------------------------------------------# 119 | class_conf = np.expand_dims(np.max(mbox_conf[i], 1), -1) 120 | class_pred = np.expand_dims(np.argmax(mbox_conf[i], 1), -1) 121 | #--------------------------------# 122 | # 判断置信度是否大于门限要求 123 | #--------------------------------# 124 | conf_mask = (class_conf >= confidence)[:, 0] 125 | 126 | #--------------------------------# 127 | # 将预测结果进行堆叠 128 | #--------------------------------# 129 | detections = np.concatenate((decode_bbox[conf_mask], class_conf[conf_mask], class_pred[conf_mask]), 1) 130 | unique_labels = np.unique(detections[:,-1]) 131 | 132 | #-------------------------------------------------------------------# 133 | # 对种类进行循环, 134 | # 非极大抑制的作用是筛选出一定区域内属于同一种类得分最大的框, 135 | # 对种类进行循环可以帮助我们对每一个类分别进行非极大抑制。 136 | 
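            #   (per-class NMS: boxes of different classes never suppress
            #   each other here, even when they overlap heavily)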
#-------------------------------------------------------------------# 137 | for c in unique_labels: 138 | #------------------------------------------# 139 | # 获得某一类得分筛选后全部的预测结果 140 | #------------------------------------------# 141 | detections_class = detections[detections[:, -1] == c] 142 | #------------------------------------------# 143 | # 使用官方自带的非极大抑制会速度更快一些! 144 | #------------------------------------------# 145 | idx = self.sess.run(self.nms, feed_dict={self.boxes: detections_class[:, :4], self.scores: detections_class[:, 4]}) 146 | max_detections = detections_class[idx] 147 | 148 | # #------------------------------------------# 149 | # # 非官方的实现部分 150 | # # 获得某一类得分筛选后全部的预测结果 151 | # #------------------------------------------# 152 | # detections_class = detections[detections[:, -1] == c] 153 | # scores = detections_class[:, 4] 154 | # #------------------------------------------# 155 | # # 根据得分对该种类进行从大到小排序。 156 | # #------------------------------------------# 157 | # arg_sort = np.argsort(scores)[::-1] 158 | # detections_class = detections_class[arg_sort] 159 | # max_detections = [] 160 | # while np.shape(detections_class)[0]>0: 161 | # #-------------------------------------------------------------------------------------# 162 | # # 每次取出得分最大的框,计算其与其它所有预测框的重合程度,重合程度过大的则剔除。 163 | # #-------------------------------------------------------------------------------------# 164 | # max_detections.append(detections_class[0]) 165 | # if len(detections_class) == 1: 166 | # break 167 | # ious = self.bbox_iou(max_detections[-1], detections_class[1:]) 168 | # detections_class = detections_class[1:][ious < self._nms_thresh] 169 | results[i] = max_detections if results[i] is None else np.concatenate((results[i], max_detections), axis = 0) 170 | 171 | if results[i] is not None: 172 | results[i] = np.array(results[i]) 173 | box_xy, box_wh = (results[i][:, 0:2] + results[i][:, 2:4])/2, results[i][:, 2:4] - results[i][:, 0:2] 174 | results[i][:, :4] = self.efficientdet_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image) 175 | 176 | return results 177 | -------------------------------------------------------------------------------- /vision_for_anchors.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | 3 | import keras 4 | import matplotlib.pyplot as plt 5 | import numpy as np 6 | 7 | 8 | def decode_boxes(mbox_loc, mbox_priorbox): 9 | # 获得先验框的宽与高 10 | prior_width = mbox_priorbox[:, 2] - mbox_priorbox[:, 0] 11 | prior_height = mbox_priorbox[:, 3] - mbox_priorbox[:, 1] 12 | # 获得先验框的中心点 13 | prior_center_x = 0.5 * (mbox_priorbox[:, 2] + mbox_priorbox[:, 0]) 14 | prior_center_y = 0.5 * (mbox_priorbox[:, 3] + mbox_priorbox[:, 1]) 15 | 16 | # 真实框距离先验框中心的xy轴偏移情况 17 | decode_bbox_center_x = mbox_loc[:, 0] * prior_width 18 | decode_bbox_center_x += prior_center_x 19 | decode_bbox_center_y = mbox_loc[:, 1] * prior_height 20 | decode_bbox_center_y += prior_center_y 21 | 22 | # 真实框的宽与高的求取 23 | decode_bbox_width = np.exp(mbox_loc[:, 2]) 24 | decode_bbox_width *= prior_width 25 | decode_bbox_height = np.exp(mbox_loc[:, 3]) 26 | decode_bbox_height *= prior_height 27 | 28 | # 获取真实框的左上角与右下角 29 | decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width 30 | decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height 31 | decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width 32 | decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height 33 | 34 | # 真实框的左上角与右下角进行堆叠 35 | decode_bbox = 
np.concatenate((decode_bbox_xmin[:, None], 36 | decode_bbox_ymin[:, None], 37 | decode_bbox_xmax[:, None], 38 | decode_bbox_ymax[:, None]), axis=-1) 39 | 40 | return decode_bbox 41 | 42 | class AnchorBox: 43 | def __init__(self, ratios, scales): 44 | self.ratios = ratios 45 | self.scales = scales 46 | self.num_anchors = len(self.ratios) * len(self.scales) 47 | 48 | def generate_anchors(self, base_size = 16): 49 | # anchors - 9,4 50 | anchors = np.zeros((self.num_anchors, 4)) 51 | anchors[:, 2:] = base_size * np.tile(self.scales, (2, len(self.scales))).T 52 | 53 | # 计算先验框的面积 54 | areas = anchors[:, 2] * anchors[:, 3] 55 | 56 | # np.repeat(ratios, len(scales)) [0.5 0.5 0.5 1. 1. 1. 2. 2. 2. ] 57 | anchors[:, 2] = np.sqrt(areas / np.repeat(self.ratios, len(self.scales))) 58 | anchors[:, 3] = np.sqrt(areas * np.repeat(self.ratios, len(self.scales))) 59 | 60 | anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T 61 | anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T 62 | return anchors 63 | 64 | 65 | def shift(self, shape, stride, anchors): 66 | # 生成特征层的网格中心 67 | shift_x = (np.arange(0, shape[1], dtype=keras.backend.floatx()) + 0.5) * stride 68 | shift_y = (np.arange(0, shape[0], dtype=keras.backend.floatx()) + 0.5) * stride 69 | shift_x, shift_y = np.meshgrid(shift_x, shift_y) 70 | 71 | shift_x = np.reshape(shift_x, [-1]) 72 | shift_y = np.reshape(shift_y, [-1]) 73 | 74 | # 将网格中心进行堆叠 75 | shifts = np.stack([ 76 | shift_x, 77 | shift_y, 78 | shift_x, 79 | shift_y 80 | ], axis=0) 81 | 82 | shifts = np.transpose(shifts) 83 | number_of_anchors = np.shape(anchors)[0] 84 | 85 | k = np.shape(shifts)[0] 86 | # shifted_anchors k, 9, 4 -> k * 9, 4 87 | shifted_anchors = np.reshape(anchors, [1, number_of_anchors, 4]) + np.array(np.reshape(shifts, [k, 1, 4])) 88 | shifted_anchors = np.reshape(shifted_anchors, [k * number_of_anchors, 4]) 89 | 90 | #-------------------------------# 91 | # 可视化代码 92 | #-------------------------------# 93 | if shape[0]==4: 94 | fig = plt.figure() 95 | ax = fig.add_subplot(121) 96 | plt.ylim(-300,900) 97 | plt.xlim(-600,600) 98 | 99 | plt.scatter(shift_x,shift_y) 100 | box_widths = shifted_anchors[:,2]-shifted_anchors[:,0] 101 | box_heights = shifted_anchors[:,3]-shifted_anchors[:,1] 102 | 103 | for i in [108,109,110,111,112,113,114,115,116]: 104 | rect = plt.Rectangle([shifted_anchors[i, 0],shifted_anchors[i, 1]],box_widths[i],box_heights[i],color="r",fill=False) 105 | ax.add_patch(rect) 106 | plt.gca().invert_yaxis() 107 | 108 | ax = fig.add_subplot(122) 109 | plt.ylim(-300,900) 110 | plt.xlim(-600,600) 111 | plt.scatter(shift_x,shift_y) 112 | P7_num_anchors = len(shifted_anchors) 113 | random_inputs = np.random.uniform(0,1,[P7_num_anchors,4])/10 114 | after_decode = decode_boxes(random_inputs, shifted_anchors) 115 | 116 | box_widths = after_decode[:,2]-after_decode[:,0] 117 | box_heights = after_decode[:,3]-after_decode[:,1] 118 | 119 | after_decode_center_x = after_decode[:,0]/2+after_decode[:,2]/2 120 | after_decode_center_y = after_decode[:,1]/2+after_decode[:,3]/2 121 | plt.scatter(after_decode_center_x[108:116],after_decode_center_y[108:116]) 122 | 123 | for i in [108,109,110,111,112,113,114,115,116]: 124 | rect = plt.Rectangle([after_decode[i, 0],after_decode[i, 1]],box_widths[i],box_heights[i],color="r",fill=False) 125 | ax.add_patch(rect) 126 | plt.gca().invert_yaxis() 127 | 128 | plt.show() 129 | 130 | return shifted_anchors 131 | 132 | 133 | #---------------------------------------------------# 134 | # 用于计算共享特征层的大小 135 | 
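#   (seven stride-2 stages are simulated; the last five match the effective
#   feature layers P3-P7, hence the [-5:] slice below)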
#---------------------------------------------------# 136 | def get_img_output_length(height, width): 137 | filter_sizes = [7, 3, 3, 3, 3, 3, 3] 138 | padding = [3, 1, 1, 1, 1, 1, 1] 139 | stride = [2, 2, 2, 2, 2, 2, 2] 140 | feature_heights = [] 141 | feature_widths = [] 142 | 143 | for i in range(len(filter_sizes)): 144 | height = (height + 2*padding[i] - filter_sizes[i]) // stride[i] + 1 145 | width = (width + 2*padding[i] - filter_sizes[i]) // stride[i] + 1 146 | feature_heights.append(height) 147 | feature_widths.append(width) 148 | return np.array(feature_heights)[-5:], np.array(feature_widths)[-5:] 149 | 150 | def get_anchors(input_shape, anchors_size = [32, 64, 128, 256, 512], strides = [8, 16, 32, 64, 128], \ 151 | ratios = [0.5, 1, 2], scales = [2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]): 152 | feature_heights, feature_widths = get_img_output_length(input_shape[0], input_shape[1]) 153 | 154 | all_anchors = [] 155 | anchor_box = AnchorBox(ratios, scales) 156 | for i in range(len(anchors_size)): 157 | #------------------------------# 158 | # 先生成每个特征点的9个先验框 159 | # anchors 9, 4 160 | #------------------------------# 161 | anchors = anchor_box.generate_anchors(anchors_size[i]) 162 | shifted_anchors = anchor_box.shift([feature_heights[i], feature_widths[i]], strides[i], anchors) 163 | all_anchors.append(shifted_anchors) 164 | 165 | # 将每个特征层的先验框进行堆叠。 166 | all_anchors = np.concatenate(all_anchors,axis=0) 167 | all_anchors = all_anchors / np.array([input_shape[1], input_shape[0], input_shape[1], input_shape[0]]) 168 | return all_anchors 169 | 170 | get_anchors([512, 512]) -------------------------------------------------------------------------------- /voc_annotation.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import xml.etree.ElementTree as ET 4 | 5 | import numpy as np 6 | 7 | from utils.utils import get_classes 8 | 9 | #--------------------------------------------------------------------------------------------------------------------------------# 10 | # annotation_mode用于指定该文件运行时计算的内容 11 | # annotation_mode为0代表整个标签处理过程,包括获得VOCdevkit/VOC2007/ImageSets里面的txt以及训练用的2007_train.txt、2007_val.txt 12 | # annotation_mode为1代表获得VOCdevkit/VOC2007/ImageSets里面的txt 13 | # annotation_mode为2代表获得训练用的2007_train.txt、2007_val.txt 14 | #--------------------------------------------------------------------------------------------------------------------------------# 15 | annotation_mode = 0 16 | #-------------------------------------------------------------------# 17 | # 必须要修改,用于生成2007_train.txt、2007_val.txt的目标信息 18 | # 与训练和预测所用的classes_path一致即可 19 | # 如果生成的2007_train.txt里面没有目标信息 20 | # 那么就是因为classes没有设定正确 21 | # 仅在annotation_mode为0和2的时候有效 22 | #-------------------------------------------------------------------# 23 | classes_path = 'model_data/voc_classes.txt' 24 | #--------------------------------------------------------------------------------------------------------------------------------# 25 | # trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1 26 | # train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1 27 | # 仅在annotation_mode为0和1的时候有效 28 | #--------------------------------------------------------------------------------------------------------------------------------# 29 | trainval_percent = 0.9 30 | train_percent = 0.9 31 | #-------------------------------------------------------# 32 | # 指向VOC数据集所在的文件夹 33 | # 默认指向根目录下的VOC数据集 34 | #-------------------------------------------------------# 35 | 
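#   (note: a dataset path containing spaces is rejected by the check in __main__ below)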
VOCdevkit_path = 'VOCdevkit' 36 | 37 | VOCdevkit_sets = [('2007', 'train'), ('2007', 'val')] 38 | classes, _ = get_classes(classes_path) 39 | 40 | #-------------------------------------------------------# 41 | # 统计目标数量 42 | #-------------------------------------------------------# 43 | photo_nums = np.zeros(len(VOCdevkit_sets)) 44 | nums = np.zeros(len(classes)) 45 | def convert_annotation(year, image_id, list_file): 46 | in_file = open(os.path.join(VOCdevkit_path, 'VOC%s/Annotations/%s.xml'%(year, image_id)), encoding='utf-8') 47 | tree=ET.parse(in_file) 48 | root = tree.getroot() 49 | 50 | for obj in root.iter('object'): 51 | difficult = 0 52 | if obj.find('difficult')!=None: 53 | difficult = obj.find('difficult').text 54 | cls = obj.find('name').text 55 | if cls not in classes or int(difficult)==1: 56 | continue 57 | cls_id = classes.index(cls) 58 | xmlbox = obj.find('bndbox') 59 | b = (int(float(xmlbox.find('xmin').text)), int(float(xmlbox.find('ymin').text)), int(float(xmlbox.find('xmax').text)), int(float(xmlbox.find('ymax').text))) 60 | list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id)) 61 | 62 | nums[classes.index(cls)] = nums[classes.index(cls)] + 1 63 | 64 | if __name__ == "__main__": 65 | random.seed(0) 66 | if " " in os.path.abspath(VOCdevkit_path): 67 | raise ValueError("数据集存放的文件夹路径与图片名称中不可以存在空格,否则会影响正常的模型训练,请注意修改。") 68 | 69 | if annotation_mode == 0 or annotation_mode == 1: 70 | print("Generate txt in ImageSets.") 71 | xmlfilepath = os.path.join(VOCdevkit_path, 'VOC2007/Annotations') 72 | saveBasePath = os.path.join(VOCdevkit_path, 'VOC2007/ImageSets/Main') 73 | temp_xml = os.listdir(xmlfilepath) 74 | total_xml = [] 75 | for xml in temp_xml: 76 | if xml.endswith(".xml"): 77 | total_xml.append(xml) 78 | 79 | num = len(total_xml) 80 | list = range(num) 81 | tv = int(num*trainval_percent) 82 | tr = int(tv*train_percent) 83 | trainval= random.sample(list,tv) 84 | train = random.sample(trainval,tr) 85 | 86 | print("train and val size",tv) 87 | print("train size",tr) 88 | ftrainval = open(os.path.join(saveBasePath,'trainval.txt'), 'w') 89 | ftest = open(os.path.join(saveBasePath,'test.txt'), 'w') 90 | ftrain = open(os.path.join(saveBasePath,'train.txt'), 'w') 91 | fval = open(os.path.join(saveBasePath,'val.txt'), 'w') 92 | 93 | for i in list: 94 | name=total_xml[i][:-4]+'\n' 95 | if i in trainval: 96 | ftrainval.write(name) 97 | if i in train: 98 | ftrain.write(name) 99 | else: 100 | fval.write(name) 101 | else: 102 | ftest.write(name) 103 | 104 | ftrainval.close() 105 | ftrain.close() 106 | fval.close() 107 | ftest.close() 108 | print("Generate txt in ImageSets done.") 109 | 110 | if annotation_mode == 0 or annotation_mode == 2: 111 | print("Generate 2007_train.txt and 2007_val.txt for train.") 112 | type_index = 0 113 | for year, image_set in VOCdevkit_sets: 114 | image_ids = open(os.path.join(VOCdevkit_path, 'VOC%s/ImageSets/Main/%s.txt'%(year, image_set)), encoding='utf-8').read().strip().split() 115 | list_file = open('%s_%s.txt'%(year, image_set), 'w', encoding='utf-8') 116 | for image_id in image_ids: 117 | list_file.write('%s/VOC%s/JPEGImages/%s.jpg'%(os.path.abspath(VOCdevkit_path), year, image_id)) 118 | 119 | convert_annotation(year, image_id, list_file) 120 | list_file.write('\n') 121 | photo_nums[type_index] = len(image_ids) 122 | type_index += 1 123 | list_file.close() 124 | print("Generate 2007_train.txt and 2007_val.txt for train done.") 125 | 126 | def printTable(List1, List2): 127 | for i in range(len(List1[0])): 128 | print("|", end=' ') 129 
| for j in range(len(List1)): 130 | print(List1[j][i].rjust(int(List2[j])), end=' ') 131 | print("|", end=' ') 132 | print() 133 | 134 | str_nums = [str(int(x)) for x in nums] 135 | tableData = [ 136 | classes, str_nums 137 | ] 138 | colWidths = [0]*len(tableData) 139 | len1 = 0 140 | for i in range(len(tableData)): 141 | for j in range(len(tableData[i])): 142 | if len(tableData[i][j]) > colWidths[i]: 143 | colWidths[i] = len(tableData[i][j]) 144 | printTable(tableData, colWidths) 145 | 146 | if photo_nums[0] <= 500: 147 | print("训练集数量小于500,属于较小的数据量,请注意设置较大的训练世代(Epoch)以满足足够的梯度下降次数(Step)。") 148 | 149 | if np.sum(nums) == 0: 150 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 151 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 152 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 153 | print("(重要的事情说三遍)。") 154 | -------------------------------------------------------------------------------- /常见问题汇总.md: -------------------------------------------------------------------------------- 1 | 问题汇总的博客地址为[https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428)。 2 | 3 | # 问题汇总 4 | ## 1、下载问题 5 | ### a、代码下载 6 | **问:up主,可以给我发一份代码吗,代码在哪里下载啊? 7 | 答:Github上的地址就在视频简介里。复制一下就能进去下载了。** 8 | 9 | **问:up主,为什么我下载的代码提示压缩包损坏? 10 | 答:重新去Github下载。** 11 | 12 | **问:up主,为什么我下载的代码和你在视频以及博客上的代码不一样? 13 | 答:我常常会对代码进行更新,最终以实际的代码为准。** 14 | 15 | ### b、 权值下载 16 | **问:up主,为什么我下载的代码里面,model_data下面没有.pth或者.h5文件? 17 | 答:我一般会把权值上传到Github和百度网盘,在GITHUB的README里面就能找到。** 18 | 19 | ### c、 数据集下载 20 | **问:up主,XXXX数据集在哪里下载啊? 21 | 答:一般数据集的下载地址我会放在README里面,基本上都有,没有的话请及时联系我添加,直接发github的issue即可**。 22 | 23 | ## 2、环境配置问题 24 | ### a、现在库中所用的环境 25 | **pytorch代码对应的pytorch版本为1.2,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141)。 26 | 27 | **keras代码对应的tensorflow版本为1.13.2,keras版本是2.1.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142)。 28 | 29 | **tf2代码对应的tensorflow版本为2.2.0,无需安装keras,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493)。 30 | 31 | **问:你的代码某某某版本的tensorflow和pytorch能用嘛? 32 | 答:最好按照我推荐的配置,配置教程也有!其它版本的我没有试过!可能出现问题但是一般问题不大。仅需要改少量代码即可。** 33 | 34 | ### b、30系列显卡环境配置 35 | 30系显卡由于框架更新不可使用上述环境配置教程。 36 | 当前我已经测试的可以用的30显卡配置如下: 37 | **pytorch代码对应的pytorch版本为1.7.0,cuda为11.0,cudnn为8.0.5**。 38 | 39 | **keras代码无法在win10下配置cuda11,在ubuntu下可以百度查询一下,配置tensorflow版本为1.15.4,keras版本是2.1.5或者2.3.1(少量函数接口不同,代码可能还需要少量调整。)** 40 | 41 | **tf2代码对应的tensorflow版本为2.4.0,cuda为11.0,cudnn为8.0.5**。 42 | 43 | ### c、GPU利用问题与环境使用问题 44 | **问:为什么我安装了tensorflow-gpu但是却没用利用GPU进行训练呢? 45 | 答:确认tensorflow-gpu已经装好,利用pip list查看tensorflow版本,然后查看任务管理器或者利用nvidia命令看看是否使用了gpu进行训练,任务管理器的话要看显存使用情况。** 46 | 47 | **问:up主,我好像没有在用gpu进行训练啊,怎么看是不是用了GPU进行训练? 48 | 答:查看是否使用GPU进行训练一般使用NVIDIA在命令行的查看命令,如果要看任务管理器的话,请看性能部分GPU的显存是否利用,或者查看任务管理器的Cuda,而非Copy。** 49 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center) 50 | 51 | **问:up主,为什么我按照你的环境配置后还是不能使用? 
52 | 答:请把你的GPU、CUDA、CUDNN、TF版本以及PYTORCH版本B站私聊告诉我。** 53 | 54 | **问:出现如下错误** 55 | ```python 56 | Traceback (most recent call last): 57 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in 58 | from tensorflow.python.pywrap_tensorflow_internal import * 59 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in 60 | pywrap_tensorflow_internal = swig_import_helper() 61 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper 62 | _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description) 63 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 243, in load_modulereturn load_dynamic(name, filename, file) 64 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 343, in load_dynamic 65 | return _load(spec) 66 | ImportError: DLL load failed: 找不到指定的模块。 67 | ``` 68 | **答:如果没重启过就重启一下,否则重新按照步骤安装,还无法解决则把你的GPU、CUDA、CUDNN、TF版本以及PYTORCH版本私聊告诉我。** 69 | 70 | ### d、no module问题 71 | **问:为什么提示说no module name utils.utils(no module name nets.yolo、no module name nets.ssd等一系列问题)啊? 72 | 答:utils并不需要用pip装,它就在我上传的仓库的根目录,出现这个问题的原因是根目录不对,查查相对目录和根目录的概念。查了基本上就明白了。** 73 | 74 | **问:为什么提示说no module name matplotlib(no module name PIL,no module name cv2等等)? 75 | 答:这个库没安装打开命令行安装就好。pip install matplotlib** 76 | 77 | **问:为什么我已经用pip装了opencv(pillow、matplotlib等),还是提示no module name cv2? 78 | 答:没有激活环境装,要激活对应的conda环境进行安装才可以正常使用** 79 | 80 | **问:为什么提示说No module named 'torch' ? 81 | 答:其实我也真的很想知道为什么会有这个问题……这个pytorch没装是什么情况?一般就俩情况,一个是真的没装,还有一个是装到其它环境了,当前激活的环境不是自己装的环境。** 82 | 83 | **问:为什么提示说No module named 'tensorflow' ? 84 | 答:同上。** 85 | 86 | ### e、cuda安装失败问题 87 | 一般cuda安装前需要安装Visual Studio,装个2017版本即可。 88 | 89 | ### f、Ubuntu系统问题 90 | **所有代码在Ubuntu下可以使用,我两个系统都试过。** 91 | 92 | ### g、VSCODE提示错误的问题 93 | **问:为什么在VSCODE里面提示一大堆的错误啊? 94 | 答:我也提示一大堆的错误,但是不影响,是VSCODE的问题,如果不想看错误的话就装Pycharm。** 95 | 96 | ### h、使用cpu进行训练与预测的问题 97 | **对于keras和tf2的代码而言,如果想用cpu进行训练和预测,直接装cpu版本的tensorflow就可以了。** 98 | 99 | **对于pytorch的代码而言,如果想用cpu进行训练和预测,需要将cuda=True修改成cuda=False。** 100 | 101 | ### i、tqdm没有pos参数问题 102 | **问:运行代码提示'tqdm' object has no attribute 'pos'。 103 | 答:重装tqdm,换个版本就可以了。** 104 | 105 | ### j、提示decode(“utf-8”)的问题 106 | **由于h5py库的更新,安装过程中会自动安装h5py=3.0.0以上的版本,会导致decode("utf-8")的错误! 107 | 各位一定要在安装完tensorflow后利用命令装h5py=2.10.0!** 108 | ``` 109 | pip install h5py==2.10.0 110 | ``` 111 | 112 | ### k、提示TypeError: __array__() takes 1 positional argument but 2 were given错误 113 | 可以修改pillow版本解决。 114 | ``` 115 | pip install pillow==8.2.0 116 | ``` 117 | 118 | ### l、其它问题 119 | **问:为什么提示TypeError: cat() got an unexpected keyword argument 'axis',Traceback (most recent call last),AttributeError: 'Tensor' object has no attribute 'bool'? 120 | 答:这是版本问题,建议使用torch1.2以上版本** 121 | **其它有很多稀奇古怪的问题,很多是版本问题,建议按照我的视频教程安装Keras和tensorflow。比如装的是tensorflow2,就不用问我说为什么我没法运行Keras-yolo啥的。那是必然不行的。** 122 | 123 | ## 3、目标检测库问题汇总(人脸检测和分类库也可参考) 124 | ### a、shape不匹配问题 125 | #### 1)、训练时shape不匹配问题 126 | **问:up主,为什么运行train.py会提示shape不匹配啊? 
127 | 答:在keras环境中,因为你训练的种类和原始的种类不同,网络结构会变化,所以最尾部的shape会有少量不匹配。** 128 | 129 | #### 2)、预测时shape不匹配问题 130 | **问:为什么我运行predict.py会提示我说shape不匹配呀。 131 | 在Pytorch里面是这样的:** 132 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171631901.png) 133 | 在Keras里面是这样的: 134 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70) 135 | **答:原因主要有仨: 136 | 1、在ssd、FasterRCNN里面,可能是train.py里面的num_classes没改。 137 | 2、model_path没改。 138 | 3、classes_path没改。 139 | 请检查清楚了!确定自己所用的model_path和classes_path是对应的!训练的时候用到的num_classes或者classes_path也需要检查!** 140 | 141 | ### b、显存不足问题 142 | **问:为什么我运行train.py下面的命令行闪的贼快,还提示OOM啥的? 143 | 答:这是在keras中出现的,爆显存了,可以改小batch_size,SSD的显存占用率是最小的,建议用SSD; 144 | 2G显存:SSD、YOLOV4-TINY 145 | 4G显存:YOLOV3 146 | 6G显存:YOLOV4、Retinanet、M2det、Efficientdet、Faster RCNN等 147 | 8G+显存:随便选吧。** 148 | **需要注意的是,受到BatchNorm2d影响,batch_size不可为1,至少为2。** 149 | 150 | **问:为什么提示 RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)? 151 | 答:这是pytorch中出现的,爆显存了,同上。** 152 | 153 | **问:为什么我显存都没利用,就直接爆显存了? 154 | 答:都爆显存了,自然就不利用了,模型没有开始训练。** 155 | ### c、训练问题(冻结训练,LOSS问题、训练效果问题等) 156 | **问:为什么要冻结训练和解冻训练呀? 157 | 答:这是迁移学习的思想,因为神经网络主干特征提取部分所提取到的特征是通用的,我们冻结起来训练可以加快训练效率,也可以防止权值被破坏。** 158 | 在冻结阶段,模型的主干被冻结了,特征提取网络不发生改变。占用的显存较小,仅对网络进行微调。 159 | 在解冻阶段,模型的主干不被冻结了,特征提取网络会发生改变。占用的显存较大,网络所有的参数都会发生改变。 160 | 161 | **问:为什么我的网络不收敛啊,LOSS是XXXX。 162 | 答:不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,我的yolo代码都没有归一化,所以LOSS值看起来比较高,LOSS的值不重要,重要的是是否在变小,预测是否有效果。** 163 | 164 | **问:为什么我的训练效果不好?预测了没有框(框不准)。 165 | 答:** 166 | 167 | 考虑几个问题: 168 | 1、目标信息问题,查看2007_train.txt文件是否有目标信息,没有的话请修改voc_annotation.py。 169 | 2、数据集问题,小于500的自行考虑增加数据集,同时测试不同的模型,确认数据集是好的。 170 | 3、是否解冻训练,如果数据集分布与常规画面差距过大需要进一步解冻训练,调整主干,加强特征提取能力。 171 | 4、网络问题,比如SSD不适合小目标,因为先验框固定了。 172 | 5、训练时长问题,有些同学只训练了几代表示没有效果,按默认参数训练完。 173 | 6、确认自己是否按照步骤去做了,如果比如voc_annotation.py里面的classes是否修改了等。 174 | 7、不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,LOSS的值不重要,重要的是是否收敛。 175 | 176 | **问:我怎么出现了gbk什么的编码错误啊:** 177 | ```python 178 | UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence 179 | ``` 180 | **答:标签和路径不要使用中文,如果一定要使用中文,请注意处理的时候编码的问题,改成打开文件的encoding方式改为utf-8。** 181 | 182 | **问:我的图片是xxx*xxx的分辨率的,可以用吗!** 183 | **答:可以用,代码里面会自动进行resize或者数据增强。** 184 | 185 | **问:怎么进行多GPU训练? 186 | 答:pytorch的大多数代码可以直接使用gpu训练,keras的话直接百度就好了,实现并不复杂,我没有多卡没法详细测试,还需要各位同学自己努力了。** 187 | ### d、灰度图问题 188 | **问:能不能训练灰度图(预测灰度图)啊? 189 | 答:我的大多数库会将灰度图转化成RGB进行训练和预测,如果遇到代码不能训练或者预测灰度图的情况,可以尝试一下在get_random_data里面将Image.open后的结果转换成RGB,预测的时候也这样试试。(仅供参考)** 190 | 191 | ### e、断点续练问题 192 | **问:我已经训练过几个世代了,能不能从这个基础上继续开始训练 193 | 答:可以,你在训练前,和载入预训练权重一样载入训练过的权重就行了。一般训练好的权重会保存在logs文件夹里面,将model_path修改成你要开始的权值的路径即可。** 194 | 195 | ### f、预训练权重的问题 196 | **问:如果我要训练其它的数据集,预训练权重要怎么办啊?** 197 | **答:数据的预训练权重对不同数据集是通用的,因为特征是通用的,预训练权重对于99%的情况都必须要用,不用的话权值太过随机,特征提取效果不明显,网络训练的结果也不会好。** 198 | 199 | **问:up,我修改了网络,预训练权重还能用吗? 
200 | 答:修改了主干的话,如果不是用的现有的网络,基本上预训练权重是不能用的,要么就自己判断权值里卷积核的shape然后自己匹配,要么只能自己预训练去了;修改了后半部分的话,前半部分的主干部分的预训练权重还是可以用的,如果是pytorch代码的话,需要自己修改一下载入权值的方式,判断shape后载入,如果是keras代码,直接by_name=True,skip_mismatch=True即可。** 201 | 权值匹配的方式可以参考如下: 202 | ```python 203 | # 加快模型训练的效率 204 | print('Loading weights into state dict...') 205 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 206 | model_dict = model.state_dict() 207 | pretrained_dict = torch.load(model_path, map_location=device) 208 | a = {} 209 | for k, v in pretrained_dict.items(): 210 | try: 211 | if np.shape(model_dict[k]) == np.shape(v): 212 | a[k]=v 213 | except: 214 | pass 215 | model_dict.update(a) 216 | model.load_state_dict(model_dict) 217 | print('Finished!') 218 | ``` 219 | 220 | **问:我要怎么不使用预训练权重啊? 221 | 答:把载入预训练权重的代码注释了就行。** 222 | 223 | **问:为什么我不使用预训练权重效果这么差啊? 224 | 答:因为随机初始化的权值不好,提取的特征不好,也就导致了模型训练的效果不好,voc07+12、coco+voc07+12效果都不一样,预训练权重还是非常重要的。** 225 | 226 | ### g、视频检测问题与摄像头检测问题 227 | **问:怎么用摄像头检测呀? 228 | 答:predict.py修改参数可以进行摄像头检测,也有视频详细解释了摄像头检测的思路。** 229 | 230 | **问:怎么用视频检测呀? 231 | 答:同上** 232 | ### h、从0开始训练问题 233 | **问:怎么在模型上从0开始训练? 234 | 答:在算力不足与调参能力不足的情况下从0开始训练毫无意义。模型特征提取能力在随机初始化参数的情况下非常差。没有好的参数调节能力和算力,无法使得网络正常收敛。** 235 | 如果一定要从0开始,那么训练的时候请注意几点: 236 | - 不载入预训练权重。 237 | - 不要进行冻结训练,注释冻结模型的代码。 238 | 239 | **问:为什么我不使用预训练权重效果这么差啊? 240 | 答:因为随机初始化的权值不好,提取的特征不好,也就导致了模型训练的效果不好,voc07+12、coco+voc07+12效果都不一样,预训练权重还是非常重要的。** 241 | 242 | ### i、保存问题 243 | **问:检测完的图片怎么保存? 244 | 答:一般目标检测用的是Image,所以查询一下PIL库的Image如何进行保存。详细看看predict.py文件的注释。** 245 | 246 | **问:怎么用视频保存呀? 247 | 答:详细看看predict.py文件的注释。** 248 | 249 | ### j、遍历问题 250 | **问:如何对一个文件夹的图片进行遍历? 251 | 答:一般使用os.listdir先找出文件夹里面的所有图片,然后根据predict.py文件里面的执行思路检测图片就行了,详细看看predict.py文件的注释。** 252 | 253 | **问:如何对一个文件夹的图片进行遍历?并且保存。 254 | 答:遍历的话一般使用os.listdir先找出文件夹里面的所有图片,然后根据predict.py文件里面的执行思路检测图片就行了。保存的话一般目标检测用的是Image,所以查询一下PIL库的Image如何进行保存。如果有些库用的是cv2,那就是查一下cv2怎么保存图片。详细看看predict.py文件的注释。** 255 | 256 | ### k、路径问题(No such file or directory) 257 | **问:我怎么出现了这样的错误呀:** 258 | ```python 259 | FileNotFoundError: 【Errno 2】 No such file or directory 260 | …………………………………… 261 | …………………………………… 262 | ``` 263 | **答:去检查一下文件夹路径,查看是否有对应文件;并且检查一下2007_train.txt,其中文件路径是否有错。** 264 | 关于路径有几个重要的点: 265 | **文件夹名称中一定不要有空格。 266 | 注意相对路径和绝对路径。 267 | 多百度路径相关的知识。** 268 | 269 | **所有的路径问题基本上都是根目录问题,好好查一下相对目录的概念!** 270 | ### l、和原版比较问题 271 | **问:你这个代码和原版比怎么样,可以达到原版的效果么? 272 | 答:基本上可以达到,我都用voc数据测过,我没有好显卡,没有能力在coco上测试与训练。** 273 | 274 | **问:你有没有实现yolov4所有的tricks,和原版差距多少? 275 | 答:并没有实现全部的改进部分,由于YOLOV4使用的改进实在太多了,很难完全实现与列出来,这里只列出来了一些我比较感兴趣,而且非常有效的改进。论文中提到的SAM(注意力机制模块),作者自己的源码也没有使用。还有其它很多的tricks,不是所有的tricks都有提升,我也没法实现全部的tricks。至于和原版的比较,我没有能力训练coco数据集,根据使用过的同学反应差距不大。** 276 | 277 | ### m、FPS问题(检测速度问题) 278 | **问:你这个FPS可以到达多少,可以到 XX FPS么? 279 | 答:FPS和机子的配置有关,配置高就快,配置低就慢。** 280 | 281 | **问:为什么我用服务器去测试yolov4(or others)的FPS只有十几? 282 | 答:检查是否正确安装了tensorflow-gpu或者pytorch的gpu版本,如果已经正确安装,可以去利用time.time()的方法查看detect_image里面,哪一段代码耗时更长(不仅只有网络耗时长,其它处理部分也会耗时,如绘图等)。** 283 | 284 | **问:为什么论文中说速度可以达到XX,但是这里却没有? 285 | 答:检查是否正确安装了tensorflow-gpu或者pytorch的gpu版本,如果已经正确安装,可以去利用time.time()的方法查看detect_image里面,哪一段代码耗时更长(不仅只有网络耗时长,其它处理部分也会耗时,如绘图等)。有些论文还会使用多batch进行预测,我并没有去实现这个部分。** 286 | 287 | ### n、预测图片不显示问题 288 | **问:为什么你的代码在预测完成后不显示图片?只是在命令行告诉我有什么目标。 289 | 答:给系统安装一个图片查看器就行了。** 290 | 291 | ### o、算法评价问题(目标检测的map、PR曲线、Recall、Precision等) 292 | **问:怎么计算map? 293 | 答:看map视频,都一个流程。** 294 | 295 | **问:计算map的时候,get_map.py里面有一个MINOVERLAP是什么用的,是iou吗? 
296 | 答:是iou,它的作用是判断预测框和真实框的重合成度,如果重合程度大于MINOVERLAP,则预测正确。** 297 | 298 | **问:为什么get_map.py里面的self.confidence(self.score)要设置的那么小? 299 | 答:看一下map的视频的原理部分,要知道所有的结果然后再进行pr曲线的绘制。** 300 | 301 | **问:能不能说说怎么绘制PR曲线啥的呀。 302 | 答:可以看mAP视频,结果里面有PR曲线。** 303 | 304 | **问:怎么计算Recall、Precision指标。 305 | 答:这俩指标应该是相对于特定的置信度的,计算map的时候也会获得。** 306 | 307 | ### p、coco数据集训练问题 308 | **问:目标检测怎么训练COCO数据集啊?。 309 | 答:coco数据训练所需要的txt文件可以参考qqwweee的yolo3的库,格式都是一样的。** 310 | 311 | ### q、模型优化(模型修改)问题 312 | **问:up,YOLO系列使用Focal LOSS的代码你有吗,有提升吗? 313 | 答:很多人试过,提升效果也不大(甚至变的更Low),它自己有自己的正负样本的平衡方式。** 314 | 315 | **问:up,我修改了网络,预训练权重还能用吗? 316 | 答:修改了主干的话,如果不是用的现有的网络,基本上预训练权重是不能用的,要么就自己判断权值里卷积核的shape然后自己匹配,要么只能自己预训练去了;修改了后半部分的话,前半部分的主干部分的预训练权重还是可以用的,如果是pytorch代码的话,需要自己修改一下载入权值的方式,判断shape后载入,如果是keras代码,直接by_name=True,skip_mismatch=True即可。** 317 | 权值匹配的方式可以参考如下: 318 | ```python 319 | # 加快模型训练的效率 320 | print('Loading weights into state dict...') 321 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 322 | model_dict = model.state_dict() 323 | pretrained_dict = torch.load(model_path, map_location=device) 324 | a = {} 325 | for k, v in pretrained_dict.items(): 326 | try: 327 | if np.shape(model_dict[k]) == np.shape(v): 328 | a[k]=v 329 | except: 330 | pass 331 | model_dict.update(a) 332 | model.load_state_dict(model_dict) 333 | print('Finished!') 334 | ``` 335 | 336 | **问:up,怎么修改模型啊,我想发个小论文! 337 | 答:建议看看yolov3和yolov4的区别,然后看看yolov4的论文,作为一个大型调参现场非常有参考意义,使用了很多tricks。我能给的建议就是多看一些经典模型,然后拆解里面的亮点结构并使用。** 338 | 339 | ### r、部署问题 340 | 我没有具体部署到手机等设备上过,所以很多部署问题我并不了解…… 341 | 342 | ## 4、语义分割库问题汇总 343 | ### a、shape不匹配问题 344 | #### 1)、训练时shape不匹配问题 345 | **问:up主,为什么运行train.py会提示shape不匹配啊? 346 | 答:在keras环境中,因为你训练的种类和原始的种类不同,网络结构会变化,所以最尾部的shape会有少量不匹配。** 347 | 348 | #### 2)、预测时shape不匹配问题 349 | **问:为什么我运行predict.py会提示我说shape不匹配呀。 350 | 在Pytorch里面是这样的:** 351 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171631901.png) 352 | 在Keras里面是这样的: 353 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70) 354 | **答:原因主要有二: 355 | 1、train.py里面的num_classes没改。 356 | 2、预测时num_classes没改。 357 | 请检查清楚!训练和预测的时候用到的num_classes都需要检查!** 358 | 359 | ### b、显存不足问题 360 | **问:为什么我运行train.py下面的命令行闪的贼快,还提示OOM啥的? 361 | 答:这是在keras中出现的,爆显存了,可以改小batch_size。** 362 | 363 | **需要注意的是,受到BatchNorm2d影响,batch_size不可为1,至少为2。** 364 | 365 | **问:为什么提示 RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)? 366 | 答:这是pytorch中出现的,爆显存了,同上。** 367 | 368 | **问:为什么我显存都没利用,就直接爆显存了? 369 | 答:都爆显存了,自然就不利用了,模型没有开始训练。** 370 | 371 | ### c、训练问题(冻结训练,LOSS问题、训练效果问题等) 372 | **问:为什么要冻结训练和解冻训练呀? 
**Q: Why doesn't my network converge? The LOSS is XXXX.
A: The LOSS scale differs between networks. LOSS is only a reference for checking whether the network is converging, not a measure of how good it is. My yolo code does not normalize the loss, so the values look high; the absolute value of the LOSS does not matter. What matters is whether it keeps decreasing and whether the predictions work.**

**Q: Why are my training results poor? Prediction finds no targets and the output is all black.
A:**
**Consider the following:
1. The dataset. This is the most important issue: with fewer than 500 samples, consider collecting more, and always check your labels. The video walks through the VOC dataset format in detail, and it is not enough to have an input image and an output label; you must also confirm that every pixel value in the label equals its class index. The most common mistake is a label with a black background and white targets: the target pixels are then 255 and training cannot work, because the targets need the value 1.
2. Unfreeze training. If your data distribution differs a lot from ordinary images, unfreeze and fine-tune the backbone to strengthen feature extraction.
3. The network. Try different networks.
4. Training time. Some students train only a few epochs and conclude it doesn't work; finish training with the default parameters first.
5. Confirm that you actually followed the steps.
6. The LOSS differs between networks; it is only a reference for convergence, not a quality measure. Its value does not matter; what matters is whether it converges.**

**Q: Why are my training results poor? Predictions on small targets are inaccurate.
A: For deeplab and pspnet you can change downsample_factor: at 16 the downsampling is too aggressive and results suffer; change it to 8.**

**Q: Why do I get a gbk encoding error like this:**
```python
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
```
**A: Do not use Chinese in labels or paths. If you must, handle the encoding explicitly: open the file with encoding='utf-8'.**

**Q: My images have a resolution of xxx*xxx. Can I use them?**
**A: Yes. The code resizes them automatically and applies data augmentation.**

**Q: How do I train on multiple GPUs?
A: Most pytorch code can use multi-gpu training directly; for keras, a quick search will turn it up, and it is not hard to implement. I don't have multiple GPUs so I cannot test it in detail; you will have to work that part out yourselves.**

### d. Grayscale images
**Q: Can it train on (and predict) grayscale images?
A: Most of my repos convert grayscale images to RGB for training and prediction. If some code refuses to train or predict on grayscale, try converting the result of Image.open to RGB inside get_random_data, and try the same at prediction time. (For reference only.)**

### e. Resuming training
**Q: I have already trained for several epochs. Can I continue from there?
A: Yes. Before training, load the previously trained weights exactly the way you would load pretrained weights. Trained weights are usually saved in the logs folder; set model_path to the weight file you want to resume from.**

### f. Pretrained weights

**Q: If I want to train on a different dataset, what do I do about pretrained weights?**
**A: The pretrained weights apply to other datasets as well, because the features they extract are generic. Pretrained weights are needed in 99% of cases; without them the weights are too random, feature extraction is ineffective, and training results will be poor.**

**Q: up, I modified the network. Can I still use the pretrained weights?
A: If you modified the backbone and it is not one of the networks already provided, the pretrained weights are basically unusable. You must either inspect the shapes of the convolution kernels in the weight file and match them yourself, or do your own pretraining. If you only modified the later part of the network, the pretrained weights of the backbone can still be used: in pytorch code you need to change the weight-loading logic so entries are loaded only after checking their shapes; in keras code, simply pass by_name=True, skip_mismatch=True.**
Weight matching can be done like this:

```python
import torch

# Keep only the pretrained entries whose shapes match the current model.
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
matched_dict = {}
for k, v in pretrained_dict.items():
    if k in model_dict and model_dict[k].shape == v.shape:
        matched_dict[k] = v
model_dict.update(matched_dict)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I train without pretrained weights?
A: Just comment out the code that loads them.**

**Q: Why are my results so poor without pretrained weights?
A: Because randomly initialized weights extract poor features, which in turn makes training results poor; the pretrained weights really do matter.**

### g. Video detection and camera detection
**Q: How do I run detection with a camera?
A: Change the parameters in predict.py to enable camera detection; there is also a video explaining the idea of camera detection in detail.**

**Q: How do I run detection on a video?
A: Same as above.**

### h. Training from scratch
**Q: How do I train the model from scratch?
A: With limited compute and limited tuning experience, training from scratch is pointless. With randomly initialized parameters the model's feature-extraction ability is very poor, and without strong tuning skills and compute you will not get the network to converge properly.**
If you insist on starting from scratch, note the following:
- Do not load pretrained weights.
- Do not use freeze training; comment out the code that freezes the model.

**Q: Why are my results so poor without pretrained weights?
A: Because randomly initialized weights extract poor features, which in turn makes training results poor; the pretrained weights really do matter.**

### i. Saving results
**Q: How do I save the detected image?
A: Object detection generally works with PIL's Image, so look up how to save a PIL Image. See the comments in predict.py for details.**

**Q: How do I save video output?
A: See the comments in predict.py for details.**

### j. Iterating over a folder
**Q: How do I run detection over every image in a folder?
A: Usually you list all the images with os.listdir first, then detect each one following the flow in predict.py. See the comments in predict.py for details.**

**Q: How do I run detection over every image in a folder and also save the results?
A: For iteration, list all the images with os.listdir first, then detect each one following the flow in predict.py. For saving, object detection generally works with PIL's Image, so look up how to save a PIL Image; if a repo uses cv2 instead, look up how to save an image with cv2. See the comments in predict.py for details, and the sketch right below.**
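A minimal traverse-and-save sketch, assuming a detector object exposing a detect_image method in the style of this repo's predict.py; the folder names and the efficientdet variable are placeholders:
```python
import os
from PIL import Image

dir_origin_path = "img/"      # placeholder: folder with the input images
dir_save_path   = "img_out/"  # placeholder: folder for the results
os.makedirs(dir_save_path, exist_ok=True)

for img_name in os.listdir(dir_origin_path):
    # Skip anything that is not an image file.
    if not img_name.lower().endswith(('.bmp', '.jpeg', '.jpg', '.png')):
        continue
    image   = Image.open(os.path.join(dir_origin_path, img_name))
    # Assumed interface: a detector built from this repo's efficientdet.py
    # that returns a PIL Image with the boxes drawn on it.
    r_image = efficientdet.detect_image(image)
    r_image.save(os.path.join(dir_save_path, img_name))
```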
### k. Path problems (No such file or directory)
**Q: Why am I getting an error like this:**
```python
FileNotFoundError: [Errno 2] No such file or directory
……………………………………
……………………………………
```

**A: Check the folder path and make sure the file actually exists; also check 2007_train.txt and verify that the file paths inside it are correct.**
A few important points about paths:
**Never put spaces in folder names.
Mind the difference between relative and absolute paths.
Read up on how paths work.**

**Almost all path problems come down to the root directory; make sure you understand relative paths!**

### l. FPS (detection speed)
**Q: What FPS can this reach? Can it reach XX FPS?
A: FPS depends on the machine's hardware: a stronger machine is faster, a weaker one is slower.**

**Q: Why does the paper claim a speed of XX but this repo does not reach it?
A: Check that the gpu version of tensorflow-gpu or pytorch is installed correctly. If it is, use time.time() to measure which part of detect_image takes the longest (the network is not the only cost; post-processing such as drawing boxes also takes time). Some papers also predict with multiple batches, which I have not implemented.**

### m. Predicted image not displayed
**Q: Why doesn't your code display the image after prediction? It only prints the detected objects in the terminal.
A: Install an image viewer on your system.**

### n. Evaluation (miou)
**Q: How do I compute miou?
A: See the miou measurement part of the video.**

**Q: How do I compute the Recall and Precision metrics?
A: The current code cannot produce them; you will need to understand the confusion matrix and compute them yourself.**

### o. Model optimization (model modification)
**Q: up, I modified the network. Can I still use the pretrained weights?
A: If you modified the backbone and it is not one of the networks already provided, the pretrained weights are basically unusable. You must either inspect the shapes of the convolution kernels in the weight file and match them yourself, or do your own pretraining. If you only modified the later part of the network, the pretrained weights of the backbone can still be used: in pytorch code you need to change the weight-loading logic so entries are loaded only after checking their shapes; in keras code, simply pass by_name=True, skip_mismatch=True.**
Weight matching can be done like this:

```python
import torch

# Keep only the pretrained entries whose shapes match the current model.
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
matched_dict = {}
for k, v in pretrained_dict.items():
    if k in model_dict and model_dict[k].shape == v.shape:
        matched_dict[k] = v
model_dict.update(matched_dict)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: up, how do I modify the model? I want to publish a small paper!
A: For object detection I suggest reading the yolov4 paper. As a large-scale tuning exercise it is a very instructive reference and uses many tricks. My advice is to study classic models, pick apart their strong components, and reuse them. Common tricks such as attention mechanisms are worth trying.**

### p. Deployment
I have never deployed to phones or other devices myself, so there are many deployment issues I simply don't know about...

## 5. Chat Groups
**Q: up, is there a QQ group or anything like that?
A: No, there isn't. I don't have the time to manage a QQ group...**

## 6. How to Learn
**Q: up, what was your learning path? I'm a beginner, how should I learn?
A: A few caveats first:
1. I am not an expert; there is plenty I don't know, and my path will not suit everyone.
2. My lab does not do deep learning, so I taught myself most of this by trial and error, and I cannot guarantee it is all correct.
3. Personally I think learning comes down to self-study.**
As for the path: I started with Mofan's python tutorials to get into tensorflow, keras, and pytorch, then learned SSD and YOLO, then studied many classic convolutional networks, and after that began reading lots of different codebases. My method is to read code line by line, understanding the whole execution flow, how the feature-map shapes change, and so on. It took a lot of time and there is no shortcut; you just have to put in the hours.
--------------------------------------------------------------------------------