├── .gitignore ├── LICENSE ├── README.md ├── VOCdevkit └── VOC2007 │ ├── ImageSets │ └── Segmentation │ │ └── README.md │ ├── JPEGImages │ └── README.md │ └── SegmentationClass │ └── README.md ├── datasets ├── JPEGImages │ └── 1.jpg ├── SegmentationClass │ └── 1.png └── before │ ├── 1.jpg │ └── 1.json ├── get_miou.py ├── img └── street.jpg ├── json_to_dataset.py ├── logs └── README.md ├── model_data ├── README.md └── pspnet_mobilenetv2.h5 ├── nets ├── __init__.py ├── mobilenetv2.py ├── pspnet.py ├── pspnet_training.py └── resnet50.py ├── predict.py ├── pspnet.py ├── requirements.txt ├── summary.py ├── train.py ├── utils ├── __init__.py ├── callbacks.py ├── dataloader.py ├── utils.py ├── utils_fit.py └── utils_metrics.py ├── voc_annotation.py └── 常见问题汇总.md /.gitignore: -------------------------------------------------------------------------------- 1 | # ignore map, miou, datasets 2 | map_out/ 3 | miou_out/ 4 | VOCdevkit/ 5 | datasets/ 6 | Medical_Datasets/ 7 | lfw/ 8 | logs/ 9 | model_data/ 10 | .temp_miou_out/ 11 | 12 | # Byte-compiled / optimized / DLL files 13 | __pycache__/ 14 | *.py[cod] 15 | *$py.class 16 | 17 | # C extensions 18 | *.so 19 | 20 | # Distribution / packaging 21 | .Python 22 | build/ 23 | develop-eggs/ 24 | dist/ 25 | downloads/ 26 | eggs/ 27 | .eggs/ 28 | lib/ 29 | lib64/ 30 | parts/ 31 | sdist/ 32 | var/ 33 | wheels/ 34 | pip-wheel-metadata/ 35 | share/python-wheels/ 36 | *.egg-info/ 37 | .installed.cfg 38 | *.egg 39 | MANIFEST 40 | 41 | # PyInstaller 42 | # Usually these files are written by a python script from a template 43 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 44 | *.manifest 45 | *.spec 46 | 47 | # Installer logs 48 | pip-log.txt 49 | pip-delete-this-directory.txt 50 | 51 | # Unit test / coverage reports 52 | htmlcov/ 53 | .tox/ 54 | .nox/ 55 | .coverage 56 | .coverage.* 57 | .cache 58 | nosetests.xml 59 | coverage.xml 60 | *.cover 61 | *.py,cover 62 | .hypothesis/ 63 | .pytest_cache/ 64 | 65 | # Translations 66 | *.mo 67 | *.pot 68 | 69 | # Django stuff: 70 | *.log 71 | local_settings.py 72 | db.sqlite3 73 | db.sqlite3-journal 74 | 75 | # Flask stuff: 76 | instance/ 77 | .webassets-cache 78 | 79 | # Scrapy stuff: 80 | .scrapy 81 | 82 | # Sphinx documentation 83 | docs/_build/ 84 | 85 | # PyBuilder 86 | target/ 87 | 88 | # Jupyter Notebook 89 | .ipynb_checkpoints 90 | 91 | # IPython 92 | profile_default/ 93 | ipython_config.py 94 | 95 | # pyenv 96 | .python-version 97 | 98 | # pipenv 99 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 100 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 101 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 102 | # install all needed dependencies. 103 | #Pipfile.lock 104 | 105 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow
106 | __pypackages__/
107 |
108 | # Celery stuff
109 | celerybeat-schedule
110 | celerybeat.pid
111 |
112 | # SageMath parsed files
113 | *.sage.py
114 |
115 | # Environments
116 | .env
117 | .venv
118 | env/
119 | venv/
120 | ENV/
121 | env.bak/
122 | venv.bak/
123 |
124 | # Spyder project settings
125 | .spyderproject
126 | .spyproject
127 |
128 | # Rope project settings
129 | .ropeproject
130 |
131 | # mkdocs documentation
132 | /site
133 |
134 | # mypy
135 | .mypy_cache/
136 | .dmypy.json
137 | dmypy.json
138 |
139 | # Pyre type checker
140 | .pyre/
141 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Bubbliiiing
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## PSPnet:Pyramid Scene Parsing Network语义分割模型在tensorflow2当中的实现
2 | ---
3 |
4 | ### 目录
5 | 1. [仓库更新 Top News](#仓库更新)
6 | 2. [相关仓库 Related code](#相关仓库)
7 | 3. [性能情况 Performance](#性能情况)
8 | 4. [所需环境 Environment](#所需环境)
9 | 5. [文件下载 Download](#文件下载)
10 | 6. [训练步骤 How2train](#训练步骤)
11 | 7. [预测步骤 How2predict](#预测步骤)
12 | 8. [评估步骤 miou](#评估步骤)
13 | 9.
[参考资料 Reference](#Reference) 14 | 15 | ## Top News 16 | **`2022-04`**:**支持多GPU训练。** 17 | 18 | **`2022-03`**:**进行大幅度更新、支持step、cos学习率下降法、支持adam、sgd优化器选择、支持学习率根据batch_size自适应调整。** 19 | BiliBili视频中的原仓库地址为:https://github.com/bubbliiiing/pspnet-tf2/tree/bilibili 20 | 21 | **`2020-08`**:**创建仓库、支持多backbone、支持数据miou评估、标注数据处理、大量注释等。** 22 | 23 | ## 相关仓库 24 | | 模型 | 路径 | 25 | | :----- | :----- | 26 | Unet | https://github.com/bubbliiiing/unet-tf2 27 | PSPnet | https://github.com/bubbliiiing/pspnet-tf2 28 | deeplabv3+ | https://github.com/bubbliiiing/deeplabv3-plus-tf2 29 | 30 | ### 性能情况 31 | | 训练数据集 | 权值文件名称 | 测试数据集 | 输入图片大小 | mIOU | 32 | | :-----: | :-----: | :------: | :------: | :------: | 33 | | VOC12+SBD | [pspnet_mobilenetv2.h5](https://github.com/bubbliiiing/pspnet-tf2/releases/download/v1.0/pspnet_mobilenetv2.h5) | VOC-Val12 | 473x473| 71.04 | 34 | | VOC12+SBD | [pspnet_resnet50.h5](https://github.com/bubbliiiing/pspnet-tf2/releases/download/v1.0/pspnet_resnet50.h5) | VOC-Val12 | 473x473| 79.92 | 35 | 36 | ### 所需环境 37 | tensorflow-gpu==2.2.0 38 | 39 | ### 文件下载 40 | 训练所需的pspnet_mobilenetv2.h5和pspnet_resnet50.h5可在百度网盘中下载。 41 | 链接: https://pan.baidu.com/s/1-sIjtenHU05JzVIjyFwvxQ 提取码: upft 42 | 43 | VOC拓展数据集的百度网盘如下: 44 | 链接: https://pan.baidu.com/s/1vkk3lMheUm6IjTXznlg7Ng 提取码: 44mk 45 | 46 | ### 训练步骤 47 | #### a、训练voc数据集 48 | 1、将我提供的voc数据集放入VOCdevkit中(无需运行voc_annotation.py)。 49 | 2、在train.py中设置对应参数,默认参数已经对应voc数据集所需要的参数了,所以只要修改backbone和model_path即可。 50 | 3、运行train.py进行训练。 51 | 52 | #### b、训练自己的数据集 53 | 1、本文使用VOC格式进行训练。 54 | 2、训练前将标签文件放在VOCdevkit文件夹下的VOC2007文件夹下的SegmentationClass中。 55 | 3、训练前将图片文件放在VOCdevkit文件夹下的VOC2007文件夹下的JPEGImages中。 56 | 4、在训练前利用voc_annotation.py文件生成对应的txt。 57 | 5、在train.py文件夹下面,选择自己要使用的主干模型和下采样因子。本文提供的主干模型有mobilenet和resnet50。下采样因子可以在8和16中选择。需要注意的是,预训练模型需要和主干模型相对应。 58 | 6、注意修改train.py的num_classes为分类个数+1。 59 | 7、运行train.py即可开始训练。 60 | 61 | ### 预测步骤 62 | #### a、使用预训练权重 63 | 1. 下载完库后解压,如果想用backbone为mobilenet的进行预测,直接运行predict.py就可以了;如果想要利用backbone为resnet50的进行预测,在百度网盘下载pspnet_resnet50.h5,放入model_data,修改pspnet.py的backbone和model_path之后再运行predict.py,输入 64 | ```python 65 | img/street.jpg 66 | ``` 67 | 2. 在predict.py里面进行设置可以进行fps测试和video视频检测。 68 | #### b、使用自己训练的权重 69 | 1. 按照训练步骤训练。 70 | 2. 在pspnet.py文件里面,在如下部分修改model_path和backbone使其对应训练好的文件;**model_path对应logs文件夹下面的权值文件,backbone是所使用的主干特征提取网络**。 71 | ```python 72 | _defaults = { 73 | #-------------------------------------------------------------------# 74 | # model_path指向logs文件夹下的权值文件 75 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 76 | # 验证集损失较低不代表miou较高,仅代表该权值在验证集上泛化性能较好。 77 | #-------------------------------------------------------------------# 78 | "model_path" : 'model_data/pspnet_mobilenetv2.h5', 79 | #----------------------------------------# 80 | # 所需要区分的类的个数+1 81 | #----------------------------------------# 82 | "num_classes" : 21, 83 | #----------------------------------------# 84 | # 所使用的的主干网络:mobilenet、resnet50 85 | #----------------------------------------# 86 | "backbone" : "mobilenet", 87 | #----------------------------------------# 88 | # 输入图片的大小 89 | #----------------------------------------# 90 | "input_shape" : [473, 473], 91 | #----------------------------------------# 92 | # 下采样的倍数,一般可选的为8和16 93 | # 与训练时设置的一样即可 94 | #----------------------------------------# 95 | "downsample_factor" : 16, 96 | #--------------------------------# 97 | # blend参数用于控制是否 98 | # 让识别结果和原图混合 99 | #--------------------------------# 100 | "blend" : True, 101 | } 102 | ``` 103 | 3. 运行predict.py,输入 104 | ```python 105 | img/street.jpg 106 | ``` 107 | 4. 
在predict.py里面进行设置可以进行fps测试和video视频检测。
108 |
109 | ### 评估步骤
110 | 1、设置get_miou.py里面的num_classes为预测的类的数量加1。
111 | 2、设置get_miou.py里面的name_classes为需要去区分的类别。
112 | 3、运行get_miou.py即可获得miou大小。
113 |
114 | ### Reference
115 | https://github.com/ggyyzm/pytorch_segmentation
116 | https://github.com/bonlime/keras-deeplab-v3-plus
117 |
--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/ImageSets/Segmentation/README.md:
--------------------------------------------------------------------------------
1 | 这里面存放的是指向数据集图片文件名称的txt,如voc_annotation.py生成的train.txt和val.txt。
2 |
3 |
--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/JPEGImages/README.md:
--------------------------------------------------------------------------------
1 | 这里面存放的是训练用的图片文件。
2 |
--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/SegmentationClass/README.md:
--------------------------------------------------------------------------------
1 | 这里面存放的是训练用的标签文件,标签为png图片。
2 |
--------------------------------------------------------------------------------
/datasets/JPEGImages/1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/pspnet-tf2/501b5a03cd085c22ffb63362e2c99fc2470d39de/datasets/JPEGImages/1.jpg
--------------------------------------------------------------------------------
/datasets/SegmentationClass/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/pspnet-tf2/501b5a03cd085c22ffb63362e2c99fc2470d39de/datasets/SegmentationClass/1.png
--------------------------------------------------------------------------------
/datasets/before/1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/pspnet-tf2/501b5a03cd085c22ffb63362e2c99fc2470d39de/datasets/before/1.jpg
--------------------------------------------------------------------------------
/get_miou.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | import tensorflow as tf
4 | from PIL import Image
5 | from tqdm import tqdm
6 |
7 | from pspnet import Pspnet
8 | from utils.utils_metrics import compute_mIoU, show_results
9 |
10 | gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
11 | for gpu in gpus:
12 |     tf.config.experimental.set_memory_growth(gpu, True)
13 |
14 | '''
15 | 进行指标评估需要注意以下几点:
16 | 1、该文件生成的图为灰度图,因为值比较小,按照PNG形式的图看是没有显示效果的,所以看到近似全黑的图是正常的。
17 | 2、该文件计算的是验证集的miou,当前该库将测试集当作验证集使用,不单独划分测试集。
18 | '''
19 | if __name__ == "__main__":
20 |     #---------------------------------------------------------------------------#
21 |     #   miou_mode用于指定该文件运行时计算的内容
22 |     #   miou_mode为0代表整个miou计算流程,包括获得预测结果、计算miou。
23 |     #   miou_mode为1代表仅仅获得预测结果。
24 |     #   miou_mode为2代表仅仅计算miou。
25 |     #---------------------------------------------------------------------------#
26 |     miou_mode = 0
27 |     #------------------------------#
28 |     #   分类个数+1、如2+1
29 |     #------------------------------#
30 |     num_classes = 21
31 |     #--------------------------------------------#
32 |     #   区分的种类,和json_to_dataset里面的一样
33 |     #--------------------------------------------#
34 |     name_classes = ["background","aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
35 |     # name_classes =
["_background_","cat","dog"] 36 | #-------------------------------------------------------# 37 | # 指向VOC数据集所在的文件夹 38 | # 默认指向根目录下的VOC数据集 39 | #-------------------------------------------------------# 40 | VOCdevkit_path = 'VOCdevkit' 41 | 42 | image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Segmentation/val.txt"),'r').read().splitlines() 43 | gt_dir = os.path.join(VOCdevkit_path, "VOC2007/SegmentationClass/") 44 | miou_out_path = "miou_out" 45 | pred_dir = os.path.join(miou_out_path, 'detection-results') 46 | 47 | if miou_mode == 0 or miou_mode == 1: 48 | if not os.path.exists(pred_dir): 49 | os.makedirs(pred_dir) 50 | 51 | print("Load model.") 52 | pspnet = Pspnet() 53 | print("Load model done.") 54 | 55 | print("Get predict result.") 56 | for image_id in tqdm(image_ids): 57 | image_path = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg") 58 | image = Image.open(image_path) 59 | image = pspnet.get_miou_png(image) 60 | image.save(os.path.join(pred_dir, image_id + ".png")) 61 | print("Get predict result done.") 62 | 63 | if miou_mode == 0 or miou_mode == 2: 64 | print("Get miou.") 65 | hist, IoUs, PA_Recall, Precision = compute_mIoU(gt_dir, pred_dir, image_ids, num_classes, name_classes) # 执行计算mIoU的函数 66 | print("Get miou done.") 67 | show_results(miou_out_path, hist, IoUs, PA_Recall, Precision, name_classes) -------------------------------------------------------------------------------- /img/street.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bubbliiiing/pspnet-tf2/501b5a03cd085c22ffb63362e2c99fc2470d39de/img/street.jpg -------------------------------------------------------------------------------- /json_to_dataset.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import json 3 | import os 4 | import os.path as osp 5 | 6 | import numpy as np 7 | import PIL.Image 8 | from labelme import utils 9 | 10 | ''' 11 | 制作自己的语义分割数据集需要注意以下几点: 12 | 1、我使用的labelme版本是3.16.7,建议使用该版本的labelme,有些版本的labelme会发生错误, 13 | 具体错误为:Too many dimensions: 3 > 2 14 | 安装方式为命令行pip install labelme==3.16.7 15 | 2、此处生成的标签图是8位彩色图,与视频中看起来的数据集格式不太一样。 16 | 虽然看起来是彩图,但事实上只有8位,此时每个像素点的值就是这个像素点所属的种类。 17 | 所以其实和视频中VOC数据集的格式一样。因此这样制作出来的数据集是可以正常使用的。也是正常的。 18 | ''' 19 | if __name__ == '__main__': 20 | jpgs_path = "datasets/JPEGImages" 21 | pngs_path = "datasets/SegmentationClass" 22 | classes = ["_background_","aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"] 23 | # classes = ["_background_","cat","dog"] 24 | 25 | count = os.listdir("./datasets/before/") 26 | for i in range(0, len(count)): 27 | path = os.path.join("./datasets/before", count[i]) 28 | 29 | if os.path.isfile(path) and path.endswith('json'): 30 | data = json.load(open(path)) 31 | 32 | if data['imageData']: 33 | imageData = data['imageData'] 34 | else: 35 | imagePath = os.path.join(os.path.dirname(path), data['imagePath']) 36 | with open(imagePath, 'rb') as f: 37 | imageData = f.read() 38 | imageData = base64.b64encode(imageData).decode('utf-8') 39 | 40 | img = utils.img_b64_to_arr(imageData) 41 | label_name_to_value = {'_background_': 0} 42 | for shape in data['shapes']: 43 | label_name = shape['label'] 44 | if label_name in label_name_to_value: 45 | label_value = label_name_to_value[label_name] 46 | else: 47 | label_value = len(label_name_to_value) 48 | 
label_name_to_value[label_name] = label_value 49 | 50 | # label_values must be dense 51 | label_values, label_names = [], [] 52 | for ln, lv in sorted(label_name_to_value.items(), key=lambda x: x[1]): 53 | label_values.append(lv) 54 | label_names.append(ln) 55 | assert label_values == list(range(len(label_values))) 56 | 57 | lbl = utils.shapes_to_label(img.shape, data['shapes'], label_name_to_value) 58 | 59 | 60 | PIL.Image.fromarray(img).save(osp.join(jpgs_path, count[i].split(".")[0]+'.jpg')) 61 | 62 | new = np.zeros([np.shape(img)[0],np.shape(img)[1]]) 63 | for name in label_names: 64 | index_json = label_names.index(name) 65 | index_all = classes.index(name) 66 | new = new + index_all*(np.array(lbl) == index_json) 67 | 68 | utils.lblsave(osp.join(pngs_path, count[i].split(".")[0]+'.png'), new) 69 | print('Saved ' + count[i].split(".")[0] + '.jpg and ' + count[i].split(".")[0] + '.png') 70 | -------------------------------------------------------------------------------- /logs/README.md: -------------------------------------------------------------------------------- 1 | 这里面存放的是训练过程中产生的权重。 2 | -------------------------------------------------------------------------------- /model_data/README.md: -------------------------------------------------------------------------------- 1 | 这里面存放的是已经训练好的权重,可通过百度网盘下载。 2 | -------------------------------------------------------------------------------- /model_data/pspnet_mobilenetv2.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bubbliiiing/pspnet-tf2/501b5a03cd085c22ffb63362e2c99fc2470d39de/model_data/pspnet_mobilenetv2.h5 -------------------------------------------------------------------------------- /nets/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /nets/mobilenetv2.py: -------------------------------------------------------------------------------- 1 | from tensorflow.keras.activations import relu 2 | from tensorflow.keras.layers import (Activation, Add, BatchNormalization, 3 | Conv2D, DepthwiseConv2D, Input) 4 | 5 | 6 | def _make_divisible(v, divisor, min_value=None): 7 | if min_value is None: 8 | min_value = divisor 9 | new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) 10 | if new_v < 0.9 * v: 11 | new_v += divisor 12 | return new_v 13 | 14 | def relu6(x): 15 | return relu(x, max_value=6) 16 | 17 | def _inverted_res_block(inputs, expansion, stride, alpha, in_filters, filters, block_id, skip_connection, rate=1): 18 | pointwise_conv_filters = int(filters * alpha) 19 | pointwise_filters = _make_divisible(pointwise_conv_filters, 8) 20 | x = inputs 21 | prefix = 'expanded_conv_{}_'.format(block_id) 22 | 23 | #----------------------------------------------------# 24 | # 利用1x1卷积根据输入进来的通道数进行通道数上升 25 | #----------------------------------------------------# 26 | if block_id: 27 | x = Conv2D(expansion * in_filters, kernel_size=1, padding='same', 28 | use_bias=False, activation=None, 29 | name=prefix + 'expand')(x) 30 | x = BatchNormalization(epsilon=1e-3, momentum=0.999, 31 | name=prefix + 'expand_BN')(x) 32 | x = Activation(relu6, name=prefix + 'expand_relu')(x) 33 | else: 34 | prefix = 'expanded_conv_' 35 | 36 | #----------------------------------------------------# 37 | # 利用深度可分离卷积进行特征提取 38 | #----------------------------------------------------# 39 | x = DepthwiseConv2D(kernel_size=3, strides=stride, activation=None, 40 | 
use_bias=False, padding='same', dilation_rate=(rate, rate), 41 | name=prefix + 'depthwise')(x) 42 | x = BatchNormalization(epsilon=1e-3, momentum=0.999, 43 | name=prefix + 'depthwise_BN')(x) 44 | 45 | x = Activation(relu6, name=prefix + 'depthwise_relu')(x) 46 | 47 | #----------------------------------------------------# 48 | # 利用1x1的卷积进行通道数的下降 49 | #----------------------------------------------------# 50 | x = Conv2D(pointwise_filters, 51 | kernel_size=1, padding='same', use_bias=False, activation=None, 52 | name=prefix + 'project')(x) 53 | x = BatchNormalization(epsilon=1e-3, momentum=0.999, 54 | name=prefix + 'project_BN')(x) 55 | 56 | #----------------------------------------------------# 57 | # 添加残差边 58 | #----------------------------------------------------# 59 | if skip_connection: 60 | return Add(name=prefix + 'add')([inputs, x]) 61 | return x 62 | 63 | def get_mobilenet_encoder(inputs_size, downsample_factor=8): 64 | if downsample_factor == 16: 65 | block4_dilation = 1 66 | block5_dilation = 2 67 | block4_stride = 2 68 | elif downsample_factor == 8: 69 | block4_dilation = 2 70 | block5_dilation = 4 71 | block4_stride = 1 72 | else: 73 | raise ValueError('Unsupported factor - `{}`, Use 8 or 16.'.format(downsample_factor)) 74 | 75 | # 473,473,3 76 | inputs = Input(shape=inputs_size) 77 | 78 | alpha=1.0 79 | first_block_filters = _make_divisible(32 * alpha, 8) 80 | 81 | # 473,473,3 -> 237,237,32 82 | x = Conv2D(first_block_filters, 83 | kernel_size=3, 84 | strides=(2, 2), padding='same', 85 | use_bias=False, name='Conv')(inputs) 86 | x = BatchNormalization(epsilon=1e-3, momentum=0.999, name='Conv_BN')(x) 87 | x = Activation(relu6, name='Conv_Relu6')(x) 88 | 89 | # 237,237,32 -> 237,237,16 90 | x = _inverted_res_block(x, in_filters=32, filters=16, alpha=alpha, stride=1, 91 | expansion=1, block_id=0, skip_connection=False) 92 | 93 | #---------------------------------------------------------------# 94 | # 237,237,16 -> 119,119,24 95 | x = _inverted_res_block(x, in_filters=16, filters=24, alpha=alpha, stride=2, 96 | expansion=6, block_id=1, skip_connection=False) 97 | x = _inverted_res_block(x, in_filters=24, filters=24, alpha=alpha, stride=1, 98 | expansion=6, block_id=2, skip_connection=True) 99 | 100 | #---------------------------------------------------------------# 101 | # 119,119,24 -> 60,60.32 102 | x = _inverted_res_block(x, in_filters=24, filters=32, alpha=alpha, stride=2, 103 | expansion=6, block_id=3, skip_connection=False) 104 | x = _inverted_res_block(x, in_filters=32, filters=32, alpha=alpha, stride=1, 105 | expansion=6, block_id=4, skip_connection=True) 106 | x = _inverted_res_block(x, in_filters=32, filters=32, alpha=alpha, stride=1, 107 | expansion=6, block_id=5, skip_connection=True) 108 | 109 | #---------------------------------------------------------------# 110 | # 60,60,32 -> 30,30.64 111 | x = _inverted_res_block(x, in_filters=32, filters=64, alpha=alpha, stride=block4_stride, 112 | expansion=6, block_id=6, skip_connection=False) 113 | x = _inverted_res_block(x, in_filters=64, filters=64, alpha=alpha, stride=1, rate=block4_dilation, 114 | expansion=6, block_id=7, skip_connection=True) 115 | x = _inverted_res_block(x, in_filters=64, filters=64, alpha=alpha, stride=1, rate=block4_dilation, 116 | expansion=6, block_id=8, skip_connection=True) 117 | x = _inverted_res_block(x, in_filters=64, filters=64, alpha=alpha, stride=1, rate=block4_dilation, 118 | expansion=6, block_id=9, skip_connection=True) 119 | 120 | # 30,30.64 -> 30,30.96 121 | x = _inverted_res_block(x, 
in_filters=64, filters=96, alpha=alpha, stride=1, rate=block4_dilation, 122 | expansion=6, block_id=10, skip_connection=False) 123 | x = _inverted_res_block(x, in_filters=96, filters=96, alpha=alpha, stride=1, rate=block4_dilation, 124 | expansion=6, block_id=11, skip_connection=True) 125 | x = _inverted_res_block(x, in_filters=96, filters=96, alpha=alpha, stride=1, rate=block4_dilation, 126 | expansion=6, block_id=12, skip_connection=True) 127 | # 辅助分支训练 128 | f4 = x 129 | 130 | #---------------------------------------------------------------# 131 | # 30,30.96 -> 30,30,160 -> 30,30,320 132 | x = _inverted_res_block(x, in_filters=96, filters=160, alpha=alpha, stride=1, rate=block4_dilation, 133 | expansion=6, block_id=13, skip_connection=False) 134 | x = _inverted_res_block(x, in_filters=160, filters=160, alpha=alpha, stride=1, rate=block5_dilation, 135 | expansion=6, block_id=14, skip_connection=True) 136 | x = _inverted_res_block(x, in_filters=160, filters=160, alpha=alpha, stride=1, rate=block5_dilation, 137 | expansion=6, block_id=15, skip_connection=True) 138 | 139 | x = _inverted_res_block(x, in_filters=160, filters=320, alpha=alpha, stride=1, rate=block5_dilation, 140 | expansion=6, block_id=16, skip_connection=False) 141 | f5 = x 142 | return inputs, f4, f5 143 | -------------------------------------------------------------------------------- /nets/pspnet.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | import tensorflow.keras.backend as K 4 | from tensorflow.keras.layers import * 5 | from tensorflow.keras.models import * 6 | 7 | from nets.mobilenetv2 import get_mobilenet_encoder 8 | from nets.resnet50 import get_resnet50_encoder 9 | 10 | def pool_block(feats, pool_factor, out_channel): 11 | h = K.int_shape(feats)[1] 12 | w = K.int_shape(feats)[2] 13 | #-----------------------------------------------------# 14 | # 分区域进行平均池化 15 | # strides = [30,30], [15,15], [10,10], [5, 5] 16 | # poolsize = 30/1=30 30/2=15 30/3=10 30/6=5 17 | #-----------------------------------------------------# 18 | pool_size = strides = [int(np.round(float(h)/pool_factor)),int(np.round(float(w)/pool_factor))] 19 | x = AveragePooling2D(pool_size, strides=strides, padding='same')(feats) 20 | 21 | #-----------------------------------------------------# 22 | # 利用1x1卷积进行通道数的调整 23 | #-----------------------------------------------------# 24 | x = Conv2D(out_channel//4, (1 ,1), padding='same', use_bias=False)(x) 25 | x = BatchNormalization()(x) 26 | x = Activation('relu' )(x) 27 | 28 | #-----------------------------------------------------# 29 | # 利用resize扩大特征层面积 30 | #-----------------------------------------------------# 31 | x = Lambda(lambda x: tf.compat.v1.image.resize_images(x, (K.int_shape(feats)[1], K.int_shape(feats)[2]), align_corners=True))(x) 32 | return x 33 | 34 | def pspnet(input_shape, num_classes, backbone='mobilenet', downsample_factor=8, aux_branch=True): 35 | if backbone == "mobilenet": 36 | #----------------------------------# 37 | # 获得两个特征层 38 | # f4为辅助分支 [30,30,96] 39 | # o为主干部分 [30,30,320] 40 | #----------------------------------# 41 | img_input, f4, o = get_mobilenet_encoder(input_shape, downsample_factor=downsample_factor) 42 | out_channel = 320 43 | elif backbone == "resnet50": 44 | img_input, f4, o = get_resnet50_encoder(input_shape, downsample_factor=downsample_factor) 45 | out_channel = 2048 46 | else: 47 | raise ValueError('Unsupported backbone - `{}`, Use mobilenet, resnet50.'.format(backbone)) 
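    #--------------------------------------------------------------#
    #   补充示意(假设输入为473x473、downsample_factor=16,此时进入
    #   PSP模块的特征层约为30x30,下述数值仅为该假设下的推算):
    #   pool_block中 pool_size = strides = round(30 / pool_factor)
    #   pool_factor=1 -> 池化核30x30 -> 输出约1x1
    #   pool_factor=2 -> 池化核15x15 -> 输出约2x2
    #   pool_factor=3 -> 池化核10x10 -> 输出约3x3
    #   pool_factor=6 -> 池化核5x5   -> 输出约6x6
    #--------------------------------------------------------------#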
48 | 49 | #--------------------------------------------------------------# 50 | # PSP模块,分区域进行池化 51 | # 分别分割成1x1的区域,2x2的区域,3x3的区域,6x6的区域 52 | #--------------------------------------------------------------# 53 | pool_factors = [1,2,3,6] 54 | pool_outs = [o] 55 | 56 | for p in pool_factors: 57 | pooled = pool_block(o, p, out_channel) 58 | pool_outs.append(pooled) 59 | 60 | #--------------------------------------------------------------------------------# 61 | # 利用获取到的特征层进行堆叠 62 | # 30, 30, 320 + 30, 30, 80 + 30, 30, 80 + 30, 30, 80 + 30, 30, 80 = 30, 30, 640 63 | #--------------------------------------------------------------------------------# 64 | o = Concatenate()(pool_outs) 65 | 66 | # 30, 30, 640 -> 30, 30, 80 67 | o = Conv2D(out_channel//4, (3,3), padding='same', use_bias=False)(o) 68 | o = BatchNormalization()(o) 69 | o = Activation('relu')(o) 70 | 71 | # 防止过拟合 72 | o = Dropout(0.1)(o) 73 | 74 | #---------------------------------------------------# 75 | # 利用特征获得预测结果 76 | # 30, 30, 80 -> 30, 30, 21 -> 473, 473, 21 77 | #---------------------------------------------------# 78 | o = Conv2D(num_classes,(1,1), padding='same')(o) 79 | o = Lambda(lambda x: tf.compat.v1.image.resize_images(x, (input_shape[1], input_shape[0]), align_corners=True))(o) 80 | 81 | #---------------------------------------------------# 82 | # 获得每一个像素点属于每一个类的概率 83 | #---------------------------------------------------# 84 | o = Activation("softmax", name="main")(o) 85 | 86 | if aux_branch: 87 | # 30, 30, 96 -> 30, 30, 40 88 | f4 = Conv2D(out_channel//8, (3,3), padding='same', use_bias=False)(f4) 89 | f4 = BatchNormalization()(f4) 90 | f4 = Activation('relu')(f4) 91 | f4 = Dropout(0.1)(f4) 92 | #---------------------------------------------------# 93 | # 利用特征获得预测结果 94 | # 30, 30, 40 -> 30, 30, 21 -> 473, 473, 21 95 | #---------------------------------------------------# 96 | f4 = Conv2D(num_classes,(1,1), padding='same')(f4) 97 | f4 = Lambda(lambda x: tf.compat.v1.image.resize_images(x, (input_shape[1], input_shape[0]), align_corners=True))(f4) 98 | 99 | f4 = Activation("softmax", name="aux")(f4) 100 | model = Model(img_input,[f4,o]) 101 | return model 102 | else: 103 | model = Model(img_input,[o]) 104 | return model 105 | 106 | 107 | -------------------------------------------------------------------------------- /nets/pspnet_training.py: -------------------------------------------------------------------------------- 1 | import math 2 | from functools import partial 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | from tensorflow.keras import backend as K 7 | 8 | 9 | def dice_loss_with_CE(cls_weights, beta=1, smooth = 1e-5): 10 | cls_weights = np.reshape(cls_weights, [1, 1, 1, -1]) 11 | def _dice_loss_with_CE(y_true, y_pred): 12 | y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon()) 13 | 14 | CE_loss = - y_true[...,:-1] * K.log(y_pred) * cls_weights 15 | CE_loss = K.mean(K.sum(CE_loss, axis = -1)) 16 | 17 | tp = K.sum(y_true[...,:-1] * y_pred, axis=[0,1,2]) 18 | fp = K.sum(y_pred , axis=[0,1,2]) - tp 19 | fn = K.sum(y_true[...,:-1], axis=[0,1,2]) - tp 20 | 21 | score = ((1 + beta ** 2) * tp + smooth) / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth) 22 | score = tf.reduce_mean(score) 23 | dice_loss = 1 - score 24 | # dice_loss = tf.Print(dice_loss, [dice_loss, CE_loss]) 25 | return CE_loss + dice_loss 26 | return _dice_loss_with_CE 27 | 28 | def CE(cls_weights): 29 | cls_weights = np.reshape(cls_weights, [1, 1, 1, -1]) 30 | def _CE(y_true, y_pred): 31 | y_pred = K.clip(y_pred, K.epsilon(), 1.0 - 
K.epsilon()) 32 | 33 | CE_loss = - y_true[...,:-1] * K.log(y_pred) * cls_weights 34 | CE_loss = K.mean(K.sum(CE_loss, axis = -1)) 35 | # dice_loss = tf.Print(CE_loss, [CE_loss]) 36 | return CE_loss 37 | return _CE 38 | 39 | def dice_loss_with_Focal_Loss(cls_weights, beta=1, smooth = 1e-5, alpha=0.5, gamma=2): 40 | cls_weights = np.reshape(cls_weights, [1, 1, 1, -1]) 41 | def _dice_loss_with_Focal_Loss(y_true, y_pred): 42 | y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon()) 43 | 44 | logpt = - y_true[...,:-1] * K.log(y_pred) * cls_weights 45 | logpt = - K.sum(logpt, axis = -1) 46 | 47 | pt = tf.exp(logpt) 48 | if alpha is not None: 49 | logpt *= alpha 50 | CE_loss = -((1 - pt) ** gamma) * logpt 51 | CE_loss = K.mean(CE_loss) 52 | 53 | tp = K.sum(y_true[...,:-1] * y_pred, axis=[0,1,2]) 54 | fp = K.sum(y_pred , axis=[0,1,2]) - tp 55 | fn = K.sum(y_true[...,:-1], axis=[0,1,2]) - tp 56 | 57 | score = ((1 + beta ** 2) * tp + smooth) / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth) 58 | score = tf.reduce_mean(score) 59 | dice_loss = 1 - score 60 | # dice_loss = tf.Print(dice_loss, [dice_loss, CE_loss]) 61 | return CE_loss + dice_loss 62 | return _dice_loss_with_Focal_Loss 63 | 64 | def Focal_Loss(cls_weights, alpha=0.5, gamma=2): 65 | cls_weights = np.reshape(cls_weights, [1, 1, 1, -1]) 66 | def _Focal_Loss(y_true, y_pred): 67 | y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon()) 68 | 69 | logpt = - y_true[...,:-1] * K.log(y_pred) * cls_weights 70 | logpt = - K.sum(logpt, axis = -1) 71 | 72 | pt = tf.exp(logpt) 73 | if alpha is not None: 74 | logpt *= alpha 75 | CE_loss = -((1 - pt) ** gamma) * logpt 76 | CE_loss = K.mean(CE_loss) 77 | return CE_loss 78 | return _Focal_Loss 79 | 80 | def get_lr_scheduler(lr_decay_type, lr, min_lr, total_iters, warmup_iters_ratio = 0.1, warmup_lr_ratio = 0.1, no_aug_iter_ratio = 0.3, step_num = 10): 81 | def yolox_warm_cos_lr(lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter, iters): 82 | if iters <= warmup_total_iters: 83 | # lr = (lr - warmup_lr_start) * iters / float(warmup_total_iters) + warmup_lr_start 84 | lr = (lr - warmup_lr_start) * pow(iters / float(warmup_total_iters), 2 85 | ) + warmup_lr_start 86 | elif iters >= total_iters - no_aug_iter: 87 | lr = min_lr 88 | else: 89 | lr = min_lr + 0.5 * (lr - min_lr) * ( 90 | 1.0 91 | + math.cos( 92 | math.pi 93 | * (iters - warmup_total_iters) 94 | / (total_iters - warmup_total_iters - no_aug_iter) 95 | ) 96 | ) 97 | return lr 98 | 99 | def step_lr(lr, decay_rate, step_size, iters): 100 | if step_size < 1: 101 | raise ValueError("step_size must above 1.") 102 | n = iters // step_size 103 | out_lr = lr * decay_rate ** n 104 | return out_lr 105 | 106 | if lr_decay_type == "cos": 107 | warmup_total_iters = min(max(warmup_iters_ratio * total_iters, 1), 3) 108 | warmup_lr_start = max(warmup_lr_ratio * lr, 1e-6) 109 | no_aug_iter = min(max(no_aug_iter_ratio * total_iters, 1), 15) 110 | func = partial(yolox_warm_cos_lr ,lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter) 111 | else: 112 | decay_rate = (min_lr / lr) ** (1 / (step_num - 1)) 113 | step_size = total_iters / step_num 114 | func = partial(step_lr, lr, decay_rate, step_size) 115 | 116 | return func 117 | 118 | -------------------------------------------------------------------------------- /nets/resnet50.py: -------------------------------------------------------------------------------- 1 | #-------------------------------------------------------------# 2 | # ResNet50的网络部分 3 | 
#-------------------------------------------------------------# 4 | from __future__ import print_function 5 | 6 | from tensorflow.keras import layers 7 | from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D, 8 | Input, MaxPooling2D, ZeroPadding2D) 9 | 10 | 11 | def identity_block(input_tensor, kernel_size, filters, stage, block, dilation_rate=1): 12 | 13 | filters1, filters2, filters3 = filters 14 | 15 | conv_name_base = 'res' + str(stage) + block + '_branch' 16 | bn_name_base = 'bn' + str(stage) + block + '_branch' 17 | 18 | x = Conv2D(filters1, (1, 1), name=conv_name_base + '2a', use_bias=False)(input_tensor) 19 | x = BatchNormalization(name=bn_name_base + '2a')(x) 20 | x = Activation('relu')(x) 21 | 22 | x = Conv2D(filters2, kernel_size, padding='same', dilation_rate = dilation_rate, name=conv_name_base + '2b', use_bias=False)(x) 23 | x = BatchNormalization(name=bn_name_base + '2b')(x) 24 | x = Activation('relu')(x) 25 | 26 | x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c', use_bias=False)(x) 27 | x = BatchNormalization(name=bn_name_base + '2c')(x) 28 | 29 | x = layers.add([x, input_tensor]) 30 | x = Activation('relu')(x) 31 | return x 32 | 33 | 34 | def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2), dilation_rate=1): 35 | 36 | filters1, filters2, filters3 = filters 37 | 38 | conv_name_base = 'res' + str(stage) + block + '_branch' 39 | bn_name_base = 'bn' + str(stage) + block + '_branch' 40 | 41 | x = Conv2D(filters1, (1, 1), strides=strides, 42 | name=conv_name_base + '2a', use_bias=False)(input_tensor) 43 | x = BatchNormalization(name=bn_name_base + '2a')(x) 44 | x = Activation('relu')(x) 45 | 46 | x = Conv2D(filters2, kernel_size, padding='same', dilation_rate = dilation_rate, 47 | name=conv_name_base + '2b', use_bias=False)(x) 48 | x = BatchNormalization(name=bn_name_base + '2b')(x) 49 | x = Activation('relu')(x) 50 | 51 | x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c', use_bias=False)(x) 52 | x = BatchNormalization(name=bn_name_base + '2c')(x) 53 | 54 | shortcut = Conv2D(filters3, (1, 1), strides=strides, 55 | name=conv_name_base + '1', use_bias=False)(input_tensor) 56 | shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut) 57 | 58 | x = layers.add([x, shortcut]) 59 | x = Activation('relu')(x) 60 | return x 61 | 62 | def get_resnet50_encoder(inputs_size, downsample_factor=8): 63 | if downsample_factor == 16: 64 | block4_dilation = 1 65 | block5_dilation = 2 66 | block4_stride = 2 67 | elif downsample_factor == 8: 68 | block4_dilation = 2 69 | block5_dilation = 4 70 | block4_stride = 1 71 | else: 72 | raise ValueError('Unsupported factor - `{}`, Use 8 or 16.'.format(downsample_factor)) 73 | img_input = Input(shape=inputs_size) 74 | 75 | x = ZeroPadding2D(padding=(1, 1), name='conv1_pad')(img_input) 76 | x = Conv2D(filters=64, kernel_size=(3, 3), strides=(2, 2), name='conv1', use_bias=False)(x) 77 | x = BatchNormalization(axis=-1, name='bn_conv1')(x) 78 | x = Activation('relu')(x) 79 | 80 | x = ZeroPadding2D(padding=(1, 1), name='conv2_pad')(x) 81 | x = Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), name='conv2', use_bias=False)(x) 82 | x = BatchNormalization(axis=-1, name='bn_conv2')(x) 83 | x = Activation(activation='relu')(x) 84 | 85 | x = ZeroPadding2D(padding=(1, 1), name='conv3_pad')(x) 86 | x = Conv2D(filters=128, kernel_size=(3, 3), strides=(1, 1), name='conv3', use_bias=False)(x) 87 | x = BatchNormalization(axis=-1, name='bn_conv3')(x) 88 | x = Activation(activation='relu')(x) 89 | 90 | 
x = ZeroPadding2D(padding=(1, 1), name='pool1_pad')(x) 91 | x = MaxPooling2D((3, 3), strides=(2, 2))(x) 92 | 93 | x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1)) 94 | x = identity_block(x, 3, [64, 64, 256], stage=2, block='b') 95 | x = identity_block(x, 3, [64, 64, 256], stage=2, block='c') 96 | 97 | x = conv_block(x, 3, [128, 128, 512], stage=3, block='a') 98 | x = identity_block(x, 3, [128, 128, 512], stage=3, block='b') 99 | x = identity_block(x, 3, [128, 128, 512], stage=3, block='c') 100 | x = identity_block(x, 3, [128, 128, 512], stage=3, block='d') 101 | 102 | x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', strides=(block4_stride,block4_stride)) 103 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b', dilation_rate=block4_dilation) 104 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c', dilation_rate=block4_dilation) 105 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d', dilation_rate=block4_dilation) 106 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e', dilation_rate=block4_dilation) 107 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f', dilation_rate=block4_dilation) 108 | f4 = x 109 | 110 | x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', strides=(1,1), dilation_rate=block4_dilation) 111 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', dilation_rate=block5_dilation) 112 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', dilation_rate=block5_dilation) 113 | f5 = x 114 | 115 | return img_input, f4, f5 116 | -------------------------------------------------------------------------------- /predict.py: -------------------------------------------------------------------------------- 1 | #-------------------------------------# 2 | # 对单张图片进行预测 3 | #-------------------------------------# 4 | import time 5 | 6 | import cv2 7 | import numpy as np 8 | import tensorflow as tf 9 | from PIL import Image 10 | 11 | from pspnet import Pspnet 12 | 13 | gpus = tf.config.experimental.list_physical_devices(device_type='GPU') 14 | for gpu in gpus: 15 | tf.config.experimental.set_memory_growth(gpu, True) 16 | 17 | if __name__ == "__main__": 18 | #-------------------------------------------------------------------------# 19 | # 如果想要修改对应种类的颜色,到__init__函数里修改self.colors即可 20 | #-------------------------------------------------------------------------# 21 | pspnet = Pspnet() 22 | #----------------------------------------------------------------------------------------------------------# 23 | # mode用于指定测试的模式: 24 | # 'predict' 表示单张图片预测,如果想对预测过程进行修改,如保存图片,截取对象等,可以先看下方详细的注释 25 | # 'video' 表示视频检测,可调用摄像头或者视频进行检测,详情查看下方注释。 26 | # 'fps' 表示测试fps,使用的图片是img里面的street.jpg,详情查看下方注释。 27 | # 'dir_predict' 表示遍历文件夹进行检测并保存。默认遍历img文件夹,保存img_out文件夹,详情查看下方注释。 28 | #----------------------------------------------------------------------------------------------------------# 29 | mode = "predict" 30 | #-------------------------------------------------------------------------# 31 | # count 指定了是否进行目标的像素点计数(即面积)与比例计算 32 | # name_classes 区分的种类,和json_to_dataset里面的一样,用于打印种类和数量 33 | # 34 | # count、name_classes仅在mode='predict'时有效 35 | #-------------------------------------------------------------------------# 36 | count = False 37 | name_classes = ["background","aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"] 38 | # name_classes = ["background","cat","dog"] 
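    #-------------------------------------------------------------------------#
    #   补充示意(非本仓库默认写法,仅供参考):Pspnet()的__init__支持用关键字
    #   参数覆盖pspnet.py中_defaults的配置,例如想在不改动pspnet.py的前提下
    #   切换主干与权值(权值需先按README下载到model_data),可以写成:
    #   pspnet = Pspnet(backbone="resnet50", model_path="model_data/pspnet_resnet50.h5")
    #-------------------------------------------------------------------------#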
39 | #----------------------------------------------------------------------------------------------------------# 40 | # video_path 用于指定视频的路径,当video_path=0时表示检测摄像头 41 | # 想要检测视频,则设置如video_path = "xxx.mp4"即可,代表读取出根目录下的xxx.mp4文件。 42 | # video_save_path 表示视频保存的路径,当video_save_path=""时表示不保存 43 | # 想要保存视频,则设置如video_save_path = "yyy.mp4"即可,代表保存为根目录下的yyy.mp4文件。 44 | # video_fps 用于保存的视频的fps 45 | # 46 | # video_path、video_save_path和video_fps仅在mode='video'时有效 47 | # 保存视频时需要ctrl+c退出或者运行到最后一帧才会完成完整的保存步骤。 48 | #----------------------------------------------------------------------------------------------------------# 49 | video_path = 0 50 | video_save_path = "" 51 | video_fps = 25.0 52 | #----------------------------------------------------------------------------------------------------------# 53 | # test_interval 用于指定测量fps的时候,图片检测的次数。理论上test_interval越大,fps越准确。 54 | # fps_image_path 用于指定测试的fps图片 55 | # 56 | # test_interval和fps_image_path仅在mode='fps'有效 57 | #----------------------------------------------------------------------------------------------------------# 58 | test_interval = 100 59 | fps_image_path = "img/street.jpg" 60 | #-------------------------------------------------------------------------# 61 | # dir_origin_path 指定了用于检测的图片的文件夹路径 62 | # dir_save_path 指定了检测完图片的保存路径 63 | # 64 | # dir_origin_path和dir_save_path仅在mode='dir_predict'时有效 65 | #-------------------------------------------------------------------------# 66 | dir_origin_path = "img/" 67 | dir_save_path = "img_out/" 68 | 69 | if mode == "predict": 70 | ''' 71 | predict.py有几个注意点 72 | 1、该代码无法直接进行批量预测,如果想要批量预测,可以利用os.listdir()遍历文件夹,利用Image.open打开图片文件进行预测。 73 | 具体流程可以参考get_miou_prediction.py,在get_miou_prediction.py即实现了遍历。 74 | 2、如果想要保存,利用r_image.save("img.jpg")即可保存。 75 | 3、如果想要原图和分割图不混合,可以把blend参数设置成False。 76 | 4、如果想根据mask获取对应的区域,可以参考detect_image函数中,利用预测结果绘图的部分,判断每一个像素点的种类,然后根据种类获取对应的部分。 77 | seg_img = np.zeros((np.shape(pr)[0],np.shape(pr)[1],3)) 78 | for c in range(self.num_classes): 79 | seg_img[:, :, 0] += ((pr == c)*( self.colors[c][0] )).astype('uint8') 80 | seg_img[:, :, 1] += ((pr == c)*( self.colors[c][1] )).astype('uint8') 81 | seg_img[:, :, 2] += ((pr == c)*( self.colors[c][2] )).astype('uint8') 82 | ''' 83 | while True: 84 | img = input('Input image filename:') 85 | try: 86 | image = Image.open(img) 87 | except: 88 | print('Open Error! 
Try again!') 89 | continue 90 | else: 91 | r_image = pspnet.detect_image(image, count=count, name_classes=name_classes) 92 | r_image.show() 93 | 94 | elif mode == "video": 95 | capture=cv2.VideoCapture(video_path) 96 | if video_save_path!="": 97 | fourcc = cv2.VideoWriter_fourcc(*'XVID') 98 | size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))) 99 | out = cv2.VideoWriter(video_save_path, fourcc, video_fps, size) 100 | 101 | ref, frame = capture.read() 102 | if not ref: 103 | raise ValueError("未能正确读取摄像头(视频),请注意是否正确安装摄像头(是否正确填写视频路径)。") 104 | 105 | fps = 0.0 106 | while(True): 107 | t1 = time.time() 108 | # 读取某一帧 109 | ref, frame = capture.read() 110 | if not ref: 111 | break 112 | # 格式转变,BGRtoRGB 113 | frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB) 114 | # 转变成Image 115 | frame = Image.fromarray(np.uint8(frame)) 116 | # 进行检测 117 | frame = np.array(pspnet.detect_image(frame)) 118 | # RGBtoBGR满足opencv显示格式 119 | frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR) 120 | 121 | fps = ( fps + (1./(time.time()-t1)) ) / 2 122 | print("fps= %.2f"%(fps)) 123 | frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 124 | 125 | cv2.imshow("video",frame) 126 | c= cv2.waitKey(1) & 0xff 127 | if video_save_path!="": 128 | out.write(frame) 129 | 130 | if c==27: 131 | capture.release() 132 | break 133 | print("Video Detection Done!") 134 | capture.release() 135 | if video_save_path!="": 136 | print("Save processed video to the path :" + video_save_path) 137 | out.release() 138 | cv2.destroyAllWindows() 139 | 140 | elif mode == "fps": 141 | img = Image.open(fps_image_path) 142 | tact_time = pspnet.get_FPS(img, test_interval) 143 | print(str(tact_time) + ' seconds, ' + str(1/tact_time) + 'FPS, @batch_size 1') 144 | 145 | elif mode == "dir_predict": 146 | import os 147 | 148 | from tqdm import tqdm 149 | 150 | img_names = os.listdir(dir_origin_path) 151 | for img_name in tqdm(img_names): 152 | if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')): 153 | image_path = os.path.join(dir_origin_path, img_name) 154 | image = Image.open(image_path) 155 | r_image = pspnet.detect_image(image) 156 | if not os.path.exists(dir_save_path): 157 | os.makedirs(dir_save_path) 158 | r_image.save(os.path.join(dir_save_path, img_name)) 159 | 160 | else: 161 | raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps' or 'dir_predict'.") 162 | -------------------------------------------------------------------------------- /pspnet.py: -------------------------------------------------------------------------------- 1 | import colorsys 2 | import copy 3 | import time 4 | 5 | import cv2 6 | import numpy as np 7 | import tensorflow as tf 8 | from PIL import Image 9 | 10 | from nets.pspnet import pspnet 11 | from utils.utils import cvtColor, preprocess_input, resize_image, show_config 12 | 13 | 14 | #--------------------------------------------# 15 | # 使用自己训练好的模型预测需要修改3个参数 16 | # model_path、backbone和num_classes都需要修改! 
17 | # 如果出现shape不匹配 18 | # 一定要注意训练时的model_path、 19 | # backbone和num_classes数的修改 20 | #--------------------------------------------# 21 | class Pspnet(object): 22 | _defaults = { 23 | #-------------------------------------------------------------------# 24 | # model_path指向logs文件夹下的权值文件 25 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 26 | # 验证集损失较低不代表miou较高,仅代表该权值在验证集上泛化性能较好。 27 | #-------------------------------------------------------------------# 28 | "model_path" : 'model_data/pspnet_mobilenetv2.h5', 29 | #----------------------------------------# 30 | # 所需要区分的类的个数+1 31 | #----------------------------------------# 32 | "num_classes" : 21, 33 | #----------------------------------------# 34 | # 所使用的的主干网络:mobilenet、resnet50 35 | #----------------------------------------# 36 | "backbone" : "mobilenet", 37 | #----------------------------------------# 38 | # 输入图片的大小 39 | #----------------------------------------# 40 | "input_shape" : [473, 473], 41 | #----------------------------------------# 42 | # 下采样的倍数,一般可选的为8和16 43 | # 与训练时设置的一样即可 44 | #----------------------------------------# 45 | "downsample_factor" : 16, 46 | #-------------------------------------------------# 47 | # mix_type参数用于控制检测结果的可视化方式 48 | # 49 | # mix_type = 0的时候代表原图与生成的图进行混合 50 | # mix_type = 1的时候代表仅保留生成的图 51 | # mix_type = 2的时候代表仅扣去背景,仅保留原图中的目标 52 | #-------------------------------------------------# 53 | "mix_type" : 0, 54 | } 55 | 56 | #---------------------------------------------------# 57 | # 初始化PSPNET 58 | #---------------------------------------------------# 59 | def __init__(self, **kwargs): 60 | self.__dict__.update(self._defaults) 61 | for name, value in kwargs.items(): 62 | setattr(self, name, value) 63 | #---------------------------------------------------# 64 | # 画框设置不同的颜色 65 | #---------------------------------------------------# 66 | if self.num_classes <= 21: 67 | self.colors = [ (0, 0, 0), (128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128), (0, 128, 128), 68 | (128, 128, 128), (64, 0, 0), (192, 0, 0), (64, 128, 0), (192, 128, 0), (64, 0, 128), (192, 0, 128), 69 | (64, 128, 128), (192, 128, 128), (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0), (0, 64, 128), 70 | (128, 64, 12)] 71 | else: 72 | hsv_tuples = [(x / self.num_classes, 1., 1.) 
for x in range(self.num_classes)] 73 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 74 | self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors)) 75 | #---------------------------------------------------# 76 | # 获得模型 77 | #---------------------------------------------------# 78 | self.generate() 79 | 80 | show_config(**self._defaults) 81 | 82 | #---------------------------------------------------# 83 | # 载入模型 84 | #---------------------------------------------------# 85 | def generate(self): 86 | #-------------------------------# 87 | # 载入模型与权值 88 | #-------------------------------# 89 | self.model = pspnet([self.input_shape[0], self.input_shape[1], 3], self.num_classes, 90 | downsample_factor=self.downsample_factor, backbone=self.backbone, aux_branch=False) 91 | self.model.load_weights(self.model_path, by_name = True) 92 | print('{} model loaded.'.format(self.model_path)) 93 | 94 | @tf.function 95 | def get_pred(self, image_data): 96 | pr = self.model(image_data, training=False) 97 | return pr 98 | #---------------------------------------------------# 99 | # 检测图片 100 | #---------------------------------------------------# 101 | def detect_image(self, image, count=False, name_classes=None): 102 | #---------------------------------------------------------# 103 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 104 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 105 | #---------------------------------------------------------# 106 | image = cvtColor(image) 107 | #---------------------------------------------------# 108 | # 对输入图像进行一个备份,后面用于绘图 109 | #---------------------------------------------------# 110 | old_img = copy.deepcopy(image) 111 | orininal_h = np.array(image).shape[0] 112 | orininal_w = np.array(image).shape[1] 113 | #---------------------------------------------------------# 114 | # 给图像增加灰条,实现不失真的resize 115 | # 也可以直接resize进行识别 116 | #---------------------------------------------------------# 117 | image_data, nw, nh = resize_image(image, (self.input_shape[1], self.input_shape[0])) 118 | #---------------------------------------------------------# 119 | # 归一化+通道数调整到第一维度+添加上batch_size维度 120 | #---------------------------------------------------------# 121 | image_data = np.expand_dims(preprocess_input(np.array(image_data, np.float32)), 0) 122 | 123 | #---------------------------------------------------# 124 | # 图片传入网络进行预测 125 | #---------------------------------------------------# 126 | pr = self.get_pred(image_data)[0].numpy() 127 | #---------------------------------------------------# 128 | # 将灰条部分截取掉 129 | #---------------------------------------------------# 130 | pr = pr[int((self.input_shape[0] - nh) // 2) : int((self.input_shape[0] - nh) // 2 + nh), \ 131 | int((self.input_shape[1] - nw) // 2) : int((self.input_shape[1] - nw) // 2 + nw)] 132 | #---------------------------------------------------# 133 | # 进行图片的resize 134 | #---------------------------------------------------# 135 | pr = cv2.resize(pr, (orininal_w, orininal_h), interpolation = cv2.INTER_LINEAR) 136 | #---------------------------------------------------# 137 | # 取出每一个像素点的种类 138 | #---------------------------------------------------# 139 | pr = pr.argmax(axis=-1) 140 | 141 | #---------------------------------------------------------# 142 | # 计数 143 | #---------------------------------------------------------# 144 | if count: 145 | classes_nums = np.zeros([self.num_classes]) 146 | total_points_num = orininal_h * orininal_w 147 | print('-' * 63) 148 | print("|%25s | %15s | 
%15s|"%("Key", "Value", "Ratio")) 149 | print('-' * 63) 150 | for i in range(self.num_classes): 151 | num = np.sum(pr == i) 152 | ratio = num / total_points_num * 100 153 | if num > 0: 154 | print("|%25s | %15s | %14.2f%%|"%(str(name_classes[i]), str(num), ratio)) 155 | print('-' * 63) 156 | classes_nums[i] = num 157 | print("classes_nums:", classes_nums) 158 | 159 | if self.mix_type == 0: 160 | # seg_img = np.zeros((np.shape(pr)[0], np.shape(pr)[1], 3)) 161 | # for c in range(self.num_classes): 162 | # seg_img[:, :, 0] += ((pr[:, :] == c ) * self.colors[c][0]).astype('uint8') 163 | # seg_img[:, :, 1] += ((pr[:, :] == c ) * self.colors[c][1]).astype('uint8') 164 | # seg_img[:, :, 2] += ((pr[:, :] == c ) * self.colors[c][2]).astype('uint8') 165 | seg_img = np.reshape(np.array(self.colors, np.uint8)[np.reshape(pr, [-1])], [orininal_h, orininal_w, -1]) 166 | #------------------------------------------------# 167 | # 将新图片转换成Image的形式 168 | #------------------------------------------------# 169 | image = Image.fromarray(np.uint8(seg_img)) 170 | #------------------------------------------------# 171 | # 将新图与原图及进行混合 172 | #------------------------------------------------# 173 | image = Image.blend(old_img, image, 0.7) 174 | 175 | elif self.mix_type == 1: 176 | # seg_img = np.zeros((np.shape(pr)[0], np.shape(pr)[1], 3)) 177 | # for c in range(self.num_classes): 178 | # seg_img[:, :, 0] += ((pr[:, :] == c ) * self.colors[c][0]).astype('uint8') 179 | # seg_img[:, :, 1] += ((pr[:, :] == c ) * self.colors[c][1]).astype('uint8') 180 | # seg_img[:, :, 2] += ((pr[:, :] == c ) * self.colors[c][2]).astype('uint8') 181 | seg_img = np.reshape(np.array(self.colors, np.uint8)[np.reshape(pr, [-1])], [orininal_h, orininal_w, -1]) 182 | #------------------------------------------------# 183 | # 将新图片转换成Image的形式 184 | #------------------------------------------------# 185 | image = Image.fromarray(np.uint8(seg_img)) 186 | 187 | elif self.mix_type == 2: 188 | seg_img = (np.expand_dims(pr != 0, -1) * np.array(old_img, np.float32)).astype('uint8') 189 | #------------------------------------------------# 190 | # 将新图片转换成Image的形式 191 | #------------------------------------------------# 192 | image = Image.fromarray(np.uint8(seg_img)) 193 | 194 | return image 195 | 196 | def get_FPS(self, image, test_interval): 197 | #---------------------------------------------------------# 198 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 199 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 200 | #---------------------------------------------------------# 201 | image = cvtColor(image) 202 | #---------------------------------------------------------# 203 | # 给图像增加灰条,实现不失真的resize 204 | # 也可以直接resize进行识别 205 | #---------------------------------------------------------# 206 | image_data, nw, nh = resize_image(image, (self.input_shape[1], self.input_shape[0])) 207 | #---------------------------------------------------------# 208 | # 归一化+通道数调整到第一维度+添加上batch_size维度 209 | #---------------------------------------------------------# 210 | image_data = np.expand_dims(preprocess_input(np.array(image_data, np.float32)), 0) 211 | 212 | #---------------------------------------------------# 213 | # 图片传入网络进行预测 214 | #---------------------------------------------------# 215 | pr = self.get_pred(image_data)[0].numpy() 216 | #---------------------------------------------------# 217 | # 取出每一个像素点的种类 218 | #---------------------------------------------------# 219 | pr = pr.argmax(axis=-1).reshape([self.input_shape[0],self.input_shape[1]]) 220 | #--------------------------------------# 
221 | # 将灰条部分截取掉 222 | #--------------------------------------# 223 | pr = pr[int((self.input_shape[0] - nh) // 2) : int((self.input_shape[0] - nh) // 2 + nh), \ 224 | int((self.input_shape[1] - nw) // 2) : int((self.input_shape[1] - nw) // 2 + nw)] 225 | 226 | t1 = time.time() 227 | for _ in range(test_interval): 228 | #---------------------------------------------------# 229 | # 图片传入网络进行预测 230 | #---------------------------------------------------# 231 | pr = self.get_pred(image_data)[0].numpy() 232 | #---------------------------------------------------# 233 | # 取出每一个像素点的种类 234 | #---------------------------------------------------# 235 | pr = pr.argmax(axis=-1).reshape([self.input_shape[0],self.input_shape[1]]) 236 | #--------------------------------------# 237 | # 将灰条部分截取掉 238 | #--------------------------------------# 239 | pr = pr[int((self.input_shape[0] - nh) // 2) : int((self.input_shape[0] - nh) // 2 + nh), \ 240 | int((self.input_shape[1] - nw) // 2) : int((self.input_shape[1] - nw) // 2 + nw)] 241 | 242 | t2 = time.time() 243 | tact_time = (t2 - t1) / test_interval 244 | return tact_time 245 | 246 | def get_miou_png(self, image): 247 | #---------------------------------------------------------# 248 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 249 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 250 | #---------------------------------------------------------# 251 | image = cvtColor(image) 252 | orininal_h = np.array(image).shape[0] 253 | orininal_w = np.array(image).shape[1] 254 | #---------------------------------------------------------# 255 | # 给图像增加灰条,实现不失真的resize 256 | # 也可以直接resize进行识别 257 | #---------------------------------------------------------# 258 | image_data, nw, nh = resize_image(image, (self.input_shape[1], self.input_shape[0])) 259 | #---------------------------------------------------------# 260 | # 归一化+通道数调整到第一维度+添加上batch_size维度 261 | #---------------------------------------------------------# 262 | image_data = np.expand_dims(preprocess_input(np.array(image_data, np.float32)), 0) 263 | 264 | #---------------------------------------------------# 265 | # 图片传入网络进行预测 266 | #---------------------------------------------------# 267 | pr = self.get_pred(image_data)[0].numpy() 268 | #--------------------------------------# 269 | # 将灰条部分截取掉 270 | #--------------------------------------# 271 | pr = pr[int((self.input_shape[0] - nh) // 2) : int((self.input_shape[0] - nh) // 2 + nh), \ 272 | int((self.input_shape[1] - nw) // 2) : int((self.input_shape[1] - nw) // 2 + nw)] 273 | #--------------------------------------# 274 | # 进行图片的resize 275 | #--------------------------------------# 276 | pr = cv2.resize(pr, (orininal_w, orininal_h), interpolation = cv2.INTER_LINEAR) 277 | #---------------------------------------------------# 278 | # 取出每一个像素点的种类 279 | #---------------------------------------------------# 280 | pr = pr.argmax(axis=-1) 281 | 282 | image = Image.fromarray(np.uint8(pr)) 283 | return image 284 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | scipy==1.4.1 2 | numpy==1.18.4 3 | matplotlib==3.2.1 4 | opencv_python==4.2.0.34 5 | tensorflow_gpu==2.2.0 6 | tqdm==4.46.1 7 | Pillow==8.2.0 8 | h5py==2.10.0 9 | -------------------------------------------------------------------------------- /summary.py: -------------------------------------------------------------------------------- 1 | #--------------------------------------------# 2 | # 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
scipy==1.4.1
numpy==1.18.4
matplotlib==3.2.1
opencv_python==4.2.0.34
tensorflow_gpu==2.2.0
tqdm==4.46.1
Pillow==8.2.0
h5py==2.10.0
--------------------------------------------------------------------------------
/summary.py:
--------------------------------------------------------------------------------
#--------------------------------------------#
#   This script is only for inspecting the
#   network structure; it is not test code.
#--------------------------------------------#
from nets.pspnet import pspnet
from utils.utils import net_flops

if __name__ == "__main__":
    input_shape     = [512, 512]
    num_classes     = 21
    backbone        = 'mobilenet'

    model = pspnet([input_shape[0], input_shape[1], 3], num_classes, backbone=backbone, downsample_factor=16, aux_branch=False)
    #--------------------------------------------#
    #   Inspect the network structure
    #--------------------------------------------#
    model.summary()
    #--------------------------------------------#
    #   Compute the network's FLOPS
    #--------------------------------------------#
    net_flops(model, table=False)

    #--------------------------------------------#
    #   Get the name and index of every layer
    #--------------------------------------------#
    # for i, layer in enumerate(model.layers):
    #     print(i, layer.name)
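    #--------------------------------------------#
    #   A complementary sketch (not part of the
    #   original script): parameter counts read
    #   through the standard Keras weight lists,
    #   mirroring what model.summary() prints.
    #--------------------------------------------#
    import numpy as np
    from tensorflow.keras import backend as K

    trainable     = int(np.sum([K.count_params(w) for w in model.trainable_weights]))
    non_trainable = int(np.sum([K.count_params(w) for w in model.non_trainable_weights]))
    print('Trainable params: %d, non-trainable params: %d' % (trainable, non_trainable))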
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
import datetime
import os
from functools import partial

import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.callbacks import (EarlyStopping, LearningRateScheduler,
                                        TensorBoard)
from tensorflow.keras.optimizers import SGD, Adam

from nets.pspnet import pspnet
from nets.pspnet_training import (CE, Focal_Loss, dice_loss_with_CE,
                                  dice_loss_with_Focal_Loss, get_lr_scheduler)
from utils.callbacks import EvalCallback, LossHistory, ModelCheckpoint
from utils.dataloader import PSPnetDataset
from utils.utils import show_config
from utils.utils_fit import fit_one_epoch
from utils.utils_metrics import Iou_score, f_score

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

'''
When training your own semantic segmentation model, pay attention to the following:
1. Before training, check carefully that your data meets the format requirements. This repo requires a
   VOC-format dataset, consisting of input images and labels.
   Input images are .jpg files of arbitrary size; they are resized automatically before training.
   Grayscale images are converted to RGB automatically; no manual change is needed.
   If the input images have an extension other than jpg, batch-convert them to jpg before training.

   Labels are png images of arbitrary size; they are also resized automatically before training.
   Many downloaded datasets do not match the required format, so they need extra processing. Be careful:
   the value of each label pixel must be the class that pixel belongs to!
   Datasets found online commonly split the input into two classes with background pixels of 0 and target
   pixels of 255. Such a dataset will run, but it will predict nothing!
   It must be changed so that background pixels are 0 and target pixels are 1; see the sketch below.
   If the format is wrong, see: https://github.com/bubbliiiing/segmentation-format-fix

2. The loss value is used to judge convergence; what matters is the trend, i.e. that the validation loss
   keeps dropping. If the validation loss essentially stops changing, the model has more or less converged.
   The absolute magnitude of the loss means nothing by itself; large or small only reflects how the loss is
   computed, and it does not need to approach 0. To make the loss look nicer, you can divide by 10000 inside
   the corresponding loss function.
   Losses from training are saved in the logs folder, under loss_%Y_%m_%d_%H_%M_%S.

3. Trained weights are saved in the logs folder. Each epoch contains several training steps, and each step
   performs one gradient descent update. Nothing is saved after only a few steps; keep the concepts of
   Epoch and Step distinct.
'''
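#----------------------------------------------------------------------------#
#   For the 0/255 label issue described in point 1 above, a minimal one-off
#   conversion sketch (hypothetical helper; adapt the paths to your dataset):
#
#       import numpy as np
#       from PIL import Image
#
#       def binarize_mask(src_png, dst_png):
#           # Map the common 0/255 mask convention to the 0/1 class indices
#           # this repo expects (each pixel's value == its class id).
#           arr = np.array(Image.open(src_png))
#           Image.fromarray((arr > 127).astype(np.uint8), mode='L').save(dst_png)
#----------------------------------------------------------------------------#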
if __name__ == "__main__":
    #-------------------------------#
    #   Whether to train in eager mode
    #-------------------------------#
    eager           = False
    #---------------------------------------------------------------------#
    #   train_gpu   the GPUs used for training
    #               defaults to the first card; [0, 1] for two cards, [0, 1, 2] for three
    #               with multiple GPUs, the batch on each card is the total batch divided by the number of cards
    #---------------------------------------------------------------------#
    train_gpu       = [0,]
    #-----------------------------------------------------#
    #   num_classes must be changed when training your own dataset:
    #               the number of classes you need + 1, e.g. 2 + 1
    #-----------------------------------------------------#
    num_classes     = 21
    #-------------------------------#
    #   Backbone network choice:
    #   mobilenet, resnet50
    #-------------------------------#
    backbone        = "mobilenet"
    #----------------------------------------------------------------------------------------------------------------------------#
    #   See the README for downloading weight files; they are available from network drives. The model's pretrained weights are
    #   universal across datasets because the features they encode are universal.
    #   The important part of the pretrained weights is the backbone feature-extraction network, used for feature extraction.
    #   Pretrained weights are necessary in 99% of cases; without them the backbone weights are too random, feature extraction
    #   is ineffective, and training results will be poor.
    #   When training your own dataset, dimension-mismatch warnings are normal: the predictions differ, so the dimensions differ.
    #
    #   If training is interrupted, set model_path to a weight file in the logs folder to reload the partially trained weights,
    #   and adjust the frozen/unfrozen stage parameters below to keep the epochs continuous.
    #
    #   When model_path = '', no whole-model weights are loaded.
    #
    #   Whole-model weights are used here, loaded in train.py.
    #   To start from the backbone's pretrained weights, set model_path to the backbone weights; only the backbone is loaded then.
    #   To train from scratch, set model_path = '' and, below, Freeze_Train = False: training starts from zero with no frozen stage.
    #
    #   In general, training from scratch performs badly because the weights are too random and feature extraction is
    #   ineffective, so starting from zero is very, very, very strongly discouraged!
    #   If you must start from zero, look into the imagenet dataset: first train a classification model to obtain backbone
    #   weights (the backbone of a classification model is shared with this model), and train from those.
    #----------------------------------------------------------------------------------------------------------------------------#
    model_path  = "model_data/pspnet_mobilenetv2.h5"
    #---------------------------------------------------------#
    #   downsample_factor   the downsampling factor: 8 or 16
    #                       8 downsamples less and in theory
    #                       works better, but needs more GPU memory
    #---------------------------------------------------------#
    downsample_factor   = 16
    #------------------------------#
    #   Input image size
    #------------------------------#
    input_shape         = [473, 473]

    #----------------------------------------------------------------------------------------------------------------------------#
    #   Training has two stages: a frozen stage and an unfrozen stage. The frozen stage exists for users whose machines lack
    #   performance: frozen training needs less GPU memory, and on a very weak card you can set Freeze_Epoch equal to
    #   UnFreeze_Epoch to do frozen training only.
    #
    #   Some parameter suggestions; adjust flexibly to your needs:
    #   (1) Training from the whole model's pretrained weights:
    #       Adam:
    #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True, optimizer_type = 'adam', Init_lr = 5e-4. (frozen)
    #           Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False, optimizer_type = 'adam', Init_lr = 5e-4. (not frozen)
    #       SGD:
    #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True, optimizer_type = 'sgd', Init_lr = 1e-2. (frozen)
    #           Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False, optimizer_type = 'sgd', Init_lr = 1e-2. (not frozen)
    #       Here UnFreeze_Epoch can be tuned between 100 and 300.
    #   (2) Training from the backbone's pretrained weights:
    #       Adam:
    #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True, optimizer_type = 'adam', Init_lr = 5e-4. (frozen)
    #           Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False, optimizer_type = 'adam', Init_lr = 5e-4. (not frozen)
    #       SGD:
    #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 120, Freeze_Train = True, optimizer_type = 'sgd', Init_lr = 1e-2. (frozen)
    #           Init_Epoch = 0, UnFreeze_Epoch = 120, Freeze_Train = False, optimizer_type = 'sgd', Init_lr = 1e-2. (not frozen)
    #       Because training starts from backbone weights that are not necessarily suited to segmentation, more training is
    #       needed to escape local optima; UnFreeze_Epoch can be tuned between 120 and 300.
    #       Adam converges faster than SGD, so UnFreeze_Epoch can in theory be smaller, but more epochs are still recommended.
    #   (3) Setting batch_size:
    #       As large as the GPU accepts. Running out of memory has nothing to do with dataset size; on OOM
    #       ("CUDA out of memory"), reduce batch_size.
    #       Because of the BatchNorm layers, batch_size must be at least 2, never 1.
    #       Normally Freeze_batch_size should be 1-2x Unfreeze_batch_size. Avoid a large gap, since it affects the
    #       automatic learning-rate adjustment.
    #----------------------------------------------------------------------------------------------------------------------------#
    #------------------------------------------------------------------#
    #   Frozen-stage training parameters.
    #   The backbone is frozen here, so the feature-extraction network
    #   does not change. GPU memory use is small; only the rest of the
    #   network is fine-tuned.
    #   Init_Epoch          the epoch training currently starts from; it may exceed Freeze_Epoch, e.g.:
    #                           Init_Epoch = 60, Freeze_Epoch = 50, UnFreeze_Epoch = 100
    #                       skips the frozen stage, starts at epoch 60, and adjusts the learning rate accordingly.
    #                       (used when resuming from a checkpoint)
    #   Freeze_Epoch        the number of epochs of frozen training
    #                       (ignored when Freeze_Train = False)
    #   Freeze_batch_size   the batch_size for frozen training
    #                       (ignored when Freeze_Train = False)
    #------------------------------------------------------------------#
    Init_Epoch          = 0
    Freeze_Epoch        = 50
    Freeze_batch_size   = 8
    #------------------------------------------------------------------#
    #   Unfrozen-stage training parameters.
    #   The backbone is no longer frozen, so the feature-extraction
    #   network changes. GPU memory use is large; all parameters change.
    #   UnFreeze_Epoch          the total number of training epochs
    #   Unfreeze_batch_size     the batch_size after unfreezing
    #------------------------------------------------------------------#
    UnFreeze_Epoch      = 100
    Unfreeze_batch_size = 4
    #------------------------------------------------------------------#
    #   Freeze_Train    whether to do frozen training
    #                   defaults to freezing the backbone first, then unfreezing.
    #------------------------------------------------------------------#
    Freeze_Train        = True

    #------------------------------------------------------------------#
    #   Other training parameters: learning rate, optimizer, lr decay
    #------------------------------------------------------------------#
    #------------------------------------------------------------------#
    #   Init_lr     the model's maximum learning rate
    #               with Adam, Init_lr = 5e-4 is suggested
    #               with SGD,  Init_lr = 1e-2 is suggested
    #   Min_lr      the model's minimum learning rate; defaults to 0.01 of the maximum
    #------------------------------------------------------------------#
    Init_lr             = 1e-2
    Min_lr              = Init_lr * 0.01
    #------------------------------------------------------------------#
    #   optimizer_type  the optimizer to use: adam or sgd
    #                   with Adam, Init_lr = 5e-4 is suggested
    #                   with SGD,  Init_lr = 1e-2 is suggested
    #   momentum        the momentum parameter used inside the optimizer
    #------------------------------------------------------------------#
    optimizer_type      = "sgd"
    momentum            = 0.9
    #------------------------------------------------------------------#
    #   lr_decay_type   the learning-rate schedule: 'step' or 'cos'
    #------------------------------------------------------------------#
    lr_decay_type       = 'cos'
    #------------------------------------------------------------------#
    #   save_period     save weights once every save_period epochs
    #------------------------------------------------------------------#
    save_period         = 5
    #------------------------------------------------------------------#
    #   save_dir        the folder where weights and logs are saved
    #------------------------------------------------------------------#
    save_dir            = 'logs'
    #------------------------------------------------------------------#
    #   eval_flag       whether to evaluate during training, on the validation set
    #   eval_period     evaluate once every eval_period epochs; frequent
    #                   evaluation is not recommended, since it costs a
    #                   lot of time and makes training very slow
    #                   The mIoU obtained here differs from that of get_miou.py for two reasons:
    #                   (1) it is the mIoU of the validation set.
    #                   (2) conservative evaluation settings are used here to speed evaluation up.
    #------------------------------------------------------------------#
    eval_flag           = True
    eval_period         = 5

    #------------------------------------------------------------------#
    #   VOCdevkit_path  the dataset path
    #------------------------------------------------------------------#
    VOCdevkit_path  = 'VOCdevkit'
    #------------------------------------------------------------------#
    #   Suggested settings:
    #       few classes (several): True
    #       many classes (a dozen or more), batch_size large (above 10): True
    #       many classes (a dozen or more), batch_size small (below 10): False
    #------------------------------------------------------------------#
    dice_loss       = False
    #------------------------------------------------------------------#
    #   Whether to use focal loss against positive/negative sample imbalance
    #------------------------------------------------------------------#
    focal_loss      = False
    #------------------------------------------------------------------#
    #   Whether to give different classes different loss weights; balanced by default.
    #   If set, make it a numpy array whose length equals num_classes, e.g.:
    #       num_classes = 3
    #       cls_weights = np.array([1, 2, 3], np.float32)
    #------------------------------------------------------------------#
    cls_weights     = np.ones([num_classes], np.float32)
    #---------------------------------------------------------------------#
    #   Whether to use the auxiliary branch; it consumes a lot of GPU memory
    #---------------------------------------------------------------------#
    aux_branch      = False
    #-------------------------------------------------------------------#
    #   num_workers     whether to read data with multiple threads; 1 = off
    #                   enabling it speeds up data loading but uses more memory
    #                   enable it only when IO is the bottleneck, i.e. the
    #                   GPU computes much faster than images can be read
    #-------------------------------------------------------------------#
    num_workers         = 1

    #------------------------------------------------------#
    #   Set the GPUs to use
    #------------------------------------------------------#
    os.environ["CUDA_VISIBLE_DEVICES"]  = ','.join(str(x) for x in train_gpu)
    ngpus_per_node                      = len(train_gpu)

    gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    #------------------------------------------------------#
    #   Compare the requested GPU count with the machine's actual GPU count
    #------------------------------------------------------#
    if ngpus_per_node > 1 and ngpus_per_node > len(gpus):
        raise ValueError("The number of GPUs specified for training is more than the GPUs on the machine")

    if ngpus_per_node > 1:
        strategy = tf.distribute.MirroredStrategy()
    else:
        strategy = None
    print('Number of devices: {}'.format(ngpus_per_node))

    if ngpus_per_node > 1:
        with strategy.scope():
            #------------------------------------------------------#
            #   Build the model
            #------------------------------------------------------#
            model = pspnet([input_shape[0], input_shape[1], 3], num_classes, downsample_factor=downsample_factor, backbone=backbone, aux_branch=aux_branch)
            if model_path != '':
                #------------------------------------------------------#
                #   Load the pretrained weights
                #------------------------------------------------------#
                print('Load weights {}.'.format(model_path))
                model.load_weights(model_path, by_name=True, skip_mismatch=True)
    else:
        #------------------------------------------------------#
        #   Build the model
        #------------------------------------------------------#
        model = pspnet([input_shape[0], input_shape[1], 3], num_classes, downsample_factor=downsample_factor, backbone=backbone, aux_branch=aux_branch)
        if model_path != '':
            #------------------------------------------------------#
            #   Load the pretrained weights
            #------------------------------------------------------#
            print('Load weights {}.'.format(model_path))
            model.load_weights(model_path, by_name=True, skip_mismatch=True)

    #--------------------------#
    #   The loss function to use
    #--------------------------#
    if focal_loss:
        if dice_loss:
            loss = dice_loss_with_Focal_Loss(cls_weights)
        else:
            loss = Focal_Loss(cls_weights)
    else:
        if dice_loss:
            loss = dice_loss_with_CE(cls_weights)
        else:
            loss = CE(cls_weights)

    #---------------------------#
    #   Read the dataset's txt files
    #---------------------------#
    with open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Segmentation/train.txt"),"r") as f:
        train_lines = f.readlines()
    with open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Segmentation/val.txt"),"r") as f:
        val_lines   = f.readlines()
    num_train   = len(train_lines)
    num_val     = len(val_lines)
    show_config(
        num_classes = num_classes, backbone = backbone, model_path = model_path, input_shape = input_shape, \
        Init_Epoch = Init_Epoch, Freeze_Epoch = Freeze_Epoch, UnFreeze_Epoch = UnFreeze_Epoch, Freeze_batch_size = Freeze_batch_size, Unfreeze_batch_size = Unfreeze_batch_size, Freeze_Train = Freeze_Train, \
        Init_lr = Init_lr, Min_lr = Min_lr, optimizer_type = optimizer_type, momentum = momentum, lr_decay_type = lr_decay_type, \
        save_period = save_period, save_dir = save_dir, num_workers = num_workers, num_train = num_train, num_val = num_val
    )
    #-----------------------------------------------#
    #   Total training epochs = passes over all data;
    #   total training steps = gradient descents.
    #   The suggested epoch count below only considers the unfrozen part.
    #-----------------------------------------------#
    wanted_step = 1.5e4 if optimizer_type == "sgd" else 0.5e4
    total_step  = num_train // Unfreeze_batch_size * UnFreeze_Epoch
    if total_step <= wanted_step:
        if num_train // Unfreeze_batch_size == 0:
            raise ValueError('The dataset is too small to train on; please enlarge it.')
        wanted_epoch = wanted_step // (num_train // Unfreeze_batch_size) + 1
        print("\n\033[1;33;44m[Warning] With the %s optimizer, a total of at least %d training steps is recommended.\033[0m"%(optimizer_type, wanted_step))
        print("\033[1;33;44m[Warning] This run has %d training samples, Unfreeze_batch_size %d, and %d epochs, giving %d total training steps.\033[0m"%(num_train, Unfreeze_batch_size, UnFreeze_Epoch, total_step))
        print("\033[1;33;44m[Warning] Since the total of %d steps is below the recommended %d, setting the total epochs to %d is suggested.\033[0m"%(total_step, wanted_step, wanted_epoch))

    #------------------------------------------------------#
    #   Backbone features are universal, so frozen training
    #   speeds training up and keeps the weights from being
    #   wrecked early in training.
    #   Init_Epoch is the starting epoch,
    #   Freeze_Epoch the number of frozen epochs,
    #   Epoch the total number of epochs.
    #   On OOM or insufficient GPU memory, reduce Batch_size.
    #------------------------------------------------------#
    if True:
        if Freeze_Train:
            #------------------------------------#
            #   Freeze part of the network for training
            #------------------------------------#
            if backbone=="mobilenet":
                freeze_layers = 146
            else:
                freeze_layers = 172
            for i in range(freeze_layers): model.layers[i].trainable = False
            print('Freeze the first {} layers of total {} layers.'.format(freeze_layers, len(model.layers)))

        #-------------------------------------------------------------------#
        #   Without frozen training, set batch_size directly to Unfreeze_batch_size
        #-------------------------------------------------------------------#
        batch_size  = Freeze_batch_size if Freeze_Train else Unfreeze_batch_size

        #-------------------------------------------------------------------#
        #   Adapt the learning rate to the current batch_size
        #-------------------------------------------------------------------#
        nbs             = 16
        lr_limit_max    = 5e-4 if optimizer_type == 'adam' else 1e-1
        lr_limit_min    = 3e-4 if optimizer_type == 'adam' else 5e-4
        Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
        Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)

        #---------------------------------------#
        #   Get the learning-rate decay function
        #---------------------------------------#
        lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)

        epoch_step      = num_train // batch_size
        epoch_step_val  = num_val // batch_size

        if epoch_step == 0 or epoch_step_val == 0:
            raise ValueError('The dataset is too small to train on; please enlarge it.')

        train_dataloader    = PSPnetDataset(train_lines, input_shape, batch_size, num_classes, aux_branch, True, VOCdevkit_path)
        val_dataloader      = PSPnetDataset(val_lines, input_shape, batch_size, num_classes, aux_branch, False, VOCdevkit_path)

        optimizer = {
            'adam'  : Adam(lr = Init_lr, beta_1 = momentum),
            'sgd'   : SGD(lr = Init_lr, momentum = momentum, nesterov=True)
        }[optimizer_type]
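        #-------------------------------------------------------------------#
        #   The clamped linear-scaling rule above, worked through for the
        #   defaults (sgd, Init_lr = 1e-2, Freeze_batch_size = 8):
        #       Init_lr_fit = min(max(8 / 16 * 1e-2, 5e-4), 1e-1) = 5e-3
        #       Min_lr_fit  = min(max(8 / 16 * 1e-4, 5e-6), 1e-3) = 5e-5
        #   so halving the reference batch of nbs = 16 simply halves the
        #   learning rate, unless one of the clamp limits is hit first.
        #-------------------------------------------------------------------#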
        if eager:
            start_epoch     = Init_Epoch
            end_epoch       = UnFreeze_Epoch
            UnFreeze_flag   = False

            gen     = tf.data.Dataset.from_generator(partial(train_dataloader.generate), (tf.float32, tf.float32))
            gen_val = tf.data.Dataset.from_generator(partial(val_dataloader.generate), (tf.float32, tf.float32))

            gen     = gen.shuffle(buffer_size = batch_size).prefetch(buffer_size = batch_size)
            gen_val = gen_val.shuffle(buffer_size = batch_size).prefetch(buffer_size = batch_size)

            if ngpus_per_node > 1:
                gen     = strategy.experimental_distribute_dataset(gen)
                gen_val = strategy.experimental_distribute_dataset(gen_val)

            time_str        = datetime.datetime.strftime(datetime.datetime.now(),'%Y_%m_%d_%H_%M_%S')
            log_dir         = os.path.join(save_dir, "loss_" + str(time_str))
            loss_history    = LossHistory(log_dir)
            eval_callback   = EvalCallback(model, input_shape, num_classes, val_lines, VOCdevkit_path, log_dir, \
                                            eval_flag=eval_flag, period=eval_period)
            #---------------------------------------#
            #   Start training the model
            #---------------------------------------#
            for epoch in range(start_epoch, end_epoch):
                #---------------------------------------#
                #   If the model has a frozen part,
                #   unfreeze it and set the parameters
                #---------------------------------------#
                if epoch >= Freeze_Epoch and not UnFreeze_flag and Freeze_Train:
                    batch_size = Unfreeze_batch_size

                    #-------------------------------------------------------------------#
                    #   Adapt the learning rate to the current batch_size
                    #-------------------------------------------------------------------#
                    nbs             = 16
                    lr_limit_max    = 5e-4 if optimizer_type == 'adam' else 1e-1
                    lr_limit_min    = 3e-4 if optimizer_type == 'adam' else 5e-4
                    Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
                    Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
                    #---------------------------------------#
                    #   Get the learning-rate decay function
                    #---------------------------------------#
                    lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)

                    for i in range(len(model.layers)):
                        model.layers[i].trainable = True

                    epoch_step      = num_train // batch_size
                    epoch_step_val  = num_val // batch_size

                    if epoch_step == 0 or epoch_step_val == 0:
                        raise ValueError("The dataset is too small to continue training; please enlarge it.")

                    train_dataloader.batch_size    = batch_size
                    val_dataloader.batch_size      = batch_size

                    gen     = tf.data.Dataset.from_generator(partial(train_dataloader.generate), (tf.float32, tf.float32))
                    gen_val = tf.data.Dataset.from_generator(partial(val_dataloader.generate), (tf.float32, tf.float32))

                    gen     = gen.shuffle(buffer_size = batch_size).prefetch(buffer_size = batch_size)
                    gen_val = gen_val.shuffle(buffer_size = batch_size).prefetch(buffer_size = batch_size)

                    if ngpus_per_node > 1:
                        gen     = strategy.experimental_distribute_dataset(gen)
                        gen_val = strategy.experimental_distribute_dataset(gen_val)

                    UnFreeze_flag = True

                lr = lr_scheduler_func(epoch)
                K.set_value(optimizer.lr, lr)

                fit_one_epoch(model, loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val,
                            end_epoch, aux_branch, f_score(), save_period, save_dir, strategy)

                train_dataloader.on_epoch_end()
                val_dataloader.on_epoch_end()
        else:
            start_epoch = Init_Epoch
            end_epoch   = Freeze_Epoch if Freeze_Train else UnFreeze_Epoch
            if ngpus_per_node > 1:
                with strategy.scope():
                    if aux_branch:
                        model.compile(
                            loss            = [loss, loss],
                            loss_weights    = [1, 0.4],
                            optimizer       = optimizer,
                            metrics         = [f_score()],
                        )
                    else:
                        model.compile(
                            loss        = loss,
                            optimizer   = optimizer,
                            metrics     = [f_score()]
                        )
            else:
                if aux_branch:
                    model.compile(
                        loss            = [loss, loss],
                        loss_weights    = [1, 0.4],
                        optimizer       = optimizer,
                        metrics         = [f_score()],
                    )
                else:
                    model.compile(
                        loss        = loss,
                        optimizer   = optimizer,
                        metrics     = [f_score()]
                    )

            #-------------------------------------------------------------------------------#
            #   Training-parameter setup:
            #   logging         sets the tensorboard save location
            #   checkpoint      sets the details of weight saving; period controls how many epochs between saves
            #   lr_scheduler    sets how the learning rate decays
            #   early_stopping  sets early stopping: training ends automatically when val_loss
            #                   stops improving, meaning the model has essentially converged
            #-------------------------------------------------------------------------------#
            time_str        = datetime.datetime.strftime(datetime.datetime.now(),'%Y_%m_%d_%H_%M_%S')
            log_dir         = os.path.join(save_dir, "loss_" + str(time_str))
            logging         = TensorBoard(log_dir)
            loss_history    = LossHistory(log_dir)
            checkpoint      = ModelCheckpoint(os.path.join(save_dir, "ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5"),
                                    monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = save_period)
            checkpoint_last = ModelCheckpoint(os.path.join(save_dir, "last_epoch_weights.h5"),
                                    monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = 1)
            checkpoint_best = ModelCheckpoint(os.path.join(save_dir, "best_epoch_weights.h5"),
                                    monitor = 'val_loss', save_weights_only = True, save_best_only = True, period = 1)
            early_stopping  = EarlyStopping(monitor='val_loss', min_delta = 0, patience = 10, verbose = 1)
            lr_scheduler    = LearningRateScheduler(lr_scheduler_func, verbose = 1)
            eval_callback   = EvalCallback(model, input_shape, num_classes, val_lines, VOCdevkit_path, log_dir, \
                                            eval_flag=eval_flag, period=eval_period)
            callbacks       = [logging, loss_history, checkpoint, checkpoint_last, checkpoint_best, lr_scheduler, eval_callback]

            if start_epoch < end_epoch:
                print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
                model.fit(
                    x                   = train_dataloader,
                    steps_per_epoch     = epoch_step,
                    validation_data     = val_dataloader,
                    validation_steps    = epoch_step_val,
                    epochs              = end_epoch,
                    initial_epoch       = start_epoch,
                    use_multiprocessing = True if num_workers > 1 else False,
                    workers             = num_workers,
                    callbacks           = callbacks
                )
            #---------------------------------------#
            #   If the model has a frozen part,
            #   unfreeze it and set the parameters
            #---------------------------------------#
            if Freeze_Train:
                batch_size  = Unfreeze_batch_size
                start_epoch = Freeze_Epoch if start_epoch < Freeze_Epoch else start_epoch
                end_epoch   = UnFreeze_Epoch

                #-------------------------------------------------------------------#
                #   Adapt the learning rate to the current batch_size
                #-------------------------------------------------------------------#
                nbs             = 16
                lr_limit_max    = 5e-4 if optimizer_type == 'adam' else 1e-1
                lr_limit_min    = 3e-4 if optimizer_type == 'adam' else 5e-4
                Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
                Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
                #---------------------------------------#
                #   Get the learning-rate decay function
                #---------------------------------------#
                lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)
                lr_scheduler    = LearningRateScheduler(lr_scheduler_func, verbose = 1)
                callbacks       = [logging, loss_history, checkpoint, checkpoint_last, checkpoint_best, lr_scheduler, eval_callback]

                for i in range(len(model.layers)):
                    model.layers[i].trainable = True
                if ngpus_per_node > 1:
                    with strategy.scope():
                        if aux_branch:
                            model.compile(
                                loss            = [loss, loss],
                                loss_weights    = [1, 0.4],
                                optimizer       = optimizer,
                                metrics         = [f_score()],
                            )
                        else:
                            model.compile(
                                loss        = loss,
                                optimizer   = optimizer,
                                metrics     = [f_score()]
                            )
                else:
                    if aux_branch:
                        model.compile(
                            loss            = [loss, loss],
                            loss_weights    = [1, 0.4],
                            optimizer       = optimizer,
                            metrics         = [f_score()],
                        )
                    else:
                        model.compile(
                            loss        = loss,
                            optimizer   = optimizer,
                            metrics     = [f_score()]
                        )

                epoch_step      = num_train // batch_size
                epoch_step_val  = num_val // batch_size

                if epoch_step == 0 or epoch_step_val == 0:
                    raise ValueError("The dataset is too small to continue training; please enlarge it.")

                train_dataloader.batch_size    = Unfreeze_batch_size
                val_dataloader.batch_size      = Unfreeze_batch_size

                print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
                model.fit(
                    x                   = train_dataloader,
                    steps_per_epoch     = epoch_step,
                    validation_data     = val_dataloader,
                    validation_steps    = epoch_step_val,
                    epochs              = end_epoch,
                    initial_epoch       = start_epoch,
                    use_multiprocessing = True if num_workers > 1 else False,
                    workers             = num_workers,
                    callbacks           = callbacks
                )
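                #-----------------------------------------------------------#
                #   Putting the resume advice from the comments near the top
                #   of this script together: a hypothetical restart after a
                #   run stopped at epoch 60 only needs these settings changed
                #   (illustrative filename and values):
                #       model_path     = "logs/ep060-loss0.050-val_loss0.080.h5"
                #       Init_Epoch     = 60    # > Freeze_Epoch skips the frozen stage
                #       Freeze_Epoch   = 50
                #       UnFreeze_Epoch = 100
                #-----------------------------------------------------------#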
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
#
--------------------------------------------------------------------------------
/utils/callbacks.py:
--------------------------------------------------------------------------------
import os
import warnings

import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
import scipy.signal

import cv2
import shutil
import numpy as np
import tensorflow as tf

from PIL import Image
from tensorflow import keras
from tensorflow.keras import backend as K
from tqdm import tqdm
from .utils import cvtColor, preprocess_input, resize_image
from .utils_metrics import compute_mIoU


class LossHistory(keras.callbacks.Callback):
    def __init__(self, log_dir):
        self.log_dir    = log_dir
        self.losses     = []
        self.val_loss   = []

        os.makedirs(self.log_dir)

    def on_epoch_end(self, epoch, logs=None):
        # Avoid a mutable default argument; treat a missing logs dict as empty.
        logs = logs or {}
        if not os.path.exists(self.log_dir):
            os.makedirs(self.log_dir)

        self.losses.append(logs.get('loss'))
        self.val_loss.append(logs.get('val_loss'))

        with open(os.path.join(self.log_dir, "epoch_loss.txt"), 'a') as f:
            f.write(str(logs.get('loss')))
            f.write("\n")
        with open(os.path.join(self.log_dir, "epoch_val_loss.txt"), 'a') as f:
            f.write(str(logs.get('val_loss')))
            f.write("\n")
        self.loss_plot()

    def loss_plot(self):
        iters = range(len(self.losses))

        plt.figure()
        plt.plot(iters, self.losses, 'red', linewidth = 2, label='train loss')
        plt.plot(iters, self.val_loss, 'coral', linewidth = 2, label='val loss')
        try:
            if len(self.losses) < 25:
                num = 5
            else:
                num = 15

            plt.plot(iters, scipy.signal.savgol_filter(self.losses, num, 3), 'green', linestyle = '--', linewidth = 2, label='smooth train loss')
            plt.plot(iters, scipy.signal.savgol_filter(self.val_loss, num, 3), '#8B4513', linestyle = '--', linewidth = 2, label='smooth val loss')
        except:
            pass

        plt.grid(True)
        plt.xlabel('Epoch')
        plt.ylabel('Loss')
        plt.title('A Loss Curve')
        plt.legend(loc="upper right")

        plt.savefig(os.path.join(self.log_dir, "epoch_loss.png"))

        plt.cla()
        plt.close("all")
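#---------------------------------------------------------------------------#
#   LossHistory appends one float per line to epoch_loss.txt and
#   epoch_val_loss.txt, so the curve of an interrupted run can be re-plotted
#   offline from those files. A minimal sketch (hypothetical helper, not
#   used by the repo; pass the run's actual loss_... directory):
#---------------------------------------------------------------------------#
def replot_loss(log_dir):
    # One float per line, in epoch order, exactly as LossHistory writes them.
    with open(os.path.join(log_dir, "epoch_loss.txt")) as f:
        losses = [float(x) for x in f]
    with open(os.path.join(log_dir, "epoch_val_loss.txt")) as f:
        val_losses = [float(x) for x in f]
    plt.figure()
    plt.plot(losses, 'red', linewidth = 2, label='train loss')
    plt.plot(val_losses, 'coral', linewidth = 2, label='val loss')
    plt.legend(loc="upper right")
    plt.savefig(os.path.join(log_dir, "epoch_loss_replot.png"))
    plt.close("all")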
class ExponentDecayScheduler(keras.callbacks.Callback):
    def __init__(self, decay_rate, verbose=0):
        super(ExponentDecayScheduler, self).__init__()
        self.decay_rate     = decay_rate
        self.verbose        = verbose
        self.learning_rates = []

    def on_epoch_end(self, batch, logs=None):
        learning_rate = K.get_value(self.model.optimizer.lr) * self.decay_rate
        K.set_value(self.model.optimizer.lr, learning_rate)
        if self.verbose > 0:
            print('\nSetting learning rate to %s.' % (learning_rate))

class EvalCallback(keras.callbacks.Callback):
    def __init__(self, model_body, input_shape, num_classes, image_ids, dataset_path, log_dir,\
            miou_out_path=".temp_miou_out", eval_flag=True, period=1):
        super(EvalCallback, self).__init__()

        self.model_body     = model_body
        self.input_shape    = input_shape
        self.num_classes    = num_classes
        self.image_ids      = image_ids
        self.dataset_path   = dataset_path
        self.log_dir        = log_dir
        self.miou_out_path  = miou_out_path
        self.eval_flag      = eval_flag
        self.period         = period

        self.image_ids      = [image_id.split()[0] for image_id in image_ids]
        self.mious          = [0]
        self.epoches        = [0]
        if self.eval_flag:
            with open(os.path.join(self.log_dir, "epoch_miou.txt"), 'a') as f:
                f.write(str(0))
                f.write("\n")

    @tf.function
    def get_pred(self, image_data):
        pr = self.model_body(image_data, training=False)
        return pr

    def get_miou_png(self, image):
        #---------------------------------------------------------#
        #   Convert the image to RGB here so grayscale images do
        #   not crash prediction. Only RGB prediction is supported;
        #   every other image type is converted to RGB.
        #---------------------------------------------------------#
        image       = cvtColor(image)
        orininal_h  = np.array(image).shape[0]
        orininal_w  = np.array(image).shape[1]
        #---------------------------------------------------------#
        #   Add gray bars for a distortion-free resize.
        #   A plain resize would also work for inference.
        #---------------------------------------------------------#
        image_data, nw, nh  = resize_image(image, (self.input_shape[1], self.input_shape[0]))
        #---------------------------------------------------------#
        #   Normalize and add the batch_size dimension
        #---------------------------------------------------------#
        image_data  = np.expand_dims(preprocess_input(np.array(image_data, np.float32)), 0)

        #---------------------------------------------------#
        #   Feed the image into the network for prediction
        #---------------------------------------------------#
        pr = self.get_pred(image_data)[0].numpy()
        #--------------------------------------#
        #   Crop away the gray-bar padding
        #--------------------------------------#
        pr = pr[int((self.input_shape[0] - nh) // 2) : int((self.input_shape[0] - nh) // 2 + nh), \
                int((self.input_shape[1] - nw) // 2) : int((self.input_shape[1] - nw) // 2 + nw)]
        #--------------------------------------#
        #   Resize back to the original image size
        #--------------------------------------#
        pr = cv2.resize(pr, (orininal_w, orininal_h), interpolation = cv2.INTER_LINEAR)
        #---------------------------------------------------#
        #   Take the predicted class of every pixel
        #---------------------------------------------------#
        pr = pr.argmax(axis=-1)

        image = Image.fromarray(np.uint8(pr))
        return image

    def on_epoch_end(self, epoch, logs=None):
        temp_epoch = epoch + 1
        if temp_epoch % self.period == 0 and self.eval_flag:
            gt_dir      = os.path.join(self.dataset_path, "VOC2007/SegmentationClass/")
            pred_dir    = os.path.join(self.miou_out_path, 'detection-results')
            if not os.path.exists(self.miou_out_path):
                os.makedirs(self.miou_out_path)
            if not os.path.exists(pred_dir):
                os.makedirs(pred_dir)
            print("Get miou.")
            for image_id in tqdm(self.image_ids):
                #-------------------------------#
                #   Read the image from file
                #-------------------------------#
                image_path  = os.path.join(self.dataset_path, "VOC2007/JPEGImages/"+image_id+".jpg")
                image       = Image.open(image_path)
                #------------------------------#
                #   Save the prediction as a png
                #------------------------------#
                image       = self.get_miou_png(image)
                image.save(os.path.join(pred_dir, image_id + ".png"))

            print("Calculate miou.")
            _, IoUs, _, _ = compute_mIoU(gt_dir, pred_dir, self.image_ids, self.num_classes, None)  # run the mIoU computation
            temp_miou = np.nanmean(IoUs) * 100

            self.mious.append(temp_miou)
            self.epoches.append(temp_epoch)

            with open(os.path.join(self.log_dir, "epoch_miou.txt"), 'a') as f:
                f.write(str(temp_miou))
                f.write("\n")

            plt.figure()
            plt.plot(self.epoches, self.mious, 'red', linewidth = 2, label='train miou')

            plt.grid(True)
            plt.xlabel('Epoch')
            plt.ylabel('Miou')
            plt.title('A Miou Curve')
            plt.legend(loc="upper right")

            plt.savefig(os.path.join(self.log_dir, "epoch_miou.png"))
            plt.cla()
            plt.close("all")

            print("Get miou done.")
            shutil.rmtree(self.miou_out_path)
class ModelCheckpoint(keras.callbacks.Callback):
    def __init__(self, filepath, monitor='val_loss', verbose=0,
                 save_best_only=False, save_weights_only=False,
                 mode='auto', period=1):
        super(ModelCheckpoint, self).__init__()
        self.monitor                = monitor
        self.verbose                = verbose
        self.filepath               = filepath
        self.save_best_only         = save_best_only
        self.save_weights_only      = save_weights_only
        self.period                 = period
        self.epochs_since_last_save = 0

        if mode not in ['auto', 'min', 'max']:
            warnings.warn('ModelCheckpoint mode %s is unknown, '
                          'fallback to auto mode.' % (mode),
                          RuntimeWarning)
            mode = 'auto'

        if mode == 'min':
            self.monitor_op = np.less
            self.best = np.Inf
        elif mode == 'max':
            self.monitor_op = np.greater
            self.best = -np.Inf
        else:
            if 'acc' in self.monitor or self.monitor.startswith('fmeasure'):
                self.monitor_op = np.greater
                self.best = -np.Inf
            else:
                self.monitor_op = np.less
                self.best = np.Inf

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.epochs_since_last_save += 1
        if self.epochs_since_last_save >= self.period:
            self.epochs_since_last_save = 0
            filepath = self.filepath.format(epoch=epoch + 1, **logs)
            if self.save_best_only:
                current = logs.get(self.monitor)
                if current is None:
                    warnings.warn('Can save best model only with %s available, '
                                  'skipping.' % (self.monitor), RuntimeWarning)
                else:
                    if self.monitor_op(current, self.best):
                        if self.verbose > 0:
                            print('\nEpoch %05d: %s improved from %0.5f to %0.5f,'
                                  ' saving model to %s'
                                  % (epoch + 1, self.monitor, self.best,
                                     current, filepath))
                        self.best = current
                        if self.save_weights_only:
                            self.model.save_weights(filepath, overwrite=True)
                        else:
                            self.model.save(filepath, overwrite=True)
                    else:
                        if self.verbose > 0:
                            print('\nEpoch %05d: %s did not improve' %
                                  (epoch + 1, self.monitor))
            else:
                if self.verbose > 0:
                    print('\nEpoch %05d: saving model to %s' % (epoch + 1, filepath))
                if self.save_weights_only:
                    self.model.save_weights(filepath, overwrite=True)
                else:
                    self.model.save(filepath, overwrite=True)
--------------------------------------------------------------------------------
/utils/dataloader.py:
--------------------------------------------------------------------------------
import math
import os
from random import shuffle

import cv2
import numpy as np
from PIL import Image
from tensorflow import keras

from utils.utils import cvtColor, preprocess_input


class PSPnetDataset(keras.utils.Sequence):
    def __init__(self, annotation_lines, input_shape, batch_size, num_classes, aux_branch, train, dataset_path):
        self.annotation_lines   = annotation_lines
        self.length             = len(self.annotation_lines)
        self.input_shape        = input_shape
        self.batch_size         = batch_size
        self.num_classes        = num_classes
        self.aux_branch         = aux_branch
        self.train              = train
        self.dataset_path       = dataset_path

    def __len__(self):
        return math.ceil(len(self.annotation_lines) / float(self.batch_size))

    def __getitem__(self, index):
        images  = []
        targets = []
        for i in range(index * self.batch_size, (index + 1) * self.batch_size):
            i       = i % self.length
            name    = self.annotation_lines[i].split()[0]
            #-------------------------------#
            #   Read the image from file
            #-------------------------------#
            jpg = Image.open(os.path.join(os.path.join(self.dataset_path, "VOC2007/JPEGImages"), name + ".jpg"))
            png = Image.open(os.path.join(os.path.join(self.dataset_path, "VOC2007/SegmentationClass"), name + ".png"))
            #-------------------------------#
            #   Data augmentation
            #-------------------------------#
            jpg, png    = self.get_random_data(jpg, png, self.input_shape, random = self.train)
            jpg         = preprocess_input(np.array(jpg, np.float64))
            png         = np.array(png)
            png[png >= self.num_classes] = self.num_classes
            #-------------------------------------------------------#
            #   Convert to one-hot form.
            #   The +1 is needed because some VOC labels have white
            #   border pixels that must be ignored; the extra channel
            #   makes ignoring them easy.
            #-------------------------------------------------------#
            seg_labels  = np.eye(self.num_classes + 1)[png.reshape([-1])]
            # reshape back to (h, w, classes); input_shape is (h, w)
            seg_labels  = seg_labels.reshape((int(self.input_shape[0]), int(self.input_shape[1]), self.num_classes + 1))

            images.append(jpg)
            targets.append(seg_labels)

        images  = np.array(images)
        targets = np.array(targets)
        if self.aux_branch:
            return images, [targets, targets]
        else:
            return images, targets
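    #-----------------------------------------------------------------------#
    #   The clamp-then-one-hot trick above in miniature: with num_classes = 2,
    #   a white-edge pixel (255) is clamped to 2 and lands in the extra last
    #   channel, which the losses and metrics later slice off (y_true[...,:-1]):
    #
    #       png = np.array([[0, 1], [255, 1]])
    #       png[png >= 2] = 2                               # 255 -> ignore class 2
    #       seg = np.eye(3)[png.reshape(-1)].reshape(2, 2, 3)
    #       seg[1, 0]                                       # -> [0., 0., 1.]
    #-----------------------------------------------------------------------#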
    def generate(self):
        i = 0
        while True:
            images  = []
            targets = []
            for b in range(self.batch_size):
                if i==0:
                    np.random.shuffle(self.annotation_lines)
                name = self.annotation_lines[i].split()[0]
                #-------------------------------#
                #   Read the image from file
                #-------------------------------#
                jpg = Image.open(os.path.join(os.path.join(self.dataset_path, "VOC2007/JPEGImages"), name + ".jpg"))
                png = Image.open(os.path.join(os.path.join(self.dataset_path, "VOC2007/SegmentationClass"), name + ".png"))
                #-------------------------------#
                #   Data augmentation
                #-------------------------------#
                jpg, png    = self.get_random_data(jpg, png, self.input_shape, random = self.train)
                jpg         = preprocess_input(np.array(jpg, np.float64))
                png         = np.array(png)
                png[png >= self.num_classes] = self.num_classes
                #-------------------------------------------------------#
                #   Convert to one-hot form.
                #   The +1 is needed because some VOC labels have white
                #   border pixels that must be ignored; the extra channel
                #   makes ignoring them easy.
                #-------------------------------------------------------#
                seg_labels  = np.eye(self.num_classes + 1)[png.reshape([-1])]
                # reshape back to (h, w, classes); input_shape is (h, w)
                seg_labels  = seg_labels.reshape((int(self.input_shape[0]), int(self.input_shape[1]), self.num_classes + 1))

                images.append(jpg)
                targets.append(seg_labels)
                i = (i + 1) % self.length

            images  = np.array(images)
            targets = np.array(targets)
            yield images, targets

    def on_epoch_end(self):
        shuffle(self.annotation_lines)

    def rand(self, a=0, b=1):
        return np.random.rand() * (b - a) + a

    def get_random_data(self, image, label, input_shape, jitter=.3, hue=.1, sat=0.7, val=0.3, random=True):
        image   = cvtColor(image)
        label   = Image.fromarray(np.array(label))
        #------------------------------#
        #   Get the image size and the target size
        #------------------------------#
        iw, ih  = image.size
        h, w    = input_shape

        if not random:
            iw, ih  = image.size
            scale   = min(w/iw, h/ih)
            nw      = int(iw*scale)
            nh      = int(ih*scale)

            image       = image.resize((nw,nh), Image.BICUBIC)
            new_image   = Image.new('RGB', [w, h], (128,128,128))
            new_image.paste(image, ((w-nw)//2, (h-nh)//2))

            label       = label.resize((nw,nh), Image.NEAREST)
            new_label   = Image.new('L', [w, h], (0))
            new_label.paste(label, ((w-nw)//2, (h-nh)//2))
            return new_image, new_label

        #------------------------------------------#
        #   Scale the image and distort its aspect ratio
        #------------------------------------------#
        new_ar = iw/ih * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter)
        scale = self.rand(0.25, 2)
        if new_ar < 1:
            nh = int(scale*h)
            nw = int(nh*new_ar)
        else:
            nw = int(scale*w)
            nh = int(nw/new_ar)
        image = image.resize((nw,nh), Image.BICUBIC)
        label = label.resize((nw,nh), Image.NEAREST)
        #------------------------------------------#
        #   Flip the image
        #------------------------------------------#
        flip = self.rand()<.5
        if flip:
            image = image.transpose(Image.FLIP_LEFT_RIGHT)
            label = label.transpose(Image.FLIP_LEFT_RIGHT)

        #------------------------------------------#
        #   Pad the leftover area with gray bars
        #------------------------------------------#
        dx = int(self.rand(0, w-nw))
        dy = int(self.rand(0, h-nh))
        new_image = Image.new('RGB', (w,h), (128,128,128))
        new_label = Image.new('L', (w,h), (0))
        new_image.paste(image, (dx, dy))
        new_label.paste(label, (dx, dy))
        image = new_image
        label = new_label

        image_data = np.array(image, np.uint8)

        #------------------------------------------#
        #   Gaussian blur
        #------------------------------------------#
        blur = self.rand() < 0.25
        if blur:
            image_data = cv2.GaussianBlur(image_data, (5, 5), 0)

        #------------------------------------------#
        #   Rotation
        #------------------------------------------#
        rotate = self.rand() < 0.25
        if rotate:
            center      = (w // 2, h // 2)
            rotation    = np.random.randint(-10, 11)
            M           = cv2.getRotationMatrix2D(center, -rotation, scale=1)
            image_data  = cv2.warpAffine(image_data, M, (w, h), flags=cv2.INTER_CUBIC, borderValue=(128,128,128))
            label       = cv2.warpAffine(np.array(label, np.uint8), M, (w, h), flags=cv2.INTER_NEAREST, borderValue=(0))

        #---------------------------------#
        #   Color-space augmentation:
        #   compute the HSV gain factors
        #---------------------------------#
        r = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1
        #---------------------------------#
        #   Move the image into HSV space
        #---------------------------------#
        hue, sat, val = cv2.split(cv2.cvtColor(image_data, cv2.COLOR_RGB2HSV))
        dtype = image_data.dtype
        #---------------------------------#
        #   Apply the transform
        #---------------------------------#
        x = np.arange(0, 256, dtype=r.dtype)
        lut_hue = ((x * r[0]) % 180).astype(dtype)
        lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
        lut_val = np.clip(x * r[2], 0, 255).astype(dtype)

        image_data = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)))
        image_data = cv2.cvtColor(image_data, cv2.COLOR_HSV2RGB)

        return image_data, label
--------------------------------------------------------------------------------
/utils/utils.py:
--------------------------------------------------------------------------------
import numpy as np
from PIL import Image

#---------------------------------------------------------#
#   Convert the image to RGB so grayscale images do not
#   crash prediction. Only RGB prediction is supported;
#   every other image type is converted to RGB.
#---------------------------------------------------------#
def cvtColor(image):
    if len(np.shape(image)) == 3 and np.shape(image)[2] == 3:
        return image
    else:
        image = image.convert('RGB')
        return image

#---------------------------------------------------#
#   Resize the input image (letterbox with gray bars)
#---------------------------------------------------#
def resize_image(image, size):
    iw, ih  = image.size
    w, h    = size

    scale   = min(w/iw, h/ih)
    nw      = int(iw*scale)
    nh      = int(ih*scale)

    image       = image.resize((nw,nh), Image.BICUBIC)
    new_image   = Image.new('RGB', size, (128,128,128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))

    return new_image, nw, nh
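#---------------------------------------------------------#
#   resize_image letterboxes onto a 128-gray canvas and
#   returns (nw, nh) so callers can crop the padding back
#   out of the prediction. The crop arithmetic used all
#   over the repo, in isolation (sizes illustrative):
#
#       h, w      = 473, 473       # network input_shape
#       nw, nh    = 473, 354       # e.g. a 4:3 image scaled to fit
#       top, left = (h - nh) // 2, (w - nw) // 2
#       pred      = pred[top : top + nh, left : left + nw]
#---------------------------------------------------------#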
def preprocess_input(image):
    image = image / 255
    return image

def show_config(**kwargs):
    print('Configurations:')
    print('-' * 70)
    print('|%25s | %40s|' % ('keys', 'values'))
    print('-' * 70)
    for key, value in kwargs.items():
        print('|%25s | %40s|' % (str(key), str(value)))
    print('-' * 70)

#-------------------------------------------------------------------------------------------------------------------------------#
#   From https://github.com/ckyrkou/Keras_FLOP_Estimator
#   Fix lots of bugs
#-------------------------------------------------------------------------------------------------------------------------------#
def net_flops(model, table=False, print_result=True):
    if (table == True):
        print("\n")
        print('%25s | %16s | %16s | %16s | %16s | %6s | %6s' % (
            'Layer Name', 'Input Shape', 'Output Shape', 'Kernel Size', 'Filters', 'Strides', 'FLOPS'))
        print('=' * 120)

    #---------------------------------------------------#
    #   Total FLOPs
    #---------------------------------------------------#
    t_flops = 0
    factor  = 1e9

    for l in model.layers:
        try:
            #--------------------------------------#
            #   Initialize the required variables
            #--------------------------------------#
            o_shape, i_shape, strides, ks, filters = ('', '', ''), ('', '', ''), (1, 1), (0, 0), 0
            flops = 0
            #--------------------------------------#
            #   Get the layer's name
            #--------------------------------------#
            name = l.name

            if ('InputLayer' in str(l)):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   Reshape layers
            #--------------------------------------#
            elif ('Reshape' in str(l)):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   Padding layers
            #--------------------------------------#
            elif ('Padding' in str(l)):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   Flatten layers
            #--------------------------------------#
            elif ('Flatten' in str(l)):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   Activation layers
            #--------------------------------------#
            elif 'Activation' in str(l):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   LeakyReLU
            #--------------------------------------#
            elif 'LeakyReLU' in str(l):
                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                    flops += i_shape[0] * i_shape[1] * i_shape[2]

            #--------------------------------------#
            #   Max-pooling layers
            #--------------------------------------#
            elif 'MaxPooling' in str(l):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   Average-pooling layers (non-global)
            #--------------------------------------#
            elif ('AveragePooling' in str(l) and 'Global' not in str(l)):
                strides = l.strides
                ks      = l.pool_size

                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]
                flops += o_shape[0] * o_shape[1] * o_shape[2]

            #--------------------------------------#
            #   Global average-pooling layers
            #--------------------------------------#
            elif ('AveragePooling' in str(l) and 'Global' in str(l)):
                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                flops += (i_shape[0] * i_shape[1] + 1) * i_shape[2]

            #--------------------------------------#
            #   Batch-normalization layers
            #--------------------------------------#
            elif ('BatchNormalization' in str(l)):
                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                temp_flops = 1
                for i in range(len(i_shape)):
                    temp_flops *= i_shape[i]
                temp_flops *= 2

                flops += temp_flops

            #--------------------------------------#
            #   Dense (fully connected) layers
            #--------------------------------------#
            elif ('Dense' in str(l)):
                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                temp_flops = 1
                for i in range(len(o_shape)):
                    temp_flops *= o_shape[i]

                if (i_shape[-1] == None):
                    temp_flops = temp_flops * o_shape[-1]
                else:
                    temp_flops = temp_flops * i_shape[-1]
                flops += temp_flops

            #--------------------------------------#
            #   Ordinary convolution layers
            #--------------------------------------#
            elif ('Conv2D' in str(l) and 'DepthwiseConv2D' not in str(l) and 'SeparableConv2D' not in str(l)):
                strides = l.strides
                ks      = l.kernel_size
                filters = l.filters
                bias    = 1 if l.use_bias else 0

                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                if (filters == None):
                    filters = i_shape[2]
                flops += filters * o_shape[0] * o_shape[1] * (ks[0] * ks[1] * i_shape[2] + bias)

            #--------------------------------------#
            #   Depthwise convolution layers
            #--------------------------------------#
            elif ('Conv2D' in str(l) and 'DepthwiseConv2D' in str(l) and 'SeparableConv2D' not in str(l)):
                strides = l.strides
                ks      = l.kernel_size
                filters = l.filters
                bias    = 1 if l.use_bias else 0

                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                if (filters == None):
                    filters = i_shape[2]
                flops += filters * o_shape[0] * o_shape[1] * (ks[0] * ks[1] + bias)

            #--------------------------------------#
            #   Depthwise-separable convolution layers
            #--------------------------------------#
            elif ('Conv2D' in str(l) and 'DepthwiseConv2D' not in str(l) and 'SeparableConv2D' in str(l)):
                strides = l.strides
                ks      = l.kernel_size
                filters = l.filters
                bias    = 1 if l.use_bias else 0

                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                if (filters == None):
                    filters = i_shape[2]
                flops += i_shape[2] * o_shape[0] * o_shape[1] * (ks[0] * ks[1] + bias) + \
                         filters * o_shape[0] * o_shape[1] * (1 * 1 * i_shape[2] + bias)
            #--------------------------------------#
            #   Nested models
            #--------------------------------------#
            elif 'Model' in str(l):
                flops = net_flops(l, print_result=False)

            t_flops += flops

            if (table == True):
                print('%25s | %16s | %16s | %16s | %16s | %6s | %5.4f' % (
                    name[:25], str(i_shape), str(o_shape), str(ks), str(filters), str(strides), flops))

        except:
            pass

    t_flops = t_flops * 2
    if print_result:
        show_flops = t_flops / factor
        print('Total GFLOPs: %.3fG' % (show_flops))
    return t_flops
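#-------------------------------------------------------------------------------------------------------------------------------#
#   A hand-checkable sanity test for net_flops (a sketch, not used by the repo):
#   a single 3x3 conv with bias and 8 filters on a 32x32x3 input costs
#   8*32*32*(3*3*3 + 1) = 229,376 multiply-accumulates, which the final *2
#   above turns into 458,752 FLOPs.
#-------------------------------------------------------------------------------------------------------------------------------#
if __name__ == "__main__":
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input((32, 32, 3)),
        tf.keras.layers.Conv2D(8, 3, padding='same', use_bias=True),
    ])
    net_flops(model, table=True)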
--------------------------------------------------------------------------------
/utils/utils_fit.py:
--------------------------------------------------------------------------------
import os

import tensorflow as tf
from tqdm import tqdm


def get_train_step_fn(strategy):
    @tf.function
    def train_step(images, labels, net, optimizer, loss, aux_branch, metrics):
        with tf.GradientTape() as tape:
            prediction = net(images, training=True)
            if aux_branch:
                aux_loss    = loss(labels, prediction[0])
                main_loss   = loss(labels, prediction[1])
                loss_value  = 0.4 * aux_loss + main_loss
            else:
                loss_value  = loss(labels, prediction)
        grads = tape.gradient(loss_value, net.trainable_variables)
        optimizer.apply_gradients(zip(grads, net.trainable_variables))

        if aux_branch:
            _f_score = tf.reduce_mean(metrics(labels, prediction[1]))
        else:
            _f_score = tf.reduce_mean(metrics(labels, prediction))
        return loss_value, _f_score
    if strategy == None:
        return train_step
    else:
        #----------------------#
        #   Multi-GPU training
        #----------------------#
        @tf.function
        def distributed_train_step(images, labels, net, optimizer, loss, aux_branch, metrics):
            per_replica_losses, per_replica_score = strategy.run(train_step, args=(images, labels, net, optimizer, loss, aux_branch, metrics))
            return strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_losses, axis=None), strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_score, axis=None)
        return distributed_train_step

#----------------------#
#   Guard against bugs
#----------------------#
def get_val_step_fn(strategy):
    @tf.function
    def val_step(images, labels, net, optimizer, loss, aux_branch, metrics):
        prediction = net(images, training=False)
        if aux_branch:
            aux_loss    = loss(labels, prediction[0])
            main_loss   = loss(labels, prediction[1])
            loss_value  = 0.4 * aux_loss + main_loss
            _f_score    = tf.reduce_mean(metrics(labels, prediction[1]))
        else:
            loss_value  = loss(labels, prediction)
            _f_score    = tf.reduce_mean(metrics(labels, prediction))

        return loss_value, _f_score
    if strategy == None:
        return val_step
    else:
        #----------------------#
        #   Multi-GPU validation
        #----------------------#
        @tf.function
        def distributed_val_step(images, labels, net, optimizer, loss, aux_branch, metrics):
            per_replica_losses, per_replica_score = strategy.run(val_step, args=(images, labels, net, optimizer, loss, aux_branch, metrics))
            return strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_losses, axis=None), strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_score, axis=None)
        return distributed_val_step

def fit_one_epoch(net, loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, Epoch, aux_branch, metrics, save_period, save_dir, strategy):
    train_step  = get_train_step_fn(strategy)
    val_step    = get_val_step_fn(strategy)

    total_loss      = 0
    val_loss        = 0
    total_f_score   = 0
    val_f_score     = 0
    print('Start Train')
--------------------------------------------------------------------------------
/utils/utils_metrics.py:
--------------------------------------------------------------------------------
import csv
import os
from os.path import join

import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras import backend
from PIL import Image


def Iou_score(smooth = 1e-5, threshold = 0.5):
    def _Iou_score(y_true, y_pred):
        # score calculation
        y_pred = backend.greater(y_pred, threshold)
        y_pred = backend.cast(y_pred, backend.floatx())
        intersection = backend.sum(y_true[..., :-1] * y_pred, axis=[0, 1, 2])
        union = backend.sum(y_true[..., :-1] + y_pred, axis=[0, 1, 2]) - intersection

        score = (intersection + smooth) / (union + smooth)
        return score
    return _Iou_score

def f_score(beta=1, smooth = 1e-5, threshold = 0.5):
    def _f_score(y_true, y_pred):
        y_pred = backend.greater(y_pred, threshold)
        y_pred = backend.cast(y_pred, backend.floatx())

        tp = backend.sum(y_true[..., :-1] * y_pred, axis=[0, 1, 2])
        fp = backend.sum(y_pred, axis=[0, 1, 2]) - tp
        fn = backend.sum(y_true[..., :-1], axis=[0, 1, 2]) - tp

        score = ((1 + beta ** 2) * tp + smooth) \
                / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth)
        return score
    return _f_score

# Let the labels have width W and height H
def fast_hist(a, b, n):
    #--------------------------------------------------------------------------------#
    #   a: the label flattened to 1-D, shape (H×W,); b: the prediction flattened to 1-D, shape (H×W,)
    #--------------------------------------------------------------------------------#
    k = (a >= 0) & (a < n)
    #--------------------------------------------------------------------------------#
    #   np.bincount counts how often each of the n**2 values from 0 to n**2-1 occurs;
    #   reshaped, the result has shape (n, n). The entries on its main diagonal are
    #   the correctly classified pixels.
    #--------------------------------------------------------------------------------#
    return np.bincount(n * a[k].astype(int) + b[k], minlength=n ** 2).reshape(n, n)

def per_class_iu(hist):
    return np.diag(hist) / np.maximum((hist.sum(1) + hist.sum(0) - np.diag(hist)), 1)

def per_class_PA_Recall(hist):
    return np.diag(hist) / np.maximum(hist.sum(1), 1)

def per_class_Precision(hist):
    return np.diag(hist) / np.maximum(hist.sum(0), 1)

def per_Accuracy(hist):
    return np.sum(np.diag(hist)) / np.maximum(np.sum(hist), 1)

def compute_mIoU(gt_dir, pred_dir, png_name_list, num_classes, name_classes=None):
    print('Num classes', num_classes)
    #-----------------------------------------#
    #   Create an all-zero confusion matrix
    #-----------------------------------------#
    hist = np.zeros((num_classes, num_classes))

    #------------------------------------------------#
    #   Build the list of validation-set label paths
    #   and the list of segmentation-result paths
    #   so they can be read directly
    #------------------------------------------------#
    gt_imgs = [join(gt_dir, x + ".png") for x in png_name_list]
    pred_imgs = [join(pred_dir, x + ".png") for x in png_name_list]

    #------------------------------------------------#
    #   Read every (prediction, label) pair
    #------------------------------------------------#
    for ind in range(len(gt_imgs)):
        #------------------------------------------------#
        #   Read one segmentation result as a numpy array
        #------------------------------------------------#
        pred = np.array(Image.open(pred_imgs[ind]))
        #------------------------------------------------#
        #   Read the corresponding label as a numpy array
        #------------------------------------------------#
        label = np.array(Image.open(gt_imgs[ind]))

        # Skip this image if the prediction and the label differ in size
        if len(label.flatten()) != len(pred.flatten()):
            print(
                'Skipping: len(gt) = {:d}, len(pred) = {:d}, {:s}, {:s}'.format(
                    len(label.flatten()), len(pred.flatten()), gt_imgs[ind],
                    pred_imgs[ind]))
            continue

        #------------------------------------------------#
        #   Compute the num_classes × num_classes hist
        #   matrix for this image and accumulate it
        #------------------------------------------------#
        hist += fast_hist(label.flatten(), pred.flatten(), num_classes)
        # Every 10 images, print the running mIoU over all classes so far
        if name_classes is not None and ind > 0 and ind % 10 == 0:
            print('{:d} / {:d}: mIou-{:0.2f}%; mPA-{:0.2f}%; Accuracy-{:0.2f}%'.format(
                    ind,
                    len(gt_imgs),
                    100 * np.nanmean(per_class_iu(hist)),
                    100 * np.nanmean(per_class_PA_Recall(hist)),
                    100 * per_Accuracy(hist)
                )
            )
    #------------------------------------------------#
    #   Compute the per-class IoU over all validation images
    #------------------------------------------------#
    IoUs = per_class_iu(hist)
    PA_Recall = per_class_PA_Recall(hist)
    Precision = per_class_Precision(hist)
    #------------------------------------------------#
    #   Print the per-class IoU values
    #------------------------------------------------#
    if name_classes is not None:
        for ind_class in range(num_classes):
            print('===>' + name_classes[ind_class] + ':\tIou-' + str(round(IoUs[ind_class] * 100, 2)) \
                + '; Recall (equal to the PA)-' + str(round(PA_Recall[ind_class] * 100, 2)) + '; Precision-' + str(round(Precision[ind_class] * 100, 2)))

    #-----------------------------------------------------------------#
    #   Average over all classes (ignoring NaN) to get the mIoU
    #   of the whole validation set
    #-----------------------------------------------------------------#
    print('===> mIoU: ' + str(round(np.nanmean(IoUs) * 100, 2)) + '; mPA: ' + str(round(np.nanmean(PA_Recall) * 100, 2)) + '; Accuracy: ' + str(round(per_Accuracy(hist) * 100, 2)))
    return np.array(hist, np.int64), IoUs, PA_Recall, Precision

def adjust_axes(r, t, fig, axes):
    bb = t.get_window_extent(renderer=r)
    text_width_inches = bb.width / fig.dpi
    current_fig_width = fig.get_figwidth()
    new_fig_width = current_fig_width + text_width_inches
    proportion = new_fig_width / current_fig_width
    x_lim = axes.get_xlim()
    axes.set_xlim([x_lim[0], x_lim[1] * proportion])

def draw_plot_func(values, name_classes, plot_title, x_label, output_path, tick_font_size = 12, plt_show = True):
    fig = plt.gcf()
    axes = plt.gca()
    plt.barh(range(len(values)), values, color='royalblue')
    plt.title(plot_title, fontsize=tick_font_size + 2)
    plt.xlabel(x_label, fontsize=tick_font_size)
    plt.yticks(range(len(values)), name_classes, fontsize=tick_font_size)
    r = fig.canvas.get_renderer()
    for i, val in enumerate(values):
        str_val = " " + str(val)
        if val < 1.0:
            str_val = " {0:.2f}".format(val)
        t = plt.text(val, i, str_val, color='royalblue', va='center', fontweight='bold')
        if i == (len(values) - 1):
            adjust_axes(r, t, fig, axes)

    fig.tight_layout()
    fig.savefig(output_path)
    if plt_show:
        plt.show()
    plt.close()

def show_results(miou_out_path, hist, IoUs, PA_Recall, Precision, name_classes, tick_font_size = 12):
    draw_plot_func(IoUs, name_classes, "mIoU = {0:.2f}%".format(np.nanmean(IoUs) * 100), "Intersection over Union", \
        os.path.join(miou_out_path, "mIoU.png"), tick_font_size = tick_font_size, plt_show = True)
    print("Save mIoU out to " + os.path.join(miou_out_path, "mIoU.png"))

    draw_plot_func(PA_Recall, name_classes, "mPA = {0:.2f}%".format(np.nanmean(PA_Recall) * 100), "Pixel Accuracy", \
        os.path.join(miou_out_path, "mPA.png"), tick_font_size = tick_font_size, plt_show = False)
    print("Save mPA out to " + os.path.join(miou_out_path, "mPA.png"))

    draw_plot_func(PA_Recall, name_classes, "mRecall = {0:.2f}%".format(np.nanmean(PA_Recall) * 100), "Recall", \
        os.path.join(miou_out_path, "Recall.png"), tick_font_size = tick_font_size, plt_show = False)
    print("Save Recall out to " + os.path.join(miou_out_path, "Recall.png"))

    draw_plot_func(Precision, name_classes, "mPrecision = {0:.2f}%".format(np.nanmean(Precision) * 100), "Precision", \
        os.path.join(miou_out_path, "Precision.png"), tick_font_size = tick_font_size, plt_show = False)
    print("Save Precision out to " + os.path.join(miou_out_path, "Precision.png"))

    with open(os.path.join(miou_out_path, "confusion_matrix.csv"), 'w', newline='') as f:
        writer = csv.writer(f)
        writer_list = []
        writer_list.append([' '] + [str(c) for c in name_classes])
        for i in range(len(hist)):
            writer_list.append([name_classes[i]] + [str(x) for x in hist[i]])
        writer.writerows(writer_list)
    print("Save confusion_matrix out to " + os.path.join(miou_out_path, "confusion_matrix.csv"))
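A tiny worked example (values invented for illustration) of how fast_hist and the per-class helpers just defined fit together:

```python
import numpy as np

# Two classes (0 = background, 1 = target), four pixels, one of them mislabelled.
label = np.array([0, 0, 1, 1])
pred  = np.array([0, 1, 1, 1])

hist = fast_hist(label, pred, 2)
# hist = [[1, 1],
#         [0, 2]]  -> rows are ground truth, columns are predictions
print(per_class_iu(hist))         # [0.5, 0.667]: per-class IoU
print(per_class_PA_Recall(hist))  # [0.5, 1.0]  : per-class recall (PA)
```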
--------------------------------------------------------------------------------
/voc_annotation.py:
--------------------------------------------------------------------------------
import os
import random

import numpy as np
from PIL import Image
from tqdm import tqdm

#-------------------------------------------------------#
#   Modify trainval_percent to carve out a test set.
#   Modify train_percent to change the train/validation
#   ratio, 9:1 by default.
#
#   This repo currently uses the test set as the
#   validation set and does not split off a separate
#   test set.
#-------------------------------------------------------#
trainval_percent = 1
train_percent = 0.9
#-------------------------------------------------------#
#   Points to the folder containing the VOC dataset.
#   Defaults to the VOC dataset under the repo root.
#-------------------------------------------------------#
VOCdevkit_path = 'VOCdevkit'

if __name__ == "__main__":
    random.seed(0)
    print("Generate txt in ImageSets.")
    segfilepath = os.path.join(VOCdevkit_path, 'VOC2007/SegmentationClass')
    saveBasePath = os.path.join(VOCdevkit_path, 'VOC2007/ImageSets/Segmentation')

    temp_seg = os.listdir(segfilepath)
    total_seg = []
    for seg in temp_seg:
        if seg.endswith(".png"):
            total_seg.append(seg)

    num = len(total_seg)
    num_list = range(num)
    tv = int(num * trainval_percent)
    tr = int(tv * train_percent)
    trainval = random.sample(num_list, tv)
    train = random.sample(trainval, tr)

    print("train and val size", tv)
    print("train size", tr)
    ftrainval = open(os.path.join(saveBasePath, 'trainval.txt'), 'w')
    ftest = open(os.path.join(saveBasePath, 'test.txt'), 'w')
    ftrain = open(os.path.join(saveBasePath, 'train.txt'), 'w')
    fval = open(os.path.join(saveBasePath, 'val.txt'), 'w')

    for i in num_list:
        name = total_seg[i][:-4] + '\n'
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)

    ftrainval.close()
    ftrain.close()
    fval.close()
    ftest.close()
    print("Generate txt in ImageSets done.")

    print("Check the dataset format, this may take a while.")
    classes_nums = np.zeros([256], np.int64)
    for i in tqdm(num_list):
        name = total_seg[i]
        png_file_name = os.path.join(segfilepath, name)
        if not os.path.exists(png_file_name):
            raise ValueError("Label image %s not found; please check whether the file exists at that path and whether its suffix is .png." % (png_file_name))

        png = np.array(Image.open(png_file_name), np.uint8)
        if len(np.shape(png)) > 2:
            print("The shape of label image %s is %s; it is not a grayscale or 8-bit color image, please check the dataset format carefully." % (name, str(np.shape(png))))
            print("Label images must be grayscale or 8-bit color images, where the value of every pixel is the class that pixel belongs to.")

        classes_nums += np.bincount(np.reshape(png, [-1]), minlength=256)

    print("Print the pixel values and their counts.")
    print('-' * 37)
    print("| %15s | %15s |" % ("Key", "Value"))
    print('-' * 37)
    for i in range(256):
        if classes_nums[i] > 0:
            print("| %15s | %15s |" % (str(i), str(classes_nums[i])))
    print('-' * 37)

    if classes_nums[255] > 0 and classes_nums[0] > 0 and np.sum(classes_nums[1:255]) == 0:
        print("Detected that the label pixels only take the values 0 and 255; the data format is wrong.")
        print("For binary segmentation, labels must use pixel value 0 for the background and 1 for the target.")
    elif classes_nums[0] > 0 and np.sum(classes_nums[1:]) == 0:
        print("Detected that the labels contain only background pixels; the data format is wrong, please check the dataset format carefully.")

    print("Images in JPEGImages should be .jpg files; images in SegmentationClass should be .png files.")
    print("If the format is wrong, refer to:")
    print("https://github.com/bubbliiiing/segmentation-format-fix")
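The 0-and-255 warning printed above is the most common label problem: binary masks that mark the target as 255 instead of class index 1. A minimal repair sketch (the file names are placeholders; point them at your own masks):

```python
import numpy as np
from PIL import Image

# Placeholder path; a black/white mask where the target is drawn as 255.
mask = np.array(Image.open("old_label.png"), np.uint8)

# Background stays 0; every target pixel (255) becomes class index 1.
mask[mask == 255] = 1

Image.fromarray(mask).save("VOCdevkit/VOC2007/SegmentationClass/new_label.png")
```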
--------------------------------------------------------------------------------
/常见问题汇总.md:
--------------------------------------------------------------------------------
The blog post collecting these FAQs is at [https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428).

# FAQ
## 1. Download issues
### a. Code download
**Q: Could you send me a copy of the code? Where do I download it?
A: The GitHub address is in the video description; copy it and download from there.**

**Q: Why does the code I downloaded complain that the archive is corrupted?
A: Download it again from GitHub.**

**Q: Why is the code I downloaded different from the code in your videos and blog posts?
A: I update the code frequently; the actual code in the repo is authoritative.**

### b. Weight download
**Q: Why is there no .pth or .h5 file under model_data in the code I downloaded?
A: I usually upload the weights to GitHub and Baidu Netdisk; the links are in the GitHub README.**

### c. Dataset download
**Q: Where can I download dataset XXXX?
A: Dataset download links are usually in the README; almost all of them are there. If one is missing, contact me to add it by simply opening a GitHub issue**.

## 2. Environment setup issues
### a. Environments currently used by the repos
**The pytorch code targets pytorch 1.2; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141).

**The keras code targets tensorflow 1.13.2 with keras 2.1.5; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142).

**The tf2 code targets tensorflow 2.2.0 and needs no separate keras install; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493).

**Q: Does your code work with tensorflow/pytorch version so-and-so?
A: It is best to follow my recommended setup; there are setup tutorials too! I have not tried other versions. Problems may appear but are usually minor and need only small code changes.**

### b. Environment setup for 30-series GPUs
Because of framework updates, 30-series GPUs cannot use the setup tutorials above.
The 30-series configurations I have tested so far are:
**for the pytorch code: pytorch 1.7.0, cuda 11.0, cudnn 8.0.5**.

**The keras code cannot be set up with cuda 11 under win10; under ubuntu it can (search online for instructions), with tensorflow 1.15.4 and keras 2.1.5 or 2.3.1 (a few function interfaces differ, so small code adjustments may be needed).**

**For the tf2 code: tensorflow 2.4.0, cuda 11.0, cudnn 8.0.5**.

### c. GPU utilization and environment usage issues
**Q: Why didn't training use the GPU even though I installed tensorflow-gpu?
A: Confirm tensorflow-gpu is properly installed, check the tensorflow version with pip list, then check Task Manager or the nvidia command-line tool to see whether the GPU is used for training; in Task Manager, look at GPU memory usage.**

**Q: I don't seem to be training on the GPU. How do I tell whether the GPU is being used?
A: Usually use NVIDIA's command-line tool. If you use Task Manager instead, check in the Performance tab whether GPU memory is in use, and watch the Cuda graph rather than Copy.**
![Task Manager GPU view](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center)
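Besides Task Manager and nvidia-smi, tf2 itself can report whether it sees the GPU; a quick check (standard TensorFlow 2 API, not code from this repo):

```python
import tensorflow as tf

# Lists the GPUs TensorFlow can actually use; an empty list means
# training will silently fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))
```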

**Q: Why doesn't it work even after I followed your environment setup?
A: Send me your GPU, CUDA, CUDNN, TF and PYTORCH versions via private message on Bilibili.**

**Q: I get the following error**
```python
Traceback (most recent call last):
  File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in
    pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: The specified module could not be found.
```
**A: Reboot if you haven't; otherwise reinstall following the steps. If it still fails, message me your GPU, CUDA, CUDNN, TF and PYTORCH versions.**

### d. "No module" problems
**Q: Why do I get "no module name utils.utils" (no module name nets.yolo, no module name nets.ssd, and so on)?
A: utils is not installed with pip; it sits in the root of the repo I uploaded. This error means your working directory is wrong; look up the concepts of relative paths and the root directory, and it will usually become clear.**

**Q: Why "no module name matplotlib" (no module name PIL, no module name cv2, etc.)?
A: The library simply isn't installed; open a terminal and install it: pip install matplotlib**

**Q: I already pip-installed opencv (pillow, matplotlib, ...); why do I still get "no module name cv2"?
A: You installed it without activating the environment; activate the corresponding conda environment first, then install.**

**Q: Why "No module named 'torch'"?
A: I honestly wonder too... how is pytorch not installed? Usually one of two things: it really isn't installed, or it was installed into a different environment than the one currently activated.**

**Q: Why "No module named 'tensorflow'"?
A: Same as above.**

### e. CUDA installation failures
Visual Studio generally needs to be installed before CUDA; the 2017 edition is fine.

### f. Ubuntu
**All the code works under Ubuntu; I have tried both systems.**

### g. VSCode error squiggles
**Q: Why does VSCode show a pile of errors?
A: It shows a pile for me too, but they don't matter; it is a VSCode issue. If you don't want to see them, install PyCharm.**

### h. Training and predicting on the CPU
**For the keras and tf2 code, to train and predict on the CPU just install the CPU build of tensorflow.**

**For the pytorch code, to train and predict on the CPU change cuda=True to cuda=False.**

### i. tqdm "no attribute 'pos'"
**Q: Running the code raises 'tqdm' object has no attribute 'pos'.
A: Reinstall tqdm with a different version.**

### j. decode("utf-8") errors
**Because of an h5py update, installation automatically pulls in h5py >= 3.0.0, which causes decode("utf-8") errors!
Be sure to install h5py==2.10.0 after installing tensorflow!**
```
pip install h5py==2.10.0
```

### k. TypeError: __array__() takes 1 positional argument but 2 were given
This can be fixed by changing the pillow version.
```
pip install pillow==8.2.0
```

### l. Other problems
**Q: Why do I get TypeError: cat() got an unexpected keyword argument 'axis', Traceback (most recent call last), AttributeError: 'Tensor' object has no attribute 'bool'?
A: These are version problems; use torch 1.2 or later.**
**Many other odd problems are also version problems; follow my video tutorials to install Keras and tensorflow. For example, if you installed tensorflow 2, don't ask me why Keras-yolo won't run; of course it won't.**

## 3. Object detection FAQ (also applies to the face detection and classification repos)
### a. Shape mismatch
#### 1) Shape mismatch during training
**Q: Why does running train.py report a shape mismatch?
A: In keras, because the classes you train differ from the original classes, the network structure changes, so there is a small mismatch at the very end of the network.**

#### 2) Shape mismatch during prediction
**Q: Why does running predict.py report a shape mismatch?
In Pytorch it looks like this:**
![Pytorch shape mismatch error](https://img-blog.csdnimg.cn/20200722171631901.png)
In Keras it looks like this:
![Keras shape mismatch error](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
**A: There are three main causes:
1. In ssd and FasterRCNN, num_classes in train.py may not have been changed.
2. model_path was not changed.
3. classes_path was not changed.
Check carefully! Make sure the model_path and classes_path you use match! The num_classes or classes_path used for training also need checking!**

### b. Out-of-memory problems
**Q: Why does the console under train.py flash past and report OOM or similar?
A: That is keras running out of GPU memory; reduce batch_size. SSD has the smallest memory footprint, so SSD is recommended;
2 GB of GPU memory: SSD, YOLOV4-TINY
4 GB: YOLOV3
6 GB: YOLOV4, Retinanet, M2det, Efficientdet, Faster RCNN, etc.
8 GB+: whatever you like.**
**Note that because of BatchNorm2d, batch_size cannot be 1; it must be at least 2.**

**Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)?
A: That is pytorch running out of GPU memory; same as above.**

**Q: Why does it run out of memory when my GPU memory isn't even being used?
A: It ran out of memory, so of course it is not used; the model never started training.**
### c. Training problems (freeze training, LOSS, training quality, etc.)
**Q: Why freeze training and then unfreeze?
A: It is the idea of transfer learning: the features extracted by the network's backbone are generic, so freezing it speeds up training and also protects the weights from being destroyed, as sketched below.**
In the freeze phase, the backbone of the model is frozen and the feature-extraction network does not change. Memory usage is low and only the rest of the network is fine-tuned.
In the unfreeze phase, the backbone is unfrozen and the feature-extraction network changes. Memory usage is high and all parameters of the network are updated.
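In the Keras/tf2 repos this freezing amounts to toggling layer.trainable; a schematic sketch (MobileNetV2 and the layer count 60 are arbitrary stand-ins, not this repo's actual split point):

```python
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)  # stand-in backbone

# Freeze phase: the backbone stops updating, only the head is fine-tuned.
for layer in model.layers[:60]:
    layer.trainable = False

# Unfreeze phase: everything trains again (recompile afterwards, since Keras
# bakes the trainable flags in when the model is compiled).
for layer in model.layers:
    layer.trainable = True
```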
print("JPEGImages中的图片应当为.jpg文件、SegmentationClass中的图片应当为.png文件。") 97 | print("如果格式有误,参考:") 98 | print("https://github.com/bubbliiiing/segmentation-format-fix") -------------------------------------------------------------------------------- /常见问题汇总.md: -------------------------------------------------------------------------------- 1 | 问题汇总的博客地址为[https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428)。 2 | 3 | # 问题汇总 4 | ## 1、下载问题 5 | ### a、代码下载 6 | **问:up主,可以给我发一份代码吗,代码在哪里下载啊? 7 | 答:Github上的地址就在视频简介里。复制一下就能进去下载了。** 8 | 9 | **问:up主,为什么我下载的代码提示压缩包损坏? 10 | 答:重新去Github下载。** 11 | 12 | **问:up主,为什么我下载的代码和你在视频以及博客上的代码不一样? 13 | 答:我常常会对代码进行更新,最终以实际的代码为准。** 14 | 15 | ### b、 权值下载 16 | **问:up主,为什么我下载的代码里面,model_data下面没有.pth或者.h5文件? 17 | 答:我一般会把权值上传到Github和百度网盘,在GITHUB的README里面就能找到。** 18 | 19 | ### c、 数据集下载 20 | **问:up主,XXXX数据集在哪里下载啊? 21 | 答:一般数据集的下载地址我会放在README里面,基本上都有,没有的话请及时联系我添加,直接发github的issue即可**。 22 | 23 | ## 2、环境配置问题 24 | ### a、现在库中所用的环境 25 | **pytorch代码对应的pytorch版本为1.2,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141)。 26 | 27 | **keras代码对应的tensorflow版本为1.13.2,keras版本是2.1.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142)。 28 | 29 | **tf2代码对应的tensorflow版本为2.2.0,无需安装keras,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493)。 30 | 31 | **问:你的代码某某某版本的tensorflow和pytorch能用嘛? 32 | 答:最好按照我推荐的配置,配置教程也有!其它版本的我没有试过!可能出现问题但是一般问题不大。仅需要改少量代码即可。** 33 | 34 | ### b、30系列显卡环境配置 35 | 30系显卡由于框架更新不可使用上述环境配置教程。 36 | 当前我已经测试的可以用的30显卡配置如下: 37 | **pytorch代码对应的pytorch版本为1.7.0,cuda为11.0,cudnn为8.0.5**。 38 | 39 | **keras代码无法在win10下配置cuda11,在ubuntu下可以百度查询一下,配置tensorflow版本为1.15.4,keras版本是2.1.5或者2.3.1(少量函数接口不同,代码可能还需要少量调整。)** 40 | 41 | **tf2代码对应的tensorflow版本为2.4.0,cuda为11.0,cudnn为8.0.5**。 42 | 43 | ### c、GPU利用问题与环境使用问题 44 | **问:为什么我安装了tensorflow-gpu但是却没用利用GPU进行训练呢? 45 | 答:确认tensorflow-gpu已经装好,利用pip list查看tensorflow版本,然后查看任务管理器或者利用nvidia命令看看是否使用了gpu进行训练,任务管理器的话要看显存使用情况。** 46 | 47 | **问:up主,我好像没有在用gpu进行训练啊,怎么看是不是用了GPU进行训练? 48 | 答:查看是否使用GPU进行训练一般使用NVIDIA在命令行的查看命令,如果要看任务管理器的话,请看性能部分GPU的显存是否利用,或者查看任务管理器的Cuda,而非Copy。** 49 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center) 50 | 51 | **问:up主,为什么我按照你的环境配置后还是不能使用? 

### e. Resuming training
**Q: I have already trained several epochs; can I continue training from there?
A: Yes. Before training, load the already-trained weights exactly as you would load pretrained weights. Trained weights are usually saved in the logs folder; just set model_path to the path of the weights you want to resume from.**

### f. Pretrained weights
**Q: If I want to train on another dataset, what should I do about pretrained weights?**
**A: Pretrained weights are transferable across datasets because the features are generic. They are needed in 99% of cases; without them the initial weights are too random, feature extraction is weak, and training results suffer.**

**Q: I modified the network; can I still use the pretrained weights?
A: If you changed the backbone and it is not an existing architecture, the pretrained weights are basically unusable: either match the weights yourself by comparing the shapes of the convolution kernels, or pretrain from scratch. If you only changed the later part of the network, the backbone's pretrained weights can still be used: in pytorch, change the loading code to compare shapes before loading; in keras, simply use by_name=True, skip_mismatch=True.**
Weight matching can be done like this:
```python
# Speed up training
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I train without pretrained weights?
A: Comment out the code that loads them.**

**Q: Why are my results so poor without pretrained weights?
A: Randomly initialized weights extract features poorly, so training suffers; voc07+12 and coco+voc07+12 behave differently, and pretrained weights really are important.**

### g. Video and webcam detection
**Q: How do I detect from a webcam?
A: Change the parameters in predict.py; there is also a video explaining the webcam detection approach in detail.**

**Q: How do I detect on a video?
A: Same as above.**
### h. Training from scratch
**Q: How do I train the model from scratch?
A: With limited compute and tuning skill, training from scratch is pointless: with randomly initialized parameters the model extracts features very poorly, and without strong tuning ability and compute the network will not converge properly.**
If you insist on starting from scratch, note the following:
- Do not load pretrained weights.
- Do not use freeze training; comment out the code that freezes the model.

**Q: Why are my results so poor without pretrained weights?
A: Same as above: random initialization extracts poor features; pretrained weights really are important.**

### i. Saving results
**Q: How do I save the detected images?
A: Detection generally uses PIL's Image, so look up how PIL's Image saves files. See the comments in predict.py for details.**

**Q: How do I save video output?
A: See the comments in predict.py for details.**

### j. Folder traversal
**Q: How do I run over every image in a folder?
A: Generally, use os.listdir to find all the images in the folder, then detect each one following the flow in predict.py; see its comments for details.**

**Q: How do I run over every image in a folder and save the results?
A: Use os.listdir to find the images, detect each one following predict.py, and save with PIL's Image (or with cv2 where a repo uses cv2). See the comments in predict.py, and the sketch below.**
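Putting the two answers together, a minimal sketch of the loop (the Pspnet class and its detect_image method are assumed from this repo's pspnet.py and predict.py, so check your copy for the exact names; the folder names are placeholders):

```python
import os
from PIL import Image

from pspnet import Pspnet   # assumed entry point, as used by predict.py

pspnet  = Pspnet()
dir_in  = "img"              # placeholder input folder
dir_out = "img_out"          # placeholder output folder
os.makedirs(dir_out, exist_ok=True)

for name in os.listdir(dir_in):
    if name.lower().endswith(('.bmp', '.jpeg', '.jpg', '.png')):
        image   = Image.open(os.path.join(dir_in, name))
        r_image = pspnet.detect_image(image)   # returns a PIL image
        r_image.save(os.path.join(dir_out, name))
```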

### k. Path problems (No such file or directory)
**Q: Why do I get an error like this:**
```python
FileNotFoundError: [Errno 2] No such file or directory
……………………………………
……………………………………
```
**A: Check the folder path and whether the corresponding file exists; also check the file paths inside 2007_train.txt.**
A few important points about paths:
**Never put spaces in folder names.
Mind the difference between relative and absolute paths.
Read up on how paths work.**

**Almost all path problems are working-directory problems; study what a relative path is!**
### l. Comparison with the original implementations
**Q: How does your code compare to the original? Can it reach the original results?
A: Basically yes. I have tested everything on VOC data; I don't have good enough GPUs to train and test on COCO.**

**Q: Did you implement all the tricks in yolov4? How far is it from the original?
A: Not all of them. YOLOV4 uses far too many improvements to implement and list completely, so I only picked the ones I found interesting and highly effective. The SAM attention module mentioned in the paper is not used in the authors' own code either. Not every trick helps, and I cannot implement them all. As for the comparison with the original, I cannot train on COCO, but users report the gap is small.**

### m. FPS (detection speed)
**Q: What FPS can this reach? Can it reach XX FPS?
A: FPS depends on the machine's configuration: high-end hardware is fast, low-end hardware is slow.**

**Q: Why do I only get a dozen FPS testing yolov4 (or others) on a server?
A: Check that the GPU build of tensorflow-gpu or pytorch is installed correctly. If it is, use time.time() to find which part of detect_image takes longest (not only the network costs time; other processing such as drawing does too).**

**Q: Why does the paper claim speed XX, but I don't see it here?
A: Check that the GPU build of tensorflow-gpu or pytorch is installed correctly; if it is, use time.time() to find which part of detect_image takes longest. Some papers also predict with large batches, which I have not implemented.**

### n. Predicted image not shown
**Q: Why doesn't your code display the image after prediction? It only prints the detected targets in the console.
A: Install an image viewer on the system.**

### o. Evaluation (mAP, PR curves, Recall, Precision for detection)
**Q: How do I compute mAP?
A: Watch the mAP video; it is all one workflow.**

**Q: When computing mAP, what is MINOVERLAP in get_map.py for? Is it IoU?
A: Yes, it is an IoU threshold: it measures how much the predicted box overlaps the ground-truth box, and a prediction counts as correct when the overlap exceeds MINOVERLAP (see the sketch below).**
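Since MINOVERLAP is just an IoU threshold, here is a small worked sketch of the quantity it gates (the boxes are illustrative [x1, y1, x2, y2] values, not code from get_map.py):

```python
def box_iou(a, b):
    # Intersection rectangle of two [x1, y1, x2, y2] boxes.
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

# IoU 0.667 >= MINOVERLAP 0.5, so this prediction would count as correct.
print(box_iou([0, 0, 10, 10], [0, 2, 10, 12]))  # ~0.667
```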

**Q: Why is self.confidence (self.score) in get_map.py set so low?
A: See the theory part of the mAP video: all predictions are needed before the PR curve can be drawn, so the threshold must be low.**

**Q: Can you explain how to draw PR curves and the like?
A: See the mAP video; the results include PR curves.**

**Q: How do I compute the Recall and Precision metrics?
A: These two are defined relative to a specific confidence threshold; they are also produced while computing mAP.**

### p. Training on the COCO dataset
**Q: How do I train object detection on COCO?
A: The txt files needed for COCO training can be produced following qqwweee's yolo3 repo; the format is exactly the same.**

### q. Model optimization (model modification)
**Q: Do you have code for the YOLO series with Focal LOSS? Does it help?
A: Many people have tried it; the gain is small (sometimes results even get worse). YOLO has its own way of balancing positive and negative samples.**

**Q: I modified the network; can I still use the pretrained weights?
A: If you changed the backbone and it is not an existing architecture, the pretrained weights are basically unusable: either match the weights yourself by comparing the shapes of the convolution kernels, or pretrain from scratch. If you only changed the later part of the network, the backbone's pretrained weights can still be used: in pytorch, change the loading code to compare shapes before loading; in keras, simply use by_name=True, skip_mismatch=True.**
Weight matching can be done like this:
```python
# Speed up training
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I modify the model? I want to publish a small paper!
A: Look at the differences between yolov3 and yolov4, then read the yolov4 paper; as a giant tuning showcase it is a very useful reference and uses many tricks. My advice is to study classic models, then extract and reuse their highlight structures.**

### r. Deployment
I have not deployed to phones or other devices, so I do not know much about deployment issues...

## 4. Semantic segmentation FAQ
### a. Shape mismatch
#### 1) Shape mismatch during training
**Q: Why does running train.py report a shape mismatch?
A: In keras, because the classes you train differ from the original classes, the network structure changes, so there is a small mismatch at the very end of the network.**

#### 2) Shape mismatch during prediction
**Q: Why does running predict.py report a shape mismatch?
In Pytorch it looks like this:**
![Pytorch shape mismatch error](https://img-blog.csdnimg.cn/20200722171631901.png)
In Keras it looks like this:
![Keras shape mismatch error](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
**A: There are two main causes:
1. num_classes in train.py was not changed.
2. num_classes was not changed for prediction.
Check carefully! The num_classes used for training and for prediction both need checking!**

### b. Out-of-memory problems
**Q: Why does the console under train.py flash past and report OOM or similar?
A: That is keras running out of GPU memory; reduce batch_size.**

**Note that because of BatchNorm2d, batch_size cannot be 1; it must be at least 2.**

**Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)?
A: That is pytorch running out of GPU memory; same as above.**

**Q: Why does it run out of memory when my GPU memory isn't even being used?
A: It ran out of memory, so of course it is not used; the model never started training.**

### c. Training problems (freeze training, LOSS, training quality, etc.)
**Q: Why freeze training and then unfreeze?
A: It is the idea of transfer learning: the features extracted by the backbone are generic, so freezing it speeds up training and protects the weights from being destroyed.**
**In the freeze phase, the backbone of the model is frozen and the feature-extraction network does not change; memory usage is low and only the rest of the network is fine-tuned.**
**In the unfreeze phase, the backbone is unfrozen and the feature-extraction network changes; memory usage is high and all parameters of the network are updated.**

**Q: Why doesn't my network converge? My LOSS is XXXX.
A: LOSS differs between networks; it is only a reference for whether training converges, not a measure of quality. Its absolute value does not matter; what matters is whether it keeps decreasing and whether predictions work.**

**Q: Why are my results poor? Prediction finds no targets and the output is all black.
A:**
**Consider several points:
1. The dataset; this is the most important. With fewer than 500 images, consider enlarging it, and above all check the labels. The videos explain the VOC format in detail: it is not enough to have input images and output labels; every pixel value of a label must be the class that pixel belongs to. The most common broken format is a black background with white targets, where the target pixels are 255; that cannot train properly, the target pixels must be 1.
2. Unfreeze training if your data distribution is far from ordinary scenes, to adapt the backbone and strengthen feature extraction.
3. The network; try different ones.
4. Training length; some people train only a few epochs and declare failure; train to completion with the default settings.
5. Check that you followed every step.
6. LOSS differs between networks and is only a convergence indicator, not a quality score; what matters is whether it converges.**

**Q: Why are my results poor? Predictions on small objects are inaccurate.
A: For deeplab and pspnet you can change downsample_factor: 16 downsamples too much and works poorly; 8 works better.**

**Q: Why do I get a gbk codec error:**
```python
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
```
**A: Do not use Chinese in labels and paths. If you must, handle the encoding: open the files with encoding='utf-8'.**

**Q: My images have resolution xxx*xxx; can I use them?**
**A: Yes; the code resizes them or applies data augmentation automatically.**

**Q: How do I train with multiple GPUs?
A: Most of the pytorch code can train on GPUs directly; for keras just search online, it is not complicated. I have no multi-GPU machine to test on in detail, so you will have to work it out yourselves.**

### d. Grayscale images
**Q: Can I train on (and predict) grayscale images?
A: Most of my repos convert grayscale images to RGB for training and prediction. If some code cannot train or predict on grayscale, try converting the result of Image.open to RGB inside get_random_data, and do the same when predicting (for reference only).**

### e. Resuming training
**Q: I have already trained several epochs; can I continue training from there?
A: Yes. Before training, load the already-trained weights exactly as you would load pretrained weights. Trained weights are usually saved in the logs folder; just set model_path to the path of the weights you want to resume from.**

### f. Pretrained weights

**Q: If I want to train on another dataset, what should I do about pretrained weights?**
**A: Pretrained weights are transferable across datasets because the features are generic. They are needed in 99% of cases; without them the initial weights are too random, feature extraction is weak, and training results suffer.**

**Q: I modified the network; can I still use the pretrained weights?
A: If you changed the backbone and it is not an existing architecture, the pretrained weights are basically unusable: either match the weights yourself by comparing the shapes of the convolution kernels, or pretrain from scratch. If you only changed the later part of the network, the backbone's pretrained weights can still be used: in pytorch, change the loading code to compare shapes before loading; in keras, simply use by_name=True, skip_mismatch=True.**
Weight matching can be done like this:

```python
# Speed up training
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I train without pretrained weights?
A: Comment out the code that loads them.**

**Q: Why are my results so poor without pretrained weights?
A: Randomly initialized weights extract features poorly, so training suffers; pretrained weights really are important.**

### g. Video and webcam detection
**Q: How do I detect from a webcam?
A: Change the parameters in predict.py; there is also a video explaining the webcam detection approach in detail.**

**Q: How do I detect on a video?
A: Same as above.**

### h. Training from scratch
**Q: How do I train the model from scratch?
A: With limited compute and tuning skill, training from scratch is pointless: with randomly initialized parameters the model extracts features very poorly, and without strong tuning ability and compute the network will not converge properly.**
If you insist on starting from scratch, note the following:
- Do not load pretrained weights.
- Do not use freeze training; comment out the code that freezes the model.

**Q: Why are my results so poor without pretrained weights?
A: Same as above: random initialization extracts poor features; pretrained weights really are important.**

### i. Saving results
**Q: How do I save the detected images?
A: Detection generally uses PIL's Image, so look up how PIL's Image saves files. See the comments in predict.py for details.**

**Q: How do I save video output?
A: See the comments in predict.py for details.**

### j. Folder traversal
**Q: How do I run over every image in a folder?
A: Generally, use os.listdir to find all the images in the folder, then detect each one following the flow in predict.py; see its comments for details.**

**Q: How do I run over every image in a folder and save the results?
A: Use os.listdir to find the images, detect each one following predict.py, and save with PIL's Image (or with cv2 where a repo uses cv2). See the comments in predict.py; the traversal sketch in section 3j above applies unchanged.**

### k. Path problems (No such file or directory)
**Q: Why do I get an error like this:**
```python
FileNotFoundError: [Errno 2] No such file or directory
……………………………………
……………………………………
```

**A: Check the folder path and whether the corresponding file exists; also check the file paths inside 2007_train.txt.**
A few important points about paths:
**Never put spaces in folder names.
Mind the difference between relative and absolute paths.
Read up on how paths work.**

**Almost all path problems are working-directory problems; study what a relative path is!**

### l. FPS (detection speed)
**Q: What FPS can this reach? Can it reach XX FPS?
A: FPS depends on the machine's configuration: high-end hardware is fast, low-end hardware is slow.**

**Q: Why does the paper claim speed XX, but I don't see it here?
A: Check that the GPU build of tensorflow-gpu or pytorch is installed correctly. If it is, use time.time() to find which part of detect_image takes longest (not only the network costs time; other processing such as drawing does too). Some papers also predict with large batches, which I have not implemented. A timing sketch follows below.**
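The time.time() probe described above just brackets the suspected hot spot; a hedged sketch (pspnet and image are the same assumed stand-ins as in the traversal sketch in section 3j):

```python
import time

# Time one call to the suspected hot spot; repeat around other processing
# steps (drawing, resizing, ...) to see where the milliseconds actually go.
t0 = time.time()
r_image = pspnet.detect_image(image)
print('detect_image took %.3f s' % (time.time() - t0))
```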

### m. Predicted image not shown
**Q: Why doesn't your code display the image after prediction? It only prints the detected targets in the console.
A: Install an image viewer on the system.**

### n. Evaluation (miou)
**Q: How do I compute miou?
A: See the miou measurement part of the video.**

**Q: How do I compute the Recall and Precision metrics?
A: Understand the concept of the confusion matrix and derive them from it; note that compute_mIoU in utils/utils_metrics.py already returns the per-class Recall (identical to the PA) and Precision, and the sketch below shows the arithmetic.**
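Concretely, with the confusion matrix hist from utils/utils_metrics.py (rows are ground truth, columns are predictions), both metrics are one line each; a small worked example with invented numbers:

```python
import numpy as np

hist = np.array([[50, 2],
                 [ 5, 43]])  # illustrative 2-class confusion matrix

recall    = np.diag(hist) / np.maximum(hist.sum(1), 1)  # per_class_PA_Recall
precision = np.diag(hist) / np.maximum(hist.sum(0), 1)  # per_class_Precision
print(recall)     # [0.962 0.896]
print(precision)  # [0.909 0.956]
```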

### o. Model optimization (model modification)
**Q: I modified the network; can I still use the pretrained weights?
A: If you changed the backbone and it is not an existing architecture, the pretrained weights are basically unusable: either match the weights yourself by comparing the shapes of the convolution kernels, or pretrain from scratch. If you only changed the later part of the network, the backbone's pretrained weights can still be used: in pytorch, change the loading code to compare shapes before loading; in keras, simply use by_name=True, skip_mismatch=True.**
Weight matching can be done like this:

```python
# Speed up training
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I modify the model? I want to publish a small paper!
A: Read the yolov4 paper from the object detection side: as a giant tuning showcase it is a very useful reference and uses many tricks. My advice is to study classic models, then extract and reuse their highlight structures. Common tricks such as attention modules are worth trying.**

### p. Deployment
I have not deployed to phones or other devices, so I do not know much about deployment issues...

## 5. Chat group
**Q: Is there a QQ group or anything like that?
A: No; I don't have time to manage a QQ group...**

## 6. How to learn
**Q: What was your learning path? I am a beginner; how should I learn?
A: A few caveats:
1. I am not an expert; there is a lot I cannot do, and my path will not suit everyone.
2. My lab does not work on deep learning, so I taught myself most of this by exploring on my own; I cannot guarantee it is all correct.
3. I personally believe learning depends mostly on self-study.**
As for the path itself: I first worked through Mofan (莫烦)'s python tutorials to get started with tensorflow, keras and pytorch; then I learned SSD and YOLO, then studied many classic convolutional networks, and after that began reading many different codebases. My method is to read code line by line, understanding the whole execution flow and how the feature-map shapes change. It took a great deal of time and there is no shortcut; you simply have to put in the hours.
--------------------------------------------------------------------------------