├── .gitignore ├── LICENSE ├── README.md ├── VOCdevkit └── VOC2007 │ ├── ImageSets │ └── Segmentation │ │ └── README.md │ ├── JPEGImages │ └── README.md │ └── SegmentationClass │ └── README.md ├── datasets ├── JPEGImages │ └── 1.jpg ├── SegmentationClass │ └── 1.png └── before │ ├── 1.jpg │ └── 1.json ├── get_miou.py ├── img └── street.jpg ├── json_to_dataset.py ├── logs └── README.md ├── model_data ├── README.md └── pspnet_mobilenetv2.h5 ├── nets ├── __init__.py ├── mobilenetv2.py ├── pspnet.py ├── pspnet_training.py └── resnet50.py ├── predict.py ├── pspnet.py ├── requirements.txt ├── summary.py ├── train.py ├── utils ├── __init__.py ├── callbacks.py ├── dataloader.py ├── utils.py ├── utils_fit.py └── utils_metrics.py ├── voc_annotation.py └── 常见问题汇总.md /.gitignore: -------------------------------------------------------------------------------- 1 | # ignore map, miou, datasets 2 | map_out/ 3 | miou_out/ 4 | VOCdevkit/ 5 | datasets/ 6 | Medical_Datasets/ 7 | lfw/ 8 | logs/ 9 | model_data/ 10 | .temp_miou_out/ 11 | 12 | # Byte-compiled / optimized / DLL files 13 | __pycache__/ 14 | *.py[cod] 15 | *$py.class 16 | 17 | # C extensions 18 | *.so 19 | 20 | # Distribution / packaging 21 | .Python 22 | build/ 23 | develop-eggs/ 24 | dist/ 25 | downloads/ 26 | eggs/ 27 | .eggs/ 28 | lib/ 29 | lib64/ 30 | parts/ 31 | sdist/ 32 | var/ 33 | wheels/ 34 | pip-wheel-metadata/ 35 | share/python-wheels/ 36 | *.egg-info/ 37 | .installed.cfg 38 | *.egg 39 | MANIFEST 40 | 41 | # PyInstaller 42 | # Usually these files are written by a python script from a template 43 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 44 | *.manifest 45 | *.spec 46 | 47 | # Installer logs 48 | pip-log.txt 49 | pip-delete-this-directory.txt 50 | 51 | # Unit test / coverage reports 52 | htmlcov/ 53 | .tox/ 54 | .nox/ 55 | .coverage 56 | .coverage.* 57 | .cache 58 | nosetests.xml 59 | coverage.xml 60 | *.cover 61 | *.py,cover 62 | .hypothesis/ 63 | .pytest_cache/ 64 | 65 | # Translations 66 | *.mo 67 | *.pot 68 | 69 | # Django stuff: 70 | *.log 71 | local_settings.py 72 | db.sqlite3 73 | db.sqlite3-journal 74 | 75 | # Flask stuff: 76 | instance/ 77 | .webassets-cache 78 | 79 | # Scrapy stuff: 80 | .scrapy 81 | 82 | # Sphinx documentation 83 | docs/_build/ 84 | 85 | # PyBuilder 86 | target/ 87 | 88 | # Jupyter Notebook 89 | .ipynb_checkpoints 90 | 91 | # IPython 92 | profile_default/ 93 | ipython_config.py 94 | 95 | # pyenv 96 | .python-version 97 | 98 | # pipenv 99 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 100 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 101 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 102 | # install all needed dependencies. 103 | #Pipfile.lock 104 | 105 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow
106 | __pypackages__/
107 |
108 | # Celery stuff
109 | celerybeat-schedule
110 | celerybeat.pid
111 |
112 | # SageMath parsed files
113 | *.sage.py
114 |
115 | # Environments
116 | .env
117 | .venv
118 | env/
119 | venv/
120 | ENV/
121 | env.bak/
122 | venv.bak/
123 |
124 | # Spyder project settings
125 | .spyderproject
126 | .spyproject
127 |
128 | # Rope project settings
129 | .ropeproject
130 |
131 | # mkdocs documentation
132 | /site
133 |
134 | # mypy
135 | .mypy_cache/
136 | .dmypy.json
137 | dmypy.json
138 |
139 | # Pyre type checker
140 | .pyre/
141 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Bubbliiiing
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## PSPnet:Pyramid Scene Parsing Network语义分割模型在tensorflow2当中的实现
2 | ---
3 |
4 | ### 目录
5 | 1. [仓库更新 Top News](#仓库更新)
6 | 2. [相关仓库 Related code](#相关仓库)
7 | 3. [性能情况 Performance](#性能情况)
8 | 4. [所需环境 Environment](#所需环境)
9 | 5. [文件下载 Download](#文件下载)
10 | 6. [训练步骤 How2train](#训练步骤)
11 | 7. [预测步骤 How2predict](#预测步骤)
12 | 8. [评估步骤 miou](#评估步骤)
13 | 9.
[参考资料 Reference](#Reference) 14 | 15 | ## Top News 16 | **`2022-04`**:**支持多GPU训练。** 17 | 18 | **`2022-03`**:**进行大幅度更新、支持step、cos学习率下降法、支持adam、sgd优化器选择、支持学习率根据batch_size自适应调整。** 19 | BiliBili视频中的原仓库地址为:https://github.com/bubbliiiing/pspnet-tf2/tree/bilibili 20 | 21 | **`2020-08`**:**创建仓库、支持多backbone、支持数据miou评估、标注数据处理、大量注释等。** 22 | 23 | ## 相关仓库 24 | | 模型 | 路径 | 25 | | :----- | :----- | 26 | Unet | https://github.com/bubbliiiing/unet-tf2 27 | PSPnet | https://github.com/bubbliiiing/pspnet-tf2 28 | deeplabv3+ | https://github.com/bubbliiiing/deeplabv3-plus-tf2 29 | 30 | ### 性能情况 31 | | 训练数据集 | 权值文件名称 | 测试数据集 | 输入图片大小 | mIOU | 32 | | :-----: | :-----: | :------: | :------: | :------: | 33 | | VOC12+SBD | [pspnet_mobilenetv2.h5](https://github.com/bubbliiiing/pspnet-tf2/releases/download/v1.0/pspnet_mobilenetv2.h5) | VOC-Val12 | 473x473| 71.04 | 34 | | VOC12+SBD | [pspnet_resnet50.h5](https://github.com/bubbliiiing/pspnet-tf2/releases/download/v1.0/pspnet_resnet50.h5) | VOC-Val12 | 473x473| 79.92 | 35 | 36 | ### 所需环境 37 | tensorflow-gpu==2.2.0 38 | 39 | ### 文件下载 40 | 训练所需的pspnet_mobilenetv2.h5和pspnet_resnet50.h5可在百度网盘中下载。 41 | 链接: https://pan.baidu.com/s/1-sIjtenHU05JzVIjyFwvxQ 提取码: upft 42 | 43 | VOC拓展数据集的百度网盘如下: 44 | 链接: https://pan.baidu.com/s/1vkk3lMheUm6IjTXznlg7Ng 提取码: 44mk 45 | 46 | ### 训练步骤 47 | #### a、训练voc数据集 48 | 1、将我提供的voc数据集放入VOCdevkit中(无需运行voc_annotation.py)。 49 | 2、在train.py中设置对应参数,默认参数已经对应voc数据集所需要的参数了,所以只要修改backbone和model_path即可。 50 | 3、运行train.py进行训练。 51 | 52 | #### b、训练自己的数据集 53 | 1、本文使用VOC格式进行训练。 54 | 2、训练前将标签文件放在VOCdevkit文件夹下的VOC2007文件夹下的SegmentationClass中。 55 | 3、训练前将图片文件放在VOCdevkit文件夹下的VOC2007文件夹下的JPEGImages中。 56 | 4、在训练前利用voc_annotation.py文件生成对应的txt。 57 | 5、在train.py文件夹下面,选择自己要使用的主干模型和下采样因子。本文提供的主干模型有mobilenet和resnet50。下采样因子可以在8和16中选择。需要注意的是,预训练模型需要和主干模型相对应。 58 | 6、注意修改train.py的num_classes为分类个数+1。 59 | 7、运行train.py即可开始训练。 60 | 61 | ### 预测步骤 62 | #### a、使用预训练权重 63 | 1. 下载完库后解压,如果想用backbone为mobilenet的进行预测,直接运行predict.py就可以了;如果想要利用backbone为resnet50的进行预测,在百度网盘下载pspnet_resnet50.h5,放入model_data,修改pspnet.py的backbone和model_path之后再运行predict.py,输入 64 | ```python 65 | img/street.jpg 66 | ``` 67 | 2. 在predict.py里面进行设置可以进行fps测试和video视频检测。 68 | #### b、使用自己训练的权重 69 | 1. 按照训练步骤训练。 70 | 2. 在pspnet.py文件里面,在如下部分修改model_path和backbone使其对应训练好的文件;**model_path对应logs文件夹下面的权值文件,backbone是所使用的主干特征提取网络**。 71 | ```python 72 | _defaults = { 73 | #-------------------------------------------------------------------# 74 | # model_path指向logs文件夹下的权值文件 75 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 76 | # 验证集损失较低不代表miou较高,仅代表该权值在验证集上泛化性能较好。 77 | #-------------------------------------------------------------------# 78 | "model_path" : 'model_data/pspnet_mobilenetv2.h5', 79 | #----------------------------------------# 80 | # 所需要区分的类的个数+1 81 | #----------------------------------------# 82 | "num_classes" : 21, 83 | #----------------------------------------# 84 | # 所使用的的主干网络:mobilenet、resnet50 85 | #----------------------------------------# 86 | "backbone" : "mobilenet", 87 | #----------------------------------------# 88 | # 输入图片的大小 89 | #----------------------------------------# 90 | "input_shape" : [473, 473], 91 | #----------------------------------------# 92 | # 下采样的倍数,一般可选的为8和16 93 | # 与训练时设置的一样即可 94 | #----------------------------------------# 95 | "downsample_factor" : 16, 96 | #--------------------------------# 97 | # blend参数用于控制是否 98 | # 让识别结果和原图混合 99 | #--------------------------------# 100 | "blend" : True, 101 | } 102 | ``` 103 | 3. 运行predict.py,输入 104 | ```python 105 | img/street.jpg 106 | ``` 107 | 4. 
在predict.py里面进行设置可以进行fps测试和video视频检测。
108 |
109 | ### 评估步骤
110 | 1、设置get_miou.py里面的num_classes为预测的类的数量加1。
111 | 2、设置get_miou.py里面的name_classes为需要去区分的类别。
112 | 3、运行get_miou.py即可获得miou大小。
113 |
114 | ### Reference
115 | https://github.com/ggyyzm/pytorch_segmentation
116 | https://github.com/bonlime/keras-deeplab-v3-plus
117 |
--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/ImageSets/Segmentation/README.md:
--------------------------------------------------------------------------------
1 | 这里面存放的是指向数据集图片文件名称的txt,如voc_annotation.py生成的train.txt和val.txt。
2 |
3 |
--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/JPEGImages/README.md:
--------------------------------------------------------------------------------
1 | 这里面存放的是训练用的图片文件。
2 |
--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/SegmentationClass/README.md:
--------------------------------------------------------------------------------
1 | 这里面存放的是训练用的标签文件,标签为png图片。
2 |
--------------------------------------------------------------------------------
/datasets/JPEGImages/1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/pspnet-tf2/501b5a03cd085c22ffb63362e2c99fc2470d39de/datasets/JPEGImages/1.jpg
--------------------------------------------------------------------------------
/datasets/SegmentationClass/1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/pspnet-tf2/501b5a03cd085c22ffb63362e2c99fc2470d39de/datasets/SegmentationClass/1.png
--------------------------------------------------------------------------------
/datasets/before/1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/pspnet-tf2/501b5a03cd085c22ffb63362e2c99fc2470d39de/datasets/before/1.jpg
--------------------------------------------------------------------------------
/get_miou.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | import tensorflow as tf
4 | from PIL import Image
5 | from tqdm import tqdm
6 |
7 | from pspnet import Pspnet
8 | from utils.utils_metrics import compute_mIoU, show_results
9 |
10 | gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
11 | for gpu in gpus:
12 |     tf.config.experimental.set_memory_growth(gpu, True)
13 |
14 | '''
15 | 进行指标评估需要注意以下几点:
16 | 1、该文件生成的图为灰度图,因为值比较小,按照PNG形式的图看是没有显示效果的,所以看到近似全黑的图是正常的。
17 | 2、该文件计算的是验证集的miou,当前该库将测试集当作验证集使用,不单独划分测试集。
18 | '''
19 | if __name__ == "__main__":
20 |     #---------------------------------------------------------------------------#
21 |     #   miou_mode用于指定该文件运行时计算的内容
22 |     #   miou_mode为0代表整个miou计算流程,包括获得预测结果、计算miou。
23 |     #   miou_mode为1代表仅仅获得预测结果。
24 |     #   miou_mode为2代表仅仅计算miou。
25 |     #---------------------------------------------------------------------------#
26 |     miou_mode = 0
27 |     #------------------------------#
28 |     #   分类个数+1、如2+1
29 |     #------------------------------#
30 |     num_classes = 21
31 |     #--------------------------------------------#
32 |     #   区分的种类,和json_to_dataset里面的一样
33 |     #--------------------------------------------#
34 |     name_classes = ["background","aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
35 |     # name_classes =
["_background_","cat","dog"] 36 | #-------------------------------------------------------# 37 | # 指向VOC数据集所在的文件夹 38 | # 默认指向根目录下的VOC数据集 39 | #-------------------------------------------------------# 40 | VOCdevkit_path = 'VOCdevkit' 41 | 42 | image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Segmentation/val.txt"),'r').read().splitlines() 43 | gt_dir = os.path.join(VOCdevkit_path, "VOC2007/SegmentationClass/") 44 | miou_out_path = "miou_out" 45 | pred_dir = os.path.join(miou_out_path, 'detection-results') 46 | 47 | if miou_mode == 0 or miou_mode == 1: 48 | if not os.path.exists(pred_dir): 49 | os.makedirs(pred_dir) 50 | 51 | print("Load model.") 52 | pspnet = Pspnet() 53 | print("Load model done.") 54 | 55 | print("Get predict result.") 56 | for image_id in tqdm(image_ids): 57 | image_path = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg") 58 | image = Image.open(image_path) 59 | image = pspnet.get_miou_png(image) 60 | image.save(os.path.join(pred_dir, image_id + ".png")) 61 | print("Get predict result done.") 62 | 63 | if miou_mode == 0 or miou_mode == 2: 64 | print("Get miou.") 65 | hist, IoUs, PA_Recall, Precision = compute_mIoU(gt_dir, pred_dir, image_ids, num_classes, name_classes) # 执行计算mIoU的函数 66 | print("Get miou done.") 67 | show_results(miou_out_path, hist, IoUs, PA_Recall, Precision, name_classes) -------------------------------------------------------------------------------- /img/street.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bubbliiiing/pspnet-tf2/501b5a03cd085c22ffb63362e2c99fc2470d39de/img/street.jpg -------------------------------------------------------------------------------- /json_to_dataset.py: -------------------------------------------------------------------------------- 1 | import base64 2 | import json 3 | import os 4 | import os.path as osp 5 | 6 | import numpy as np 7 | import PIL.Image 8 | from labelme import utils 9 | 10 | ''' 11 | 制作自己的语义分割数据集需要注意以下几点: 12 | 1、我使用的labelme版本是3.16.7,建议使用该版本的labelme,有些版本的labelme会发生错误, 13 | 具体错误为:Too many dimensions: 3 > 2 14 | 安装方式为命令行pip install labelme==3.16.7 15 | 2、此处生成的标签图是8位彩色图,与视频中看起来的数据集格式不太一样。 16 | 虽然看起来是彩图,但事实上只有8位,此时每个像素点的值就是这个像素点所属的种类。 17 | 所以其实和视频中VOC数据集的格式一样。因此这样制作出来的数据集是可以正常使用的。也是正常的。 18 | ''' 19 | if __name__ == '__main__': 20 | jpgs_path = "datasets/JPEGImages" 21 | pngs_path = "datasets/SegmentationClass" 22 | classes = ["_background_","aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"] 23 | # classes = ["_background_","cat","dog"] 24 | 25 | count = os.listdir("./datasets/before/") 26 | for i in range(0, len(count)): 27 | path = os.path.join("./datasets/before", count[i]) 28 | 29 | if os.path.isfile(path) and path.endswith('json'): 30 | data = json.load(open(path)) 31 | 32 | if data['imageData']: 33 | imageData = data['imageData'] 34 | else: 35 | imagePath = os.path.join(os.path.dirname(path), data['imagePath']) 36 | with open(imagePath, 'rb') as f: 37 | imageData = f.read() 38 | imageData = base64.b64encode(imageData).decode('utf-8') 39 | 40 | img = utils.img_b64_to_arr(imageData) 41 | label_name_to_value = {'_background_': 0} 42 | for shape in data['shapes']: 43 | label_name = shape['label'] 44 | if label_name in label_name_to_value: 45 | label_value = label_name_to_value[label_name] 46 | else: 47 | label_value = len(label_name_to_value) 48 | 
label_name_to_value[label_name] = label_value 49 | 50 | # label_values must be dense 51 | label_values, label_names = [], [] 52 | for ln, lv in sorted(label_name_to_value.items(), key=lambda x: x[1]): 53 | label_values.append(lv) 54 | label_names.append(ln) 55 | assert label_values == list(range(len(label_values))) 56 | 57 | lbl = utils.shapes_to_label(img.shape, data['shapes'], label_name_to_value) 58 | 59 | 60 | PIL.Image.fromarray(img).save(osp.join(jpgs_path, count[i].split(".")[0]+'.jpg')) 61 | 62 | new = np.zeros([np.shape(img)[0],np.shape(img)[1]]) 63 | for name in label_names: 64 | index_json = label_names.index(name) 65 | index_all = classes.index(name) 66 | new = new + index_all*(np.array(lbl) == index_json) 67 | 68 | utils.lblsave(osp.join(pngs_path, count[i].split(".")[0]+'.png'), new) 69 | print('Saved ' + count[i].split(".")[0] + '.jpg and ' + count[i].split(".")[0] + '.png') 70 | -------------------------------------------------------------------------------- /logs/README.md: -------------------------------------------------------------------------------- 1 | 这里面存放的是训练过程中产生的权重。 2 | -------------------------------------------------------------------------------- /model_data/README.md: -------------------------------------------------------------------------------- 1 | 这里面存放的是已经训练好的权重,可通过百度网盘下载。 2 | -------------------------------------------------------------------------------- /model_data/pspnet_mobilenetv2.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bubbliiiing/pspnet-tf2/501b5a03cd085c22ffb63362e2c99fc2470d39de/model_data/pspnet_mobilenetv2.h5 -------------------------------------------------------------------------------- /nets/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /nets/mobilenetv2.py: -------------------------------------------------------------------------------- 1 | from tensorflow.keras.activations import relu 2 | from tensorflow.keras.layers import (Activation, Add, BatchNormalization, 3 | Conv2D, DepthwiseConv2D, Input) 4 | 5 | 6 | def _make_divisible(v, divisor, min_value=None): 7 | if min_value is None: 8 | min_value = divisor 9 | new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) 10 | if new_v < 0.9 * v: 11 | new_v += divisor 12 | return new_v 13 | 14 | def relu6(x): 15 | return relu(x, max_value=6) 16 | 17 | def _inverted_res_block(inputs, expansion, stride, alpha, in_filters, filters, block_id, skip_connection, rate=1): 18 | pointwise_conv_filters = int(filters * alpha) 19 | pointwise_filters = _make_divisible(pointwise_conv_filters, 8) 20 | x = inputs 21 | prefix = 'expanded_conv_{}_'.format(block_id) 22 | 23 | #----------------------------------------------------# 24 | # 利用1x1卷积根据输入进来的通道数进行通道数上升 25 | #----------------------------------------------------# 26 | if block_id: 27 | x = Conv2D(expansion * in_filters, kernel_size=1, padding='same', 28 | use_bias=False, activation=None, 29 | name=prefix + 'expand')(x) 30 | x = BatchNormalization(epsilon=1e-3, momentum=0.999, 31 | name=prefix + 'expand_BN')(x) 32 | x = Activation(relu6, name=prefix + 'expand_relu')(x) 33 | else: 34 | prefix = 'expanded_conv_' 35 | 36 | #----------------------------------------------------# 37 | # 利用深度可分离卷积进行特征提取 38 | #----------------------------------------------------# 39 | x = DepthwiseConv2D(kernel_size=3, strides=stride, activation=None, 40 | 
use_bias=False, padding='same', dilation_rate=(rate, rate), 41 | name=prefix + 'depthwise')(x) 42 | x = BatchNormalization(epsilon=1e-3, momentum=0.999, 43 | name=prefix + 'depthwise_BN')(x) 44 | 45 | x = Activation(relu6, name=prefix + 'depthwise_relu')(x) 46 | 47 | #----------------------------------------------------# 48 | # 利用1x1的卷积进行通道数的下降 49 | #----------------------------------------------------# 50 | x = Conv2D(pointwise_filters, 51 | kernel_size=1, padding='same', use_bias=False, activation=None, 52 | name=prefix + 'project')(x) 53 | x = BatchNormalization(epsilon=1e-3, momentum=0.999, 54 | name=prefix + 'project_BN')(x) 55 | 56 | #----------------------------------------------------# 57 | # 添加残差边 58 | #----------------------------------------------------# 59 | if skip_connection: 60 | return Add(name=prefix + 'add')([inputs, x]) 61 | return x 62 | 63 | def get_mobilenet_encoder(inputs_size, downsample_factor=8): 64 | if downsample_factor == 16: 65 | block4_dilation = 1 66 | block5_dilation = 2 67 | block4_stride = 2 68 | elif downsample_factor == 8: 69 | block4_dilation = 2 70 | block5_dilation = 4 71 | block4_stride = 1 72 | else: 73 | raise ValueError('Unsupported factor - `{}`, Use 8 or 16.'.format(downsample_factor)) 74 | 75 | # 473,473,3 76 | inputs = Input(shape=inputs_size) 77 | 78 | alpha=1.0 79 | first_block_filters = _make_divisible(32 * alpha, 8) 80 | 81 | # 473,473,3 -> 237,237,32 82 | x = Conv2D(first_block_filters, 83 | kernel_size=3, 84 | strides=(2, 2), padding='same', 85 | use_bias=False, name='Conv')(inputs) 86 | x = BatchNormalization(epsilon=1e-3, momentum=0.999, name='Conv_BN')(x) 87 | x = Activation(relu6, name='Conv_Relu6')(x) 88 | 89 | # 237,237,32 -> 237,237,16 90 | x = _inverted_res_block(x, in_filters=32, filters=16, alpha=alpha, stride=1, 91 | expansion=1, block_id=0, skip_connection=False) 92 | 93 | #---------------------------------------------------------------# 94 | # 237,237,16 -> 119,119,24 95 | x = _inverted_res_block(x, in_filters=16, filters=24, alpha=alpha, stride=2, 96 | expansion=6, block_id=1, skip_connection=False) 97 | x = _inverted_res_block(x, in_filters=24, filters=24, alpha=alpha, stride=1, 98 | expansion=6, block_id=2, skip_connection=True) 99 | 100 | #---------------------------------------------------------------# 101 | # 119,119,24 -> 60,60.32 102 | x = _inverted_res_block(x, in_filters=24, filters=32, alpha=alpha, stride=2, 103 | expansion=6, block_id=3, skip_connection=False) 104 | x = _inverted_res_block(x, in_filters=32, filters=32, alpha=alpha, stride=1, 105 | expansion=6, block_id=4, skip_connection=True) 106 | x = _inverted_res_block(x, in_filters=32, filters=32, alpha=alpha, stride=1, 107 | expansion=6, block_id=5, skip_connection=True) 108 | 109 | #---------------------------------------------------------------# 110 | # 60,60,32 -> 30,30.64 111 | x = _inverted_res_block(x, in_filters=32, filters=64, alpha=alpha, stride=block4_stride, 112 | expansion=6, block_id=6, skip_connection=False) 113 | x = _inverted_res_block(x, in_filters=64, filters=64, alpha=alpha, stride=1, rate=block4_dilation, 114 | expansion=6, block_id=7, skip_connection=True) 115 | x = _inverted_res_block(x, in_filters=64, filters=64, alpha=alpha, stride=1, rate=block4_dilation, 116 | expansion=6, block_id=8, skip_connection=True) 117 | x = _inverted_res_block(x, in_filters=64, filters=64, alpha=alpha, stride=1, rate=block4_dilation, 118 | expansion=6, block_id=9, skip_connection=True) 119 | 120 | # 30,30.64 -> 30,30.96 121 | x = _inverted_res_block(x, 
in_filters=64, filters=96, alpha=alpha, stride=1, rate=block4_dilation, 122 | expansion=6, block_id=10, skip_connection=False) 123 | x = _inverted_res_block(x, in_filters=96, filters=96, alpha=alpha, stride=1, rate=block4_dilation, 124 | expansion=6, block_id=11, skip_connection=True) 125 | x = _inverted_res_block(x, in_filters=96, filters=96, alpha=alpha, stride=1, rate=block4_dilation, 126 | expansion=6, block_id=12, skip_connection=True) 127 | # 辅助分支训练 128 | f4 = x 129 | 130 | #---------------------------------------------------------------# 131 | # 30,30.96 -> 30,30,160 -> 30,30,320 132 | x = _inverted_res_block(x, in_filters=96, filters=160, alpha=alpha, stride=1, rate=block4_dilation, 133 | expansion=6, block_id=13, skip_connection=False) 134 | x = _inverted_res_block(x, in_filters=160, filters=160, alpha=alpha, stride=1, rate=block5_dilation, 135 | expansion=6, block_id=14, skip_connection=True) 136 | x = _inverted_res_block(x, in_filters=160, filters=160, alpha=alpha, stride=1, rate=block5_dilation, 137 | expansion=6, block_id=15, skip_connection=True) 138 | 139 | x = _inverted_res_block(x, in_filters=160, filters=320, alpha=alpha, stride=1, rate=block5_dilation, 140 | expansion=6, block_id=16, skip_connection=False) 141 | f5 = x 142 | return inputs, f4, f5 143 | -------------------------------------------------------------------------------- /nets/pspnet.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | import tensorflow.keras.backend as K 4 | from tensorflow.keras.layers import * 5 | from tensorflow.keras.models import * 6 | 7 | from nets.mobilenetv2 import get_mobilenet_encoder 8 | from nets.resnet50 import get_resnet50_encoder 9 | 10 | def pool_block(feats, pool_factor, out_channel): 11 | h = K.int_shape(feats)[1] 12 | w = K.int_shape(feats)[2] 13 | #-----------------------------------------------------# 14 | # 分区域进行平均池化 15 | # strides = [30,30], [15,15], [10,10], [5, 5] 16 | # poolsize = 30/1=30 30/2=15 30/3=10 30/6=5 17 | #-----------------------------------------------------# 18 | pool_size = strides = [int(np.round(float(h)/pool_factor)),int(np.round(float(w)/pool_factor))] 19 | x = AveragePooling2D(pool_size, strides=strides, padding='same')(feats) 20 | 21 | #-----------------------------------------------------# 22 | # 利用1x1卷积进行通道数的调整 23 | #-----------------------------------------------------# 24 | x = Conv2D(out_channel//4, (1 ,1), padding='same', use_bias=False)(x) 25 | x = BatchNormalization()(x) 26 | x = Activation('relu' )(x) 27 | 28 | #-----------------------------------------------------# 29 | # 利用resize扩大特征层面积 30 | #-----------------------------------------------------# 31 | x = Lambda(lambda x: tf.compat.v1.image.resize_images(x, (K.int_shape(feats)[1], K.int_shape(feats)[2]), align_corners=True))(x) 32 | return x 33 | 34 | def pspnet(input_shape, num_classes, backbone='mobilenet', downsample_factor=8, aux_branch=True): 35 | if backbone == "mobilenet": 36 | #----------------------------------# 37 | # 获得两个特征层 38 | # f4为辅助分支 [30,30,96] 39 | # o为主干部分 [30,30,320] 40 | #----------------------------------# 41 | img_input, f4, o = get_mobilenet_encoder(input_shape, downsample_factor=downsample_factor) 42 | out_channel = 320 43 | elif backbone == "resnet50": 44 | img_input, f4, o = get_resnet50_encoder(input_shape, downsample_factor=downsample_factor) 45 | out_channel = 2048 46 | else: 47 | raise ValueError('Unsupported backbone - `{}`, Use mobilenet, resnet50.'.format(backbone)) 
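    #--------------------------------------------------------------#
    #   补充示意(假设输入为473x473、downsample_factor=16,此时进入
    #   PSP模块的特征层约为30x30,下述数值仅为该假设下的推算):
    #   pool_block中 pool_size = strides = round(30 / pool_factor)
    #   pool_factor=1 -> 池化核30x30 -> 输出约1x1
    #   pool_factor=2 -> 池化核15x15 -> 输出约2x2
    #   pool_factor=3 -> 池化核10x10 -> 输出约3x3
    #   pool_factor=6 -> 池化核5x5   -> 输出约6x6
    #--------------------------------------------------------------#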
48 | 49 | #--------------------------------------------------------------# 50 | # PSP模块,分区域进行池化 51 | # 分别分割成1x1的区域,2x2的区域,3x3的区域,6x6的区域 52 | #--------------------------------------------------------------# 53 | pool_factors = [1,2,3,6] 54 | pool_outs = [o] 55 | 56 | for p in pool_factors: 57 | pooled = pool_block(o, p, out_channel) 58 | pool_outs.append(pooled) 59 | 60 | #--------------------------------------------------------------------------------# 61 | # 利用获取到的特征层进行堆叠 62 | # 30, 30, 320 + 30, 30, 80 + 30, 30, 80 + 30, 30, 80 + 30, 30, 80 = 30, 30, 640 63 | #--------------------------------------------------------------------------------# 64 | o = Concatenate()(pool_outs) 65 | 66 | # 30, 30, 640 -> 30, 30, 80 67 | o = Conv2D(out_channel//4, (3,3), padding='same', use_bias=False)(o) 68 | o = BatchNormalization()(o) 69 | o = Activation('relu')(o) 70 | 71 | # 防止过拟合 72 | o = Dropout(0.1)(o) 73 | 74 | #---------------------------------------------------# 75 | # 利用特征获得预测结果 76 | # 30, 30, 80 -> 30, 30, 21 -> 473, 473, 21 77 | #---------------------------------------------------# 78 | o = Conv2D(num_classes,(1,1), padding='same')(o) 79 | o = Lambda(lambda x: tf.compat.v1.image.resize_images(x, (input_shape[1], input_shape[0]), align_corners=True))(o) 80 | 81 | #---------------------------------------------------# 82 | # 获得每一个像素点属于每一个类的概率 83 | #---------------------------------------------------# 84 | o = Activation("softmax", name="main")(o) 85 | 86 | if aux_branch: 87 | # 30, 30, 96 -> 30, 30, 40 88 | f4 = Conv2D(out_channel//8, (3,3), padding='same', use_bias=False)(f4) 89 | f4 = BatchNormalization()(f4) 90 | f4 = Activation('relu')(f4) 91 | f4 = Dropout(0.1)(f4) 92 | #---------------------------------------------------# 93 | # 利用特征获得预测结果 94 | # 30, 30, 40 -> 30, 30, 21 -> 473, 473, 21 95 | #---------------------------------------------------# 96 | f4 = Conv2D(num_classes,(1,1), padding='same')(f4) 97 | f4 = Lambda(lambda x: tf.compat.v1.image.resize_images(x, (input_shape[1], input_shape[0]), align_corners=True))(f4) 98 | 99 | f4 = Activation("softmax", name="aux")(f4) 100 | model = Model(img_input,[f4,o]) 101 | return model 102 | else: 103 | model = Model(img_input,[o]) 104 | return model 105 | 106 | 107 | -------------------------------------------------------------------------------- /nets/pspnet_training.py: -------------------------------------------------------------------------------- 1 | import math 2 | from functools import partial 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | from tensorflow.keras import backend as K 7 | 8 | 9 | def dice_loss_with_CE(cls_weights, beta=1, smooth = 1e-5): 10 | cls_weights = np.reshape(cls_weights, [1, 1, 1, -1]) 11 | def _dice_loss_with_CE(y_true, y_pred): 12 | y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon()) 13 | 14 | CE_loss = - y_true[...,:-1] * K.log(y_pred) * cls_weights 15 | CE_loss = K.mean(K.sum(CE_loss, axis = -1)) 16 | 17 | tp = K.sum(y_true[...,:-1] * y_pred, axis=[0,1,2]) 18 | fp = K.sum(y_pred , axis=[0,1,2]) - tp 19 | fn = K.sum(y_true[...,:-1], axis=[0,1,2]) - tp 20 | 21 | score = ((1 + beta ** 2) * tp + smooth) / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth) 22 | score = tf.reduce_mean(score) 23 | dice_loss = 1 - score 24 | # dice_loss = tf.Print(dice_loss, [dice_loss, CE_loss]) 25 | return CE_loss + dice_loss 26 | return _dice_loss_with_CE 27 | 28 | def CE(cls_weights): 29 | cls_weights = np.reshape(cls_weights, [1, 1, 1, -1]) 30 | def _CE(y_true, y_pred): 31 | y_pred = K.clip(y_pred, K.epsilon(), 1.0 - 
K.epsilon()) 32 | 33 | CE_loss = - y_true[...,:-1] * K.log(y_pred) * cls_weights 34 | CE_loss = K.mean(K.sum(CE_loss, axis = -1)) 35 | # dice_loss = tf.Print(CE_loss, [CE_loss]) 36 | return CE_loss 37 | return _CE 38 | 39 | def dice_loss_with_Focal_Loss(cls_weights, beta=1, smooth = 1e-5, alpha=0.5, gamma=2): 40 | cls_weights = np.reshape(cls_weights, [1, 1, 1, -1]) 41 | def _dice_loss_with_Focal_Loss(y_true, y_pred): 42 | y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon()) 43 | 44 | logpt = - y_true[...,:-1] * K.log(y_pred) * cls_weights 45 | logpt = - K.sum(logpt, axis = -1) 46 | 47 | pt = tf.exp(logpt) 48 | if alpha is not None: 49 | logpt *= alpha 50 | CE_loss = -((1 - pt) ** gamma) * logpt 51 | CE_loss = K.mean(CE_loss) 52 | 53 | tp = K.sum(y_true[...,:-1] * y_pred, axis=[0,1,2]) 54 | fp = K.sum(y_pred , axis=[0,1,2]) - tp 55 | fn = K.sum(y_true[...,:-1], axis=[0,1,2]) - tp 56 | 57 | score = ((1 + beta ** 2) * tp + smooth) / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth) 58 | score = tf.reduce_mean(score) 59 | dice_loss = 1 - score 60 | # dice_loss = tf.Print(dice_loss, [dice_loss, CE_loss]) 61 | return CE_loss + dice_loss 62 | return _dice_loss_with_Focal_Loss 63 | 64 | def Focal_Loss(cls_weights, alpha=0.5, gamma=2): 65 | cls_weights = np.reshape(cls_weights, [1, 1, 1, -1]) 66 | def _Focal_Loss(y_true, y_pred): 67 | y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon()) 68 | 69 | logpt = - y_true[...,:-1] * K.log(y_pred) * cls_weights 70 | logpt = - K.sum(logpt, axis = -1) 71 | 72 | pt = tf.exp(logpt) 73 | if alpha is not None: 74 | logpt *= alpha 75 | CE_loss = -((1 - pt) ** gamma) * logpt 76 | CE_loss = K.mean(CE_loss) 77 | return CE_loss 78 | return _Focal_Loss 79 | 80 | def get_lr_scheduler(lr_decay_type, lr, min_lr, total_iters, warmup_iters_ratio = 0.1, warmup_lr_ratio = 0.1, no_aug_iter_ratio = 0.3, step_num = 10): 81 | def yolox_warm_cos_lr(lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter, iters): 82 | if iters <= warmup_total_iters: 83 | # lr = (lr - warmup_lr_start) * iters / float(warmup_total_iters) + warmup_lr_start 84 | lr = (lr - warmup_lr_start) * pow(iters / float(warmup_total_iters), 2 85 | ) + warmup_lr_start 86 | elif iters >= total_iters - no_aug_iter: 87 | lr = min_lr 88 | else: 89 | lr = min_lr + 0.5 * (lr - min_lr) * ( 90 | 1.0 91 | + math.cos( 92 | math.pi 93 | * (iters - warmup_total_iters) 94 | / (total_iters - warmup_total_iters - no_aug_iter) 95 | ) 96 | ) 97 | return lr 98 | 99 | def step_lr(lr, decay_rate, step_size, iters): 100 | if step_size < 1: 101 | raise ValueError("step_size must above 1.") 102 | n = iters // step_size 103 | out_lr = lr * decay_rate ** n 104 | return out_lr 105 | 106 | if lr_decay_type == "cos": 107 | warmup_total_iters = min(max(warmup_iters_ratio * total_iters, 1), 3) 108 | warmup_lr_start = max(warmup_lr_ratio * lr, 1e-6) 109 | no_aug_iter = min(max(no_aug_iter_ratio * total_iters, 1), 15) 110 | func = partial(yolox_warm_cos_lr ,lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter) 111 | else: 112 | decay_rate = (min_lr / lr) ** (1 / (step_num - 1)) 113 | step_size = total_iters / step_num 114 | func = partial(step_lr, lr, decay_rate, step_size) 115 | 116 | return func 117 | 118 | -------------------------------------------------------------------------------- /nets/resnet50.py: -------------------------------------------------------------------------------- 1 | #-------------------------------------------------------------# 2 | # ResNet50的网络部分 3 | 
#-------------------------------------------------------------# 4 | from __future__ import print_function 5 | 6 | from tensorflow.keras import layers 7 | from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D, 8 | Input, MaxPooling2D, ZeroPadding2D) 9 | 10 | 11 | def identity_block(input_tensor, kernel_size, filters, stage, block, dilation_rate=1): 12 | 13 | filters1, filters2, filters3 = filters 14 | 15 | conv_name_base = 'res' + str(stage) + block + '_branch' 16 | bn_name_base = 'bn' + str(stage) + block + '_branch' 17 | 18 | x = Conv2D(filters1, (1, 1), name=conv_name_base + '2a', use_bias=False)(input_tensor) 19 | x = BatchNormalization(name=bn_name_base + '2a')(x) 20 | x = Activation('relu')(x) 21 | 22 | x = Conv2D(filters2, kernel_size, padding='same', dilation_rate = dilation_rate, name=conv_name_base + '2b', use_bias=False)(x) 23 | x = BatchNormalization(name=bn_name_base + '2b')(x) 24 | x = Activation('relu')(x) 25 | 26 | x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c', use_bias=False)(x) 27 | x = BatchNormalization(name=bn_name_base + '2c')(x) 28 | 29 | x = layers.add([x, input_tensor]) 30 | x = Activation('relu')(x) 31 | return x 32 | 33 | 34 | def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2), dilation_rate=1): 35 | 36 | filters1, filters2, filters3 = filters 37 | 38 | conv_name_base = 'res' + str(stage) + block + '_branch' 39 | bn_name_base = 'bn' + str(stage) + block + '_branch' 40 | 41 | x = Conv2D(filters1, (1, 1), strides=strides, 42 | name=conv_name_base + '2a', use_bias=False)(input_tensor) 43 | x = BatchNormalization(name=bn_name_base + '2a')(x) 44 | x = Activation('relu')(x) 45 | 46 | x = Conv2D(filters2, kernel_size, padding='same', dilation_rate = dilation_rate, 47 | name=conv_name_base + '2b', use_bias=False)(x) 48 | x = BatchNormalization(name=bn_name_base + '2b')(x) 49 | x = Activation('relu')(x) 50 | 51 | x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c', use_bias=False)(x) 52 | x = BatchNormalization(name=bn_name_base + '2c')(x) 53 | 54 | shortcut = Conv2D(filters3, (1, 1), strides=strides, 55 | name=conv_name_base + '1', use_bias=False)(input_tensor) 56 | shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut) 57 | 58 | x = layers.add([x, shortcut]) 59 | x = Activation('relu')(x) 60 | return x 61 | 62 | def get_resnet50_encoder(inputs_size, downsample_factor=8): 63 | if downsample_factor == 16: 64 | block4_dilation = 1 65 | block5_dilation = 2 66 | block4_stride = 2 67 | elif downsample_factor == 8: 68 | block4_dilation = 2 69 | block5_dilation = 4 70 | block4_stride = 1 71 | else: 72 | raise ValueError('Unsupported factor - `{}`, Use 8 or 16.'.format(downsample_factor)) 73 | img_input = Input(shape=inputs_size) 74 | 75 | x = ZeroPadding2D(padding=(1, 1), name='conv1_pad')(img_input) 76 | x = Conv2D(filters=64, kernel_size=(3, 3), strides=(2, 2), name='conv1', use_bias=False)(x) 77 | x = BatchNormalization(axis=-1, name='bn_conv1')(x) 78 | x = Activation('relu')(x) 79 | 80 | x = ZeroPadding2D(padding=(1, 1), name='conv2_pad')(x) 81 | x = Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), name='conv2', use_bias=False)(x) 82 | x = BatchNormalization(axis=-1, name='bn_conv2')(x) 83 | x = Activation(activation='relu')(x) 84 | 85 | x = ZeroPadding2D(padding=(1, 1), name='conv3_pad')(x) 86 | x = Conv2D(filters=128, kernel_size=(3, 3), strides=(1, 1), name='conv3', use_bias=False)(x) 87 | x = BatchNormalization(axis=-1, name='bn_conv3')(x) 88 | x = Activation(activation='relu')(x) 89 | 90 | 
x = ZeroPadding2D(padding=(1, 1), name='pool1_pad')(x) 91 | x = MaxPooling2D((3, 3), strides=(2, 2))(x) 92 | 93 | x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1)) 94 | x = identity_block(x, 3, [64, 64, 256], stage=2, block='b') 95 | x = identity_block(x, 3, [64, 64, 256], stage=2, block='c') 96 | 97 | x = conv_block(x, 3, [128, 128, 512], stage=3, block='a') 98 | x = identity_block(x, 3, [128, 128, 512], stage=3, block='b') 99 | x = identity_block(x, 3, [128, 128, 512], stage=3, block='c') 100 | x = identity_block(x, 3, [128, 128, 512], stage=3, block='d') 101 | 102 | x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', strides=(block4_stride,block4_stride)) 103 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b', dilation_rate=block4_dilation) 104 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c', dilation_rate=block4_dilation) 105 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d', dilation_rate=block4_dilation) 106 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e', dilation_rate=block4_dilation) 107 | x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f', dilation_rate=block4_dilation) 108 | f4 = x 109 | 110 | x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', strides=(1,1), dilation_rate=block4_dilation) 111 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', dilation_rate=block5_dilation) 112 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', dilation_rate=block5_dilation) 113 | f5 = x 114 | 115 | return img_input, f4, f5 116 | -------------------------------------------------------------------------------- /predict.py: -------------------------------------------------------------------------------- 1 | #-------------------------------------# 2 | # 对单张图片进行预测 3 | #-------------------------------------# 4 | import time 5 | 6 | import cv2 7 | import numpy as np 8 | import tensorflow as tf 9 | from PIL import Image 10 | 11 | from pspnet import Pspnet 12 | 13 | gpus = tf.config.experimental.list_physical_devices(device_type='GPU') 14 | for gpu in gpus: 15 | tf.config.experimental.set_memory_growth(gpu, True) 16 | 17 | if __name__ == "__main__": 18 | #-------------------------------------------------------------------------# 19 | # 如果想要修改对应种类的颜色,到__init__函数里修改self.colors即可 20 | #-------------------------------------------------------------------------# 21 | pspnet = Pspnet() 22 | #----------------------------------------------------------------------------------------------------------# 23 | # mode用于指定测试的模式: 24 | # 'predict' 表示单张图片预测,如果想对预测过程进行修改,如保存图片,截取对象等,可以先看下方详细的注释 25 | # 'video' 表示视频检测,可调用摄像头或者视频进行检测,详情查看下方注释。 26 | # 'fps' 表示测试fps,使用的图片是img里面的street.jpg,详情查看下方注释。 27 | # 'dir_predict' 表示遍历文件夹进行检测并保存。默认遍历img文件夹,保存img_out文件夹,详情查看下方注释。 28 | #----------------------------------------------------------------------------------------------------------# 29 | mode = "predict" 30 | #-------------------------------------------------------------------------# 31 | # count 指定了是否进行目标的像素点计数(即面积)与比例计算 32 | # name_classes 区分的种类,和json_to_dataset里面的一样,用于打印种类和数量 33 | # 34 | # count、name_classes仅在mode='predict'时有效 35 | #-------------------------------------------------------------------------# 36 | count = False 37 | name_classes = ["background","aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"] 38 | # name_classes = ["background","cat","dog"] 
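    #-------------------------------------------------------------------------#
    #   补充示意(非本仓库默认写法,仅供参考):Pspnet()的__init__支持用关键字
    #   参数覆盖pspnet.py中_defaults的配置,例如想在不改动pspnet.py的前提下
    #   切换主干与权值(权值需先按README下载到model_data),可以写成:
    #   pspnet = Pspnet(backbone="resnet50", model_path="model_data/pspnet_resnet50.h5")
    #-------------------------------------------------------------------------#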
39 | #----------------------------------------------------------------------------------------------------------# 40 | # video_path 用于指定视频的路径,当video_path=0时表示检测摄像头 41 | # 想要检测视频,则设置如video_path = "xxx.mp4"即可,代表读取出根目录下的xxx.mp4文件。 42 | # video_save_path 表示视频保存的路径,当video_save_path=""时表示不保存 43 | # 想要保存视频,则设置如video_save_path = "yyy.mp4"即可,代表保存为根目录下的yyy.mp4文件。 44 | # video_fps 用于保存的视频的fps 45 | # 46 | # video_path、video_save_path和video_fps仅在mode='video'时有效 47 | # 保存视频时需要ctrl+c退出或者运行到最后一帧才会完成完整的保存步骤。 48 | #----------------------------------------------------------------------------------------------------------# 49 | video_path = 0 50 | video_save_path = "" 51 | video_fps = 25.0 52 | #----------------------------------------------------------------------------------------------------------# 53 | # test_interval 用于指定测量fps的时候,图片检测的次数。理论上test_interval越大,fps越准确。 54 | # fps_image_path 用于指定测试的fps图片 55 | # 56 | # test_interval和fps_image_path仅在mode='fps'有效 57 | #----------------------------------------------------------------------------------------------------------# 58 | test_interval = 100 59 | fps_image_path = "img/street.jpg" 60 | #-------------------------------------------------------------------------# 61 | # dir_origin_path 指定了用于检测的图片的文件夹路径 62 | # dir_save_path 指定了检测完图片的保存路径 63 | # 64 | # dir_origin_path和dir_save_path仅在mode='dir_predict'时有效 65 | #-------------------------------------------------------------------------# 66 | dir_origin_path = "img/" 67 | dir_save_path = "img_out/" 68 | 69 | if mode == "predict": 70 | ''' 71 | predict.py有几个注意点 72 | 1、该代码无法直接进行批量预测,如果想要批量预测,可以利用os.listdir()遍历文件夹,利用Image.open打开图片文件进行预测。 73 | 具体流程可以参考get_miou_prediction.py,在get_miou_prediction.py即实现了遍历。 74 | 2、如果想要保存,利用r_image.save("img.jpg")即可保存。 75 | 3、如果想要原图和分割图不混合,可以把blend参数设置成False。 76 | 4、如果想根据mask获取对应的区域,可以参考detect_image函数中,利用预测结果绘图的部分,判断每一个像素点的种类,然后根据种类获取对应的部分。 77 | seg_img = np.zeros((np.shape(pr)[0],np.shape(pr)[1],3)) 78 | for c in range(self.num_classes): 79 | seg_img[:, :, 0] += ((pr == c)*( self.colors[c][0] )).astype('uint8') 80 | seg_img[:, :, 1] += ((pr == c)*( self.colors[c][1] )).astype('uint8') 81 | seg_img[:, :, 2] += ((pr == c)*( self.colors[c][2] )).astype('uint8') 82 | ''' 83 | while True: 84 | img = input('Input image filename:') 85 | try: 86 | image = Image.open(img) 87 | except: 88 | print('Open Error! 
Try again!') 89 | continue 90 | else: 91 | r_image = pspnet.detect_image(image, count=count, name_classes=name_classes) 92 | r_image.show() 93 | 94 | elif mode == "video": 95 | capture=cv2.VideoCapture(video_path) 96 | if video_save_path!="": 97 | fourcc = cv2.VideoWriter_fourcc(*'XVID') 98 | size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))) 99 | out = cv2.VideoWriter(video_save_path, fourcc, video_fps, size) 100 | 101 | ref, frame = capture.read() 102 | if not ref: 103 | raise ValueError("未能正确读取摄像头(视频),请注意是否正确安装摄像头(是否正确填写视频路径)。") 104 | 105 | fps = 0.0 106 | while(True): 107 | t1 = time.time() 108 | # 读取某一帧 109 | ref, frame = capture.read() 110 | if not ref: 111 | break 112 | # 格式转变,BGRtoRGB 113 | frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB) 114 | # 转变成Image 115 | frame = Image.fromarray(np.uint8(frame)) 116 | # 进行检测 117 | frame = np.array(pspnet.detect_image(frame)) 118 | # RGBtoBGR满足opencv显示格式 119 | frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR) 120 | 121 | fps = ( fps + (1./(time.time()-t1)) ) / 2 122 | print("fps= %.2f"%(fps)) 123 | frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 124 | 125 | cv2.imshow("video",frame) 126 | c= cv2.waitKey(1) & 0xff 127 | if video_save_path!="": 128 | out.write(frame) 129 | 130 | if c==27: 131 | capture.release() 132 | break 133 | print("Video Detection Done!") 134 | capture.release() 135 | if video_save_path!="": 136 | print("Save processed video to the path :" + video_save_path) 137 | out.release() 138 | cv2.destroyAllWindows() 139 | 140 | elif mode == "fps": 141 | img = Image.open(fps_image_path) 142 | tact_time = pspnet.get_FPS(img, test_interval) 143 | print(str(tact_time) + ' seconds, ' + str(1/tact_time) + 'FPS, @batch_size 1') 144 | 145 | elif mode == "dir_predict": 146 | import os 147 | 148 | from tqdm import tqdm 149 | 150 | img_names = os.listdir(dir_origin_path) 151 | for img_name in tqdm(img_names): 152 | if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')): 153 | image_path = os.path.join(dir_origin_path, img_name) 154 | image = Image.open(image_path) 155 | r_image = pspnet.detect_image(image) 156 | if not os.path.exists(dir_save_path): 157 | os.makedirs(dir_save_path) 158 | r_image.save(os.path.join(dir_save_path, img_name)) 159 | 160 | else: 161 | raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps' or 'dir_predict'.") 162 | -------------------------------------------------------------------------------- /pspnet.py: -------------------------------------------------------------------------------- 1 | import colorsys 2 | import copy 3 | import time 4 | 5 | import cv2 6 | import numpy as np 7 | import tensorflow as tf 8 | from PIL import Image 9 | 10 | from nets.pspnet import pspnet 11 | from utils.utils import cvtColor, preprocess_input, resize_image, show_config 12 | 13 | 14 | #--------------------------------------------# 15 | # 使用自己训练好的模型预测需要修改3个参数 16 | # model_path、backbone和num_classes都需要修改! 
17 | # 如果出现shape不匹配 18 | # 一定要注意训练时的model_path、 19 | # backbone和num_classes数的修改 20 | #--------------------------------------------# 21 | class Pspnet(object): 22 | _defaults = { 23 | #-------------------------------------------------------------------# 24 | # model_path指向logs文件夹下的权值文件 25 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 26 | # 验证集损失较低不代表miou较高,仅代表该权值在验证集上泛化性能较好。 27 | #-------------------------------------------------------------------# 28 | "model_path" : 'model_data/pspnet_mobilenetv2.h5', 29 | #----------------------------------------# 30 | # 所需要区分的类的个数+1 31 | #----------------------------------------# 32 | "num_classes" : 21, 33 | #----------------------------------------# 34 | # 所使用的的主干网络:mobilenet、resnet50 35 | #----------------------------------------# 36 | "backbone" : "mobilenet", 37 | #----------------------------------------# 38 | # 输入图片的大小 39 | #----------------------------------------# 40 | "input_shape" : [473, 473], 41 | #----------------------------------------# 42 | # 下采样的倍数,一般可选的为8和16 43 | # 与训练时设置的一样即可 44 | #----------------------------------------# 45 | "downsample_factor" : 16, 46 | #-------------------------------------------------# 47 | # mix_type参数用于控制检测结果的可视化方式 48 | # 49 | # mix_type = 0的时候代表原图与生成的图进行混合 50 | # mix_type = 1的时候代表仅保留生成的图 51 | # mix_type = 2的时候代表仅扣去背景,仅保留原图中的目标 52 | #-------------------------------------------------# 53 | "mix_type" : 0, 54 | } 55 | 56 | #---------------------------------------------------# 57 | # 初始化PSPNET 58 | #---------------------------------------------------# 59 | def __init__(self, **kwargs): 60 | self.__dict__.update(self._defaults) 61 | for name, value in kwargs.items(): 62 | setattr(self, name, value) 63 | #---------------------------------------------------# 64 | # 画框设置不同的颜色 65 | #---------------------------------------------------# 66 | if self.num_classes <= 21: 67 | self.colors = [ (0, 0, 0), (128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128), (0, 128, 128), 68 | (128, 128, 128), (64, 0, 0), (192, 0, 0), (64, 128, 0), (192, 128, 0), (64, 0, 128), (192, 0, 128), 69 | (64, 128, 128), (192, 128, 128), (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0), (0, 64, 128), 70 | (128, 64, 12)] 71 | else: 72 | hsv_tuples = [(x / self.num_classes, 1., 1.) 
for x in range(self.num_classes)] 73 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 74 | self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors)) 75 | #---------------------------------------------------# 76 | # 获得模型 77 | #---------------------------------------------------# 78 | self.generate() 79 | 80 | show_config(**self._defaults) 81 | 82 | #---------------------------------------------------# 83 | # 载入模型 84 | #---------------------------------------------------# 85 | def generate(self): 86 | #-------------------------------# 87 | # 载入模型与权值 88 | #-------------------------------# 89 | self.model = pspnet([self.input_shape[0], self.input_shape[1], 3], self.num_classes, 90 | downsample_factor=self.downsample_factor, backbone=self.backbone, aux_branch=False) 91 | self.model.load_weights(self.model_path, by_name = True) 92 | print('{} model loaded.'.format(self.model_path)) 93 | 94 | @tf.function 95 | def get_pred(self, image_data): 96 | pr = self.model(image_data, training=False) 97 | return pr 98 | #---------------------------------------------------# 99 | # 检测图片 100 | #---------------------------------------------------# 101 | def detect_image(self, image, count=False, name_classes=None): 102 | #---------------------------------------------------------# 103 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 104 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 105 | #---------------------------------------------------------# 106 | image = cvtColor(image) 107 | #---------------------------------------------------# 108 | # 对输入图像进行一个备份,后面用于绘图 109 | #---------------------------------------------------# 110 | old_img = copy.deepcopy(image) 111 | orininal_h = np.array(image).shape[0] 112 | orininal_w = np.array(image).shape[1] 113 | #---------------------------------------------------------# 114 | # 给图像增加灰条,实现不失真的resize 115 | # 也可以直接resize进行识别 116 | #---------------------------------------------------------# 117 | image_data, nw, nh = resize_image(image, (self.input_shape[1], self.input_shape[0])) 118 | #---------------------------------------------------------# 119 | # 归一化+通道数调整到第一维度+添加上batch_size维度 120 | #---------------------------------------------------------# 121 | image_data = np.expand_dims(preprocess_input(np.array(image_data, np.float32)), 0) 122 | 123 | #---------------------------------------------------# 124 | # 图片传入网络进行预测 125 | #---------------------------------------------------# 126 | pr = self.get_pred(image_data)[0].numpy() 127 | #---------------------------------------------------# 128 | # 将灰条部分截取掉 129 | #---------------------------------------------------# 130 | pr = pr[int((self.input_shape[0] - nh) // 2) : int((self.input_shape[0] - nh) // 2 + nh), \ 131 | int((self.input_shape[1] - nw) // 2) : int((self.input_shape[1] - nw) // 2 + nw)] 132 | #---------------------------------------------------# 133 | # 进行图片的resize 134 | #---------------------------------------------------# 135 | pr = cv2.resize(pr, (orininal_w, orininal_h), interpolation = cv2.INTER_LINEAR) 136 | #---------------------------------------------------# 137 | # 取出每一个像素点的种类 138 | #---------------------------------------------------# 139 | pr = pr.argmax(axis=-1) 140 | 141 | #---------------------------------------------------------# 142 | # 计数 143 | #---------------------------------------------------------# 144 | if count: 145 | classes_nums = np.zeros([self.num_classes]) 146 | total_points_num = orininal_h * orininal_w 147 | print('-' * 63) 148 | print("|%25s | %15s | 
%15s|"%("Key", "Value", "Ratio")) 149 | print('-' * 63) 150 | for i in range(self.num_classes): 151 | num = np.sum(pr == i) 152 | ratio = num / total_points_num * 100 153 | if num > 0: 154 | print("|%25s | %15s | %14.2f%%|"%(str(name_classes[i]), str(num), ratio)) 155 | print('-' * 63) 156 | classes_nums[i] = num 157 | print("classes_nums:", classes_nums) 158 | 159 | if self.mix_type == 0: 160 | # seg_img = np.zeros((np.shape(pr)[0], np.shape(pr)[1], 3)) 161 | # for c in range(self.num_classes): 162 | # seg_img[:, :, 0] += ((pr[:, :] == c ) * self.colors[c][0]).astype('uint8') 163 | # seg_img[:, :, 1] += ((pr[:, :] == c ) * self.colors[c][1]).astype('uint8') 164 | # seg_img[:, :, 2] += ((pr[:, :] == c ) * self.colors[c][2]).astype('uint8') 165 | seg_img = np.reshape(np.array(self.colors, np.uint8)[np.reshape(pr, [-1])], [orininal_h, orininal_w, -1]) 166 | #------------------------------------------------# 167 | # 将新图片转换成Image的形式 168 | #------------------------------------------------# 169 | image = Image.fromarray(np.uint8(seg_img)) 170 | #------------------------------------------------# 171 | # 将新图与原图及进行混合 172 | #------------------------------------------------# 173 | image = Image.blend(old_img, image, 0.7) 174 | 175 | elif self.mix_type == 1: 176 | # seg_img = np.zeros((np.shape(pr)[0], np.shape(pr)[1], 3)) 177 | # for c in range(self.num_classes): 178 | # seg_img[:, :, 0] += ((pr[:, :] == c ) * self.colors[c][0]).astype('uint8') 179 | # seg_img[:, :, 1] += ((pr[:, :] == c ) * self.colors[c][1]).astype('uint8') 180 | # seg_img[:, :, 2] += ((pr[:, :] == c ) * self.colors[c][2]).astype('uint8') 181 | seg_img = np.reshape(np.array(self.colors, np.uint8)[np.reshape(pr, [-1])], [orininal_h, orininal_w, -1]) 182 | #------------------------------------------------# 183 | # 将新图片转换成Image的形式 184 | #------------------------------------------------# 185 | image = Image.fromarray(np.uint8(seg_img)) 186 | 187 | elif self.mix_type == 2: 188 | seg_img = (np.expand_dims(pr != 0, -1) * np.array(old_img, np.float32)).astype('uint8') 189 | #------------------------------------------------# 190 | # 将新图片转换成Image的形式 191 | #------------------------------------------------# 192 | image = Image.fromarray(np.uint8(seg_img)) 193 | 194 | return image 195 | 196 | def get_FPS(self, image, test_interval): 197 | #---------------------------------------------------------# 198 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 199 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 200 | #---------------------------------------------------------# 201 | image = cvtColor(image) 202 | #---------------------------------------------------------# 203 | # 给图像增加灰条,实现不失真的resize 204 | # 也可以直接resize进行识别 205 | #---------------------------------------------------------# 206 | image_data, nw, nh = resize_image(image, (self.input_shape[1], self.input_shape[0])) 207 | #---------------------------------------------------------# 208 | # 归一化+通道数调整到第一维度+添加上batch_size维度 209 | #---------------------------------------------------------# 210 | image_data = np.expand_dims(preprocess_input(np.array(image_data, np.float32)), 0) 211 | 212 | #---------------------------------------------------# 213 | # 图片传入网络进行预测 214 | #---------------------------------------------------# 215 | pr = self.get_pred(image_data)[0].numpy() 216 | #---------------------------------------------------# 217 | # 取出每一个像素点的种类 218 | #---------------------------------------------------# 219 | pr = pr.argmax(axis=-1).reshape([self.input_shape[0],self.input_shape[1]]) 220 | #--------------------------------------# 
221 | # 将灰条部分截取掉 222 | #--------------------------------------# 223 | pr = pr[int((self.input_shape[0] - nh) // 2) : int((self.input_shape[0] - nh) // 2 + nh), \ 224 | int((self.input_shape[1] - nw) // 2) : int((self.input_shape[1] - nw) // 2 + nw)] 225 | 226 | t1 = time.time() 227 | for _ in range(test_interval): 228 | #---------------------------------------------------# 229 | # 图片传入网络进行预测 230 | #---------------------------------------------------# 231 | pr = self.get_pred(image_data)[0].numpy() 232 | #---------------------------------------------------# 233 | # 取出每一个像素点的种类 234 | #---------------------------------------------------# 235 | pr = pr.argmax(axis=-1).reshape([self.input_shape[0],self.input_shape[1]]) 236 | #--------------------------------------# 237 | # 将灰条部分截取掉 238 | #--------------------------------------# 239 | pr = pr[int((self.input_shape[0] - nh) // 2) : int((self.input_shape[0] - nh) // 2 + nh), \ 240 | int((self.input_shape[1] - nw) // 2) : int((self.input_shape[1] - nw) // 2 + nw)] 241 | 242 | t2 = time.time() 243 | tact_time = (t2 - t1) / test_interval 244 | return tact_time 245 | 246 | def get_miou_png(self, image): 247 | #---------------------------------------------------------# 248 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 249 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 250 | #---------------------------------------------------------# 251 | image = cvtColor(image) 252 | orininal_h = np.array(image).shape[0] 253 | orininal_w = np.array(image).shape[1] 254 | #---------------------------------------------------------# 255 | # 给图像增加灰条,实现不失真的resize 256 | # 也可以直接resize进行识别 257 | #---------------------------------------------------------# 258 | image_data, nw, nh = resize_image(image, (self.input_shape[1], self.input_shape[0])) 259 | #---------------------------------------------------------# 260 | # 归一化+通道数调整到第一维度+添加上batch_size维度 261 | #---------------------------------------------------------# 262 | image_data = np.expand_dims(preprocess_input(np.array(image_data, np.float32)), 0) 263 | 264 | #---------------------------------------------------# 265 | # 图片传入网络进行预测 266 | #---------------------------------------------------# 267 | pr = self.get_pred(image_data)[0].numpy() 268 | #--------------------------------------# 269 | # 将灰条部分截取掉 270 | #--------------------------------------# 271 | pr = pr[int((self.input_shape[0] - nh) // 2) : int((self.input_shape[0] - nh) // 2 + nh), \ 272 | int((self.input_shape[1] - nw) // 2) : int((self.input_shape[1] - nw) // 2 + nw)] 273 | #--------------------------------------# 274 | # 进行图片的resize 275 | #--------------------------------------# 276 | pr = cv2.resize(pr, (orininal_w, orininal_h), interpolation = cv2.INTER_LINEAR) 277 | #---------------------------------------------------# 278 | # 取出每一个像素点的种类 279 | #---------------------------------------------------# 280 | pr = pr.argmax(axis=-1) 281 | 282 | image = Image.fromarray(np.uint8(pr)) 283 | return image 284 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | scipy==1.4.1 2 | numpy==1.18.4 3 | matplotlib==3.2.1 4 | opencv_python==4.2.0.34 5 | tensorflow_gpu==2.2.0 6 | tqdm==4.46.1 7 | Pillow==8.2.0 8 | h5py==2.10.0 9 | -------------------------------------------------------------------------------- /summary.py: -------------------------------------------------------------------------------- 1 | #--------------------------------------------# 2 | # 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
scipy==1.4.1
numpy==1.18.4
matplotlib==3.2.1
opencv_python==4.2.0.34
tensorflow_gpu==2.2.0
tqdm==4.46.1
Pillow==8.2.0
h5py==2.10.0
--------------------------------------------------------------------------------
/summary.py:
--------------------------------------------------------------------------------
#--------------------------------------------#
#   This script is only for inspecting the
#   network structure; it is not test code.
#--------------------------------------------#
from nets.pspnet import pspnet
from utils.utils import net_flops

if __name__ == "__main__":
    input_shape     = [512, 512]
    num_classes     = 21
    backbone        = 'mobilenet'

    model = pspnet([input_shape[0], input_shape[1], 3], num_classes, backbone=backbone, downsample_factor=16, aux_branch=False)
    #--------------------------------------------#
    #   Inspect the network structure
    #--------------------------------------------#
    model.summary()
    #--------------------------------------------#
    #   Compute the network's FLOPS
    #--------------------------------------------#
    net_flops(model, table=False)

    #--------------------------------------------#
    #   Get the name and index of every layer
    #--------------------------------------------#
    # for i, layer in enumerate(model.layers):
    #     print(i, layer.name)
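    #--------------------------------------------#
    #   A complementary sketch (not part of the
    #   original script): parameter counts read
    #   through the standard Keras weight lists,
    #   mirroring what model.summary() prints.
    #--------------------------------------------#
    import numpy as np
    from tensorflow.keras import backend as K

    trainable     = int(np.sum([K.count_params(w) for w in model.trainable_weights]))
    non_trainable = int(np.sum([K.count_params(w) for w in model.non_trainable_weights]))
    print('Trainable params: %d, non-trainable params: %d' % (trainable, non_trainable))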
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
import datetime
import os
from functools import partial

import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
from tensorflow.keras.callbacks import (EarlyStopping, LearningRateScheduler,
                                        TensorBoard)
from tensorflow.keras.optimizers import SGD, Adam

from nets.pspnet import pspnet
from nets.pspnet_training import (CE, Focal_Loss, dice_loss_with_CE,
                                  dice_loss_with_Focal_Loss, get_lr_scheduler)
from utils.callbacks import EvalCallback, LossHistory, ModelCheckpoint
from utils.dataloader import PSPnetDataset
from utils.utils import show_config
from utils.utils_fit import fit_one_epoch
from utils.utils_metrics import Iou_score, f_score

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

'''
When training your own semantic segmentation model, pay attention to the following:
1. Before training, check carefully that your data meets the format requirements. This repo requires a
   VOC-format dataset, consisting of input images and labels.
   Input images are .jpg files of arbitrary size; they are resized automatically before training.
   Grayscale images are converted to RGB automatically; no manual change is needed.
   If the input images have an extension other than jpg, batch-convert them to jpg before training.

   Labels are png images of arbitrary size; they are also resized automatically before training.
   Many downloaded datasets do not match the required format, so they need extra processing. Be careful:
   the value of each label pixel must be the class that pixel belongs to!
   Datasets found online commonly split the input into two classes with background pixels of 0 and target
   pixels of 255. Such a dataset will run, but it will predict nothing!
   It must be changed so that background pixels are 0 and target pixels are 1; see the sketch below.
   If the format is wrong, see: https://github.com/bubbliiiing/segmentation-format-fix

2. The loss value is used to judge convergence; what matters is the trend, i.e. that the validation loss
   keeps dropping. If the validation loss essentially stops changing, the model has more or less converged.
   The absolute magnitude of the loss means nothing by itself; large or small only reflects how the loss is
   computed, and it does not need to approach 0. To make the loss look nicer, you can divide by 10000 inside
   the corresponding loss function.
   Losses from training are saved in the logs folder, under loss_%Y_%m_%d_%H_%M_%S.

3. Trained weights are saved in the logs folder. Each epoch contains several training steps, and each step
   performs one gradient descent update. Nothing is saved after only a few steps; keep the concepts of
   Epoch and Step distinct.
'''
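#----------------------------------------------------------------------------#
#   For the 0/255 label issue described in point 1 above, a minimal one-off
#   conversion sketch (hypothetical helper; adapt the paths to your dataset):
#
#       import numpy as np
#       from PIL import Image
#
#       def binarize_mask(src_png, dst_png):
#           # Map the common 0/255 mask convention to the 0/1 class indices
#           # this repo expects (each pixel's value == its class id).
#           arr = np.array(Image.open(src_png))
#           Image.fromarray((arr > 127).astype(np.uint8), mode='L').save(dst_png)
#----------------------------------------------------------------------------#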
if __name__ == "__main__":
    #-------------------------------#
    #   Whether to train in eager mode
    #-------------------------------#
    eager           = False
    #---------------------------------------------------------------------#
    #   train_gpu   the GPUs used for training
    #               defaults to the first card; [0, 1] for two cards, [0, 1, 2] for three
    #               with multiple GPUs, the batch on each card is the total batch divided by the number of cards
    #---------------------------------------------------------------------#
    train_gpu       = [0,]
    #-----------------------------------------------------#
    #   num_classes must be changed when training your own dataset:
    #               the number of classes you need + 1, e.g. 2 + 1
    #-----------------------------------------------------#
    num_classes     = 21
    #-------------------------------#
    #   Backbone network choice:
    #   mobilenet, resnet50
    #-------------------------------#
    backbone        = "mobilenet"
    #----------------------------------------------------------------------------------------------------------------------------#
    #   See the README for downloading weight files; they are available from network drives. The model's pretrained weights are
    #   universal across datasets because the features they encode are universal.
    #   The important part of the pretrained weights is the backbone feature-extraction network, used for feature extraction.
    #   Pretrained weights are necessary in 99% of cases; without them the backbone weights are too random, feature extraction
    #   is ineffective, and training results will be poor.
    #   When training your own dataset, dimension-mismatch warnings are normal: the predictions differ, so the dimensions differ.
    #
    #   If training is interrupted, set model_path to a weight file in the logs folder to reload the partially trained weights,
    #   and adjust the frozen/unfrozen stage parameters below to keep the epochs continuous.
    #
    #   When model_path = '', no whole-model weights are loaded.
    #
    #   Whole-model weights are used here, loaded in train.py.
    #   To start from the backbone's pretrained weights, set model_path to the backbone weights; only the backbone is loaded then.
    #   To train from scratch, set model_path = '' and, below, Freeze_Train = False: training starts from zero with no frozen stage.
    #
    #   In general, training from scratch performs badly because the weights are too random and feature extraction is
    #   ineffective, so starting from zero is very, very, very strongly discouraged!
    #   If you must start from zero, look into the imagenet dataset: first train a classification model to obtain backbone
    #   weights (the backbone of a classification model is shared with this model), and train from those.
    #----------------------------------------------------------------------------------------------------------------------------#
    model_path  = "model_data/pspnet_mobilenetv2.h5"
    #---------------------------------------------------------#
    #   downsample_factor   the downsampling factor: 8 or 16
    #                       8 downsamples less and in theory
    #                       works better, but needs more GPU memory
    #---------------------------------------------------------#
    downsample_factor   = 16
    #------------------------------#
    #   Input image size
    #------------------------------#
    input_shape         = [473, 473]

    #----------------------------------------------------------------------------------------------------------------------------#
    #   Training has two stages: a frozen stage and an unfrozen stage. The frozen stage exists for users whose machines lack
    #   performance: frozen training needs less GPU memory, and on a very weak card you can set Freeze_Epoch equal to
    #   UnFreeze_Epoch to do frozen training only.
    #
    #   Some parameter suggestions; adjust flexibly to your needs:
    #   (1) Training from the whole model's pretrained weights:
    #       Adam:
    #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True, optimizer_type = 'adam', Init_lr = 5e-4. (frozen)
    #           Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False, optimizer_type = 'adam', Init_lr = 5e-4. (not frozen)
    #       SGD:
    #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True, optimizer_type = 'sgd', Init_lr = 1e-2. (frozen)
    #           Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False, optimizer_type = 'sgd', Init_lr = 1e-2. (not frozen)
    #       Here UnFreeze_Epoch can be tuned between 100 and 300.
    #   (2) Training from the backbone's pretrained weights:
    #       Adam:
    #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True, optimizer_type = 'adam', Init_lr = 5e-4. (frozen)
    #           Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False, optimizer_type = 'adam', Init_lr = 5e-4. (not frozen)
    #       SGD:
    #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 120, Freeze_Train = True, optimizer_type = 'sgd', Init_lr = 1e-2. (frozen)
    #           Init_Epoch = 0, UnFreeze_Epoch = 120, Freeze_Train = False, optimizer_type = 'sgd', Init_lr = 1e-2. (not frozen)
    #       Because training starts from backbone weights that are not necessarily suited to segmentation, more training is
    #       needed to escape local optima; UnFreeze_Epoch can be tuned between 120 and 300.
    #       Adam converges faster than SGD, so UnFreeze_Epoch can in theory be smaller, but more epochs are still recommended.
    #   (3) Setting batch_size:
    #       As large as the GPU accepts. Running out of memory has nothing to do with dataset size; on OOM
    #       ("CUDA out of memory"), reduce batch_size.
    #       Because of the BatchNorm layers, batch_size must be at least 2, never 1.
    #       Normally Freeze_batch_size should be 1-2x Unfreeze_batch_size. Avoid a large gap, since it affects the
    #       automatic learning-rate adjustment.
    #----------------------------------------------------------------------------------------------------------------------------#
    #------------------------------------------------------------------#
    #   Frozen-stage training parameters.
    #   The backbone is frozen here, so the feature-extraction network
    #   does not change. GPU memory use is small; only the rest of the
    #   network is fine-tuned.
    #   Init_Epoch          the epoch training currently starts from; it may exceed Freeze_Epoch, e.g.:
    #                           Init_Epoch = 60, Freeze_Epoch = 50, UnFreeze_Epoch = 100
    #                       skips the frozen stage, starts at epoch 60, and adjusts the learning rate accordingly.
    #                       (used when resuming from a checkpoint)
    #   Freeze_Epoch        the number of epochs of frozen training
    #                       (ignored when Freeze_Train = False)
    #   Freeze_batch_size   the batch_size for frozen training
    #                       (ignored when Freeze_Train = False)
    #------------------------------------------------------------------#
    Init_Epoch          = 0
    Freeze_Epoch        = 50
    Freeze_batch_size   = 8
    #------------------------------------------------------------------#
    #   Unfrozen-stage training parameters.
    #   The backbone is no longer frozen, so the feature-extraction
    #   network changes. GPU memory use is large; all parameters change.
    #   UnFreeze_Epoch          the total number of training epochs
    #   Unfreeze_batch_size     the batch_size after unfreezing
    #------------------------------------------------------------------#
    UnFreeze_Epoch      = 100
    Unfreeze_batch_size = 4
    #------------------------------------------------------------------#
    #   Freeze_Train    whether to do frozen training
    #                   defaults to freezing the backbone first, then unfreezing.
    #------------------------------------------------------------------#
    Freeze_Train        = True

    #------------------------------------------------------------------#
    #   Other training parameters: learning rate, optimizer, lr decay
    #------------------------------------------------------------------#
    #------------------------------------------------------------------#
    #   Init_lr     the model's maximum learning rate
    #               with Adam, Init_lr = 5e-4 is suggested
    #               with SGD,  Init_lr = 1e-2 is suggested
    #   Min_lr      the model's minimum learning rate; defaults to 0.01 of the maximum
    #------------------------------------------------------------------#
    Init_lr             = 1e-2
    Min_lr              = Init_lr * 0.01
    #------------------------------------------------------------------#
    #   optimizer_type  the optimizer to use: adam or sgd
    #                   with Adam, Init_lr = 5e-4 is suggested
    #                   with SGD,  Init_lr = 1e-2 is suggested
    #   momentum        the momentum parameter used inside the optimizer
    #------------------------------------------------------------------#
    optimizer_type      = "sgd"
    momentum            = 0.9
    #------------------------------------------------------------------#
    #   lr_decay_type   the learning-rate schedule: 'step' or 'cos'
    #------------------------------------------------------------------#
    lr_decay_type       = 'cos'
    #------------------------------------------------------------------#
    #   save_period     save weights once every save_period epochs
    #------------------------------------------------------------------#
    save_period         = 5
    #------------------------------------------------------------------#
    #   save_dir        the folder where weights and logs are saved
    #------------------------------------------------------------------#
    save_dir            = 'logs'
    #------------------------------------------------------------------#
    #   eval_flag       whether to evaluate during training, on the validation set
    #   eval_period     evaluate once every eval_period epochs; frequent
    #                   evaluation is not recommended, since it costs a
    #                   lot of time and makes training very slow
    #                   The mIoU obtained here differs from that of get_miou.py for two reasons:
    #                   (1) it is the mIoU of the validation set.
    #                   (2) conservative evaluation settings are used here to speed evaluation up.
    #------------------------------------------------------------------#
    eval_flag           = True
    eval_period         = 5

    #------------------------------------------------------------------#
    #   VOCdevkit_path  the dataset path
    #------------------------------------------------------------------#
    VOCdevkit_path  = 'VOCdevkit'
    #------------------------------------------------------------------#
    #   Suggested settings:
    #       few classes (several): True
    #       many classes (a dozen or more), batch_size large (above 10): True
    #       many classes (a dozen or more), batch_size small (below 10): False
    #------------------------------------------------------------------#
    dice_loss       = False
    #------------------------------------------------------------------#
    #   Whether to use focal loss against positive/negative sample imbalance
    #------------------------------------------------------------------#
    focal_loss      = False
    #------------------------------------------------------------------#
    #   Whether to give different classes different loss weights; balanced by default.
    #   If set, make it a numpy array whose length equals num_classes, e.g.:
    #       num_classes = 3
    #       cls_weights = np.array([1, 2, 3], np.float32)
    #------------------------------------------------------------------#
    cls_weights     = np.ones([num_classes], np.float32)
    #---------------------------------------------------------------------#
    #   Whether to use the auxiliary branch; it consumes a lot of GPU memory
    #---------------------------------------------------------------------#
    aux_branch      = False
    #-------------------------------------------------------------------#
    #   num_workers     whether to read data with multiple threads; 1 = off
    #                   enabling it speeds up data loading but uses more memory
    #                   enable it only when IO is the bottleneck, i.e. the
    #                   GPU computes much faster than images can be read
    #-------------------------------------------------------------------#
    num_workers         = 1

    #------------------------------------------------------#
    #   Set the GPUs to use
    #------------------------------------------------------#
    os.environ["CUDA_VISIBLE_DEVICES"]  = ','.join(str(x) for x in train_gpu)
    ngpus_per_node                      = len(train_gpu)

    gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)

    #------------------------------------------------------#
    #   Compare the requested GPU count with the machine's actual GPU count
    #------------------------------------------------------#
    if ngpus_per_node > 1 and ngpus_per_node > len(gpus):
        raise ValueError("The number of GPUs specified for training is more than the GPUs on the machine")

    if ngpus_per_node > 1:
        strategy = tf.distribute.MirroredStrategy()
    else:
        strategy = None
    print('Number of devices: {}'.format(ngpus_per_node))

    if ngpus_per_node > 1:
        with strategy.scope():
            #------------------------------------------------------#
            #   Build the model
            #------------------------------------------------------#
            model = pspnet([input_shape[0], input_shape[1], 3], num_classes, downsample_factor=downsample_factor, backbone=backbone, aux_branch=aux_branch)
            if model_path != '':
                #------------------------------------------------------#
                #   Load the pretrained weights
                #------------------------------------------------------#
                print('Load weights {}.'.format(model_path))
                model.load_weights(model_path, by_name=True, skip_mismatch=True)
    else:
        #------------------------------------------------------#
        #   Build the model
        #------------------------------------------------------#
        model = pspnet([input_shape[0], input_shape[1], 3], num_classes, downsample_factor=downsample_factor, backbone=backbone, aux_branch=aux_branch)
        if model_path != '':
            #------------------------------------------------------#
            #   Load the pretrained weights
            #------------------------------------------------------#
            print('Load weights {}.'.format(model_path))
            model.load_weights(model_path, by_name=True, skip_mismatch=True)

    #--------------------------#
    #   The loss function to use
    #--------------------------#
    if focal_loss:
        if dice_loss:
            loss = dice_loss_with_Focal_Loss(cls_weights)
        else:
            loss = Focal_Loss(cls_weights)
    else:
        if dice_loss:
            loss = dice_loss_with_CE(cls_weights)
        else:
            loss = CE(cls_weights)

    #---------------------------#
    #   Read the dataset's txt files
    #---------------------------#
    with open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Segmentation/train.txt"),"r") as f:
        train_lines = f.readlines()
    with open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Segmentation/val.txt"),"r") as f:
        val_lines   = f.readlines()
    num_train   = len(train_lines)
    num_val     = len(val_lines)
    show_config(
        num_classes = num_classes, backbone = backbone, model_path = model_path, input_shape = input_shape, \
        Init_Epoch = Init_Epoch, Freeze_Epoch = Freeze_Epoch, UnFreeze_Epoch = UnFreeze_Epoch, Freeze_batch_size = Freeze_batch_size, Unfreeze_batch_size = Unfreeze_batch_size, Freeze_Train = Freeze_Train, \
        Init_lr = Init_lr, Min_lr = Min_lr, optimizer_type = optimizer_type, momentum = momentum, lr_decay_type = lr_decay_type, \
        save_period = save_period, save_dir = save_dir, num_workers = num_workers, num_train = num_train, num_val = num_val
    )
    #-----------------------------------------------#
    #   Total training epochs = passes over all data;
    #   total training steps = gradient descents.
    #   The suggested epoch count below only considers the unfrozen part.
    #-----------------------------------------------#
    wanted_step = 1.5e4 if optimizer_type == "sgd" else 0.5e4
    total_step  = num_train // Unfreeze_batch_size * UnFreeze_Epoch
    if total_step <= wanted_step:
        if num_train // Unfreeze_batch_size == 0:
            raise ValueError('The dataset is too small to train on; please enlarge it.')
        wanted_epoch = wanted_step // (num_train // Unfreeze_batch_size) + 1
        print("\n\033[1;33;44m[Warning] With the %s optimizer, a total of at least %d training steps is recommended.\033[0m"%(optimizer_type, wanted_step))
        print("\033[1;33;44m[Warning] This run has %d training samples, Unfreeze_batch_size %d, and %d epochs, giving %d total training steps.\033[0m"%(num_train, Unfreeze_batch_size, UnFreeze_Epoch, total_step))
        print("\033[1;33;44m[Warning] Since the total of %d steps is below the recommended %d, setting the total epochs to %d is suggested.\033[0m"%(total_step, wanted_step, wanted_epoch))

    #------------------------------------------------------#
    #   Backbone features are universal, so frozen training
    #   speeds training up and keeps the weights from being
    #   wrecked early in training.
    #   Init_Epoch is the starting epoch,
    #   Freeze_Epoch the number of frozen epochs,
    #   Epoch the total number of epochs.
    #   On OOM or insufficient GPU memory, reduce Batch_size.
    #------------------------------------------------------#
    if True:
        if Freeze_Train:
            #------------------------------------#
            #   Freeze part of the network for training
            #------------------------------------#
            if backbone=="mobilenet":
                freeze_layers = 146
            else:
                freeze_layers = 172
            for i in range(freeze_layers): model.layers[i].trainable = False
            print('Freeze the first {} layers of total {} layers.'.format(freeze_layers, len(model.layers)))

        #-------------------------------------------------------------------#
        #   Without frozen training, set batch_size directly to Unfreeze_batch_size
        #-------------------------------------------------------------------#
        batch_size  = Freeze_batch_size if Freeze_Train else Unfreeze_batch_size

        #-------------------------------------------------------------------#
        #   Adapt the learning rate to the current batch_size
        #-------------------------------------------------------------------#
        nbs             = 16
        lr_limit_max    = 5e-4 if optimizer_type == 'adam' else 1e-1
        lr_limit_min    = 3e-4 if optimizer_type == 'adam' else 5e-4
        Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
        Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)

        #---------------------------------------#
        #   Get the learning-rate decay function
        #---------------------------------------#
        lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)

        epoch_step      = num_train // batch_size
        epoch_step_val  = num_val // batch_size

        if epoch_step == 0 or epoch_step_val == 0:
            raise ValueError('The dataset is too small to train on; please enlarge it.')

        train_dataloader    = PSPnetDataset(train_lines, input_shape, batch_size, num_classes, aux_branch, True, VOCdevkit_path)
        val_dataloader      = PSPnetDataset(val_lines, input_shape, batch_size, num_classes, aux_branch, False, VOCdevkit_path)

        optimizer = {
            'adam'  : Adam(lr = Init_lr, beta_1 = momentum),
            'sgd'   : SGD(lr = Init_lr, momentum = momentum, nesterov=True)
        }[optimizer_type]
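        #-------------------------------------------------------------------#
        #   The clamped linear-scaling rule above, worked through for the
        #   defaults (sgd, Init_lr = 1e-2, Freeze_batch_size = 8):
        #       Init_lr_fit = min(max(8 / 16 * 1e-2, 5e-4), 1e-1) = 5e-3
        #       Min_lr_fit  = min(max(8 / 16 * 1e-4, 5e-6), 1e-3) = 5e-5
        #   so halving the reference batch of nbs = 16 simply halves the
        #   learning rate, unless one of the clamp limits is hit first.
        #-------------------------------------------------------------------#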
        if eager:
            start_epoch     = Init_Epoch
            end_epoch       = UnFreeze_Epoch
            UnFreeze_flag   = False

            gen     = tf.data.Dataset.from_generator(partial(train_dataloader.generate), (tf.float32, tf.float32))
            gen_val = tf.data.Dataset.from_generator(partial(val_dataloader.generate), (tf.float32, tf.float32))

            gen     = gen.shuffle(buffer_size = batch_size).prefetch(buffer_size = batch_size)
            gen_val = gen_val.shuffle(buffer_size = batch_size).prefetch(buffer_size = batch_size)

            if ngpus_per_node > 1:
                gen     = strategy.experimental_distribute_dataset(gen)
                gen_val = strategy.experimental_distribute_dataset(gen_val)

            time_str        = datetime.datetime.strftime(datetime.datetime.now(),'%Y_%m_%d_%H_%M_%S')
            log_dir         = os.path.join(save_dir, "loss_" + str(time_str))
            loss_history    = LossHistory(log_dir)
            eval_callback   = EvalCallback(model, input_shape, num_classes, val_lines, VOCdevkit_path, log_dir, \
                                            eval_flag=eval_flag, period=eval_period)
            #---------------------------------------#
            #   Start training the model
            #---------------------------------------#
            for epoch in range(start_epoch, end_epoch):
                #---------------------------------------#
                #   If the model has a frozen part,
                #   unfreeze it and set the parameters
                #---------------------------------------#
                if epoch >= Freeze_Epoch and not UnFreeze_flag and Freeze_Train:
                    batch_size = Unfreeze_batch_size

                    #-------------------------------------------------------------------#
                    #   Adapt the learning rate to the current batch_size
                    #-------------------------------------------------------------------#
                    nbs             = 16
                    lr_limit_max    = 5e-4 if optimizer_type == 'adam' else 1e-1
                    lr_limit_min    = 3e-4 if optimizer_type == 'adam' else 5e-4
                    Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
                    Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
                    #---------------------------------------#
                    #   Get the learning-rate decay function
                    #---------------------------------------#
                    lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)

                    for i in range(len(model.layers)):
                        model.layers[i].trainable = True

                    epoch_step      = num_train // batch_size
                    epoch_step_val  = num_val // batch_size

                    if epoch_step == 0 or epoch_step_val == 0:
                        raise ValueError("The dataset is too small to continue training; please enlarge it.")

                    train_dataloader.batch_size    = batch_size
                    val_dataloader.batch_size      = batch_size

                    gen     = tf.data.Dataset.from_generator(partial(train_dataloader.generate), (tf.float32, tf.float32))
                    gen_val = tf.data.Dataset.from_generator(partial(val_dataloader.generate), (tf.float32, tf.float32))

                    gen     = gen.shuffle(buffer_size = batch_size).prefetch(buffer_size = batch_size)
                    gen_val = gen_val.shuffle(buffer_size = batch_size).prefetch(buffer_size = batch_size)

                    if ngpus_per_node > 1:
                        gen     = strategy.experimental_distribute_dataset(gen)
                        gen_val = strategy.experimental_distribute_dataset(gen_val)

                    UnFreeze_flag = True

                lr = lr_scheduler_func(epoch)
                K.set_value(optimizer.lr, lr)

                fit_one_epoch(model, loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val,
                            end_epoch, aux_branch, f_score(), save_period, save_dir, strategy)

                train_dataloader.on_epoch_end()
                val_dataloader.on_epoch_end()
        else:
            start_epoch = Init_Epoch
            end_epoch   = Freeze_Epoch if Freeze_Train else UnFreeze_Epoch
            if ngpus_per_node > 1:
                with strategy.scope():
                    if aux_branch:
                        model.compile(
                            loss            = [loss, loss],
                            loss_weights    = [1, 0.4],
                            optimizer       = optimizer,
                            metrics         = [f_score()],
                        )
                    else:
                        model.compile(
                            loss        = loss,
                            optimizer   = optimizer,
                            metrics     = [f_score()]
                        )
            else:
                if aux_branch:
                    model.compile(
                        loss            = [loss, loss],
                        loss_weights    = [1, 0.4],
                        optimizer       = optimizer,
                        metrics         = [f_score()],
                    )
                else:
                    model.compile(
                        loss        = loss,
                        optimizer   = optimizer,
                        metrics     = [f_score()]
                    )

            #-------------------------------------------------------------------------------#
            #   Training-parameter setup:
            #   logging         sets the tensorboard save location
            #   checkpoint      sets the details of weight saving; period controls how many epochs between saves
            #   lr_scheduler    sets how the learning rate decays
            #   early_stopping  sets early stopping: training ends automatically when val_loss
            #                   stops improving, meaning the model has essentially converged
            #-------------------------------------------------------------------------------#
            time_str        = datetime.datetime.strftime(datetime.datetime.now(),'%Y_%m_%d_%H_%M_%S')
            log_dir         = os.path.join(save_dir, "loss_" + str(time_str))
            logging         = TensorBoard(log_dir)
            loss_history    = LossHistory(log_dir)
            checkpoint      = ModelCheckpoint(os.path.join(save_dir, "ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5"),
                                    monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = save_period)
            checkpoint_last = ModelCheckpoint(os.path.join(save_dir, "last_epoch_weights.h5"),
                                    monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = 1)
            checkpoint_best = ModelCheckpoint(os.path.join(save_dir, "best_epoch_weights.h5"),
                                    monitor = 'val_loss', save_weights_only = True, save_best_only = True, period = 1)
            early_stopping  = EarlyStopping(monitor='val_loss', min_delta = 0, patience = 10, verbose = 1)
            lr_scheduler    = LearningRateScheduler(lr_scheduler_func, verbose = 1)
            eval_callback   = EvalCallback(model, input_shape, num_classes, val_lines, VOCdevkit_path, log_dir, \
                                            eval_flag=eval_flag, period=eval_period)
            callbacks       = [logging, loss_history, checkpoint, checkpoint_last, checkpoint_best, lr_scheduler, eval_callback]

            if start_epoch < end_epoch:
                print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
                model.fit(
                    x                   = train_dataloader,
                    steps_per_epoch     = epoch_step,
                    validation_data     = val_dataloader,
                    validation_steps    = epoch_step_val,
                    epochs              = end_epoch,
                    initial_epoch       = start_epoch,
                    use_multiprocessing = True if num_workers > 1 else False,
                    workers             = num_workers,
                    callbacks           = callbacks
                )
            #---------------------------------------#
            #   If the model has a frozen part,
            #   unfreeze it and set the parameters
            #---------------------------------------#
            if Freeze_Train:
                batch_size  = Unfreeze_batch_size
                start_epoch = Freeze_Epoch if start_epoch < Freeze_Epoch else start_epoch
                end_epoch   = UnFreeze_Epoch

                #-------------------------------------------------------------------#
                #   Adapt the learning rate to the current batch_size
                #-------------------------------------------------------------------#
                nbs             = 16
                lr_limit_max    = 5e-4 if optimizer_type == 'adam' else 1e-1
                lr_limit_min    = 3e-4 if optimizer_type == 'adam' else 5e-4
                Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
                Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
                #---------------------------------------#
                #   Get the learning-rate decay function
                #---------------------------------------#
                lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)
                lr_scheduler    = LearningRateScheduler(lr_scheduler_func, verbose = 1)
                callbacks       = [logging, loss_history, checkpoint, checkpoint_last, checkpoint_best, lr_scheduler, eval_callback]

                for i in range(len(model.layers)):
                    model.layers[i].trainable = True
                if ngpus_per_node > 1:
                    with strategy.scope():
                        if aux_branch:
                            model.compile(
                                loss            = [loss, loss],
                                loss_weights    = [1, 0.4],
                                optimizer       = optimizer,
                                metrics         = [f_score()],
                            )
                        else:
                            model.compile(
                                loss        = loss,
                                optimizer   = optimizer,
                                metrics     = [f_score()]
                            )
                else:
                    if aux_branch:
                        model.compile(
                            loss            = [loss, loss],
                            loss_weights    = [1, 0.4],
                            optimizer       = optimizer,
                            metrics         = [f_score()],
                        )
                    else:
                        model.compile(
                            loss        = loss,
                            optimizer   = optimizer,
                            metrics     = [f_score()]
                        )

                epoch_step      = num_train // batch_size
                epoch_step_val  = num_val // batch_size

                if epoch_step == 0 or epoch_step_val == 0:
                    raise ValueError("The dataset is too small to continue training; please enlarge it.")

                train_dataloader.batch_size    = Unfreeze_batch_size
                val_dataloader.batch_size      = Unfreeze_batch_size

                print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
                model.fit(
                    x                   = train_dataloader,
                    steps_per_epoch     = epoch_step,
                    validation_data     = val_dataloader,
                    validation_steps    = epoch_step_val,
                    epochs              = end_epoch,
                    initial_epoch       = start_epoch,
                    use_multiprocessing = True if num_workers > 1 else False,
                    workers             = num_workers,
                    callbacks           = callbacks
                )
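                #-----------------------------------------------------------#
                #   Putting the resume advice from the comments near the top
                #   of this script together: a hypothetical restart after a
                #   run stopped at epoch 60 only needs these settings changed
                #   (illustrative filename and values):
                #       model_path     = "logs/ep060-loss0.050-val_loss0.080.h5"
                #       Init_Epoch     = 60    # > Freeze_Epoch skips the frozen stage
                #       Freeze_Epoch   = 50
                #       UnFreeze_Epoch = 100
                #-----------------------------------------------------------#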
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
#
--------------------------------------------------------------------------------
/utils/callbacks.py:
--------------------------------------------------------------------------------
import os
import warnings

import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
import scipy.signal

import cv2
import shutil
import numpy as np
import tensorflow as tf

from PIL import Image
from tensorflow import keras
from tensorflow.keras import backend as K
from tqdm import tqdm
from .utils import cvtColor, preprocess_input, resize_image
from .utils_metrics import compute_mIoU


class LossHistory(keras.callbacks.Callback):
    def __init__(self, log_dir):
        self.log_dir    = log_dir
        self.losses     = []
        self.val_loss   = []

        os.makedirs(self.log_dir)

    def on_epoch_end(self, epoch, logs=None):
        # Avoid a mutable default argument; treat a missing logs dict as empty.
        logs = logs or {}
        if not os.path.exists(self.log_dir):
            os.makedirs(self.log_dir)

        self.losses.append(logs.get('loss'))
        self.val_loss.append(logs.get('val_loss'))

        with open(os.path.join(self.log_dir, "epoch_loss.txt"), 'a') as f:
            f.write(str(logs.get('loss')))
            f.write("\n")
        with open(os.path.join(self.log_dir, "epoch_val_loss.txt"), 'a') as f:
            f.write(str(logs.get('val_loss')))
            f.write("\n")
        self.loss_plot()

    def loss_plot(self):
        iters = range(len(self.losses))

        plt.figure()
        plt.plot(iters, self.losses, 'red', linewidth = 2, label='train loss')
        plt.plot(iters, self.val_loss, 'coral', linewidth = 2, label='val loss')
        try:
            if len(self.losses) < 25:
                num = 5
            else:
                num = 15

            plt.plot(iters, scipy.signal.savgol_filter(self.losses, num, 3), 'green', linestyle = '--', linewidth = 2, label='smooth train loss')
            plt.plot(iters, scipy.signal.savgol_filter(self.val_loss, num, 3), '#8B4513', linestyle = '--', linewidth = 2, label='smooth val loss')
        except:
            pass

        plt.grid(True)
        plt.xlabel('Epoch')
        plt.ylabel('Loss')
        plt.title('A Loss Curve')
        plt.legend(loc="upper right")

        plt.savefig(os.path.join(self.log_dir, "epoch_loss.png"))

        plt.cla()
        plt.close("all")
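#---------------------------------------------------------------------------#
#   LossHistory appends one float per line to epoch_loss.txt and
#   epoch_val_loss.txt, so the curve of an interrupted run can be re-plotted
#   offline from those files. A minimal sketch (hypothetical helper, not
#   used by the repo; pass the run's actual loss_... directory):
#---------------------------------------------------------------------------#
def replot_loss(log_dir):
    # One float per line, in epoch order, exactly as LossHistory writes them.
    with open(os.path.join(log_dir, "epoch_loss.txt")) as f:
        losses = [float(x) for x in f]
    with open(os.path.join(log_dir, "epoch_val_loss.txt")) as f:
        val_losses = [float(x) for x in f]
    plt.figure()
    plt.plot(losses, 'red', linewidth = 2, label='train loss')
    plt.plot(val_losses, 'coral', linewidth = 2, label='val loss')
    plt.legend(loc="upper right")
    plt.savefig(os.path.join(log_dir, "epoch_loss_replot.png"))
    plt.close("all")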
class ExponentDecayScheduler(keras.callbacks.Callback):
    def __init__(self, decay_rate, verbose=0):
        super(ExponentDecayScheduler, self).__init__()
        self.decay_rate     = decay_rate
        self.verbose        = verbose
        self.learning_rates = []

    def on_epoch_end(self, batch, logs=None):
        learning_rate = K.get_value(self.model.optimizer.lr) * self.decay_rate
        K.set_value(self.model.optimizer.lr, learning_rate)
        if self.verbose > 0:
            print('\nSetting learning rate to %s.' % (learning_rate))

class EvalCallback(keras.callbacks.Callback):
    def __init__(self, model_body, input_shape, num_classes, image_ids, dataset_path, log_dir,\
            miou_out_path=".temp_miou_out", eval_flag=True, period=1):
        super(EvalCallback, self).__init__()

        self.model_body     = model_body
        self.input_shape    = input_shape
        self.num_classes    = num_classes
        self.image_ids      = image_ids
        self.dataset_path   = dataset_path
        self.log_dir        = log_dir
        self.miou_out_path  = miou_out_path
        self.eval_flag      = eval_flag
        self.period         = period

        self.image_ids      = [image_id.split()[0] for image_id in image_ids]
        self.mious          = [0]
        self.epoches        = [0]
        if self.eval_flag:
            with open(os.path.join(self.log_dir, "epoch_miou.txt"), 'a') as f:
                f.write(str(0))
                f.write("\n")

    @tf.function
    def get_pred(self, image_data):
        pr = self.model_body(image_data, training=False)
        return pr

    def get_miou_png(self, image):
        #---------------------------------------------------------#
        #   Convert the image to RGB here so grayscale images do
        #   not crash prediction. Only RGB prediction is supported;
        #   every other image type is converted to RGB.
        #---------------------------------------------------------#
        image       = cvtColor(image)
        orininal_h  = np.array(image).shape[0]
        orininal_w  = np.array(image).shape[1]
        #---------------------------------------------------------#
        #   Add gray bars for a distortion-free resize.
        #   A plain resize would also work for inference.
        #---------------------------------------------------------#
        image_data, nw, nh  = resize_image(image, (self.input_shape[1], self.input_shape[0]))
        #---------------------------------------------------------#
        #   Normalize and add the batch_size dimension
        #---------------------------------------------------------#
        image_data  = np.expand_dims(preprocess_input(np.array(image_data, np.float32)), 0)

        #---------------------------------------------------#
        #   Feed the image into the network for prediction
        #---------------------------------------------------#
        pr = self.get_pred(image_data)[0].numpy()
        #--------------------------------------#
        #   Crop away the gray-bar padding
        #--------------------------------------#
        pr = pr[int((self.input_shape[0] - nh) // 2) : int((self.input_shape[0] - nh) // 2 + nh), \
                int((self.input_shape[1] - nw) // 2) : int((self.input_shape[1] - nw) // 2 + nw)]
        #--------------------------------------#
        #   Resize back to the original image size
        #--------------------------------------#
        pr = cv2.resize(pr, (orininal_w, orininal_h), interpolation = cv2.INTER_LINEAR)
        #---------------------------------------------------#
        #   Take the predicted class of every pixel
        #---------------------------------------------------#
        pr = pr.argmax(axis=-1)

        image = Image.fromarray(np.uint8(pr))
        return image

    def on_epoch_end(self, epoch, logs=None):
        temp_epoch = epoch + 1
        if temp_epoch % self.period == 0 and self.eval_flag:
            gt_dir      = os.path.join(self.dataset_path, "VOC2007/SegmentationClass/")
            pred_dir    = os.path.join(self.miou_out_path, 'detection-results')
            if not os.path.exists(self.miou_out_path):
                os.makedirs(self.miou_out_path)
            if not os.path.exists(pred_dir):
                os.makedirs(pred_dir)
            print("Get miou.")
            for image_id in tqdm(self.image_ids):
                #-------------------------------#
                #   Read the image from file
                #-------------------------------#
                image_path  = os.path.join(self.dataset_path, "VOC2007/JPEGImages/"+image_id+".jpg")
                image       = Image.open(image_path)
                #------------------------------#
                #   Save the prediction as a png
                #------------------------------#
                image       = self.get_miou_png(image)
                image.save(os.path.join(pred_dir, image_id + ".png"))

            print("Calculate miou.")
            _, IoUs, _, _ = compute_mIoU(gt_dir, pred_dir, self.image_ids, self.num_classes, None)  # run the mIoU computation
            temp_miou = np.nanmean(IoUs) * 100

            self.mious.append(temp_miou)
            self.epoches.append(temp_epoch)

            with open(os.path.join(self.log_dir, "epoch_miou.txt"), 'a') as f:
                f.write(str(temp_miou))
                f.write("\n")

            plt.figure()
            plt.plot(self.epoches, self.mious, 'red', linewidth = 2, label='train miou')

            plt.grid(True)
            plt.xlabel('Epoch')
            plt.ylabel('Miou')
            plt.title('A Miou Curve')
            plt.legend(loc="upper right")

            plt.savefig(os.path.join(self.log_dir, "epoch_miou.png"))
            plt.cla()
            plt.close("all")

            print("Get miou done.")
            shutil.rmtree(self.miou_out_path)
class ModelCheckpoint(keras.callbacks.Callback):
    def __init__(self, filepath, monitor='val_loss', verbose=0,
                 save_best_only=False, save_weights_only=False,
                 mode='auto', period=1):
        super(ModelCheckpoint, self).__init__()
        self.monitor                = monitor
        self.verbose                = verbose
        self.filepath               = filepath
        self.save_best_only         = save_best_only
        self.save_weights_only      = save_weights_only
        self.period                 = period
        self.epochs_since_last_save = 0

        if mode not in ['auto', 'min', 'max']:
            warnings.warn('ModelCheckpoint mode %s is unknown, '
                          'fallback to auto mode.' % (mode),
                          RuntimeWarning)
            mode = 'auto'

        if mode == 'min':
            self.monitor_op = np.less
            self.best = np.Inf
        elif mode == 'max':
            self.monitor_op = np.greater
            self.best = -np.Inf
        else:
            if 'acc' in self.monitor or self.monitor.startswith('fmeasure'):
                self.monitor_op = np.greater
                self.best = -np.Inf
            else:
                self.monitor_op = np.less
                self.best = np.Inf

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        self.epochs_since_last_save += 1
        if self.epochs_since_last_save >= self.period:
            self.epochs_since_last_save = 0
            filepath = self.filepath.format(epoch=epoch + 1, **logs)
            if self.save_best_only:
                current = logs.get(self.monitor)
                if current is None:
                    warnings.warn('Can save best model only with %s available, '
                                  'skipping.' % (self.monitor), RuntimeWarning)
                else:
                    if self.monitor_op(current, self.best):
                        if self.verbose > 0:
                            print('\nEpoch %05d: %s improved from %0.5f to %0.5f,'
                                  ' saving model to %s'
                                  % (epoch + 1, self.monitor, self.best,
                                     current, filepath))
                        self.best = current
                        if self.save_weights_only:
                            self.model.save_weights(filepath, overwrite=True)
                        else:
                            self.model.save(filepath, overwrite=True)
                    else:
                        if self.verbose > 0:
                            print('\nEpoch %05d: %s did not improve' %
                                  (epoch + 1, self.monitor))
            else:
                if self.verbose > 0:
                    print('\nEpoch %05d: saving model to %s' % (epoch + 1, filepath))
                if self.save_weights_only:
                    self.model.save_weights(filepath, overwrite=True)
                else:
                    self.model.save(filepath, overwrite=True)
--------------------------------------------------------------------------------
/utils/dataloader.py:
--------------------------------------------------------------------------------
import math
import os
from random import shuffle

import cv2
import numpy as np
from PIL import Image
from tensorflow import keras

from utils.utils import cvtColor, preprocess_input


class PSPnetDataset(keras.utils.Sequence):
    def __init__(self, annotation_lines, input_shape, batch_size, num_classes, aux_branch, train, dataset_path):
        self.annotation_lines   = annotation_lines
        self.length             = len(self.annotation_lines)
        self.input_shape        = input_shape
        self.batch_size         = batch_size
        self.num_classes        = num_classes
        self.aux_branch         = aux_branch
        self.train              = train
        self.dataset_path       = dataset_path

    def __len__(self):
        return math.ceil(len(self.annotation_lines) / float(self.batch_size))

    def __getitem__(self, index):
        images  = []
        targets = []
        for i in range(index * self.batch_size, (index + 1) * self.batch_size):
            i       = i % self.length
            name    = self.annotation_lines[i].split()[0]
            #-------------------------------#
            #   Read the image from file
            #-------------------------------#
            jpg = Image.open(os.path.join(os.path.join(self.dataset_path, "VOC2007/JPEGImages"), name + ".jpg"))
            png = Image.open(os.path.join(os.path.join(self.dataset_path, "VOC2007/SegmentationClass"), name + ".png"))
            #-------------------------------#
            #   Data augmentation
            #-------------------------------#
            jpg, png    = self.get_random_data(jpg, png, self.input_shape, random = self.train)
            jpg         = preprocess_input(np.array(jpg, np.float64))
            png         = np.array(png)
            png[png >= self.num_classes] = self.num_classes
            #-------------------------------------------------------#
            #   Convert to one-hot form.
            #   The +1 is needed because some VOC labels have white
            #   border pixels that must be ignored; the extra channel
            #   makes ignoring them easy.
            #-------------------------------------------------------#
            seg_labels  = np.eye(self.num_classes + 1)[png.reshape([-1])]
            # reshape back to (h, w, classes); input_shape is (h, w)
            seg_labels  = seg_labels.reshape((int(self.input_shape[0]), int(self.input_shape[1]), self.num_classes + 1))

            images.append(jpg)
            targets.append(seg_labels)

        images  = np.array(images)
        targets = np.array(targets)
        if self.aux_branch:
            return images, [targets, targets]
        else:
            return images, targets
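    #-----------------------------------------------------------------------#
    #   The clamp-then-one-hot trick above in miniature: with num_classes = 2,
    #   a white-edge pixel (255) is clamped to 2 and lands in the extra last
    #   channel, which the losses and metrics later slice off (y_true[...,:-1]):
    #
    #       png = np.array([[0, 1], [255, 1]])
    #       png[png >= 2] = 2                               # 255 -> ignore class 2
    #       seg = np.eye(3)[png.reshape(-1)].reshape(2, 2, 3)
    #       seg[1, 0]                                       # -> [0., 0., 1.]
    #-----------------------------------------------------------------------#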
    def generate(self):
        i = 0
        while True:
            images  = []
            targets = []
            for b in range(self.batch_size):
                if i==0:
                    np.random.shuffle(self.annotation_lines)
                name = self.annotation_lines[i].split()[0]
                #-------------------------------#
                #   Read the image from file
                #-------------------------------#
                jpg = Image.open(os.path.join(os.path.join(self.dataset_path, "VOC2007/JPEGImages"), name + ".jpg"))
                png = Image.open(os.path.join(os.path.join(self.dataset_path, "VOC2007/SegmentationClass"), name + ".png"))
                #-------------------------------#
                #   Data augmentation
                #-------------------------------#
                jpg, png    = self.get_random_data(jpg, png, self.input_shape, random = self.train)
                jpg         = preprocess_input(np.array(jpg, np.float64))
                png         = np.array(png)
                png[png >= self.num_classes] = self.num_classes
                #-------------------------------------------------------#
                #   Convert to one-hot form.
                #   The +1 is needed because some VOC labels have white
                #   border pixels that must be ignored; the extra channel
                #   makes ignoring them easy.
                #-------------------------------------------------------#
                seg_labels  = np.eye(self.num_classes + 1)[png.reshape([-1])]
                # reshape back to (h, w, classes); input_shape is (h, w)
                seg_labels  = seg_labels.reshape((int(self.input_shape[0]), int(self.input_shape[1]), self.num_classes + 1))

                images.append(jpg)
                targets.append(seg_labels)
                i = (i + 1) % self.length

            images  = np.array(images)
            targets = np.array(targets)
            yield images, targets

    def on_epoch_end(self):
        shuffle(self.annotation_lines)

    def rand(self, a=0, b=1):
        return np.random.rand() * (b - a) + a

    def get_random_data(self, image, label, input_shape, jitter=.3, hue=.1, sat=0.7, val=0.3, random=True):
        image   = cvtColor(image)
        label   = Image.fromarray(np.array(label))
        #------------------------------#
        #   Get the image size and the target size
        #------------------------------#
        iw, ih  = image.size
        h, w    = input_shape

        if not random:
            iw, ih  = image.size
            scale   = min(w/iw, h/ih)
            nw      = int(iw*scale)
            nh      = int(ih*scale)

            image       = image.resize((nw,nh), Image.BICUBIC)
            new_image   = Image.new('RGB', [w, h], (128,128,128))
            new_image.paste(image, ((w-nw)//2, (h-nh)//2))

            label       = label.resize((nw,nh), Image.NEAREST)
            new_label   = Image.new('L', [w, h], (0))
            new_label.paste(label, ((w-nw)//2, (h-nh)//2))
            return new_image, new_label

        #------------------------------------------#
        #   Scale the image and distort its aspect ratio
        #------------------------------------------#
        new_ar = iw/ih * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter)
        scale = self.rand(0.25, 2)
        if new_ar < 1:
            nh = int(scale*h)
            nw = int(nh*new_ar)
        else:
            nw = int(scale*w)
            nh = int(nw/new_ar)
        image = image.resize((nw,nh), Image.BICUBIC)
        label = label.resize((nw,nh), Image.NEAREST)
        #------------------------------------------#
        #   Flip the image
        #------------------------------------------#
        flip = self.rand()<.5
        if flip:
            image = image.transpose(Image.FLIP_LEFT_RIGHT)
            label = label.transpose(Image.FLIP_LEFT_RIGHT)

        #------------------------------------------#
        #   Pad the leftover area with gray bars
        #------------------------------------------#
        dx = int(self.rand(0, w-nw))
        dy = int(self.rand(0, h-nh))
        new_image = Image.new('RGB', (w,h), (128,128,128))
        new_label = Image.new('L', (w,h), (0))
        new_image.paste(image, (dx, dy))
        new_label.paste(label, (dx, dy))
        image = new_image
        label = new_label

        image_data = np.array(image, np.uint8)

        #------------------------------------------#
        #   Gaussian blur
        #------------------------------------------#
        blur = self.rand() < 0.25
        if blur:
            image_data = cv2.GaussianBlur(image_data, (5, 5), 0)

        #------------------------------------------#
        #   Rotation
        #------------------------------------------#
        rotate = self.rand() < 0.25
        if rotate:
            center      = (w // 2, h // 2)
            rotation    = np.random.randint(-10, 11)
            M           = cv2.getRotationMatrix2D(center, -rotation, scale=1)
            image_data  = cv2.warpAffine(image_data, M, (w, h), flags=cv2.INTER_CUBIC, borderValue=(128,128,128))
            label       = cv2.warpAffine(np.array(label, np.uint8), M, (w, h), flags=cv2.INTER_NEAREST, borderValue=(0))

        #---------------------------------#
        #   Color-space augmentation:
        #   compute the HSV gain factors
        #---------------------------------#
        r = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1
        #---------------------------------#
        #   Move the image into HSV space
        #---------------------------------#
        hue, sat, val = cv2.split(cv2.cvtColor(image_data, cv2.COLOR_RGB2HSV))
        dtype = image_data.dtype
        #---------------------------------#
        #   Apply the transform
        #---------------------------------#
        x = np.arange(0, 256, dtype=r.dtype)
        lut_hue = ((x * r[0]) % 180).astype(dtype)
        lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
        lut_val = np.clip(x * r[2], 0, 255).astype(dtype)

        image_data = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)))
        image_data = cv2.cvtColor(image_data, cv2.COLOR_HSV2RGB)

        return image_data, label
--------------------------------------------------------------------------------
/utils/utils.py:
--------------------------------------------------------------------------------
import numpy as np
from PIL import Image

#---------------------------------------------------------#
#   Convert the image to RGB so grayscale images do not
#   crash prediction. Only RGB prediction is supported;
#   every other image type is converted to RGB.
#---------------------------------------------------------#
def cvtColor(image):
    if len(np.shape(image)) == 3 and np.shape(image)[2] == 3:
        return image
    else:
        image = image.convert('RGB')
        return image

#---------------------------------------------------#
#   Resize the input image (letterbox with gray bars)
#---------------------------------------------------#
def resize_image(image, size):
    iw, ih  = image.size
    w, h    = size

    scale   = min(w/iw, h/ih)
    nw      = int(iw*scale)
    nh      = int(ih*scale)

    image       = image.resize((nw,nh), Image.BICUBIC)
    new_image   = Image.new('RGB', size, (128,128,128))
    new_image.paste(image, ((w-nw)//2, (h-nh)//2))

    return new_image, nw, nh
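#---------------------------------------------------------#
#   resize_image letterboxes onto a 128-gray canvas and
#   returns (nw, nh) so callers can crop the padding back
#   out of the prediction. The crop arithmetic used all
#   over the repo, in isolation (sizes illustrative):
#
#       h, w      = 473, 473       # network input_shape
#       nw, nh    = 473, 354       # e.g. a 4:3 image scaled to fit
#       top, left = (h - nh) // 2, (w - nw) // 2
#       pred      = pred[top : top + nh, left : left + nw]
#---------------------------------------------------------#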
def preprocess_input(image):
    image = image / 255
    return image

def show_config(**kwargs):
    print('Configurations:')
    print('-' * 70)
    print('|%25s | %40s|' % ('keys', 'values'))
    print('-' * 70)
    for key, value in kwargs.items():
        print('|%25s | %40s|' % (str(key), str(value)))
    print('-' * 70)

#-------------------------------------------------------------------------------------------------------------------------------#
#   From https://github.com/ckyrkou/Keras_FLOP_Estimator
#   Fix lots of bugs
#-------------------------------------------------------------------------------------------------------------------------------#
def net_flops(model, table=False, print_result=True):
    if (table == True):
        print("\n")
        print('%25s | %16s | %16s | %16s | %16s | %6s | %6s' % (
            'Layer Name', 'Input Shape', 'Output Shape', 'Kernel Size', 'Filters', 'Strides', 'FLOPS'))
        print('=' * 120)

    #---------------------------------------------------#
    #   Total FLOPs
    #---------------------------------------------------#
    t_flops = 0
    factor  = 1e9

    for l in model.layers:
        try:
            #--------------------------------------#
            #   Initialize the required variables
            #--------------------------------------#
            o_shape, i_shape, strides, ks, filters = ('', '', ''), ('', '', ''), (1, 1), (0, 0), 0
            flops = 0
            #--------------------------------------#
            #   Get the layer's name
            #--------------------------------------#
            name = l.name

            if ('InputLayer' in str(l)):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   Reshape layers
            #--------------------------------------#
            elif ('Reshape' in str(l)):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   Padding layers
            #--------------------------------------#
            elif ('Padding' in str(l)):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   Flatten layers
            #--------------------------------------#
            elif ('Flatten' in str(l)):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   Activation layers
            #--------------------------------------#
            elif 'Activation' in str(l):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   LeakyReLU
            #--------------------------------------#
            elif 'LeakyReLU' in str(l):
                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                    flops += i_shape[0] * i_shape[1] * i_shape[2]

            #--------------------------------------#
            #   Max-pooling layers
            #--------------------------------------#
            elif 'MaxPooling' in str(l):
                i_shape = l.get_input_shape_at(0)[1:4]
                o_shape = l.get_output_shape_at(0)[1:4]

            #--------------------------------------#
            #   Average-pooling layers (non-global)
            #--------------------------------------#
            elif ('AveragePooling' in str(l) and 'Global' not in str(l)):
                strides = l.strides
                ks      = l.pool_size

                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]
                flops += o_shape[0] * o_shape[1] * o_shape[2]

            #--------------------------------------#
            #   Global average-pooling layers
            #--------------------------------------#
            elif ('AveragePooling' in str(l) and 'Global' in str(l)):
                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                flops += (i_shape[0] * i_shape[1] + 1) * i_shape[2]

            #--------------------------------------#
            #   Batch-normalization layers
            #--------------------------------------#
            elif ('BatchNormalization' in str(l)):
                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                temp_flops = 1
                for i in range(len(i_shape)):
                    temp_flops *= i_shape[i]
                temp_flops *= 2

                flops += temp_flops

            #--------------------------------------#
            #   Dense (fully connected) layers
            #--------------------------------------#
            elif ('Dense' in str(l)):
                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                temp_flops = 1
                for i in range(len(o_shape)):
                    temp_flops *= o_shape[i]

                if (i_shape[-1] == None):
                    temp_flops = temp_flops * o_shape[-1]
                else:
                    temp_flops = temp_flops * i_shape[-1]
                flops += temp_flops

            #--------------------------------------#
            #   Ordinary convolution layers
            #--------------------------------------#
            elif ('Conv2D' in str(l) and 'DepthwiseConv2D' not in str(l) and 'SeparableConv2D' not in str(l)):
                strides = l.strides
                ks      = l.kernel_size
                filters = l.filters
                bias    = 1 if l.use_bias else 0

                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                if (filters == None):
                    filters = i_shape[2]
                flops += filters * o_shape[0] * o_shape[1] * (ks[0] * ks[1] * i_shape[2] + bias)

            #--------------------------------------#
            #   Depthwise convolution layers
            #--------------------------------------#
            elif ('Conv2D' in str(l) and 'DepthwiseConv2D' in str(l) and 'SeparableConv2D' not in str(l)):
                strides = l.strides
                ks      = l.kernel_size
                filters = l.filters
                bias    = 1 if l.use_bias else 0

                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                if (filters == None):
                    filters = i_shape[2]
                flops += filters * o_shape[0] * o_shape[1] * (ks[0] * ks[1] + bias)

            #--------------------------------------#
            #   Depthwise-separable convolution layers
            #--------------------------------------#
            elif ('Conv2D' in str(l) and 'DepthwiseConv2D' not in str(l) and 'SeparableConv2D' in str(l)):
                strides = l.strides
                ks      = l.kernel_size
                filters = l.filters
                bias    = 1 if l.use_bias else 0

                for i in range(len(l._inbound_nodes)):
                    i_shape = l.get_input_shape_at(i)[1:4]
                    o_shape = l.get_output_shape_at(i)[1:4]

                if (filters == None):
                    filters = i_shape[2]
                flops += i_shape[2] * o_shape[0] * o_shape[1] * (ks[0] * ks[1] + bias) + \
                         filters * o_shape[0] * o_shape[1] * (1 * 1 * i_shape[2] + bias)
            #--------------------------------------#
            #   Nested models
            #--------------------------------------#
            elif 'Model' in str(l):
                flops = net_flops(l, print_result=False)

            t_flops += flops

            if (table == True):
                print('%25s | %16s | %16s | %16s | %16s | %6s | %5.4f' % (
                    name[:25], str(i_shape), str(o_shape), str(ks), str(filters), str(strides), flops))

        except:
            pass

    t_flops = t_flops * 2
    if print_result:
        show_flops = t_flops / factor
        print('Total GFLOPs: %.3fG' % (show_flops))
    return t_flops
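#-------------------------------------------------------------------------------------------------------------------------------#
#   A hand-checkable sanity test for net_flops (a sketch, not used by the repo):
#   a single 3x3 conv with bias and 8 filters on a 32x32x3 input costs
#   8*32*32*(3*3*3 + 1) = 229,376 multiply-accumulates, which the final *2
#   above turns into 458,752 FLOPs.
#-------------------------------------------------------------------------------------------------------------------------------#
if __name__ == "__main__":
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input((32, 32, 3)),
        tf.keras.layers.Conv2D(8, 3, padding='same', use_bias=True),
    ])
    net_flops(model, table=True)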
--------------------------------------------------------------------------------
/utils/utils_fit.py:
--------------------------------------------------------------------------------
import os

import tensorflow as tf
from tqdm import tqdm


def get_train_step_fn(strategy):
    @tf.function
    def train_step(images, labels, net, optimizer, loss, aux_branch, metrics):
        with tf.GradientTape() as tape:
            prediction = net(images, training=True)
            if aux_branch:
                aux_loss    = loss(labels, prediction[0])
                main_loss   = loss(labels, prediction[1])
                loss_value  = 0.4 * aux_loss + main_loss
            else:
                loss_value  = loss(labels, prediction)
        grads = tape.gradient(loss_value, net.trainable_variables)
        optimizer.apply_gradients(zip(grads, net.trainable_variables))

        if aux_branch:
            _f_score = tf.reduce_mean(metrics(labels, prediction[1]))
        else:
            _f_score = tf.reduce_mean(metrics(labels, prediction))
        return loss_value, _f_score
    if strategy == None:
        return train_step
    else:
        #----------------------#
        #   Multi-GPU training
        #----------------------#
        @tf.function
        def distributed_train_step(images, labels, net, optimizer, loss, aux_branch, metrics):
            per_replica_losses, per_replica_score = strategy.run(train_step, args=(images, labels, net, optimizer, loss, aux_branch, metrics))
            return strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_losses, axis=None), strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_score, axis=None)
        return distributed_train_step

#----------------------#
#   Guard against bugs
#----------------------#
def get_val_step_fn(strategy):
    @tf.function
    def val_step(images, labels, net, optimizer, loss, aux_branch, metrics):
        prediction = net(images, training=False)
        if aux_branch:
            aux_loss    = loss(labels, prediction[0])
            main_loss   = loss(labels, prediction[1])
            loss_value  = 0.4 * aux_loss + main_loss
            _f_score    = tf.reduce_mean(metrics(labels, prediction[1]))
        else:
            loss_value  = loss(labels, prediction)
            _f_score    = tf.reduce_mean(metrics(labels, prediction))

        return loss_value, _f_score
    if strategy == None:
        return val_step
    else:
        #----------------------#
        #   Multi-GPU validation
        #----------------------#
        @tf.function
        def distributed_val_step(images, labels, net, optimizer, loss, aux_branch, metrics):
            per_replica_losses, per_replica_score = strategy.run(val_step, args=(images, labels, net, optimizer, loss, aux_branch, metrics))
            return strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_losses, axis=None), strategy.reduce(tf.distribute.ReduceOp.MEAN, per_replica_score, axis=None)
        return distributed_val_step

def fit_one_epoch(net, loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, Epoch, aux_branch, metrics, save_period, save_dir, strategy):
    train_step  = get_train_step_fn(strategy)
    val_step    = get_val_step_fn(strategy)

    total_loss      = 0
    val_loss        = 0
    total_f_score   = 0
    val_f_score     = 0
    print('Start Train')
--------------------------------------------------------------------------------
/utils/utils_metrics.py:
--------------------------------------------------------------------------------
import csv
import os
from os.path import join

import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras import backend
from PIL import Image


def Iou_score(smooth = 1e-5, threshold = 0.5):
    def _Iou_score(y_true, y_pred):
        # score calculation
        y_pred = backend.greater(y_pred, threshold)
        y_pred = backend.cast(y_pred, backend.floatx())
        intersection = backend.sum(y_true[..., :-1] * y_pred, axis=[0, 1, 2])
        union = backend.sum(y_true[..., :-1] + y_pred, axis=[0, 1, 2]) - intersection

        score = (intersection + smooth) / (union + smooth)
        return score
    return _Iou_score

def f_score(beta=1, smooth = 1e-5, threshold = 0.5):
    def _f_score(y_true, y_pred):
        y_pred = backend.greater(y_pred, threshold)
        y_pred = backend.cast(y_pred, backend.floatx())

        tp = backend.sum(y_true[..., :-1] * y_pred, axis=[0, 1, 2])
        fp = backend.sum(y_pred, axis=[0, 1, 2]) - tp
        fn = backend.sum(y_true[..., :-1], axis=[0, 1, 2]) - tp

        score = ((1 + beta ** 2) * tp + smooth) \
                / ((1 + beta ** 2) * tp + beta ** 2 * fn + fp + smooth)
        return score
    return _f_score

# Let the labels have width W and height H
def fast_hist(a, b, n):
    #--------------------------------------------------------------------------------#
    #   a: the label flattened to 1-D, shape (H×W,); b: the prediction flattened to 1-D, shape (H×W,)
    #--------------------------------------------------------------------------------#
    k = (a >= 0) & (a < n)
    #--------------------------------------------------------------------------------#
    #   np.bincount counts how often each of the n**2 values from 0 to n**2-1 occurs;
    #   reshaped, the result has shape (n, n). The entries on its main diagonal are
    #   the correctly classified pixels.
    #--------------------------------------------------------------------------------#
    return np.bincount(n * a[k].astype(int) + b[k], minlength=n ** 2).reshape(n, n)

def per_class_iu(hist):
    return np.diag(hist) / np.maximum((hist.sum(1) + hist.sum(0) - np.diag(hist)), 1)

def per_class_PA_Recall(hist):
    return np.diag(hist) / np.maximum(hist.sum(1), 1)

def per_class_Precision(hist):
    return np.diag(hist) / np.maximum(hist.sum(0), 1)

def per_Accuracy(hist):
    return np.sum(np.diag(hist)) / np.maximum(np.sum(hist), 1)

def compute_mIoU(gt_dir, pred_dir, png_name_list, num_classes, name_classes=None):
    print('Num classes', num_classes)
    #-----------------------------------------#
    #   Create an all-zero confusion matrix
    #-----------------------------------------#
    hist = np.zeros((num_classes, num_classes))

    #------------------------------------------------#
    #   Build the list of validation-set label paths
    #   and the list of segmentation-result paths
    #   so they can be read directly
    #------------------------------------------------#
    gt_imgs = [join(gt_dir, x + ".png") for x in png_name_list]
    pred_imgs = [join(pred_dir, x + ".png") for x in png_name_list]

    #------------------------------------------------#
    #   Read every (prediction, label) pair
    #------------------------------------------------#
    for ind in range(len(gt_imgs)):
        #------------------------------------------------#
        #   Read one segmentation result as a numpy array
        #------------------------------------------------#
        pred = np.array(Image.open(pred_imgs[ind]))
        #------------------------------------------------#
        #   Read the corresponding label as a numpy array
        #------------------------------------------------#
        label = np.array(Image.open(gt_imgs[ind]))

        # Skip this image if the prediction and the label differ in size
        if len(label.flatten()) != len(pred.flatten()):
            print(
                'Skipping: len(gt) = {:d}, len(pred) = {:d}, {:s}, {:s}'.format(
                    len(label.flatten()), len(pred.flatten()), gt_imgs[ind],
                    pred_imgs[ind]))
            continue

        #------------------------------------------------#
        #   Compute the num_classes × num_classes hist
        #   matrix for this image and accumulate it
        #------------------------------------------------#
        hist += fast_hist(label.flatten(), pred.flatten(), num_classes)
        # Every 10 images, print the running mIoU over all classes so far
        if name_classes is not None and ind > 0 and ind % 10 == 0:
            print('{:d} / {:d}: mIou-{:0.2f}%; mPA-{:0.2f}%; Accuracy-{:0.2f}%'.format(
                    ind,
                    len(gt_imgs),
                    100 * np.nanmean(per_class_iu(hist)),
                    100 * np.nanmean(per_class_PA_Recall(hist)),
                    100 * per_Accuracy(hist)
                )
            )
    #------------------------------------------------#
    #   Compute the per-class IoU over all validation images
    #------------------------------------------------#
    IoUs = per_class_iu(hist)
    PA_Recall = per_class_PA_Recall(hist)
    Precision = per_class_Precision(hist)
    #------------------------------------------------#
    #   Print the per-class IoU values
    #------------------------------------------------#
    if name_classes is not None:
        for ind_class in range(num_classes):
            print('===>' + name_classes[ind_class] + ':\tIou-' + str(round(IoUs[ind_class] * 100, 2)) \
                + '; Recall (equal to the PA)-' + str(round(PA_Recall[ind_class] * 100, 2)) + '; Precision-' + str(round(Precision[ind_class] * 100, 2)))

    #-----------------------------------------------------------------#
    #   Average over all classes (ignoring NaN) to get the mIoU
    #   of the whole validation set
    #-----------------------------------------------------------------#
    print('===> mIoU: ' + str(round(np.nanmean(IoUs) * 100, 2)) + '; mPA: ' + str(round(np.nanmean(PA_Recall) * 100, 2)) + '; Accuracy: ' + str(round(per_Accuracy(hist) * 100, 2)))
    return np.array(hist, np.int64), IoUs, PA_Recall, Precision

def adjust_axes(r, t, fig, axes):
    bb = t.get_window_extent(renderer=r)
    text_width_inches = bb.width / fig.dpi
    current_fig_width = fig.get_figwidth()
    new_fig_width = current_fig_width + text_width_inches
    proportion = new_fig_width / current_fig_width
    x_lim = axes.get_xlim()
    axes.set_xlim([x_lim[0], x_lim[1] * proportion])

def draw_plot_func(values, name_classes, plot_title, x_label, output_path, tick_font_size = 12, plt_show = True):
    fig = plt.gcf()
    axes = plt.gca()
    plt.barh(range(len(values)), values, color='royalblue')
    plt.title(plot_title, fontsize=tick_font_size + 2)
    plt.xlabel(x_label, fontsize=tick_font_size)
    plt.yticks(range(len(values)), name_classes, fontsize=tick_font_size)
    r = fig.canvas.get_renderer()
    for i, val in enumerate(values):
        str_val = " " + str(val)
        if val < 1.0:
            str_val = " {0:.2f}".format(val)
        t = plt.text(val, i, str_val, color='royalblue', va='center', fontweight='bold')
        if i == (len(values) - 1):
            adjust_axes(r, t, fig, axes)

    fig.tight_layout()
    fig.savefig(output_path)
    if plt_show:
        plt.show()
    plt.close()

def show_results(miou_out_path, hist, IoUs, PA_Recall, Precision, name_classes, tick_font_size = 12):
    draw_plot_func(IoUs, name_classes, "mIoU = {0:.2f}%".format(np.nanmean(IoUs) * 100), "Intersection over Union", \
        os.path.join(miou_out_path, "mIoU.png"), tick_font_size = tick_font_size, plt_show = True)
    print("Save mIoU out to " + os.path.join(miou_out_path, "mIoU.png"))

    draw_plot_func(PA_Recall, name_classes, "mPA = {0:.2f}%".format(np.nanmean(PA_Recall) * 100), "Pixel Accuracy", \
        os.path.join(miou_out_path, "mPA.png"), tick_font_size = tick_font_size, plt_show = False)
    print("Save mPA out to " + os.path.join(miou_out_path, "mPA.png"))

    draw_plot_func(PA_Recall, name_classes, "mRecall = {0:.2f}%".format(np.nanmean(PA_Recall) * 100), "Recall", \
        os.path.join(miou_out_path, "Recall.png"), tick_font_size = tick_font_size, plt_show = False)
    print("Save Recall out to " + os.path.join(miou_out_path, "Recall.png"))

    draw_plot_func(Precision, name_classes, "mPrecision = {0:.2f}%".format(np.nanmean(Precision) * 100), "Precision", \
        os.path.join(miou_out_path, "Precision.png"), tick_font_size = tick_font_size, plt_show = False)
    print("Save Precision out to " + os.path.join(miou_out_path, "Precision.png"))

    with open(os.path.join(miou_out_path, "confusion_matrix.csv"), 'w', newline='') as f:
        writer = csv.writer(f)
        writer_list = []
        writer_list.append([' '] + [str(c) for c in name_classes])
        for i in range(len(hist)):
            writer_list.append([name_classes[i]] + [str(x) for x in hist[i]])
        writer.writerows(writer_list)
    print("Save confusion_matrix out to " + os.path.join(miou_out_path, "confusion_matrix.csv"))
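A tiny worked example (values invented for illustration) of how fast_hist and the per-class helpers just defined fit together:

```python
import numpy as np

# Two classes (0 = background, 1 = target), four pixels, one of them mislabelled.
label = np.array([0, 0, 1, 1])
pred  = np.array([0, 1, 1, 1])

hist = fast_hist(label, pred, 2)
# hist = [[1, 1],
#         [0, 2]]  -> rows are ground truth, columns are predictions
print(per_class_iu(hist))         # [0.5, 0.667]: per-class IoU
print(per_class_PA_Recall(hist))  # [0.5, 1.0]  : per-class recall (PA)
```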
--------------------------------------------------------------------------------
/voc_annotation.py:
--------------------------------------------------------------------------------
import os
import random

import numpy as np
from PIL import Image
from tqdm import tqdm

#-------------------------------------------------------#
#   Modify trainval_percent to carve out a test set.
#   Modify train_percent to change the train/validation
#   ratio, 9:1 by default.
#
#   This repo currently uses the test set as the
#   validation set and does not split off a separate
#   test set.
#-------------------------------------------------------#
trainval_percent = 1
train_percent = 0.9
#-------------------------------------------------------#
#   Points to the folder containing the VOC dataset.
#   Defaults to the VOC dataset under the repo root.
#-------------------------------------------------------#
VOCdevkit_path = 'VOCdevkit'

if __name__ == "__main__":
    random.seed(0)
    print("Generate txt in ImageSets.")
    segfilepath = os.path.join(VOCdevkit_path, 'VOC2007/SegmentationClass')
    saveBasePath = os.path.join(VOCdevkit_path, 'VOC2007/ImageSets/Segmentation')

    temp_seg = os.listdir(segfilepath)
    total_seg = []
    for seg in temp_seg:
        if seg.endswith(".png"):
            total_seg.append(seg)

    num = len(total_seg)
    num_list = range(num)
    tv = int(num * trainval_percent)
    tr = int(tv * train_percent)
    trainval = random.sample(num_list, tv)
    train = random.sample(trainval, tr)

    print("train and val size", tv)
    print("train size", tr)
    ftrainval = open(os.path.join(saveBasePath, 'trainval.txt'), 'w')
    ftest = open(os.path.join(saveBasePath, 'test.txt'), 'w')
    ftrain = open(os.path.join(saveBasePath, 'train.txt'), 'w')
    fval = open(os.path.join(saveBasePath, 'val.txt'), 'w')

    for i in num_list:
        name = total_seg[i][:-4] + '\n'
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)

    ftrainval.close()
    ftrain.close()
    fval.close()
    ftest.close()
    print("Generate txt in ImageSets done.")

    print("Check the dataset format, this may take a while.")
    classes_nums = np.zeros([256], np.int64)
    for i in tqdm(num_list):
        name = total_seg[i]
        png_file_name = os.path.join(segfilepath, name)
        if not os.path.exists(png_file_name):
            raise ValueError("Label image %s not found; please check whether the file exists at that path and whether its suffix is .png." % (png_file_name))

        png = np.array(Image.open(png_file_name), np.uint8)
        if len(np.shape(png)) > 2:
            print("The shape of label image %s is %s; it is not a grayscale or 8-bit color image, please check the dataset format carefully." % (name, str(np.shape(png))))
            print("Label images must be grayscale or 8-bit color images, where the value of every pixel is the class that pixel belongs to.")

        classes_nums += np.bincount(np.reshape(png, [-1]), minlength=256)

    print("Print the pixel values and their counts.")
    print('-' * 37)
    print("| %15s | %15s |" % ("Key", "Value"))
    print('-' * 37)
    for i in range(256):
        if classes_nums[i] > 0:
            print("| %15s | %15s |" % (str(i), str(classes_nums[i])))
    print('-' * 37)

    if classes_nums[255] > 0 and classes_nums[0] > 0 and np.sum(classes_nums[1:255]) == 0:
        print("Detected that the label pixels only take the values 0 and 255; the data format is wrong.")
        print("For binary segmentation, labels must use pixel value 0 for the background and 1 for the target.")
    elif classes_nums[0] > 0 and np.sum(classes_nums[1:]) == 0:
        print("Detected that the labels contain only background pixels; the data format is wrong, please check the dataset format carefully.")

    print("Images in JPEGImages should be .jpg files; images in SegmentationClass should be .png files.")
    print("If the format is wrong, refer to:")
    print("https://github.com/bubbliiiing/segmentation-format-fix")
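The 0-and-255 warning printed above is the most common label problem: binary masks that mark the target as 255 instead of class index 1. A minimal repair sketch (the file names are placeholders; point them at your own masks):

```python
import numpy as np
from PIL import Image

# Placeholder path; a black/white mask where the target is drawn as 255.
mask = np.array(Image.open("old_label.png"), np.uint8)

# Background stays 0; every target pixel (255) becomes class index 1.
mask[mask == 255] = 1

Image.fromarray(mask).save("VOCdevkit/VOC2007/SegmentationClass/new_label.png")
```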
--------------------------------------------------------------------------------
/常见问题汇总.md:
--------------------------------------------------------------------------------
The blog post collecting these FAQs is at [https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428).

# FAQ
## 1. Download issues
### a. Code download
**Q: Could you send me a copy of the code? Where do I download it?
A: The GitHub address is in the video description; copy it and download from there.**

**Q: Why does the code I downloaded complain that the archive is corrupted?
A: Download it again from GitHub.**

**Q: Why is the code I downloaded different from the code in your videos and blog posts?
A: I update the code frequently; the actual code in the repo is authoritative.**

### b. Weight download
**Q: Why is there no .pth or .h5 file under model_data in the code I downloaded?
A: I usually upload the weights to GitHub and Baidu Netdisk; the links are in the GitHub README.**

### c. Dataset download
**Q: Where can I download dataset XXXX?
A: Dataset download links are usually in the README; almost all of them are there. If one is missing, contact me to add it by simply opening a GitHub issue**.

## 2. Environment setup issues
### a. Environments currently used by the repos
**The pytorch code targets pytorch 1.2; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141).

**The keras code targets tensorflow 1.13.2 with keras 2.1.5; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142).

**The tf2 code targets tensorflow 2.2.0 and needs no separate keras install; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493).

**Q: Does your code work with tensorflow/pytorch version so-and-so?
A: It is best to follow my recommended setup; there are setup tutorials too! I have not tried other versions. Problems may appear but are usually minor and need only small code changes.**

### b. Environment setup for 30-series GPUs
Because of framework updates, 30-series GPUs cannot use the setup tutorials above.
The 30-series configurations I have tested so far are:
**for the pytorch code: pytorch 1.7.0, cuda 11.0, cudnn 8.0.5**.

**The keras code cannot be set up with cuda 11 under win10; under ubuntu it can (search online for instructions), with tensorflow 1.15.4 and keras 2.1.5 or 2.3.1 (a few function interfaces differ, so small code adjustments may be needed).**

**For the tf2 code: tensorflow 2.4.0, cuda 11.0, cudnn 8.0.5**.

### c. GPU utilization and environment usage issues
**Q: Why didn't training use the GPU even though I installed tensorflow-gpu?
A: Confirm tensorflow-gpu is properly installed, check the tensorflow version with pip list, then check Task Manager or the nvidia command-line tool to see whether the GPU is used for training; in Task Manager, look at GPU memory usage.**

**Q: I don't seem to be training on the GPU. How do I tell whether the GPU is being used?
A: Usually use NVIDIA's command-line tool. If you use Task Manager instead, check in the Performance tab whether GPU memory is in use, and watch the Cuda graph rather than Copy.**
![Task Manager GPU view](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center)
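Besides Task Manager and nvidia-smi, tf2 itself can report whether it sees the GPU; a quick check (standard TensorFlow 2 API, not code from this repo):

```python
import tensorflow as tf

# Lists the GPUs TensorFlow can actually use; an empty list means
# training will silently fall back to the CPU.
print(tf.config.list_physical_devices('GPU'))
```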

**Q: Why doesn't it work even after I followed your environment setup?
A: Send me your GPU, CUDA, CUDNN, TF and PYTORCH versions via private message on Bilibili.**

**Q: I get the following error**
```python
Traceback (most recent call last):
  File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in
    pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: The specified module could not be found.
```
**A: Reboot if you haven't; otherwise reinstall following the steps. If it still fails, message me your GPU, CUDA, CUDNN, TF and PYTORCH versions.**

### d. "No module" problems
**Q: Why do I get "no module name utils.utils" (no module name nets.yolo, no module name nets.ssd, and so on)?
A: utils is not installed with pip; it sits in the root of the repo I uploaded. This error means your working directory is wrong; look up the concepts of relative paths and the root directory, and it will usually become clear.**

**Q: Why "no module name matplotlib" (no module name PIL, no module name cv2, etc.)?
A: The library simply isn't installed; open a terminal and install it: pip install matplotlib**

**Q: I already pip-installed opencv (pillow, matplotlib, ...); why do I still get "no module name cv2"?
A: You installed it without activating the environment; activate the corresponding conda environment first, then install.**

**Q: Why "No module named 'torch'"?
A: I honestly wonder too... how is pytorch not installed? Usually one of two things: it really isn't installed, or it was installed into a different environment than the one currently activated.**

**Q: Why "No module named 'tensorflow'"?
A: Same as above.**

### e. CUDA installation failures
Visual Studio generally needs to be installed before CUDA; the 2017 edition is fine.

### f. Ubuntu
**All the code works under Ubuntu; I have tried both systems.**

### g. VSCode error squiggles
**Q: Why does VSCode show a pile of errors?
A: It shows a pile for me too, but they don't matter; it is a VSCode issue. If you don't want to see them, install PyCharm.**

### h. Training and predicting on the CPU
**For the keras and tf2 code, to train and predict on the CPU just install the CPU build of tensorflow.**

**For the pytorch code, to train and predict on the CPU change cuda=True to cuda=False.**

### i. tqdm "no attribute 'pos'"
**Q: Running the code raises 'tqdm' object has no attribute 'pos'.
A: Reinstall tqdm with a different version.**

### j. decode("utf-8") errors
**Because of an h5py update, installation automatically pulls in h5py >= 3.0.0, which causes decode("utf-8") errors!
Be sure to install h5py==2.10.0 after installing tensorflow!**
```
pip install h5py==2.10.0
```

### k. TypeError: __array__() takes 1 positional argument but 2 were given
This can be fixed by changing the pillow version.
```
pip install pillow==8.2.0
```

### l. Other problems
**Q: Why do I get TypeError: cat() got an unexpected keyword argument 'axis', Traceback (most recent call last), AttributeError: 'Tensor' object has no attribute 'bool'?
A: These are version problems; use torch 1.2 or later.**
**Many other odd problems are also version problems; follow my video tutorials to install Keras and tensorflow. For example, if you installed tensorflow 2, don't ask me why Keras-yolo won't run; of course it won't.**

## 3. Object detection FAQ (also applies to the face detection and classification repos)
### a. Shape mismatch
#### 1) Shape mismatch during training
**Q: Why does running train.py report a shape mismatch?
A: In keras, because the classes you train differ from the original classes, the network structure changes, so there is a small mismatch at the very end of the network.**

#### 2) Shape mismatch during prediction
**Q: Why does running predict.py report a shape mismatch?
In Pytorch it looks like this:**
![Pytorch shape mismatch error](https://img-blog.csdnimg.cn/20200722171631901.png)
In Keras it looks like this:
![Keras shape mismatch error](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
**A: There are three main causes:
1. In ssd and FasterRCNN, num_classes in train.py may not have been changed.
2. model_path was not changed.
3. classes_path was not changed.
Check carefully! Make sure the model_path and classes_path you use match! The num_classes or classes_path used for training also need checking!**

### b. Out-of-memory problems
**Q: Why does the console under train.py flash past and report OOM or similar?
A: That is keras running out of GPU memory; reduce batch_size. SSD has the smallest memory footprint, so SSD is recommended;
2 GB of GPU memory: SSD, YOLOV4-TINY
4 GB: YOLOV3
6 GB: YOLOV4, Retinanet, M2det, Efficientdet, Faster RCNN, etc.
8 GB+: whatever you like.**
**Note that because of BatchNorm2d, batch_size cannot be 1; it must be at least 2.**

**Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)?
A: That is pytorch running out of GPU memory; same as above.**

**Q: Why does it run out of memory when my GPU memory isn't even being used?
A: It ran out of memory, so of course it is not used; the model never started training.**
### c. Training problems (freeze training, LOSS, training quality, etc.)
**Q: Why freeze training and then unfreeze?
A: It is the idea of transfer learning: the features extracted by the network's backbone are generic, so freezing it speeds up training and also protects the weights from being destroyed, as sketched below.**
In the freeze phase, the backbone of the model is frozen and the feature-extraction network does not change. Memory usage is low and only the rest of the network is fine-tuned.
In the unfreeze phase, the backbone is unfrozen and the feature-extraction network changes. Memory usage is high and all parameters of the network are updated.
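In the Keras/tf2 repos this freezing amounts to toggling layer.trainable; a schematic sketch (MobileNetV2 and the layer count 60 are arbitrary stand-ins, not this repo's actual split point):

```python
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)  # stand-in backbone

# Freeze phase: the backbone stops updating, only the head is fine-tuned.
for layer in model.layers[:60]:
    layer.trainable = False

# Unfreeze phase: everything trains again (recompile afterwards, since Keras
# bakes the trainable flags in when the model is compiled).
for layer in model.layers:
    layer.trainable = True
```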
print("JPEGImages中的图片应当为.jpg文件、SegmentationClass中的图片应当为.png文件。") 97 | print("如果格式有误,参考:") 98 | print("https://github.com/bubbliiiing/segmentation-format-fix") -------------------------------------------------------------------------------- /常见问题汇总.md: -------------------------------------------------------------------------------- 1 | 问题汇总的博客地址为[https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428)。 2 | 3 | # 问题汇总 4 | ## 1、下载问题 5 | ### a、代码下载 6 | **问:up主,可以给我发一份代码吗,代码在哪里下载啊? 7 | 答:Github上的地址就在视频简介里。复制一下就能进去下载了。** 8 | 9 | **问:up主,为什么我下载的代码提示压缩包损坏? 10 | 答:重新去Github下载。** 11 | 12 | **问:up主,为什么我下载的代码和你在视频以及博客上的代码不一样? 13 | 答:我常常会对代码进行更新,最终以实际的代码为准。** 14 | 15 | ### b、 权值下载 16 | **问:up主,为什么我下载的代码里面,model_data下面没有.pth或者.h5文件? 17 | 答:我一般会把权值上传到Github和百度网盘,在GITHUB的README里面就能找到。** 18 | 19 | ### c、 数据集下载 20 | **问:up主,XXXX数据集在哪里下载啊? 21 | 答:一般数据集的下载地址我会放在README里面,基本上都有,没有的话请及时联系我添加,直接发github的issue即可**。 22 | 23 | ## 2、环境配置问题 24 | ### a、现在库中所用的环境 25 | **pytorch代码对应的pytorch版本为1.2,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141)。 26 | 27 | **keras代码对应的tensorflow版本为1.13.2,keras版本是2.1.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142)。 28 | 29 | **tf2代码对应的tensorflow版本为2.2.0,无需安装keras,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493)。 30 | 31 | **问:你的代码某某某版本的tensorflow和pytorch能用嘛? 32 | 答:最好按照我推荐的配置,配置教程也有!其它版本的我没有试过!可能出现问题但是一般问题不大。仅需要改少量代码即可。** 33 | 34 | ### b、30系列显卡环境配置 35 | 30系显卡由于框架更新不可使用上述环境配置教程。 36 | 当前我已经测试的可以用的30显卡配置如下: 37 | **pytorch代码对应的pytorch版本为1.7.0,cuda为11.0,cudnn为8.0.5**。 38 | 39 | **keras代码无法在win10下配置cuda11,在ubuntu下可以百度查询一下,配置tensorflow版本为1.15.4,keras版本是2.1.5或者2.3.1(少量函数接口不同,代码可能还需要少量调整。)** 40 | 41 | **tf2代码对应的tensorflow版本为2.4.0,cuda为11.0,cudnn为8.0.5**。 42 | 43 | ### c、GPU利用问题与环境使用问题 44 | **问:为什么我安装了tensorflow-gpu但是却没用利用GPU进行训练呢? 45 | 答:确认tensorflow-gpu已经装好,利用pip list查看tensorflow版本,然后查看任务管理器或者利用nvidia命令看看是否使用了gpu进行训练,任务管理器的话要看显存使用情况。** 46 | 47 | **问:up主,我好像没有在用gpu进行训练啊,怎么看是不是用了GPU进行训练? 48 | 答:查看是否使用GPU进行训练一般使用NVIDIA在命令行的查看命令,如果要看任务管理器的话,请看性能部分GPU的显存是否利用,或者查看任务管理器的Cuda,而非Copy。** 49 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center) 50 | 51 | **问:up主,为什么我按照你的环境配置后还是不能使用? 

### e. Resuming training
**Q: I have already trained several epochs; can I continue training from there?
A: Yes. Before training, load the already-trained weights exactly as you would load pretrained weights. Trained weights are usually saved in the logs folder; just set model_path to the path of the weights you want to resume from.**

### f. Pretrained weights
**Q: If I want to train on another dataset, what should I do about pretrained weights?**
**A: Pretrained weights are transferable across datasets because the features are generic. They are needed in 99% of cases; without them the initial weights are too random, feature extraction is weak, and training results suffer.**

**Q: I modified the network; can I still use the pretrained weights?
A: If you changed the backbone and it is not an existing architecture, the pretrained weights are basically unusable: either match the weights yourself by comparing the shapes of the convolution kernels, or pretrain from scratch. If you only changed the later part of the network, the backbone's pretrained weights can still be used: in pytorch, change the loading code to compare shapes before loading; in keras, simply use by_name=True, skip_mismatch=True.**
Weight matching can be done like this:
```python
# Speed up training
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I train without pretrained weights?
A: Comment out the code that loads them.**

**Q: Why are my results so poor without pretrained weights?
A: Randomly initialized weights extract features poorly, so training suffers; voc07+12 and coco+voc07+12 behave differently, and pretrained weights really are important.**

### g. Video and webcam detection
**Q: How do I detect from a webcam?
A: Change the parameters in predict.py; there is also a video explaining the webcam detection approach in detail.**

**Q: How do I detect on a video?
A: Same as above.**
### h. Training from scratch
**Q: How do I train the model from scratch?
A: With limited compute and tuning skill, training from scratch is pointless: with randomly initialized parameters the model extracts features very poorly, and without strong tuning ability and compute the network will not converge properly.**
If you insist on starting from scratch, note the following:
- Do not load pretrained weights.
- Do not use freeze training; comment out the code that freezes the model.

**Q: Why are my results so poor without pretrained weights?
A: Same as above: random initialization extracts poor features; pretrained weights really are important.**

### i. Saving results
**Q: How do I save the detected images?
A: Detection generally uses PIL's Image, so look up how PIL's Image saves files. See the comments in predict.py for details.**

**Q: How do I save video output?
A: See the comments in predict.py for details.**

### j. Folder traversal
**Q: How do I run over every image in a folder?
A: Generally, use os.listdir to find all the images in the folder, then detect each one following the flow in predict.py; see its comments for details.**

**Q: How do I run over every image in a folder and save the results?
A: Use os.listdir to find the images, detect each one following predict.py, and save with PIL's Image (or with cv2 where a repo uses cv2). See the comments in predict.py, and the sketch below.**
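Putting the two answers together, a minimal sketch of the loop (the Pspnet class and its detect_image method are assumed from this repo's pspnet.py and predict.py, so check your copy for the exact names; the folder names are placeholders):

```python
import os
from PIL import Image

from pspnet import Pspnet   # assumed entry point, as used by predict.py

pspnet  = Pspnet()
dir_in  = "img"              # placeholder input folder
dir_out = "img_out"          # placeholder output folder
os.makedirs(dir_out, exist_ok=True)

for name in os.listdir(dir_in):
    if name.lower().endswith(('.bmp', '.jpeg', '.jpg', '.png')):
        image   = Image.open(os.path.join(dir_in, name))
        r_image = pspnet.detect_image(image)   # returns a PIL image
        r_image.save(os.path.join(dir_out, name))
```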

### k. Path problems (No such file or directory)
**Q: Why do I get an error like this:**
```python
FileNotFoundError: [Errno 2] No such file or directory
……………………………………
……………………………………
```
**A: Check the folder path and whether the corresponding file exists; also check the file paths inside 2007_train.txt.**
A few important points about paths:
**Never put spaces in folder names.
Mind the difference between relative and absolute paths.
Read up on how paths work.**

**Almost all path problems are working-directory problems; study what a relative path is!**
### l. Comparison with the original implementations
**Q: How does your code compare to the original? Can it reach the original results?
A: Basically yes. I have tested everything on VOC data; I don't have good enough GPUs to train and test on COCO.**

**Q: Did you implement all the tricks in yolov4? How far is it from the original?
A: Not all of them. YOLOV4 uses far too many improvements to implement and list completely, so I only picked the ones I found interesting and highly effective. The SAM attention module mentioned in the paper is not used in the authors' own code either. Not every trick helps, and I cannot implement them all. As for the comparison with the original, I cannot train on COCO, but users report the gap is small.**

### m. FPS (detection speed)
**Q: What FPS can this reach? Can it reach XX FPS?
A: FPS depends on the machine's configuration: high-end hardware is fast, low-end hardware is slow.**

**Q: Why do I only get a dozen FPS testing yolov4 (or others) on a server?
A: Check that the GPU build of tensorflow-gpu or pytorch is installed correctly. If it is, use time.time() to find which part of detect_image takes longest (not only the network costs time; other processing such as drawing does too).**

**Q: Why does the paper claim speed XX, but I don't see it here?
A: Check that the GPU build of tensorflow-gpu or pytorch is installed correctly; if it is, use time.time() to find which part of detect_image takes longest. Some papers also predict with large batches, which I have not implemented.**

### n. Predicted image not shown
**Q: Why doesn't your code display the image after prediction? It only prints the detected targets in the console.
A: Install an image viewer on the system.**

### o. Evaluation (mAP, PR curves, Recall, Precision for detection)
**Q: How do I compute mAP?
A: Watch the mAP video; it is all one workflow.**

**Q: When computing mAP, what is MINOVERLAP in get_map.py for? Is it IoU?
A: Yes, it is an IoU threshold: it measures how much the predicted box overlaps the ground-truth box, and a prediction counts as correct when the overlap exceeds MINOVERLAP (see the sketch below).**
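Since MINOVERLAP is just an IoU threshold, here is a small worked sketch of the quantity it gates (the boxes are illustrative [x1, y1, x2, y2] values, not code from get_map.py):

```python
def box_iou(a, b):
    # Intersection rectangle of two [x1, y1, x2, y2] boxes.
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

# IoU 0.667 >= MINOVERLAP 0.5, so this prediction would count as correct.
print(box_iou([0, 0, 10, 10], [0, 2, 10, 12]))  # ~0.667
```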

**Q: Why is self.confidence (self.score) in get_map.py set so low?
A: See the theory part of the mAP video: all predictions are needed before the PR curve can be drawn, so the threshold must be low.**

**Q: Can you explain how to draw PR curves and the like?
A: See the mAP video; the results include PR curves.**

**Q: How do I compute the Recall and Precision metrics?
A: These two are defined relative to a specific confidence threshold; they are also produced while computing mAP.**

### p. Training on the COCO dataset
**Q: How do I train object detection on COCO?
A: The txt files needed for COCO training can be produced following qqwweee's yolo3 repo; the format is exactly the same.**

### q. Model optimization (model modification)
**Q: Do you have code for the YOLO series with Focal LOSS? Does it help?
A: Many people have tried it; the gain is small (sometimes results even get worse). YOLO has its own way of balancing positive and negative samples.**

**Q: I modified the network; can I still use the pretrained weights?
A: If you changed the backbone and it is not an existing architecture, the pretrained weights are basically unusable: either match the weights yourself by comparing the shapes of the convolution kernels, or pretrain from scratch. If you only changed the later part of the network, the backbone's pretrained weights can still be used: in pytorch, change the loading code to compare shapes before loading; in keras, simply use by_name=True, skip_mismatch=True.**
Weight matching can be done like this:
```python
# Speed up training
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I modify the model? I want to publish a small paper!
A: Look at the differences between yolov3 and yolov4, then read the yolov4 paper; as a giant tuning showcase it is a very useful reference and uses many tricks. My advice is to study classic models, then extract and reuse their highlight structures.**

### r. Deployment
I have not deployed to phones or other devices, so I do not know much about deployment issues...

## 4. Semantic segmentation FAQ
### a. Shape mismatch
#### 1) Shape mismatch during training
**Q: Why does running train.py report a shape mismatch?
A: In keras, because the classes you train differ from the original classes, the network structure changes, so there is a small mismatch at the very end of the network.**

#### 2) Shape mismatch during prediction
**Q: Why does running predict.py report a shape mismatch?
In Pytorch it looks like this:**
![Pytorch shape mismatch error](https://img-blog.csdnimg.cn/20200722171631901.png)
In Keras it looks like this:
![Keras shape mismatch error](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
**A: There are two main causes:
1. num_classes in train.py was not changed.
2. num_classes was not changed for prediction.
Check carefully! The num_classes used for training and for prediction both need checking!**

### b. Out-of-memory problems
**Q: Why does the console under train.py flash past and report OOM or similar?
A: That is keras running out of GPU memory; reduce batch_size.**

**Note that because of BatchNorm2d, batch_size cannot be 1; it must be at least 2.**

**Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)?
A: That is pytorch running out of GPU memory; same as above.**

**Q: Why does it run out of memory when my GPU memory isn't even being used?
A: It ran out of memory, so of course it is not used; the model never started training.**

### c. Training problems (freeze training, LOSS, training quality, etc.)
**Q: Why freeze training and then unfreeze?
A: It is the idea of transfer learning: the features extracted by the backbone are generic, so freezing it speeds up training and protects the weights from being destroyed.**
**In the freeze phase, the backbone of the model is frozen and the feature-extraction network does not change; memory usage is low and only the rest of the network is fine-tuned.**
**In the unfreeze phase, the backbone is unfrozen and the feature-extraction network changes; memory usage is high and all parameters of the network are updated.**

**Q: Why doesn't my network converge? My LOSS is XXXX.
A: LOSS differs between networks; it is only a reference for whether training converges, not a measure of quality. Its absolute value does not matter; what matters is whether it keeps decreasing and whether predictions work.**

**Q: Why are my results poor? Prediction finds no targets and the output is all black.
A:**
**Consider several points:
1. The dataset; this is the most important. With fewer than 500 images, consider enlarging it, and above all check the labels. The videos explain the VOC format in detail: it is not enough to have input images and output labels; every pixel value of a label must be the class that pixel belongs to. The most common broken format is a black background with white targets, where the target pixels are 255; that cannot train properly, the target pixels must be 1.
2. Unfreeze training if your data distribution is far from ordinary scenes, to adapt the backbone and strengthen feature extraction.
3. The network; try different ones.
4. Training length; some people train only a few epochs and declare failure; train to completion with the default settings.
5. Check that you followed every step.
6. LOSS differs between networks and is only a convergence indicator, not a quality score; what matters is whether it converges.**

**Q: Why are my results poor? Predictions on small objects are inaccurate.
A: For deeplab and pspnet you can change downsample_factor: 16 downsamples too much and works poorly; 8 works better.**

**Q: Why do I get a gbk codec error:**
```python
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
```
**A: Do not use Chinese in labels and paths. If you must, handle the encoding: open the files with encoding='utf-8'.**

**Q: My images have resolution xxx*xxx; can I use them?**
**A: Yes; the code resizes them or applies data augmentation automatically.**

**Q: How do I train with multiple GPUs?
A: Most of the pytorch code can train on GPUs directly; for keras just search online, it is not complicated. I have no multi-GPU machine to test on in detail, so you will have to work it out yourselves.**

### d. Grayscale images
**Q: Can I train on (and predict) grayscale images?
A: Most of my repos convert grayscale images to RGB for training and prediction. If some code cannot train or predict on grayscale, try converting the result of Image.open to RGB inside get_random_data, and do the same when predicting (for reference only).**

### e. Resuming training
**Q: I have already trained several epochs; can I continue training from there?
A: Yes. Before training, load the already-trained weights exactly as you would load pretrained weights. Trained weights are usually saved in the logs folder; just set model_path to the path of the weights you want to resume from.**

### f. Pretrained weights

**Q: If I want to train on another dataset, what should I do about pretrained weights?**
**A: Pretrained weights are transferable across datasets because the features are generic. They are needed in 99% of cases; without them the initial weights are too random, feature extraction is weak, and training results suffer.**

**Q: I modified the network; can I still use the pretrained weights?
A: If you changed the backbone and it is not an existing architecture, the pretrained weights are basically unusable: either match the weights yourself by comparing the shapes of the convolution kernels, or pretrain from scratch. If you only changed the later part of the network, the backbone's pretrained weights can still be used: in pytorch, change the loading code to compare shapes before loading; in keras, simply use by_name=True, skip_mismatch=True.**
Weight matching can be done like this:

```python
# Speed up training
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I train without pretrained weights?
A: Comment out the code that loads them.**

**Q: Why are my results so poor without pretrained weights?
A: Randomly initialized weights extract features poorly, so training suffers; pretrained weights really are important.**

### g. Video and webcam detection
**Q: How do I detect from a webcam?
A: Change the parameters in predict.py; there is also a video explaining the webcam detection approach in detail.**

**Q: How do I detect on a video?
A: Same as above.**

### h. Training from scratch
**Q: How do I train the model from scratch?
A: With limited compute and tuning skill, training from scratch is pointless: with randomly initialized parameters the model extracts features very poorly, and without strong tuning ability and compute the network will not converge properly.**
If you insist on starting from scratch, note the following:
- Do not load pretrained weights.
- Do not use freeze training; comment out the code that freezes the model.

**Q: Why are my results so poor without pretrained weights?
A: Same as above: random initialization extracts poor features; pretrained weights really are important.**

### i. Saving results
**Q: How do I save the detected images?
A: Detection generally uses PIL's Image, so look up how PIL's Image saves files. See the comments in predict.py for details.**

**Q: How do I save video output?
A: See the comments in predict.py for details.**

### j. Folder traversal
**Q: How do I run over every image in a folder?
A: Generally, use os.listdir to find all the images in the folder, then detect each one following the flow in predict.py; see its comments for details.**

**Q: How do I run over every image in a folder and save the results?
A: Use os.listdir to find the images, detect each one following predict.py, and save with PIL's Image (or with cv2 where a repo uses cv2). See the comments in predict.py; the traversal sketch in section 3j above applies unchanged.**

### k. Path problems (No such file or directory)
**Q: Why do I get an error like this:**
```python
FileNotFoundError: [Errno 2] No such file or directory
……………………………………
……………………………………
```

**A: Check the folder path and whether the corresponding file exists; also check the file paths inside 2007_train.txt.**
A few important points about paths:
**Never put spaces in folder names.
Mind the difference between relative and absolute paths.
Read up on how paths work.**

**Almost all path problems are working-directory problems; study what a relative path is!**

### l. FPS (detection speed)
**Q: What FPS can this reach? Can it reach XX FPS?
A: FPS depends on the machine's configuration: high-end hardware is fast, low-end hardware is slow.**

**Q: Why does the paper claim speed XX, but I don't see it here?
A: Check that the GPU build of tensorflow-gpu or pytorch is installed correctly. If it is, use time.time() to find which part of detect_image takes longest (not only the network costs time; other processing such as drawing does too). Some papers also predict with large batches, which I have not implemented. A timing sketch follows below.**
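The time.time() probe described above just brackets the suspected hot spot; a hedged sketch (pspnet and image are the same assumed stand-ins as in the traversal sketch in section 3j):

```python
import time

# Time one call to the suspected hot spot; repeat around other processing
# steps (drawing, resizing, ...) to see where the milliseconds actually go.
t0 = time.time()
r_image = pspnet.detect_image(image)
print('detect_image took %.3f s' % (time.time() - t0))
```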

### m. Predicted image not shown
**Q: Why doesn't your code display the image after prediction? It only prints the detected targets in the console.
A: Install an image viewer on the system.**

### n. Evaluation (miou)
**Q: How do I compute miou?
A: See the miou measurement part of the video.**

**Q: How do I compute the Recall and Precision metrics?
A: Understand the concept of the confusion matrix and derive them from it; note that compute_mIoU in utils/utils_metrics.py already returns the per-class Recall (identical to the PA) and Precision, and the sketch below shows the arithmetic.**
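Concretely, with the confusion matrix hist from utils/utils_metrics.py (rows are ground truth, columns are predictions), both metrics are one line each; a small worked example with invented numbers:

```python
import numpy as np

hist = np.array([[50, 2],
                 [ 5, 43]])  # illustrative 2-class confusion matrix

recall    = np.diag(hist) / np.maximum(hist.sum(1), 1)  # per_class_PA_Recall
precision = np.diag(hist) / np.maximum(hist.sum(0), 1)  # per_class_Precision
print(recall)     # [0.962 0.896]
print(precision)  # [0.909 0.956]
```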

### o. Model optimization (model modification)
**Q: I modified the network; can I still use the pretrained weights?
A: If you changed the backbone and it is not an existing architecture, the pretrained weights are basically unusable: either match the weights yourself by comparing the shapes of the convolution kernels, or pretrain from scratch. If you only changed the later part of the network, the backbone's pretrained weights can still be used: in pytorch, change the loading code to compare shapes before loading; in keras, simply use by_name=True, skip_mismatch=True.**
Weight matching can be done like this:

```python
# Speed up training
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I modify the model? I want to publish a small paper!
A: Read the yolov4 paper from the object detection side: as a giant tuning showcase it is a very useful reference and uses many tricks. My advice is to study classic models, then extract and reuse their highlight structures. Common tricks such as attention modules are worth trying.**

### p. Deployment
I have not deployed to phones or other devices, so I do not know much about deployment issues...

## 5. Chat group
**Q: Is there a QQ group or anything like that?
A: No; I don't have time to manage a QQ group...**

## 6. How to learn
**Q: What was your learning path? I am a beginner; how should I learn?
A: A few caveats:
1. I am not an expert; there is a lot I cannot do, and my path will not suit everyone.
2. My lab does not work on deep learning, so I taught myself most of this by exploring on my own; I cannot guarantee it is all correct.
3. I personally believe learning depends mostly on self-study.**
As for the path itself: I first worked through Mofan (莫烦)'s python tutorials to get started with tensorflow, keras and pytorch; then I learned SSD and YOLO, then studied many classic convolutional networks, and after that began reading many different codebases. My method is to read code line by line, understanding the whole execution flow and how the feature-map shapes change. It took a great deal of time and there is no shortcut; you simply have to put in the hours.
--------------------------------------------------------------------------------