├── README.md
├── __pycache__
│   ├── my_dataset_cityscraps.cpython-38.pyc
│   └── transforms.cpython-38.pyc
├── backbone
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-38.pyc
│   │   ├── feature_pyramid_network.cpython-38.pyc
│   │   └── resnet50_fpn_model.cpython-38.pyc
│   ├── feature_pyramid_network.py
│   └── resnet50_fpn_model.py
├── cityscrapes4_indices.json
├── draw_box_utils.py
├── faster_rcnn
│   ├── backbone
│   │   ├── __init__.py
│   │   ├── feature_pyramid_network.py
│   │   ├── mobilenetv2_model.py
│   │   ├── resnet50_fpn_model.py
│   │   └── vgg_model.py
│   ├── change_backbone_with_fpn.py
│   ├── change_backbone_without_fpn.py
│   ├── cityscrapes8_indices.json
│   ├── cityscrayp.py
│   ├── draw_box_utils.py
│   ├── my_dataset.py
│   ├── network_files
│   │   ├── __init__.py
│   │   ├── boxes.py
│   │   ├── det_utils.py
│   │   ├── faster_rcnn_framework.py
│   │   ├── image_list.py
│   │   ├── roi_head.py
│   │   ├── rpn_function.py
│   │   └── transform.py
│   ├── pascal_voc_classes.json
│   ├── plot_curve.py
│   ├── predict.py
│   ├── record_mAP.txt
│   ├── requirements.txt
│   ├── split_data.py
│   ├── train_mobilenetv2.py
│   ├── train_multi_GPU.py
│   ├── train_res50_fpn.py
│   ├── train_utils
│   │   ├── __init__.py
│   │   ├── coco_eval.py
│   │   ├── coco_utils.py
│   │   ├── distributed_utils.py
│   │   ├── group_by_aspect_ratio.py
│   │   └── train_eval_utils.py
│   ├── transforms.py
│   └── validation.py
├── figures
│   ├── VehicleMAE_Det.jpg
│   ├── detection_result.jpg
│   ├── experimentalresults.jpg
│   ├── firstIMG.jpg
│   ├── proposal_attentionmaps.jpg
│   └── proposal_attribute.jpg
├── my_dataset_cityscraps.py
├── network_files
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-38.pyc
│   │   ├── boxes.cpython-38.pyc
│   │   ├── det_utils.cpython-38.pyc
│   │   ├── faster_rcnn_framework.cpython-38.pyc
│   │   ├── image_list.cpython-38.pyc
│   │   ├── mask_rcnn.cpython-38.pyc
│   │   ├── roi_head.cpython-38.pyc
│   │   ├── rpn_function.cpython-38.pyc
│   │   ├── transform.cpython-38.pyc
│   │   └── vehiclemaeencode.cpython-38.pyc
│   ├── boxes.py
│   ├── det_utils.py
│   ├── faster_rcnn_framework.py
│   ├── image_list.py
│   ├── mask_rcnn.py
│   ├── roi_head.py
│   ├── rpn_function.py
│   ├── transform.py
│   └── vehiclemaeencode.py
├── plot_curve.py
├── predict.py
├── requirements.txt
├── train.py
├── train_utils
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── __init__.cpython-38.pyc
│   │   ├── coco_eval.cpython-38.pyc
│   │   ├── coco_utils.cpython-38.pyc
│   │   ├── distributed_utils.cpython-38.pyc
│   │   ├── group_by_aspect_ratio.cpython-38.pyc
│   │   └── train_eval_utils.cpython-38.pyc
│   ├── coco_eval.py
│   ├── coco_utils.py
│   ├── distributed_utils.py
│   ├── group_by_aspect_ratio.py
│   └── train_eval_utils.py
├── transforms.py
└── validation.py
/README.md:
--------------------------------------------------------------------------------
1 | # VFM-Det
2 |
3 |
4 |
5 |
6 |
7 | **Vehicle Detection using Pre-trained Large Vision-Language Foundation Models**
8 |
9 | ------
10 |
11 |
12 | • arXiv •
13 | Baselines •
14 | DemoVideo •
15 | Tutorial •
16 |
17 |
18 |
19 |
20 | > **VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models**,
21 | Wentao Wu†, Fanghua Hong†, Xiao Wang*, Chenglong Li, Jin Tang
22 | [[Paper]()]
23 | [[Code]()]
24 | [[DemoVideo]()]
25 |
26 |
27 |
28 | ### News
29 | * [2024.08.23] The source code is released.
30 |
31 |
32 |
33 | ### Abstract
34 |
35 | Existing vehicle detectors are usually obtained by training a typical detector (e.g., the YOLO, RCNN, or DETR series) on vehicle images with a pre-trained backbone (e.g., ResNet, ViT). Some researchers also exploit pre-trained large foundation models to enhance detection performance. However, we think these detectors may achieve only sub-optimal results because the large models they use are not specifically designed for vehicles. In addition, their results rely heavily on visual features, and they seldom consider the alignment between the vehicle's semantic information and its visual representations. In this work, we propose a new vehicle detection paradigm based on a pre-trained foundation vehicle model (VehicleMAE) and a large language model (T5), termed VFM-Det. It follows the region proposal-based detection framework, and the features of each proposal can be enhanced using VehicleMAE. More importantly, we propose a new VAtt2Vec module that predicts the vehicle semantic attributes of these proposals and transforms them into feature vectors to enhance the visual features via contrastive learning. Extensive experiments on three vehicle detection benchmark datasets thoroughly demonstrate the effectiveness of our vehicle detector. Specifically, our model improves the baseline approach by $+5.1\%$ and $+6.2\%$ on the $AP_{0.5}$ and $AP_{0.75}$ metrics, respectively, on the Cityscapes dataset.
36 |
37 | ### Framework
38 |
39 |
40 |
41 | ### Environment Configuration
42 |
43 | Install the dependencies listed in the requirements.txt file to configure the environment.
44 |
45 | ### Model Training and Testing
46 |
47 | ```shell
48 | # To train VFM-Det on a single GPU, run:
49 | CUDA_VISIBLE_DEVICES=0 python train.py
50 |
51 | # To test VFM-Det, run:
52 | CUDA_VISIBLE_DEVICES=0 python validation.py
53 | ```
54 |
55 | ### Experimental Results
56 |
57 |
58 |
59 |
60 | ### Visual Results
61 |
62 |
63 |
64 |
65 |
66 |
67 |
68 |
69 |
70 | ### Datasets and Checkpoints Download
71 | **Datasets**
72 |
73 | Cityscapes dataset download address: https://www.cityscapes-dataset.com/
74 |
75 |
76 | COCO2017 dataset download address:
77 | http://images.cocodataset.org/zips/train2017.zip
78 | http://images.cocodataset.org/annotations/annotations_trainval2017.zip
79 | http://images.cocodataset.org/zips/val2017.zip
80 | http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
81 | http://images.cocodataset.org/zips/test2017.zip
82 | http://images.cocodataset.org/annotations/image_info_test2017.zip
83 |
84 | UA-DETRAC dataset download address: https://www.albany.edu/cnse/research/computer-vision-machine-learning-lab
85 |
86 | **Checkpoints**
87 |
88 | | Checkpoint | Extraction code |
89 | | --- | --- |
90 | | [download](https://pan.baidu.com/s/1ra1fQEsXCrruUtZsBj741g?pwd=2dyx) | 2dyx |
91 |
92 |
93 | ### License
94 |
95 |
96 |
97 | ### :cupid: Acknowledgement
98 | * Thanks to the [WZMIAOMIAO](https://github.com/WZMIAOMIAO/deep-learning-for-image-processing) repository, which enabled a quick implementation.
99 |
100 |
101 |
102 | ### :newspaper: Citation
103 | If you find this work helpful for your research, please cite the following paper and give us a **star**. If you have any questions, please open an issue.
104 |
105 | ```bibtex
106 | @misc{wu2024VFMDet,
107 | title={VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models},
108 | author={Wentao Wu and Fanghua Hong and Xiao Wang and Chenglong Li and Jin Tang},
109 | year={2024},
110 | eprint={2408.13031},
111 | archivePrefix={arXiv},
112 | primaryClass={cs.CV},
113 | url={https://arxiv.org/abs/2408.13031},
114 | }
115 | ```
116 |
117 |
118 |
119 |
120 |
121 |
122 |
123 |
124 |
--------------------------------------------------------------------------------
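The VAtt2Vec module described in the README abstract predicts semantic attributes for each region proposal, embeds them into vectors, and aligns those vectors with the proposal's visual features via contrastive learning. The sketch below only illustrates that idea; the module name, dimensions, and the symmetric InfoNCE-style loss are assumptions for illustration, not the released implementation in `network_files/vehiclemaeencode.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyVAtt2Vec(nn.Module):
    """Illustrative only: predict per-proposal attributes, embed them,
    and align the embedding with the proposal's visual feature."""

    def __init__(self, visual_dim=1024, num_attributes=12, embed_dim=256, temperature=0.07):
        super().__init__()
        self.attr_head = nn.Linear(visual_dim, num_attributes)  # attribute prediction per proposal
        self.attr_embed = nn.Linear(num_attributes, embed_dim)  # attributes -> semantic vector
        self.visual_proj = nn.Linear(visual_dim, embed_dim)     # visual feature -> shared space
        self.temperature = temperature

    def forward(self, proposal_feats):                          # (N, visual_dim)
        attr_logits = self.attr_head(proposal_feats)
        sem = F.normalize(self.attr_embed(attr_logits.sigmoid()), dim=-1)
        vis = F.normalize(self.visual_proj(proposal_feats), dim=-1)
        logits = vis @ sem.t() / self.temperature               # (N, N) similarity matrix
        targets = torch.arange(proposal_feats.shape[0])
        # symmetric InfoNCE-style alignment loss over matching proposal/attribute pairs
        loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
        return attr_logits, loss


if __name__ == "__main__":
    feats = torch.randn(8, 1024)          # 8 dummy proposal features
    _, align_loss = ToyVAtt2Vec()(feats)
    print(align_loss.item())
```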
/__pycache__/my_dataset_cityscraps.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/__pycache__/my_dataset_cityscraps.cpython-38.pyc
--------------------------------------------------------------------------------
/__pycache__/transforms.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/__pycache__/transforms.cpython-38.pyc
--------------------------------------------------------------------------------
/backbone/__init__.py:
--------------------------------------------------------------------------------
1 | from .resnet50_fpn_model import resnet50_fpn_backbone
--------------------------------------------------------------------------------
/backbone/__pycache__/__init__.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/backbone/__pycache__/__init__.cpython-38.pyc
--------------------------------------------------------------------------------
/backbone/__pycache__/feature_pyramid_network.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/backbone/__pycache__/feature_pyramid_network.cpython-38.pyc
--------------------------------------------------------------------------------
/backbone/__pycache__/resnet50_fpn_model.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/backbone/__pycache__/resnet50_fpn_model.cpython-38.pyc
--------------------------------------------------------------------------------
/backbone/feature_pyramid_network.py:
--------------------------------------------------------------------------------
1 | from collections import OrderedDict
2 |
3 | import torch.nn as nn
4 | import torch
5 | from torch import Tensor
6 | import torch.nn.functional as F
7 |
8 | from torch.jit.annotations import Tuple, List, Dict
9 |
10 |
11 | class IntermediateLayerGetter(nn.ModuleDict):
12 | """
13 | Module wrapper that returns intermediate layers from a model
14 | It has a strong assumption that the modules have been registered
15 | into the model in the same order as they are used.
16 | This means that one should **not** reuse the same nn.Module
17 | twice in the forward if you want this to work.
18 | Additionally, it is only able to query submodules that are directly
19 | assigned to the model. So if `model` is passed, `model.feature1` can
20 | be returned, but not `model.feature1.layer2`.
21 | Arguments:
22 | model (nn.Module): model on which we will extract the features
23 | return_layers (Dict[name, new_name]): a dict containing the names
24 | of the modules for which the activations will be returned as
25 | the key of the dict, and the value of the dict is the name
26 | of the returned activation (which the user can specify).
27 | """
28 | __annotations__ = {
29 | "return_layers": Dict[str, str],
30 | }
31 |
32 | def __init__(self, model, return_layers):
33 | if not set(return_layers).issubset([name for name, _ in model.named_children()]):
34 | raise ValueError("return_layers are not present in model")
35 |
36 | orig_return_layers = return_layers
37 | return_layers = {str(k): str(v) for k, v in return_layers.items()}
38 | layers = OrderedDict()
39 |
40 | # iterate over the model's children in order and store them in an OrderedDict,
41 | # keeping only layer4 and everything before it; later, unused modules are discarded
42 | for name, module in model.named_children():
43 | layers[name] = module
44 | if name in return_layers:
45 | del return_layers[name]
46 | if not return_layers:
47 | break
48 |
49 | super().__init__(layers)
50 | self.return_layers = orig_return_layers
51 |
52 | def forward(self, x):
53 | out = OrderedDict()
54 | # forward through every child module in order and
55 | # collect the outputs of layer1, layer2, layer3 and layer4
56 | for name, module in self.items():
57 | x = module(x)
58 | if name in self.return_layers:
59 | out_name = self.return_layers[name]
60 | out[out_name] = x
61 | return out
62 |
63 |
64 | class BackboneWithFPN(nn.Module):
65 | """
66 | Adds a FPN on top of a model.
67 | Internally, it uses torchvision.models._utils.IntermediateLayerGetter to
68 | extract a submodel that returns the feature maps specified in return_layers.
69 | The same limitations of IntermediateLayerGetter apply here.
70 | Arguments:
71 | backbone (nn.Module)
72 | return_layers (Dict[name, new_name]): a dict containing the names
73 | of the modules for which the activations will be returned as
74 | the key of the dict, and the value of the dict is the name
75 | of the returned activation (which the user can specify).
76 | in_channels_list (List[int]): number of channels for each feature map
77 | that is returned, in the order they are present in the OrderedDict
78 | out_channels (int): number of channels in the FPN.
79 | extra_blocks: ExtraFPNBlock
80 | Attributes:
81 | out_channels (int): the number of channels in the FPN
82 | """
83 |
84 | def __init__(self,
85 | backbone: nn.Module,
86 | return_layers=None,
87 | in_channels_list=None,
88 | out_channels=256,
89 | extra_blocks=None,
90 | re_getter=True):
91 | super().__init__()
92 |
93 | if extra_blocks is None:
94 | extra_blocks = LastLevelMaxPool()
95 |
96 | if re_getter:
97 | assert return_layers is not None
98 | self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)
99 | else:
100 | self.body = backbone
101 |
102 | self.fpn = FeaturePyramidNetwork(
103 | in_channels_list=in_channels_list,
104 | out_channels=out_channels,
105 | extra_blocks=extra_blocks,
106 | )
107 |
108 | self.out_channels = out_channels
109 |
110 | def forward(self, x):
111 | x = self.body(x)
112 | x = self.fpn(x)
113 | return x
114 |
115 |
116 | class FeaturePyramidNetwork(nn.Module):
117 | """
118 | Module that adds a FPN on top of a set of feature maps. This is based on
119 | `"Feature Pyramid Network for Object Detection" `_.
120 | The feature maps are currently supposed to be in increasing depth
121 | order.
122 | The input to the model is expected to be an OrderedDict[Tensor], containing
123 | the feature maps on top of which the FPN will be added.
124 | Arguments:
125 | in_channels_list (list[int]): number of channels for each feature map that
126 | is passed to the module
127 | out_channels (int): number of channels of the FPN representation
128 | extra_blocks (ExtraFPNBlock or None): if provided, extra operations will
129 | be performed. It is expected to take the fpn features, the original
130 | features and the names of the original features as input, and returns
131 | a new list of feature maps and their corresponding names
132 | """
133 |
134 | def __init__(self, in_channels_list, out_channels, extra_blocks=None):
135 | super().__init__()
136 | # 1x1 convs used to adjust the channel number of the resnet feature maps (layer1,2,3,4)
137 | self.inner_blocks = nn.ModuleList()
138 | # 3x3 convs applied to the adjusted feature maps to obtain the corresponding prediction feature maps
139 | self.layer_blocks = nn.ModuleList()
140 | for in_channels in in_channels_list:
141 | if in_channels == 0:
142 | continue
143 | inner_block_module = nn.Conv2d(in_channels, out_channels, 1)
144 | layer_block_module = nn.Conv2d(out_channels, out_channels, 3, padding=1)
145 | self.inner_blocks.append(inner_block_module)
146 | self.layer_blocks.append(layer_block_module)
147 |
148 | # initialize parameters now to avoid modifying the initialization of top_blocks
149 | for m in self.children():
150 | if isinstance(m, nn.Conv2d):
151 | nn.init.kaiming_uniform_(m.weight, a=1)
152 | nn.init.constant_(m.bias, 0)
153 |
154 | self.extra_blocks = extra_blocks
155 |
156 | def get_result_from_inner_blocks(self, x: Tensor, idx: int) -> Tensor:
157 | """
158 | This is equivalent to self.inner_blocks[idx](x),
159 | but torchscript doesn't support this yet
160 | """
161 | num_blocks = len(self.inner_blocks)
162 | if idx < 0:
163 | idx += num_blocks
164 | i = 0
165 | out = x
166 | for module in self.inner_blocks:
167 | if i == idx:
168 | out = module(x)
169 | i += 1
170 | return out
171 |
172 | def get_result_from_layer_blocks(self, x: Tensor, idx: int) -> Tensor:
173 | """
174 | This is equivalent to self.layer_blocks[idx](x),
175 | but torchscript doesn't support this yet
176 | """
177 | num_blocks = len(self.layer_blocks)
178 | if idx < 0:
179 | idx += num_blocks
180 | i = 0
181 | out = x
182 | for module in self.layer_blocks:
183 | if i == idx:
184 | out = module(x)
185 | i += 1
186 | return out
187 |
188 | def forward(self, x: Dict[str, Tensor]) -> Dict[str, Tensor]:
189 | """
190 | Computes the FPN for a set of feature maps.
191 | Arguments:
192 | x (OrderedDict[Tensor]): feature maps for each feature level.
193 | Returns:
194 | results (OrderedDict[Tensor]): feature maps after FPN layers.
195 | They are ordered from highest resolution first.
196 | """
197 | # unpack OrderedDict into two lists for easier handling
198 | names = list(x.keys())
199 | x = list(x.values())
200 |
201 | # adjust the channels of resnet layer4 to the specified out_channels
202 | # last_inner = self.inner_blocks[-1](x[-1])
203 | last_inner = self.get_result_from_inner_blocks(x[-1], -1)
204 | # results holds every prediction feature level
205 | results = []
206 | # pass the channel-adjusted layer4 feature map through a 3x3 conv to get its prediction feature map
207 | # results.append(self.layer_blocks[-1](last_inner))
208 | results.append(self.get_result_from_layer_blocks(last_inner, -1))
209 |
210 | for idx in range(len(x) - 2, -1, -1):
211 | inner_lateral = self.get_result_from_inner_blocks(x[idx], idx)
212 | feat_shape = inner_lateral.shape[-2:]
213 | inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest")
214 | last_inner = inner_lateral + inner_top_down
215 | results.insert(0, self.get_result_from_layer_blocks(last_inner, idx))
216 |
217 | # generate a 5th prediction feature level on top of the one from layer4
218 | if self.extra_blocks is not None:
219 | results, names = self.extra_blocks(results, x, names)
220 |
221 | # make it back an OrderedDict
222 | out = OrderedDict([(k, v) for k, v in zip(names, results)])
223 |
224 | return out
225 |
226 |
227 | class LastLevelMaxPool(torch.nn.Module):
228 | """
229 | Applies a max_pool2d on top of the last feature map
230 | """
231 |
232 | def forward(self, x: List[Tensor], y: List[Tensor], names: List[str]) -> Tuple[List[Tensor], List[str]]:
233 | names.append("pool")
234 | x.append(F.max_pool2d(x[-1], 1, 2, 0))
235 | return x, names
236 |
--------------------------------------------------------------------------------
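As a quick smoke test of the `FeaturePyramidNetwork` and `LastLevelMaxPool` classes above, the sketch below feeds randomly generated feature maps with the channel/stride layout a ResNet-50 would produce for a 224x224 input (the shapes are illustrative assumptions, and the repo root is assumed to be on `PYTHONPATH`):

```python
from collections import OrderedDict

import torch

from backbone.feature_pyramid_network import FeaturePyramidNetwork, LastLevelMaxPool

# dummy ResNet-50 style feature maps for a 224x224 input: strides 4 / 8 / 16 / 32
feats = OrderedDict([
    ("0", torch.randn(1, 256, 56, 56)),
    ("1", torch.randn(1, 512, 28, 28)),
    ("2", torch.randn(1, 1024, 14, 14)),
    ("3", torch.randn(1, 2048, 7, 7)),
])

fpn = FeaturePyramidNetwork(in_channels_list=[256, 512, 1024, 2048],
                            out_channels=256,
                            extra_blocks=LastLevelMaxPool())

out = fpn(feats)
for name, t in out.items():
    # every level is mapped to 256 channels; "pool" is the extra stride-64 level
    print(name, tuple(t.shape))
```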
/backbone/resnet50_fpn_model.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | import torch
4 | import torch.nn as nn
5 | from torchvision.ops.misc import FrozenBatchNorm2d
6 |
7 | from .feature_pyramid_network import BackboneWithFPN, LastLevelMaxPool
8 |
9 |
10 | class Bottleneck(nn.Module):
11 | expansion = 4
12 |
13 | def __init__(self, in_channel, out_channel, stride=1, downsample=None, norm_layer=None):
14 | super().__init__()
15 | if norm_layer is None:
16 | norm_layer = nn.BatchNorm2d
17 |
18 | self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
19 | kernel_size=1, stride=1, bias=False) # squeeze channels
20 | self.bn1 = norm_layer(out_channel)
21 | # -----------------------------------------
22 | self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
23 | kernel_size=3, stride=stride, bias=False, padding=1)
24 | self.bn2 = norm_layer(out_channel)
25 | # -----------------------------------------
26 | self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion,
27 | kernel_size=1, stride=1, bias=False) # unsqueeze channels
28 | self.bn3 = norm_layer(out_channel * self.expansion)
29 | self.relu = nn.ReLU(inplace=True)
30 | self.downsample = downsample
31 |
32 | def forward(self, x):
33 | identity = x
34 | if self.downsample is not None:
35 | identity = self.downsample(x)
36 |
37 | out = self.conv1(x)
38 | out = self.bn1(out)
39 | out = self.relu(out)
40 |
41 | out = self.conv2(out)
42 | out = self.bn2(out)
43 | out = self.relu(out)
44 |
45 | out = self.conv3(out)
46 | out = self.bn3(out)
47 |
48 | out += identity
49 | out = self.relu(out)
50 |
51 | return out
52 |
53 |
54 | class ResNet(nn.Module):
55 |
56 | def __init__(self, block, blocks_num, num_classes=1000, include_top=True, norm_layer=None):
57 | super().__init__()
58 | if norm_layer is None:
59 | norm_layer = nn.BatchNorm2d
60 | self._norm_layer = norm_layer
61 |
62 | self.include_top = include_top
63 | self.in_channel = 64
64 |
65 | self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
66 | padding=3, bias=False)
67 | self.bn1 = norm_layer(self.in_channel)
68 | self.relu = nn.ReLU(inplace=True)
69 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
70 | self.layer1 = self._make_layer(block, 64, blocks_num[0])
71 | self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
72 | self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
73 | self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
74 | if self.include_top:
75 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) # output size = (1, 1)
76 | self.fc = nn.Linear(512 * block.expansion, num_classes)
77 |
78 | for m in self.modules():
79 | if isinstance(m, nn.Conv2d):
80 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
81 |
82 | def _make_layer(self, block, channel, block_num, stride=1):
83 | norm_layer = self._norm_layer
84 | downsample = None
85 | if stride != 1 or self.in_channel != channel * block.expansion:
86 | downsample = nn.Sequential(
87 | nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
88 | norm_layer(channel * block.expansion))
89 |
90 | layers = []
91 | layers.append(block(self.in_channel, channel, downsample=downsample,
92 | stride=stride, norm_layer=norm_layer))
93 | self.in_channel = channel * block.expansion
94 |
95 | for _ in range(1, block_num):
96 | layers.append(block(self.in_channel, channel, norm_layer=norm_layer))
97 |
98 | return nn.Sequential(*layers)
99 |
100 | def forward(self, x):
101 | x = self.conv1(x)
102 | x = self.bn1(x)
103 | x = self.relu(x)
104 | x = self.maxpool(x)
105 |
106 | x = self.layer1(x)
107 | x = self.layer2(x)
108 | x = self.layer3(x)
109 | x = self.layer4(x)
110 |
111 | if self.include_top:
112 | x = self.avgpool(x)
113 | x = torch.flatten(x, 1)
114 | x = self.fc(x)
115 |
116 | return x
117 |
118 |
119 | def overwrite_eps(model, eps):
120 | """
121 | This method overwrites the default eps values of all the
122 | FrozenBatchNorm2d layers of the model with the provided value.
123 | This is necessary to address the BC-breaking change introduced
124 | by the bug-fix at pytorch/vision#2933. The overwrite is applied
125 | only when the pretrained weights are loaded to maintain compatibility
126 | with previous versions.
127 |
128 | Args:
129 | model (nn.Module): The model on which we perform the overwrite.
130 | eps (float): The new value of eps.
131 | """
132 | for module in model.modules():
133 | if isinstance(module, FrozenBatchNorm2d):
134 | module.eps = eps
135 |
136 |
137 | def resnet50_fpn_backbone(pretrain_path="/home/lcl_d/wuwentao/detection/deep-learning-for-image-processing-master/deep-learning-for-image-processing-master/pre_model/resnet50.pth",
138 | norm_layer=nn.BatchNorm2d,
139 | trainable_layers=3,
140 | returned_layers=None,
141 | extra_blocks=None):
142 | """
143 | Build the resnet50_fpn backbone.
144 | Args:
145 | pretrain_path: path to the pretrained resnet50 weights; pass an empty string to skip loading them
146 | norm_layer: defaults to nn.BatchNorm2d; if GPU memory is small and the batch_size cannot be set very large,
147 | it is recommended to set norm_layer to FrozenBatchNorm2d instead
148 | (https://github.com/facebookresearch/maskrcnn-benchmark/issues/267)
149 | trainable_layers: which layers should remain trainable (not frozen)
150 | returned_layers: which layers' outputs should be returned
151 | extra_blocks: extra blocks appended on top of the returned feature levels
152 |
153 | Returns:
154 |
155 | """
156 | resnet_backbone = ResNet(Bottleneck, [3, 4, 6, 3],
157 | include_top=False,
158 | norm_layer=norm_layer)
159 |
160 | if isinstance(norm_layer, FrozenBatchNorm2d):
161 | overwrite_eps(resnet_backbone, 0.0)
162 |
163 | if pretrain_path != "":
164 | #assert os.path.exists(pretrain_path), "{} is not exist.".format(pretrain_path)
165 | # load the pretrained weights
166 | print(resnet_backbone.load_state_dict(torch.load(pretrain_path), strict=False))
167 |
168 | # select layers that wont be frozen
169 | assert 0 <= trainable_layers <= 5
170 | layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
171 |
172 | # when training all layers, do not forget the bn1 that follows conv1
173 | if trainable_layers == 5:
174 | layers_to_train.append("bn1")
175 |
176 | # freeze layers
177 | for name, parameter in resnet_backbone.named_parameters():
178 | # freeze every layer that is not in the layers_to_train list
179 | if all([not name.startswith(layer) for layer in layers_to_train]):
180 | parameter.requires_grad_(False)
181 |
182 | if extra_blocks is None:
183 | extra_blocks = LastLevelMaxPool()
184 |
185 | if returned_layers is None:
186 | returned_layers = [1, 2, 3, 4]
187 | # the indices of the returned feature levels must lie between 1 and 4
188 | assert min(returned_layers) > 0 and max(returned_layers) < 5
189 |
190 | # return_layers = {'layer1': '0', 'layer2': '1', 'layer3': '2', 'layer4': '3'}
191 | return_layers = {f'layer{k}': str(v) for v, k in enumerate(returned_layers)}
192 |
193 | # in_channel is the output channel number of layer4 (= 2048)
194 | in_channels_stage2 = resnet_backbone.in_channel // 8 # 256
195 | # channel number of each feature level that resnet50 provides to the fpn
196 | in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers]
197 | # channel number of each feature level produced by the fpn
198 | out_channels = 256
199 | return BackboneWithFPN(resnet_backbone, return_layers, in_channels_list, out_channels, extra_blocks=extra_blocks)
200 |
--------------------------------------------------------------------------------
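A quick sanity check for `resnet50_fpn_backbone` above. Note that its default `pretrain_path` is an absolute local path; passing an empty string skips loading pretrained weights. A minimal sketch, assuming the repo root is the working directory:

```python
import torch

from backbone import resnet50_fpn_backbone

# build the ResNet-50 + FPN backbone without pretrained weights
backbone = resnet50_fpn_backbone(pretrain_path="", trainable_layers=3)

with torch.no_grad():
    features = backbone(torch.randn(1, 3, 224, 224))

for name, feat in features.items():
    # levels "0"-"3" at strides 4 / 8 / 16 / 32, plus the extra "pool" level
    print(name, tuple(feat.shape))

print("FPN out_channels:", backbone.out_channels)  # 256
```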
/cityscrapes4_indices.json:
--------------------------------------------------------------------------------
1 | {
2 | "1": "car",
3 | "2": "truck",
4 | "3": "bus",
5 | "4": "caravan"
6 | }
--------------------------------------------------------------------------------
/draw_box_utils.py:
--------------------------------------------------------------------------------
1 | from PIL.Image import Image, fromarray
2 | import PIL.ImageDraw as ImageDraw
3 | import PIL.ImageFont as ImageFont
4 | from PIL import ImageColor
5 | import numpy as np
6 |
7 | STANDARD_COLORS = [
8 | 'AliceBlue', 'Chartreuse', 'Aqua', 'Aquamarine', 'Azure', 'Beige', 'Bisque',
9 | 'BlanchedAlmond', 'BlueViolet', 'BurlyWood', 'CadetBlue', 'AntiqueWhite',
10 | 'Chocolate', 'Coral', 'CornflowerBlue', 'Cornsilk', 'Crimson', 'Cyan',
11 | 'DarkCyan', 'DarkGoldenRod', 'DarkGrey', 'DarkKhaki', 'DarkOrange',
12 | 'DarkOrchid', 'DarkSalmon', 'DarkSeaGreen', 'DarkTurquoise', 'DarkViolet',
13 | 'DeepPink', 'DeepSkyBlue', 'DodgerBlue', 'FireBrick', 'FloralWhite',
14 | 'ForestGreen', 'Fuchsia', 'Gainsboro', 'GhostWhite', 'Gold', 'GoldenRod',
15 | 'Salmon', 'Tan', 'HoneyDew', 'HotPink', 'IndianRed', 'Ivory', 'Khaki',
16 | 'Lavender', 'LavenderBlush', 'LawnGreen', 'LemonChiffon', 'LightBlue',
17 | 'LightCoral', 'LightCyan', 'LightGoldenRodYellow', 'LightGray', 'LightGrey',
18 | 'LightGreen', 'LightPink', 'LightSalmon', 'LightSeaGreen', 'LightSkyBlue',
19 | 'LightSlateGray', 'LightSlateGrey', 'LightSteelBlue', 'LightYellow', 'Lime',
20 | 'LimeGreen', 'Linen', 'Magenta', 'MediumAquaMarine', 'MediumOrchid',
21 | 'MediumPurple', 'MediumSeaGreen', 'MediumSlateBlue', 'MediumSpringGreen',
22 | 'MediumTurquoise', 'MediumVioletRed', 'MintCream', 'MistyRose', 'Moccasin',
23 | 'NavajoWhite', 'OldLace', 'Olive', 'OliveDrab', 'Orange', 'OrangeRed',
24 | 'Orchid', 'PaleGoldenRod', 'PaleGreen', 'PaleTurquoise', 'PaleVioletRed',
25 | 'PapayaWhip', 'PeachPuff', 'Peru', 'Pink', 'Plum', 'PowderBlue', 'Purple',
26 | 'Red', 'RosyBrown', 'RoyalBlue', 'SaddleBrown', 'Green', 'SandyBrown',
27 | 'SeaGreen', 'SeaShell', 'Sienna', 'Silver', 'SkyBlue', 'SlateBlue',
28 | 'SlateGray', 'SlateGrey', 'Snow', 'SpringGreen', 'SteelBlue', 'GreenYellow',
29 | 'Teal', 'Thistle', 'Tomato', 'Turquoise', 'Violet', 'Wheat', 'White',
30 | 'WhiteSmoke', 'Yellow', 'YellowGreen'
31 | ]
32 |
33 |
34 | def draw_text(draw,
35 | box: list,
36 | cls: int,
37 | score: float,
38 | category_index: dict,
39 | color: str,
40 | font: str = 'arial.ttf',
41 | font_size: int = 24):
42 | """
43 | Draw the category label and confidence score of a bounding box onto the image.
44 | """
45 | try:
46 | font = ImageFont.truetype(font, font_size)
47 | except IOError:
48 | font = ImageFont.load_default()
49 |
50 | left, top, right, bottom = box
51 | # If the total height of the display strings added to the top of the bounding
52 | # box exceeds the top of the image, stack the strings below the bounding box
53 | # instead of above.
54 | display_str = f"{category_index[str(cls)]}: {int(100 * score)}%"
55 | display_str_heights = [font.getsize(ds)[1] for ds in display_str]
56 | # Each display_str has a top and bottom margin of 0.05x.
57 | display_str_height = (1 + 2 * 0.05) * max(display_str_heights)
58 |
59 | if top > display_str_height:
60 | text_top = top - display_str_height
61 | text_bottom = top
62 | else:
63 | text_top = bottom
64 | text_bottom = bottom + display_str_height
65 |
66 | for ds in display_str:
67 | text_width, text_height = font.getsize(ds)
68 | margin = np.ceil(0.05 * text_width)
69 | draw.rectangle([(left, text_top),
70 | (left + text_width + 2 * margin, text_bottom)], fill=color)
71 | draw.text((left + margin, text_top),
72 | ds,
73 | fill='black',
74 | font=font)
75 | left += text_width
76 |
77 |
78 | def draw_masks(image, masks, colors, thresh: float = 0.7, alpha: float = 0.5):
79 | np_image = np.array(image)
80 | masks = np.where(masks > thresh, True, False)
81 |
82 | # colors = np.array(colors)
83 | img_to_draw = np.copy(np_image)
84 | # TODO: There might be a way to vectorize this
85 | for mask, color in zip(masks, colors):
86 | img_to_draw[mask] = color
87 |
88 | out = np_image * (1 - alpha) + img_to_draw * alpha
89 | return fromarray(out.astype(np.uint8))
90 |
91 |
92 | def draw_objs(image: Image,
93 | boxes: np.ndarray = None,
94 | classes: np.ndarray = None,
95 | scores: np.ndarray = None,
96 | masks: np.ndarray = None,
97 | category_index: dict = None,
98 | box_thresh: float = 0.1,
99 | mask_thresh: float = 0.5,
100 | line_thickness: int = 8,
101 | font: str = 'arial.ttf',
102 | font_size: int = 24,
103 | draw_boxes_on_image: bool = True,
104 | draw_masks_on_image: bool = True):
105 | """
106 | Draw the bounding boxes, class labels and masks onto the image.
107 | Args:
108 | image: image to draw on
109 | boxes: bounding box coordinates
110 | classes: object class indices
111 | scores: object confidence scores
112 | masks: object masks
113 | category_index: dict mapping class index to class name
114 | box_thresh: score threshold used to filter out low-confidence objects
115 | mask_thresh:
116 | line_thickness: bounding box line width
117 | font: font type
118 | font_size: font size
119 | draw_boxes_on_image:
120 | draw_masks_on_image:
121 |
122 | Returns:
123 |
124 | """
125 |
126 | # filter out low-confidence objects
127 | idxs = np.greater(scores, box_thresh)
128 | boxes = boxes[idxs]
129 | classes = classes[idxs]
130 | scores = scores[idxs]
131 | if masks is not None:
132 | masks = masks[idxs]
133 | if len(boxes) == 0:
134 | return image
135 |
136 | colors = [ImageColor.getrgb(STANDARD_COLORS[cls % len(STANDARD_COLORS)]) for cls in classes]
137 |
138 | if draw_boxes_on_image:
139 | # Draw all boxes onto image.
140 | draw = ImageDraw.Draw(image)
141 | for box, cls, score, color in zip(boxes, classes, scores, colors):
142 | left, top, right, bottom = box
143 | # draw the object bounding box
144 | draw.line([(left, top), (left, bottom), (right, bottom),
145 | (right, top), (left, top)], width=line_thickness, fill=color)
146 | # draw the class label and confidence score
147 | draw_text(draw, box.tolist(), int(cls), float(score), category_index, color, font, font_size)
148 |
149 | if draw_masks_on_image and (masks is not None):
150 | # Draw all mask onto image.
151 | image = draw_masks(image, masks, colors, mask_thresh)
152 |
153 | return image
154 |
--------------------------------------------------------------------------------
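A minimal usage sketch for `draw_objs` with made-up detections (the boxes, scores and category map below are dummy values; `draw_text` relies on `ImageFont.getsize`, so this assumes a Pillow version that still provides it):

```python
import numpy as np
from PIL import Image

from draw_box_utils import draw_objs

img = Image.new("RGB", (640, 480), (30, 30, 30))
boxes = np.array([[100.0, 80.0, 300.0, 260.0]])  # [xmin, ymin, xmax, ymax]
classes = np.array([1])
scores = np.array([0.93])
category_index = {"1": "car"}                    # matches cityscrapes4_indices.json

result = draw_objs(img, boxes, classes, scores,
                   category_index=category_index,
                   box_thresh=0.5, line_thickness=3)
result.save("det_demo.jpg")
```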
/faster_rcnn/backbone/__init__.py:
--------------------------------------------------------------------------------
1 | from .resnet50_fpn_model import resnet50_fpn_backbone
2 | from .mobilenetv2_model import MobileNetV2
3 | from .vgg_model import vgg
4 | from .feature_pyramid_network import LastLevelMaxPool, BackboneWithFPN
5 |
--------------------------------------------------------------------------------
/faster_rcnn/backbone/feature_pyramid_network.py:
--------------------------------------------------------------------------------
1 | from collections import OrderedDict
2 |
3 | import torch.nn as nn
4 | import torch
5 | from torch import Tensor
6 | import torch.nn.functional as F
7 |
8 | from torch.jit.annotations import Tuple, List, Dict
9 |
10 |
11 | class IntermediateLayerGetter(nn.ModuleDict):
12 | """
13 | Module wrapper that returns intermediate layers from a model
14 | It has a strong assumption that the modules have been registered
15 | into the model in the same order as they are used.
16 | This means that one should **not** reuse the same nn.Module
17 | twice in the forward if you want this to work.
18 | Additionally, it is only able to query submodules that are directly
19 | assigned to the model. So if `model` is passed, `model.feature1` can
20 | be returned, but not `model.feature1.layer2`.
21 | Arguments:
22 | model (nn.Module): model on which we will extract the features
23 | return_layers (Dict[name, new_name]): a dict containing the names
24 | of the modules for which the activations will be returned as
25 | the key of the dict, and the value of the dict is the name
26 | of the returned activation (which the user can specify).
27 | """
28 | __annotations__ = {
29 | "return_layers": Dict[str, str],
30 | }
31 |
32 | def __init__(self, model, return_layers):
33 | if not set(return_layers).issubset([name for name, _ in model.named_children()]):
34 | raise ValueError("return_layers are not present in model")
35 |
36 | orig_return_layers = return_layers
37 | return_layers = {str(k): str(v) for k, v in return_layers.items()}
38 | layers = OrderedDict()
39 |
40 | # iterate over the model's children in order and store them in an OrderedDict,
41 | # keeping only layer4 and everything before it; later, unused modules are discarded
42 | for name, module in model.named_children():
43 | layers[name] = module
44 | if name in return_layers:
45 | del return_layers[name]
46 | if not return_layers:
47 | break
48 |
49 | super().__init__(layers)
50 | self.return_layers = orig_return_layers
51 |
52 | def forward(self, x):
53 | out = OrderedDict()
54 | # forward through every child module in order and
55 | # collect the outputs of layer1, layer2, layer3 and layer4
56 | for name, module in self.items():
57 | x = module(x)
58 | if name in self.return_layers:
59 | out_name = self.return_layers[name]
60 | out[out_name] = x
61 | return out
62 |
63 |
64 | class FeaturePyramidNetwork(nn.Module):
65 | """
66 | Module that adds a FPN on top of a set of feature maps. This is based on
67 | `"Feature Pyramid Network for Object Detection" `_.
68 | The feature maps are currently supposed to be in increasing depth
69 | order.
70 | The input to the model is expected to be an OrderedDict[Tensor], containing
71 | the feature maps on top of which the FPN will be added.
72 | Arguments:
73 | in_channels_list (list[int]): number of channels for each feature map that
74 | is passed to the module
75 | out_channels (int): number of channels of the FPN representation
76 | extra_blocks (ExtraFPNBlock or None): if provided, extra operations will
77 | be performed. It is expected to take the fpn features, the original
78 | features and the names of the original features as input, and returns
79 | a new list of feature maps and their corresponding names
80 | """
81 |
82 | def __init__(self, in_channels_list, out_channels, extra_blocks=None):
83 | super().__init__()
84 | # 1x1 convs used to adjust the channel number of the resnet feature maps (layer1,2,3,4)
85 | self.inner_blocks = nn.ModuleList()
86 | # 3x3 convs applied to the adjusted feature maps to obtain the corresponding prediction feature maps
87 | self.layer_blocks = nn.ModuleList()
88 | for in_channels in in_channels_list:
89 | if in_channels == 0:
90 | continue
91 | inner_block_module = nn.Conv2d(in_channels, out_channels, 1)
92 | layer_block_module = nn.Conv2d(out_channels, out_channels, 3, padding=1)
93 | self.inner_blocks.append(inner_block_module)
94 | self.layer_blocks.append(layer_block_module)
95 |
96 | # initialize parameters now to avoid modifying the initialization of top_blocks
97 | for m in self.children():
98 | if isinstance(m, nn.Conv2d):
99 | nn.init.kaiming_uniform_(m.weight, a=1)
100 | nn.init.constant_(m.bias, 0)
101 |
102 | self.extra_blocks = extra_blocks
103 |
104 | def get_result_from_inner_blocks(self, x: Tensor, idx: int) -> Tensor:
105 | """
106 | This is equivalent to self.inner_blocks[idx](x),
107 | but torchscript doesn't support this yet
108 | """
109 | num_blocks = len(self.inner_blocks)
110 | if idx < 0:
111 | idx += num_blocks
112 | i = 0
113 | out = x
114 | for module in self.inner_blocks:
115 | if i == idx:
116 | out = module(x)
117 | i += 1
118 | return out
119 |
120 | def get_result_from_layer_blocks(self, x: Tensor, idx: int) -> Tensor:
121 | """
122 | This is equivalent to self.layer_blocks[idx](x),
123 | but torchscript doesn't support this yet
124 | """
125 | num_blocks = len(self.layer_blocks)
126 | if idx < 0:
127 | idx += num_blocks
128 | i = 0
129 | out = x
130 | for module in self.layer_blocks:
131 | if i == idx:
132 | out = module(x)
133 | i += 1
134 | return out
135 |
136 | def forward(self, x: Dict[str, Tensor]) -> Dict[str, Tensor]:
137 | """
138 | Computes the FPN for a set of feature maps.
139 | Arguments:
140 | x (OrderedDict[Tensor]): feature maps for each feature level.
141 | Returns:
142 | results (OrderedDict[Tensor]): feature maps after FPN layers.
143 | They are ordered from highest resolution first.
144 | """
145 | # unpack OrderedDict into two lists for easier handling
146 | names = list(x.keys())
147 | x = list(x.values())
148 |
149 | # adjust the channels of resnet layer4 to the specified out_channels
150 | # last_inner = self.inner_blocks[-1](x[-1])
151 | last_inner = self.get_result_from_inner_blocks(x[-1], -1)
152 | # results holds every prediction feature level
153 | results = []
154 | # pass the channel-adjusted layer4 feature map through a 3x3 conv to get its prediction feature map
155 | # results.append(self.layer_blocks[-1](last_inner))
156 | results.append(self.get_result_from_layer_blocks(last_inner, -1))
157 |
158 | for idx in range(len(x) - 2, -1, -1):
159 | inner_lateral = self.get_result_from_inner_blocks(x[idx], idx)
160 | feat_shape = inner_lateral.shape[-2:]
161 | inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest")
162 | last_inner = inner_lateral + inner_top_down
163 | results.insert(0, self.get_result_from_layer_blocks(last_inner, idx))
164 |
165 | # generate a 5th prediction feature level on top of the one from layer4
166 | if self.extra_blocks is not None:
167 | results, names = self.extra_blocks(results, x, names)
168 |
169 | # make it back an OrderedDict
170 | out = OrderedDict([(k, v) for k, v in zip(names, results)])
171 |
172 | return out
173 |
174 |
175 | class LastLevelMaxPool(torch.nn.Module):
176 | """
177 | Applies a max_pool2d on top of the last feature map
178 | """
179 |
180 | def forward(self, x: List[Tensor], y: List[Tensor], names: List[str]) -> Tuple[List[Tensor], List[str]]:
181 | names.append("pool")
182 | x.append(F.max_pool2d(x[-1], 1, 2, 0)) # input, kernel_size, stride, padding
183 | return x, names
184 |
185 |
186 | class BackboneWithFPN(nn.Module):
187 | """
188 | Adds a FPN on top of a model.
189 | Internally, it uses torchvision.models._utils.IntermediateLayerGetter to
190 | extract a submodel that returns the feature maps specified in return_layers.
191 | The same limitations of IntermediateLayerGetter apply here.
192 | Arguments:
193 | backbone (nn.Module)
194 | return_layers (Dict[name, new_name]): a dict containing the names
195 | of the modules for which the activations will be returned as
196 | the key of the dict, and the value of the dict is the name
197 | of the returned activation (which the user can specify).
198 | in_channels_list (List[int]): number of channels for each feature map
199 | that is returned, in the order they are present in the OrderedDict
200 | out_channels (int): number of channels in the FPN.
201 | extra_blocks: ExtraFPNBlock
202 | Attributes:
203 | out_channels (int): the number of channels in the FPN
204 | """
205 |
206 | def __init__(self,
207 | backbone: nn.Module,
208 | return_layers=None,
209 | in_channels_list=None,
210 | out_channels=256,
211 | extra_blocks=None,
212 | re_getter=True):
213 | super().__init__()
214 |
215 | if extra_blocks is None:
216 | extra_blocks = LastLevelMaxPool()
217 |
218 | if re_getter is True:
219 | assert return_layers is not None
220 | self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)
221 | else:
222 | self.body = backbone
223 |
224 | self.fpn = FeaturePyramidNetwork(
225 | in_channels_list=in_channels_list,
226 | out_channels=out_channels,
227 | extra_blocks=extra_blocks,
228 | )
229 |
230 | self.out_channels = out_channels
231 |
232 | def forward(self, x):
233 | x = self.body(x)
234 | x = self.fpn(x)
235 | return x
236 |
--------------------------------------------------------------------------------
/faster_rcnn/backbone/mobilenetv2_model.py:
--------------------------------------------------------------------------------
1 | from torch import nn
2 | import torch
3 | from torchvision.ops import misc
4 |
5 |
6 | def _make_divisible(ch, divisor=8, min_ch=None):
7 | """
8 | This function is taken from the original tf repo.
9 | It ensures that all layers have a channel number that is divisible by 8
10 | It can be seen here:
11 | https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
12 | """
13 | if min_ch is None:
14 | min_ch = divisor
15 | new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
16 | # Make sure that round down does not go down by more than 10%.
17 | if new_ch < 0.9 * ch:
18 | new_ch += divisor
19 | return new_ch
20 |
21 |
22 | class ConvBNReLU(nn.Sequential):
23 | def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1, norm_layer=None):
24 | padding = (kernel_size - 1) // 2
25 | if norm_layer is None:
26 | norm_layer = nn.BatchNorm2d
27 | super(ConvBNReLU, self).__init__(
28 | nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False),
29 | norm_layer(out_channel),
30 | nn.ReLU6(inplace=True)
31 | )
32 |
33 |
34 | class InvertedResidual(nn.Module):
35 | def __init__(self, in_channel, out_channel, stride, expand_ratio, norm_layer=None):
36 | super(InvertedResidual, self).__init__()
37 | hidden_channel = in_channel * expand_ratio
38 | self.use_shortcut = stride == 1 and in_channel == out_channel
39 | if norm_layer is None:
40 | norm_layer = nn.BatchNorm2d
41 |
42 | layers = []
43 | if expand_ratio != 1:
44 | # 1x1 pointwise conv
45 | layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1, norm_layer=norm_layer))
46 | layers.extend([
47 | # 3x3 depthwise conv
48 | ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel, norm_layer=norm_layer),
49 | # 1x1 pointwise conv(linear)
50 | nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False),
51 | norm_layer(out_channel),
52 | ])
53 |
54 | self.conv = nn.Sequential(*layers)
55 |
56 | def forward(self, x):
57 | if self.use_shortcut:
58 | return x + self.conv(x)
59 | else:
60 | return self.conv(x)
61 |
62 |
63 | class MobileNetV2(nn.Module):
64 | def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8, weights_path=None, norm_layer=None):
65 | super(MobileNetV2, self).__init__()
66 | block = InvertedResidual
67 | input_channel = _make_divisible(32 * alpha, round_nearest)
68 | last_channel = _make_divisible(1280 * alpha, round_nearest)
69 |
70 | if norm_layer is None:
71 | norm_layer = nn.BatchNorm2d
72 |
73 | inverted_residual_setting = [
74 | # t, c, n, s
75 | [1, 16, 1, 1],
76 | [6, 24, 2, 2],
77 | [6, 32, 3, 2],
78 | [6, 64, 4, 2],
79 | [6, 96, 3, 1],
80 | [6, 160, 3, 2],
81 | [6, 320, 1, 1],
82 | ]
83 |
84 | features = []
85 | # conv1 layer
86 | features.append(ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer))
87 | # building inverted residual residual blockes
88 | for t, c, n, s in inverted_residual_setting:
89 | output_channel = _make_divisible(c * alpha, round_nearest)
90 | for i in range(n):
91 | stride = s if i == 0 else 1
92 | features.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer))
93 | input_channel = output_channel
94 | # building last several layers
95 | features.append(ConvBNReLU(input_channel, last_channel, 1, norm_layer=norm_layer))
96 | # combine feature layers
97 | self.features = nn.Sequential(*features)
98 |
99 | # building classifier
100 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
101 | self.classifier = nn.Sequential(
102 | nn.Dropout(0.2),
103 | nn.Linear(last_channel, num_classes)
104 | )
105 |
106 | if weights_path is None:
107 | # weight initialization
108 | for m in self.modules():
109 | if isinstance(m, nn.Conv2d):
110 | nn.init.kaiming_normal_(m.weight, mode='fan_out')
111 | if m.bias is not None:
112 | nn.init.zeros_(m.bias)
113 | elif isinstance(m, nn.BatchNorm2d):
114 | nn.init.ones_(m.weight)
115 | nn.init.zeros_(m.bias)
116 | elif isinstance(m, nn.Linear):
117 | nn.init.normal_(m.weight, 0, 0.01)
118 | nn.init.zeros_(m.bias)
119 | else:
120 | self.load_state_dict(torch.load(weights_path))
121 |
122 | def forward(self, x):
123 | x = self.features(x)
124 | x = self.avgpool(x)
125 | x = torch.flatten(x, 1)
126 | x = self.classifier(x)
127 | return x
128 |
--------------------------------------------------------------------------------
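The classification model above is typically reused as a detection backbone by keeping only its `features` trunk and recording the output channel count on an `out_channels` attribute, the convention that torchvision-style Faster R-CNN backbones expect. A minimal sketch of that pattern (an illustration, assuming the working directory is `faster_rcnn/`):

```python
import torch

from backbone import MobileNetV2

# reuse the classification trunk as a feature extractor
net = MobileNetV2(weights_path=None)
backbone = net.features
backbone.out_channels = 1280  # output channels of the last ConvBNReLU for alpha=1.0

with torch.no_grad():
    fmap = backbone(torch.randn(1, 3, 224, 224))

print(tuple(fmap.shape))  # (1, 1280, 7, 7)
```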
/faster_rcnn/backbone/resnet50_fpn_model.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | import torch
4 | import torch.nn as nn
5 | from torchvision.ops.misc import FrozenBatchNorm2d
6 |
7 | from .feature_pyramid_network import BackboneWithFPN, LastLevelMaxPool
8 |
9 |
10 | class Bottleneck(nn.Module):
11 | expansion = 4
12 |
13 | def __init__(self, in_channel, out_channel, stride=1, downsample=None, norm_layer=None):
14 | super().__init__()
15 | if norm_layer is None:
16 | norm_layer = nn.BatchNorm2d
17 |
18 | self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
19 | kernel_size=1, stride=1, bias=False) # squeeze channels
20 | self.bn1 = norm_layer(out_channel)
21 | # -----------------------------------------
22 | self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
23 | kernel_size=3, stride=stride, bias=False, padding=1)
24 | self.bn2 = norm_layer(out_channel)
25 | # -----------------------------------------
26 | self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion,
27 | kernel_size=1, stride=1, bias=False) # unsqueeze channels
28 | self.bn3 = norm_layer(out_channel * self.expansion)
29 | self.relu = nn.ReLU(inplace=True)
30 | self.downsample = downsample
31 |
32 | def forward(self, x):
33 | identity = x
34 | if self.downsample is not None:
35 | identity = self.downsample(x)
36 |
37 | out = self.conv1(x)
38 | out = self.bn1(out)
39 | out = self.relu(out)
40 |
41 | out = self.conv2(out)
42 | out = self.bn2(out)
43 | out = self.relu(out)
44 |
45 | out = self.conv3(out)
46 | out = self.bn3(out)
47 |
48 | out += identity
49 | out = self.relu(out)
50 |
51 | return out
52 |
53 |
54 | class ResNet(nn.Module):
55 |
56 | def __init__(self, block, blocks_num, num_classes=1000, include_top=True, norm_layer=None):
57 | super().__init__()
58 | if norm_layer is None:
59 | norm_layer = nn.BatchNorm2d
60 | self._norm_layer = norm_layer
61 |
62 | self.include_top = include_top
63 | self.in_channel = 64
64 |
65 | self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
66 | padding=3, bias=False)
67 | self.bn1 = norm_layer(self.in_channel)
68 | self.relu = nn.ReLU(inplace=True)
69 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
70 | self.layer1 = self._make_layer(block, 64, blocks_num[0])
71 | self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
72 | self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
73 | self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
74 | if self.include_top:
75 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) # output size = (1, 1)
76 | self.fc = nn.Linear(512 * block.expansion, num_classes)
77 |
78 | for m in self.modules():
79 | if isinstance(m, nn.Conv2d):
80 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
81 |
82 | def _make_layer(self, block, channel, block_num, stride=1):
83 | norm_layer = self._norm_layer
84 | downsample = None
85 | if stride != 1 or self.in_channel != channel * block.expansion:
86 | downsample = nn.Sequential(
87 | nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
88 | norm_layer(channel * block.expansion))
89 |
90 | layers = []
91 | layers.append(block(self.in_channel, channel, downsample=downsample,
92 | stride=stride, norm_layer=norm_layer))
93 | self.in_channel = channel * block.expansion
94 |
95 | for _ in range(1, block_num):
96 | layers.append(block(self.in_channel, channel, norm_layer=norm_layer))
97 |
98 | return nn.Sequential(*layers)
99 |
100 | def forward(self, x):
101 | x = self.conv1(x)
102 | x = self.bn1(x)
103 | x = self.relu(x)
104 | x = self.maxpool(x)
105 |
106 | x = self.layer1(x)
107 | x = self.layer2(x)
108 | x = self.layer3(x)
109 | x = self.layer4(x)
110 |
111 | if self.include_top:
112 | x = self.avgpool(x)
113 | x = torch.flatten(x, 1)
114 | x = self.fc(x)
115 |
116 | return x
117 |
118 |
119 | def overwrite_eps(model, eps):
120 | """
121 | This method overwrites the default eps values of all the
122 | FrozenBatchNorm2d layers of the model with the provided value.
123 | This is necessary to address the BC-breaking change introduced
124 | by the bug-fix at pytorch/vision#2933. The overwrite is applied
125 | only when the pretrained weights are loaded to maintain compatibility
126 | with previous versions.
127 |
128 | Args:
129 | model (nn.Module): The model on which we perform the overwrite.
130 | eps (float): The new value of eps.
131 | """
132 | for module in model.modules():
133 | if isinstance(module, FrozenBatchNorm2d):
134 | module.eps = eps
135 |
136 |
137 | def resnet50_fpn_backbone(pretrain_path="",
138 | norm_layer=FrozenBatchNorm2d,  # FrozenBatchNorm2d behaves like BatchNorm2d, but its parameters are never updated
139 | trainable_layers=3,
140 | returned_layers=None,
141 | extra_blocks=None):
142 | """
143 | Build the resnet50_fpn backbone.
144 | Args:
145 | pretrain_path: path to the pretrained resnet50 weights; pass an empty string to skip loading them
146 | norm_layer: the official default is FrozenBatchNorm2d, i.e. a bn layer whose parameters are never updated
147 | (with a very small batch_size, bn can hurt more than help); if your GPU memory allows a large batch_size, pass a regular BatchNorm2d instead
148 | (https://github.com/facebookresearch/maskrcnn-benchmark/issues/267)
149 | trainable_layers: which layers should remain trainable (not frozen)
150 | returned_layers: which layers' outputs should be returned
151 | extra_blocks: extra blocks appended on top of the returned feature levels
152 |
153 | Returns:
154 |
155 | """
156 | resnet_backbone = ResNet(Bottleneck, [3, 4, 6, 3],
157 | include_top=False,
158 | norm_layer=norm_layer)
159 |
160 | if isinstance(norm_layer, FrozenBatchNorm2d):
161 | overwrite_eps(resnet_backbone, 0.0)
162 |
163 | if pretrain_path != "":
164 | assert os.path.exists(pretrain_path), "{} does not exist.".format(pretrain_path)
165 | # load the pretrained weights
166 | print(resnet_backbone.load_state_dict(torch.load(pretrain_path), strict=False))
167 |
168 | # select layers that wont be frozen
169 | assert 0 <= trainable_layers <= 5
170 | layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers]
171 |
172 | # when training all layers, do not forget the bn1 that follows conv1
173 | if trainable_layers == 5:
174 | layers_to_train.append("bn1")
175 |
176 | # freeze layers
177 | for name, parameter in resnet_backbone.named_parameters():
178 | # freeze every layer that is not in the layers_to_train list
179 | if all([not name.startswith(layer) for layer in layers_to_train]):
180 | parameter.requires_grad_(False)
181 |
182 | if extra_blocks is None:
183 | extra_blocks = LastLevelMaxPool()
184 |
185 | if returned_layers is None:
186 | returned_layers = [1, 2, 3, 4]
187 | # the indices of the returned feature levels must lie between 1 and 4
188 | assert min(returned_layers) > 0 and max(returned_layers) < 5
189 |
190 | # return_layers = {'layer1': '0', 'layer2': '1', 'layer3': '2', 'layer4': '3'}
191 | return_layers = {f'layer{k}': str(v) for v, k in enumerate(returned_layers)}
192 |
193 | # in_channel is the output channel number of layer4 (= 2048)
194 | in_channels_stage2 = resnet_backbone.in_channel // 8 # 256
195 | # channel number of each feature level that resnet50 provides to the fpn
196 | in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers]
197 | # channel number of each feature level produced by the fpn
198 | out_channels = 256
199 | return BackboneWithFPN(resnet_backbone, return_layers, in_channels_list, out_channels, extra_blocks=extra_blocks)
200 |
--------------------------------------------------------------------------------
/faster_rcnn/backbone/vgg_model.py:
--------------------------------------------------------------------------------
1 | import torch.nn as nn
2 | import torch
3 |
4 |
5 | class VGG(nn.Module):
6 | def __init__(self, features, class_num=1000, init_weights=False, weights_path=None):
7 | super(VGG, self).__init__()
8 | self.features = features
9 | self.classifier = nn.Sequential(
10 | nn.Linear(512*7*7, 4096),
11 | nn.ReLU(True),
12 | nn.Dropout(p=0.5),
13 | nn.Linear(4096, 4096),
14 | nn.ReLU(True),
15 | nn.Dropout(p=0.5),
16 | nn.Linear(4096, class_num)
17 | )
18 | if init_weights and weights_path is None:
19 | self._initialize_weights()
20 |
21 | if weights_path is not None:
22 | self.load_state_dict(torch.load(weights_path))
23 |
24 | def forward(self, x):
25 | # N x 3 x 224 x 224
26 | x = self.features(x)
27 | # N x 512 x 7 x 7
28 | x = torch.flatten(x, start_dim=1)
29 | # N x 512*7*7
30 | x = self.classifier(x)
31 | return x
32 |
33 | def _initialize_weights(self):
34 | for m in self.modules():
35 | if isinstance(m, nn.Conv2d):
36 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
37 | nn.init.xavier_uniform_(m.weight)
38 | if m.bias is not None:
39 | nn.init.constant_(m.bias, 0)
40 | elif isinstance(m, nn.Linear):
41 | nn.init.xavier_uniform_(m.weight)
42 | # nn.init.normal_(m.weight, 0, 0.01)
43 | nn.init.constant_(m.bias, 0)
44 |
45 |
46 | def make_features(cfg: list):
47 | layers = []
48 | in_channels = 3
49 | for v in cfg:
50 | if v == "M":
51 | layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
52 | else:
53 | conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
54 | layers += [conv2d, nn.ReLU(True)]
55 | in_channels = v
56 | return nn.Sequential(*layers)
57 |
58 |
59 | cfgs = {
60 | 'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
61 | 'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
62 | 'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
63 | 'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
64 | }
65 |
66 |
67 | def vgg(model_name="vgg16", weights_path=None):
68 | assert model_name in cfgs, "Warning: model name {} not in cfgs dict!".format(model_name)
69 | cfg = cfgs[model_name]
70 |
71 | model = VGG(make_features(cfg), weights_path=weights_path)
72 | return model
73 |
--------------------------------------------------------------------------------
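For detection use, the VGG classifier head is usually discarded and only `model.features` is kept, commonly truncated before the final max-pool so the feature map stays at stride 16. The sketch below is an assumption about how such a backbone would be wired, not code from this repo (working directory `faster_rcnn/`):

```python
import torch

from backbone import vgg

net = vgg(model_name="vgg16")
# drop the last max-pool so the output keeps stride 16
feature_extractor = torch.nn.Sequential(*list(net.features.children())[:-1])
feature_extractor.out_channels = 512  # channel count of the last conv stage

with torch.no_grad():
    fmap = feature_extractor(torch.randn(1, 3, 224, 224))

print(tuple(fmap.shape))  # (1, 512, 14, 14)
```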
/faster_rcnn/cityscrapes8_indices.json:
--------------------------------------------------------------------------------
1 | {
2 | "1": "car",
3 | "2": "truck",
4 | "3": "bus",
5 | "4": "caravan"
6 | }
--------------------------------------------------------------------------------
/faster_rcnn/draw_box_utils.py:
--------------------------------------------------------------------------------
1 | from PIL.Image import Image, fromarray
2 | import PIL.ImageDraw as ImageDraw
3 | import PIL.ImageFont as ImageFont
4 | from PIL import ImageColor
5 | import numpy as np
6 |
7 | STANDARD_COLORS = [
8 | 'AliceBlue', 'Chartreuse', 'Aqua', 'Aquamarine', 'Azure', 'Beige', 'Bisque',
9 | 'BlanchedAlmond', 'BlueViolet', 'BurlyWood', 'CadetBlue', 'AntiqueWhite',
10 | 'Chocolate', 'Coral', 'CornflowerBlue', 'Cornsilk', 'Crimson', 'Cyan',
11 | 'DarkCyan', 'DarkGoldenRod', 'DarkGrey', 'DarkKhaki', 'DarkOrange',
12 | 'DarkOrchid', 'DarkSalmon', 'DarkSeaGreen', 'DarkTurquoise', 'DarkViolet',
13 | 'DeepPink', 'DeepSkyBlue', 'DodgerBlue', 'FireBrick', 'FloralWhite',
14 | 'ForestGreen', 'Fuchsia', 'Gainsboro', 'GhostWhite', 'Gold', 'GoldenRod',
15 | 'Salmon', 'Tan', 'HoneyDew', 'HotPink', 'IndianRed', 'Ivory', 'Khaki',
16 | 'Lavender', 'LavenderBlush', 'LawnGreen', 'LemonChiffon', 'LightBlue',
17 | 'LightCoral', 'LightCyan', 'LightGoldenRodYellow', 'LightGray', 'LightGrey',
18 | 'LightGreen', 'LightPink', 'LightSalmon', 'LightSeaGreen', 'LightSkyBlue',
19 | 'LightSlateGray', 'LightSlateGrey', 'LightSteelBlue', 'LightYellow', 'Lime',
20 | 'LimeGreen', 'Linen', 'Magenta', 'MediumAquaMarine', 'MediumOrchid',
21 | 'MediumPurple', 'MediumSeaGreen', 'MediumSlateBlue', 'MediumSpringGreen',
22 | 'MediumTurquoise', 'MediumVioletRed', 'MintCream', 'MistyRose', 'Moccasin',
23 | 'NavajoWhite', 'OldLace', 'Olive', 'OliveDrab', 'Orange', 'OrangeRed',
24 | 'Orchid', 'PaleGoldenRod', 'PaleGreen', 'PaleTurquoise', 'PaleVioletRed',
25 | 'PapayaWhip', 'PeachPuff', 'Peru', 'Pink', 'Plum', 'PowderBlue', 'Purple',
26 | 'Red', 'RosyBrown', 'RoyalBlue', 'SaddleBrown', 'Green', 'SandyBrown',
27 | 'SeaGreen', 'SeaShell', 'Sienna', 'Silver', 'SkyBlue', 'SlateBlue',
28 | 'SlateGray', 'SlateGrey', 'Snow', 'SpringGreen', 'SteelBlue', 'GreenYellow',
29 | 'Teal', 'Thistle', 'Tomato', 'Turquoise', 'Violet', 'Wheat', 'White',
30 | 'WhiteSmoke', 'Yellow', 'YellowGreen'
31 | ]
32 |
33 |
34 | def draw_text(draw,
35 | box: list,
36 | cls: int,
37 | score: float,
38 | category_index: dict,
39 | color: str,
40 | font: str = 'arial.ttf',
41 | font_size: int = 24):
42 | """
43 | Draw the object bounding box and class/score text onto the image.
44 | """
45 | try:
46 | font = ImageFont.truetype(font, font_size)
47 | except IOError:
48 | font = ImageFont.load_default()
49 |
50 | left, top, right, bottom = box
51 | # If the total height of the display strings added to the top of the bounding
52 | # box exceeds the top of the image, stack the strings below the bounding box
53 | # instead of above.
54 | display_str = f"{category_index[str(cls)]}: {int(100 * score)}%"
55 | display_str_heights = [font.getsize(ds)[1] for ds in display_str]
56 | # Each display_str has a top and bottom margin of 0.05x.
57 | display_str_height = (1 + 2 * 0.05) * max(display_str_heights)
58 |
59 | if top > display_str_height:
60 | text_top = top - display_str_height
61 | text_bottom = top
62 | else:
63 | text_top = bottom
64 | text_bottom = bottom + display_str_height
65 |
66 | for ds in display_str:
67 | text_width, text_height = font.getsize(ds)
68 | margin = np.ceil(0.05 * text_width)
69 | draw.rectangle([(left, text_top),
70 | (left + text_width + 2 * margin, text_bottom)], fill=color)
71 | draw.text((left + margin, text_top),
72 | ds,
73 | fill='black',
74 | font=font)
75 | left += text_width
76 |
77 |
78 | def draw_masks(image, masks, colors, thresh: float = 0.7, alpha: float = 0.5):
79 | np_image = np.array(image)
80 | masks = np.where(masks > thresh, True, False)
81 |
82 | # colors = np.array(colors)
83 | img_to_draw = np.copy(np_image)
84 | # TODO: There might be a way to vectorize this
85 | for mask, color in zip(masks, colors):
86 | img_to_draw[mask] = color
87 |
88 | out = np_image * (1 - alpha) + img_to_draw * alpha
89 | return fromarray(out.astype(np.uint8))
90 |
91 |
92 | def draw_objs(image: Image,
93 | boxes: np.ndarray = None,
94 | classes: np.ndarray = None,
95 | scores: np.ndarray = None,
96 | masks: np.ndarray = None,
97 | category_index: dict = None,
98 | box_thresh: float = 0.1,
99 | mask_thresh: float = 0.5,
100 | line_thickness: int = 8,
101 | font: str = 'arial.ttf',
102 | font_size: int = 24,
103 | draw_boxes_on_image: bool = True,
104 | draw_masks_on_image: bool = False):
105 | """
106 | Draw bounding boxes, class labels, and masks onto the image.
107 | Args:
108 | image: image to draw on
109 | boxes: object bounding boxes
110 | classes: object class indices
111 | scores: object confidence scores
112 | masks: object masks
113 | category_index: dict mapping class index to class name
114 | box_thresh: score threshold used to filter boxes
115 | mask_thresh: threshold used to binarize masks
116 | line_thickness: bounding box line width
117 | font: font type
118 | font_size: font size
119 | draw_boxes_on_image: whether to draw boxes on the image
120 | draw_masks_on_image: whether to draw masks on the image
121 |
122 | Returns:
123 |
124 | """
125 |
126 | # filter out low-score objects
127 | idxs = np.greater(scores, box_thresh)
128 | boxes = boxes[idxs]
129 | classes = classes[idxs]
130 | scores = scores[idxs]
131 | if masks is not None:
132 | masks = masks[idxs]
133 | if len(boxes) == 0:
134 | return image
135 |
136 | colors = [ImageColor.getrgb(STANDARD_COLORS[cls % len(STANDARD_COLORS)]) for cls in classes]
137 |
138 | if draw_boxes_on_image:
139 | # Draw all boxes onto image.
140 | draw = ImageDraw.Draw(image)
141 | for box, cls, score, color in zip(boxes, classes, scores, colors):
142 | left, top, right, bottom = box
143 | # draw the object bounding box
144 | draw.line([(left, top), (left, bottom), (right, bottom),
145 | (right, top), (left, top)], width=line_thickness, fill=color)
146 | # draw class and score information
147 | draw_text(draw, box.tolist(), int(cls), float(score), category_index, color, font, font_size)
148 |
149 | if draw_masks_on_image and (masks is not None):
150 | # Draw all mask onto image.
151 | image = draw_masks(image, masks, colors, mask_thresh)
152 |
153 | return image
154 |
--------------------------------------------------------------------------------
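
A minimal usage sketch (not part of the repository) of `draw_objs` with dummy detections; `test.jpg` is a placeholder path and the class file is the one in this folder:

```python
# Illustrative sketch: draw two dummy detections on an image and save the result.
import json
import numpy as np
from PIL import Image
from draw_box_utils import draw_objs

with open("./pascal_voc_classes.json", "r") as f:
    class_dict = json.load(f)
category_index = {str(v): str(k) for k, v in class_dict.items()}  # index -> name

image = Image.open("test.jpg")  # placeholder image
boxes = np.array([[50, 60, 200, 220], [300, 80, 420, 240]], dtype=np.float32)  # xmin, ymin, xmax, ymax
classes = np.array([7, 15])     # "car" and "person" in pascal_voc_classes.json
scores = np.array([0.92, 0.81])

result = draw_objs(image, boxes, classes, scores,
                   category_index=category_index,
                   box_thresh=0.5, line_thickness=3, font_size=20)
result.save("draw_objs_demo.jpg")
```
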
/faster_rcnn/my_dataset.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from torch.utils.data import Dataset
3 | import os
4 | import torch
5 | import json
6 | from PIL import Image
7 | from lxml import etree
8 |
9 |
10 | class VOCDataSet(Dataset):
11 | """Load and parse the PASCAL VOC2007/2012 dataset."""
12 |
13 | def __init__(self, voc_root, year="2012", transforms=None, txt_name: str = "train.txt"):
14 | assert year in ["2007", "2012"], "year must be in ['2007', '2012']"
15 | # add fault tolerance: voc_root may or may not already point inside VOCdevkit
16 | if "VOCdevkit" in voc_root:
17 | self.root = os.path.join(voc_root, f"VOC{year}")
18 | else:
19 | self.root = os.path.join(voc_root, "VOCdevkit", f"VOC{year}")
20 | self.img_root = os.path.join(self.root, "JPEGImages")
21 | self.annotations_root = os.path.join(self.root, "Annotations")
22 |
23 | # read train.txt or val.txt file
24 | txt_path = os.path.join(self.root, "ImageSets", "Main", txt_name)
25 | assert os.path.exists(txt_path), "file {} not found.".format(txt_name)
26 |
27 | with open(txt_path) as read:
28 | xml_list = [os.path.join(self.annotations_root, line.strip() + ".xml")
29 | for line in read.readlines() if len(line.strip()) > 0]
30 |
31 | self.xml_list = []
32 | # check file
33 | for xml_path in xml_list:
34 | if os.path.exists(xml_path) is False:
35 | print(f"Warning: '{xml_path}' not found, skipping this annotation file.")
36 | continue
37 |
38 | # check for targets
39 | with open(xml_path) as fid:
40 | xml_str = fid.read()
41 | xml = etree.fromstring(xml_str)
42 | data = self.parse_xml_to_dict(xml)["annotation"]
43 | if "object" not in data:
44 | print(f"INFO: no objects in {xml_path}, skipping this annotation file.")
45 | continue
46 |
47 | self.xml_list.append(xml_path)
48 |
49 | assert len(self.xml_list) > 0, "no valid annotation files found in '{}'.".format(txt_path)
50 |
51 | # read class_indict
52 | json_file = './pascal_voc_classes.json'
53 | assert os.path.exists(json_file), "{} file does not exist.".format(json_file)
54 | with open(json_file, 'r') as f:
55 | self.class_dict = json.load(f)
56 |
57 | self.transforms = transforms
58 |
59 | def __len__(self):
60 | return len(self.xml_list)
61 |
62 | def __getitem__(self, idx):
63 | # read xml
64 | xml_path = self.xml_list[idx]
65 | with open(xml_path) as fid:
66 | xml_str = fid.read()
67 | xml = etree.fromstring(xml_str)
68 | data = self.parse_xml_to_dict(xml)["annotation"]
69 | img_path = os.path.join(self.img_root, data["filename"])
70 | image = Image.open(img_path)
71 | if image.format != "JPEG":
72 | raise ValueError("Image '{}' format not JPEG".format(img_path))
73 |
74 | boxes = []
75 | labels = []
76 | iscrowd = []
77 | assert "object" in data, "{} lack of object information.".format(xml_path)
78 | for obj in data["object"]:
79 | xmin = float(obj["bndbox"]["xmin"])
80 | xmax = float(obj["bndbox"]["xmax"])
81 | ymin = float(obj["bndbox"]["ymin"])
82 | ymax = float(obj["bndbox"]["ymax"])
83 |
84 | # further check the data: some annotations may contain boxes with w or h equal to 0, which would make the regression loss become nan
85 | if xmax <= xmin or ymax <= ymin:
86 | print("Warning: in '{}' xml, there are some bboxes with w/h <= 0".format(xml_path))
87 | continue
88 |
89 | boxes.append([xmin, ymin, xmax, ymax])
90 | labels.append(self.class_dict[obj["name"]])
91 | if "difficult" in obj:
92 | iscrowd.append(int(obj["difficult"]))
93 | else:
94 | iscrowd.append(0)
95 |
96 | # convert everything into a torch.Tensor
97 | boxes = torch.as_tensor(boxes, dtype=torch.float32)
98 | labels = torch.as_tensor(labels, dtype=torch.int64)
99 | iscrowd = torch.as_tensor(iscrowd, dtype=torch.int64)
100 | image_id = torch.tensor([idx])
101 | area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
102 |
103 | target = {}
104 | target["boxes"] = boxes
105 | target["labels"] = labels
106 | target["image_id"] = image_id
107 | target["area"] = area
108 | target["iscrowd"] = iscrowd
109 |
110 | if self.transforms is not None:
111 | image, target = self.transforms(image, target)
112 |
113 | return image, target
114 |
115 | def get_height_and_width(self, idx):
116 | # read xml
117 | xml_path = self.xml_list[idx]
118 | with open(xml_path) as fid:
119 | xml_str = fid.read()
120 | xml = etree.fromstring(xml_str)
121 | data = self.parse_xml_to_dict(xml)["annotation"]
122 | data_height = int(data["size"]["height"])
123 | data_width = int(data["size"]["width"])
124 | return data_height, data_width
125 |
126 | def parse_xml_to_dict(self, xml):
127 | """
128 | Parse an xml file into a Python dict, following tensorflow's recursive_parse_xml_to_dict.
129 | Args:
130 | xml: xml tree obtained by parsing XML file contents using lxml.etree
131 |
132 | Returns:
133 | Python dictionary holding XML contents.
134 | """
135 |
136 | if len(xml) == 0: # reached a leaf node, return the tag and its text directly
137 | return {xml.tag: xml.text}
138 |
139 | result = {}
140 | for child in xml:
141 | child_result = self.parse_xml_to_dict(child) # recursively parse child tags
142 | if child.tag != 'object':
143 | result[child.tag] = child_result[child.tag]
144 | else:
145 | if child.tag not in result: # there may be multiple object tags, so collect them in a list
146 | result[child.tag] = []
147 | result[child.tag].append(child_result[child.tag])
148 | return {xml.tag: result}
149 |
150 | def coco_index(self, idx):
151 | """
152 | This method is prepared specifically for pycocotools to collect label statistics; it does not process images or labels.
153 | Since no image needs to be read, the statistics time is greatly reduced.
154 |
155 | Args:
156 | idx: index of the image to retrieve
157 | """
158 | # read xml
159 | xml_path = self.xml_list[idx]
160 | with open(xml_path) as fid:
161 | xml_str = fid.read()
162 | xml = etree.fromstring(xml_str)
163 | data = self.parse_xml_to_dict(xml)["annotation"]
164 | data_height = int(data["size"]["height"])
165 | data_width = int(data["size"]["width"])
166 | # img_path = os.path.join(self.img_root, data["filename"])
167 | # image = Image.open(img_path)
168 | # if image.format != "JPEG":
169 | # raise ValueError("Image format not JPEG")
170 | boxes = []
171 | labels = []
172 | iscrowd = []
173 | for obj in data["object"]:
174 | xmin = float(obj["bndbox"]["xmin"])
175 | xmax = float(obj["bndbox"]["xmax"])
176 | ymin = float(obj["bndbox"]["ymin"])
177 | ymax = float(obj["bndbox"]["ymax"])
178 | boxes.append([xmin, ymin, xmax, ymax])
179 | labels.append(self.class_dict[obj["name"]])
180 | iscrowd.append(int(obj["difficult"]))
181 |
182 | # convert everything into a torch.Tensor
183 | boxes = torch.as_tensor(boxes, dtype=torch.float32)
184 | labels = torch.as_tensor(labels, dtype=torch.int64)
185 | iscrowd = torch.as_tensor(iscrowd, dtype=torch.int64)
186 | image_id = torch.tensor([idx])
187 | area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
188 |
189 | target = {}
190 | target["boxes"] = boxes
191 | target["labels"] = labels
192 | target["image_id"] = image_id
193 | target["area"] = area
194 | target["iscrowd"] = iscrowd
195 |
196 | return (data_height, data_width), target
197 |
198 | @staticmethod
199 | def collate_fn(batch):
200 | return tuple(zip(*batch))
201 |
202 | # import transforms
203 | # from draw_box_utils import draw_objs
204 | # from PIL import Image
205 | # import json
206 | # import matplotlib.pyplot as plt
207 | # import torchvision.transforms as ts
208 | # import random
209 | #
210 | # # read class_indict
211 | # category_index = {}
212 | # try:
213 | # json_file = open('./pascal_voc_classes.json', 'r')
214 | # class_dict = json.load(json_file)
215 | # category_index = {str(v): str(k) for k, v in class_dict.items()}
216 | # except Exception as e:
217 | # print(e)
218 | # exit(-1)
219 | #
220 | # data_transform = {
221 | # "train": transforms.Compose([transforms.ToTensor(),
222 | # transforms.RandomHorizontalFlip(0.5)]),
223 | # "val": transforms.Compose([transforms.ToTensor()])
224 | # }
225 | #
226 | # # load train data set
227 | # train_data_set = VOCDataSet(os.getcwd(), "2012", data_transform["train"], "train.txt")
228 | # print(len(train_data_set))
229 | # for index in random.sample(range(0, len(train_data_set)), k=5):
230 | # img, target = train_data_set[index]
231 | # img = ts.ToPILImage()(img)
232 | # plot_img = draw_objs(img,
233 | # target["boxes"].numpy(),
234 | # target["labels"].numpy(),
235 | # np.ones(target["labels"].shape[0]),
236 | # category_index=category_index,
237 | # box_thresh=0.5,
238 | # line_thickness=3,
239 | # font='arial.ttf',
240 | # font_size=20)
241 | # plt.imshow(plot_img)
242 | # plt.show()
243 |
--------------------------------------------------------------------------------
/faster_rcnn/network_files/__init__.py:
--------------------------------------------------------------------------------
1 | from .faster_rcnn_framework import FasterRCNN, FastRCNNPredictor
2 | from .rpn_function import AnchorsGenerator
3 |
--------------------------------------------------------------------------------
/faster_rcnn/network_files/boxes.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from typing import Tuple
3 | from torch import Tensor
4 | import torchvision
5 |
6 |
7 | def nms(boxes, scores, iou_threshold):
8 | # type: (Tensor, Tensor, float) -> Tensor
9 | """
10 | Performs non-maximum suppression (NMS) on the boxes according
11 | to their intersection-over-union (IoU).
12 |
13 | NMS iteratively removes lower scoring boxes which have an
14 | IoU greater than iou_threshold with another (higher scoring)
15 | box.
16 |
17 | Parameters
18 | ----------
19 | boxes : Tensor[N, 4])
20 | boxes to perform NMS on. They
21 | are expected to be in (x1, y1, x2, y2) format
22 | scores : Tensor[N]
23 | scores for each one of the boxes
24 | iou_threshold : float
25 | discards all overlapping
26 | boxes with IoU > iou_threshold
27 |
28 | Returns
29 | -------
30 | keep : Tensor
31 | int64 tensor with the indices
32 | of the elements that have been kept
33 | by NMS, sorted in decreasing order of scores
34 | """
35 | return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
36 |
37 |
38 | def batched_nms(boxes, scores, idxs, iou_threshold):
39 | # type: (Tensor, Tensor, Tensor, float) -> Tensor
40 | """
41 | Performs non-maximum suppression in a batched fashion.
42 |
43 | Each index value correspond to a category, and NMS
44 | will not be applied between elements of different categories.
45 |
46 | Parameters
47 | ----------
48 | boxes : Tensor[N, 4]
49 | boxes where NMS will be performed. They
50 | are expected to be in (x1, y1, x2, y2) format
51 | scores : Tensor[N]
52 | scores for each one of the boxes
53 | idxs : Tensor[N]
54 | indices of the categories for each one of the boxes.
55 | iou_threshold : float
56 | discards all overlapping boxes
57 | with IoU > iou_threshold
58 |
59 | Returns
60 | -------
61 | keep : Tensor
62 | int64 tensor with the indices of
63 | the elements that have been kept by NMS, sorted
64 | in decreasing order of scores
65 | """
66 | if boxes.numel() == 0:
67 | return torch.empty((0,), dtype=torch.int64, device=boxes.device)
68 |
69 | # strategy: in order to perform NMS independently per class.
70 | # we add an offset to all the boxes. The offset is dependent
71 | # only on the class idx, and is large enough so that boxes
72 | # from different classes do not overlap
73 | # get the maximum coordinate value among all boxes (xmin, ymin, xmax, ymax)
74 | max_coordinate = boxes.max()
75 |
76 | # to(): Performs Tensor dtype and/or device conversion
77 | # generate a large offset for every category/level
78 | # to() here only makes the dtype and device of the generated tensor match boxes
79 | offsets = idxs.to(boxes) * (max_coordinate + 1)
80 | # after adding the per-category/level offset, boxes from different categories/levels can no longer overlap
81 | boxes_for_nms = boxes + offsets[:, None]
82 | keep = nms(boxes_for_nms, scores, iou_threshold)
83 | return keep
84 |
85 |
86 | def remove_small_boxes(boxes, min_size):
87 | # type: (Tensor, float) -> Tensor
88 | """
89 | Remove boxes which contain at least one side smaller than min_size.
90 | Returns the indices of boxes whose width and height are both at least min_size.
91 | Arguments:
92 | boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format
93 | min_size (float): minimum size
94 |
95 | Returns:
96 | keep (Tensor[K]): indices of the boxes that have both sides
97 | larger than min_size
98 | """
99 | ws, hs = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1] # widths and heights of the predicted boxes
100 | # keep = (ws >= min_size) & (hs >= min_size) # True only when both width and height exceed the given threshold
101 | keep = torch.logical_and(torch.ge(ws, min_size), torch.ge(hs, min_size))
102 | # nonzero(): Returns a tensor containing the indices of all non-zero elements of input
103 | # keep = keep.nonzero().squeeze(1)
104 | keep = torch.where(keep)[0]
105 | return keep
106 |
107 |
108 | def clip_boxes_to_image(boxes, size):
109 | # type: (Tensor, Tuple[int, int]) -> Tensor
110 | """
111 | Clip boxes so that they lie inside an image of size `size`.
112 | Clip the predicted boxes so that out-of-range coordinates are moved onto the image boundary.
113 |
114 | Arguments:
115 | boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format
116 | size (Tuple[height, width]): size of the image
117 |
118 | Returns:
119 | clipped_boxes (Tensor[N, 4])
120 | """
121 | dim = boxes.dim()
122 | boxes_x = boxes[..., 0::2] # x1, x2
123 | boxes_y = boxes[..., 1::2] # y1, y2
124 | height, width = size
125 |
126 | if torchvision._is_tracing():
127 | boxes_x = torch.max(boxes_x, torch.tensor(0, dtype=boxes.dtype, device=boxes.device))
128 | boxes_x = torch.min(boxes_x, torch.tensor(width, dtype=boxes.dtype, device=boxes.device))
129 | boxes_y = torch.max(boxes_y, torch.tensor(0, dtype=boxes.dtype, device=boxes.device))
130 | boxes_y = torch.min(boxes_y, torch.tensor(height, dtype=boxes.dtype, device=boxes.device))
131 | else:
132 | boxes_x = boxes_x.clamp(min=0, max=width) # clamp x coordinates to the range [0, width]
133 | boxes_y = boxes_y.clamp(min=0, max=height) # clamp y coordinates to the range [0, height]
134 |
135 | clipped_boxes = torch.stack((boxes_x, boxes_y), dim=dim)
136 | return clipped_boxes.reshape(boxes.shape)
137 |
138 |
139 | def box_area(boxes):
140 | """
141 | Computes the area of a set of bounding boxes, which are specified by its
142 | (x1, y1, x2, y2) coordinates.
143 |
144 | Arguments:
145 | boxes (Tensor[N, 4]): boxes for which the area will be computed. They
146 | are expected to be in (x1, y1, x2, y2) format
147 |
148 | Returns:
149 | area (Tensor[N]): area for each box
150 | """
151 | return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
152 |
153 |
154 | def box_iou(boxes1, boxes2):
155 | """
156 | Return intersection-over-union (Jaccard index) of boxes.
157 |
158 | Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
159 |
160 | Arguments:
161 | boxes1 (Tensor[N, 4])
162 | boxes2 (Tensor[M, 4])
163 |
164 | Returns:
165 | iou (Tensor[N, M]): the NxM matrix containing the pairwise
166 | IoU values for every element in boxes1 and boxes2
167 | """
168 | area1 = box_area(boxes1)
169 | area2 = box_area(boxes2)
170 |
171 | # When the shapes do not match,
172 | # the shape of the returned output tensor follows the broadcasting rules
173 | lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # left-top [N,M,2]
174 | rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # right-bottom [N,M,2]
175 |
176 | wh = (rb - lt).clamp(min=0) # [N,M,2]
177 | inter = wh[:, :, 0] * wh[:, :, 1] # [N,M]
178 |
179 | iou = inter / (area1[:, None] + area2 - inter)
180 | return iou
181 |
182 |
--------------------------------------------------------------------------------
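
As a quick sanity check (a sketch, not part of the repository), `box_iou` on two partially overlapping boxes, and `batched_nms` keeping two identical boxes because their class indices differ:

```python
# Illustrative sketch: IoU of [0,0,10,10] and [5,5,15,15] is 25 / (100 + 100 - 25) ≈ 0.1429,
# and batched_nms never suppresses boxes across different class indices.
import torch
from network_files.boxes import box_iou, batched_nms

b1 = torch.tensor([[0., 0., 10., 10.]])
b2 = torch.tensor([[5., 5., 15., 15.]])
print(box_iou(b1, b2))  # tensor([[0.1429]])

# two identical boxes with different class indices: both survive NMS
boxes_ = torch.tensor([[0., 0., 10., 10.], [0., 0., 10., 10.]])
scores = torch.tensor([0.9, 0.8])
idxs = torch.tensor([0, 1])
print(batched_nms(boxes_, scores, idxs, iou_threshold=0.5))  # tensor([0, 1])
```
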
/faster_rcnn/network_files/image_list.py:
--------------------------------------------------------------------------------
1 | from typing import List, Tuple
2 | from torch import Tensor
3 |
4 |
5 | class ImageList(object):
6 | """
7 | Structure that holds a list of images (of possibly
8 | varying sizes) as a single tensor.
9 | This works by padding the images to the same size,
10 | and storing in a field the original sizes of each image
11 | """
12 |
13 | def __init__(self, tensors, image_sizes):
14 | # type: (Tensor, List[Tuple[int, int]]) -> None
15 | """
16 | Arguments:
17 | tensors (tensor): batched image data after padding
18 | image_sizes (list[tuple[int, int]]): original image sizes before padding
19 | """
20 | self.tensors = tensors
21 | self.image_sizes = image_sizes
22 |
23 | def to(self, device):
24 | # type: (Device) -> ImageList # noqa
25 | cast_tensor = self.tensors.to(device)
26 | return ImageList(cast_tensor, self.image_sizes)
27 |
28 |
--------------------------------------------------------------------------------
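
A minimal sketch (not part of the repository; the manual zero-padding below is only illustrative) of how two differently sized images end up in a single ImageList while their original sizes are preserved:

```python
# Illustrative sketch: pad two images to a common size and wrap them in an ImageList.
import torch
from network_files.image_list import ImageList

imgs = [torch.rand(3, 300, 400), torch.rand(3, 256, 512)]
image_sizes = [img.shape[-2:] for img in imgs]            # [(300, 400), (256, 512)]

max_h = max(h for h, _ in image_sizes)
max_w = max(w for _, w in image_sizes)
batched = torch.zeros(len(imgs), 3, max_h, max_w)
for i, img in enumerate(imgs):
    batched[i, :, :img.shape[-2], :img.shape[-1]] = img   # top-left aligned zero padding

image_list = ImageList(batched, [tuple(s) for s in image_sizes])
print(image_list.tensors.shape)   # torch.Size([2, 3, 300, 512])
print(image_list.image_sizes)     # [(300, 400), (256, 512)]
```
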
/faster_rcnn/pascal_voc_classes.json:
--------------------------------------------------------------------------------
1 | {
2 | "aeroplane": 1,
3 | "bicycle": 2,
4 | "bird": 3,
5 | "boat": 4,
6 | "bottle": 5,
7 | "bus": 6,
8 | "car": 7,
9 | "cat": 8,
10 | "chair": 9,
11 | "cow": 10,
12 | "diningtable": 11,
13 | "dog": 12,
14 | "horse": 13,
15 | "motorbike": 14,
16 | "person": 15,
17 | "pottedplant": 16,
18 | "sheep": 17,
19 | "sofa": 18,
20 | "train": 19,
21 | "tvmonitor": 20
22 | }
--------------------------------------------------------------------------------
/faster_rcnn/plot_curve.py:
--------------------------------------------------------------------------------
1 | import datetime
2 | import matplotlib.pyplot as plt
3 |
4 |
5 | def plot_loss_and_lr(train_loss, learning_rate):
6 | try:
7 | x = list(range(len(train_loss)))
8 | fig, ax1 = plt.subplots(1, 1)
9 | ax1.plot(x, train_loss, 'r', label='loss')
10 | ax1.set_xlabel("step")
11 | ax1.set_ylabel("loss")
12 | ax1.set_title("Train Loss and lr")
13 | plt.legend(loc='best')
14 |
15 | ax2 = ax1.twinx()
16 | ax2.plot(x, learning_rate, label='lr')
17 | ax2.set_ylabel("learning rate")
18 | ax2.set_xlim(0, len(train_loss)) # use integer spacing on the x-axis
19 | plt.legend(loc='best')
20 |
21 | handles1, labels1 = ax1.get_legend_handles_labels()
22 | handles2, labels2 = ax2.get_legend_handles_labels()
23 | plt.legend(handles1 + handles2, labels1 + labels2, loc='upper right')
24 |
25 | fig.subplots_adjust(right=0.8) # prevent the saved figure from being cut off
26 | fig.savefig('./loss_and_lr{}.png'.format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S")))
27 | plt.close()
28 | print("successfully saved loss curve!")
29 | except Exception as e:
30 | print(e)
31 |
32 |
33 | def plot_map(mAP):
34 | try:
35 | x = list(range(len(mAP)))
36 | plt.plot(x, mAP, label='mAp')
37 | plt.xlabel('epoch')
38 | plt.ylabel('mAP')
39 | plt.title('Eval mAP')
40 | plt.xlim(0, len(mAP))
41 | plt.legend(loc='best')
42 | plt.savefig('./mAP.png')
43 | plt.close()
44 | print("successfully saved mAP curve!")
45 | except Exception as e:
46 | print(e)
47 |
--------------------------------------------------------------------------------
/faster_rcnn/predict.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import json
4 |
5 | import torch
6 | import torchvision
7 | from PIL import Image
8 | import matplotlib.pyplot as plt
9 |
10 | from torchvision import transforms
11 | from network_files import FasterRCNN, FastRCNNPredictor, AnchorsGenerator
12 | from backbone import resnet50_fpn_backbone, MobileNetV2
13 | from draw_box_utils import draw_objs
14 |
15 |
16 | def create_model(num_classes):
17 | # mobileNetv2+faster_RCNN
18 | # backbone = MobileNetV2().features
19 | # backbone.out_channels = 1280
20 | #
21 | # anchor_generator = AnchorsGenerator(sizes=((32, 64, 128, 256, 512),),
22 | # aspect_ratios=((0.5, 1.0, 2.0),))
23 | #
24 | # roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
25 | # output_size=[7, 7],
26 | # sampling_ratio=2)
27 | #
28 | # model = FasterRCNN(backbone=backbone,
29 | # num_classes=num_classes,
30 | # rpn_anchor_generator=anchor_generator,
31 | # box_roi_pool=roi_pooler)
32 |
33 | # resNet50+fpn+faster_RCNN
34 | # note: the norm_layer here must be consistent with the training script
35 | backbone = resnet50_fpn_backbone(norm_layer=torch.nn.BatchNorm2d)
36 | model = FasterRCNN(backbone=backbone, num_classes=num_classes, rpn_score_thresh=0.5)
37 |
38 | return model
39 |
40 |
41 | def time_synchronized():
42 | torch.cuda.synchronize() if torch.cuda.is_available() else None
43 | return time.time()
44 |
45 |
46 | def main():
47 | # get devices
48 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
49 | print("using {} device.".format(device))
50 |
51 | # create model
52 | model = create_model(num_classes=21)
53 |
54 | # load train weights
55 | weights_path = "./save_weights/model.pth"
56 | assert os.path.exists(weights_path), "{} file does not exist.".format(weights_path)
57 | weights_dict = torch.load(weights_path, map_location='cpu')
58 | weights_dict = weights_dict["model"] if "model" in weights_dict else weights_dict
59 | model.load_state_dict(weights_dict)
60 | model.to(device)
61 |
62 | # read class_indict
63 | label_json_path = './pascal_voc_classes.json'
64 | assert os.path.exists(label_json_path), "json file {} does not exist.".format(label_json_path)
65 | with open(label_json_path, 'r') as f:
66 | class_dict = json.load(f)
67 |
68 | category_index = {str(v): str(k) for k, v in class_dict.items()}
69 |
70 | # load image
71 | original_img = Image.open("./test.jpg")
72 |
73 | # from pil image to tensor, do not normalize image
74 | data_transform = transforms.Compose([transforms.ToTensor()])
75 | img = data_transform(original_img)
76 | # expand batch dimension
77 | img = torch.unsqueeze(img, dim=0)
78 |
79 | model.eval() # switch to evaluation mode
80 | with torch.no_grad():
81 | # init
82 | img_height, img_width = img.shape[-2:]
83 | init_img = torch.zeros((1, 3, img_height, img_width), device=device)
84 | model(init_img)
85 |
86 | t_start = time_synchronized()
87 | predictions = model(img.to(device))[0]
88 | t_end = time_synchronized()
89 | print("inference+NMS time: {}".format(t_end - t_start))
90 |
91 | predict_boxes = predictions["boxes"].to("cpu").numpy()
92 | predict_classes = predictions["labels"].to("cpu").numpy()
93 | predict_scores = predictions["scores"].to("cpu").numpy()
94 |
95 | if len(predict_boxes) == 0:
96 | print("No objects detected!")
97 |
98 | plot_img = draw_objs(original_img,
99 | predict_boxes,
100 | predict_classes,
101 | predict_scores,
102 | category_index=category_index,
103 | box_thresh=0.5,
104 | line_thickness=3,
105 | font='arial.ttf',
106 | font_size=20)
107 | plt.imshow(plot_img)
108 | plt.show()
109 | # save the prediction result image
110 | plot_img.save("test_result.jpg")
111 |
112 |
113 | if __name__ == '__main__':
114 | main()
115 |
--------------------------------------------------------------------------------
/faster_rcnn/record_mAP.txt:
--------------------------------------------------------------------------------
1 | COCO results:
2 | Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.526
3 | Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.804
4 | Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.586
5 | Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.211
6 | Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.403
7 | Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.580
8 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.454
9 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.639
10 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.646
11 | Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.347
12 | Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.540
13 | Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.693
14 |
15 | mAP(IoU=0.5) for each category:
16 | aeroplane : 0.8759546352558178
17 | bicycle : 0.8554609242543677
18 | bird : 0.8434943725365999
19 | boat : 0.6753024837855667
20 | bottle : 0.7185899054232459
21 | bus : 0.8691082170432654
22 | car : 0.8771002682431779
23 | cat : 0.9169138943375639
24 | chair : 0.6403466317122392
25 | cow : 0.8285552434280278
26 | diningtable : 0.6437938565684241
27 | dog : 0.8745793980119227
28 | horse : 0.8718238708874728
29 | motorbike : 0.8910672301923952
30 | person : 0.9047338725598096
31 | pottedplant : 0.5808810399193133
32 | sheep : 0.86045368568359
33 | sofa : 0.7239390963388067
34 | train : 0.8652277764020805
35 | tvmonitor : 0.7683550206571649
--------------------------------------------------------------------------------
/faster_rcnn/requirements.txt:
--------------------------------------------------------------------------------
1 | lxml
2 | matplotlib
3 | numpy
4 | tqdm
5 | torch==1.7.1
6 | torchvision==0.8.2
7 | pycocotools
8 | Pillow
9 |
--------------------------------------------------------------------------------
/faster_rcnn/split_data.py:
--------------------------------------------------------------------------------
1 | import os
2 | import random
3 |
4 |
5 | def main():
6 | random.seed(0) # set the random seed so the split is reproducible
7 |
8 | files_path = "./VOCdevkit/VOC2012/Annotations"
9 | assert os.path.exists(files_path), "path: '{}' does not exist.".format(files_path)
10 |
11 | val_rate = 0.5
12 |
13 | files_name = sorted([file.split(".")[0] for file in os.listdir(files_path)])
14 | files_num = len(files_name)
15 | val_index = random.sample(range(0, files_num), k=int(files_num*val_rate))
16 | train_files = []
17 | val_files = []
18 | for index, file_name in enumerate(files_name):
19 | if index in val_index:
20 | val_files.append(file_name)
21 | else:
22 | train_files.append(file_name)
23 |
24 | try:
25 | train_f = open("train.txt", "x")
26 | eval_f = open("val.txt", "x")
27 | train_f.write("\n".join(train_files))
28 | eval_f.write("\n".join(val_files))
29 | except FileExistsError as e:
30 | print(e)
31 | exit(1)
32 |
33 |
34 | if __name__ == '__main__':
35 | main()
36 |
--------------------------------------------------------------------------------
/faster_rcnn/train_mobilenetv2.py:
--------------------------------------------------------------------------------
1 | import os
2 | import datetime
3 |
4 | import torch
5 | import torchvision
6 |
7 | import transforms
8 | from network_files import FasterRCNN, AnchorsGenerator
9 | from backbone import MobileNetV2, vgg
10 | from my_dataset import VOCDataSet
11 | from train_utils import GroupedBatchSampler, create_aspect_ratio_groups
12 | from train_utils import train_eval_utils as utils
13 |
14 |
15 | def create_model(num_classes):
16 | # https://download.pytorch.org/models/vgg16-397923af.pth
17 | # to use vgg16 instead, download the corresponding pretrained weights, uncomment the lines below, and comment out the two mobilenetv2 lines
18 | # vgg_feature = vgg(model_name="vgg16", weights_path="./backbone/vgg16.pth").features
19 | # backbone = torch.nn.Sequential(*list(vgg_feature._modules.values())[:-1]) # remove the last MaxPool layer in features
20 | # backbone.out_channels = 512
21 |
22 | # https://download.pytorch.org/models/mobilenet_v2-b0353104.pth
23 | backbone = MobileNetV2(weights_path="./backbone/mobilenet_v2.pth").features
24 | backbone.out_channels = 1280 # number of output feature-map channels of the backbone
25 |
26 | anchor_generator = AnchorsGenerator(sizes=((32, 64, 128, 256, 512),),
27 | aspect_ratios=((0.5, 1.0, 2.0),))
28 |
29 | roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'], # feature maps on which roi pooling is performed
30 | output_size=[7, 7], # output size of the roi pooling feature map
31 | sampling_ratio=2) # sampling ratio
32 |
33 | model = FasterRCNN(backbone=backbone,
34 | num_classes=num_classes,
35 | rpn_anchor_generator=anchor_generator,
36 | box_roi_pool=roi_pooler)
37 |
38 | return model
39 |
40 |
41 | def main():
42 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
43 | print("Using {} device for training.".format(device.type))
44 |
45 | # file used to save coco_info
46 | results_file = "results{}.txt".format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
47 |
48 | # create the folder for saving weights if it does not exist
49 | if not os.path.exists("save_weights"):
50 | os.makedirs("save_weights")
51 |
52 | data_transform = {
53 | "train": transforms.Compose([transforms.ToTensor(),
54 | transforms.RandomHorizontalFlip(0.5)]),
55 | "val": transforms.Compose([transforms.ToTensor()])
56 | }
57 |
58 | VOC_root = "./" # VOCdevkit
59 | aspect_ratio_group_factor = 3
60 | batch_size = 8
61 | amp = False # whether to use mixed precision training (requires GPU support)
62 |
63 | # check voc root
64 | if os.path.exists(os.path.join(VOC_root, "VOCdevkit")) is False:
65 | raise FileNotFoundError("VOCdevkit does not exist in path: '{}'.".format(VOC_root))
66 |
67 | # load train data set
68 | # VOCdevkit -> VOC2012 -> ImageSets -> Main -> train.txt
69 | train_dataset = VOCDataSet(VOC_root, "2012", data_transform["train"], "train.txt")
70 | train_sampler = None
71 |
72 | # whether to sample images with similar aspect ratios into the same batch
73 | # enabling this reduces the GPU memory needed for training; it is used by default
74 | if aspect_ratio_group_factor >= 0:
75 | train_sampler = torch.utils.data.RandomSampler(train_dataset)
76 | # compute, for every image, the index of the aspect-ratio bin it falls into
77 | group_ids = create_aspect_ratio_groups(train_dataset, k=aspect_ratio_group_factor)
78 | # images in each batch are drawn from the same aspect-ratio bin
79 | train_batch_sampler = GroupedBatchSampler(train_sampler, group_ids, batch_size)
80 |
81 | nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers
82 | print('Using %g dataloader workers' % nw)
83 |
84 | # note: collate_fn is custom here, because each sample contains both image and targets and cannot be batched with the default method
85 | if train_sampler:
86 | # when sampling images by aspect ratio, the dataloader needs to use batch_sampler
87 | train_data_loader = torch.utils.data.DataLoader(train_dataset,
88 | batch_sampler=train_batch_sampler,
89 | pin_memory=True,
90 | num_workers=nw,
91 | collate_fn=train_dataset.collate_fn)
92 | else:
93 | train_data_loader = torch.utils.data.DataLoader(train_dataset,
94 | batch_size=batch_size,
95 | shuffle=True,
96 | pin_memory=True,
97 | num_workers=nw,
98 | collate_fn=train_dataset.collate_fn)
99 |
100 | # load validation data set
101 | # VOCdevkit -> VOC2012 -> ImageSets -> Main -> val.txt
102 | val_dataset = VOCDataSet(VOC_root, "2012", data_transform["val"], "val.txt")
103 | val_data_loader = torch.utils.data.DataLoader(val_dataset,
104 | batch_size=1,
105 | shuffle=False,
106 | pin_memory=True,
107 | num_workers=nw,
108 | collate_fn=val_dataset.collate_fn)
109 |
110 | # create the model; num_classes equals background + 20 classes
111 | model = create_model(num_classes=21)
112 | # print(model)
113 |
114 | model.to(device)
115 |
116 | scaler = torch.cuda.amp.GradScaler() if amp else None
117 |
118 | train_loss = []
119 | learning_rate = []
120 | val_map = []
121 |
122 | # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
123 | # first, freeze the backbone and train for 5 epochs #
124 | # freeze the feature-extraction backbone weights and train the RPN and prediction heads first #
125 | # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
126 | for param in model.backbone.parameters():
127 | param.requires_grad = False
128 |
129 | # define optimizer
130 | params = [p for p in model.parameters() if p.requires_grad]
131 | optimizer = torch.optim.SGD(params, lr=0.005,
132 | momentum=0.9, weight_decay=0.0005)
133 |
134 | init_epochs = 5
135 | for epoch in range(init_epochs):
136 | # train for one epoch, printing every 10 iterations
137 | mean_loss, lr = utils.train_one_epoch(model, optimizer, train_data_loader,
138 | device, epoch, print_freq=50,
139 | warmup=True, scaler=scaler)
140 | train_loss.append(mean_loss.item())
141 | learning_rate.append(lr)
142 |
143 | # evaluate on the test dataset
144 | coco_info = utils.evaluate(model, val_data_loader, device=device)
145 |
146 | # write into txt
147 | with open(results_file, "a") as f:
148 | # the written data include the coco metrics plus loss and learning rate
149 | result_info = [f"{i:.4f}" for i in coco_info + [mean_loss.item()]] + [f"{lr:.6f}"]
150 | txt = "epoch:{} {}".format(epoch, ' '.join(result_info))
151 | f.write(txt + "\n")
152 |
153 | val_map.append(coco_info[1]) # pascal mAP
154 |
155 | torch.save(model.state_dict(), "./save_weights/pretrain.pth")
156 |
157 | # # # # # # # # # # # # # # # # # # # # # # # # # # # #
158 | # second, unfreeze the backbone and train the whole network #
159 | # unfreeze the feature-extraction backbone weights, then train the full network #
160 | # # # # # # # # # # # # # # # # # # # # # # # # # # # #
161 |
162 | # keep the lower layers of the backbone frozen
163 | for name, parameter in model.backbone.named_parameters():
164 | split_name = name.split(".")[0]
165 | if split_name in ["0", "1", "2", "3"]:
166 | parameter.requires_grad = False
167 | else:
168 | parameter.requires_grad = True
169 |
170 | # define optimizer
171 | params = [p for p in model.parameters() if p.requires_grad]
172 | optimizer = torch.optim.SGD(params, lr=0.005,
173 | momentum=0.9, weight_decay=0.0005)
174 | # learning rate scheduler
175 | lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
176 | step_size=3,
177 | gamma=0.33)
178 | num_epochs = 20
179 | for epoch in range(init_epochs, num_epochs+init_epochs, 1):
180 | # train for one epoch, printing every 50 iterations
181 | mean_loss, lr = utils.train_one_epoch(model, optimizer, train_data_loader,
182 | device, epoch, print_freq=50,
183 | warmup=True, scaler=scaler)
184 | train_loss.append(mean_loss.item())
185 | learning_rate.append(lr)
186 |
187 | # update the learning rate
188 | lr_scheduler.step()
189 |
190 | # evaluate on the test dataset
191 | coco_info = utils.evaluate(model, val_data_loader, device=device)
192 |
193 | # write into txt
194 | with open(results_file, "a") as f:
195 | # the written data include the coco metrics plus loss and learning rate
196 | result_info = [f"{i:.4f}" for i in coco_info + [mean_loss.item()]] + [f"{lr:.6f}"]
197 | txt = "epoch:{} {}".format(epoch, ' '.join(result_info))
198 | f.write(txt + "\n")
199 |
200 | val_map.append(coco_info[1]) # pascal mAP
201 |
202 | # save weights
203 | # only keep the weights of the last 5 epochs
204 | if epoch in range(num_epochs+init_epochs)[-5:]:
205 | save_files = {
206 | 'model': model.state_dict(),
207 | 'optimizer': optimizer.state_dict(),
208 | 'lr_scheduler': lr_scheduler.state_dict(),
209 | 'epoch': epoch}
210 | torch.save(save_files, "./save_weights/mobile-model-{}.pth".format(epoch))
211 |
212 | # plot loss and lr curve
213 | if len(train_loss) != 0 and len(learning_rate) != 0:
214 | from plot_curve import plot_loss_and_lr
215 | plot_loss_and_lr(train_loss, learning_rate)
216 |
217 | # plot mAP curve
218 | if len(val_map) != 0:
219 | from plot_curve import plot_map
220 | plot_map(val_map)
221 |
222 |
223 | if __name__ == "__main__":
224 | main()
225 |
--------------------------------------------------------------------------------
/faster_rcnn/train_res50_fpn.py:
--------------------------------------------------------------------------------
1 | import os
2 | import datetime
3 |
4 | import torch
5 |
6 | import transforms
7 | from network_files import FasterRCNN, FastRCNNPredictor
8 | from backbone import resnet50_fpn_backbone
9 | from cityscrayp import VOCDataSet
10 | from train_utils import GroupedBatchSampler, create_aspect_ratio_groups
11 | from train_utils import train_eval_utils as utils
12 |
13 |
14 | def create_model(num_classes, load_pretrain_weights=True):
15 | # note: the backbone uses FrozenBatchNorm2d by default, i.e. the bn parameters are not updated
16 | # this avoids worse results caused by a too-small batch_size (if GPU memory is limited, keep the default FrozenBatchNorm2d)
17 | # if GPU memory is large enough for a big batch_size, norm_layer can be set to the regular BatchNorm2d
18 | # trainable_layers covers ['layer4', 'layer3', 'layer2', 'layer1', 'conv1']; 5 means all layers are trained
19 | # resnet50 imagenet weights url: https://download.pytorch.org/models/resnet50-0676ba61.pth
20 | backbone = resnet50_fpn_backbone(pretrain_path="/home/lcl_d/wuwentao/detection/maskrcnn/pre_model/resnet50.pth",
21 | norm_layer=torch.nn.BatchNorm2d,
22 | trainable_layers=3)
23 | # when training on your own dataset, do not change the 91 here; change the num_classes argument passed in instead
24 | model = FasterRCNN(backbone=backbone, num_classes=91)
25 |
26 | if load_pretrain_weights:
27 | # load pretrained model weights
28 | # https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
29 | weights_dict = torch.load("/home/lcl_d/wuwentao/detection/maskrcnn_vehiclemae_image_v3/pytorch_object_detection/faster_rcnn/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth", map_location='cpu')
30 | missing_keys, unexpected_keys = model.load_state_dict(weights_dict, strict=False)
31 | if len(missing_keys) != 0 or len(unexpected_keys) != 0:
32 | print("missing_keys: ", missing_keys)
33 | print("unexpected_keys: ", unexpected_keys)
34 |
35 | # get number of input features for the classifier
36 | in_features = model.roi_heads.box_predictor.cls_score.in_features
37 | # replace the pre-trained head with a new one
38 | model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
39 |
40 | return model
41 |
42 |
43 | def main(args):
44 | device = torch.device(args.device if torch.cuda.is_available() else "cpu")
45 | print("Using {} device for training.".format(device.type))
46 |
47 | # file used to save coco_info
48 | results_file = "results{}.txt".format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
49 |
50 | data_transform = {
51 | "train": transforms.Compose([transforms.ToTensor(),
52 | transforms.RandomHorizontalFlip(0.5)]),
53 | "val": transforms.Compose([transforms.ToTensor()])
54 | }
55 |
56 | VOC_root = args.data_path
57 | # check voc root
58 | #if os.path.exists(os.path.join(VOC_root, "VOCdevkit")) is False:
59 | # raise FileNotFoundError("VOCdevkit does not exist in path: '{}'.".format(VOC_root))
60 |
61 | # load train data set
62 | # VOCdevkit -> VOC2012 -> ImageSets -> Main -> train.txt
63 | train_dataset = VOCDataSet(VOC_root,'train', data_transform["train"], "trainImages.txt")
64 | train_sampler = None
65 |
66 | # whether to sample images with similar aspect ratios into the same batch
67 | # enabling this reduces the GPU memory needed for training; it is used by default
68 | if args.aspect_ratio_group_factor >= 0:
69 | train_sampler = torch.utils.data.RandomSampler(train_dataset)
70 | # compute, for every image, the index of the aspect-ratio bin it falls into
71 | group_ids = create_aspect_ratio_groups(train_dataset, k=args.aspect_ratio_group_factor)
72 | # images in each batch are drawn from the same aspect-ratio bin
73 | train_batch_sampler = GroupedBatchSampler(train_sampler, group_ids, args.batch_size)
74 |
75 | # note: collate_fn is custom here, because each sample contains both image and targets and cannot be batched with the default method
76 | batch_size = args.batch_size
77 | nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers
78 | print('Using %g dataloader workers' % nw)
79 | if train_sampler:
80 | # when sampling images by aspect ratio, the dataloader needs to use batch_sampler
81 | train_data_loader = torch.utils.data.DataLoader(train_dataset,
82 | batch_sampler=train_batch_sampler,
83 | pin_memory=True,
84 | num_workers=nw,
85 | collate_fn=train_dataset.collate_fn)
86 | else:
87 | train_data_loader = torch.utils.data.DataLoader(train_dataset,
88 | batch_size=batch_size,
89 | shuffle=True,
90 | pin_memory=True,
91 | num_workers=nw,
92 | collate_fn=train_dataset.collate_fn)
93 |
94 | # load validation data set
95 | # VOCdevkit -> VOC2012 -> ImageSets -> Main -> val.txt
96 | val_dataset = VOCDataSet(VOC_root,'val', data_transform["val"], "valImages.txt")
97 | val_data_set_loader = torch.utils.data.DataLoader(val_dataset,
98 | batch_size=1,
99 | shuffle=False,
100 | pin_memory=True,
101 | num_workers=nw,
102 | collate_fn=val_dataset.collate_fn)
103 |
104 | # create the model; num_classes equals background + the object classes
105 | model = create_model(num_classes=args.num_classes + 1)
106 | # print(model)
107 |
108 | model.to(device)
109 |
110 | # define optimizer
111 | params = [p for p in model.parameters() if p.requires_grad]
112 | optimizer = torch.optim.SGD(params,
113 | lr=args.lr,
114 | momentum=args.momentum,
115 | weight_decay=args.weight_decay)
116 |
117 | scaler = torch.cuda.amp.GradScaler() if args.amp else None
118 |
119 | # learning rate scheduler
120 | lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
121 | step_size=3,
122 | gamma=0.33)
123 |
124 | # if a checkpoint saved by a previous run is specified, resume training from it
125 | if args.resume != "":
126 | checkpoint = torch.load(args.resume, map_location='cpu')
127 | model.load_state_dict(checkpoint['model'])
128 | optimizer.load_state_dict(checkpoint['optimizer'])
129 | lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
130 | args.start_epoch = checkpoint['epoch'] + 1
131 | if args.amp and "scaler" in checkpoint:
132 | scaler.load_state_dict(checkpoint["scaler"])
133 | print("resuming training from epoch {}...".format(args.start_epoch))
134 |
135 | train_loss = []
136 | learning_rate = []
137 | val_map = []
138 |
139 | for epoch in range(args.start_epoch, args.epochs):
140 | # train for one epoch, printing every 10 iterations
141 | mean_loss, lr = utils.train_one_epoch(model, optimizer, train_data_loader,
142 | device=device, epoch=epoch,
143 | print_freq=50, warmup=True,
144 | scaler=scaler)
145 | train_loss.append(mean_loss.item())
146 | learning_rate.append(lr)
147 |
148 | # update the learning rate
149 | lr_scheduler.step()
150 |
151 | # evaluate on the test dataset
152 | coco_info = utils.evaluate(model, val_data_set_loader, device=device)
153 |
154 | # write into txt
155 | with open(results_file, "a") as f:
156 | # the written data include the coco metrics plus loss and learning rate
157 | result_info = [f"{i:.4f}" for i in coco_info + [mean_loss.item()]] + [f"{lr:.6f}"]
158 | txt = "epoch:{} {}".format(epoch, ' '.join(result_info))
159 | f.write(txt + "\n")
160 |
161 | val_map.append(coco_info[1]) # pascal mAP
162 |
163 | # save weights
164 | save_files = {
165 | 'model': model.state_dict(),
166 | 'optimizer': optimizer.state_dict(),
167 | 'lr_scheduler': lr_scheduler.state_dict(),
168 | 'epoch': epoch}
169 | if args.amp:
170 | save_files["scaler"] = scaler.state_dict()
171 | torch.save(save_files, "./save_weights/resNetFpn-model-{}.pth".format(epoch))
172 |
173 | # plot loss and lr curve
174 | if len(train_loss) != 0 and len(learning_rate) != 0:
175 | from plot_curve import plot_loss_and_lr
176 | plot_loss_and_lr(train_loss, learning_rate)
177 |
178 | # plot mAP curve
179 | if len(val_map) != 0:
180 | from plot_curve import plot_map
181 | plot_map(val_map)
182 |
183 |
184 | if __name__ == "__main__":
185 | import argparse
186 |
187 | parser = argparse.ArgumentParser(
188 | description=__doc__)
189 |
190 | # training device type
191 | parser.add_argument('--device', default='cuda:0', help='device')
192 | # root directory of the training dataset (VOCdevkit)
193 | parser.add_argument('--data-path', default='/home/lcl_d/wuwentao/data/cityscapes/', help='dataset')
194 | # number of detection classes (excluding background)
195 | parser.add_argument('--num-classes', default=4, type=int, help='num_classes')
196 | # directory where results are saved
197 | parser.add_argument('--output-dir', default='./save_weights', help='path where to save')
198 | # to resume training, specify the checkpoint saved by the previous run
199 | parser.add_argument('--resume', default='', type=str, help='resume from checkpoint')
200 | # epoch to resume training from
201 | parser.add_argument('--start_epoch', default=0, type=int, help='start epoch')
202 | # total number of training epochs
203 | parser.add_argument('--epochs', default=31, type=int, metavar='N',
204 | help='number of total epochs to run')
205 | # learning rate
206 | parser.add_argument('--lr', default=0.01, type=float,
207 | help='initial learning rate, 0.02 is the default value for training '
208 | 'on 8 gpus and 2 images_per_gpu')
209 | # momentum parameter for SGD
210 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
211 | help='momentum')
212 | # weight_decay parameter for SGD
213 | parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float,
214 | metavar='W', help='weight decay (default: 1e-4)',
215 | dest='weight_decay')
216 | # training batch size
217 | parser.add_argument('--batch_size', default=8, type=int, metavar='N',
218 | help='batch size when training.')
219 | parser.add_argument('--aspect-ratio-group-factor', default=3, type=int)
220 | # whether to use mixed precision training (requires GPU support for mixed precision)
221 | parser.add_argument("--amp", default=False, help="Use torch.cuda.amp for mixed precision training")
222 |
223 | args = parser.parse_args()
224 | print(args)
225 |
226 | # create the folder for saving weights if it does not exist
227 | if not os.path.exists(args.output_dir):
228 | os.makedirs(args.output_dir)
229 |
230 | main(args)
231 |
--------------------------------------------------------------------------------
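
For reference, a sketch (not part of the repository) of the argument set this script expects for the Cityscapes vehicle subset; the data path is a placeholder and `main(args)` is not actually called here:

```python
# Illustrative sketch: the argument set used for the Cityscapes 4-class vehicle subset.
# Equivalent shell command (paths are placeholders):
#   python train_res50_fpn.py --data-path /path/to/cityscapes/ --num-classes 4 \
#       --batch_size 8 --epochs 31 --lr 0.01
import argparse

args = argparse.Namespace(
    device="cuda:0", data_path="/path/to/cityscapes/", num_classes=4,
    output_dir="./save_weights", resume="", start_epoch=0, epochs=31,
    lr=0.01, momentum=0.9, weight_decay=1e-4, batch_size=8,
    aspect_ratio_group_factor=3, amp=False,
)
# main(args) would then build a 5-way detector: 4 vehicle classes + 1 background.
```
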
/faster_rcnn/train_utils/__init__.py:
--------------------------------------------------------------------------------
1 | from .group_by_aspect_ratio import GroupedBatchSampler, create_aspect_ratio_groups
2 | from .distributed_utils import init_distributed_mode, save_on_master, mkdir
3 | from .coco_utils import get_coco_api_from_dataset
4 | from .coco_eval import CocoEvaluator
5 |
--------------------------------------------------------------------------------
/faster_rcnn/train_utils/coco_utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torchvision
3 | import torch.utils.data
4 | from pycocotools.coco import COCO
5 |
6 |
7 | def convert_to_coco_api(ds):
8 | coco_ds = COCO()
9 | # annotation IDs need to start at 1, not 0
10 | ann_id = 1
11 | dataset = {'images': [], 'categories': [], 'annotations': []}
12 | categories = set()
13 | for img_idx in range(len(ds)):
14 | # find better way to get target
15 | hw, targets = ds.coco_index(img_idx)
16 | image_id = targets["image_id"].item()
17 | img_dict = {}
18 | img_dict['id'] = image_id
19 | img_dict['height'] = hw[0]
20 | img_dict['width'] = hw[1]
21 | dataset['images'].append(img_dict)
22 | bboxes = targets["boxes"]
23 | bboxes[:, 2:] -= bboxes[:, :2]
24 | bboxes = bboxes.tolist()
25 | labels = targets['labels'].tolist()
26 | areas = targets['area'].tolist()
27 | iscrowd = targets['iscrowd'].tolist()
28 | num_objs = len(bboxes)
29 | for i in range(num_objs):
30 | ann = {}
31 | ann['image_id'] = image_id
32 | ann['bbox'] = bboxes[i]
33 | ann['category_id'] = labels[i]
34 | categories.add(labels[i])
35 | ann['area'] = areas[i]
36 | ann['iscrowd'] = iscrowd[i]
37 | ann['id'] = ann_id
38 | dataset['annotations'].append(ann)
39 | ann_id += 1
40 | dataset['categories'] = [{'id': i} for i in sorted(categories)]
41 | coco_ds.dataset = dataset
42 | coco_ds.createIndex()
43 | return coco_ds
44 |
45 |
46 | def get_coco_api_from_dataset(dataset):
47 | for _ in range(10):
48 | if isinstance(dataset, torchvision.datasets.CocoDetection):
49 | break
50 | if isinstance(dataset, torch.utils.data.Subset):
51 | dataset = dataset.dataset
52 | if isinstance(dataset, torchvision.datasets.CocoDetection):
53 | return dataset.coco
54 | return convert_to_coco_api(dataset)
55 |
--------------------------------------------------------------------------------
/faster_rcnn/train_utils/group_by_aspect_ratio.py:
--------------------------------------------------------------------------------
1 | import bisect
2 | from collections import defaultdict
3 | import copy
4 | from itertools import repeat, chain
5 | import math
6 | import numpy as np
7 |
8 | import torch
9 | import torch.utils.data
10 | from torch.utils.data.sampler import BatchSampler, Sampler
11 | from torch.utils.model_zoo import tqdm
12 | import torchvision
13 |
14 | from PIL import Image
15 |
16 |
17 | def _repeat_to_at_least(iterable, n):
18 | repeat_times = math.ceil(n / len(iterable))
19 | repeated = chain.from_iterable(repeat(iterable, repeat_times))
20 | return list(repeated)
21 |
22 |
23 | class GroupedBatchSampler(BatchSampler):
24 | """
25 | Wraps another sampler to yield a mini-batch of indices.
26 | It enforces that the batch only contain elements from the same group.
27 | It also tries to provide mini-batches which follows an ordering which is
28 | as close as possible to the ordering from the original sampler.
29 | Arguments:
30 | sampler (Sampler): Base sampler.
31 | group_ids (list[int]): If the sampler produces indices in range [0, N),
32 | `group_ids` must be a list of `N` ints which contains the group id of each sample.
33 | The group ids must be a continuous set of integers starting from
34 | 0, i.e. they must be in the range [0, num_groups).
35 | batch_size (int): Size of mini-batch.
36 | """
37 | def __init__(self, sampler, group_ids, batch_size):
38 | if not isinstance(sampler, Sampler):
39 | raise ValueError(
40 | "sampler should be an instance of "
41 | "torch.utils.data.Sampler, but got sampler={}".format(sampler)
42 | )
43 | self.sampler = sampler
44 | self.group_ids = group_ids
45 | self.batch_size = batch_size
46 |
47 | def __iter__(self):
48 | buffer_per_group = defaultdict(list)
49 | samples_per_group = defaultdict(list)
50 |
51 | num_batches = 0
52 | for idx in self.sampler:
53 | group_id = self.group_ids[idx]
54 | buffer_per_group[group_id].append(idx)
55 | samples_per_group[group_id].append(idx)
56 | if len(buffer_per_group[group_id]) == self.batch_size:
57 | yield buffer_per_group[group_id]
58 | num_batches += 1
59 | del buffer_per_group[group_id]
60 | assert len(buffer_per_group[group_id]) < self.batch_size
61 |
62 | # now we have run out of elements that satisfy
63 | # the group criteria, let's return the remaining
64 | # elements so that the size of the sampler is
65 | # deterministic
66 | expected_num_batches = len(self)
67 | num_remaining = expected_num_batches - num_batches
68 | if num_remaining > 0:
69 | # for the remaining batches, take first the buffers with largest number
70 | # of elements
71 | for group_id, _ in sorted(buffer_per_group.items(),
72 | key=lambda x: len(x[1]), reverse=True):
73 | remaining = self.batch_size - len(buffer_per_group[group_id])
74 | samples_from_group_id = _repeat_to_at_least(samples_per_group[group_id], remaining)
75 | buffer_per_group[group_id].extend(samples_from_group_id[:remaining])
76 | assert len(buffer_per_group[group_id]) == self.batch_size
77 | yield buffer_per_group[group_id]
78 | num_remaining -= 1
79 | if num_remaining == 0:
80 | break
81 | assert num_remaining == 0
82 |
83 | def __len__(self):
84 | return len(self.sampler) // self.batch_size
85 |
86 |
87 | def _compute_aspect_ratios_slow(dataset, indices=None):
88 | print("Your dataset doesn't support the fast path for "
89 | "computing the aspect ratios, so will iterate over "
90 | "the full dataset and load every image instead. "
91 | "This might take some time...")
92 | if indices is None:
93 | indices = range(len(dataset))
94 |
95 | class SubsetSampler(Sampler):
96 | def __init__(self, indices):
97 | self.indices = indices
98 |
99 | def __iter__(self):
100 | return iter(self.indices)
101 |
102 | def __len__(self):
103 | return len(self.indices)
104 |
105 | sampler = SubsetSampler(indices)
106 | data_loader = torch.utils.data.DataLoader(
107 | dataset, batch_size=1, sampler=sampler,
108 | num_workers=14, # you might want to increase it for faster processing
109 | collate_fn=lambda x: x[0])
110 | aspect_ratios = []
111 | with tqdm(total=len(dataset)) as pbar:
112 | for _i, (img, _) in enumerate(data_loader):
113 | pbar.update(1)
114 | height, width = img.shape[-2:]
115 | aspect_ratio = float(width) / float(height)
116 | aspect_ratios.append(aspect_ratio)
117 | return aspect_ratios
118 |
119 |
120 | def _compute_aspect_ratios_custom_dataset(dataset, indices=None):
121 | if indices is None:
122 | indices = range(len(dataset))
123 | aspect_ratios = []
124 | for i in indices:
125 | height, width = dataset.get_height_and_width(i)
126 | aspect_ratio = float(width) / float(height)
127 | aspect_ratios.append(aspect_ratio)
128 | return aspect_ratios
129 |
130 |
131 | def _compute_aspect_ratios_coco_dataset(dataset, indices=None):
132 | if indices is None:
133 | indices = range(len(dataset))
134 | aspect_ratios = []
135 | for i in indices:
136 | img_info = dataset.coco.imgs[dataset.ids[i]]
137 | aspect_ratio = float(img_info["width"]) / float(img_info["height"])
138 | aspect_ratios.append(aspect_ratio)
139 | return aspect_ratios
140 |
141 |
142 | def _compute_aspect_ratios_voc_dataset(dataset, indices=None):
143 | if indices is None:
144 | indices = range(len(dataset))
145 | aspect_ratios = []
146 | for i in indices:
147 | # this doesn't load the data into memory, because PIL loads it lazily
148 | width, height = Image.open(dataset.images[i]).size
149 | aspect_ratio = float(width) / float(height)
150 | aspect_ratios.append(aspect_ratio)
151 | return aspect_ratios
152 |
153 |
154 | def _compute_aspect_ratios_subset_dataset(dataset, indices=None):
155 | if indices is None:
156 | indices = range(len(dataset))
157 |
158 | ds_indices = [dataset.indices[i] for i in indices]
159 | return compute_aspect_ratios(dataset.dataset, ds_indices)
160 |
161 |
162 | def compute_aspect_ratios(dataset, indices=None):
163 | if hasattr(dataset, "get_height_and_width"):
164 | return _compute_aspect_ratios_custom_dataset(dataset, indices)
165 |
166 | if isinstance(dataset, torchvision.datasets.CocoDetection):
167 | return _compute_aspect_ratios_coco_dataset(dataset, indices)
168 |
169 | if isinstance(dataset, torchvision.datasets.VOCDetection):
170 | return _compute_aspect_ratios_voc_dataset(dataset, indices)
171 |
172 | if isinstance(dataset, torch.utils.data.Subset):
173 | return _compute_aspect_ratios_subset_dataset(dataset, indices)
174 |
175 | # slow path
176 | return _compute_aspect_ratios_slow(dataset, indices)
177 |
178 |
179 | def _quantize(x, bins):
180 | bins = copy.deepcopy(bins)
181 | bins = sorted(bins)
182 | # bisect_right: find the index at which y would be inserted into the sorted bins, to the right of any equal values
183 | quantized = list(map(lambda y: bisect.bisect_right(bins, y), x))
184 | return quantized
185 |
186 |
187 | def create_aspect_ratio_groups(dataset, k=0):
188 | # compute the width/height ratio of every image in the dataset
189 | aspect_ratios = compute_aspect_ratios(dataset)
190 | # divide the [0.5, 2] interval into 2*k log-spaced parts (2k+1 points, 2k sub-intervals)
191 | bins = (2 ** np.linspace(-1, 1, 2 * k + 1)).tolist() if k > 0 else [1.0]
192 |
193 | # compute, for every image ratio, the index of the bin it falls into
194 | groups = _quantize(aspect_ratios, bins)
195 | # count number of elements per group
196 | # count the number of images per bin
197 | counts = np.unique(groups, return_counts=True)[1]
198 | fbins = [0] + bins + [np.inf]
199 | print("Using {} as bins for aspect ratio quantization".format(fbins))
200 | print("Count of instances per bin: {}".format(counts))
201 | return groups
202 |
--------------------------------------------------------------------------------
/faster_rcnn/train_utils/train_eval_utils.py:
--------------------------------------------------------------------------------
1 | import math
2 | import sys
3 | import time
4 |
5 | import torch
6 |
7 | from .coco_utils import get_coco_api_from_dataset
8 | from .coco_eval import CocoEvaluator
9 | import train_utils.distributed_utils as utils
10 |
11 |
12 | def train_one_epoch(model, optimizer, data_loader, device, epoch,
13 | print_freq=50, warmup=False, scaler=None):
14 | model.train()
15 | metric_logger = utils.MetricLogger(delimiter=" ")
16 | metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value:.6f}'))
17 | header = 'Epoch: [{}]'.format(epoch)
18 |
19 | lr_scheduler = None
20 |     if epoch == 0 and warmup is True:  # enable warmup during the first epoch (epoch=0), i.e. a gentle warm-up phase
21 | warmup_factor = 1.0 / 1000
22 | warmup_iters = min(1000, len(data_loader) - 1)
23 |
24 | lr_scheduler = utils.warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor)
25 |
26 | mloss = torch.zeros(1).to(device) # mean losses
27 | for i, [images, targets] in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
28 | images = list(image.to(device) for image in images)
29 | targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
30 |
31 |         # mixed-precision (AMP) autocast context; it is a no-op when running on CPU
32 | with torch.cuda.amp.autocast(enabled=scaler is not None):
33 | loss_dict = model(images, targets)
34 | losses = sum(loss for loss in loss_dict.values())
35 |
36 | # reduce losses over all GPUs for logging purpose
37 | loss_dict_reduced = utils.reduce_dict(loss_dict)
38 | losses_reduced = sum(loss for loss in loss_dict_reduced.values())
39 |
40 | loss_value = losses_reduced.item()
41 |         # record the training loss
42 | mloss = (mloss * i + loss_value) / (i + 1) # update mean losses
43 |
44 |         if not math.isfinite(loss_value):  # stop training if the loss becomes non-finite
45 | print("Loss is {}, stopping training".format(loss_value))
46 | print(loss_dict_reduced)
47 | sys.exit(1)
48 |
49 | optimizer.zero_grad()
50 | if scaler is not None:
51 | scaler.scale(losses).backward()
52 | scaler.step(optimizer)
53 | scaler.update()
54 | else:
55 | losses.backward()
56 | optimizer.step()
57 |
58 |         if lr_scheduler is not None:  # step the warmup scheduler during the first epoch
59 | lr_scheduler.step()
60 |
61 | metric_logger.update(loss=losses_reduced, **loss_dict_reduced)
62 | now_lr = optimizer.param_groups[0]["lr"]
63 | metric_logger.update(lr=now_lr)
64 |
65 | return mloss, now_lr
66 |
67 |
68 | @torch.no_grad()
69 | def evaluate(model, data_loader, device):
70 |
71 | cpu_device = torch.device("cpu")
72 | model.eval()
73 | metric_logger = utils.MetricLogger(delimiter=" ")
74 | header = "Test: "
75 |
76 | coco = get_coco_api_from_dataset(data_loader.dataset)
77 | iou_types = _get_iou_types(model)
78 | coco_evaluator = CocoEvaluator(coco, iou_types)
79 |
80 | for image, targets in metric_logger.log_every(data_loader, 100, header):
81 | image = list(img.to(device) for img in image)
82 |
83 |         # skip GPU synchronization when running on CPU
84 | if device != torch.device("cpu"):
85 | torch.cuda.synchronize(device)
86 |
87 | model_time = time.time()
88 | outputs = model(image)
89 |
90 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
91 | model_time = time.time() - model_time
92 |
93 | res = {target["image_id"].item(): output for target, output in zip(targets, outputs)}
94 |
95 | evaluator_time = time.time()
96 | coco_evaluator.update(res)
97 | evaluator_time = time.time() - evaluator_time
98 | metric_logger.update(model_time=model_time, evaluator_time=evaluator_time)
99 |
100 | # gather the stats from all processes
101 | metric_logger.synchronize_between_processes()
102 | print("Averaged stats:", metric_logger)
103 | coco_evaluator.synchronize_between_processes()
104 |
105 | # accumulate predictions from all images
106 | coco_evaluator.accumulate()
107 | coco_evaluator.summarize()
108 |
109 | coco_info = coco_evaluator.coco_eval[iou_types[0]].stats.tolist() # numpy to list
110 |
111 | return coco_info
112 |
113 |
114 | def _get_iou_types(model):
115 | model_without_ddp = model
116 | if isinstance(model, torch.nn.parallel.DistributedDataParallel):
117 | model_without_ddp = model.module
118 | iou_types = ["bbox"]
119 | return iou_types
120 |
--------------------------------------------------------------------------------
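A sketch of how `train_one_epoch` and `evaluate` are typically driven from a training script; `model`, `optimizer`, `lr_scheduler`, the data loaders, `device` and `num_epochs` are placeholders assumed to be set up already:

    from train_utils.train_eval_utils import train_one_epoch, evaluate

    for epoch in range(num_epochs):
        mean_loss, lr = train_one_epoch(model, optimizer, train_data_loader, device,
                                        epoch, print_freq=50, warmup=True)
        lr_scheduler.step()
        coco_info = evaluate(model, val_data_loader, device=device)
        # coco_info is the standard 12-element COCO summary; coco_info[1] is AP@IoU=0.5
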
/faster_rcnn/transforms.py:
--------------------------------------------------------------------------------
1 | import random
2 | from torchvision.transforms import functional as F
3 |
4 |
5 | class Compose(object):
6 |     """Compose several transforms that operate on (image, target) pairs"""
7 | def __init__(self, transforms):
8 | self.transforms = transforms
9 |
10 | def __call__(self, image, target):
11 | for t in self.transforms:
12 | image, target = t(image, target)
13 | return image, target
14 |
15 |
16 | class ToTensor(object):
17 |     """Convert a PIL image into a Tensor"""
18 | def __call__(self, image, target):
19 | image = F.to_tensor(image)
20 | return image, target
21 |
22 |
23 | class RandomHorizontalFlip(object):
24 |     """Randomly flip the image and its bounding boxes horizontally"""
25 | def __init__(self, prob=0.5):
26 | self.prob = prob
27 |
28 | def __call__(self, image, target):
29 | if random.random() < self.prob:
30 | height, width = image.shape[-2:]
31 |             image = image.flip(-1)  # flip the image horizontally
32 | bbox = target["boxes"]
33 | # bbox: xmin, ymin, xmax, ymax
34 |             bbox[:, [0, 2]] = width - bbox[:, [2, 0]]  # mirror the bbox x-coordinates accordingly
35 | target["boxes"] = bbox
36 | return image, target
37 |
--------------------------------------------------------------------------------
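The transforms above act on `(image, target)` pairs rather than on images alone; a minimal pipeline sketch (`pil_image` and `target` are placeholders, and `RandomHorizontalFlip` works on tensors, so it must come after `ToTensor`):

    import transforms

    data_transform = {
        "train": transforms.Compose([transforms.ToTensor(),
                                     transforms.RandomHorizontalFlip(0.5)]),
        "val": transforms.Compose([transforms.ToTensor()]),
    }
    image, target = data_transform["train"](pil_image, target)
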
/faster_rcnn/validation.py:
--------------------------------------------------------------------------------
1 | """
2 | This script loads trained model weights and computes the COCO metrics on the
3 | validation/test set, as well as the per-category mAP (IoU=0.5).
4 | """
5 |
6 | import os
7 | import json
8 |
9 | import torch
10 | from tqdm import tqdm
11 | import numpy as np
12 |
13 | import transforms
14 | from network_files import FasterRCNN
15 | from backbone import resnet50_fpn_backbone
16 | from my_dataset import VOCDataSet
17 | from train_utils import get_coco_api_from_dataset, CocoEvaluator
18 |
19 |
20 | def summarize(self, catId=None):
21 | """
22 | Compute and display summary metrics for evaluation results.
23 |     Note this function can *only* be applied on the default parameter setting
24 | """
25 |
26 | def _summarize(ap=1, iouThr=None, areaRng='all', maxDets=100):
27 | p = self.params
28 | iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'
29 | titleStr = 'Average Precision' if ap == 1 else 'Average Recall'
30 | typeStr = '(AP)' if ap == 1 else '(AR)'
31 | iouStr = '{:0.2f}:{:0.2f}'.format(p.iouThrs[0], p.iouThrs[-1]) \
32 | if iouThr is None else '{:0.2f}'.format(iouThr)
33 |
34 | aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
35 | mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
36 |
37 | if ap == 1:
38 | # dimension of precision: [TxRxKxAxM]
39 | s = self.eval['precision']
40 | # IoU
41 | if iouThr is not None:
42 | t = np.where(iouThr == p.iouThrs)[0]
43 | s = s[t]
44 |
45 | if isinstance(catId, int):
46 | s = s[:, :, catId, aind, mind]
47 | else:
48 | s = s[:, :, :, aind, mind]
49 |
50 | else:
51 | # dimension of recall: [TxKxAxM]
52 | s = self.eval['recall']
53 | if iouThr is not None:
54 | t = np.where(iouThr == p.iouThrs)[0]
55 | s = s[t]
56 |
57 | if isinstance(catId, int):
58 | s = s[:, catId, aind, mind]
59 | else:
60 | s = s[:, :, aind, mind]
61 |
62 | if len(s[s > -1]) == 0:
63 | mean_s = -1
64 | else:
65 | mean_s = np.mean(s[s > -1])
66 |
67 | print_string = iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s)
68 | return mean_s, print_string
69 |
70 | stats, print_list = [0] * 12, [""] * 12
71 | stats[0], print_list[0] = _summarize(1)
72 | stats[1], print_list[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[2])
73 | stats[2], print_list[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[2])
74 | stats[3], print_list[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[2])
75 | stats[4], print_list[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[2])
76 | stats[5], print_list[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[2])
77 | stats[6], print_list[6] = _summarize(0, maxDets=self.params.maxDets[0])
78 | stats[7], print_list[7] = _summarize(0, maxDets=self.params.maxDets[1])
79 | stats[8], print_list[8] = _summarize(0, maxDets=self.params.maxDets[2])
80 | stats[9], print_list[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[2])
81 | stats[10], print_list[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[2])
82 | stats[11], print_list[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[2])
83 |
84 | print_info = "\n".join(print_list)
85 |
86 | if not self.eval:
87 | raise Exception('Please run accumulate() first')
88 |
89 | return stats, print_info
90 |
91 |
92 | def main(parser_data):
93 | device = torch.device(parser_data.device if torch.cuda.is_available() else "cpu")
94 | print("Using {} device training.".format(device.type))
95 |
96 | data_transform = {
97 | "val": transforms.Compose([transforms.ToTensor()])
98 | }
99 |
100 | # read class_indict
101 | label_json_path = './pascal_voc_classes.json'
102 |     assert os.path.exists(label_json_path), "json file {} does not exist.".format(label_json_path)
103 | with open(label_json_path, 'r') as f:
104 | class_dict = json.load(f)
105 |
106 | category_index = {v: k for k, v in class_dict.items()}
107 |
108 | VOC_root = parser_data.data_path
109 | # check voc root
110 | if os.path.exists(os.path.join(VOC_root, "VOCdevkit")) is False:
111 |         raise FileNotFoundError("VOCdevkit does not exist in path: '{}'.".format(VOC_root))
112 |
113 |     # note: collate_fn is custom here, because each sample contains both image and targets and the default batching cannot be used
114 | batch_size = parser_data.batch_size
115 | nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers
116 | print('Using %g dataloader workers' % nw)
117 |
118 | # load validation data set
119 | val_dataset = VOCDataSet(VOC_root, "2012", data_transform["val"], "val.txt")
120 | val_dataset_loader = torch.utils.data.DataLoader(val_dataset,
121 | batch_size=1,
122 | shuffle=False,
123 | num_workers=nw,
124 | pin_memory=True,
125 | collate_fn=val_dataset.collate_fn)
126 |
127 |     # create model; num_classes equals background + 20 classes
128 |     # note: norm_layer must match the one used in the training script
129 | backbone = resnet50_fpn_backbone(norm_layer=torch.nn.BatchNorm2d)
130 | model = FasterRCNN(backbone=backbone, num_classes=parser_data.num_classes + 1)
131 |
132 |     # load your own trained model weights
133 | weights_path = parser_data.weights_path
134 | assert os.path.exists(weights_path), "not found {} file.".format(weights_path)
135 | weights_dict = torch.load(weights_path, map_location='cpu')
136 | weights_dict = weights_dict["model"] if "model" in weights_dict else weights_dict
137 | model.load_state_dict(weights_dict)
138 | # print(model)
139 |
140 | model.to(device)
141 |
142 | # evaluate on the test dataset
143 | coco = get_coco_api_from_dataset(val_dataset)
144 | iou_types = ["bbox"]
145 | coco_evaluator = CocoEvaluator(coco, iou_types)
146 | cpu_device = torch.device("cpu")
147 |
148 | model.eval()
149 | with torch.no_grad():
150 | for image, targets in tqdm(val_dataset_loader, desc="validation..."):
151 |             # move the images to the specified device
152 | image = list(img.to(device) for img in image)
153 |
154 | # inference
155 | outputs = model(image)
156 |
157 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
158 | res = {target["image_id"].item(): output for target, output in zip(targets, outputs)}
159 | coco_evaluator.update(res)
160 |
161 | coco_evaluator.synchronize_between_processes()
162 |
163 | # accumulate predictions from all images
164 | coco_evaluator.accumulate()
165 | coco_evaluator.summarize()
166 |
167 | coco_eval = coco_evaluator.coco_eval["bbox"]
168 | # calculate COCO info for all classes
169 | coco_stats, print_coco = summarize(coco_eval)
170 |
171 | # calculate voc info for every classes(IoU=0.5)
172 | voc_map_info_list = []
173 | for i in range(len(category_index)):
174 | stats, _ = summarize(coco_eval, catId=i)
175 | voc_map_info_list.append(" {:15}: {}".format(category_index[i + 1], stats[1]))
176 |
177 | print_voc = "\n".join(voc_map_info_list)
178 | print(print_voc)
179 |
180 |     # save the validation results to a txt file
181 | with open("record_mAP.txt", "w") as f:
182 | record_lines = ["COCO results:",
183 | print_coco,
184 | "",
185 | "mAP(IoU=0.5) for each category:",
186 | print_voc]
187 | f.write("\n".join(record_lines))
188 |
189 |
190 | if __name__ == "__main__":
191 | import argparse
192 |
193 | parser = argparse.ArgumentParser(
194 | description=__doc__)
195 |
196 |     # device type to use
197 | parser.add_argument('--device', default='cuda', help='device')
198 |
199 |     # number of object classes to detect
200 |     parser.add_argument('--num-classes', type=int, default=20, help='number of classes')
201 |
202 |     # root directory of the dataset (the folder containing VOCdevkit)
203 | parser.add_argument('--data-path', default='/data/', help='dataset root')
204 |
205 |     # path to the trained weights file
206 | parser.add_argument('--weights-path', default='./save_weights/model.pth', type=str, help='training weights')
207 |
208 | # batch size
209 | parser.add_argument('--batch_size', default=1, type=int, metavar='N',
210 | help='batch size when validation.')
211 |
212 | args = parser.parse_args()
213 |
214 | main(args)
215 |
--------------------------------------------------------------------------------
/figures/VehicleMAE_Det.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/VehicleMAE_Det.jpg
--------------------------------------------------------------------------------
/figures/detection_result.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/detection_result.jpg
--------------------------------------------------------------------------------
/figures/experimentalresults.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/experimentalresults.jpg
--------------------------------------------------------------------------------
/figures/firstIMG.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/firstIMG.jpg
--------------------------------------------------------------------------------
/figures/proposal_attentionmaps.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/proposal_attentionmaps.jpg
--------------------------------------------------------------------------------
/figures/proposal_attribute.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/proposal_attribute.jpg
--------------------------------------------------------------------------------
/my_dataset_cityscraps.py:
--------------------------------------------------------------------------------
1 | import os
2 | import json
3 |
4 | import torch
5 | from PIL import Image
6 | import torch.utils.data as data
7 | from pycocotools.coco import COCO
8 | from train_utils import coco_remove_images_without_annotations, convert_coco_poly_mask
9 |
10 |
11 | class CityscrapesDetection(data.Dataset):
12 |     """Cityscapes dataset in MS COCO Detection format.
13 |
14 | Args:
15 | root (string): Root directory where images are downloaded to.
16 | dataset (string): train or val.
17 | transforms (callable, optional): A function/transform that takes input sample and its target as entry
18 | and returns a transformed version.
19 | """
20 |
21 | def __init__(self, root, dataset="train", transforms=None):
22 | super(CityscrapesDetection, self).__init__()
23 | assert dataset in ["train", "val"], 'dataset must be in ["train", "val"]'
24 | anno_file = f"instances_{dataset}.json"
25 | assert os.path.exists(root), "file '{}' does not exist.".format(root)
26 | self.img_root = os.path.join(root, f"{dataset}")
27 | assert os.path.exists(self.img_root), "path '{}' does not exist.".format(self.img_root)
28 | self.anno_path = os.path.join(root, "annotations", anno_file)
29 | assert os.path.exists(self.anno_path), "file '{}' does not exist.".format(self.anno_path)
30 |
31 | self.mode = dataset
32 | self.transforms = transforms
33 | self.coco = COCO(self.anno_path)
34 |
35 |         # build the mapping between COCO category ids and class names
36 |         # note: the category ids of the 80 object classes are not contiguous; although there are only 80 classes, the ids still follow the 91-class numbering
37 | data_classes = dict([(v["id"], v["name"]) for k, v in self.coco.cats.items()])
38 | max_index = max(data_classes.keys()) # 90
39 |         # set the names of missing categories to "N/A"
40 | coco_classes = {}
41 | for k in range(1, max_index + 1):
42 | if k in data_classes:
43 | coco_classes[k] = data_classes[k]
44 | else:
45 | coco_classes[k] = "N/A"
46 |
47 | if dataset == "train":
48 | json_str = json.dumps(coco_classes, indent=4)
49 | with open("/data/wuwentao/VehicleDetection/cityscrapes4_indices.json", "w") as f:
50 | f.write(json_str)
51 |
52 | self.coco_classes = coco_classes
53 |
54 | ids = list(sorted(self.coco.imgs.keys()))
55 | if dataset == "train":
56 |             # remove images that have no objects, or whose objects are extremely small
57 | valid_ids = coco_remove_images_without_annotations(self.coco, ids)
58 | self.ids = valid_ids
59 | else:
60 | self.ids = ids
61 |
62 | def parse_targets(self,
63 | img_id: int,
64 | coco_targets: list,
65 | w: int = None,
66 | h: int = None):
67 | assert w > 0
68 | assert h > 0
69 |
70 |         # keep only non-crowd annotations (iscrowd == 0)
71 | anno = [obj for obj in coco_targets if obj['iscrowd'] == 0]
72 |
73 | boxes = [obj["bbox"] for obj in anno]
74 |
75 | # guard against no boxes via resizing
76 | boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
77 | # [xmin, ymin, w, h] -> [xmin, ymin, xmax, ymax]
78 | boxes[:, 2:] += boxes[:, :2]
79 | boxes[:, 0::2].clamp_(min=0, max=w)
80 | boxes[:, 1::2].clamp_(min=0, max=h)
81 |
82 | classes = [obj["category_id"] for obj in anno]
83 | classes = torch.tensor(classes, dtype=torch.int64)
84 |
85 | area = torch.tensor([obj["area"] for obj in anno])
86 | iscrowd = torch.tensor([obj["iscrowd"] for obj in anno])
87 |
88 | segmentations = [obj["segmentation"] for obj in anno]
89 | masks = convert_coco_poly_mask(segmentations, h, w)
90 |
91 |         # keep only valid boxes, i.e. x_max > x_min and y_max > y_min
92 | keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])
93 | boxes = boxes[keep]
94 | classes = classes[keep]
95 | masks = masks[keep]
96 | area = area[keep]
97 | iscrowd = iscrowd[keep]
98 |
99 | target = {}
100 | target["boxes"] = boxes
101 | target["labels"] = classes
102 | target["masks"] = masks
103 | target["image_id"] = torch.tensor([img_id])
104 |
105 | # for conversion to coco api
106 | target["area"] = area
107 | target["iscrowd"] = iscrowd
108 |
109 | return target
110 |
111 | def __getitem__(self, index):
112 | """
113 | Args:
114 | index (int): Index
115 |
116 | Returns:
117 | tuple: Tuple (image, target). target is the object returned by ``coco.loadAnns``.
118 | """
119 | coco = self.coco
120 | img_id = self.ids[index]
121 | ann_ids = coco.getAnnIds(imgIds=img_id)
122 | coco_target = coco.loadAnns(ann_ids)
123 |
124 | path = coco.loadImgs(img_id)[0]['file_name']
125 | path = '/data/wuwentao/data/cityscapes/leftImg8bit/' + path
126 | #img = Image.open(os.path.join(self.img_root, path)).convert('RGB')
127 | img = Image.open(path).convert('RGB')
128 |
129 | w, h = img.size
130 | target = self.parse_targets(img_id, coco_target, w, h)
131 | if self.transforms is not None:
132 | img, target = self.transforms(img, target)
133 |
134 | return img, target
135 |
136 | def __len__(self):
137 | return len(self.ids)
138 |
139 | def get_height_and_width(self, index):
140 | coco = self.coco
141 | img_id = self.ids[index]
142 |
143 | img_info = coco.loadImgs(img_id)[0]
144 | w = img_info["width"]
145 | h = img_info["height"]
146 | return h, w
147 |
148 | @staticmethod
149 | def collate_fn(batch):
150 | return tuple(zip(*batch))
151 |
152 |
153 | if __name__ == '__main__':
154 | train = CityscrapesDetection("/data/wuwentao/data/cityscapes/leftImg8bit/", dataset="train")
155 | print(len(train))
156 | t = train[0]
157 |
--------------------------------------------------------------------------------
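A usage sketch for the dataset class above, with a placeholder root directory that is expected to contain `train/`, `val/` and `annotations/instances_{train,val}.json`. Note that `__init__` and `__getitem__` contain absolute paths from the authors' environment (the class-indices json dump and the `leftImg8bit` image prefix), which would need to be adapted before running elsewhere:

    import torch
    import transforms
    from my_dataset_cityscraps import CityscrapesDetection

    dataset = CityscrapesDetection(root="/path/to/cityscapes_coco", dataset="train",
                                   transforms=transforms.Compose([transforms.ToTensor()]))
    loader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True,
                                         collate_fn=CityscrapesDetection.collate_fn)
    images, targets = next(iter(loader))  # tuple of image tensors, tuple of target dicts
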
/network_files/__init__.py:
--------------------------------------------------------------------------------
1 | from .faster_rcnn_framework import FasterRCNN, FastRCNNPredictor
2 | from .rpn_function import AnchorsGenerator
3 | from .mask_rcnn import MaskRCNN
4 | from .vehiclemaeencode import VehiclemaeEncode
5 |
--------------------------------------------------------------------------------
/network_files/__pycache__/__init__.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/__init__.cpython-38.pyc
--------------------------------------------------------------------------------
/network_files/__pycache__/boxes.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/boxes.cpython-38.pyc
--------------------------------------------------------------------------------
/network_files/__pycache__/det_utils.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/det_utils.cpython-38.pyc
--------------------------------------------------------------------------------
/network_files/__pycache__/faster_rcnn_framework.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/faster_rcnn_framework.cpython-38.pyc
--------------------------------------------------------------------------------
/network_files/__pycache__/image_list.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/image_list.cpython-38.pyc
--------------------------------------------------------------------------------
/network_files/__pycache__/mask_rcnn.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/mask_rcnn.cpython-38.pyc
--------------------------------------------------------------------------------
/network_files/__pycache__/roi_head.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/roi_head.cpython-38.pyc
--------------------------------------------------------------------------------
/network_files/__pycache__/rpn_function.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/rpn_function.cpython-38.pyc
--------------------------------------------------------------------------------
/network_files/__pycache__/transform.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/transform.cpython-38.pyc
--------------------------------------------------------------------------------
/network_files/__pycache__/vehiclemaeencode.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/vehiclemaeencode.cpython-38.pyc
--------------------------------------------------------------------------------
/network_files/boxes.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from typing import Tuple
3 | from torch import Tensor
4 | import torchvision
5 |
6 |
7 | def nms(boxes, scores, iou_threshold):
8 | # type: (Tensor, Tensor, float) -> Tensor
9 | """
10 | Performs non-maximum suppression (NMS) on the boxes according
11 | to their intersection-over-union (IoU).
12 |
13 | NMS iteratively removes lower scoring boxes which have an
14 | IoU greater than iou_threshold with another (higher scoring)
15 | box.
16 |
17 | Parameters
18 | ----------
19 | boxes : Tensor[N, 4])
20 | boxes to perform NMS on. They
21 | are expected to be in (x1, y1, x2, y2) format
22 | scores : Tensor[N]
23 | scores for each one of the boxes
24 | iou_threshold : float
25 | discards all overlapping
26 | boxes with IoU > iou_threshold
27 |
28 | Returns
29 | -------
30 | keep : Tensor
31 | int64 tensor with the indices
32 | of the elements that have been kept
33 | by NMS, sorted in decreasing order of scores
34 | """
35 | return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
36 |
37 |
38 | def batched_nms(boxes, scores, idxs, iou_threshold):
39 | # type: (Tensor, Tensor, Tensor, float) -> Tensor
40 | """
41 | Performs non-maximum suppression in a batched fashion.
42 |
43 | Each index value correspond to a category, and NMS
44 | will not be applied between elements of different categories.
45 |
46 | Parameters
47 | ----------
48 | boxes : Tensor[N, 4]
49 | boxes where NMS will be performed. They
50 | are expected to be in (x1, y1, x2, y2) format
51 | scores : Tensor[N]
52 | scores for each one of the boxes
53 | idxs : Tensor[N]
54 | indices of the categories for each one of the boxes.
55 | iou_threshold : float
56 | discards all overlapping boxes
57 |         with IoU > iou_threshold
58 |
59 | Returns
60 | -------
61 | keep : Tensor
62 | int64 tensor with the indices of
63 | the elements that have been kept by NMS, sorted
64 | in decreasing order of scores
65 | """
66 | if boxes.numel() == 0:
67 | return torch.empty((0,), dtype=torch.int64, device=boxes.device)
68 |
69 | # strategy: in order to perform NMS independently per class.
70 | # we add an offset to all the boxes. The offset is dependent
71 | # only on the class idx, and is large enough so that boxes
72 | # from different classes do not overlap
73 |     # get the largest coordinate value among all boxes (xmin, ymin, xmax, ymax)
74 | max_coordinate = boxes.max()
75 |
76 | # to(): Performs Tensor dtype and/or device conversion
77 |     # generate a large offset for every class / level
78 |     # .to() here only makes the generated tensor's dtype and device match those of boxes
79 | offsets = idxs.to(boxes) * (max_coordinate + 1)
80 |     # after adding the per-class offset, boxes from different classes / levels can no longer overlap
81 | boxes_for_nms = boxes + offsets[:, None]
82 | keep = nms(boxes_for_nms, scores, iou_threshold)
83 | return keep
84 |
85 |
86 | def remove_small_boxes(boxes, min_size):
87 | # type: (Tensor, float) -> Tensor
88 | """
89 |     Remove boxes which contain at least one side smaller than min_size,
90 |     i.e. drop the indices of boxes whose width or height is below the given threshold.
91 | Arguments:
92 | boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format
93 | min_size (float): minimum size
94 |
95 | Returns:
96 | keep (Tensor[K]): indices of the boxes that have both sides
97 | larger than min_size
98 | """
99 |     ws, hs = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]  # widths and heights of the predicted boxes
100 |     # keep = (ws >= min_size) & (hs >= min_size)  # True only when both width and height are at least min_size
101 | keep = torch.logical_and(torch.ge(ws, min_size), torch.ge(hs, min_size))
102 | # nonzero(): Returns a tensor containing the indices of all non-zero elements of input
103 | # keep = keep.nonzero().squeeze(1)
104 | keep = torch.where(keep)[0]
105 | return keep
106 |
107 |
108 | def clip_boxes_to_image(boxes, size):
109 | # type: (Tensor, Tuple[int, int]) -> Tensor
110 | """
111 | Clip boxes so that they lie inside an image of size `size`.
112 |     Clip the predicted boxes so that out-of-range coordinates are moved onto the image border.
113 |
114 | Arguments:
115 | boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format
116 | size (Tuple[height, width]): size of the image
117 |
118 | Returns:
119 | clipped_boxes (Tensor[N, 4])
120 | """
121 | dim = boxes.dim()
122 | boxes_x = boxes[..., 0::2] # x1, x2
123 | boxes_y = boxes[..., 1::2] # y1, y2
124 | height, width = size
125 |
126 | if torchvision._is_tracing():
127 | boxes_x = torch.max(boxes_x, torch.tensor(0, dtype=boxes.dtype, device=boxes.device))
128 | boxes_x = torch.min(boxes_x, torch.tensor(width, dtype=boxes.dtype, device=boxes.device))
129 | boxes_y = torch.max(boxes_y, torch.tensor(0, dtype=boxes.dtype, device=boxes.device))
130 | boxes_y = torch.min(boxes_y, torch.tensor(height, dtype=boxes.dtype, device=boxes.device))
131 | else:
132 |         boxes_x = boxes_x.clamp(min=0, max=width)   # clamp x coordinates to [0, width]
133 |         boxes_y = boxes_y.clamp(min=0, max=height)  # clamp y coordinates to [0, height]
134 |
135 | clipped_boxes = torch.stack((boxes_x, boxes_y), dim=dim)
136 | return clipped_boxes.reshape(boxes.shape)
137 |
138 |
139 | def box_area(boxes):
140 | """
141 | Computes the area of a set of bounding boxes, which are specified by its
142 | (x1, y1, x2, y2) coordinates.
143 |
144 | Arguments:
145 | boxes (Tensor[N, 4]): boxes for which the area will be computed. They
146 | are expected to be in (x1, y1, x2, y2) format
147 |
148 | Returns:
149 | area (Tensor[N]): area for each box
150 | """
151 | return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
152 |
153 |
154 | def box_iou(boxes1, boxes2):
155 | """
156 | Return intersection-over-union (Jaccard index) of boxes.
157 |
158 | Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
159 |
160 | Arguments:
161 | boxes1 (Tensor[N, 4])
162 | boxes2 (Tensor[M, 4])
163 |
164 | Returns:
165 | iou (Tensor[N, M]): the NxM matrix containing the pairwise
166 | IoU values for every element in boxes1 and boxes2
167 | """
168 | area1 = box_area(boxes1)
169 | area2 = box_area(boxes2)
170 |
171 | # When the shapes do not match,
172 | # the shape of the returned output tensor follows the broadcasting rules
173 | lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # left-top [N,M,2]
174 | rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # right-bottom [N,M,2]
175 |
176 | wh = (rb - lt).clamp(min=0) # [N,M,2]
177 | inter = wh[:, :, 0] * wh[:, :, 1] # [N,M]
178 |
179 | iou = inter / (area1[:, None] + area2 - inter)
180 | return iou
181 |
182 |
--------------------------------------------------------------------------------
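A small worked example of the per-class NMS above; the boxes, scores and class indices are made up for illustration:

    import torch
    from network_files import boxes as box_ops

    boxes = torch.tensor([[0., 0., 10., 10.],
                          [1., 1., 11., 11.],
                          [0., 0., 10., 10.],
                          [50., 50., 60., 60.]])
    scores = torch.tensor([0.9, 0.8, 0.7, 0.6])
    idxs = torch.tensor([0, 0, 1, 1])  # class labels; NMS runs independently per class
    keep = box_ops.batched_nms(boxes, scores, idxs, iou_threshold=0.5)
    # keep == tensor([0, 2, 3]): box 1 is suppressed by box 0 (same class, IoU ~ 0.68),
    # while box 2 survives because it belongs to a different class
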
/network_files/image_list.py:
--------------------------------------------------------------------------------
1 | from typing import List, Tuple
2 | from torch import Tensor
3 |
4 |
5 | class ImageList(object):
6 | """
7 | Structure that holds a list of images (of possibly
8 | varying sizes) as a single tensor.
9 | This works by padding the images to the same size,
10 | and storing in a field the original sizes of each image
11 | """
12 |
13 | def __init__(self, tensors, image_sizes):
14 | # type: (Tensor, List[Tuple[int, int]]) -> None
15 | """
16 | Arguments:
17 |             tensors (Tensor): the batched image data after padding
18 |             image_sizes (list[tuple[int, int]]): the original image sizes before padding
19 | """
20 | self.tensors = tensors
21 | self.image_sizes = image_sizes
22 |
23 | def to(self, device):
24 | cast_tensor = self.tensors.to(device)
25 | return ImageList(cast_tensor, self.image_sizes)
26 |
27 |
--------------------------------------------------------------------------------
/network_files/vehiclemaeencode.py:
--------------------------------------------------------------------------------
1 | import math
2 | import collections.abc as container_abcs
3 | from itertools import repeat
4 |
5 | import numpy as np
6 | import torch
7 | import torch.nn as nn
8 | # note: the local PatchEmbed / Block classes defined below shadow the timm versions
9 | from timm.models.vision_transformer import PatchEmbed, Block
10 |
11 | def _ntuple(n):
12 | def parse(x):
13 | if isinstance(x, container_abcs.Iterable):
14 | return x
15 | return tuple(repeat(x, n))
16 | return parse
17 | to_2tuple = _ntuple(2)
18 | def drop_path(x, drop_prob: float = 0., training: bool = False):
19 | """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
20 |
21 | This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
22 | the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
23 | See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
24 | changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
25 | 'survival rate' as the argument.
26 |
27 | """
28 | if drop_prob == 0. or not training:
29 | return x
30 | keep_prob = 1 - drop_prob
31 | shape = (x.shape[0],) + (1,) * (x.ndim - 1) # work with diff dim tensors, not just 2D ConvNets
32 | random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
33 | random_tensor.floor_() # binarize
34 | output = x.div(keep_prob) * random_tensor
35 | return output
36 |
37 | class DropPath(nn.Module):
38 | """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
39 | """
40 | def __init__(self, drop_prob=None):
41 | super(DropPath, self).__init__()
42 | self.drop_prob = drop_prob
43 |
44 | def forward(self, x):
45 | return drop_path(x, self.drop_prob, self.training)
46 |
47 | class Mlp(nn.Module):
48 | def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
49 | super().__init__()
50 | out_features = out_features or in_features
51 | hidden_features = hidden_features or in_features
52 | self.fc1 = nn.Linear(in_features, hidden_features)
53 | self.act = act_layer()
54 | self.fc2 = nn.Linear(hidden_features, out_features)
55 | self.drop = nn.Dropout(drop)
56 |
57 | def forward(self, x):
58 | x = self.fc1(x)
59 | x = self.act(x)
60 | x = self.drop(x)
61 | x = self.fc2(x)
62 | x = self.drop(x)
63 | return x
64 |
65 |
66 | class Attention(nn.Module):
67 | def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0.):
68 | super().__init__()
69 | self.num_heads = num_heads
70 | head_dim = dim // num_heads
71 | # NOTE scale factor was wrong in my original version, can set manually to be compat with prev weights
72 | self.scale = qk_scale or head_dim ** -0.5
73 |
74 | self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
75 | self.attn_drop = nn.Dropout(attn_drop)
76 | self.proj = nn.Linear(dim, dim)
77 | self.proj_drop = nn.Dropout(proj_drop)
78 |
79 | def forward(self, x):
80 | B, N, C = x.shape
81 | qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
82 | q, k, v = qkv[0], qkv[1], qkv[2] # make torchscript happy (cannot use tensor as tuple)
83 |
84 | attn = (q @ k.transpose(-2, -1)) * self.scale
85 | attn = attn.softmax(dim=-1)
86 | attn = self.attn_drop(attn)
87 |
88 | x = (attn @ v).transpose(1, 2).reshape(B, N, C)
89 | x = self.proj(x)
90 | x = self.proj_drop(x)
91 | return x
92 |
93 |
94 | class Block(nn.Module):
95 |
96 | def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0.,
97 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm):
98 | super().__init__()
99 | self.norm1 = norm_layer(dim)
100 | self.attn = Attention(
101 | dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)
102 | # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
103 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
104 | self.norm2 = norm_layer(dim)
105 | mlp_hidden_dim = int(dim * mlp_ratio)
106 | self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
107 |
108 | def forward(self, x):
109 | x = x + self.drop_path(self.attn(self.norm1(x)))
110 | x = x + self.drop_path(self.mlp(self.norm2(x)))
111 | return x
112 |
113 | class PatchEmbed(nn.Module):
114 | def __init__(self, img_size=224, patch_size=16, stride_size=20, in_chans=3, embed_dim=768):
115 | super().__init__()
116 | img_size = to_2tuple(img_size)
117 | patch_size = to_2tuple(patch_size)
118 | stride_size_tuple = to_2tuple(stride_size)
119 | self.num_x = (img_size[1] - patch_size[1]) // stride_size_tuple[1] + 1
120 | self.num_y = (img_size[0] - patch_size[0]) // stride_size_tuple[0] + 1
121 |
122 | num_patches = self.num_x * self.num_y
123 | self.img_size = img_size
124 | self.patch_size = patch_size
125 | self.num_patches = num_patches
126 |
127 | self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=stride_size)
128 | for m in self.modules():
129 | if isinstance(m, nn.Conv2d):
130 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
131 | m.weight.data.normal_(0, math.sqrt(2. / n))
132 | elif isinstance(m, nn.BatchNorm2d):
133 | m.weight.data.fill_(1)
134 | m.bias.data.zero_()
135 | elif isinstance(m, nn.InstanceNorm2d):
136 | m.weight.data.fill_(1)
137 | m.bias.data.zero_()
138 |
139 | def forward(self, x):
140 | B, C, H, W = x.shape
141 |
142 | # FIXME look at relaxing size constraints
143 | assert H == self.img_size[0] and W == self.img_size[1], \
144 | f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})."
145 | x = self.proj(x)
146 |
147 | x = x.flatten(2).transpose(1, 2) # [64, 8, 768]
148 | return x
149 |
150 | class VTBClassifier(nn.Module):
151 | def __init__(self, attr_num, dim=768,num_heads=12, mlp_ratio=4., qkv_bias=True, qk_scale=None, drop_rate=0., attn_drop_rate=0.,
152 | drop_path_rate=0., norm_layer=nn.LayerNorm):#checkpoint-last.pth
153 |
154 | super().__init__()
155 | self.attr_num = attr_num
156 | self.word_embed = nn.Linear(768, dim)
157 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, 1)] # stochastic depth decay rule
158 | self.blocks = nn.ModuleList([
159 | Block(
160 | dim=dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
161 |                 drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer)  # num_heads attention heads; qk_scale=None by default
162 |             for i in range(1)])  # with the latest timm version, qk_scale needs to be commented out
163 |         self.norm = norm_layer(768)  # normalize the encoder output
164 | self.weight_layer = nn.ModuleList([nn.Linear(dim, 1) for i in range(self.attr_num)])
165 | self.bn = nn.BatchNorm1d(self.attr_num)
166 |
167 | self.vis_embed = nn.Parameter(torch.zeros(1, 1, dim))
168 | self.tex_embed = nn.Parameter(torch.zeros(1, 1, dim))
169 |
170 | @torch.no_grad()
171 | def forward(self, features, word_vec, label=None):
172 |
173 | word_embed = self.word_embed(word_vec).expand(features.shape[0], word_vec.shape[0], features.shape[-1])
174 |
175 | tex_embed = word_embed + self.tex_embed
176 | vis_embed = features + self.vis_embed
177 |
178 | x = torch.cat([tex_embed, vis_embed], dim=1)
179 | for blk in self.blocks:
180 | x = blk(x)
181 | x = self.norm(x) #torch.Size([1024, 260, 768])
182 | tex_feature = x[:,:47,:]
183 |
184 | logits = torch.cat([self.weight_layer[i](x[:, i, :]) for i in range(self.attr_num)], dim=1)
185 | logits = self.bn(logits)
186 |
187 | return logits,tex_feature
188 |
189 | class VehiclemaeEncode(nn.Module):
190 | def __init__(self,img_size=224,patch_size=16, stride_size=16, in_chans=3,embed_dim=768, depth=12,
191 | num_heads=12, mlp_ratio=4., qkv_bias=True, qk_scale=None, drop_rate=0., attn_drop_rate=0.,
192 | drop_path_rate=0., norm_layer=nn.LayerNorm):
193 | super().__init__()
194 | self.patch_embed = PatchEmbed(
195 | img_size=img_size, patch_size=patch_size, stride_size=stride_size, in_chans=in_chans,
196 | embed_dim=embed_dim)
197 |
198 | num_patches = self.patch_embed.num_patches
199 | self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
200 |         self.other_token = nn.Parameter(torch.randn(1, 1, embed_dim))  # learnable token used to replace masked patches
201 | self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
202 |
203 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)] # stochastic depth decay rule
204 |
205 | self.pos_drop = nn.Dropout(p=drop_rate)
206 | self.blocks = nn.ModuleList([
207 | Block(
208 | dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
209 |                 drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer)  # num_heads attention heads; qk_scale=None by default
210 |             for i in range(depth)])  # with the latest timm version, qk_scale needs to be commented out
211 |         self.norm = norm_layer(768)  # normalize the encoder output
212 |
213 | torch.nn.init.normal_(self.cls_token, std=.02)
214 | torch.nn.init.normal_(self.pos_embed, std=.02)
215 | torch.nn.init.normal_(self.other_token, std=.02)
216 |
217 | #self.apply(self._init_weights)
218 |
219 |
220 |
221 | @torch.no_grad()
222 | def forward(self, x):
223 | B = x.shape[0]
224 | x = self.patch_embed(x)
225 |
226 | cls_tokens = self.cls_token.expand(B, -1, -1)
227 |         x = torch.cat((cls_tokens, x), dim=1) + self.pos_embed  # shape: (batch_size, num_patches + 1, embed_dim)
228 | x = self.pos_drop(x)
229 |
230 |         other_token = self.other_token.repeat(x.shape[0], 8, 1)  # repeat the learnable token 8 times for every sample
231 |
232 | x1 = x[:,:1,:]
233 | x2 = x[:,1:,:]
234 | x = torch.cat((x1,other_token, x2), dim=1)
235 |
236 |         i = 0
237 |         vtb_feature = None
238 |         for blk in self.blocks:
239 |             x = blk(x)
240 |             i += 1
241 |             if i == 11:
242 |                 vtb_feature = x  # keep the intermediate features after the 11th block
243 |
244 |         x = self.norm(x)  # normalized encoder output
245 |
246 |         return x, vtb_feature
247 |
248 |
--------------------------------------------------------------------------------
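A minimal sketch of running the frozen encoder above on a batch of proposal crops; the weights here are randomly initialized, whereas in VFM-Det the encoder would be loaded with pre-trained VehicleMAE weights:

    import torch
    from network_files import VehiclemaeEncode

    encoder = VehiclemaeEncode(img_size=224, patch_size=16, stride_size=16,
                               embed_dim=768, depth=12, num_heads=12)
    crops = torch.randn(4, 3, 224, 224)   # e.g. four proposal crops resized to 224x224
    tokens, vtb_feature = encoder(crops)  # forward runs under @torch.no_grad()
    print(tokens.shape)                   # torch.Size([4, 205, 768]): 1 cls + 8 extra + 196 patch tokens
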
/plot_curve.py:
--------------------------------------------------------------------------------
1 | import datetime
2 | import matplotlib.pyplot as plt
3 |
4 |
5 | def plot_loss_and_lr(train_loss, learning_rate):
6 | try:
7 | x = list(range(len(train_loss)))
8 | fig, ax1 = plt.subplots(1, 1)
9 | ax1.plot(x, train_loss, 'r', label='loss')
10 | ax1.set_xlabel("step")
11 | ax1.set_ylabel("loss")
12 | ax1.set_title("Train Loss and lr")
13 | plt.legend(loc='best')
14 |
15 | ax2 = ax1.twinx()
16 | ax2.plot(x, learning_rate, label='lr')
17 | ax2.set_ylabel("learning rate")
18 |         ax2.set_xlim(0, len(train_loss))  # set the x-axis range (integer steps)
19 | plt.legend(loc='best')
20 |
21 | handles1, labels1 = ax1.get_legend_handles_labels()
22 | handles2, labels2 = ax2.get_legend_handles_labels()
23 | plt.legend(handles1 + handles2, labels1 + labels2, loc='upper right')
24 |
25 |         fig.subplots_adjust(right=0.8)  # prevent the saved figure from being cut off
26 | fig.savefig('./loss_and_lr{}.png'.format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S")))
27 | plt.close()
28 |         print("successfully saved loss curve!")
29 | except Exception as e:
30 | print(e)
31 |
32 |
33 | def plot_map(mAP):
34 | try:
35 | x = list(range(len(mAP)))
36 |         plt.plot(x, mAP, label='mAP')
37 | plt.xlabel('epoch')
38 | plt.ylabel('mAP')
39 | plt.title('Eval mAP')
40 | plt.xlim(0, len(mAP))
41 | plt.legend(loc='best')
42 | plt.savefig('./mAP.png')
43 | plt.close()
44 |         print("successfully saved mAP curve!")
45 | except Exception as e:
46 | print(e)
47 |
--------------------------------------------------------------------------------
/predict.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import json
4 | import pickle
5 | import numpy as np
6 | from PIL import Image
7 | import matplotlib.pyplot as plt
8 | import torch
9 | from torchvision import transforms
10 |
11 | from network_files import MaskRCNN
12 | from backbone import resnet50_fpn_backbone
13 | from draw_box_utils import draw_objs
14 |
15 |
16 | def create_model(num_classes, box_thresh=0.5):
17 | backbone = resnet50_fpn_backbone()
18 | model = MaskRCNN(backbone,
19 | num_classes=num_classes,
20 | rpn_score_thresh=box_thresh,
21 | box_score_thresh=box_thresh)
22 |
23 | return model
24 |
25 |
26 | def time_synchronized():
27 | torch.cuda.synchronize() if torch.cuda.is_available() else None
28 | return time.time()
29 |
30 |
31 | def main():
32 |     num_classes = 4  # not including background
33 | box_thresh = 0.5
34 | weights_path = "./save_weights/city_checkpoint.pth"
35 | img_path = "./image.png"
36 | label_json_path = './cityscrapes4_indices.json'
37 |
38 | data_path = './pre_model/Attribute_word_embedding_t5.pkl'
39 | dataset_info = pickle.load(open(data_path, 'rb+'))
40 | attr_vectors = dataset_info.attr_vectors.astype(np.float32)#.cuda()#.tolist()
41 | attr_vectors = torch.from_numpy(attr_vectors).cuda()
42 |
43 | # get devices
44 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
45 | print("using {} device.".format(device))
46 |
47 | # create model
48 | model = create_model(num_classes=num_classes + 1, box_thresh=box_thresh)
49 |
50 | # load train weights
51 |     assert os.path.exists(weights_path), "{} file does not exist.".format(weights_path)
52 | weights_dict = torch.load(weights_path, map_location='cpu')
53 | weights_dict = weights_dict["model"] if "model" in weights_dict else weights_dict
54 | model.load_state_dict(weights_dict)
55 | model.to(device)
56 |
57 | # read class_indict
58 |     assert os.path.exists(label_json_path), "json file {} does not exist.".format(label_json_path)
59 | with open(label_json_path, 'r') as json_file:
60 | category_index = json.load(json_file)
61 |
62 | # load image
63 |     assert os.path.exists(img_path), f"{img_path} does not exist."
64 | original_img = Image.open(img_path).convert('RGB')
65 |
66 | # from pil image to tensor, do not normalize image
67 | data_transform = transforms.Compose([transforms.ToTensor()])
68 | img = data_transform(original_img)
69 | # expand batch dimension
70 | img = torch.unsqueeze(img, dim=0)
71 |
72 |     model.eval()  # switch to evaluation mode
73 | with torch.no_grad():
74 | # init
75 | img_height, img_width = img.shape[-2:]
76 | init_img = torch.zeros((1, 3, img_height, img_width), device=device)
77 | model(init_img, attr_vectors)
78 |
79 | t_start = time_synchronized()
80 | predictions = model(img.to(device), attr_vectors)[0]
81 | t_end = time_synchronized()
82 | print("inference+NMS time: {}".format(t_end - t_start))
83 |
84 | predict_boxes = predictions["boxes"].to("cpu").numpy()
85 | predict_classes = predictions["labels"].to("cpu").numpy()
86 | predict_scores = predictions["scores"].to("cpu").numpy()
87 | predict_mask = predictions["masks"].to("cpu").numpy()
88 | predict_mask = np.squeeze(predict_mask, axis=1) # [batch, 1, h, w] -> [batch, h, w]
89 |
90 | if len(predict_boxes) == 0:
91 |             print("No objects detected!")
92 | return
93 |
94 | plot_img = draw_objs(original_img,
95 | boxes=predict_boxes,
96 | classes=predict_classes,
97 | scores=predict_scores,
98 | masks=predict_mask,
99 | category_index=category_index,
100 | line_thickness=3,
101 | font='arial.ttf',
102 | font_size=20)
103 | plt.imshow(plot_img)
104 | plt.show()
105 |         # save the visualized prediction result
106 | plot_img.save("test_result.jpg")
107 |
108 |
109 | if __name__ == '__main__':
110 | main()
111 |
112 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | _libgcc_mutex
2 | addict
3 | aliyun-python-sdk-core
4 | aliyun-python-sdk-kms
5 | appdirs
6 | ca-certificates
7 | certifi
8 | cffi
9 | chardet
10 | charset-normalizer
11 | cityscapesscripts
12 | click
13 | colorama
14 | coloredlogs
15 | crcmod
16 | cryptography
17 | cycler
18 | cython
19 | easydict
20 | filelock
21 | fonttools
22 | fsspec
23 | huggingface-hub
24 | humanfriendly
25 | idna
26 | imagecorruptions
27 | imageio
28 | importlib-metadata
29 | jmespath
30 | joblib
31 | kiwisolver
32 | ld_impl_linux-64
33 | libffi
34 | libgcc-ng
35 | libstdcxx-ng
36 | lxml
37 | markdown
38 | markdown-it-py
39 | matplotlib
40 | mdurl
41 | mmcv
42 | mmengine
43 | model-index
44 | ncurses
45 | networkx
46 | nltk
47 | numpy
48 | opencv-python
49 | opendatalab
50 | openmim
51 | openssl
52 | openxlab
53 | ordered-set
54 | oss2
55 | packaging
56 | pandas
57 | pillow
58 | pip
59 | platformdirs
60 | pycocotools
61 | pycparser
62 | pycryptodome
63 | pygments
64 | pyparsing
65 | pyquaternion
66 | python
67 | python-dateutil
68 | pytz
69 | pywavelets
70 | pyyaml
71 | readline
72 | regex
73 | requests
74 | rich
75 | safetensors
76 | scikit-image
77 | scikit-learn
78 | scipy
79 | sentence-transformers
80 | sentencepiece
81 | setuptools
82 | shapely
83 | six
84 | sqlite
85 | summary
86 | tabulate
87 | termcolor
88 | terminaltables
89 | threadpoolctl
90 | tifffile
91 | timm
92 | tk
93 | tokenizers
94 | tomli
95 | torch
96 | torchaudio
97 | torchvision
98 | tqdm
99 | transformers
100 | typing
101 | typing-extensions
102 | urllib3
103 | wheel
104 | xz
105 | yapf
106 | zipp
107 | zlib
108 |
--------------------------------------------------------------------------------
/train_utils/__init__.py:
--------------------------------------------------------------------------------
1 | from .group_by_aspect_ratio import GroupedBatchSampler, create_aspect_ratio_groups
2 | from .distributed_utils import init_distributed_mode, save_on_master, mkdir
3 | from .coco_eval import EvalCOCOMetric
4 | from .coco_utils import coco_remove_images_without_annotations, convert_coco_poly_mask, convert_to_coco_api
5 |
--------------------------------------------------------------------------------
/train_utils/__pycache__/__init__.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/__init__.cpython-38.pyc
--------------------------------------------------------------------------------
/train_utils/__pycache__/coco_eval.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/coco_eval.cpython-38.pyc
--------------------------------------------------------------------------------
/train_utils/__pycache__/coco_utils.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/coco_utils.cpython-38.pyc
--------------------------------------------------------------------------------
/train_utils/__pycache__/distributed_utils.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/distributed_utils.cpython-38.pyc
--------------------------------------------------------------------------------
/train_utils/__pycache__/group_by_aspect_ratio.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/group_by_aspect_ratio.cpython-38.pyc
--------------------------------------------------------------------------------
/train_utils/__pycache__/train_eval_utils.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/train_eval_utils.cpython-38.pyc
--------------------------------------------------------------------------------
/train_utils/coco_eval.py:
--------------------------------------------------------------------------------
1 | import json
2 | import copy
3 |
4 | import numpy as np
5 | from pycocotools.coco import COCO
6 | from pycocotools.cocoeval import COCOeval
7 | import pycocotools.mask as mask_util
8 | from .distributed_utils import all_gather, is_main_process
9 |
10 |
11 | def merge(img_ids, eval_results):
12 |     """Gather data from all processes into one place"""
13 | all_img_ids = all_gather(img_ids)
14 | all_eval_results = all_gather(eval_results)
15 |
16 | merged_img_ids = []
17 | for p in all_img_ids:
18 | merged_img_ids.extend(p)
19 |
20 | merged_eval_results = []
21 | for p in all_eval_results:
22 | merged_eval_results.extend(p)
23 |
24 | merged_img_ids = np.array(merged_img_ids)
25 |
26 | # keep only unique (and in sorted order) images
27 |     # drop duplicated image ids; with multi-GPU training an image may be assigned to several processes so that every process sees the same number of images
28 | merged_img_ids, idx = np.unique(merged_img_ids, return_index=True)
29 | merged_eval_results = [merged_eval_results[i] for i in idx]
30 |
31 | return list(merged_img_ids), merged_eval_results
32 |
33 |
34 | class EvalCOCOMetric:
35 | def __init__(self,
36 | coco: COCO = None,
37 | iou_type: str = None,
38 | results_file_name: str = "predict_results.json",
39 | classes_mapping: dict = None):
40 | self.coco = copy.deepcopy(coco)
41 |         self.img_ids = []  # image ids processed by this process
42 | self.results = []
43 | self.aggregation_results = None
44 | self.classes_mapping = classes_mapping
45 | self.coco_evaluator = None
46 | assert iou_type in ["bbox", "segm", "keypoints"]
47 | self.iou_type = iou_type
48 | self.results_file_name = results_file_name
49 |
50 | def prepare_for_coco_detection(self, targets, outputs):
51 |         """Convert predictions into the format required by COCOeval, for the object detection task"""
52 |         # iterate over the predictions of every image
53 | for target, output in zip(targets, outputs):
54 | if len(output) == 0:
55 | continue
56 |
57 | img_id = int(target["image_id"])
58 | if img_id in self.img_ids:
59 |                 # avoid duplicated entries
60 | continue
61 | self.img_ids.append(img_id)
62 | per_image_boxes = output["boxes"]
63 |             # coco_eval expects each box in [x_min, y_min, w, h] format,
64 |             # while our predictions are [x_min, y_min, x_max, y_max], so convert them
65 | per_image_boxes[:, 2:] -= per_image_boxes[:, :2]
66 | per_image_classes = output["labels"].tolist()
67 | per_image_scores = output["scores"].tolist()
68 |
69 | res_list = []
70 |             # iterate over every detected object
71 | for object_score, object_class, object_box in zip(
72 | per_image_scores, per_image_classes, per_image_boxes):
73 | object_score = float(object_score)
74 | class_idx = int(object_class)
75 | if self.classes_mapping is not None:
76 | class_idx = int(self.classes_mapping[str(class_idx)])
77 | # We recommend rounding coordinates to the nearest tenth of a pixel
78 | # to reduce resulting JSON file size.
79 | object_box = [round(b, 2) for b in object_box.tolist()]
80 |
81 | res = {"image_id": img_id,
82 | "category_id": class_idx,
83 | "bbox": object_box,
84 | "score": round(object_score, 3)}
85 | res_list.append(res)
86 | self.results.append(res_list)
87 |
88 | def prepare_for_coco_segmentation(self, targets, outputs):
89 |         """Convert predictions into the format required by COCOeval, for the instance segmentation task"""
90 |         # iterate over the predictions of every image
91 | for target, output in zip(targets, outputs):
92 | if len(output) == 0:
93 | continue
94 |
95 | img_id = int(target["image_id"])
96 | if img_id in self.img_ids:
97 |                 # avoid duplicated entries
98 | continue
99 |
100 | self.img_ids.append(img_id)
101 | per_image_masks = output["masks"]
102 | per_image_classes = output["labels"].tolist()
103 | per_image_scores = output["scores"].tolist()
104 |
105 | masks = per_image_masks > 0.5
106 |
107 | res_list = []
108 |             # iterate over every detected object
109 | for mask, label, score in zip(masks, per_image_classes, per_image_scores):
110 | rle = mask_util.encode(np.array(mask[0, :, :, np.newaxis], dtype=np.uint8, order="F"))[0]
111 | rle["counts"] = rle["counts"].decode("utf-8")
112 |
113 | class_idx = int(label)
114 | if self.classes_mapping is not None:
115 | class_idx = int(self.classes_mapping[str(class_idx)])
116 |
117 | res = {"image_id": img_id,
118 | "category_id": class_idx,
119 | "segmentation": rle,
120 | "score": round(score, 3)}
121 | res_list.append(res)
122 | self.results.append(res_list)
123 |
124 | def update(self, targets, outputs):
125 | if self.iou_type == "bbox":
126 | self.prepare_for_coco_detection(targets, outputs)
127 | elif self.iou_type == "segm":
128 | self.prepare_for_coco_segmentation(targets, outputs)
129 | else:
130 | raise KeyError(f"not support iou_type: {self.iou_type}")
131 |
132 | def synchronize_results(self):
133 |         # synchronize the data across all processes
134 | eval_ids, eval_results = merge(self.img_ids, self.results)
135 | self.aggregation_results = {"img_ids": eval_ids, "results": eval_results}
136 |
137 |         # only the main process needs to save the results
138 | if is_main_process():
139 | results = []
140 | [results.extend(i) for i in eval_results]
141 | # write predict results into json file
142 | json_str = json.dumps(results, indent=4)
143 | with open(self.results_file_name, 'w') as json_file:
144 | json_file.write(json_str)
145 |
146 | def evaluate(self):
147 |         # evaluation only needs to run on the main process
148 | if is_main_process():
149 | # accumulate predictions from all images
150 | coco_true = self.coco
151 | coco_pre = coco_true.loadRes(self.results_file_name)
152 |
153 | self.coco_evaluator = COCOeval(cocoGt=coco_true, cocoDt=coco_pre, iouType=self.iou_type)
154 |
155 | self.coco_evaluator.evaluate()
156 | self.coco_evaluator.accumulate()
157 | print(f"IoU metric: {self.iou_type}")
158 | self.coco_evaluator.summarize()
159 |
160 | coco_info = self.coco_evaluator.stats.tolist() # numpy to list
161 | return coco_info
162 | else:
163 | return None
164 |
--------------------------------------------------------------------------------
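A minimal usage sketch of `EvalCOCOMetric`, mirroring how it is driven in `train_utils/train_eval_utils.py` and `validation.py`; `val_dataset`, `val_loader`, `model`, and `attr_vectors` are placeholders, not part of this file:

```python
# Hedged sketch: assumes val_dataset exposes a pycocotools COCO object and a custom collate_fn.
det_metric = EvalCOCOMetric(val_dataset.coco, iou_type="bbox", results_file_name="det_results.json")
for images, targets in val_loader:
    outputs = model(images, attr_vectors)   # list of dicts with "boxes", "labels", "scores"
    det_metric.update(targets, outputs)     # buffer predictions in COCO [x, y, w, h] format
det_metric.synchronize_results()            # gather results across ranks and write det_results.json
stats = det_metric.evaluate()               # run COCOeval on the main process; returns the 12 COCO stats
```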
/train_utils/coco_utils.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.utils.data
3 | from pycocotools import mask as coco_mask
4 | from pycocotools.coco import COCO
5 |
6 |
7 | def coco_remove_images_without_annotations(dataset, ids):
8 | """
9 |     Remove images from the COCO dataset that contain no objects or only extremely small objects
10 | refer to:
11 | https://github.com/pytorch/vision/blob/master/references/detection/coco_utils.py
12 |     :param dataset: a pycocotools COCO object (or any object exposing getAnnIds/loadAnns)
13 |     :param ids: image ids to filter
14 |     :return: the subset of ids whose images contain valid annotations
15 | """
16 | def _has_only_empty_bbox(anno):
17 | return all(any(o <= 1 for o in obj["bbox"][2:]) for obj in anno)
18 |
19 | def _has_valid_annotation(anno):
20 | # if it's empty, there is no annotation
21 | if len(anno) == 0:
22 | return False
23 | # if all boxes have close to zero area, there is no annotation
24 | if _has_only_empty_bbox(anno):
25 | return False
26 |
27 | return True
28 |
29 | valid_ids = []
30 | for ds_idx, img_id in enumerate(ids):
31 | ann_ids = dataset.getAnnIds(imgIds=img_id, iscrowd=None)
32 | anno = dataset.loadAnns(ann_ids)
33 |
34 | if _has_valid_annotation(anno):
35 | valid_ids.append(img_id)
36 |
37 | return valid_ids
38 |
39 |
40 | def convert_coco_poly_mask(segmentations, height, width):
41 | masks = []
42 | for polygons in segmentations:
43 | rles = coco_mask.frPyObjects(polygons, height, width)
44 | mask = coco_mask.decode(rles)
45 | if len(mask.shape) < 3:
46 | mask = mask[..., None]
47 | mask = torch.as_tensor(mask, dtype=torch.uint8)
48 | mask = mask.any(dim=2)
49 | masks.append(mask)
50 | if masks:
51 | masks = torch.stack(masks, dim=0)
52 | else:
53 |         # if masks is empty, there are no objects; return an all-zero mask tensor
54 | masks = torch.zeros((0, height, width), dtype=torch.uint8)
55 | return masks
56 |
57 |
58 | def convert_to_coco_api(self):
59 | coco_ds = COCO()
60 | # annotation IDs need to start at 1, not 0, see torchvision issue #1530
61 | ann_id = 1
62 | dataset = {"images": [], "categories": [], "annotations": []}
63 | categories = set()
64 | for img_idx in range(len(self)):
65 | targets, h, w = self.get_annotations(img_idx)
66 | img_id = targets["image_id"].item()
67 | img_dict = {"id": img_id,
68 | "height": h,
69 | "width": w}
70 | dataset["images"].append(img_dict)
71 | bboxes = targets["boxes"].clone()
72 |         # convert (xmin, ymin, xmax, ymax) to (xmin, ymin, w, h)
73 | bboxes[:, 2:] -= bboxes[:, :2]
74 | bboxes = bboxes.tolist()
75 | labels = targets["labels"].tolist()
76 | areas = targets["area"].tolist()
77 | iscrowd = targets["iscrowd"].tolist()
78 | if "masks" in targets:
79 | masks = targets["masks"]
80 | # make masks Fortran contiguous for coco_mask
81 | masks = masks.permute(0, 2, 1).contiguous().permute(0, 2, 1)
82 | num_objs = len(bboxes)
83 | for i in range(num_objs):
84 | ann = {"image_id": img_id,
85 | "bbox": bboxes[i],
86 | "category_id": labels[i],
87 | "area": areas[i],
88 | "iscrowd": iscrowd[i],
89 | "id": ann_id}
90 | categories.add(labels[i])
91 | if "masks" in targets:
92 | ann["segmentation"] = coco_mask.encode(masks[i].numpy())
93 | dataset["annotations"].append(ann)
94 | ann_id += 1
95 | dataset["categories"] = [{"id": i} for i in sorted(categories)]
96 | coco_ds.dataset = dataset
97 | coco_ds.createIndex()
98 | return coco_ds
99 |
--------------------------------------------------------------------------------
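As a rough, hypothetical illustration of how these helpers fit together: `convert_to_coco_api` expects a dataset object implementing `get_annotations(idx)` that returns `(targets, height, width)`, and the resulting COCO object can then be filtered with `coco_remove_images_without_annotations`:

```python
# Hedged sketch; `my_dataset` is a placeholder implementing get_annotations(idx) -> (targets, h, w).
coco_gt = convert_to_coco_api(my_dataset)                             # build an in-memory pycocotools COCO object
all_ids = list(coco_gt.imgs.keys())
valid_ids = coco_remove_images_without_annotations(coco_gt, all_ids)  # drop images without usable boxes
```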
/train_utils/distributed_utils.py:
--------------------------------------------------------------------------------
1 | from collections import defaultdict, deque
2 | import datetime
3 | import pickle
4 | import time
5 | import errno
6 | import os
7 |
8 | import torch
9 | import torch.distributed as dist
10 |
11 |
12 | class SmoothedValue(object):
13 | """Track a series of values and provide access to smoothed values over a
14 | window or the global series average.
15 | """
16 | def __init__(self, window_size=20, fmt=None):
17 | if fmt is None:
18 | fmt = "{value:.4f} ({global_avg:.4f})"
19 |         self.deque = deque(maxlen=window_size)  # deque: a double-ended queue, i.e. a list with a maximum length
20 | self.total = 0.0
21 | self.count = 0
22 | self.fmt = fmt
23 |
24 | def update(self, value, n=1):
25 | self.deque.append(value)
26 | self.count += n
27 | self.total += value * n
28 |
29 | def synchronize_between_processes(self):
30 | """
31 | Warning: does not synchronize the deque!
32 | """
33 | if not is_dist_avail_and_initialized():
34 | return
35 | t = torch.tensor([self.count, self.total], dtype=torch.float64, device="cuda")
36 | dist.barrier()
37 | dist.all_reduce(t)
38 | t = t.tolist()
39 | self.count = int(t[0])
40 | self.total = t[1]
41 |
42 | @property
43 |     def median(self):  # @property exposes median as a read-only attribute
44 | d = torch.tensor(list(self.deque))
45 | return d.median().item()
46 |
47 | @property
48 | def avg(self):
49 | d = torch.tensor(list(self.deque), dtype=torch.float32)
50 | return d.mean().item()
51 |
52 | @property
53 | def global_avg(self):
54 | return self.total / self.count
55 |
56 | @property
57 | def max(self):
58 | return max(self.deque)
59 |
60 | @property
61 | def value(self):
62 | return self.deque[-1]
63 |
64 | def __str__(self):
65 | return self.fmt.format(
66 | median=self.median,
67 | avg=self.avg,
68 | global_avg=self.global_avg,
69 | max=self.max,
70 | value=self.value)
71 |
72 |
73 | def all_gather(data):
74 | """
75 |     Gather data from every process.
76 | Run all_gather on arbitrary picklable data (not necessarily tensors)
77 | Args:
78 | data: any picklable object
79 | Returns:
80 | list[data]: list of data gathered from each rank
81 | """
82 |     world_size = get_world_size()  # number of processes
83 | if world_size == 1:
84 | return [data]
85 |
86 | data_list = [None] * world_size
87 | dist.all_gather_object(data_list, data)
88 |
89 | return data_list
90 |
91 |
92 | def reduce_dict(input_dict, average=True):
93 | """
94 | Args:
95 | input_dict (dict): all the values will be reduced
96 | average (bool): whether to do average or sum
97 | Reduce the values in the dictionary from all processes so that all processes
98 | have the averaged results. Returns a dict with the same fields as
99 | input_dict, after reduction.
100 | """
101 | world_size = get_world_size()
102 |     if world_size < 2:  # single-GPU case
103 | return input_dict
104 |     with torch.no_grad():  # multi-GPU case
105 | names = []
106 | values = []
107 | # sort the keys so that they are consistent across processes
108 | for k in sorted(input_dict.keys()):
109 | names.append(k)
110 | values.append(input_dict[k])
111 | values = torch.stack(values, dim=0)
112 | dist.all_reduce(values)
113 | if average:
114 | values /= world_size
115 |
116 | reduced_dict = {k: v for k, v in zip(names, values)}
117 | return reduced_dict
118 |
119 |
120 | class MetricLogger(object):
121 | def __init__(self, delimiter="\t"):
122 | self.meters = defaultdict(SmoothedValue)
123 | self.delimiter = delimiter
124 |
125 | def update(self, **kwargs):
126 | for k, v in kwargs.items():
127 | if isinstance(v, torch.Tensor):
128 | v = v.item()
129 | assert isinstance(v, (float, int))
130 | self.meters[k].update(v)
131 |
132 | def __getattr__(self, attr):
133 | if attr in self.meters:
134 | return self.meters[attr]
135 | if attr in self.__dict__:
136 | return self.__dict__[attr]
137 | raise AttributeError("'{}' object has no attribute '{}'".format(
138 | type(self).__name__, attr))
139 |
140 | def __str__(self):
141 | loss_str = []
142 | for name, meter in self.meters.items():
143 | loss_str.append(
144 | "{}: {}".format(name, str(meter))
145 | )
146 | return self.delimiter.join(loss_str)
147 |
148 | def synchronize_between_processes(self):
149 | for meter in self.meters.values():
150 | meter.synchronize_between_processes()
151 |
152 | def add_meter(self, name, meter):
153 | self.meters[name] = meter
154 |
155 | def log_every(self, iterable, print_freq, header=None):
156 | i = 0
157 | if not header:
158 | header = ""
159 | start_time = time.time()
160 | end = time.time()
161 | iter_time = SmoothedValue(fmt='{avg:.4f}')
162 | data_time = SmoothedValue(fmt='{avg:.4f}')
163 | space_fmt = ":" + str(len(str(len(iterable)))) + "d"
164 | if torch.cuda.is_available():
165 | log_msg = self.delimiter.join([header,
166 | '[{0' + space_fmt + '}/{1}]',
167 | 'eta: {eta}',
168 | '{meters}',
169 | 'time: {time}',
170 | 'data: {data}',
171 | 'max mem: {memory:.0f}'])
172 | else:
173 | log_msg = self.delimiter.join([header,
174 | '[{0' + space_fmt + '}/{1}]',
175 | 'eta: {eta}',
176 | '{meters}',
177 | 'time: {time}',
178 | 'data: {data}'])
179 | MB = 1024.0 * 1024.0
180 | for obj in iterable:
181 | data_time.update(time.time() - end)
182 | yield obj
183 | iter_time.update(time.time() - end)
184 | if i % print_freq == 0 or i == len(iterable) - 1:
185 | eta_second = iter_time.global_avg * (len(iterable) - i)
186 | eta_string = str(datetime.timedelta(seconds=eta_second))
187 | if torch.cuda.is_available():
188 | print(log_msg.format(i, len(iterable),
189 | eta=eta_string,
190 | meters=str(self),
191 | time=str(iter_time),
192 | data=str(data_time),
193 | memory=torch.cuda.max_memory_allocated() / MB))
194 | else:
195 | print(log_msg.format(i, len(iterable),
196 | eta=eta_string,
197 | meters=str(self),
198 | time=str(iter_time),
199 | data=str(data_time)))
200 | i += 1
201 | end = time.time()
202 | total_time = time.time() - start_time
203 | total_time_str = str(datetime.timedelta(seconds=int(total_time)))
204 |         print('{} Total time: {} ({:.4f} s / it)'.format(header,
205 |                                                           total_time_str,
206 |                                                           total_time / len(iterable)))
207 | 
208 |
209 |
210 | def warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor):
211 |
212 | def f(x):
213 |         """Return a learning-rate multiplier factor based on the current step."""
214 |         if x >= warmup_iters:  # once the step count reaches warmup_iters, the factor is 1
215 | return 1
216 | alpha = float(x) / warmup_iters
217 |         # during warmup the factor ramps linearly from warmup_factor to 1
218 | return warmup_factor * (1 - alpha) + alpha
219 |
220 | return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=f)
221 |
222 |
223 | def mkdir(path):
224 | try:
225 | os.makedirs(path)
226 | except OSError as e:
227 | if e.errno != errno.EEXIST:
228 | raise
229 |
230 |
231 | def setup_for_distributed(is_master):
232 | """
233 |     This function disables printing when not in the master process
234 | """
235 | import builtins as __builtin__
236 | builtin_print = __builtin__.print
237 |
238 | def print(*args, **kwargs):
239 | force = kwargs.pop('force', False)
240 | if is_master or force:
241 | builtin_print(*args, **kwargs)
242 |
243 | __builtin__.print = print
244 |
245 |
246 | def is_dist_avail_and_initialized():
247 |     """Check whether a distributed environment is available and initialized"""
248 | if not dist.is_available():
249 | return False
250 | if not dist.is_initialized():
251 | return False
252 | return True
253 |
254 |
255 | def get_world_size():
256 | if not is_dist_avail_and_initialized():
257 | return 1
258 | return dist.get_world_size()
259 |
260 |
261 | def get_rank():
262 | if not is_dist_avail_and_initialized():
263 | return 0
264 | return dist.get_rank()
265 |
266 |
267 | def is_main_process():
268 | return get_rank() == 0
269 |
270 |
271 | def save_on_master(*args, **kwargs):
272 | if is_main_process():
273 | torch.save(*args, **kwargs)
274 |
275 |
276 | def init_distributed_mode(args):
277 | if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ:
278 | args.rank = int(os.environ["RANK"])
279 | args.world_size = int(os.environ['WORLD_SIZE'])
280 | args.gpu = int(os.environ['LOCAL_RANK'])
281 | elif 'SLURM_PROCID' in os.environ:
282 | args.rank = int(os.environ['SLURM_PROCID'])
283 | args.gpu = args.rank % torch.cuda.device_count()
284 | else:
285 | print('Not using distributed mode')
286 | args.distributed = False
287 | return
288 |
289 | args.distributed = True
290 |
291 | torch.cuda.set_device(args.gpu)
292 | args.dist_backend = 'nccl'
293 | print('| distributed init (rank {}): {}'.format(
294 | args.rank, args.dist_url), flush=True)
295 | torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
296 | world_size=args.world_size, rank=args.rank)
297 | torch.distributed.barrier()
298 | setup_for_distributed(args.rank == 0)
299 |
300 |
--------------------------------------------------------------------------------
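A short, hypothetical sketch of how `MetricLogger`/`SmoothedValue` are typically driven in a training loop (`train_loader` and `compute_loss` are placeholders):

```python
# Hedged sketch of the logging helpers above; all names here are placeholders.
logger = MetricLogger(delimiter="  ")
logger.add_meter('lr', SmoothedValue(window_size=1, fmt='{value:.6f}'))
for images, targets in logger.log_every(train_loader, print_freq=50, header='Epoch: [0]'):
    loss = compute_loss(images, targets)
    logger.update(loss=loss, lr=1e-4)          # each keyword becomes a SmoothedValue meter
logger.synchronize_between_processes()         # reduce counts/totals across ranks (no-op without dist)
```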
/train_utils/group_by_aspect_ratio.py:
--------------------------------------------------------------------------------
1 | import bisect
2 | from collections import defaultdict
3 | import copy
4 | from itertools import repeat, chain
5 | import math
6 | import numpy as np
7 |
8 | import torch
9 | import torch.utils.data
10 | from torch.utils.data.sampler import BatchSampler, Sampler
11 | from torch.utils.model_zoo import tqdm
12 | import torchvision
13 |
14 | from PIL import Image
15 |
16 |
17 | def _repeat_to_at_least(iterable, n):
18 | repeat_times = math.ceil(n / len(iterable))
19 | repeated = chain.from_iterable(repeat(iterable, repeat_times))
20 | return list(repeated)
21 |
22 |
23 | class GroupedBatchSampler(BatchSampler):
24 | """
25 | Wraps another sampler to yield a mini-batch of indices.
26 | It enforces that the batch only contain elements from the same group.
27 |     It also tries to provide mini-batches which follow an ordering which is
28 | as close as possible to the ordering from the original sampler.
29 | Arguments:
30 | sampler (Sampler): Base sampler.
31 | group_ids (list[int]): If the sampler produces indices in range [0, N),
32 | `group_ids` must be a list of `N` ints which contains the group id of each sample.
33 | The group ids must be a continuous set of integers starting from
34 | 0, i.e. they must be in the range [0, num_groups).
35 | batch_size (int): Size of mini-batch.
36 | """
37 | def __init__(self, sampler, group_ids, batch_size):
38 | if not isinstance(sampler, Sampler):
39 | raise ValueError(
40 | "sampler should be an instance of "
41 | "torch.utils.data.Sampler, but got sampler={}".format(sampler)
42 | )
43 | self.sampler = sampler
44 | self.group_ids = group_ids
45 | self.batch_size = batch_size
46 |
47 | def __iter__(self):
48 | buffer_per_group = defaultdict(list)
49 | samples_per_group = defaultdict(list)
50 |
51 | num_batches = 0
52 | for idx in self.sampler:
53 | group_id = self.group_ids[idx]
54 | buffer_per_group[group_id].append(idx)
55 | samples_per_group[group_id].append(idx)
56 | if len(buffer_per_group[group_id]) == self.batch_size:
57 | yield buffer_per_group[group_id]
58 | num_batches += 1
59 | del buffer_per_group[group_id]
60 | assert len(buffer_per_group[group_id]) < self.batch_size
61 |
62 | # now we have run out of elements that satisfy
63 | # the group criteria, let's return the remaining
64 | # elements so that the size of the sampler is
65 | # deterministic
66 | expected_num_batches = len(self)
67 | num_remaining = expected_num_batches - num_batches
68 | if num_remaining > 0:
69 | # for the remaining batches, take first the buffers with largest number
70 | # of elements
71 | for group_id, _ in sorted(buffer_per_group.items(),
72 | key=lambda x: len(x[1]), reverse=True):
73 | remaining = self.batch_size - len(buffer_per_group[group_id])
74 | samples_from_group_id = _repeat_to_at_least(samples_per_group[group_id], remaining)
75 | buffer_per_group[group_id].extend(samples_from_group_id[:remaining])
76 | assert len(buffer_per_group[group_id]) == self.batch_size
77 | yield buffer_per_group[group_id]
78 | num_remaining -= 1
79 | if num_remaining == 0:
80 | break
81 | assert num_remaining == 0
82 |
83 | def __len__(self):
84 | return len(self.sampler) // self.batch_size
85 |
86 |
87 | def _compute_aspect_ratios_slow(dataset, indices=None):
88 | print("Your dataset doesn't support the fast path for "
89 | "computing the aspect ratios, so will iterate over "
90 | "the full dataset and load every image instead. "
91 | "This might take some time...")
92 | if indices is None:
93 | indices = range(len(dataset))
94 |
95 | class SubsetSampler(Sampler):
96 | def __init__(self, indices):
97 | self.indices = indices
98 |
99 | def __iter__(self):
100 | return iter(self.indices)
101 |
102 | def __len__(self):
103 | return len(self.indices)
104 |
105 | sampler = SubsetSampler(indices)
106 | data_loader = torch.utils.data.DataLoader(
107 | dataset, batch_size=1, sampler=sampler,
108 | num_workers=14, # you might want to increase it for faster processing
109 | collate_fn=lambda x: x[0])
110 | aspect_ratios = []
111 | with tqdm(total=len(dataset)) as pbar:
112 | for _i, (img, _) in enumerate(data_loader):
113 | pbar.update(1)
114 | height, width = img.shape[-2:]
115 | aspect_ratio = float(width) / float(height)
116 | aspect_ratios.append(aspect_ratio)
117 | return aspect_ratios
118 |
119 |
120 | def _compute_aspect_ratios_custom_dataset(dataset, indices=None):
121 | if indices is None:
122 | indices = range(len(dataset))
123 | aspect_ratios = []
124 | for i in indices:
125 | height, width = dataset.get_height_and_width(i)
126 | aspect_ratio = float(width) / float(height)
127 | aspect_ratios.append(aspect_ratio)
128 | return aspect_ratios
129 |
130 |
131 | def _compute_aspect_ratios_coco_dataset(dataset, indices=None):
132 | if indices is None:
133 | indices = range(len(dataset))
134 | aspect_ratios = []
135 | for i in indices:
136 | img_info = dataset.coco.imgs[dataset.ids[i]]
137 | aspect_ratio = float(img_info["width"]) / float(img_info["height"])
138 | aspect_ratios.append(aspect_ratio)
139 | return aspect_ratios
140 |
141 |
142 | def _compute_aspect_ratios_voc_dataset(dataset, indices=None):
143 | if indices is None:
144 | indices = range(len(dataset))
145 | aspect_ratios = []
146 | for i in indices:
147 | # this doesn't load the data into memory, because PIL loads it lazily
148 | width, height = Image.open(dataset.images[i]).size
149 | aspect_ratio = float(width) / float(height)
150 | aspect_ratios.append(aspect_ratio)
151 | return aspect_ratios
152 |
153 |
154 | def _compute_aspect_ratios_subset_dataset(dataset, indices=None):
155 | if indices is None:
156 | indices = range(len(dataset))
157 |
158 | ds_indices = [dataset.indices[i] for i in indices]
159 | return compute_aspect_ratios(dataset.dataset, ds_indices)
160 |
161 |
162 | def compute_aspect_ratios(dataset, indices=None):
163 | if hasattr(dataset, "get_height_and_width"):
164 | return _compute_aspect_ratios_custom_dataset(dataset, indices)
165 |
166 | if isinstance(dataset, torchvision.datasets.CocoDetection):
167 | return _compute_aspect_ratios_coco_dataset(dataset, indices)
168 |
169 | if isinstance(dataset, torchvision.datasets.VOCDetection):
170 | return _compute_aspect_ratios_voc_dataset(dataset, indices)
171 |
172 | if isinstance(dataset, torch.utils.data.Subset):
173 | return _compute_aspect_ratios_subset_dataset(dataset, indices)
174 |
175 | # slow path
176 | return _compute_aspect_ratios_slow(dataset, indices)
177 |
178 |
179 | def _quantize(x, bins):
180 | bins = copy.deepcopy(bins)
181 | bins = sorted(bins)
182 |     # bisect_right: find the index at which y would be inserted into the sorted bins (to the right of equal elements)
183 | quantized = list(map(lambda y: bisect.bisect_right(bins, y), x))
184 | return quantized
185 |
186 |
187 | def create_aspect_ratio_groups(dataset, k=0):
188 |     # compute the width/height aspect ratio of every image in the dataset
189 | aspect_ratios = compute_aspect_ratios(dataset)
190 |     # split the [0.5, 2] interval into 2*k+1 log-spaced bins
191 | bins = (2 ** np.linspace(-1, 1, 2 * k + 1)).tolist() if k > 0 else [1.0]
192 |
193 |     # map each image's aspect ratio to the index of its bin
194 | groups = _quantize(aspect_ratios, bins)
195 | # count number of elements per group
196 |     # i.e. how many images fall into each aspect-ratio bin
197 | counts = np.unique(groups, return_counts=True)[1]
198 | fbins = [0] + bins + [np.inf]
199 | print("Using {} as bins for aspect ratio quantization".format(fbins))
200 | print("Count of instances per bin: {}".format(counts))
201 | return groups
202 |
--------------------------------------------------------------------------------
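A hypothetical sketch of how the aspect-ratio grouping above is wired into a `DataLoader` (following the torchvision detection references; `train_dataset` is a placeholder):

```python
# Hedged sketch; assumes train_dataset provides get_height_and_width() and a custom collate_fn.
train_sampler = torch.utils.data.RandomSampler(train_dataset)
group_ids = create_aspect_ratio_groups(train_dataset, k=3)              # 2*k+1 aspect-ratio bins
train_batch_sampler = GroupedBatchSampler(train_sampler, group_ids, batch_size=4)
train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_sampler=train_batch_sampler,
                                           num_workers=4,
                                           collate_fn=train_dataset.collate_fn)
```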
/train_utils/train_eval_utils.py:
--------------------------------------------------------------------------------
1 | import math
2 | import sys
3 | import time
4 |
5 | import torch
6 |
7 | import train_utils.distributed_utils as utils
8 | from .coco_eval import EvalCOCOMetric
9 |
10 |
11 | def train_one_epoch(model, optimizer, data_loader, device, epoch, attr_vectors,
12 | print_freq=50, warmup=False, scaler=None):
13 | model.train()
14 | metric_logger = utils.MetricLogger(delimiter=" ")
15 | metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value:.6f}'))
16 | header = 'Epoch: [{}]'.format(epoch)
17 |
18 | lr_scheduler = None
19 |     if epoch == 0 and warmup is True:  # enable warmup for the first epoch (epoch=0) to ramp up the learning rate
20 | warmup_factor = 1.0 / 1000
21 | warmup_iters = min(1000, len(data_loader) - 1)
22 |
23 | lr_scheduler = utils.warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor)
24 |
25 | mloss = torch.zeros(1).to(device) # mean losses
26 | for i, [images, targets] in enumerate(metric_logger.log_every(data_loader, print_freq, header)):
27 | images = list(image.to(device) for image in images)
28 | targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
29 |
30 |         # autocast context manager for mixed-precision training; it is a no-op on CPU
31 | with torch.cuda.amp.autocast(enabled=scaler is not None):
32 |             loss_dict = model(images, attr_vectors, targets)
33 |
34 | losses = sum(loss for loss in loss_dict.values())
35 |
36 | # reduce losses over all GPUs for logging purpose
37 | loss_dict_reduced = utils.reduce_dict(loss_dict)
38 | losses_reduced = sum(loss for loss in loss_dict_reduced.values())
39 |
40 | loss_value = losses_reduced.item()
41 |         # record the training loss
42 | mloss = (mloss * i + loss_value) / (i + 1) # update mean losses
43 |
44 |         if not math.isfinite(loss_value):  # stop training if the loss becomes non-finite
45 | print("Loss is {}, stopping training".format(loss_value))
46 | print(loss_dict_reduced)
47 | sys.exit(1)
48 |
49 | optimizer.zero_grad()
50 | if scaler is not None:
51 | scaler.scale(losses).backward()
52 | scaler.step(optimizer)
53 | scaler.update()
54 | else:
55 | losses.backward()
56 | optimizer.step()
57 |
58 |         if lr_scheduler is not None:  # the warmup schedule is only used during the first epoch
59 | lr_scheduler.step()
60 |
61 | metric_logger.update(loss=losses_reduced, **loss_dict_reduced)
62 | now_lr = optimizer.param_groups[0]["lr"]
63 | metric_logger.update(lr=now_lr)
64 |
65 | return mloss, now_lr
66 |
67 |
68 | @torch.no_grad()
69 | def evaluate(model, data_loader, attr_vectors, device):
70 | cpu_device = torch.device("cpu")
71 | model.eval()
72 | metric_logger = utils.MetricLogger(delimiter=" ")
73 | header = "Test: "
74 |
75 | det_metric = EvalCOCOMetric(data_loader.dataset.coco, iou_type="bbox", results_file_name="det_results.json")
76 | seg_metric = EvalCOCOMetric(data_loader.dataset.coco, iou_type="segm", results_file_name="seg_results.json")
77 | for image, targets in metric_logger.log_every(data_loader, 100, header):
78 | image = list(img.to(device) for img in image)
79 |
80 |         # skip GPU synchronization when running on CPU
81 | if device != torch.device("cpu"):
82 | torch.cuda.synchronize(device)
83 |
84 | model_time = time.time()
85 |         outputs = model(image, attr_vectors)
86 |
87 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
88 | model_time = time.time() - model_time
89 |
90 | det_metric.update(targets, outputs)
91 | seg_metric.update(targets, outputs)
92 | metric_logger.update(model_time=model_time)
93 |
94 | # gather the stats from all processes
95 | metric_logger.synchronize_between_processes()
96 | print("Averaged stats:", metric_logger)
97 |
98 |     # synchronize data across all processes
99 | det_metric.synchronize_results()
100 | seg_metric.synchronize_results()
101 |
102 | if utils.is_main_process():
103 | coco_info = det_metric.evaluate()
104 | seg_info = seg_metric.evaluate()
105 | else:
106 | coco_info = None
107 | seg_info = None
108 |
109 | return coco_info, seg_info
110 |
--------------------------------------------------------------------------------
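A minimal sketch of one training/evaluation cycle built on the two helpers above (model, optimizer, loaders, `lr_scheduler`, and `attr_vectors` are placeholders supplied by the training script):

```python
# Hedged sketch of the outer loop; all names here are placeholders.
for epoch in range(num_epochs):
    mloss, lr = train_one_epoch(model, optimizer, train_loader, device, epoch, attr_vectors,
                                print_freq=50, warmup=True, scaler=None)
    lr_scheduler.step()
    coco_info, seg_info = evaluate(model, val_loader, attr_vectors, device)
```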
/transforms.py:
--------------------------------------------------------------------------------
1 | import random
2 | from torchvision.transforms import functional as F
3 |
4 |
5 | class Compose(object):
6 |     """Compose multiple transform functions"""
7 | def __init__(self, transforms):
8 | self.transforms = transforms
9 |
10 | def __call__(self, image, target):
11 | for t in self.transforms:
12 | image, target = t(image, target)
13 | return image, target
14 |
15 |
16 | class ToTensor(object):
17 |     """Convert a PIL image to a Tensor"""
18 | def __call__(self, image, target):
19 | image = F.to_tensor(image)
20 | return image, target
21 |
22 |
23 | class RandomHorizontalFlip(object):
24 |     """Randomly horizontally flip the image and its bboxes"""
25 | def __init__(self, prob=0.5):
26 | self.prob = prob
27 |
28 | def __call__(self, image, target):
29 | if random.random() < self.prob:
30 | height, width = image.shape[-2:]
31 |             image = image.flip(-1)  # horizontally flip the image
32 | bbox = target["boxes"]
33 | # bbox: xmin, ymin, xmax, ymax
34 |             bbox[:, [0, 2]] = width - bbox[:, [2, 0]]  # flip the corresponding bbox x-coordinates
35 | target["boxes"] = bbox
36 | if "masks" in target:
37 | target["masks"] = target["masks"].flip(-1)
38 | return image, target
39 |
--------------------------------------------------------------------------------
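For reference, a minimal sketch of how these transforms are composed elsewhere in the repo (`pil_image`/`target` are placeholders):

```python
# Hedged sketch; matches the pipelines used in the training/validation scripts.
data_transform = {
    "train": Compose([ToTensor(), RandomHorizontalFlip(0.5)]),
    "val": Compose([ToTensor()]),
}
image, target = data_transform["train"](pil_image, target)
```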
/validation.py:
--------------------------------------------------------------------------------
1 | """
2 | This script loads trained model weights and computes COCO metrics on the
3 | validation/test set, as well as the per-category mAP (IoU=0.5).
4 | """
5 |
6 | import os
7 | import json
8 |
9 | import torch
10 | from tqdm import tqdm
11 | import numpy as np
12 |
13 | import transforms
14 | from backbone import resnet50_fpn_backbone
15 | from network_files import MaskRCNN
16 | from my_dataset_coco import CocoDetection
17 | from my_dataset_cityscraps import CityscrapesDetection
18 | from train_utils import EvalCOCOMetric
19 |
20 |
21 | def summarize(self, catId=None):
22 | """
23 | Compute and display summary metrics for evaluation results.
24 |     Note this function can *only* be applied with the default parameter settings
25 | """
26 |
27 | def _summarize(ap=1, iouThr=None, areaRng='all', maxDets=100):
28 | p = self.params
29 | iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'
30 | titleStr = 'Average Precision' if ap == 1 else 'Average Recall'
31 | typeStr = '(AP)' if ap == 1 else '(AR)'
32 | iouStr = '{:0.2f}:{:0.2f}'.format(p.iouThrs[0], p.iouThrs[-1]) \
33 | if iouThr is None else '{:0.2f}'.format(iouThr)
34 |
35 | aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng]
36 | mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets]
37 |
38 | if ap == 1:
39 | # dimension of precision: [TxRxKxAxM]
40 | s = self.eval['precision']
41 | # IoU
42 | if iouThr is not None:
43 | t = np.where(iouThr == p.iouThrs)[0]
44 | s = s[t]
45 |
46 | if isinstance(catId, int):
47 | s = s[:, :, catId, aind, mind]
48 | else:
49 | s = s[:, :, :, aind, mind]
50 |
51 | else:
52 | # dimension of recall: [TxKxAxM]
53 | s = self.eval['recall']
54 | if iouThr is not None:
55 | t = np.where(iouThr == p.iouThrs)[0]
56 | s = s[t]
57 |
58 | if isinstance(catId, int):
59 | s = s[:, catId, aind, mind]
60 | else:
61 | s = s[:, :, aind, mind]
62 |
63 | if len(s[s > -1]) == 0:
64 | mean_s = -1
65 | else:
66 | mean_s = np.mean(s[s > -1])
67 |
68 | print_string = iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s)
69 | return mean_s, print_string
70 |
71 |     if not self.eval:
72 |         raise Exception('Please run accumulate() first')
73 | 
74 |     stats, print_list = [0] * 12, [""] * 12
75 |     stats[0], print_list[0] = _summarize(1)
76 |     stats[1], print_list[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[2])
77 |     stats[2], print_list[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[2])
78 |     stats[3], print_list[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[2])
79 |     stats[4], print_list[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[2])
80 |     stats[5], print_list[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[2])
81 |     stats[6], print_list[6] = _summarize(0, maxDets=self.params.maxDets[0])
82 |     stats[7], print_list[7] = _summarize(0, maxDets=self.params.maxDets[1])
83 |     stats[8], print_list[8] = _summarize(0, maxDets=self.params.maxDets[2])
84 |     stats[9], print_list[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[2])
85 |     stats[10], print_list[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[2])
86 |     stats[11], print_list[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[2])
87 | 
88 |     print_info = "\n".join(print_list)
89 | 
90 |     return stats, print_info
91 |
92 |
93 | def save_info(coco_evaluator,
94 | category_index: dict,
95 | save_name: str = "record_mAP.txt"):
96 | iou_type = coco_evaluator.params.iouType
97 | print(f"IoU metric: {iou_type}")
98 | # calculate COCO info for all classes
99 | coco_stats, print_coco = summarize(coco_evaluator)
100 |
101 |     # calculate VOC-style info (mAP at IoU=0.5) for every class
102 | classes = [v for v in category_index.values() if v != "N/A"]
103 | voc_map_info_list = []
104 | for i in range(len(classes)):
105 | stats, _ = summarize(coco_evaluator, catId=i)
106 | voc_map_info_list.append(" {:15}: {}".format(classes[i], stats[1]))
107 |
108 | print_voc = "\n".join(voc_map_info_list)
109 | print(print_voc)
110 |
111 |     # save validation results to a txt file
112 | with open(save_name, "w") as f:
113 | record_lines = ["COCO results:",
114 | print_coco,
115 | "",
116 | "mAP(IoU=0.5) for each category:",
117 | print_voc]
118 | f.write("\n".join(record_lines))
119 |
120 |
121 | def main(parser_data):
122 | device = torch.device(parser_data.device if torch.cuda.is_available() else "cpu")
123 |     print("Using {} device.".format(device.type))
124 |
125 | data_transform = {
126 | "val": transforms.Compose([transforms.ToTensor()])
127 | }
128 |
129 | # read class_indict
130 | label_json_path = parser_data.label_json_path
131 |     assert os.path.exists(label_json_path), "json file {} does not exist.".format(label_json_path)
132 | with open(label_json_path, 'r') as f:
133 | category_index = json.load(f)
134 |
135 | data_root = parser_data.data_path
136 |
137 |     # note: collate_fn is customized here because each sample contains both image and targets, so the default batching cannot be used
138 | batch_size = parser_data.batch_size
139 | nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers
140 | print('Using %g dataloader workers' % nw)
141 |
142 | # load validation data set
143 | val_dataset = CityscrapesDetection(data_root, "val", data_transform["val"])
144 | # VOCdevkit -> VOC2012 -> ImageSets -> Main -> val.txt
145 | # val_dataset = VOCInstances(data_root, year="2012", txt_name="val.txt", transforms=data_transform["val"])
146 | val_dataset_loader = torch.utils.data.DataLoader(val_dataset,
147 | batch_size=batch_size,
148 | shuffle=False,
149 | pin_memory=True,
150 | num_workers=nw,
151 | collate_fn=val_dataset.collate_fn)
152 |
153 | # create model
154 | backbone = resnet50_fpn_backbone()
155 |     model = MaskRCNN(backbone, num_classes=parser_data.num_classes + 1)
156 |
157 |     # load your own trained model weights
158 | weights_path = parser_data.weights_path
159 | assert os.path.exists(weights_path), "not found {} file.".format(weights_path)
160 | model.load_state_dict(torch.load(weights_path, map_location='cpu')['model'])
161 | # print(model)
162 |
163 | model.to(device)
164 |
165 | # evaluate on the val dataset
166 | cpu_device = torch.device("cpu")
167 |
168 | det_metric = EvalCOCOMetric(val_dataset.coco, "bbox", "det_results.json")
169 | seg_metric = EvalCOCOMetric(val_dataset.coco, "segm", "seg_results.json")
170 | model.eval()
171 | with torch.no_grad():
172 | for image, targets in tqdm(val_dataset_loader, desc="validation..."):
173 |             # move the images to the specified device
174 | image = list(img.to(device) for img in image)
175 |
176 | # inference
177 | outputs = model(image)
178 |
179 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
180 | det_metric.update(targets, outputs)
181 | seg_metric.update(targets, outputs)
182 |
183 | det_metric.synchronize_results()
184 | seg_metric.synchronize_results()
185 | det_metric.evaluate()
186 | seg_metric.evaluate()
187 |
188 | save_info(det_metric.coco_evaluator, category_index, "det_record_mAP.txt")
189 | save_info(seg_metric.coco_evaluator, category_index, "seg_record_mAP.txt")
190 |
191 |
192 | if __name__ == "__main__":
193 | import argparse
194 |
195 | parser = argparse.ArgumentParser(
196 | description=__doc__)
197 |
198 |     # device type to use
199 | parser.add_argument('--device', default='cuda', help='device')
200 |
201 |     # number of detection classes (excluding background)
202 | parser.add_argument('--num-classes', type=int, default=4, help='number of classes')
203 |
204 |     # dataset root directory
205 | parser.add_argument('--data-path', default='/data/wuwentao/data/cityscapes/leftImg8bit/', help='dataset root')
206 |
207 |     # path to the trained weights file
208 | parser.add_argument('--weights-path', default='./save_weights/model_25.pth', type=str, help='training weights')
209 |
210 |     # batch size (set to 1, don't change)
211 | parser.add_argument('--batch-size', default=1, type=int, metavar='N',
212 | help='batch size when validation.')
213 |     # mapping between class indices and class names
214 | parser.add_argument('--label-json-path', type=str, default="coco91_indices.json")
215 |
216 | args = parser.parse_args()
217 |
218 | main(args)
219 |
--------------------------------------------------------------------------------