├── README.md ├── __pycache__ ├── my_dataset_cityscraps.cpython-38.pyc └── transforms.cpython-38.pyc ├── backbone ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-38.pyc │ ├── feature_pyramid_network.cpython-38.pyc │ └── resnet50_fpn_model.cpython-38.pyc ├── feature_pyramid_network.py └── resnet50_fpn_model.py ├── cityscrapes4_indices.json ├── draw_box_utils.py ├── faster_rcnn ├── backbone │ ├── __init__.py │ ├── feature_pyramid_network.py │ ├── mobilenetv2_model.py │ ├── resnet50_fpn_model.py │ └── vgg_model.py ├── change_backbone_with_fpn.py ├── change_backbone_without_fpn.py ├── cityscrapes8_indices.json ├── cityscrayp.py ├── draw_box_utils.py ├── my_dataset.py ├── network_files │ ├── __init__.py │ ├── boxes.py │ ├── det_utils.py │ ├── faster_rcnn_framework.py │ ├── image_list.py │ ├── roi_head.py │ ├── rpn_function.py │ └── transform.py ├── pascal_voc_classes.json ├── plot_curve.py ├── predict.py ├── record_mAP.txt ├── requirements.txt ├── split_data.py ├── train_mobilenetv2.py ├── train_multi_GPU.py ├── train_res50_fpn.py ├── train_utils │ ├── __init__.py │ ├── coco_eval.py │ ├── coco_utils.py │ ├── distributed_utils.py │ ├── group_by_aspect_ratio.py │ └── train_eval_utils.py ├── transforms.py └── validation.py ├── figures ├── VehicleMAE_Det.jpg ├── detection_result.jpg ├── experimentalresults.jpg ├── firstIMG.jpg ├── proposal_attentionmaps.jpg └── proposal_attribute.jpg ├── my_dataset_cityscraps.py ├── network_files ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-38.pyc │ ├── boxes.cpython-38.pyc │ ├── det_utils.cpython-38.pyc │ ├── faster_rcnn_framework.cpython-38.pyc │ ├── image_list.cpython-38.pyc │ ├── mask_rcnn.cpython-38.pyc │ ├── roi_head.cpython-38.pyc │ ├── rpn_function.cpython-38.pyc │ ├── transform.cpython-38.pyc │ └── vehiclemaeencode.cpython-38.pyc ├── boxes.py ├── det_utils.py ├── faster_rcnn_framework.py ├── image_list.py ├── mask_rcnn.py ├── roi_head.py ├── rpn_function.py ├── transform.py └── vehiclemaeencode.py ├── plot_curve.py ├── predict.py ├── requirements.txt ├── train.py ├── train_utils ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-38.pyc │ ├── coco_eval.cpython-38.pyc │ ├── coco_utils.cpython-38.pyc │ ├── distributed_utils.cpython-38.pyc │ ├── group_by_aspect_ratio.cpython-38.pyc │ └── train_eval_utils.cpython-38.pyc ├── coco_eval.py ├── coco_utils.py ├── distributed_utils.py ├── group_by_aspect_ratio.py └── train_eval_utils.py ├── transforms.py └── validation.py /README.md: -------------------------------------------------------------------------------- 1 | # VFM-Det 2 | 3 |
4 | 5 | 6 | 7 | **Vehicle Detection using Pre-trained Large Vision-Language Foundation Models** 8 | 9 | ------ 10 | 11 |

12 | • arXiv • 13 | Baselines • 14 | DemoVideo • 15 | Tutorial • 16 |

17 | 18 |
19 |
20 | > **VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models**,
21 | Wentao Wu†, Fanghua Hong†, Xiao Wang*, Chenglong Li, Jin Tang
22 | [[Paper]()]
23 | [[Code]()]
24 | [[DemoVideo]()]
25 |
26 |
27 |
28 | ### News
29 | * [2024.08.23] The source code is released.
30 |
31 |
32 |
33 | ### Abstract
34 |
35 | Existing vehicle detectors are usually obtained by training a typical detector (e.g., YOLO, RCNN, DETR series) on vehicle images based on a pre-trained backbone (e.g., ResNet, ViT). Some researchers also exploit pre-trained large foundation models to enhance detection performance. However, we argue that these detectors may only achieve sub-optimal results because the large models they use are not specifically designed for vehicles. In addition, their results rely heavily on visual features, and they seldom consider the alignment between the vehicle's semantic information and its visual representation. In this work, we propose a new vehicle detection paradigm based on a pre-trained foundation vehicle model (VehicleMAE) and a large language model (T5), termed VFM-Det. It follows the region proposal-based detection framework, and the features of each proposal can be enhanced using VehicleMAE. More importantly, we propose a new VAtt2Vec module that predicts the vehicle semantic attributes of these proposals and transforms them into feature vectors to enhance the vision features via contrastive learning. Extensive experiments on three vehicle detection benchmark datasets thoroughly demonstrate the effectiveness of our vehicle detector. Specifically, our model improves the baseline approach by $+5.1\%$ and $+6.2\%$ on the $AP_{0.5}$ and $AP_{0.75}$ metrics, respectively, on the Cityscapes dataset.
36 |
37 | ### Framework
38 |
39 |
40 |
41 | ### Environment Configuration
42 |
43 | Configure the environment according to the requirements.txt file (e.g., `pip install -r requirements.txt`).
44 |
45 | ### Model Training and Testing
46 |
47 | ```bash
48 | # To train VFM-Det on a single GPU, run:
49 | CUDA_VISIBLE_DEVICES=0 python train.py
50 |
51 | # To test VFM-Det, run:
52 | CUDA_VISIBLE_DEVICES=0 python validation.py
53 | ```
54 |
55 | ### Experimental Results
56 |
57 |
58 |
59 |
60 | ### Visual Results
61 |
62 |
63 |
64 |
65 |
66 |
67 |
68 |
69 |
70 | ### Datasets and Checkpoints Download
71 | **Datasets**
72 |
73 | Cityscapes dataset download address: https://www.cityscapes-dataset.com/
74 |
75 |
76 | COCO2017 dataset download addresses:
77 | http://images.cocodataset.org/zips/train2017.zip
78 | http://images.cocodataset.org/annotations/annotations_trainval2017.zip
79 | http://images.cocodataset.org/zips/val2017.zip
80 | http://images.cocodataset.org/annotations/stuff_annotations_trainval2017.zip
81 | http://images.cocodataset.org/zips/test2017.zip
82 | http://images.cocodataset.org/annotations/image_info_test2017.zip
83 |
84 | UA-DETRAC dataset download address: https://www.albany.edu/cnse/research/computer-vision-machine-learning-lab
85 |
86 | **Checkpoints**
87 |
88 | Checkpoint | [download](https://pan.baidu.com/s/1ra1fQEsXCrruUtZsBj741g?pwd=2dyx)
89 |
90 | Extraction code | 2dyx
91 |
92 |
93 | ### License
94 |
95 |
96 |
97 | ### :cupid: Acknowledgement
98 | * Thanks to the [WZMIAOMIAO](https://github.com/WZMIAOMIAO/deep-learning-for-image-processing) repository for a quick implementation.
99 |
100 |
101 |
102 | ### :newspaper: Citation
103 | If you find this work helpful for your research, please cite the following work and give us a **star**. If you have any questions, please open an issue.
104 | 105 | ```bibtex 106 | @misc{wu2024VFMDet, 107 | title={VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models}, 108 | author={Wentao Wu and Fanghua Hong and Xiao Wang and Chenglong Li and Jin Tang}, 109 | year={2024}, 110 | eprint={2408.13031}, 111 | archivePrefix={arXiv}, 112 | primaryClass={cs.CV}, 113 | url={https://arxiv.org/abs/2408.13031}, 114 | } 115 | ``` 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | -------------------------------------------------------------------------------- /__pycache__/my_dataset_cityscraps.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/__pycache__/my_dataset_cityscraps.cpython-38.pyc -------------------------------------------------------------------------------- /__pycache__/transforms.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/__pycache__/transforms.cpython-38.pyc -------------------------------------------------------------------------------- /backbone/__init__.py: -------------------------------------------------------------------------------- 1 | from .resnet50_fpn_model import resnet50_fpn_backbone -------------------------------------------------------------------------------- /backbone/__pycache__/__init__.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/backbone/__pycache__/__init__.cpython-38.pyc -------------------------------------------------------------------------------- /backbone/__pycache__/feature_pyramid_network.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/backbone/__pycache__/feature_pyramid_network.cpython-38.pyc -------------------------------------------------------------------------------- /backbone/__pycache__/resnet50_fpn_model.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/backbone/__pycache__/resnet50_fpn_model.cpython-38.pyc -------------------------------------------------------------------------------- /backbone/feature_pyramid_network.py: -------------------------------------------------------------------------------- 1 | from collections import OrderedDict 2 | 3 | import torch.nn as nn 4 | import torch 5 | from torch import Tensor 6 | import torch.nn.functional as F 7 | 8 | from torch.jit.annotations import Tuple, List, Dict 9 | 10 | 11 | class IntermediateLayerGetter(nn.ModuleDict): 12 | """ 13 | Module wrapper that returns intermediate layers from a model 14 | It has a strong assumption that the modules have been registered 15 | into the model in the same order as they are used. 16 | This means that one should **not** reuse the same nn.Module 17 | twice in the forward if you want this to work. 18 | Additionally, it is only able to query submodules that are directly 19 | assigned to the model. So if `model` is passed, `model.feature1` can 20 | be returned, but not `model.feature1.layer2`. 
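    A minimal usage sketch (torchvision's stock resnet50 is used purely for
    illustration, and the printed shapes assume a 224x224 input; any module whose
    direct children include the requested layers behaves the same way)::

        >>> import torch, torchvision
        >>> m = torchvision.models.resnet50()
        >>> # collect the outputs of layer1 and layer4 under the names "0" and "3"
        >>> getter = IntermediateLayerGetter(m, {"layer1": "0", "layer4": "3"})
        >>> out = getter(torch.rand(1, 3, 224, 224))
        >>> [(k, v.shape) for k, v in out.items()]
        [('0', torch.Size([1, 256, 56, 56])), ('3', torch.Size([1, 2048, 7, 7]))]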
21 | Arguments: 22 | model (nn.Module): model on which we will extract the features 23 | return_layers (Dict[name, new_name]): a dict containing the names 24 | of the modules for which the activations will be returned as 25 | the key of the dict, and the value of the dict is the name 26 | of the returned activation (which the user can specify). 27 | """ 28 | __annotations__ = { 29 | "return_layers": Dict[str, str], 30 | } 31 | 32 | def __init__(self, model, return_layers): 33 | if not set(return_layers).issubset([name for name, _ in model.named_children()]): 34 | raise ValueError("return_layers are not present in model") 35 | 36 | orig_return_layers = return_layers 37 | return_layers = {str(k): str(v) for k, v in return_layers.items()} 38 | layers = OrderedDict() 39 | 40 | # 遍历模型子模块按顺序存入有序字典 41 | # 只保存layer4及其之前的结构,舍去之后不用的结构 42 | for name, module in model.named_children(): 43 | layers[name] = module 44 | if name in return_layers: 45 | del return_layers[name] 46 | if not return_layers: 47 | break 48 | 49 | super().__init__(layers) 50 | self.return_layers = orig_return_layers 51 | 52 | def forward(self, x): 53 | out = OrderedDict() 54 | # 依次遍历模型的所有子模块,并进行正向传播, 55 | # 收集layer1, layer2, layer3, layer4的输出 56 | for name, module in self.items(): 57 | x = module(x) 58 | if name in self.return_layers: 59 | out_name = self.return_layers[name] 60 | out[out_name] = x 61 | return out 62 | 63 | 64 | class BackboneWithFPN(nn.Module): 65 | """ 66 | Adds a FPN on top of a model. 67 | Internally, it uses torchvision.models._utils.IntermediateLayerGetter to 68 | extract a submodel that returns the feature maps specified in return_layers. 69 | The same limitations of IntermediatLayerGetter apply here. 70 | Arguments: 71 | backbone (nn.Module) 72 | return_layers (Dict[name, new_name]): a dict containing the names 73 | of the modules for which the activations will be returned as 74 | the key of the dict, and the value of the dict is the name 75 | of the returned activation (which the user can specify). 76 | in_channels_list (List[int]): number of channels for each feature map 77 | that is returned, in the order they are present in the OrderedDict 78 | out_channels (int): number of channels in the FPN. 79 | extra_blocks: ExtraFPNBlock 80 | Attributes: 81 | out_channels (int): the number of channels in the FPN 82 | """ 83 | 84 | def __init__(self, 85 | backbone: nn.Module, 86 | return_layers=None, 87 | in_channels_list=None, 88 | out_channels=256, 89 | extra_blocks=None, 90 | re_getter=True): 91 | super().__init__() 92 | 93 | if extra_blocks is None: 94 | extra_blocks = LastLevelMaxPool() 95 | 96 | if re_getter: 97 | assert return_layers is not None 98 | self.body = IntermediateLayerGetter(backbone, return_layers=return_layers) 99 | else: 100 | self.body = backbone 101 | 102 | self.fpn = FeaturePyramidNetwork( 103 | in_channels_list=in_channels_list, 104 | out_channels=out_channels, 105 | extra_blocks=extra_blocks, 106 | ) 107 | 108 | self.out_channels = out_channels 109 | 110 | def forward(self, x): 111 | x = self.body(x) 112 | x = self.fpn(x) 113 | return x 114 | 115 | 116 | class FeaturePyramidNetwork(nn.Module): 117 | """ 118 | Module that adds a FPN from on top of a set of feature maps. This is based on 119 | `"Feature Pyramid Network for Object Detection" `_. 120 | The feature maps are currently supposed to be in increasing depth 121 | order. 122 | The input to the model is expected to be an OrderedDict[Tensor], containing 123 | the feature maps on top of which the FPN will be added. 
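    A minimal sketch with dummy feature maps (channel counts and spatial sizes are
    arbitrary; without ``extra_blocks`` the output has one map per input level, each
    with ``out_channels`` channels at the original resolution)::

        >>> from collections import OrderedDict
        >>> import torch
        >>> fpn = FeaturePyramidNetwork([10, 20, 30], out_channels=5)
        >>> x = OrderedDict()
        >>> x['feat0'] = torch.rand(1, 10, 64, 64)
        >>> x['feat1'] = torch.rand(1, 20, 16, 16)
        >>> x['feat2'] = torch.rand(1, 30, 8, 8)
        >>> out = fpn(x)
        >>> [(k, v.shape) for k, v in out.items()]
        [('feat0', torch.Size([1, 5, 64, 64])), ('feat1', torch.Size([1, 5, 16, 16])), ('feat2', torch.Size([1, 5, 8, 8]))]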
124 | Arguments: 125 | in_channels_list (list[int]): number of channels for each feature map that 126 | is passed to the module 127 | out_channels (int): number of channels of the FPN representation 128 | extra_blocks (ExtraFPNBlock or None): if provided, extra operations will 129 | be performed. It is expected to take the fpn features, the original 130 | features and the names of the original features as input, and returns 131 | a new list of feature maps and their corresponding names 132 | """ 133 | 134 | def __init__(self, in_channels_list, out_channels, extra_blocks=None): 135 | super().__init__() 136 | # 用来调整resnet特征矩阵(layer1,2,3,4)的channel(kernel_size=1) 137 | self.inner_blocks = nn.ModuleList() 138 | # 对调整后的特征矩阵使用3x3的卷积核来得到对应的预测特征矩阵 139 | self.layer_blocks = nn.ModuleList() 140 | for in_channels in in_channels_list: 141 | if in_channels == 0: 142 | continue 143 | inner_block_module = nn.Conv2d(in_channels, out_channels, 1) 144 | layer_block_module = nn.Conv2d(out_channels, out_channels, 3, padding=1) 145 | self.inner_blocks.append(inner_block_module) 146 | self.layer_blocks.append(layer_block_module) 147 | 148 | # initialize parameters now to avoid modifying the initialization of top_blocks 149 | for m in self.children(): 150 | if isinstance(m, nn.Conv2d): 151 | nn.init.kaiming_uniform_(m.weight, a=1) 152 | nn.init.constant_(m.bias, 0) 153 | 154 | self.extra_blocks = extra_blocks 155 | 156 | def get_result_from_inner_blocks(self, x: Tensor, idx: int) -> Tensor: 157 | """ 158 | This is equivalent to self.inner_blocks[idx](x), 159 | but torchscript doesn't support this yet 160 | """ 161 | num_blocks = len(self.inner_blocks) 162 | if idx < 0: 163 | idx += num_blocks 164 | i = 0 165 | out = x 166 | for module in self.inner_blocks: 167 | if i == idx: 168 | out = module(x) 169 | i += 1 170 | return out 171 | 172 | def get_result_from_layer_blocks(self, x: Tensor, idx: int) -> Tensor: 173 | """ 174 | This is equivalent to self.layer_blocks[idx](x), 175 | but torchscript doesn't support this yet 176 | """ 177 | num_blocks = len(self.layer_blocks) 178 | if idx < 0: 179 | idx += num_blocks 180 | i = 0 181 | out = x 182 | for module in self.layer_blocks: 183 | if i == idx: 184 | out = module(x) 185 | i += 1 186 | return out 187 | 188 | def forward(self, x: Dict[str, Tensor]) -> Dict[str, Tensor]: 189 | """ 190 | Computes the FPN for a set of feature maps. 191 | Arguments: 192 | x (OrderedDict[Tensor]): feature maps for each feature level. 193 | Returns: 194 | results (OrderedDict[Tensor]): feature maps after FPN layers. 195 | They are ordered from highest resolution first. 
196 | """ 197 | # unpack OrderedDict into two lists for easier handling 198 | names = list(x.keys()) 199 | x = list(x.values()) 200 | 201 | # 将resnet layer4的channel调整到指定的out_channels 202 | # last_inner = self.inner_blocks[-1](x[-1]) 203 | last_inner = self.get_result_from_inner_blocks(x[-1], -1) 204 | # result中保存着每个预测特征层 205 | results = [] 206 | # 将layer4调整channel后的特征矩阵,通过3x3卷积后得到对应的预测特征矩阵 207 | # results.append(self.layer_blocks[-1](last_inner)) 208 | results.append(self.get_result_from_layer_blocks(last_inner, -1)) 209 | 210 | for idx in range(len(x) - 2, -1, -1): 211 | inner_lateral = self.get_result_from_inner_blocks(x[idx], idx) 212 | feat_shape = inner_lateral.shape[-2:] 213 | inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest") 214 | last_inner = inner_lateral + inner_top_down 215 | results.insert(0, self.get_result_from_layer_blocks(last_inner, idx)) 216 | 217 | # 在layer4对应的预测特征层基础上生成预测特征矩阵5 218 | if self.extra_blocks is not None: 219 | results, names = self.extra_blocks(results, x, names) 220 | 221 | # make it back an OrderedDict 222 | out = OrderedDict([(k, v) for k, v in zip(names, results)]) 223 | 224 | return out 225 | 226 | 227 | class LastLevelMaxPool(torch.nn.Module): 228 | """ 229 | Applies a max_pool2d on top of the last feature map 230 | """ 231 | 232 | def forward(self, x: List[Tensor], y: List[Tensor], names: List[str]) -> Tuple[List[Tensor], List[str]]: 233 | names.append("pool") 234 | x.append(F.max_pool2d(x[-1], 1, 2, 0)) 235 | return x, names 236 | -------------------------------------------------------------------------------- /backbone/resnet50_fpn_model.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import torch 4 | import torch.nn as nn 5 | from torchvision.ops.misc import FrozenBatchNorm2d 6 | 7 | from .feature_pyramid_network import BackboneWithFPN, LastLevelMaxPool 8 | 9 | 10 | class Bottleneck(nn.Module): 11 | expansion = 4 12 | 13 | def __init__(self, in_channel, out_channel, stride=1, downsample=None, norm_layer=None): 14 | super().__init__() 15 | if norm_layer is None: 16 | norm_layer = nn.BatchNorm2d 17 | 18 | self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel, 19 | kernel_size=1, stride=1, bias=False) # squeeze channels 20 | self.bn1 = norm_layer(out_channel) 21 | # ----------------------------------------- 22 | self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel, 23 | kernel_size=3, stride=stride, bias=False, padding=1) 24 | self.bn2 = norm_layer(out_channel) 25 | # ----------------------------------------- 26 | self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion, 27 | kernel_size=1, stride=1, bias=False) # unsqueeze channels 28 | self.bn3 = norm_layer(out_channel * self.expansion) 29 | self.relu = nn.ReLU(inplace=True) 30 | self.downsample = downsample 31 | 32 | def forward(self, x): 33 | identity = x 34 | if self.downsample is not None: 35 | identity = self.downsample(x) 36 | 37 | out = self.conv1(x) 38 | out = self.bn1(out) 39 | out = self.relu(out) 40 | 41 | out = self.conv2(out) 42 | out = self.bn2(out) 43 | out = self.relu(out) 44 | 45 | out = self.conv3(out) 46 | out = self.bn3(out) 47 | 48 | out += identity 49 | out = self.relu(out) 50 | 51 | return out 52 | 53 | 54 | class ResNet(nn.Module): 55 | 56 | def __init__(self, block, blocks_num, num_classes=1000, include_top=True, norm_layer=None): 57 | super().__init__() 58 | if norm_layer is None: 59 | norm_layer = nn.BatchNorm2d 60 | 
self._norm_layer = norm_layer 61 | 62 | self.include_top = include_top 63 | self.in_channel = 64 64 | 65 | self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, 66 | padding=3, bias=False) 67 | self.bn1 = norm_layer(self.in_channel) 68 | self.relu = nn.ReLU(inplace=True) 69 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 70 | self.layer1 = self._make_layer(block, 64, blocks_num[0]) 71 | self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2) 72 | self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2) 73 | self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2) 74 | if self.include_top: 75 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) # output size = (1, 1) 76 | self.fc = nn.Linear(512 * block.expansion, num_classes) 77 | 78 | for m in self.modules(): 79 | if isinstance(m, nn.Conv2d): 80 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 81 | 82 | def _make_layer(self, block, channel, block_num, stride=1): 83 | norm_layer = self._norm_layer 84 | downsample = None 85 | if stride != 1 or self.in_channel != channel * block.expansion: 86 | downsample = nn.Sequential( 87 | nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False), 88 | norm_layer(channel * block.expansion)) 89 | 90 | layers = [] 91 | layers.append(block(self.in_channel, channel, downsample=downsample, 92 | stride=stride, norm_layer=norm_layer)) 93 | self.in_channel = channel * block.expansion 94 | 95 | for _ in range(1, block_num): 96 | layers.append(block(self.in_channel, channel, norm_layer=norm_layer)) 97 | 98 | return nn.Sequential(*layers) 99 | 100 | def forward(self, x): 101 | x = self.conv1(x) 102 | x = self.bn1(x) 103 | x = self.relu(x) 104 | x = self.maxpool(x) 105 | 106 | x = self.layer1(x) 107 | x = self.layer2(x) 108 | x = self.layer3(x) 109 | x = self.layer4(x) 110 | 111 | if self.include_top: 112 | x = self.avgpool(x) 113 | x = torch.flatten(x, 1) 114 | x = self.fc(x) 115 | 116 | return x 117 | 118 | 119 | def overwrite_eps(model, eps): 120 | """ 121 | This method overwrites the default eps values of all the 122 | FrozenBatchNorm2d layers of the model with the provided value. 123 | This is necessary to address the BC-breaking change introduced 124 | by the bug-fix at pytorch/vision#2933. The overwrite is applied 125 | only when the pretrained weights are loaded to maintain compatibility 126 | with previous versions. 127 | 128 | Args: 129 | model (nn.Module): The model on which we perform the overwrite. 130 | eps (float): The new value of eps. 
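    A small sketch of the effect (dummy module; FrozenBatchNorm2d is the class
    imported from torchvision.ops.misc at the top of this file)::

        >>> import torch
        >>> bn = FrozenBatchNorm2d(64)
        >>> net = torch.nn.Sequential(torch.nn.Conv2d(3, 64, 3), bn)
        >>> overwrite_eps(net, 0.0)
        >>> bn.eps
        0.0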
131 | """ 132 | for module in model.modules(): 133 | if isinstance(module, FrozenBatchNorm2d): 134 | module.eps = eps 135 | 136 | 137 | def resnet50_fpn_backbone(pretrain_path="/home/lcl_d/wuwentao/detection/deep-learning-for-image-processing-master/deep-learning-for-image-processing-master/pre_model/resnet50.pth", 138 | norm_layer=nn.BatchNorm2d, 139 | trainable_layers=3, 140 | returned_layers=None, 141 | extra_blocks=None): 142 | """ 143 | 搭建resnet50_fpn——backbone 144 | Args: 145 | pretrain_path: resnet50的预训练权重,如果不使用就默认为空 146 | norm_layer: 默认是nn.BatchNorm2d,如果GPU显存很小,batch_size不能设置很大, 147 | 建议将norm_layer设置成FrozenBatchNorm2d(默认是nn.BatchNorm2d) 148 | (https://github.com/facebookresearch/maskrcnn-benchmark/issues/267) 149 | trainable_layers: 指定训练哪些层结构 150 | returned_layers: 指定哪些层的输出需要返回 151 | extra_blocks: 在输出的特征层基础上额外添加的层结构 152 | 153 | Returns: 154 | 155 | """ 156 | resnet_backbone = ResNet(Bottleneck, [3, 4, 6, 3], 157 | include_top=False, 158 | norm_layer=norm_layer) 159 | 160 | if isinstance(norm_layer, FrozenBatchNorm2d): 161 | overwrite_eps(resnet_backbone, 0.0) 162 | 163 | if pretrain_path != "": 164 | #assert os.path.exists(pretrain_path), "{} is not exist.".format(pretrain_path) 165 | # 载入预训练权重 166 | print(resnet_backbone.load_state_dict(torch.load(pretrain_path), strict=False)) 167 | 168 | # select layers that wont be frozen 169 | assert 0 <= trainable_layers <= 5 170 | layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers] 171 | 172 | # 如果要训练所有层结构的话,不要忘了conv1后还有一个bn1 173 | if trainable_layers == 5: 174 | layers_to_train.append("bn1") 175 | 176 | # freeze layers 177 | for name, parameter in resnet_backbone.named_parameters(): 178 | # 只训练不在layers_to_train列表中的层结构 179 | if all([not name.startswith(layer) for layer in layers_to_train]): 180 | parameter.requires_grad_(False) 181 | 182 | if extra_blocks is None: 183 | extra_blocks = LastLevelMaxPool() 184 | 185 | if returned_layers is None: 186 | returned_layers = [1, 2, 3, 4] 187 | # 返回的特征层个数肯定大于0小于5 188 | assert min(returned_layers) > 0 and max(returned_layers) < 5 189 | 190 | # return_layers = {'layer1': '0', 'layer2': '1', 'layer3': '2', 'layer4': '3'} 191 | return_layers = {f'layer{k}': str(v) for v, k in enumerate(returned_layers)} 192 | 193 | # in_channel 为layer4的输出特征矩阵channel = 2048 194 | in_channels_stage2 = resnet_backbone.in_channel // 8 # 256 195 | # 记录resnet50提供给fpn的每个特征层channel 196 | in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers] 197 | # 通过fpn后得到的每个特征层的channel 198 | out_channels = 256 199 | return BackboneWithFPN(resnet_backbone, return_layers, in_channels_list, out_channels, extra_blocks=extra_blocks) 200 | -------------------------------------------------------------------------------- /cityscrapes4_indices.json: -------------------------------------------------------------------------------- 1 | { 2 | "1": "car", 3 | "2": "truck", 4 | "3": "bus", 5 | "4": "caravan" 6 | } -------------------------------------------------------------------------------- /draw_box_utils.py: -------------------------------------------------------------------------------- 1 | from PIL.Image import Image, fromarray 2 | import PIL.ImageDraw as ImageDraw 3 | import PIL.ImageFont as ImageFont 4 | from PIL import ImageColor 5 | import numpy as np 6 | 7 | STANDARD_COLORS = [ 8 | 'AliceBlue', 'Chartreuse', 'Aqua', 'Aquamarine', 'Azure', 'Beige', 'Bisque', 9 | 'BlanchedAlmond', 'BlueViolet', 'BurlyWood', 'CadetBlue', 'AntiqueWhite', 10 | 'Chocolate', 'Coral', 'CornflowerBlue', 
'Cornsilk', 'Crimson', 'Cyan', 11 | 'DarkCyan', 'DarkGoldenRod', 'DarkGrey', 'DarkKhaki', 'DarkOrange', 12 | 'DarkOrchid', 'DarkSalmon', 'DarkSeaGreen', 'DarkTurquoise', 'DarkViolet', 13 | 'DeepPink', 'DeepSkyBlue', 'DodgerBlue', 'FireBrick', 'FloralWhite', 14 | 'ForestGreen', 'Fuchsia', 'Gainsboro', 'GhostWhite', 'Gold', 'GoldenRod', 15 | 'Salmon', 'Tan', 'HoneyDew', 'HotPink', 'IndianRed', 'Ivory', 'Khaki', 16 | 'Lavender', 'LavenderBlush', 'LawnGreen', 'LemonChiffon', 'LightBlue', 17 | 'LightCoral', 'LightCyan', 'LightGoldenRodYellow', 'LightGray', 'LightGrey', 18 | 'LightGreen', 'LightPink', 'LightSalmon', 'LightSeaGreen', 'LightSkyBlue', 19 | 'LightSlateGray', 'LightSlateGrey', 'LightSteelBlue', 'LightYellow', 'Lime', 20 | 'LimeGreen', 'Linen', 'Magenta', 'MediumAquaMarine', 'MediumOrchid', 21 | 'MediumPurple', 'MediumSeaGreen', 'MediumSlateBlue', 'MediumSpringGreen', 22 | 'MediumTurquoise', 'MediumVioletRed', 'MintCream', 'MistyRose', 'Moccasin', 23 | 'NavajoWhite', 'OldLace', 'Olive', 'OliveDrab', 'Orange', 'OrangeRed', 24 | 'Orchid', 'PaleGoldenRod', 'PaleGreen', 'PaleTurquoise', 'PaleVioletRed', 25 | 'PapayaWhip', 'PeachPuff', 'Peru', 'Pink', 'Plum', 'PowderBlue', 'Purple', 26 | 'Red', 'RosyBrown', 'RoyalBlue', 'SaddleBrown', 'Green', 'SandyBrown', 27 | 'SeaGreen', 'SeaShell', 'Sienna', 'Silver', 'SkyBlue', 'SlateBlue', 28 | 'SlateGray', 'SlateGrey', 'Snow', 'SpringGreen', 'SteelBlue', 'GreenYellow', 29 | 'Teal', 'Thistle', 'Tomato', 'Turquoise', 'Violet', 'Wheat', 'White', 30 | 'WhiteSmoke', 'Yellow', 'YellowGreen' 31 | ] 32 | 33 | 34 | def draw_text(draw, 35 | box: list, 36 | cls: int, 37 | score: float, 38 | category_index: dict, 39 | color: str, 40 | font: str = 'arial.ttf', 41 | font_size: int = 24): 42 | """ 43 | 将目标边界框和类别信息绘制到图片上 44 | """ 45 | try: 46 | font = ImageFont.truetype(font, font_size) 47 | except IOError: 48 | font = ImageFont.load_default() 49 | 50 | left, top, right, bottom = box 51 | # If the total height of the display strings added to the top of the bounding 52 | # box exceeds the top of the image, stack the strings below the bounding box 53 | # instead of above. 54 | display_str = f"{category_index[str(cls)]}: {int(100 * score)}%" 55 | display_str_heights = [font.getsize(ds)[1] for ds in display_str] 56 | # Each display_str has a top and bottom margin of 0.05x. 
57 | display_str_height = (1 + 2 * 0.05) * max(display_str_heights) 58 | 59 | if top > display_str_height: 60 | text_top = top - display_str_height 61 | text_bottom = top 62 | else: 63 | text_top = bottom 64 | text_bottom = bottom + display_str_height 65 | 66 | for ds in display_str: 67 | text_width, text_height = font.getsize(ds) 68 | margin = np.ceil(0.05 * text_width) 69 | draw.rectangle([(left, text_top), 70 | (left + text_width + 2 * margin, text_bottom)], fill=color) 71 | draw.text((left + margin, text_top), 72 | ds, 73 | fill='black', 74 | font=font) 75 | left += text_width 76 | 77 | 78 | def draw_masks(image, masks, colors, thresh: float = 0.7, alpha: float = 0.5): 79 | np_image = np.array(image) 80 | masks = np.where(masks > thresh, True, False) 81 | 82 | # colors = np.array(colors) 83 | img_to_draw = np.copy(np_image) 84 | # TODO: There might be a way to vectorize this 85 | for mask, color in zip(masks, colors): 86 | img_to_draw[mask] = color 87 | 88 | out = np_image * (1 - alpha) + img_to_draw * alpha 89 | return fromarray(out.astype(np.uint8)) 90 | 91 | 92 | def draw_objs(image: Image, 93 | boxes: np.ndarray = None, 94 | classes: np.ndarray = None, 95 | scores: np.ndarray = None, 96 | masks: np.ndarray = None, 97 | category_index: dict = None, 98 | box_thresh: float = 0.1, 99 | mask_thresh: float = 0.5, 100 | line_thickness: int = 8, 101 | font: str = 'arial.ttf', 102 | font_size: int = 24, 103 | draw_boxes_on_image: bool = True, 104 | draw_masks_on_image: bool = True): 105 | """ 106 | 将目标边界框信息,类别信息,mask信息绘制在图片上 107 | Args: 108 | image: 需要绘制的图片 109 | boxes: 目标边界框信息 110 | classes: 目标类别信息 111 | scores: 目标概率信息 112 | masks: 目标mask信息 113 | category_index: 类别与名称字典 114 | box_thresh: 过滤的概率阈值 115 | mask_thresh: 116 | line_thickness: 边界框宽度 117 | font: 字体类型 118 | font_size: 字体大小 119 | draw_boxes_on_image: 120 | draw_masks_on_image: 121 | 122 | Returns: 123 | 124 | """ 125 | 126 | # 过滤掉低概率的目标 127 | idxs = np.greater(scores, box_thresh) 128 | boxes = boxes[idxs] 129 | classes = classes[idxs] 130 | scores = scores[idxs] 131 | if masks is not None: 132 | masks = masks[idxs] 133 | if len(boxes) == 0: 134 | return image 135 | 136 | colors = [ImageColor.getrgb(STANDARD_COLORS[cls % len(STANDARD_COLORS)]) for cls in classes] 137 | 138 | if draw_boxes_on_image: 139 | # Draw all boxes onto image. 140 | draw = ImageDraw.Draw(image) 141 | for box, cls, score, color in zip(boxes, classes, scores, colors): 142 | left, top, right, bottom = box 143 | # 绘制目标边界框 144 | draw.line([(left, top), (left, bottom), (right, bottom), 145 | (right, top), (left, top)], width=line_thickness, fill=color) 146 | # 绘制类别和概率信息 147 | draw_text(draw, box.tolist(), int(cls), float(score), category_index, color, font, font_size) 148 | 149 | if draw_masks_on_image and (masks is not None): 150 | # Draw all mask onto image. 
151 | image = draw_masks(image, masks, colors, mask_thresh) 152 | 153 | return image 154 | -------------------------------------------------------------------------------- /faster_rcnn/backbone/__init__.py: -------------------------------------------------------------------------------- 1 | from .resnet50_fpn_model import resnet50_fpn_backbone 2 | from .mobilenetv2_model import MobileNetV2 3 | from .vgg_model import vgg 4 | from .feature_pyramid_network import LastLevelMaxPool, BackboneWithFPN 5 | -------------------------------------------------------------------------------- /faster_rcnn/backbone/feature_pyramid_network.py: -------------------------------------------------------------------------------- 1 | from collections import OrderedDict 2 | 3 | import torch.nn as nn 4 | import torch 5 | from torch import Tensor 6 | import torch.nn.functional as F 7 | 8 | from torch.jit.annotations import Tuple, List, Dict 9 | 10 | 11 | class IntermediateLayerGetter(nn.ModuleDict): 12 | """ 13 | Module wrapper that returns intermediate layers from a model 14 | It has a strong assumption that the modules have been registered 15 | into the model in the same order as they are used. 16 | This means that one should **not** reuse the same nn.Module 17 | twice in the forward if you want this to work. 18 | Additionally, it is only able to query submodules that are directly 19 | assigned to the model. So if `model` is passed, `model.feature1` can 20 | be returned, but not `model.feature1.layer2`. 21 | Arguments: 22 | model (nn.Module): model on which we will extract the features 23 | return_layers (Dict[name, new_name]): a dict containing the names 24 | of the modules for which the activations will be returned as 25 | the key of the dict, and the value of the dict is the name 26 | of the returned activation (which the user can specify). 27 | """ 28 | __annotations__ = { 29 | "return_layers": Dict[str, str], 30 | } 31 | 32 | def __init__(self, model, return_layers): 33 | if not set(return_layers).issubset([name for name, _ in model.named_children()]): 34 | raise ValueError("return_layers are not present in model") 35 | 36 | orig_return_layers = return_layers 37 | return_layers = {str(k): str(v) for k, v in return_layers.items()} 38 | layers = OrderedDict() 39 | 40 | # 遍历模型子模块按顺序存入有序字典 41 | # 只保存layer4及其之前的结构,舍去之后不用的结构 42 | for name, module in model.named_children(): 43 | layers[name] = module 44 | if name in return_layers: 45 | del return_layers[name] 46 | if not return_layers: 47 | break 48 | 49 | super().__init__(layers) 50 | self.return_layers = orig_return_layers 51 | 52 | def forward(self, x): 53 | out = OrderedDict() 54 | # 依次遍历模型的所有子模块,并进行正向传播, 55 | # 收集layer1, layer2, layer3, layer4的输出 56 | for name, module in self.items(): 57 | x = module(x) 58 | if name in self.return_layers: 59 | out_name = self.return_layers[name] 60 | out[out_name] = x 61 | return out 62 | 63 | 64 | class FeaturePyramidNetwork(nn.Module): 65 | """ 66 | Module that adds a FPN from on top of a set of feature maps. This is based on 67 | `"Feature Pyramid Network for Object Detection" `_. 68 | The feature maps are currently supposed to be in increasing depth 69 | order. 70 | The input to the model is expected to be an OrderedDict[Tensor], containing 71 | the feature maps on top of which the FPN will be added. 
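    A small sketch showing the effect of ``extra_blocks`` (dummy inputs;
    ``LastLevelMaxPool`` is defined later in this file and appends an extra,
    coarser ``'pool'`` level on top of the last map)::

        >>> from collections import OrderedDict
        >>> import torch
        >>> fpn = FeaturePyramidNetwork([256, 512], out_channels=64, extra_blocks=LastLevelMaxPool())
        >>> feats = OrderedDict([('0', torch.rand(1, 256, 32, 32)), ('1', torch.rand(1, 512, 16, 16))])
        >>> out = fpn(feats)
        >>> [(k, v.shape) for k, v in out.items()]
        [('0', torch.Size([1, 64, 32, 32])), ('1', torch.Size([1, 64, 16, 16])), ('pool', torch.Size([1, 64, 8, 8]))]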
72 | Arguments: 73 | in_channels_list (list[int]): number of channels for each feature map that 74 | is passed to the module 75 | out_channels (int): number of channels of the FPN representation 76 | extra_blocks (ExtraFPNBlock or None): if provided, extra operations will 77 | be performed. It is expected to take the fpn features, the original 78 | features and the names of the original features as input, and returns 79 | a new list of feature maps and their corresponding names 80 | """ 81 | 82 | def __init__(self, in_channels_list, out_channels, extra_blocks=None): 83 | super().__init__() 84 | # 用来调整resnet特征矩阵(layer1,2,3,4)的channel(kernel_size=1) 85 | self.inner_blocks = nn.ModuleList() 86 | # 对调整后的特征矩阵使用3x3的卷积核来得到对应的预测特征矩阵 87 | self.layer_blocks = nn.ModuleList() 88 | for in_channels in in_channels_list: 89 | if in_channels == 0: 90 | continue 91 | inner_block_module = nn.Conv2d(in_channels, out_channels, 1) 92 | layer_block_module = nn.Conv2d(out_channels, out_channels, 3, padding=1) 93 | self.inner_blocks.append(inner_block_module) 94 | self.layer_blocks.append(layer_block_module) 95 | 96 | # initialize parameters now to avoid modifying the initialization of top_blocks 97 | for m in self.children(): 98 | if isinstance(m, nn.Conv2d): 99 | nn.init.kaiming_uniform_(m.weight, a=1) 100 | nn.init.constant_(m.bias, 0) 101 | 102 | self.extra_blocks = extra_blocks 103 | 104 | def get_result_from_inner_blocks(self, x: Tensor, idx: int) -> Tensor: 105 | """ 106 | This is equivalent to self.inner_blocks[idx](x), 107 | but torchscript doesn't support this yet 108 | """ 109 | num_blocks = len(self.inner_blocks) 110 | if idx < 0: 111 | idx += num_blocks 112 | i = 0 113 | out = x 114 | for module in self.inner_blocks: 115 | if i == idx: 116 | out = module(x) 117 | i += 1 118 | return out 119 | 120 | def get_result_from_layer_blocks(self, x: Tensor, idx: int) -> Tensor: 121 | """ 122 | This is equivalent to self.layer_blocks[idx](x), 123 | but torchscript doesn't support this yet 124 | """ 125 | num_blocks = len(self.layer_blocks) 126 | if idx < 0: 127 | idx += num_blocks 128 | i = 0 129 | out = x 130 | for module in self.layer_blocks: 131 | if i == idx: 132 | out = module(x) 133 | i += 1 134 | return out 135 | 136 | def forward(self, x: Dict[str, Tensor]) -> Dict[str, Tensor]: 137 | """ 138 | Computes the FPN for a set of feature maps. 139 | Arguments: 140 | x (OrderedDict[Tensor]): feature maps for each feature level. 141 | Returns: 142 | results (OrderedDict[Tensor]): feature maps after FPN layers. 143 | They are ordered from highest resolution first. 
144 | """ 145 | # unpack OrderedDict into two lists for easier handling 146 | names = list(x.keys()) 147 | x = list(x.values()) 148 | 149 | # 将resnet layer4的channel调整到指定的out_channels 150 | # last_inner = self.inner_blocks[-1](x[-1]) 151 | last_inner = self.get_result_from_inner_blocks(x[-1], -1) 152 | # result中保存着每个预测特征层 153 | results = [] 154 | # 将layer4调整channel后的特征矩阵,通过3x3卷积后得到对应的预测特征矩阵 155 | # results.append(self.layer_blocks[-1](last_inner)) 156 | results.append(self.get_result_from_layer_blocks(last_inner, -1)) 157 | 158 | for idx in range(len(x) - 2, -1, -1): 159 | inner_lateral = self.get_result_from_inner_blocks(x[idx], idx) 160 | feat_shape = inner_lateral.shape[-2:] 161 | inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest") 162 | last_inner = inner_lateral + inner_top_down 163 | results.insert(0, self.get_result_from_layer_blocks(last_inner, idx)) 164 | 165 | # 在layer4对应的预测特征层基础上生成预测特征矩阵5 166 | if self.extra_blocks is not None: 167 | results, names = self.extra_blocks(results, x, names) 168 | 169 | # make it back an OrderedDict 170 | out = OrderedDict([(k, v) for k, v in zip(names, results)]) 171 | 172 | return out 173 | 174 | 175 | class LastLevelMaxPool(torch.nn.Module): 176 | """ 177 | Applies a max_pool2d on top of the last feature map 178 | """ 179 | 180 | def forward(self, x: List[Tensor], y: List[Tensor], names: List[str]) -> Tuple[List[Tensor], List[str]]: 181 | names.append("pool") 182 | x.append(F.max_pool2d(x[-1], 1, 2, 0)) # input, kernel_size, stride, padding 183 | return x, names 184 | 185 | 186 | class BackboneWithFPN(nn.Module): 187 | """ 188 | Adds a FPN on top of a model. 189 | Internally, it uses torchvision.models._utils.IntermediateLayerGetter to 190 | extract a submodel that returns the feature maps specified in return_layers. 191 | The same limitations of IntermediatLayerGetter apply here. 192 | Arguments: 193 | backbone (nn.Module) 194 | return_layers (Dict[name, new_name]): a dict containing the names 195 | of the modules for which the activations will be returned as 196 | the key of the dict, and the value of the dict is the name 197 | of the returned activation (which the user can specify). 198 | in_channels_list (List[int]): number of channels for each feature map 199 | that is returned, in the order they are present in the OrderedDict 200 | out_channels (int): number of channels in the FPN. 
201 | extra_blocks: ExtraFPNBlock 202 | Attributes: 203 | out_channels (int): the number of channels in the FPN 204 | """ 205 | 206 | def __init__(self, 207 | backbone: nn.Module, 208 | return_layers=None, 209 | in_channels_list=None, 210 | out_channels=256, 211 | extra_blocks=None, 212 | re_getter=True): 213 | super().__init__() 214 | 215 | if extra_blocks is None: 216 | extra_blocks = LastLevelMaxPool() 217 | 218 | if re_getter is True: 219 | assert return_layers is not None 220 | self.body = IntermediateLayerGetter(backbone, return_layers=return_layers) 221 | else: 222 | self.body = backbone 223 | 224 | self.fpn = FeaturePyramidNetwork( 225 | in_channels_list=in_channels_list, 226 | out_channels=out_channels, 227 | extra_blocks=extra_blocks, 228 | ) 229 | 230 | self.out_channels = out_channels 231 | 232 | def forward(self, x): 233 | x = self.body(x) 234 | x = self.fpn(x) 235 | return x 236 | -------------------------------------------------------------------------------- /faster_rcnn/backbone/mobilenetv2_model.py: -------------------------------------------------------------------------------- 1 | from torch import nn 2 | import torch 3 | from torchvision.ops import misc 4 | 5 | 6 | def _make_divisible(ch, divisor=8, min_ch=None): 7 | """ 8 | This function is taken from the original tf repo. 9 | It ensures that all layers have a channel number that is divisible by 8 10 | It can be seen here: 11 | https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py 12 | """ 13 | if min_ch is None: 14 | min_ch = divisor 15 | new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor) 16 | # Make sure that round down does not go down by more than 10%. 17 | if new_ch < 0.9 * ch: 18 | new_ch += divisor 19 | return new_ch 20 | 21 | 22 | class ConvBNReLU(nn.Sequential): 23 | def __init__(self, in_channel, out_channel, kernel_size=3, stride=1, groups=1, norm_layer=None): 24 | padding = (kernel_size - 1) // 2 25 | if norm_layer is None: 26 | norm_layer = nn.BatchNorm2d 27 | super(ConvBNReLU, self).__init__( 28 | nn.Conv2d(in_channel, out_channel, kernel_size, stride, padding, groups=groups, bias=False), 29 | norm_layer(out_channel), 30 | nn.ReLU6(inplace=True) 31 | ) 32 | 33 | 34 | class InvertedResidual(nn.Module): 35 | def __init__(self, in_channel, out_channel, stride, expand_ratio, norm_layer=None): 36 | super(InvertedResidual, self).__init__() 37 | hidden_channel = in_channel * expand_ratio 38 | self.use_shortcut = stride == 1 and in_channel == out_channel 39 | if norm_layer is None: 40 | norm_layer = nn.BatchNorm2d 41 | 42 | layers = [] 43 | if expand_ratio != 1: 44 | # 1x1 pointwise conv 45 | layers.append(ConvBNReLU(in_channel, hidden_channel, kernel_size=1, norm_layer=norm_layer)) 46 | layers.extend([ 47 | # 3x3 depthwise conv 48 | ConvBNReLU(hidden_channel, hidden_channel, stride=stride, groups=hidden_channel, norm_layer=norm_layer), 49 | # 1x1 pointwise conv(linear) 50 | nn.Conv2d(hidden_channel, out_channel, kernel_size=1, bias=False), 51 | norm_layer(out_channel), 52 | ]) 53 | 54 | self.conv = nn.Sequential(*layers) 55 | 56 | def forward(self, x): 57 | if self.use_shortcut: 58 | return x + self.conv(x) 59 | else: 60 | return self.conv(x) 61 | 62 | 63 | class MobileNetV2(nn.Module): 64 | def __init__(self, num_classes=1000, alpha=1.0, round_nearest=8, weights_path=None, norm_layer=None): 65 | super(MobileNetV2, self).__init__() 66 | block = InvertedResidual 67 | input_channel = _make_divisible(32 * alpha, round_nearest) 68 | last_channel = 
_make_divisible(1280 * alpha, round_nearest) 69 | 70 | if norm_layer is None: 71 | norm_layer = nn.BatchNorm2d 72 | 73 | inverted_residual_setting = [ 74 | # t, c, n, s 75 | [1, 16, 1, 1], 76 | [6, 24, 2, 2], 77 | [6, 32, 3, 2], 78 | [6, 64, 4, 2], 79 | [6, 96, 3, 1], 80 | [6, 160, 3, 2], 81 | [6, 320, 1, 1], 82 | ] 83 | 84 | features = [] 85 | # conv1 layer 86 | features.append(ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer)) 87 | # building inverted residual residual blockes 88 | for t, c, n, s in inverted_residual_setting: 89 | output_channel = _make_divisible(c * alpha, round_nearest) 90 | for i in range(n): 91 | stride = s if i == 0 else 1 92 | features.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer)) 93 | input_channel = output_channel 94 | # building last several layers 95 | features.append(ConvBNReLU(input_channel, last_channel, 1, norm_layer=norm_layer)) 96 | # combine feature layers 97 | self.features = nn.Sequential(*features) 98 | 99 | # building classifier 100 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) 101 | self.classifier = nn.Sequential( 102 | nn.Dropout(0.2), 103 | nn.Linear(last_channel, num_classes) 104 | ) 105 | 106 | if weights_path is None: 107 | # weight initialization 108 | for m in self.modules(): 109 | if isinstance(m, nn.Conv2d): 110 | nn.init.kaiming_normal_(m.weight, mode='fan_out') 111 | if m.bias is not None: 112 | nn.init.zeros_(m.bias) 113 | elif isinstance(m, nn.BatchNorm2d): 114 | nn.init.ones_(m.weight) 115 | nn.init.zeros_(m.bias) 116 | elif isinstance(m, nn.Linear): 117 | nn.init.normal_(m.weight, 0, 0.01) 118 | nn.init.zeros_(m.bias) 119 | else: 120 | self.load_state_dict(torch.load(weights_path)) 121 | 122 | def forward(self, x): 123 | x = self.features(x) 124 | x = self.avgpool(x) 125 | x = torch.flatten(x, 1) 126 | x = self.classifier(x) 127 | return x 128 | -------------------------------------------------------------------------------- /faster_rcnn/backbone/resnet50_fpn_model.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import torch 4 | import torch.nn as nn 5 | from torchvision.ops.misc import FrozenBatchNorm2d 6 | 7 | from .feature_pyramid_network import BackboneWithFPN, LastLevelMaxPool 8 | 9 | 10 | class Bottleneck(nn.Module): 11 | expansion = 4 12 | 13 | def __init__(self, in_channel, out_channel, stride=1, downsample=None, norm_layer=None): 14 | super().__init__() 15 | if norm_layer is None: 16 | norm_layer = nn.BatchNorm2d 17 | 18 | self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel, 19 | kernel_size=1, stride=1, bias=False) # squeeze channels 20 | self.bn1 = norm_layer(out_channel) 21 | # ----------------------------------------- 22 | self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel, 23 | kernel_size=3, stride=stride, bias=False, padding=1) 24 | self.bn2 = norm_layer(out_channel) 25 | # ----------------------------------------- 26 | self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion, 27 | kernel_size=1, stride=1, bias=False) # unsqueeze channels 28 | self.bn3 = norm_layer(out_channel * self.expansion) 29 | self.relu = nn.ReLU(inplace=True) 30 | self.downsample = downsample 31 | 32 | def forward(self, x): 33 | identity = x 34 | if self.downsample is not None: 35 | identity = self.downsample(x) 36 | 37 | out = self.conv1(x) 38 | out = self.bn1(out) 39 | out = self.relu(out) 40 | 41 | out = self.conv2(out) 42 | out = self.bn2(out) 43 | out 
= self.relu(out) 44 | 45 | out = self.conv3(out) 46 | out = self.bn3(out) 47 | 48 | out += identity 49 | out = self.relu(out) 50 | 51 | return out 52 | 53 | 54 | class ResNet(nn.Module): 55 | 56 | def __init__(self, block, blocks_num, num_classes=1000, include_top=True, norm_layer=None): 57 | super().__init__() 58 | if norm_layer is None: 59 | norm_layer = nn.BatchNorm2d 60 | self._norm_layer = norm_layer 61 | 62 | self.include_top = include_top 63 | self.in_channel = 64 64 | 65 | self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2, 66 | padding=3, bias=False) 67 | self.bn1 = norm_layer(self.in_channel) 68 | self.relu = nn.ReLU(inplace=True) 69 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 70 | self.layer1 = self._make_layer(block, 64, blocks_num[0]) 71 | self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2) 72 | self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2) 73 | self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2) 74 | if self.include_top: 75 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) # output size = (1, 1) 76 | self.fc = nn.Linear(512 * block.expansion, num_classes) 77 | 78 | for m in self.modules(): 79 | if isinstance(m, nn.Conv2d): 80 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 81 | 82 | def _make_layer(self, block, channel, block_num, stride=1): 83 | norm_layer = self._norm_layer 84 | downsample = None 85 | if stride != 1 or self.in_channel != channel * block.expansion: 86 | downsample = nn.Sequential( 87 | nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False), 88 | norm_layer(channel * block.expansion)) 89 | 90 | layers = [] 91 | layers.append(block(self.in_channel, channel, downsample=downsample, 92 | stride=stride, norm_layer=norm_layer)) 93 | self.in_channel = channel * block.expansion 94 | 95 | for _ in range(1, block_num): 96 | layers.append(block(self.in_channel, channel, norm_layer=norm_layer)) 97 | 98 | return nn.Sequential(*layers) 99 | 100 | def forward(self, x): 101 | x = self.conv1(x) 102 | x = self.bn1(x) 103 | x = self.relu(x) 104 | x = self.maxpool(x) 105 | 106 | x = self.layer1(x) 107 | x = self.layer2(x) 108 | x = self.layer3(x) 109 | x = self.layer4(x) 110 | 111 | if self.include_top: 112 | x = self.avgpool(x) 113 | x = torch.flatten(x, 1) 114 | x = self.fc(x) 115 | 116 | return x 117 | 118 | 119 | def overwrite_eps(model, eps): 120 | """ 121 | This method overwrites the default eps values of all the 122 | FrozenBatchNorm2d layers of the model with the provided value. 123 | This is necessary to address the BC-breaking change introduced 124 | by the bug-fix at pytorch/vision#2933. The overwrite is applied 125 | only when the pretrained weights are loaded to maintain compatibility 126 | with previous versions. 127 | 128 | Args: 129 | model (nn.Module): The model on which we perform the overwrite. 130 | eps (float): The new value of eps. 
131 | """ 132 | for module in model.modules(): 133 | if isinstance(module, FrozenBatchNorm2d): 134 | module.eps = eps 135 | 136 | 137 | def resnet50_fpn_backbone(pretrain_path="", 138 | norm_layer=FrozenBatchNorm2d, # FrozenBatchNorm2d的功能与BatchNorm2d类似,但参数无法更新 139 | trainable_layers=3, 140 | returned_layers=None, 141 | extra_blocks=None): 142 | """ 143 | 搭建resnet50_fpn——backbone 144 | Args: 145 | pretrain_path: resnet50的预训练权重,如果不使用就默认为空 146 | norm_layer: 官方默认的是FrozenBatchNorm2d,即不会更新参数的bn层(因为如果batch_size设置的很小会导致效果更差,还不如不用bn层) 147 | 如果自己的GPU显存很大可以设置很大的batch_size,那么自己可以传入正常的BatchNorm2d层 148 | (https://github.com/facebookresearch/maskrcnn-benchmark/issues/267) 149 | trainable_layers: 指定训练哪些层结构 150 | returned_layers: 指定哪些层的输出需要返回 151 | extra_blocks: 在输出的特征层基础上额外添加的层结构 152 | 153 | Returns: 154 | 155 | """ 156 | resnet_backbone = ResNet(Bottleneck, [3, 4, 6, 3], 157 | include_top=False, 158 | norm_layer=norm_layer) 159 | 160 | if isinstance(norm_layer, FrozenBatchNorm2d): 161 | overwrite_eps(resnet_backbone, 0.0) 162 | 163 | if pretrain_path != "": 164 | assert os.path.exists(pretrain_path), "{} is not exist.".format(pretrain_path) 165 | # 载入预训练权重 166 | print(resnet_backbone.load_state_dict(torch.load(pretrain_path), strict=False)) 167 | 168 | # select layers that wont be frozen 169 | assert 0 <= trainable_layers <= 5 170 | layers_to_train = ['layer4', 'layer3', 'layer2', 'layer1', 'conv1'][:trainable_layers] 171 | 172 | # 如果要训练所有层结构的话,不要忘了conv1后还有一个bn1 173 | if trainable_layers == 5: 174 | layers_to_train.append("bn1") 175 | 176 | # freeze layers 177 | for name, parameter in resnet_backbone.named_parameters(): 178 | # 只训练不在layers_to_train列表中的层结构 179 | if all([not name.startswith(layer) for layer in layers_to_train]): 180 | parameter.requires_grad_(False) 181 | 182 | if extra_blocks is None: 183 | extra_blocks = LastLevelMaxPool() 184 | 185 | if returned_layers is None: 186 | returned_layers = [1, 2, 3, 4] 187 | # 返回的特征层个数肯定大于0小于5 188 | assert min(returned_layers) > 0 and max(returned_layers) < 5 189 | 190 | # return_layers = {'layer1': '0', 'layer2': '1', 'layer3': '2', 'layer4': '3'} 191 | return_layers = {f'layer{k}': str(v) for v, k in enumerate(returned_layers)} 192 | 193 | # in_channel 为layer4的输出特征矩阵channel = 2048 194 | in_channels_stage2 = resnet_backbone.in_channel // 8 # 256 195 | # 记录resnet50提供给fpn的每个特征层channel 196 | in_channels_list = [in_channels_stage2 * 2 ** (i - 1) for i in returned_layers] 197 | # 通过fpn后得到的每个特征层的channel 198 | out_channels = 256 199 | return BackboneWithFPN(resnet_backbone, return_layers, in_channels_list, out_channels, extra_blocks=extra_blocks) 200 | -------------------------------------------------------------------------------- /faster_rcnn/backbone/vgg_model.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch 3 | 4 | 5 | class VGG(nn.Module): 6 | def __init__(self, features, class_num=1000, init_weights=False, weights_path=None): 7 | super(VGG, self).__init__() 8 | self.features = features 9 | self.classifier = nn.Sequential( 10 | nn.Linear(512*7*7, 4096), 11 | nn.ReLU(True), 12 | nn.Dropout(p=0.5), 13 | nn.Linear(4096, 4096), 14 | nn.ReLU(True), 15 | nn.Dropout(p=0.5), 16 | nn.Linear(4096, class_num) 17 | ) 18 | if init_weights and weights_path is None: 19 | self._initialize_weights() 20 | 21 | if weights_path is not None: 22 | self.load_state_dict(torch.load(weights_path)) 23 | 24 | def forward(self, x): 25 | # N x 3 x 224 x 224 26 | x = self.features(x) 27 | # N x 512 x 7 x 7 28 | 
x = torch.flatten(x, start_dim=1) 29 | # N x 512*7*7 30 | x = self.classifier(x) 31 | return x 32 | 33 | def _initialize_weights(self): 34 | for m in self.modules(): 35 | if isinstance(m, nn.Conv2d): 36 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 37 | nn.init.xavier_uniform_(m.weight) 38 | if m.bias is not None: 39 | nn.init.constant_(m.bias, 0) 40 | elif isinstance(m, nn.Linear): 41 | nn.init.xavier_uniform_(m.weight) 42 | # nn.init.normal_(m.weight, 0, 0.01) 43 | nn.init.constant_(m.bias, 0) 44 | 45 | 46 | def make_features(cfg: list): 47 | layers = [] 48 | in_channels = 3 49 | for v in cfg: 50 | if v == "M": 51 | layers += [nn.MaxPool2d(kernel_size=2, stride=2)] 52 | else: 53 | conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1) 54 | layers += [conv2d, nn.ReLU(True)] 55 | in_channels = v 56 | return nn.Sequential(*layers) 57 | 58 | 59 | cfgs = { 60 | 'vgg11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'], 61 | 'vgg13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'], 62 | 'vgg16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'], 63 | 'vgg19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'], 64 | } 65 | 66 | 67 | def vgg(model_name="vgg16", weights_path=None): 68 | assert model_name in cfgs, "Warning: model number {} not in cfgs dict!".format(model_name) 69 | cfg = cfgs[model_name] 70 | 71 | model = VGG(make_features(cfg), weights_path=weights_path) 72 | return model 73 | -------------------------------------------------------------------------------- /faster_rcnn/cityscrapes8_indices.json: -------------------------------------------------------------------------------- 1 | { 2 | "1": "car", 3 | "2": "truck", 4 | "3": "bus", 5 | "4": "caravan" 6 | } -------------------------------------------------------------------------------- /faster_rcnn/draw_box_utils.py: -------------------------------------------------------------------------------- 1 | from PIL.Image import Image, fromarray 2 | import PIL.ImageDraw as ImageDraw 3 | import PIL.ImageFont as ImageFont 4 | from PIL import ImageColor 5 | import numpy as np 6 | 7 | STANDARD_COLORS = [ 8 | 'AliceBlue', 'Chartreuse', 'Aqua', 'Aquamarine', 'Azure', 'Beige', 'Bisque', 9 | 'BlanchedAlmond', 'BlueViolet', 'BurlyWood', 'CadetBlue', 'AntiqueWhite', 10 | 'Chocolate', 'Coral', 'CornflowerBlue', 'Cornsilk', 'Crimson', 'Cyan', 11 | 'DarkCyan', 'DarkGoldenRod', 'DarkGrey', 'DarkKhaki', 'DarkOrange', 12 | 'DarkOrchid', 'DarkSalmon', 'DarkSeaGreen', 'DarkTurquoise', 'DarkViolet', 13 | 'DeepPink', 'DeepSkyBlue', 'DodgerBlue', 'FireBrick', 'FloralWhite', 14 | 'ForestGreen', 'Fuchsia', 'Gainsboro', 'GhostWhite', 'Gold', 'GoldenRod', 15 | 'Salmon', 'Tan', 'HoneyDew', 'HotPink', 'IndianRed', 'Ivory', 'Khaki', 16 | 'Lavender', 'LavenderBlush', 'LawnGreen', 'LemonChiffon', 'LightBlue', 17 | 'LightCoral', 'LightCyan', 'LightGoldenRodYellow', 'LightGray', 'LightGrey', 18 | 'LightGreen', 'LightPink', 'LightSalmon', 'LightSeaGreen', 'LightSkyBlue', 19 | 'LightSlateGray', 'LightSlateGrey', 'LightSteelBlue', 'LightYellow', 'Lime', 20 | 'LimeGreen', 'Linen', 'Magenta', 'MediumAquaMarine', 'MediumOrchid', 21 | 'MediumPurple', 'MediumSeaGreen', 'MediumSlateBlue', 'MediumSpringGreen', 22 | 'MediumTurquoise', 'MediumVioletRed', 'MintCream', 'MistyRose', 'Moccasin', 23 | 'NavajoWhite', 'OldLace', 'Olive', 'OliveDrab', 'Orange', 'OrangeRed', 24 | 'Orchid', 'PaleGoldenRod', 'PaleGreen', 
'PaleTurquoise', 'PaleVioletRed', 25 | 'PapayaWhip', 'PeachPuff', 'Peru', 'Pink', 'Plum', 'PowderBlue', 'Purple', 26 | 'Red', 'RosyBrown', 'RoyalBlue', 'SaddleBrown', 'Green', 'SandyBrown', 27 | 'SeaGreen', 'SeaShell', 'Sienna', 'Silver', 'SkyBlue', 'SlateBlue', 28 | 'SlateGray', 'SlateGrey', 'Snow', 'SpringGreen', 'SteelBlue', 'GreenYellow', 29 | 'Teal', 'Thistle', 'Tomato', 'Turquoise', 'Violet', 'Wheat', 'White', 30 | 'WhiteSmoke', 'Yellow', 'YellowGreen' 31 | ] 32 | 33 | 34 | def draw_text(draw, 35 | box: list, 36 | cls: int, 37 | score: float, 38 | category_index: dict, 39 | color: str, 40 | font: str = 'arial.ttf', 41 | font_size: int = 24): 42 | """ 43 | 将目标边界框和类别信息绘制到图片上 44 | """ 45 | try: 46 | font = ImageFont.truetype(font, font_size) 47 | except IOError: 48 | font = ImageFont.load_default() 49 | 50 | left, top, right, bottom = box 51 | # If the total height of the display strings added to the top of the bounding 52 | # box exceeds the top of the image, stack the strings below the bounding box 53 | # instead of above. 54 | display_str = f"{category_index[str(cls)]}: {int(100 * score)}%" 55 | display_str_heights = [font.getsize(ds)[1] for ds in display_str] 56 | # Each display_str has a top and bottom margin of 0.05x. 57 | display_str_height = (1 + 2 * 0.05) * max(display_str_heights) 58 | 59 | if top > display_str_height: 60 | text_top = top - display_str_height 61 | text_bottom = top 62 | else: 63 | text_top = bottom 64 | text_bottom = bottom + display_str_height 65 | 66 | for ds in display_str: 67 | text_width, text_height = font.getsize(ds) 68 | margin = np.ceil(0.05 * text_width) 69 | draw.rectangle([(left, text_top), 70 | (left + text_width + 2 * margin, text_bottom)], fill=color) 71 | draw.text((left + margin, text_top), 72 | ds, 73 | fill='black', 74 | font=font) 75 | left += text_width 76 | 77 | 78 | def draw_masks(image, masks, colors, thresh: float = 0.7, alpha: float = 0.5): 79 | np_image = np.array(image) 80 | masks = np.where(masks > thresh, True, False) 81 | 82 | # colors = np.array(colors) 83 | img_to_draw = np.copy(np_image) 84 | # TODO: There might be a way to vectorize this 85 | for mask, color in zip(masks, colors): 86 | img_to_draw[mask] = color 87 | 88 | out = np_image * (1 - alpha) + img_to_draw * alpha 89 | return fromarray(out.astype(np.uint8)) 90 | 91 | 92 | def draw_objs(image: Image, 93 | boxes: np.ndarray = None, 94 | classes: np.ndarray = None, 95 | scores: np.ndarray = None, 96 | masks: np.ndarray = None, 97 | category_index: dict = None, 98 | box_thresh: float = 0.1, 99 | mask_thresh: float = 0.5, 100 | line_thickness: int = 8, 101 | font: str = 'arial.ttf', 102 | font_size: int = 24, 103 | draw_boxes_on_image: bool = True, 104 | draw_masks_on_image: bool = False): 105 | """ 106 | 将目标边界框信息,类别信息,mask信息绘制在图片上 107 | Args: 108 | image: 需要绘制的图片 109 | boxes: 目标边界框信息 110 | classes: 目标类别信息 111 | scores: 目标概率信息 112 | masks: 目标mask信息 113 | category_index: 类别与名称字典 114 | box_thresh: 过滤的概率阈值 115 | mask_thresh: 116 | line_thickness: 边界框宽度 117 | font: 字体类型 118 | font_size: 字体大小 119 | draw_boxes_on_image: 120 | draw_masks_on_image: 121 | 122 | Returns: 123 | 124 | """ 125 | 126 | # 过滤掉低概率的目标 127 | idxs = np.greater(scores, box_thresh) 128 | boxes = boxes[idxs] 129 | classes = classes[idxs] 130 | scores = scores[idxs] 131 | if masks is not None: 132 | masks = masks[idxs] 133 | if len(boxes) == 0: 134 | return image 135 | 136 | colors = [ImageColor.getrgb(STANDARD_COLORS[cls % len(STANDARD_COLORS)]) for cls in classes] 137 | 138 | if draw_boxes_on_image: 139 | # Draw 
all boxes onto image. 140 | draw = ImageDraw.Draw(image) 141 | for box, cls, score, color in zip(boxes, classes, scores, colors): 142 | left, top, right, bottom = box 143 | # 绘制目标边界框 144 | draw.line([(left, top), (left, bottom), (right, bottom), 145 | (right, top), (left, top)], width=line_thickness, fill=color) 146 | # 绘制类别和概率信息 147 | draw_text(draw, box.tolist(), int(cls), float(score), category_index, color, font, font_size) 148 | 149 | if draw_masks_on_image and (masks is not None): 150 | # Draw all mask onto image. 151 | image = draw_masks(image, masks, colors, mask_thresh) 152 | 153 | return image 154 | -------------------------------------------------------------------------------- /faster_rcnn/my_dataset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from torch.utils.data import Dataset 3 | import os 4 | import torch 5 | import json 6 | from PIL import Image 7 | from lxml import etree 8 | 9 | 10 | class VOCDataSet(Dataset): 11 | """读取解析PASCAL VOC2007/2012数据集""" 12 | 13 | def __init__(self, voc_root, year="2012", transforms=None, txt_name: str = "train.txt"): 14 | assert year in ["2007", "2012"], "year must be in ['2007', '2012']" 15 | # 增加容错能力 16 | if "VOCdevkit" in voc_root: 17 | self.root = os.path.join(voc_root, f"VOC{year}") 18 | else: 19 | self.root = os.path.join(voc_root, "VOCdevkit", f"VOC{year}") 20 | self.img_root = os.path.join(self.root, "JPEGImages") 21 | self.annotations_root = os.path.join(self.root, "Annotations") 22 | 23 | # read train.txt or val.txt file 24 | txt_path = os.path.join(self.root, "ImageSets", "Main", txt_name) 25 | assert os.path.exists(txt_path), "not found {} file.".format(txt_name) 26 | 27 | with open(txt_path) as read: 28 | xml_list = [os.path.join(self.annotations_root, line.strip() + ".xml") 29 | for line in read.readlines() if len(line.strip()) > 0] 30 | 31 | self.xml_list = [] 32 | # check file 33 | for xml_path in xml_list: 34 | if os.path.exists(xml_path) is False: 35 | print(f"Warning: not found '{xml_path}', skip this annotation file.") 36 | continue 37 | 38 | # check for targets 39 | with open(xml_path) as fid: 40 | xml_str = fid.read() 41 | xml = etree.fromstring(xml_str) 42 | data = self.parse_xml_to_dict(xml)["annotation"] 43 | if "object" not in data: 44 | print(f"INFO: no objects in {xml_path}, skip this annotation file.") 45 | continue 46 | 47 | self.xml_list.append(xml_path) 48 | 49 | assert len(self.xml_list) > 0, "in '{}' file does not find any information.".format(txt_path) 50 | 51 | # read class_indict 52 | json_file = './pascal_voc_classes.json' 53 | assert os.path.exists(json_file), "{} file not exist.".format(json_file) 54 | with open(json_file, 'r') as f: 55 | self.class_dict = json.load(f) 56 | 57 | self.transforms = transforms 58 | 59 | def __len__(self): 60 | return len(self.xml_list) 61 | 62 | def __getitem__(self, idx): 63 | # read xml 64 | xml_path = self.xml_list[idx] 65 | with open(xml_path) as fid: 66 | xml_str = fid.read() 67 | xml = etree.fromstring(xml_str) 68 | data = self.parse_xml_to_dict(xml)["annotation"] 69 | img_path = os.path.join(self.img_root, data["filename"]) 70 | image = Image.open(img_path) 71 | if image.format != "JPEG": 72 | raise ValueError("Image '{}' format not JPEG".format(img_path)) 73 | 74 | boxes = [] 75 | labels = [] 76 | iscrowd = [] 77 | assert "object" in data, "{} lack of object information.".format(xml_path) 78 | for obj in data["object"]: 79 | xmin = float(obj["bndbox"]["xmin"]) 80 | xmax = 
float(obj["bndbox"]["xmax"]) 81 | ymin = float(obj["bndbox"]["ymin"]) 82 | ymax = float(obj["bndbox"]["ymax"]) 83 | 84 | # 进一步检查数据,有的标注信息中可能有w或h为0的情况,这样的数据会导致计算回归loss为nan 85 | if xmax <= xmin or ymax <= ymin: 86 | print("Warning: in '{}' xml, there are some bbox w/h <=0".format(xml_path)) 87 | continue 88 | 89 | boxes.append([xmin, ymin, xmax, ymax]) 90 | labels.append(self.class_dict[obj["name"]]) 91 | if "difficult" in obj: 92 | iscrowd.append(int(obj["difficult"])) 93 | else: 94 | iscrowd.append(0) 95 | 96 | # convert everything into a torch.Tensor 97 | boxes = torch.as_tensor(boxes, dtype=torch.float32) 98 | labels = torch.as_tensor(labels, dtype=torch.int64) 99 | iscrowd = torch.as_tensor(iscrowd, dtype=torch.int64) 100 | image_id = torch.tensor([idx]) 101 | area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]) 102 | 103 | target = {} 104 | target["boxes"] = boxes 105 | target["labels"] = labels 106 | target["image_id"] = image_id 107 | target["area"] = area 108 | target["iscrowd"] = iscrowd 109 | 110 | if self.transforms is not None: 111 | image, target = self.transforms(image, target) 112 | 113 | return image, target 114 | 115 | def get_height_and_width(self, idx): 116 | # read xml 117 | xml_path = self.xml_list[idx] 118 | with open(xml_path) as fid: 119 | xml_str = fid.read() 120 | xml = etree.fromstring(xml_str) 121 | data = self.parse_xml_to_dict(xml)["annotation"] 122 | data_height = int(data["size"]["height"]) 123 | data_width = int(data["size"]["width"]) 124 | return data_height, data_width 125 | 126 | def parse_xml_to_dict(self, xml): 127 | """ 128 | 将xml文件解析成字典形式,参考tensorflow的recursive_parse_xml_to_dict 129 | Args: 130 | xml: xml tree obtained by parsing XML file contents using lxml.etree 131 | 132 | Returns: 133 | Python dictionary holding XML contents. 
134 | """ 135 | 136 | if len(xml) == 0: # 遍历到底层,直接返回tag对应的信息 137 | return {xml.tag: xml.text} 138 | 139 | result = {} 140 | for child in xml: 141 | child_result = self.parse_xml_to_dict(child) # 递归遍历标签信息 142 | if child.tag != 'object': 143 | result[child.tag] = child_result[child.tag] 144 | else: 145 | if child.tag not in result: # 因为object可能有多个,所以需要放入列表里 146 | result[child.tag] = [] 147 | result[child.tag].append(child_result[child.tag]) 148 | return {xml.tag: result} 149 | 150 | def coco_index(self, idx): 151 | """ 152 | 该方法是专门为pycocotools统计标签信息准备,不对图像和标签作任何处理 153 | 由于不用去读取图片,可大幅缩减统计时间 154 | 155 | Args: 156 | idx: 输入需要获取图像的索引 157 | """ 158 | # read xml 159 | xml_path = self.xml_list[idx] 160 | with open(xml_path) as fid: 161 | xml_str = fid.read() 162 | xml = etree.fromstring(xml_str) 163 | data = self.parse_xml_to_dict(xml)["annotation"] 164 | data_height = int(data["size"]["height"]) 165 | data_width = int(data["size"]["width"]) 166 | # img_path = os.path.join(self.img_root, data["filename"]) 167 | # image = Image.open(img_path) 168 | # if image.format != "JPEG": 169 | # raise ValueError("Image format not JPEG") 170 | boxes = [] 171 | labels = [] 172 | iscrowd = [] 173 | for obj in data["object"]: 174 | xmin = float(obj["bndbox"]["xmin"]) 175 | xmax = float(obj["bndbox"]["xmax"]) 176 | ymin = float(obj["bndbox"]["ymin"]) 177 | ymax = float(obj["bndbox"]["ymax"]) 178 | boxes.append([xmin, ymin, xmax, ymax]) 179 | labels.append(self.class_dict[obj["name"]]) 180 | iscrowd.append(int(obj["difficult"])) 181 | 182 | # convert everything into a torch.Tensor 183 | boxes = torch.as_tensor(boxes, dtype=torch.float32) 184 | labels = torch.as_tensor(labels, dtype=torch.int64) 185 | iscrowd = torch.as_tensor(iscrowd, dtype=torch.int64) 186 | image_id = torch.tensor([idx]) 187 | area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]) 188 | 189 | target = {} 190 | target["boxes"] = boxes 191 | target["labels"] = labels 192 | target["image_id"] = image_id 193 | target["area"] = area 194 | target["iscrowd"] = iscrowd 195 | 196 | return (data_height, data_width), target 197 | 198 | @staticmethod 199 | def collate_fn(batch): 200 | return tuple(zip(*batch)) 201 | 202 | # import transforms 203 | # from draw_box_utils import draw_objs 204 | # from PIL import Image 205 | # import json 206 | # import matplotlib.pyplot as plt 207 | # import torchvision.transforms as ts 208 | # import random 209 | # 210 | # # read class_indict 211 | # category_index = {} 212 | # try: 213 | # json_file = open('./pascal_voc_classes.json', 'r') 214 | # class_dict = json.load(json_file) 215 | # category_index = {str(v): str(k) for k, v in class_dict.items()} 216 | # except Exception as e: 217 | # print(e) 218 | # exit(-1) 219 | # 220 | # data_transform = { 221 | # "train": transforms.Compose([transforms.ToTensor(), 222 | # transforms.RandomHorizontalFlip(0.5)]), 223 | # "val": transforms.Compose([transforms.ToTensor()]) 224 | # } 225 | # 226 | # # load train data set 227 | # train_data_set = VOCDataSet(os.getcwd(), "2012", data_transform["train"], "train.txt") 228 | # print(len(train_data_set)) 229 | # for index in random.sample(range(0, len(train_data_set)), k=5): 230 | # img, target = train_data_set[index] 231 | # img = ts.ToPILImage()(img) 232 | # plot_img = draw_objs(img, 233 | # target["boxes"].numpy(), 234 | # target["labels"].numpy(), 235 | # np.ones(target["labels"].shape[0]), 236 | # category_index=category_index, 237 | # box_thresh=0.5, 238 | # line_thickness=3, 239 | # font='arial.ttf', 240 | # font_size=20) 241 
| # plt.imshow(plot_img) 242 | # plt.show() 243 | -------------------------------------------------------------------------------- /faster_rcnn/network_files/__init__.py: -------------------------------------------------------------------------------- 1 | from .faster_rcnn_framework import FasterRCNN, FastRCNNPredictor 2 | from .rpn_function import AnchorsGenerator 3 | -------------------------------------------------------------------------------- /faster_rcnn/network_files/boxes.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from typing import Tuple 3 | from torch import Tensor 4 | import torchvision 5 | 6 | 7 | def nms(boxes, scores, iou_threshold): 8 | # type: (Tensor, Tensor, float) -> Tensor 9 | """ 10 | Performs non-maximum suppression (NMS) on the boxes according 11 | to their intersection-over-union (IoU). 12 | 13 | NMS iteratively removes lower scoring boxes which have an 14 | IoU greater than iou_threshold with another (higher scoring) 15 | box. 16 | 17 | Parameters 18 | ---------- 19 | boxes : Tensor[N, 4] 20 | boxes to perform NMS on. They 21 | are expected to be in (x1, y1, x2, y2) format 22 | scores : Tensor[N] 23 | scores for each one of the boxes 24 | iou_threshold : float 25 | discards all overlapping 26 | boxes with IoU > iou_threshold 27 | 28 | Returns 29 | ------- 30 | keep : Tensor 31 | int64 tensor with the indices 32 | of the elements that have been kept 33 | by NMS, sorted in decreasing order of scores 34 | """ 35 | return torch.ops.torchvision.nms(boxes, scores, iou_threshold) 36 | 37 | 38 | def batched_nms(boxes, scores, idxs, iou_threshold): 39 | # type: (Tensor, Tensor, Tensor, float) -> Tensor 40 | """ 41 | Performs non-maximum suppression in a batched fashion. 42 | 43 | Each index value corresponds to a category, and NMS 44 | will not be applied between elements of different categories. 45 | 46 | Parameters 47 | ---------- 48 | boxes : Tensor[N, 4] 49 | boxes where NMS will be performed. They 50 | are expected to be in (x1, y1, x2, y2) format 51 | scores : Tensor[N] 52 | scores for each one of the boxes 53 | idxs : Tensor[N] 54 | indices of the categories for each one of the boxes. 55 | iou_threshold : float 56 | discards all overlapping boxes 57 | with IoU > iou_threshold 58 | 59 | Returns 60 | ------- 61 | keep : Tensor 62 | int64 tensor with the indices of 63 | the elements that have been kept by NMS, sorted 64 | in decreasing order of scores 65 | """ 66 | if boxes.numel() == 0: 67 | return torch.empty((0,), dtype=torch.int64, device=boxes.device) 68 | 69 | # strategy: in order to perform NMS independently per class, 70 | # we add an offset to all the boxes. The offset is dependent 71 | # only on the class idx, and is large enough so that boxes 72 | # from different classes do not overlap 73 | # get the largest coordinate value among all boxes (xmin, ymin, xmax, ymax) 74 | max_coordinate = boxes.max() 75 | 76 | # to(): Performs Tensor dtype and/or device conversion 77 | # generate a large offset for every category / feature level 78 | # the to() here only makes the new tensor's dtype and device match boxes 79 | offsets = idxs.to(boxes) * (max_coordinate + 1) 80 | # after adding the per-class offset, boxes from different categories / levels can no longer overlap 81 | boxes_for_nms = boxes + offsets[:, None] 82 | keep = nms(boxes_for_nms, scores, iou_threshold) 83 | return keep 84 | 85 | 86 | def remove_small_boxes(boxes, min_size): 87 | # type: (Tensor, float) -> Tensor 88 | """ 89 | Remove boxes which contain at least one side smaller than min_size. 
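The class-offset strategy in `batched_nms` above can be checked in isolation: shifting every box by `(max_coordinate + 1) * class_id` guarantees that boxes of different classes can never overlap, so a single plain NMS call handles all classes at once. A minimal sketch with made-up boxes and scores, assuming only the `torch`/`torchvision` versions pinned in requirements.txt:

```python
import torch
import torchvision

# Two heavily overlapping boxes that belong to different classes:
# per-class NMS must keep both of them.
boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.]])
scores = torch.tensor([0.9, 0.8])
idxs = torch.tensor([0, 1])  # category id of each box

# Offset each box by (max coordinate + 1) * class id so classes never overlap,
# then run one ordinary NMS call over everything.
offsets = idxs.to(boxes) * (boxes.max() + 1)
keep = torchvision.ops.nms(boxes + offsets[:, None], scores, iou_threshold=0.5)
print(keep)  # tensor([0, 1]): both boxes survive
```

With identical class ids (`idxs = torch.tensor([0, 0])`) the same call would instead suppress the lower-scoring box.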
90 | 移除宽高小于指定阈值的索引 91 | Arguments: 92 | boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format 93 | min_size (float): minimum size 94 | 95 | Returns: 96 | keep (Tensor[K]): indices of the boxes that have both sides 97 | larger than min_size 98 | """ 99 | ws, hs = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1] # 预测boxes的宽和高 100 | # keep = (ws >= min_size) & (hs >= min_size) # 当满足宽,高都大于给定阈值时为True 101 | keep = torch.logical_and(torch.ge(ws, min_size), torch.ge(hs, min_size)) 102 | # nonzero(): Returns a tensor containing the indices of all non-zero elements of input 103 | # keep = keep.nonzero().squeeze(1) 104 | keep = torch.where(keep)[0] 105 | return keep 106 | 107 | 108 | def clip_boxes_to_image(boxes, size): 109 | # type: (Tensor, Tuple[int, int]) -> Tensor 110 | """ 111 | Clip boxes so that they lie inside an image of size `size`. 112 | 裁剪预测的boxes信息,将越界的坐标调整到图片边界上 113 | 114 | Arguments: 115 | boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format 116 | size (Tuple[height, width]): size of the image 117 | 118 | Returns: 119 | clipped_boxes (Tensor[N, 4]) 120 | """ 121 | dim = boxes.dim() 122 | boxes_x = boxes[..., 0::2] # x1, x2 123 | boxes_y = boxes[..., 1::2] # y1, y2 124 | height, width = size 125 | 126 | if torchvision._is_tracing(): 127 | boxes_x = torch.max(boxes_x, torch.tensor(0, dtype=boxes.dtype, device=boxes.device)) 128 | boxes_x = torch.min(boxes_x, torch.tensor(width, dtype=boxes.dtype, device=boxes.device)) 129 | boxes_y = torch.max(boxes_y, torch.tensor(0, dtype=boxes.dtype, device=boxes.device)) 130 | boxes_y = torch.min(boxes_y, torch.tensor(height, dtype=boxes.dtype, device=boxes.device)) 131 | else: 132 | boxes_x = boxes_x.clamp(min=0, max=width) # 限制x坐标范围在[0,width]之间 133 | boxes_y = boxes_y.clamp(min=0, max=height) # 限制y坐标范围在[0,height]之间 134 | 135 | clipped_boxes = torch.stack((boxes_x, boxes_y), dim=dim) 136 | return clipped_boxes.reshape(boxes.shape) 137 | 138 | 139 | def box_area(boxes): 140 | """ 141 | Computes the area of a set of bounding boxes, which are specified by its 142 | (x1, y1, x2, y2) coordinates. 143 | 144 | Arguments: 145 | boxes (Tensor[N, 4]): boxes for which the area will be computed. They 146 | are expected to be in (x1, y1, x2, y2) format 147 | 148 | Returns: 149 | area (Tensor[N]): area for each box 150 | """ 151 | return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) 152 | 153 | 154 | def box_iou(boxes1, boxes2): 155 | """ 156 | Return intersection-over-union (Jaccard index) of boxes. 157 | 158 | Both sets of boxes are expected to be in (x1, y1, x2, y2) format. 
159 | 160 | Arguments: 161 | boxes1 (Tensor[N, 4]) 162 | boxes2 (Tensor[M, 4]) 163 | 164 | Returns: 165 | iou (Tensor[N, M]): the NxM matrix containing the pairwise 166 | IoU values for every element in boxes1 and boxes2 167 | """ 168 | area1 = box_area(boxes1) 169 | area2 = box_area(boxes2) 170 | 171 | # When the shapes do not match, 172 | # the shape of the returned output tensor follows the broadcasting rules 173 | lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # left-top [N,M,2] 174 | rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # right-bottom [N,M,2] 175 | 176 | wh = (rb - lt).clamp(min=0) # [N,M,2] 177 | inter = wh[:, :, 0] * wh[:, :, 1] # [N,M] 178 | 179 | iou = inter / (area1[:, None] + area2 - inter) 180 | return iou 181 | 182 | -------------------------------------------------------------------------------- /faster_rcnn/network_files/image_list.py: -------------------------------------------------------------------------------- 1 | from typing import List, Tuple 2 | from torch import Tensor 3 | 4 | 5 | class ImageList(object): 6 | """ 7 | Structure that holds a list of images (of possibly 8 | varying sizes) as a single tensor. 9 | This works by padding the images to the same size, 10 | and storing in a field the original sizes of each image 11 | """ 12 | 13 | def __init__(self, tensors, image_sizes): 14 | # type: (Tensor, List[Tuple[int, int]]) -> None 15 | """ 16 | Arguments: 17 | tensors (tensor) padding后的图像数据 18 | image_sizes (list[tuple[int, int]]) padding前的图像尺寸 19 | """ 20 | self.tensors = tensors 21 | self.image_sizes = image_sizes 22 | 23 | def to(self, device): 24 | # type: (Device) -> ImageList # noqa 25 | cast_tensor = self.tensors.to(device) 26 | return ImageList(cast_tensor, self.image_sizes) 27 | 28 | -------------------------------------------------------------------------------- /faster_rcnn/pascal_voc_classes.json: -------------------------------------------------------------------------------- 1 | { 2 | "aeroplane": 1, 3 | "bicycle": 2, 4 | "bird": 3, 5 | "boat": 4, 6 | "bottle": 5, 7 | "bus": 6, 8 | "car": 7, 9 | "cat": 8, 10 | "chair": 9, 11 | "cow": 10, 12 | "diningtable": 11, 13 | "dog": 12, 14 | "horse": 13, 15 | "motorbike": 14, 16 | "person": 15, 17 | "pottedplant": 16, 18 | "sheep": 17, 19 | "sofa": 18, 20 | "train": 19, 21 | "tvmonitor": 20 22 | } -------------------------------------------------------------------------------- /faster_rcnn/plot_curve.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import matplotlib.pyplot as plt 3 | 4 | 5 | def plot_loss_and_lr(train_loss, learning_rate): 6 | try: 7 | x = list(range(len(train_loss))) 8 | fig, ax1 = plt.subplots(1, 1) 9 | ax1.plot(x, train_loss, 'r', label='loss') 10 | ax1.set_xlabel("step") 11 | ax1.set_ylabel("loss") 12 | ax1.set_title("Train Loss and lr") 13 | plt.legend(loc='best') 14 | 15 | ax2 = ax1.twinx() 16 | ax2.plot(x, learning_rate, label='lr') 17 | ax2.set_ylabel("learning rate") 18 | ax2.set_xlim(0, len(train_loss)) # 设置横坐标整数间隔 19 | plt.legend(loc='best') 20 | 21 | handles1, labels1 = ax1.get_legend_handles_labels() 22 | handles2, labels2 = ax2.get_legend_handles_labels() 23 | plt.legend(handles1 + handles2, labels1 + labels2, loc='upper right') 24 | 25 | fig.subplots_adjust(right=0.8) # 防止出现保存图片显示不全的情况 26 | fig.savefig('./loss_and_lr{}.png'.format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))) 27 | plt.close() 28 | print("successful save loss curve! 
") 29 | except Exception as e: 30 | print(e) 31 | 32 | 33 | def plot_map(mAP): 34 | try: 35 | x = list(range(len(mAP))) 36 | plt.plot(x, mAP, label='mAp') 37 | plt.xlabel('epoch') 38 | plt.ylabel('mAP') 39 | plt.title('Eval mAP') 40 | plt.xlim(0, len(mAP)) 41 | plt.legend(loc='best') 42 | plt.savefig('./mAP.png') 43 | plt.close() 44 | print("successful save mAP curve!") 45 | except Exception as e: 46 | print(e) 47 | -------------------------------------------------------------------------------- /faster_rcnn/predict.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import json 4 | 5 | import torch 6 | import torchvision 7 | from PIL import Image 8 | import matplotlib.pyplot as plt 9 | 10 | from torchvision import transforms 11 | from network_files import FasterRCNN, FastRCNNPredictor, AnchorsGenerator 12 | from backbone import resnet50_fpn_backbone, MobileNetV2 13 | from draw_box_utils import draw_objs 14 | 15 | 16 | def create_model(num_classes): 17 | # mobileNetv2+faster_RCNN 18 | # backbone = MobileNetV2().features 19 | # backbone.out_channels = 1280 20 | # 21 | # anchor_generator = AnchorsGenerator(sizes=((32, 64, 128, 256, 512),), 22 | # aspect_ratios=((0.5, 1.0, 2.0),)) 23 | # 24 | # roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'], 25 | # output_size=[7, 7], 26 | # sampling_ratio=2) 27 | # 28 | # model = FasterRCNN(backbone=backbone, 29 | # num_classes=num_classes, 30 | # rpn_anchor_generator=anchor_generator, 31 | # box_roi_pool=roi_pooler) 32 | 33 | # resNet50+fpn+faster_RCNN 34 | # 注意,这里的norm_layer要和训练脚本中保持一致 35 | backbone = resnet50_fpn_backbone(norm_layer=torch.nn.BatchNorm2d) 36 | model = FasterRCNN(backbone=backbone, num_classes=num_classes, rpn_score_thresh=0.5) 37 | 38 | return model 39 | 40 | 41 | def time_synchronized(): 42 | torch.cuda.synchronize() if torch.cuda.is_available() else None 43 | return time.time() 44 | 45 | 46 | def main(): 47 | # get devices 48 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 49 | print("using {} device.".format(device)) 50 | 51 | # create model 52 | model = create_model(num_classes=21) 53 | 54 | # load train weights 55 | weights_path = "./save_weights/model.pth" 56 | assert os.path.exists(weights_path), "{} file dose not exist.".format(weights_path) 57 | weights_dict = torch.load(weights_path, map_location='cpu') 58 | weights_dict = weights_dict["model"] if "model" in weights_dict else weights_dict 59 | model.load_state_dict(weights_dict) 60 | model.to(device) 61 | 62 | # read class_indict 63 | label_json_path = './pascal_voc_classes.json' 64 | assert os.path.exists(label_json_path), "json file {} dose not exist.".format(label_json_path) 65 | with open(label_json_path, 'r') as f: 66 | class_dict = json.load(f) 67 | 68 | category_index = {str(v): str(k) for k, v in class_dict.items()} 69 | 70 | # load image 71 | original_img = Image.open("./test.jpg") 72 | 73 | # from pil image to tensor, do not normalize image 74 | data_transform = transforms.Compose([transforms.ToTensor()]) 75 | img = data_transform(original_img) 76 | # expand batch dimension 77 | img = torch.unsqueeze(img, dim=0) 78 | 79 | model.eval() # 进入验证模式 80 | with torch.no_grad(): 81 | # init 82 | img_height, img_width = img.shape[-2:] 83 | init_img = torch.zeros((1, 3, img_height, img_width), device=device) 84 | model(init_img) 85 | 86 | t_start = time_synchronized() 87 | predictions = model(img.to(device))[0] 88 | t_end = time_synchronized() 89 | print("inference+NMS 
time: {}".format(t_end - t_start)) 90 | 91 | predict_boxes = predictions["boxes"].to("cpu").numpy() 92 | predict_classes = predictions["labels"].to("cpu").numpy() 93 | predict_scores = predictions["scores"].to("cpu").numpy() 94 | 95 | if len(predict_boxes) == 0: 96 | print("没有检测到任何目标!") 97 | 98 | plot_img = draw_objs(original_img, 99 | predict_boxes, 100 | predict_classes, 101 | predict_scores, 102 | category_index=category_index, 103 | box_thresh=0.5, 104 | line_thickness=3, 105 | font='arial.ttf', 106 | font_size=20) 107 | plt.imshow(plot_img) 108 | plt.show() 109 | # 保存预测的图片结果 110 | plot_img.save("test_result.jpg") 111 | 112 | 113 | if __name__ == '__main__': 114 | main() 115 | -------------------------------------------------------------------------------- /faster_rcnn/record_mAP.txt: -------------------------------------------------------------------------------- 1 | COCO results: 2 | Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.526 3 | Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.804 4 | Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.586 5 | Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.211 6 | Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.403 7 | Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.580 8 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.454 9 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.639 10 | Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.646 11 | Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.347 12 | Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.540 13 | Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.693 14 | 15 | mAP(IoU=0.5) for each category: 16 | aeroplane : 0.8759546352558178 17 | bicycle : 0.8554609242543677 18 | bird : 0.8434943725365999 19 | boat : 0.6753024837855667 20 | bottle : 0.7185899054232459 21 | bus : 0.8691082170432654 22 | car : 0.8771002682431779 23 | cat : 0.9169138943375639 24 | chair : 0.6403466317122392 25 | cow : 0.8285552434280278 26 | diningtable : 0.6437938565684241 27 | dog : 0.8745793980119227 28 | horse : 0.8718238708874728 29 | motorbike : 0.8910672301923952 30 | person : 0.9047338725598096 31 | pottedplant : 0.5808810399193133 32 | sheep : 0.86045368568359 33 | sofa : 0.7239390963388067 34 | train : 0.8652277764020805 35 | tvmonitor : 0.7683550206571649 -------------------------------------------------------------------------------- /faster_rcnn/requirements.txt: -------------------------------------------------------------------------------- 1 | lxml 2 | matplotlib 3 | numpy 4 | tqdm 5 | torch==1.7.1 6 | torchvision==0.8.2 7 | pycocotools 8 | Pillow 9 | -------------------------------------------------------------------------------- /faster_rcnn/split_data.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | 4 | 5 | def main(): 6 | random.seed(0) # 设置随机种子,保证随机结果可复现 7 | 8 | files_path = "./VOCdevkit/VOC2012/Annotations" 9 | assert os.path.exists(files_path), "path: '{}' does not exist.".format(files_path) 10 | 11 | val_rate = 0.5 12 | 13 | files_name = sorted([file.split(".")[0] for file in os.listdir(files_path)]) 14 | files_num = len(files_name) 15 | val_index = random.sample(range(0, files_num), k=int(files_num*val_rate)) 16 | train_files = [] 17 | val_files = [] 18 | for 
index, file_name in enumerate(files_name): 19 | if index in val_index: 20 | val_files.append(file_name) 21 | else: 22 | train_files.append(file_name) 23 | 24 | try: 25 | train_f = open("train.txt", "x") 26 | eval_f = open("val.txt", "x") 27 | train_f.write("\n".join(train_files)) 28 | eval_f.write("\n".join(val_files)) 29 | except FileExistsError as e: 30 | print(e) 31 | exit(1) 32 | 33 | 34 | if __name__ == '__main__': 35 | main() 36 | -------------------------------------------------------------------------------- /faster_rcnn/train_mobilenetv2.py: -------------------------------------------------------------------------------- 1 | import os 2 | import datetime 3 | 4 | import torch 5 | import torchvision 6 | 7 | import transforms 8 | from network_files import FasterRCNN, AnchorsGenerator 9 | from backbone import MobileNetV2, vgg 10 | from my_dataset import VOCDataSet 11 | from train_utils import GroupedBatchSampler, create_aspect_ratio_groups 12 | from train_utils import train_eval_utils as utils 13 | 14 | 15 | def create_model(num_classes): 16 | # https://download.pytorch.org/models/vgg16-397923af.pth 17 | # 如果使用vgg16的话就下载对应预训练权重并取消下面注释,接着把mobilenetv2模型对应的两行代码注释掉 18 | # vgg_feature = vgg(model_name="vgg16", weights_path="./backbone/vgg16.pth").features 19 | # backbone = torch.nn.Sequential(*list(vgg_feature._modules.values())[:-1]) # 删除features中最后一个Maxpool层 20 | # backbone.out_channels = 512 21 | 22 | # https://download.pytorch.org/models/mobilenet_v2-b0353104.pth 23 | backbone = MobileNetV2(weights_path="./backbone/mobilenet_v2.pth").features 24 | backbone.out_channels = 1280 # 设置对应backbone输出特征矩阵的channels 25 | 26 | anchor_generator = AnchorsGenerator(sizes=((32, 64, 128, 256, 512),), 27 | aspect_ratios=((0.5, 1.0, 2.0),)) 28 | 29 | roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'], # 在哪些特征层上进行roi pooling 30 | output_size=[7, 7], # roi_pooling输出特征矩阵尺寸 31 | sampling_ratio=2) # 采样率 32 | 33 | model = FasterRCNN(backbone=backbone, 34 | num_classes=num_classes, 35 | rpn_anchor_generator=anchor_generator, 36 | box_roi_pool=roi_pooler) 37 | 38 | return model 39 | 40 | 41 | def main(): 42 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 43 | print("Using {} device training.".format(device.type)) 44 | 45 | # 用来保存coco_info的文件 46 | results_file = "results{}.txt".format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S")) 47 | 48 | # 检查保存权重文件夹是否存在,不存在则创建 49 | if not os.path.exists("save_weights"): 50 | os.makedirs("save_weights") 51 | 52 | data_transform = { 53 | "train": transforms.Compose([transforms.ToTensor(), 54 | transforms.RandomHorizontalFlip(0.5)]), 55 | "val": transforms.Compose([transforms.ToTensor()]) 56 | } 57 | 58 | VOC_root = "./" # VOCdevkit 59 | aspect_ratio_group_factor = 3 60 | batch_size = 8 61 | amp = False # 是否使用混合精度训练,需要GPU支持 62 | 63 | # check voc root 64 | if os.path.exists(os.path.join(VOC_root, "VOCdevkit")) is False: 65 | raise FileNotFoundError("VOCdevkit dose not in path:'{}'.".format(VOC_root)) 66 | 67 | # load train data set 68 | # VOCdevkit -> VOC2012 -> ImageSets -> Main -> train.txt 69 | train_dataset = VOCDataSet(VOC_root, "2012", data_transform["train"], "train.txt") 70 | train_sampler = None 71 | 72 | # 是否按图片相似高宽比采样图片组成batch 73 | # 使用的话能够减小训练时所需GPU显存,默认使用 74 | if aspect_ratio_group_factor >= 0: 75 | train_sampler = torch.utils.data.RandomSampler(train_dataset) 76 | # 统计所有图像高宽比例在bins区间中的位置索引 77 | group_ids = create_aspect_ratio_groups(train_dataset, k=aspect_ratio_group_factor) 78 | # 每个batch图片从同一高宽比例区间中取 79 | 
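The grouping enabled in this branch (see the comments just above) bins every image by its width/height ratio and then lets `GroupedBatchSampler` draw each batch from a single bin, so images in a batch have similar shapes and padding wastes less GPU memory. A minimal sketch of the binning step with made-up ratios, mirroring the `_quantize`/`create_aspect_ratio_groups` code that appears later in train_utils/group_by_aspect_ratio.py:

```python
import bisect

import numpy as np

# Hypothetical width/height ratios for six images (illustrative values only).
aspect_ratios = [0.45, 0.7, 1.0, 1.33, 1.9, 2.2]

k = 3  # same default as aspect_ratio_group_factor above
# 2*k+1 bin edges spread geometrically over [0.5, 2]; ratios outside that
# range fall into the two open-ended groups at either end.
bins = (2 ** np.linspace(-1, 1, 2 * k + 1)).tolist()

# Each image gets the index of the bin its ratio falls into; batches are then
# sampled from one group id at a time.
group_ids = [bisect.bisect_right(bins, r) for r in aspect_ratios]
print(group_ids)  # [0, 2, 4, 5, 6, 7] for these ratios
```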
train_batch_sampler = GroupedBatchSampler(train_sampler, group_ids, batch_size) 80 | 81 | nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers 82 | print('Using %g dataloader workers' % nw) 83 | 84 | # 注意这里的collate_fn是自定义的,因为读取的数据包括image和targets,不能直接使用默认的方法合成batch 85 | if train_sampler: 86 | # 如果按照图片高宽比采样图片,dataloader中需要使用batch_sampler 87 | train_data_loader = torch.utils.data.DataLoader(train_dataset, 88 | batch_sampler=train_batch_sampler, 89 | pin_memory=True, 90 | num_workers=nw, 91 | collate_fn=train_dataset.collate_fn) 92 | else: 93 | train_data_loader = torch.utils.data.DataLoader(train_dataset, 94 | batch_size=batch_size, 95 | shuffle=True, 96 | pin_memory=True, 97 | num_workers=nw, 98 | collate_fn=train_dataset.collate_fn) 99 | 100 | # load validation data set 101 | # VOCdevkit -> VOC2012 -> ImageSets -> Main -> val.txt 102 | val_dataset = VOCDataSet(VOC_root, "2012", data_transform["val"], "val.txt") 103 | val_data_loader = torch.utils.data.DataLoader(val_dataset, 104 | batch_size=1, 105 | shuffle=False, 106 | pin_memory=True, 107 | num_workers=nw, 108 | collate_fn=val_dataset.collate_fn) 109 | 110 | # create model num_classes equal background + 20 classes 111 | model = create_model(num_classes=21) 112 | # print(model) 113 | 114 | model.to(device) 115 | 116 | scaler = torch.cuda.amp.GradScaler() if amp else None 117 | 118 | train_loss = [] 119 | learning_rate = [] 120 | val_map = [] 121 | 122 | # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 123 | # first frozen backbone and train 5 epochs # 124 | # 首先冻结前置特征提取网络权重(backbone),训练rpn以及最终预测网络部分 # 125 | # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # 126 | for param in model.backbone.parameters(): 127 | param.requires_grad = False 128 | 129 | # define optimizer 130 | params = [p for p in model.parameters() if p.requires_grad] 131 | optimizer = torch.optim.SGD(params, lr=0.005, 132 | momentum=0.9, weight_decay=0.0005) 133 | 134 | init_epochs = 5 135 | for epoch in range(init_epochs): 136 | # train for one epoch, printing every 10 iterations 137 | mean_loss, lr = utils.train_one_epoch(model, optimizer, train_data_loader, 138 | device, epoch, print_freq=50, 139 | warmup=True, scaler=scaler) 140 | train_loss.append(mean_loss.item()) 141 | learning_rate.append(lr) 142 | 143 | # evaluate on the test dataset 144 | coco_info = utils.evaluate(model, val_data_loader, device=device) 145 | 146 | # write into txt 147 | with open(results_file, "a") as f: 148 | # 写入的数据包括coco指标还有loss和learning rate 149 | result_info = [f"{i:.4f}" for i in coco_info + [mean_loss.item()]] + [f"{lr:.6f}"] 150 | txt = "epoch:{} {}".format(epoch, ' '.join(result_info)) 151 | f.write(txt + "\n") 152 | 153 | val_map.append(coco_info[1]) # pascal mAP 154 | 155 | torch.save(model.state_dict(), "./save_weights/pretrain.pth") 156 | 157 | # # # # # # # # # # # # # # # # # # # # # # # # # # # # 158 | # second unfrozen backbone and train all network # 159 | # 解冻前置特征提取网络权重(backbone),接着训练整个网络权重 # 160 | # # # # # # # # # # # # # # # # # # # # # # # # # # # # 161 | 162 | # 冻结backbone部分底层权重 163 | for name, parameter in model.backbone.named_parameters(): 164 | split_name = name.split(".")[0] 165 | if split_name in ["0", "1", "2", "3"]: 166 | parameter.requires_grad = False 167 | else: 168 | parameter.requires_grad = True 169 | 170 | # define optimizer 171 | params = [p for p in model.parameters() if p.requires_grad] 172 | optimizer = torch.optim.SGD(params, lr=0.005, 173 | momentum=0.9, weight_decay=0.0005) 174 | # learning 
rate scheduler 175 | lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 176 | step_size=3, 177 | gamma=0.33) 178 | num_epochs = 20 179 | for epoch in range(init_epochs, num_epochs+init_epochs, 1): 180 | # train for one epoch, printing every 50 iterations 181 | mean_loss, lr = utils.train_one_epoch(model, optimizer, train_data_loader, 182 | device, epoch, print_freq=50, 183 | warmup=True, scaler=scaler) 184 | train_loss.append(mean_loss.item()) 185 | learning_rate.append(lr) 186 | 187 | # update the learning rate 188 | lr_scheduler.step() 189 | 190 | # evaluate on the test dataset 191 | coco_info = utils.evaluate(model, val_data_loader, device=device) 192 | 193 | # write into txt 194 | with open(results_file, "a") as f: 195 | # 写入的数据包括coco指标还有loss和learning rate 196 | result_info = [f"{i:.4f}" for i in coco_info + [mean_loss.item()]] + [f"{lr:.6f}"] 197 | txt = "epoch:{} {}".format(epoch, ' '.join(result_info)) 198 | f.write(txt + "\n") 199 | 200 | val_map.append(coco_info[1]) # pascal mAP 201 | 202 | # save weights 203 | # 仅保存最后5个epoch的权重 204 | if epoch in range(num_epochs+init_epochs)[-5:]: 205 | save_files = { 206 | 'model': model.state_dict(), 207 | 'optimizer': optimizer.state_dict(), 208 | 'lr_scheduler': lr_scheduler.state_dict(), 209 | 'epoch': epoch} 210 | torch.save(save_files, "./save_weights/mobile-model-{}.pth".format(epoch)) 211 | 212 | # plot loss and lr curve 213 | if len(train_loss) != 0 and len(learning_rate) != 0: 214 | from plot_curve import plot_loss_and_lr 215 | plot_loss_and_lr(train_loss, learning_rate) 216 | 217 | # plot mAP curve 218 | if len(val_map) != 0: 219 | from plot_curve import plot_map 220 | plot_map(val_map) 221 | 222 | 223 | if __name__ == "__main__": 224 | main() 225 | -------------------------------------------------------------------------------- /faster_rcnn/train_res50_fpn.py: -------------------------------------------------------------------------------- 1 | import os 2 | import datetime 3 | 4 | import torch 5 | 6 | import transforms 7 | from network_files import FasterRCNN, FastRCNNPredictor 8 | from backbone import resnet50_fpn_backbone 9 | from cityscrayp import VOCDataSet 10 | from train_utils import GroupedBatchSampler, create_aspect_ratio_groups 11 | from train_utils import train_eval_utils as utils 12 | 13 | 14 | def create_model(num_classes, load_pretrain_weights=True): 15 | # 注意,这里的backbone默认使用的是FrozenBatchNorm2d,即不会去更新bn参数 16 | # 目的是为了防止batch_size太小导致效果更差(如果显存很小,建议使用默认的FrozenBatchNorm2d) 17 | # 如果GPU显存很大可以设置比较大的batch_size就可以将norm_layer设置为普通的BatchNorm2d 18 | # trainable_layers包括['layer4', 'layer3', 'layer2', 'layer1', 'conv1'], 5代表全部训练 19 | # resnet50 imagenet weights url: https://download.pytorch.org/models/resnet50-0676ba61.pth 20 | backbone = resnet50_fpn_backbone(pretrain_path="/home/lcl_d/wuwentao/detection/maskrcnn/pre_model/resnet50.pth", 21 | norm_layer=torch.nn.BatchNorm2d, 22 | trainable_layers=3) 23 | # 训练自己数据集时不要修改这里的91,修改的是传入的num_classes参数 24 | model = FasterRCNN(backbone=backbone, num_classes=91) 25 | 26 | if load_pretrain_weights: 27 | # 载入预训练模型权重 28 | # https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth 29 | weights_dict = torch.load("/home/lcl_d/wuwentao/detection/maskrcnn_vehiclemae_image_v3/pytorch_object_detection/faster_rcnn/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth", map_location='cpu') 30 | missing_keys, unexpected_keys = model.load_state_dict(weights_dict, strict=False) 31 | if len(missing_keys) != 0 or len(unexpected_keys) != 0: 32 | print("missing_keys: ", missing_keys) 33 | 
print("unexpected_keys: ", unexpected_keys) 34 | 35 | # get number of input features for the classifier 36 | in_features = model.roi_heads.box_predictor.cls_score.in_features 37 | # replace the pre-trained head with a new one 38 | model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes) 39 | 40 | return model 41 | 42 | 43 | def main(args): 44 | device = torch.device(args.device if torch.cuda.is_available() else "cpu") 45 | print("Using {} device training.".format(device.type)) 46 | 47 | # 用来保存coco_info的文件 48 | results_file = "results{}.txt".format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S")) 49 | 50 | data_transform = { 51 | "train": transforms.Compose([transforms.ToTensor(), 52 | transforms.RandomHorizontalFlip(0.5)]), 53 | "val": transforms.Compose([transforms.ToTensor()]) 54 | } 55 | 56 | VOC_root = args.data_path 57 | # check voc root 58 | #if os.path.exists(os.path.join(VOC_root, "VOCdevkit")) is False: 59 | # raise FileNotFoundError("VOCdevkit dose not in path:'{}'.".format(VOC_root)) 60 | 61 | # load train data set 62 | # VOCdevkit -> VOC2012 -> ImageSets -> Main -> train.txt 63 | train_dataset = VOCDataSet(VOC_root,'train', data_transform["train"], "trainImages.txt") 64 | train_sampler = None 65 | 66 | # 是否按图片相似高宽比采样图片组成batch 67 | # 使用的话能够减小训练时所需GPU显存,默认使用 68 | if args.aspect_ratio_group_factor >= 0: 69 | train_sampler = torch.utils.data.RandomSampler(train_dataset) 70 | # 统计所有图像高宽比例在bins区间中的位置索引 71 | group_ids = create_aspect_ratio_groups(train_dataset, k=args.aspect_ratio_group_factor) 72 | # 每个batch图片从同一高宽比例区间中取 73 | train_batch_sampler = GroupedBatchSampler(train_sampler, group_ids, args.batch_size) 74 | 75 | # 注意这里的collate_fn是自定义的,因为读取的数据包括image和targets,不能直接使用默认的方法合成batch 76 | batch_size = args.batch_size 77 | nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers 78 | print('Using %g dataloader workers' % nw) 79 | if train_sampler: 80 | # 如果按照图片高宽比采样图片,dataloader中需要使用batch_sampler 81 | train_data_loader = torch.utils.data.DataLoader(train_dataset, 82 | batch_sampler=train_batch_sampler, 83 | pin_memory=True, 84 | num_workers=nw, 85 | collate_fn=train_dataset.collate_fn) 86 | else: 87 | train_data_loader = torch.utils.data.DataLoader(train_dataset, 88 | batch_size=batch_size, 89 | shuffle=True, 90 | pin_memory=True, 91 | num_workers=nw, 92 | collate_fn=train_dataset.collate_fn) 93 | 94 | # load validation data set 95 | # VOCdevkit -> VOC2012 -> ImageSets -> Main -> val.txt 96 | val_dataset = VOCDataSet(VOC_root,'val', data_transform["val"], "valImages.txt") 97 | val_data_set_loader = torch.utils.data.DataLoader(val_dataset, 98 | batch_size=1, 99 | shuffle=False, 100 | pin_memory=True, 101 | num_workers=nw, 102 | collate_fn=val_dataset.collate_fn) 103 | 104 | # create model num_classes equal background + 20 classes 105 | model = create_model(num_classes=args.num_classes + 1) 106 | # print(model) 107 | 108 | model.to(device) 109 | 110 | # define optimizer 111 | params = [p for p in model.parameters() if p.requires_grad] 112 | optimizer = torch.optim.SGD(params, 113 | lr=args.lr, 114 | momentum=args.momentum, 115 | weight_decay=args.weight_decay) 116 | 117 | scaler = torch.cuda.amp.GradScaler() if args.amp else None 118 | 119 | # learning rate scheduler 120 | lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 121 | step_size=3, 122 | gamma=0.33) 123 | 124 | # 如果指定了上次训练保存的权重文件地址,则接着上次结果接着训练 125 | if args.resume != "": 126 | checkpoint = torch.load(args.resume, map_location='cpu') 127 | 
model.load_state_dict(checkpoint['model']) 128 | optimizer.load_state_dict(checkpoint['optimizer']) 129 | lr_scheduler.load_state_dict(checkpoint['lr_scheduler']) 130 | args.start_epoch = checkpoint['epoch'] + 1 131 | if args.amp and "scaler" in checkpoint: 132 | scaler.load_state_dict(checkpoint["scaler"]) 133 | print("the training process from epoch{}...".format(args.start_epoch)) 134 | 135 | train_loss = [] 136 | learning_rate = [] 137 | val_map = [] 138 | 139 | for epoch in range(args.start_epoch, args.epochs): 140 | # train for one epoch, printing every 10 iterations 141 | mean_loss, lr = utils.train_one_epoch(model, optimizer, train_data_loader, 142 | device=device, epoch=epoch, 143 | print_freq=50, warmup=True, 144 | scaler=scaler) 145 | train_loss.append(mean_loss.item()) 146 | learning_rate.append(lr) 147 | 148 | # update the learning rate 149 | lr_scheduler.step() 150 | 151 | # evaluate on the test dataset 152 | coco_info = utils.evaluate(model, val_data_set_loader, device=device) 153 | 154 | # write into txt 155 | with open(results_file, "a") as f: 156 | # 写入的数据包括coco指标还有loss和learning rate 157 | result_info = [f"{i:.4f}" for i in coco_info + [mean_loss.item()]] + [f"{lr:.6f}"] 158 | txt = "epoch:{} {}".format(epoch, ' '.join(result_info)) 159 | f.write(txt + "\n") 160 | 161 | val_map.append(coco_info[1]) # pascal mAP 162 | 163 | # save weights 164 | save_files = { 165 | 'model': model.state_dict(), 166 | 'optimizer': optimizer.state_dict(), 167 | 'lr_scheduler': lr_scheduler.state_dict(), 168 | 'epoch': epoch} 169 | if args.amp: 170 | save_files["scaler"] = scaler.state_dict() 171 | torch.save(save_files, "./save_weights/resNetFpn-model-{}.pth".format(epoch)) 172 | 173 | # plot loss and lr curve 174 | if len(train_loss) != 0 and len(learning_rate) != 0: 175 | from plot_curve import plot_loss_and_lr 176 | plot_loss_and_lr(train_loss, learning_rate) 177 | 178 | # plot mAP curve 179 | if len(val_map) != 0: 180 | from plot_curve import plot_map 181 | plot_map(val_map) 182 | 183 | 184 | if __name__ == "__main__": 185 | import argparse 186 | 187 | parser = argparse.ArgumentParser( 188 | description=__doc__) 189 | 190 | # 训练设备类型 191 | parser.add_argument('--device', default='cuda:0', help='device') 192 | # 训练数据集的根目录(VOCdevkit) 193 | parser.add_argument('--data-path', default='/home/lcl_d/wuwentao/data/cityscapes/', help='dataset') 194 | # 检测目标类别数(不包含背景) 195 | parser.add_argument('--num-classes', default=4, type=int, help='num_classes') 196 | # 文件保存地址 197 | parser.add_argument('--output-dir', default='./save_weights', help='path where to save') 198 | # 若需要接着上次训练,则指定上次训练保存权重文件地址 199 | parser.add_argument('--resume', default='', type=str, help='resume from checkpoint') 200 | # 指定接着从哪个epoch数开始训练 201 | parser.add_argument('--start_epoch', default=0, type=int, help='start epoch') 202 | # 训练的总epoch数 203 | parser.add_argument('--epochs', default=31, type=int, metavar='N', 204 | help='number of total epochs to run') 205 | # 学习率 206 | parser.add_argument('--lr', default=0.01, type=float, 207 | help='initial learning rate, 0.02 is the default value for training ' 208 | 'on 8 gpus and 2 images_per_gpu') 209 | # SGD的momentum参数 210 | parser.add_argument('--momentum', default=0.9, type=float, metavar='M', 211 | help='momentum') 212 | # SGD的weight_decay参数 213 | parser.add_argument('--wd', '--weight-decay', default=1e-4, type=float, 214 | metavar='W', help='weight decay (default: 1e-4)', 215 | dest='weight_decay') 216 | # 训练的batch size 217 | parser.add_argument('--batch_size', default=8, type=int, 
metavar='N', 218 | help='batch size when training.') 219 | parser.add_argument('--aspect-ratio-group-factor', default=3, type=int) 220 | # 是否使用混合精度训练(需要GPU支持混合精度) 221 | parser.add_argument("--amp", default=False, help="Use torch.cuda.amp for mixed precision training") 222 | 223 | args = parser.parse_args() 224 | print(args) 225 | 226 | # 检查保存权重文件夹是否存在,不存在则创建 227 | if not os.path.exists(args.output_dir): 228 | os.makedirs(args.output_dir) 229 | 230 | main(args) 231 | -------------------------------------------------------------------------------- /faster_rcnn/train_utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .group_by_aspect_ratio import GroupedBatchSampler, create_aspect_ratio_groups 2 | from .distributed_utils import init_distributed_mode, save_on_master, mkdir 3 | from .coco_utils import get_coco_api_from_dataset 4 | from .coco_eval import CocoEvaluator 5 | -------------------------------------------------------------------------------- /faster_rcnn/train_utils/coco_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torchvision 3 | import torch.utils.data 4 | from pycocotools.coco import COCO 5 | 6 | 7 | def convert_to_coco_api(ds): 8 | coco_ds = COCO() 9 | # annotation IDs need to start at 1, not 0 10 | ann_id = 1 11 | dataset = {'images': [], 'categories': [], 'annotations': []} 12 | categories = set() 13 | for img_idx in range(len(ds)): 14 | # find better way to get target 15 | hw, targets = ds.coco_index(img_idx) 16 | image_id = targets["image_id"].item() 17 | img_dict = {} 18 | img_dict['id'] = image_id 19 | img_dict['height'] = hw[0] 20 | img_dict['width'] = hw[1] 21 | dataset['images'].append(img_dict) 22 | bboxes = targets["boxes"] 23 | bboxes[:, 2:] -= bboxes[:, :2] 24 | bboxes = bboxes.tolist() 25 | labels = targets['labels'].tolist() 26 | areas = targets['area'].tolist() 27 | iscrowd = targets['iscrowd'].tolist() 28 | num_objs = len(bboxes) 29 | for i in range(num_objs): 30 | ann = {} 31 | ann['image_id'] = image_id 32 | ann['bbox'] = bboxes[i] 33 | ann['category_id'] = labels[i] 34 | categories.add(labels[i]) 35 | ann['area'] = areas[i] 36 | ann['iscrowd'] = iscrowd[i] 37 | ann['id'] = ann_id 38 | dataset['annotations'].append(ann) 39 | ann_id += 1 40 | dataset['categories'] = [{'id': i} for i in sorted(categories)] 41 | coco_ds.dataset = dataset 42 | coco_ds.createIndex() 43 | return coco_ds 44 | 45 | 46 | def get_coco_api_from_dataset(dataset): 47 | for _ in range(10): 48 | if isinstance(dataset, torchvision.datasets.CocoDetection): 49 | break 50 | if isinstance(dataset, torch.utils.data.Subset): 51 | dataset = dataset.dataset 52 | if isinstance(dataset, torchvision.datasets.CocoDetection): 53 | return dataset.coco 54 | return convert_to_coco_api(dataset) 55 | -------------------------------------------------------------------------------- /faster_rcnn/train_utils/group_by_aspect_ratio.py: -------------------------------------------------------------------------------- 1 | import bisect 2 | from collections import defaultdict 3 | import copy 4 | from itertools import repeat, chain 5 | import math 6 | import numpy as np 7 | 8 | import torch 9 | import torch.utils.data 10 | from torch.utils.data.sampler import BatchSampler, Sampler 11 | from torch.utils.model_zoo import tqdm 12 | import torchvision 13 | 14 | from PIL import Image 15 | 16 | 17 | def _repeat_to_at_least(iterable, n): 18 | repeat_times = math.ceil(n / len(iterable)) 19 | repeated = 
chain.from_iterable(repeat(iterable, repeat_times)) 20 | return list(repeated) 21 | 22 | 23 | class GroupedBatchSampler(BatchSampler): 24 | """ 25 | Wraps another sampler to yield a mini-batch of indices. 26 | It enforces that the batch only contain elements from the same group. 27 | It also tries to provide mini-batches which follows an ordering which is 28 | as close as possible to the ordering from the original sampler. 29 | Arguments: 30 | sampler (Sampler): Base sampler. 31 | group_ids (list[int]): If the sampler produces indices in range [0, N), 32 | `group_ids` must be a list of `N` ints which contains the group id of each sample. 33 | The group ids must be a continuous set of integers starting from 34 | 0, i.e. they must be in the range [0, num_groups). 35 | batch_size (int): Size of mini-batch. 36 | """ 37 | def __init__(self, sampler, group_ids, batch_size): 38 | if not isinstance(sampler, Sampler): 39 | raise ValueError( 40 | "sampler should be an instance of " 41 | "torch.utils.data.Sampler, but got sampler={}".format(sampler) 42 | ) 43 | self.sampler = sampler 44 | self.group_ids = group_ids 45 | self.batch_size = batch_size 46 | 47 | def __iter__(self): 48 | buffer_per_group = defaultdict(list) 49 | samples_per_group = defaultdict(list) 50 | 51 | num_batches = 0 52 | for idx in self.sampler: 53 | group_id = self.group_ids[idx] 54 | buffer_per_group[group_id].append(idx) 55 | samples_per_group[group_id].append(idx) 56 | if len(buffer_per_group[group_id]) == self.batch_size: 57 | yield buffer_per_group[group_id] 58 | num_batches += 1 59 | del buffer_per_group[group_id] 60 | assert len(buffer_per_group[group_id]) < self.batch_size 61 | 62 | # now we have run out of elements that satisfy 63 | # the group criteria, let's return the remaining 64 | # elements so that the size of the sampler is 65 | # deterministic 66 | expected_num_batches = len(self) 67 | num_remaining = expected_num_batches - num_batches 68 | if num_remaining > 0: 69 | # for the remaining batches, take first the buffers with largest number 70 | # of elements 71 | for group_id, _ in sorted(buffer_per_group.items(), 72 | key=lambda x: len(x[1]), reverse=True): 73 | remaining = self.batch_size - len(buffer_per_group[group_id]) 74 | samples_from_group_id = _repeat_to_at_least(samples_per_group[group_id], remaining) 75 | buffer_per_group[group_id].extend(samples_from_group_id[:remaining]) 76 | assert len(buffer_per_group[group_id]) == self.batch_size 77 | yield buffer_per_group[group_id] 78 | num_remaining -= 1 79 | if num_remaining == 0: 80 | break 81 | assert num_remaining == 0 82 | 83 | def __len__(self): 84 | return len(self.sampler) // self.batch_size 85 | 86 | 87 | def _compute_aspect_ratios_slow(dataset, indices=None): 88 | print("Your dataset doesn't support the fast path for " 89 | "computing the aspect ratios, so will iterate over " 90 | "the full dataset and load every image instead. 
" 91 | "This might take some time...") 92 | if indices is None: 93 | indices = range(len(dataset)) 94 | 95 | class SubsetSampler(Sampler): 96 | def __init__(self, indices): 97 | self.indices = indices 98 | 99 | def __iter__(self): 100 | return iter(self.indices) 101 | 102 | def __len__(self): 103 | return len(self.indices) 104 | 105 | sampler = SubsetSampler(indices) 106 | data_loader = torch.utils.data.DataLoader( 107 | dataset, batch_size=1, sampler=sampler, 108 | num_workers=14, # you might want to increase it for faster processing 109 | collate_fn=lambda x: x[0]) 110 | aspect_ratios = [] 111 | with tqdm(total=len(dataset)) as pbar: 112 | for _i, (img, _) in enumerate(data_loader): 113 | pbar.update(1) 114 | height, width = img.shape[-2:] 115 | aspect_ratio = float(width) / float(height) 116 | aspect_ratios.append(aspect_ratio) 117 | return aspect_ratios 118 | 119 | 120 | def _compute_aspect_ratios_custom_dataset(dataset, indices=None): 121 | if indices is None: 122 | indices = range(len(dataset)) 123 | aspect_ratios = [] 124 | for i in indices: 125 | height, width = dataset.get_height_and_width(i) 126 | aspect_ratio = float(width) / float(height) 127 | aspect_ratios.append(aspect_ratio) 128 | return aspect_ratios 129 | 130 | 131 | def _compute_aspect_ratios_coco_dataset(dataset, indices=None): 132 | if indices is None: 133 | indices = range(len(dataset)) 134 | aspect_ratios = [] 135 | for i in indices: 136 | img_info = dataset.coco.imgs[dataset.ids[i]] 137 | aspect_ratio = float(img_info["width"]) / float(img_info["height"]) 138 | aspect_ratios.append(aspect_ratio) 139 | return aspect_ratios 140 | 141 | 142 | def _compute_aspect_ratios_voc_dataset(dataset, indices=None): 143 | if indices is None: 144 | indices = range(len(dataset)) 145 | aspect_ratios = [] 146 | for i in indices: 147 | # this doesn't load the data into memory, because PIL loads it lazily 148 | width, height = Image.open(dataset.images[i]).size 149 | aspect_ratio = float(width) / float(height) 150 | aspect_ratios.append(aspect_ratio) 151 | return aspect_ratios 152 | 153 | 154 | def _compute_aspect_ratios_subset_dataset(dataset, indices=None): 155 | if indices is None: 156 | indices = range(len(dataset)) 157 | 158 | ds_indices = [dataset.indices[i] for i in indices] 159 | return compute_aspect_ratios(dataset.dataset, ds_indices) 160 | 161 | 162 | def compute_aspect_ratios(dataset, indices=None): 163 | if hasattr(dataset, "get_height_and_width"): 164 | return _compute_aspect_ratios_custom_dataset(dataset, indices) 165 | 166 | if isinstance(dataset, torchvision.datasets.CocoDetection): 167 | return _compute_aspect_ratios_coco_dataset(dataset, indices) 168 | 169 | if isinstance(dataset, torchvision.datasets.VOCDetection): 170 | return _compute_aspect_ratios_voc_dataset(dataset, indices) 171 | 172 | if isinstance(dataset, torch.utils.data.Subset): 173 | return _compute_aspect_ratios_subset_dataset(dataset, indices) 174 | 175 | # slow path 176 | return _compute_aspect_ratios_slow(dataset, indices) 177 | 178 | 179 | def _quantize(x, bins): 180 | bins = copy.deepcopy(bins) 181 | bins = sorted(bins) 182 | # bisect_right:寻找y元素按顺序应该排在bins中哪个元素的右边,返回的是索引 183 | quantized = list(map(lambda y: bisect.bisect_right(bins, y), x)) 184 | return quantized 185 | 186 | 187 | def create_aspect_ratio_groups(dataset, k=0): 188 | # 计算所有数据集中的图片width/height比例 189 | aspect_ratios = compute_aspect_ratios(dataset) 190 | # 将[0.5, 2]区间划分成2*k等份(2k+1个点,2k个区间) 191 | bins = (2 ** np.linspace(-1, 1, 2 * k + 1)).tolist() if k > 0 else [1.0] 192 | 193 | # 
统计所有图像比例在bins区间中的位置索引 194 | groups = _quantize(aspect_ratios, bins) 195 | # count number of elements per group 196 | # 统计每个区间的频次 197 | counts = np.unique(groups, return_counts=True)[1] 198 | fbins = [0] + bins + [np.inf] 199 | print("Using {} as bins for aspect ratio quantization".format(fbins)) 200 | print("Count of instances per bin: {}".format(counts)) 201 | return groups 202 | -------------------------------------------------------------------------------- /faster_rcnn/train_utils/train_eval_utils.py: -------------------------------------------------------------------------------- 1 | import math 2 | import sys 3 | import time 4 | 5 | import torch 6 | 7 | from .coco_utils import get_coco_api_from_dataset 8 | from .coco_eval import CocoEvaluator 9 | import train_utils.distributed_utils as utils 10 | 11 | 12 | def train_one_epoch(model, optimizer, data_loader, device, epoch, 13 | print_freq=50, warmup=False, scaler=None): 14 | model.train() 15 | metric_logger = utils.MetricLogger(delimiter=" ") 16 | metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value:.6f}')) 17 | header = 'Epoch: [{}]'.format(epoch) 18 | 19 | lr_scheduler = None 20 | if epoch == 0 and warmup is True: # 当训练第一轮(epoch=0)时,启用warmup训练方式,可理解为热身训练 21 | warmup_factor = 1.0 / 1000 22 | warmup_iters = min(1000, len(data_loader) - 1) 23 | 24 | lr_scheduler = utils.warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor) 25 | 26 | mloss = torch.zeros(1).to(device) # mean losses 27 | for i, [images, targets] in enumerate(metric_logger.log_every(data_loader, print_freq, header)): 28 | images = list(image.to(device) for image in images) 29 | targets = [{k: v.to(device) for k, v in t.items()} for t in targets] 30 | 31 | # 混合精度训练上下文管理器,如果在CPU环境中不起任何作用 32 | with torch.cuda.amp.autocast(enabled=scaler is not None): 33 | loss_dict = model(images, targets) 34 | losses = sum(loss for loss in loss_dict.values()) 35 | 36 | # reduce losses over all GPUs for logging purpose 37 | loss_dict_reduced = utils.reduce_dict(loss_dict) 38 | losses_reduced = sum(loss for loss in loss_dict_reduced.values()) 39 | 40 | loss_value = losses_reduced.item() 41 | # 记录训练损失 42 | mloss = (mloss * i + loss_value) / (i + 1) # update mean losses 43 | 44 | if not math.isfinite(loss_value): # 当计算的损失为无穷大时停止训练 45 | print("Loss is {}, stopping training".format(loss_value)) 46 | print(loss_dict_reduced) 47 | sys.exit(1) 48 | 49 | optimizer.zero_grad() 50 | if scaler is not None: 51 | scaler.scale(losses).backward() 52 | scaler.step(optimizer) 53 | scaler.update() 54 | else: 55 | losses.backward() 56 | optimizer.step() 57 | 58 | if lr_scheduler is not None: # 第一轮使用warmup训练方式 59 | lr_scheduler.step() 60 | 61 | metric_logger.update(loss=losses_reduced, **loss_dict_reduced) 62 | now_lr = optimizer.param_groups[0]["lr"] 63 | metric_logger.update(lr=now_lr) 64 | 65 | return mloss, now_lr 66 | 67 | 68 | @torch.no_grad() 69 | def evaluate(model, data_loader, device): 70 | 71 | cpu_device = torch.device("cpu") 72 | model.eval() 73 | metric_logger = utils.MetricLogger(delimiter=" ") 74 | header = "Test: " 75 | 76 | coco = get_coco_api_from_dataset(data_loader.dataset) 77 | iou_types = _get_iou_types(model) 78 | coco_evaluator = CocoEvaluator(coco, iou_types) 79 | 80 | for image, targets in metric_logger.log_every(data_loader, 100, header): 81 | image = list(img.to(device) for img in image) 82 | 83 | # 当使用CPU时,跳过GPU相关指令 84 | if device != torch.device("cpu"): 85 | torch.cuda.synchronize(device) 86 | 87 | model_time = time.time() 88 | outputs = model(image) 89 
| 90 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs] 91 | model_time = time.time() - model_time 92 | 93 | res = {target["image_id"].item(): output for target, output in zip(targets, outputs)} 94 | 95 | evaluator_time = time.time() 96 | coco_evaluator.update(res) 97 | evaluator_time = time.time() - evaluator_time 98 | metric_logger.update(model_time=model_time, evaluator_time=evaluator_time) 99 | 100 | # gather the stats from all processes 101 | metric_logger.synchronize_between_processes() 102 | print("Averaged stats:", metric_logger) 103 | coco_evaluator.synchronize_between_processes() 104 | 105 | # accumulate predictions from all images 106 | coco_evaluator.accumulate() 107 | coco_evaluator.summarize() 108 | 109 | coco_info = coco_evaluator.coco_eval[iou_types[0]].stats.tolist() # numpy to list 110 | 111 | return coco_info 112 | 113 | 114 | def _get_iou_types(model): 115 | model_without_ddp = model 116 | if isinstance(model, torch.nn.parallel.DistributedDataParallel): 117 | model_without_ddp = model.module 118 | iou_types = ["bbox"] 119 | return iou_types 120 | -------------------------------------------------------------------------------- /faster_rcnn/transforms.py: -------------------------------------------------------------------------------- 1 | import random 2 | from torchvision.transforms import functional as F 3 | 4 | 5 | class Compose(object): 6 | """组合多个transform函数""" 7 | def __init__(self, transforms): 8 | self.transforms = transforms 9 | 10 | def __call__(self, image, target): 11 | for t in self.transforms: 12 | image, target = t(image, target) 13 | return image, target 14 | 15 | 16 | class ToTensor(object): 17 | """将PIL图像转为Tensor""" 18 | def __call__(self, image, target): 19 | image = F.to_tensor(image) 20 | return image, target 21 | 22 | 23 | class RandomHorizontalFlip(object): 24 | """随机水平翻转图像以及bboxes""" 25 | def __init__(self, prob=0.5): 26 | self.prob = prob 27 | 28 | def __call__(self, image, target): 29 | if random.random() < self.prob: 30 | height, width = image.shape[-2:] 31 | image = image.flip(-1) # 水平翻转图片 32 | bbox = target["boxes"] 33 | # bbox: xmin, ymin, xmax, ymax 34 | bbox[:, [0, 2]] = width - bbox[:, [2, 0]] # 翻转对应bbox坐标信息 35 | target["boxes"] = bbox 36 | return image, target 37 | -------------------------------------------------------------------------------- /faster_rcnn/validation.py: -------------------------------------------------------------------------------- 1 | """ 2 | 该脚本用于调用训练好的模型权重去计算验证集/测试集的COCO指标 3 | 以及每个类别的mAP(IoU=0.5) 4 | """ 5 | 6 | import os 7 | import json 8 | 9 | import torch 10 | from tqdm import tqdm 11 | import numpy as np 12 | 13 | import transforms 14 | from network_files import FasterRCNN 15 | from backbone import resnet50_fpn_backbone 16 | from my_dataset import VOCDataSet 17 | from train_utils import get_coco_api_from_dataset, CocoEvaluator 18 | 19 | 20 | def summarize(self, catId=None): 21 | """ 22 | Compute and display summary metrics for evaluation results. 
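The function below reads pycocotools' accumulated `precision` array, whose axes are [IoU thresholds T, recall points R, categories K, area ranges A, maxDets M]; the reported AP is simply the mean of the entries that are not -1 in the requested slice. A minimal numpy sketch of that per-category slice (array shape and values are made up purely for illustration):

```python
import numpy as np

# Hypothetical accumulated precision array: T=10 IoU thresholds, R=101 recall
# points, K=4 categories, A=4 area ranges, M=3 maxDets settings.
precision = np.random.rand(10, 101, 4, 4, 3)
precision[0, :, 2, 0, 2] = -1  # pycocotools marks unevaluated entries with -1

cat_id, area_idx, maxdet_idx = 2, 0, 2  # one category, area='all', maxDets=100
s = precision[:, :, cat_id, area_idx, maxdet_idx]
ap = np.mean(s[s > -1]) if (s > -1).any() else -1
print(f"AP for category {cat_id}: {ap:.3f}")
```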
23 | Note this function can *only* be applied on the default parameter setting 24 | """ 25 | 26 | def _summarize(ap=1, iouThr=None, areaRng='all', maxDets=100): 27 | p = self.params 28 | iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}' 29 | titleStr = 'Average Precision' if ap == 1 else 'Average Recall' 30 | typeStr = '(AP)' if ap == 1 else '(AR)' 31 | iouStr = '{:0.2f}:{:0.2f}'.format(p.iouThrs[0], p.iouThrs[-1]) \ 32 | if iouThr is None else '{:0.2f}'.format(iouThr) 33 | 34 | aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng] 35 | mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets] 36 | 37 | if ap == 1: 38 | # dimension of precision: [TxRxKxAxM] 39 | s = self.eval['precision'] 40 | # IoU 41 | if iouThr is not None: 42 | t = np.where(iouThr == p.iouThrs)[0] 43 | s = s[t] 44 | 45 | if isinstance(catId, int): 46 | s = s[:, :, catId, aind, mind] 47 | else: 48 | s = s[:, :, :, aind, mind] 49 | 50 | else: 51 | # dimension of recall: [TxKxAxM] 52 | s = self.eval['recall'] 53 | if iouThr is not None: 54 | t = np.where(iouThr == p.iouThrs)[0] 55 | s = s[t] 56 | 57 | if isinstance(catId, int): 58 | s = s[:, catId, aind, mind] 59 | else: 60 | s = s[:, :, aind, mind] 61 | 62 | if len(s[s > -1]) == 0: 63 | mean_s = -1 64 | else: 65 | mean_s = np.mean(s[s > -1]) 66 | 67 | print_string = iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s) 68 | return mean_s, print_string 69 | 70 | stats, print_list = [0] * 12, [""] * 12 71 | stats[0], print_list[0] = _summarize(1) 72 | stats[1], print_list[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[2]) 73 | stats[2], print_list[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[2]) 74 | stats[3], print_list[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[2]) 75 | stats[4], print_list[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[2]) 76 | stats[5], print_list[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[2]) 77 | stats[6], print_list[6] = _summarize(0, maxDets=self.params.maxDets[0]) 78 | stats[7], print_list[7] = _summarize(0, maxDets=self.params.maxDets[1]) 79 | stats[8], print_list[8] = _summarize(0, maxDets=self.params.maxDets[2]) 80 | stats[9], print_list[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[2]) 81 | stats[10], print_list[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[2]) 82 | stats[11], print_list[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[2]) 83 | 84 | print_info = "\n".join(print_list) 85 | 86 | if not self.eval: 87 | raise Exception('Please run accumulate() first') 88 | 89 | return stats, print_info 90 | 91 | 92 | def main(parser_data): 93 | device = torch.device(parser_data.device if torch.cuda.is_available() else "cpu") 94 | print("Using {} device.".format(device.type)) 95 | 96 | data_transform = { 97 | "val": transforms.Compose([transforms.ToTensor()]) 98 | } 99 | 100 | # read class_indict 101 | label_json_path = './pascal_voc_classes.json' 102 | assert os.path.exists(label_json_path), "json file {} does not exist.".format(label_json_path) 103 | with open(label_json_path, 'r') as f: 104 | class_dict = json.load(f) 105 | 106 | category_index = {v: k for k, v in class_dict.items()} 107 | 108 | VOC_root = parser_data.data_path 109 | # check voc root 110 | if os.path.exists(os.path.join(VOC_root, "VOCdevkit")) is False: 111 | raise FileNotFoundError("VOCdevkit does not exist in path:'{}'.".format(VOC_root)) 112 | 113 | #
注意这里的collate_fn是自定义的,因为读取的数据包括image和targets,不能直接使用默认的方法合成batch 114 | batch_size = parser_data.batch_size 115 | nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers 116 | print('Using %g dataloader workers' % nw) 117 | 118 | # load validation data set 119 | val_dataset = VOCDataSet(VOC_root, "2012", data_transform["val"], "val.txt") 120 | val_dataset_loader = torch.utils.data.DataLoader(val_dataset, 121 | batch_size=1, 122 | shuffle=False, 123 | num_workers=nw, 124 | pin_memory=True, 125 | collate_fn=val_dataset.collate_fn) 126 | 127 | # create model num_classes equal background + 20 classes 128 | # 注意,这里的norm_layer要和训练脚本中保持一致 129 | backbone = resnet50_fpn_backbone(norm_layer=torch.nn.BatchNorm2d) 130 | model = FasterRCNN(backbone=backbone, num_classes=parser_data.num_classes + 1) 131 | 132 | # 载入你自己训练好的模型权重 133 | weights_path = parser_data.weights_path 134 | assert os.path.exists(weights_path), "not found {} file.".format(weights_path) 135 | weights_dict = torch.load(weights_path, map_location='cpu') 136 | weights_dict = weights_dict["model"] if "model" in weights_dict else weights_dict 137 | model.load_state_dict(weights_dict) 138 | # print(model) 139 | 140 | model.to(device) 141 | 142 | # evaluate on the test dataset 143 | coco = get_coco_api_from_dataset(val_dataset) 144 | iou_types = ["bbox"] 145 | coco_evaluator = CocoEvaluator(coco, iou_types) 146 | cpu_device = torch.device("cpu") 147 | 148 | model.eval() 149 | with torch.no_grad(): 150 | for image, targets in tqdm(val_dataset_loader, desc="validation..."): 151 | # 将图片传入指定设备device 152 | image = list(img.to(device) for img in image) 153 | 154 | # inference 155 | outputs = model(image) 156 | 157 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs] 158 | res = {target["image_id"].item(): output for target, output in zip(targets, outputs)} 159 | coco_evaluator.update(res) 160 | 161 | coco_evaluator.synchronize_between_processes() 162 | 163 | # accumulate predictions from all images 164 | coco_evaluator.accumulate() 165 | coco_evaluator.summarize() 166 | 167 | coco_eval = coco_evaluator.coco_eval["bbox"] 168 | # calculate COCO info for all classes 169 | coco_stats, print_coco = summarize(coco_eval) 170 | 171 | # calculate voc info for every classes(IoU=0.5) 172 | voc_map_info_list = [] 173 | for i in range(len(category_index)): 174 | stats, _ = summarize(coco_eval, catId=i) 175 | voc_map_info_list.append(" {:15}: {}".format(category_index[i + 1], stats[1])) 176 | 177 | print_voc = "\n".join(voc_map_info_list) 178 | print(print_voc) 179 | 180 | # 将验证结果保存至txt文件中 181 | with open("record_mAP.txt", "w") as f: 182 | record_lines = ["COCO results:", 183 | print_coco, 184 | "", 185 | "mAP(IoU=0.5) for each category:", 186 | print_voc] 187 | f.write("\n".join(record_lines)) 188 | 189 | 190 | if __name__ == "__main__": 191 | import argparse 192 | 193 | parser = argparse.ArgumentParser( 194 | description=__doc__) 195 | 196 | # 使用设备类型 197 | parser.add_argument('--device', default='cuda', help='device') 198 | 199 | # 检测目标类别数 200 | parser.add_argument('--num-classes', type=int, default='20', help='number of classes') 201 | 202 | # 数据集的根目录(VOCdevkit) 203 | parser.add_argument('--data-path', default='/data/', help='dataset root') 204 | 205 | # 训练好的权重文件 206 | parser.add_argument('--weights-path', default='./save_weights/model.pth', type=str, help='training weights') 207 | 208 | # batch size 209 | parser.add_argument('--batch_size', default=1, type=int, metavar='N', 210 | help='batch size when 
validation.') 211 | 212 | args = parser.parse_args() 213 | 214 | main(args) 215 | -------------------------------------------------------------------------------- /figures/VehicleMAE_Det.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/VehicleMAE_Det.jpg -------------------------------------------------------------------------------- /figures/detection_result.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/detection_result.jpg -------------------------------------------------------------------------------- /figures/experimentalresults.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/experimentalresults.jpg -------------------------------------------------------------------------------- /figures/firstIMG.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/firstIMG.jpg -------------------------------------------------------------------------------- /figures/proposal_attentionmaps.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/proposal_attentionmaps.jpg -------------------------------------------------------------------------------- /figures/proposal_attribute.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/figures/proposal_attribute.jpg -------------------------------------------------------------------------------- /my_dataset_cityscraps.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | 4 | import torch 5 | from PIL import Image 6 | import torch.utils.data as data 7 | from pycocotools.coco import COCO 8 | from train_utils import coco_remove_images_without_annotations, convert_coco_poly_mask 9 | 10 | 11 | class CityscrapesDetection(data.Dataset): 12 | """`MS Coco Detection `_ Dataset. 13 | 14 | Args: 15 | root (string): Root directory where images are downloaded to. 16 | dataset (string): train or val. 17 | transforms (callable, optional): A function/transform that takes input sample and its target as entry 18 | and returns a transformed version. 
19 | """ 20 | 21 | def __init__(self, root, dataset="train", transforms=None): 22 | super(CityscrapesDetection, self).__init__() 23 | assert dataset in ["train", "val"], 'dataset must be in ["train", "val"]' 24 | anno_file = f"instances_{dataset}.json" 25 | assert os.path.exists(root), "file '{}' does not exist.".format(root) 26 | self.img_root = os.path.join(root, f"{dataset}") 27 | assert os.path.exists(self.img_root), "path '{}' does not exist.".format(self.img_root) 28 | self.anno_path = os.path.join(root, "annotations", anno_file) 29 | assert os.path.exists(self.anno_path), "file '{}' does not exist.".format(self.anno_path) 30 | 31 | self.mode = dataset 32 | self.transforms = transforms 33 | self.coco = COCO(self.anno_path) 34 | 35 | # 获取coco数据索引与类别名称的关系 36 | # 注意在object80中的索引并不是连续的,虽然只有80个类别,但索引还是按照stuff91来排序的 37 | data_classes = dict([(v["id"], v["name"]) for k, v in self.coco.cats.items()]) 38 | max_index = max(data_classes.keys()) # 90 39 | # 将缺失的类别名称设置成N/A 40 | coco_classes = {} 41 | for k in range(1, max_index + 1): 42 | if k in data_classes: 43 | coco_classes[k] = data_classes[k] 44 | else: 45 | coco_classes[k] = "N/A" 46 | 47 | if dataset == "train": 48 | json_str = json.dumps(coco_classes, indent=4) 49 | with open("/data/wuwentao/VehicleDetection/cityscrapes4_indices.json", "w") as f: 50 | f.write(json_str) 51 | 52 | self.coco_classes = coco_classes 53 | 54 | ids = list(sorted(self.coco.imgs.keys())) 55 | if dataset == "train": 56 | # 移除没有目标,或者目标面积非常小的数据 57 | valid_ids = coco_remove_images_without_annotations(self.coco, ids) 58 | self.ids = valid_ids 59 | else: 60 | self.ids = ids 61 | 62 | def parse_targets(self, 63 | img_id: int, 64 | coco_targets: list, 65 | w: int = None, 66 | h: int = None): 67 | assert w > 0 68 | assert h > 0 69 | 70 | # 只筛选出单个对象的情况 71 | anno = [obj for obj in coco_targets if obj['iscrowd'] == 0] 72 | 73 | boxes = [obj["bbox"] for obj in anno] 74 | 75 | # guard against no boxes via resizing 76 | boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4) 77 | # [xmin, ymin, w, h] -> [xmin, ymin, xmax, ymax] 78 | boxes[:, 2:] += boxes[:, :2] 79 | boxes[:, 0::2].clamp_(min=0, max=w) 80 | boxes[:, 1::2].clamp_(min=0, max=h) 81 | 82 | classes = [obj["category_id"] for obj in anno] 83 | classes = torch.tensor(classes, dtype=torch.int64) 84 | 85 | area = torch.tensor([obj["area"] for obj in anno]) 86 | iscrowd = torch.tensor([obj["iscrowd"] for obj in anno]) 87 | 88 | segmentations = [obj["segmentation"] for obj in anno] 89 | masks = convert_coco_poly_mask(segmentations, h, w) 90 | 91 | # 筛选出合法的目标,即x_max>x_min且y_max>y_min 92 | keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0]) 93 | boxes = boxes[keep] 94 | classes = classes[keep] 95 | masks = masks[keep] 96 | area = area[keep] 97 | iscrowd = iscrowd[keep] 98 | 99 | target = {} 100 | target["boxes"] = boxes 101 | target["labels"] = classes 102 | target["masks"] = masks 103 | target["image_id"] = torch.tensor([img_id]) 104 | 105 | # for conversion to coco api 106 | target["area"] = area 107 | target["iscrowd"] = iscrowd 108 | 109 | return target 110 | 111 | def __getitem__(self, index): 112 | """ 113 | Args: 114 | index (int): Index 115 | 116 | Returns: 117 | tuple: Tuple (image, target). target is the object returned by ``coco.loadAnns``. 
118 | """ 119 | coco = self.coco 120 | img_id = self.ids[index] 121 | ann_ids = coco.getAnnIds(imgIds=img_id) 122 | coco_target = coco.loadAnns(ann_ids) 123 | 124 | path = coco.loadImgs(img_id)[0]['file_name'] 125 | path = '/data/wuwentao/data/cityscapes/leftImg8bit/' + path 126 | #img = Image.open(os.path.join(self.img_root, path)).convert('RGB') 127 | img = Image.open(path).convert('RGB') 128 | 129 | w, h = img.size 130 | target = self.parse_targets(img_id, coco_target, w, h) 131 | if self.transforms is not None: 132 | img, target = self.transforms(img, target) 133 | 134 | return img, target 135 | 136 | def __len__(self): 137 | return len(self.ids) 138 | 139 | def get_height_and_width(self, index): 140 | coco = self.coco 141 | img_id = self.ids[index] 142 | 143 | img_info = coco.loadImgs(img_id)[0] 144 | w = img_info["width"] 145 | h = img_info["height"] 146 | return h, w 147 | 148 | @staticmethod 149 | def collate_fn(batch): 150 | return tuple(zip(*batch)) 151 | 152 | 153 | if __name__ == '__main__': 154 | train = CityscrapesDetection("/data/wuwentao/data/cityscapes/leftImg8bit/", dataset="train") 155 | print(len(train)) 156 | t = train[0] 157 | -------------------------------------------------------------------------------- /network_files/__init__.py: -------------------------------------------------------------------------------- 1 | from .faster_rcnn_framework import FasterRCNN, FastRCNNPredictor 2 | from .rpn_function import AnchorsGenerator 3 | from .mask_rcnn import MaskRCNN 4 | from .vehiclemaeencode import VehiclemaeEncode 5 | -------------------------------------------------------------------------------- /network_files/__pycache__/__init__.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/__init__.cpython-38.pyc -------------------------------------------------------------------------------- /network_files/__pycache__/boxes.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/boxes.cpython-38.pyc -------------------------------------------------------------------------------- /network_files/__pycache__/det_utils.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/det_utils.cpython-38.pyc -------------------------------------------------------------------------------- /network_files/__pycache__/faster_rcnn_framework.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/faster_rcnn_framework.cpython-38.pyc -------------------------------------------------------------------------------- /network_files/__pycache__/image_list.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/image_list.cpython-38.pyc -------------------------------------------------------------------------------- /network_files/__pycache__/mask_rcnn.cpython-38.pyc: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/mask_rcnn.cpython-38.pyc -------------------------------------------------------------------------------- /network_files/__pycache__/roi_head.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/roi_head.cpython-38.pyc -------------------------------------------------------------------------------- /network_files/__pycache__/rpn_function.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/rpn_function.cpython-38.pyc -------------------------------------------------------------------------------- /network_files/__pycache__/transform.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/transform.cpython-38.pyc -------------------------------------------------------------------------------- /network_files/__pycache__/vehiclemaeencode.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/network_files/__pycache__/vehiclemaeencode.cpython-38.pyc -------------------------------------------------------------------------------- /network_files/boxes.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from typing import Tuple 3 | from torch import Tensor 4 | import torchvision 5 | 6 | 7 | def nms(boxes, scores, iou_threshold): 8 | # type: (Tensor, Tensor, float) -> Tensor 9 | """ 10 | Performs non-maximum suppression (NMS) on the boxes according 11 | to their intersection-over-union (IoU). 12 | 13 | NMS iteratively removes lower scoring boxes which have an 14 | IoU greater than iou_threshold with another (higher scoring) 15 | box. 16 | 17 | Parameters 18 | ---------- 19 | boxes : Tensor[N, 4]) 20 | boxes to perform NMS on. They 21 | are expected to be in (x1, y1, x2, y2) format 22 | scores : Tensor[N] 23 | scores for each one of the boxes 24 | iou_threshold : float 25 | discards all overlapping 26 | boxes with IoU > iou_threshold 27 | 28 | Returns 29 | ------- 30 | keep : Tensor 31 | int64 tensor with the indices 32 | of the elements that have been kept 33 | by NMS, sorted in decreasing order of scores 34 | """ 35 | return torch.ops.torchvision.nms(boxes, scores, iou_threshold) 36 | 37 | 38 | def batched_nms(boxes, scores, idxs, iou_threshold): 39 | # type: (Tensor, Tensor, Tensor, float) -> Tensor 40 | """ 41 | Performs non-maximum suppression in a batched fashion. 42 | 43 | Each index value correspond to a category, and NMS 44 | will not be applied between elements of different categories. 45 | 46 | Parameters 47 | ---------- 48 | boxes : Tensor[N, 4] 49 | boxes where NMS will be performed. They 50 | are expected to be in (x1, y1, x2, y2) format 51 | scores : Tensor[N] 52 | scores for each one of the boxes 53 | idxs : Tensor[N] 54 | indices of the categories for each one of the boxes. 
55 | iou_threshold : float 56 | discards all overlapping boxes 57 | with IoU < iou_threshold 58 | 59 | Returns 60 | ------- 61 | keep : Tensor 62 | int64 tensor with the indices of 63 | the elements that have been kept by NMS, sorted 64 | in decreasing order of scores 65 | """ 66 | if boxes.numel() == 0: 67 | return torch.empty((0,), dtype=torch.int64, device=boxes.device) 68 | 69 | # strategy: in order to perform NMS independently per class. 70 | # we add an offset to all the boxes. The offset is dependent 71 | # only on the class idx, and is large enough so that boxes 72 | # from different classes do not overlap 73 | # 获取所有boxes中最大的坐标值(xmin, ymin, xmax, ymax) 74 | max_coordinate = boxes.max() 75 | 76 | # to(): Performs Tensor dtype and/or device conversion 77 | # 为每一个类别/每一层生成一个很大的偏移量 78 | # 这里的to只是让生成tensor的dytpe和device与boxes保持一致 79 | offsets = idxs.to(boxes) * (max_coordinate + 1) 80 | # boxes加上对应层的偏移量后,保证不同类别/层之间boxes不会有重合的现象 81 | boxes_for_nms = boxes + offsets[:, None] 82 | keep = nms(boxes_for_nms, scores, iou_threshold) 83 | return keep 84 | 85 | 86 | def remove_small_boxes(boxes, min_size): 87 | # type: (Tensor, float) -> Tensor 88 | """ 89 | Remove boxes which contains at least one side smaller than min_size. 90 | 移除宽高小于指定阈值的索引 91 | Arguments: 92 | boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format 93 | min_size (float): minimum size 94 | 95 | Returns: 96 | keep (Tensor[K]): indices of the boxes that have both sides 97 | larger than min_size 98 | """ 99 | ws, hs = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1] # 预测boxes的宽和高 100 | # keep = (ws >= min_size) & (hs >= min_size) # 当满足宽,高都大于给定阈值时为True 101 | keep = torch.logical_and(torch.ge(ws, min_size), torch.ge(hs, min_size)) 102 | # nonzero(): Returns a tensor containing the indices of all non-zero elements of input 103 | # keep = keep.nonzero().squeeze(1) 104 | keep = torch.where(keep)[0] 105 | return keep 106 | 107 | 108 | def clip_boxes_to_image(boxes, size): 109 | # type: (Tensor, Tuple[int, int]) -> Tensor 110 | """ 111 | Clip boxes so that they lie inside an image of size `size`. 112 | 裁剪预测的boxes信息,将越界的坐标调整到图片边界上 113 | 114 | Arguments: 115 | boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format 116 | size (Tuple[height, width]): size of the image 117 | 118 | Returns: 119 | clipped_boxes (Tensor[N, 4]) 120 | """ 121 | dim = boxes.dim() 122 | boxes_x = boxes[..., 0::2] # x1, x2 123 | boxes_y = boxes[..., 1::2] # y1, y2 124 | height, width = size 125 | 126 | if torchvision._is_tracing(): 127 | boxes_x = torch.max(boxes_x, torch.tensor(0, dtype=boxes.dtype, device=boxes.device)) 128 | boxes_x = torch.min(boxes_x, torch.tensor(width, dtype=boxes.dtype, device=boxes.device)) 129 | boxes_y = torch.max(boxes_y, torch.tensor(0, dtype=boxes.dtype, device=boxes.device)) 130 | boxes_y = torch.min(boxes_y, torch.tensor(height, dtype=boxes.dtype, device=boxes.device)) 131 | else: 132 | boxes_x = boxes_x.clamp(min=0, max=width) # 限制x坐标范围在[0,width]之间 133 | boxes_y = boxes_y.clamp(min=0, max=height) # 限制y坐标范围在[0,height]之间 134 | 135 | clipped_boxes = torch.stack((boxes_x, boxes_y), dim=dim) 136 | return clipped_boxes.reshape(boxes.shape) 137 | 138 | 139 | def box_area(boxes): 140 | """ 141 | Computes the area of a set of bounding boxes, which are specified by its 142 | (x1, y1, x2, y2) coordinates. 143 | 144 | Arguments: 145 | boxes (Tensor[N, 4]): boxes for which the area will be computed. 
They 146 | are expected to be in (x1, y1, x2, y2) format 147 | 148 | Returns: 149 | area (Tensor[N]): area for each box 150 | """ 151 | return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]) 152 | 153 | 154 | def box_iou(boxes1, boxes2): 155 | """ 156 | Return intersection-over-union (Jaccard index) of boxes. 157 | 158 | Both sets of boxes are expected to be in (x1, y1, x2, y2) format. 159 | 160 | Arguments: 161 | boxes1 (Tensor[N, 4]) 162 | boxes2 (Tensor[M, 4]) 163 | 164 | Returns: 165 | iou (Tensor[N, M]): the NxM matrix containing the pairwise 166 | IoU values for every element in boxes1 and boxes2 167 | """ 168 | area1 = box_area(boxes1) 169 | area2 = box_area(boxes2) 170 | 171 | # When the shapes do not match, 172 | # the shape of the returned output tensor follows the broadcasting rules 173 | lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # left-top [N,M,2] 174 | rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # right-bottom [N,M,2] 175 | 176 | wh = (rb - lt).clamp(min=0) # [N,M,2] 177 | inter = wh[:, :, 0] * wh[:, :, 1] # [N,M] 178 | 179 | iou = inter / (area1[:, None] + area2 - inter) 180 | return iou 181 | 182 | -------------------------------------------------------------------------------- /network_files/image_list.py: -------------------------------------------------------------------------------- 1 | from typing import List, Tuple 2 | from torch import Tensor 3 | 4 | 5 | class ImageList(object): 6 | """ 7 | Structure that holds a list of images (of possibly 8 | varying sizes) as a single tensor. 9 | This works by padding the images to the same size, 10 | and storing in a field the original sizes of each image 11 | """ 12 | 13 | def __init__(self, tensors, image_sizes): 14 | # type: (Tensor, List[Tuple[int, int]]) -> None 15 | """ 16 | Arguments: 17 | tensors (tensor) padding后的图像数据 18 | image_sizes (list[tuple[int, int]]) padding前的图像尺寸 19 | """ 20 | self.tensors = tensors 21 | self.image_sizes = image_sizes 22 | 23 | def to(self, device): 24 | cast_tensor = self.tensors.to(device) 25 | return ImageList(cast_tensor, self.image_sizes) 26 | 27 | -------------------------------------------------------------------------------- /network_files/vehiclemaeencode.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | import torch.nn as nn 4 | from itertools import repeat 5 | import math 6 | import torch 7 | import torch.nn as nn 8 | import collections.abc as container_abcs 9 | from timm.models.vision_transformer import PatchEmbed, Block 10 | 11 | def _ntuple(n): 12 | def parse(x): 13 | if isinstance(x, container_abcs.Iterable): 14 | return x 15 | return tuple(repeat(x, n)) 16 | return parse 17 | to_2tuple = _ntuple(2) 18 | def drop_path(x, drop_prob: float = 0., training: bool = False): 19 | """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). 20 | 21 | This is the same as the DropConnect impl I created for EfficientNet, etc networks, however, 22 | the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper... 23 | See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for 24 | changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use 25 | 'survival rate' as the argument. 26 | 27 | """ 28 | if drop_prob == 0. 
or not training: 29 | return x 30 | keep_prob = 1 - drop_prob 31 | shape = (x.shape[0],) + (1,) * (x.ndim - 1) # work with diff dim tensors, not just 2D ConvNets 32 | random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device) 33 | random_tensor.floor_() # binarize 34 | output = x.div(keep_prob) * random_tensor 35 | return output 36 | 37 | class DropPath(nn.Module): 38 | """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks). 39 | """ 40 | def __init__(self, drop_prob=None): 41 | super(DropPath, self).__init__() 42 | self.drop_prob = drop_prob 43 | 44 | def forward(self, x): 45 | return drop_path(x, self.drop_prob, self.training) 46 | 47 | class Mlp(nn.Module): 48 | def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.): 49 | super().__init__() 50 | out_features = out_features or in_features 51 | hidden_features = hidden_features or in_features 52 | self.fc1 = nn.Linear(in_features, hidden_features) 53 | self.act = act_layer() 54 | self.fc2 = nn.Linear(hidden_features, out_features) 55 | self.drop = nn.Dropout(drop) 56 | 57 | def forward(self, x): 58 | x = self.fc1(x) 59 | x = self.act(x) 60 | x = self.drop(x) 61 | x = self.fc2(x) 62 | x = self.drop(x) 63 | return x 64 | 65 | 66 | class Attention(nn.Module): 67 | def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0.): 68 | super().__init__() 69 | self.num_heads = num_heads 70 | head_dim = dim // num_heads 71 | # NOTE scale factor was wrong in my original version, can set manually to be compat with prev weights 72 | self.scale = qk_scale or head_dim ** -0.5 73 | 74 | self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias) 75 | self.attn_drop = nn.Dropout(attn_drop) 76 | self.proj = nn.Linear(dim, dim) 77 | self.proj_drop = nn.Dropout(proj_drop) 78 | 79 | def forward(self, x): 80 | B, N, C = x.shape 81 | qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) 82 | q, k, v = qkv[0], qkv[1], qkv[2] # make torchscript happy (cannot use tensor as tuple) 83 | 84 | attn = (q @ k.transpose(-2, -1)) * self.scale 85 | attn = attn.softmax(dim=-1) 86 | attn = self.attn_drop(attn) 87 | 88 | x = (attn @ v).transpose(1, 2).reshape(B, N, C) 89 | x = self.proj(x) 90 | x = self.proj_drop(x) 91 | return x 92 | 93 | 94 | class Block(nn.Module): 95 | 96 | def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., 97 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm): 98 | super().__init__() 99 | self.norm1 = norm_layer(dim) 100 | self.attn = Attention( 101 | dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop) 102 | # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here 103 | self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity() 104 | self.norm2 = norm_layer(dim) 105 | mlp_hidden_dim = int(dim * mlp_ratio) 106 | self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) 107 | 108 | def forward(self, x): 109 | x = x + self.drop_path(self.attn(self.norm1(x))) 110 | x = x + self.drop_path(self.mlp(self.norm2(x))) 111 | return x 112 | 113 | class PatchEmbed(nn.Module): 114 | def __init__(self, img_size=224, patch_size=16, stride_size=20, in_chans=3, embed_dim=768): 115 | super().__init__() 116 | img_size = to_2tuple(img_size) 117 | patch_size = to_2tuple(patch_size) 118 | stride_size_tuple = to_2tuple(stride_size) 119 | self.num_x = (img_size[1] - patch_size[1]) // stride_size_tuple[1] + 1 120 | self.num_y = (img_size[0] - patch_size[0]) // stride_size_tuple[0] + 1 121 | 122 | num_patches = self.num_x * self.num_y 123 | self.img_size = img_size 124 | self.patch_size = patch_size 125 | self.num_patches = num_patches 126 | 127 | self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=stride_size) 128 | for m in self.modules(): 129 | if isinstance(m, nn.Conv2d): 130 | n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels 131 | m.weight.data.normal_(0, math.sqrt(2. / n)) 132 | elif isinstance(m, nn.BatchNorm2d): 133 | m.weight.data.fill_(1) 134 | m.bias.data.zero_() 135 | elif isinstance(m, nn.InstanceNorm2d): 136 | m.weight.data.fill_(1) 137 | m.bias.data.zero_() 138 | 139 | def forward(self, x): 140 | B, C, H, W = x.shape 141 | 142 | # FIXME look at relaxing size constraints 143 | assert H == self.img_size[0] and W == self.img_size[1], \ 144 | f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})." 145 | x = self.proj(x) 146 | 147 | x = x.flatten(2).transpose(1, 2) # [64, 8, 768] 148 | return x 149 | 150 | class VTBClassifier(nn.Module): 151 | def __init__(self, attr_num, dim=768,num_heads=12, mlp_ratio=4., qkv_bias=True, qk_scale=None, drop_rate=0., attn_drop_rate=0., 152 | drop_path_rate=0., norm_layer=nn.LayerNorm):#checkpoint-last.pth 153 | 154 | super().__init__() 155 | self.attr_num = attr_num 156 | self.word_embed = nn.Linear(768, dim) 157 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, 1)] # stochastic depth decay rule 158 | self.blocks = nn.ModuleList([ 159 | Block( 160 | dim=dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale, 161 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer)#多头注意力头数num_heads,, qk_scale=None 162 | for i in range(1)])#最新版timm要注释掉qk_scale 163 | self.norm = norm_layer(768)#对encode output做归一化 164 | self.weight_layer = nn.ModuleList([nn.Linear(dim, 1) for i in range(self.attr_num)]) 165 | self.bn = nn.BatchNorm1d(self.attr_num) 166 | 167 | self.vis_embed = nn.Parameter(torch.zeros(1, 1, dim)) 168 | self.tex_embed = nn.Parameter(torch.zeros(1, 1, dim)) 169 | 170 | @torch.no_grad() 171 | def forward(self, features, word_vec, label=None): 172 | 173 | word_embed = self.word_embed(word_vec).expand(features.shape[0], word_vec.shape[0], features.shape[-1]) 174 | 175 | tex_embed = word_embed + self.tex_embed 176 | vis_embed = features + self.vis_embed 177 | 178 | x = torch.cat([tex_embed, vis_embed], dim=1) 179 | for blk in self.blocks: 180 | x = blk(x) 181 | x = self.norm(x) #torch.Size([1024, 260, 768]) 182 | tex_feature = x[:,:47,:] 183 | 184 | logits = torch.cat([self.weight_layer[i](x[:, i, :]) for i in range(self.attr_num)], dim=1) 185 | logits = self.bn(logits) 186 | 187 | return logits,tex_feature 188 | 
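# --------------------------------------------------------------------------
# Minimal usage sketch of the two modules in this file, assuming 224x224 RGB
# proposal crops and 47 pre-computed T5 attribute embeddings of dimension 768
# (the slice x[:, :47, :] above suggests 47 attribute tokens). The names
# `encoder`, `classifier`, `crops` and `attr_vec` are illustrative only; which
# of the encoder outputs feeds the classifier is decided by the detection head
# that calls these modules.
#
#   encoder = VehiclemaeEncode()             # ViT-style VehicleMAE encoder (defined below)
#   classifier = VTBClassifier(attr_num=47)  # attribute head (defined above)
#   crops = torch.randn(2, 3, 224, 224)      # batch of proposal crops
#   attr_vec = torch.randn(47, 768)          # T5 word embeddings of the attributes
#   feats, mid_feats = encoder(crops)        # final / 11th-block tokens, each (2, 1+8+196, 768)
#   logits, tex_feat = classifier(feats, attr_vec)  # (2, 47) attribute logits, (2, 47, 768) text tokens
#
# Both forward passes are wrapped in @torch.no_grad(), so the pre-trained
# VehicleMAE encoder and the attribute head act as frozen feature extractors here.
# --------------------------------------------------------------------------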
189 | class VehiclemaeEncode(nn.Module): 190 | def __init__(self,img_size=224,patch_size=16, stride_size=16, in_chans=3,embed_dim=768, depth=12, 191 | num_heads=12, mlp_ratio=4., qkv_bias=True, qk_scale=None, drop_rate=0., attn_drop_rate=0., 192 | drop_path_rate=0., norm_layer=nn.LayerNorm): 193 | super().__init__() 194 | self.patch_embed = PatchEmbed( 195 | img_size=img_size, patch_size=patch_size, stride_size=stride_size, in_chans=in_chans, 196 | embed_dim=embed_dim) 197 | 198 | num_patches = self.patch_embed.num_patches 199 | self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim)) 200 | self.other_token = nn.Parameter(torch.randn(1, 1, embed_dim))#可训练的token,用于替换掉那些被mask的块 201 | self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim)) 202 | 203 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, depth)] # stochastic depth decay rule 204 | 205 | self.pos_drop = nn.Dropout(p=drop_rate) 206 | self.blocks = nn.ModuleList([ 207 | Block( 208 | dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale, 209 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer)#多头注意力头数num_heads,, qk_scale=None 210 | for i in range(depth)])#最新版timm要注释掉qk_scale 211 | self.norm = norm_layer(768)#对encode output做归一化 212 | 213 | torch.nn.init.normal_(self.cls_token, std=.02) 214 | torch.nn.init.normal_(self.pos_embed, std=.02) 215 | torch.nn.init.normal_(self.other_token, std=.02) 216 | 217 | #self.apply(self._init_weights) 218 | 219 | 220 | 221 | @torch.no_grad() 222 | def forward(self, x): 223 | B = x.shape[0] 224 | x = self.patch_embed(x) 225 | 226 | cls_tokens = self.cls_token.expand(B, -1, -1) 227 | x = torch.cat((cls_tokens, x), dim=1) + self.pos_embed #(512*batch_size,197,768)256 228 | x = self.pos_drop(x) 229 | 230 | other_token = self.other_token.repeat(x.shape[0], 8, 1)#扩维 231 | 232 | x1 = x[:,:1,:] 233 | x2 = x[:,1:,:] 234 | x = torch.cat((x1,other_token, x2), dim=1) 235 | 236 | i = 0 237 | vtb_feature = None 238 | for blk in self.blocks: 239 | x = blk(x) 240 | i += 1 241 | if i == 11: 242 | vtb_feature = x 243 | 244 | x = self.norm(x)#encode的输出 245 | 246 | return x, vtb_feature 247 | 248 | -------------------------------------------------------------------------------- /plot_curve.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import matplotlib.pyplot as plt 3 | 4 | 5 | def plot_loss_and_lr(train_loss, learning_rate): 6 | try: 7 | x = list(range(len(train_loss))) 8 | fig, ax1 = plt.subplots(1, 1) 9 | ax1.plot(x, train_loss, 'r', label='loss') 10 | ax1.set_xlabel("step") 11 | ax1.set_ylabel("loss") 12 | ax1.set_title("Train Loss and lr") 13 | plt.legend(loc='best') 14 | 15 | ax2 = ax1.twinx() 16 | ax2.plot(x, learning_rate, label='lr') 17 | ax2.set_ylabel("learning rate") 18 | ax2.set_xlim(0, len(train_loss)) # 设置横坐标整数间隔 19 | plt.legend(loc='best') 20 | 21 | handles1, labels1 = ax1.get_legend_handles_labels() 22 | handles2, labels2 = ax2.get_legend_handles_labels() 23 | plt.legend(handles1 + handles2, labels1 + labels2, loc='upper right') 24 | 25 | fig.subplots_adjust(right=0.8) # 防止出现保存图片显示不全的情况 26 | fig.savefig('./loss_and_lr{}.png'.format(datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))) 27 | plt.close() 28 | print("successfully saved loss curve!") 29 | except Exception as e: 30 | print(e) 31 | 32 | 33 | def plot_map(mAP): 34 | try: 35 | x = list(range(len(mAP))) 36 | plt.plot(x, mAP, label='mAP') 37 | plt.xlabel('epoch') 38 | plt.ylabel('mAP') 39 | plt.title('Eval mAP') 40 | plt.xlim(0, len(mAP)) 41 | plt.legend(loc='best') 42 | plt.savefig('./mAP.png') 43 | plt.close() 44 | print("successfully saved mAP curve!") 45 | except Exception as e: 46 | print(e) 47 | -------------------------------------------------------------------------------- /predict.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import json 4 | import pickle 5 | import numpy as np 6 | from PIL import Image 7 | import matplotlib.pyplot as plt 8 | import torch 9 | from torchvision import transforms 10 | 11 | from network_files import MaskRCNN 12 | from backbone import resnet50_fpn_backbone 13 | from draw_box_utils import draw_objs 14 | 15 | 16 | def create_model(num_classes, box_thresh=0.5): 17 | backbone = resnet50_fpn_backbone() 18 | model = MaskRCNN(backbone, 19 | num_classes=num_classes, 20 | rpn_score_thresh=box_thresh, 21 | box_score_thresh=box_thresh) 22 | 23 | return model 24 | 25 | 26 | def time_synchronized(): 27 | torch.cuda.synchronize() if torch.cuda.is_available() else None 28 | return time.time() 29 | 30 | 31 | def main(): 32 | num_classes = 4 # 不包含背景 33 | box_thresh = 0.5 34 | weights_path = "./save_weights/city_checkpoint.pth" 35 | img_path = "./image.png" 36 | label_json_path = './cityscrapes4_indices.json' 37 | 38 | data_path = './pre_model/Attribute_word_embedding_t5.pkl' 39 | dataset_info = pickle.load(open(data_path, 'rb+')) 40 | attr_vectors = dataset_info.attr_vectors.astype(np.float32)#.cuda()#.tolist() 41 | attr_vectors = torch.from_numpy(attr_vectors).cuda() 42 | 43 | # get devices 44 | device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 45 | print("using {} device.".format(device)) 46 | 47 | # create model 48 | model = create_model(num_classes=num_classes + 1, box_thresh=box_thresh) 49 | 50 | # load train weights 51 | assert os.path.exists(weights_path), "{} file does not exist.".format(weights_path) 52 | weights_dict = torch.load(weights_path, map_location='cpu') 53 | weights_dict = weights_dict["model"] if "model" in weights_dict else weights_dict 54 | model.load_state_dict(weights_dict) 55 | model.to(device) 56 | 57 | # read class_indict 58 | assert os.path.exists(label_json_path), "json file {} does not exist.".format(label_json_path) 59 | with open(label_json_path, 'r') as json_file: 60 | category_index = json.load(json_file) 61 | 62 | # load image 63 | assert os.path.exists(img_path), f"{img_path} does not exist."
64 | original_img = Image.open(img_path).convert('RGB') 65 | 66 | # from pil image to tensor, do not normalize image 67 | data_transform = transforms.Compose([transforms.ToTensor()]) 68 | img = data_transform(original_img) 69 | # expand batch dimension 70 | img = torch.unsqueeze(img, dim=0) 71 | 72 | model.eval() # 进入验证模式 73 | with torch.no_grad(): 74 | # init 75 | img_height, img_width = img.shape[-2:] 76 | init_img = torch.zeros((1, 3, img_height, img_width), device=device) 77 | model(init_img, attr_vectors) 78 | 79 | t_start = time_synchronized() 80 | predictions = model(img.to(device), attr_vectors)[0] 81 | t_end = time_synchronized() 82 | print("inference+NMS time: {}".format(t_end - t_start)) 83 | 84 | predict_boxes = predictions["boxes"].to("cpu").numpy() 85 | predict_classes = predictions["labels"].to("cpu").numpy() 86 | predict_scores = predictions["scores"].to("cpu").numpy() 87 | predict_mask = predictions["masks"].to("cpu").numpy() 88 | predict_mask = np.squeeze(predict_mask, axis=1) # [batch, 1, h, w] -> [batch, h, w] 89 | 90 | if len(predict_boxes) == 0: 91 | print("没有检测到任何目标!") 92 | return 93 | 94 | plot_img = draw_objs(original_img, 95 | boxes=predict_boxes, 96 | classes=predict_classes, 97 | scores=predict_scores, 98 | masks=predict_mask, 99 | category_index=category_index, 100 | line_thickness=3, 101 | font='arial.ttf', 102 | font_size=20) 103 | plt.imshow(plot_img) 104 | plt.show() 105 | # 保存预测的图片结果 106 | plot_img.save("test_result.jpg") 107 | 108 | 109 | if __name__ == '__main__': 110 | main() 111 | 112 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | _libgcc_mutex 2 | addict 3 | aliyun-python-sdk-core 4 | aliyun-python-sdk-kms 5 | appdirs 6 | ca-certificates 7 | certifi 8 | cffi 9 | chardet 10 | charset-normalizer 11 | cityscapesscripts 12 | click 13 | colorama 14 | coloredlogs 15 | crcmod 16 | cryptography 17 | cycler 18 | cython 19 | easydict 20 | filelock 21 | fonttools 22 | fsspec 23 | huggingface-hub 24 | humanfriendly 25 | idna 26 | imagecorruptions 27 | imageio 28 | importlib-metadata 29 | jmespath 30 | joblib 31 | kiwisolver 32 | ld_impl_linux-64 33 | libffi 34 | libgcc-ng 35 | libstdcxx-ng 36 | lxml 37 | markdown 38 | markdown-it-py 39 | matplotlib 40 | mdurl 41 | mmcv 42 | mmengine 43 | model-index 44 | ncurses 45 | networkx 46 | nltk 47 | numpy 48 | opencv-python 49 | opendatalab 50 | openmim 51 | openssl 52 | openxlab 53 | ordered-set 54 | oss2 55 | packaging 56 | pandas 57 | pillow 58 | pip 59 | platformdirs 60 | pycocotools 61 | pycparser 62 | pycryptodome 63 | pygments 64 | pyparsing 65 | pyquaternion 66 | python 67 | python-dateutil 68 | pytz 69 | pywavelets 70 | pyyaml 71 | readline 72 | regex 73 | requests 74 | rich 75 | safetensors 76 | scikit-image 77 | scikit-learn 78 | scipy 79 | sentence-transformers 80 | sentencepiece 81 | setuptools 82 | shapely 83 | six 84 | sqlite 85 | summary 86 | tabulate 87 | termcolor 88 | terminaltables 89 | threadpoolctl 90 | tifffile 91 | timm 92 | tk 93 | tokenizers 94 | tomli 95 | torch 96 | torchaudio 97 | torchvision 98 | tqdm 99 | transformers 100 | typing 101 | typing-extensions 102 | urllib3 103 | wheel 104 | xz 105 | yapf 106 | zipp 107 | zlib 108 | -------------------------------------------------------------------------------- /train_utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .group_by_aspect_ratio import 
GroupedBatchSampler, create_aspect_ratio_groups 2 | from .distributed_utils import init_distributed_mode, save_on_master, mkdir 3 | from .coco_eval import EvalCOCOMetric 4 | from .coco_utils import coco_remove_images_without_annotations, convert_coco_poly_mask, convert_to_coco_api 5 | -------------------------------------------------------------------------------- /train_utils/__pycache__/__init__.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/__init__.cpython-38.pyc -------------------------------------------------------------------------------- /train_utils/__pycache__/coco_eval.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/coco_eval.cpython-38.pyc -------------------------------------------------------------------------------- /train_utils/__pycache__/coco_utils.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/coco_utils.cpython-38.pyc -------------------------------------------------------------------------------- /train_utils/__pycache__/distributed_utils.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/distributed_utils.cpython-38.pyc -------------------------------------------------------------------------------- /train_utils/__pycache__/group_by_aspect_ratio.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/group_by_aspect_ratio.cpython-38.pyc -------------------------------------------------------------------------------- /train_utils/__pycache__/train_eval_utils.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Event-AHU/VFM-Det/590b73510b28088dc2a74e6fbbb689b49818b6b8/train_utils/__pycache__/train_eval_utils.cpython-38.pyc -------------------------------------------------------------------------------- /train_utils/coco_eval.py: -------------------------------------------------------------------------------- 1 | import json 2 | import copy 3 | 4 | import numpy as np 5 | from pycocotools.coco import COCO 6 | from pycocotools.cocoeval import COCOeval 7 | import pycocotools.mask as mask_util 8 | from .distributed_utils import all_gather, is_main_process 9 | 10 | 11 | def merge(img_ids, eval_results): 12 | """将多个进程之间的数据汇总在一起""" 13 | all_img_ids = all_gather(img_ids) 14 | all_eval_results = all_gather(eval_results) 15 | 16 | merged_img_ids = [] 17 | for p in all_img_ids: 18 | merged_img_ids.extend(p) 19 | 20 | merged_eval_results = [] 21 | for p in all_eval_results: 22 | merged_eval_results.extend(p) 23 | 24 | merged_img_ids = np.array(merged_img_ids) 25 | 26 | # keep only unique (and in sorted order) images 27 | # 去除重复的图片索引,多GPU训练时为了保证每个进程的训练图片数量相同,可能将一张图片分配给多个进程 28 | merged_img_ids, idx = np.unique(merged_img_ids, return_index=True) 29 | merged_eval_results = [merged_eval_results[i] for i 
in idx] 30 | 31 | return list(merged_img_ids), merged_eval_results 32 | 33 | 34 | class EvalCOCOMetric: 35 | def __init__(self, 36 | coco: COCO = None, 37 | iou_type: str = None, 38 | results_file_name: str = "predict_results.json", 39 | classes_mapping: dict = None): 40 | self.coco = copy.deepcopy(coco) 41 | self.img_ids = [] # 记录每个进程处理图片的ids 42 | self.results = [] 43 | self.aggregation_results = None 44 | self.classes_mapping = classes_mapping 45 | self.coco_evaluator = None 46 | assert iou_type in ["bbox", "segm", "keypoints"] 47 | self.iou_type = iou_type 48 | self.results_file_name = results_file_name 49 | 50 | def prepare_for_coco_detection(self, targets, outputs): 51 | """将预测的结果转换成COCOeval指定的格式,针对目标检测任务""" 52 | # 遍历每张图像的预测结果 53 | for target, output in zip(targets, outputs): 54 | if len(output) == 0: 55 | continue 56 | 57 | img_id = int(target["image_id"]) 58 | if img_id in self.img_ids: 59 | # 防止出现重复的数据 60 | continue 61 | self.img_ids.append(img_id) 62 | per_image_boxes = output["boxes"] 63 | # 对于coco_eval, 需要的每个box的数据格式为[x_min, y_min, w, h] 64 | # 而我们预测的box格式是[x_min, y_min, x_max, y_max],所以需要转下格式 65 | per_image_boxes[:, 2:] -= per_image_boxes[:, :2] 66 | per_image_classes = output["labels"].tolist() 67 | per_image_scores = output["scores"].tolist() 68 | 69 | res_list = [] 70 | # 遍历每个目标的信息 71 | for object_score, object_class, object_box in zip( 72 | per_image_scores, per_image_classes, per_image_boxes): 73 | object_score = float(object_score) 74 | class_idx = int(object_class) 75 | if self.classes_mapping is not None: 76 | class_idx = int(self.classes_mapping[str(class_idx)]) 77 | # We recommend rounding coordinates to the nearest tenth of a pixel 78 | # to reduce resulting JSON file size. 79 | object_box = [round(b, 2) for b in object_box.tolist()] 80 | 81 | res = {"image_id": img_id, 82 | "category_id": class_idx, 83 | "bbox": object_box, 84 | "score": round(object_score, 3)} 85 | res_list.append(res) 86 | self.results.append(res_list) 87 | 88 | def prepare_for_coco_segmentation(self, targets, outputs): 89 | """将预测的结果转换成COCOeval指定的格式,针对实例分割任务""" 90 | # 遍历每张图像的预测结果 91 | for target, output in zip(targets, outputs): 92 | if len(output) == 0: 93 | continue 94 | 95 | img_id = int(target["image_id"]) 96 | if img_id in self.img_ids: 97 | # 防止出现重复的数据 98 | continue 99 | 100 | self.img_ids.append(img_id) 101 | per_image_masks = output["masks"] 102 | per_image_classes = output["labels"].tolist() 103 | per_image_scores = output["scores"].tolist() 104 | 105 | masks = per_image_masks > 0.5 106 | 107 | res_list = [] 108 | # 遍历每个目标的信息 109 | for mask, label, score in zip(masks, per_image_classes, per_image_scores): 110 | rle = mask_util.encode(np.array(mask[0, :, :, np.newaxis], dtype=np.uint8, order="F"))[0] 111 | rle["counts"] = rle["counts"].decode("utf-8") 112 | 113 | class_idx = int(label) 114 | if self.classes_mapping is not None: 115 | class_idx = int(self.classes_mapping[str(class_idx)]) 116 | 117 | res = {"image_id": img_id, 118 | "category_id": class_idx, 119 | "segmentation": rle, 120 | "score": round(score, 3)} 121 | res_list.append(res) 122 | self.results.append(res_list) 123 | 124 | def update(self, targets, outputs): 125 | if self.iou_type == "bbox": 126 | self.prepare_for_coco_detection(targets, outputs) 127 | elif self.iou_type == "segm": 128 | self.prepare_for_coco_segmentation(targets, outputs) 129 | else: 130 | raise KeyError(f"not support iou_type: {self.iou_type}") 131 | 132 | def synchronize_results(self): 133 | # 同步所有进程中的数据 134 | eval_ids, eval_results = merge(self.img_ids, 
self.results) 135 | self.aggregation_results = {"img_ids": eval_ids, "results": eval_results} 136 | 137 | # 主进程上保存即可 138 | if is_main_process(): 139 | results = [] 140 | [results.extend(i) for i in eval_results] 141 | # write predict results into json file 142 | json_str = json.dumps(results, indent=4) 143 | with open(self.results_file_name, 'w') as json_file: 144 | json_file.write(json_str) 145 | 146 | def evaluate(self): 147 | # 只在主进程上评估即可 148 | if is_main_process(): 149 | # accumulate predictions from all images 150 | coco_true = self.coco 151 | coco_pre = coco_true.loadRes(self.results_file_name) 152 | 153 | self.coco_evaluator = COCOeval(cocoGt=coco_true, cocoDt=coco_pre, iouType=self.iou_type) 154 | 155 | self.coco_evaluator.evaluate() 156 | self.coco_evaluator.accumulate() 157 | print(f"IoU metric: {self.iou_type}") 158 | self.coco_evaluator.summarize() 159 | 160 | coco_info = self.coco_evaluator.stats.tolist() # numpy to list 161 | return coco_info 162 | else: 163 | return None 164 | -------------------------------------------------------------------------------- /train_utils/coco_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.utils.data 3 | from pycocotools import mask as coco_mask 4 | from pycocotools.coco import COCO 5 | 6 | 7 | def coco_remove_images_without_annotations(dataset, ids): 8 | """ 9 | 删除coco数据集中没有目标,或者目标面积非常小的数据 10 | refer to: 11 | https://github.com/pytorch/vision/blob/master/references/detection/coco_utils.py 12 | :param dataset: 13 | :param cat_list: 14 | :return: 15 | """ 16 | def _has_only_empty_bbox(anno): 17 | return all(any(o <= 1 for o in obj["bbox"][2:]) for obj in anno) 18 | 19 | def _has_valid_annotation(anno): 20 | # if it's empty, there is no annotation 21 | if len(anno) == 0: 22 | return False 23 | # if all boxes have close to zero area, there is no annotation 24 | if _has_only_empty_bbox(anno): 25 | return False 26 | 27 | return True 28 | 29 | valid_ids = [] 30 | for ds_idx, img_id in enumerate(ids): 31 | ann_ids = dataset.getAnnIds(imgIds=img_id, iscrowd=None) 32 | anno = dataset.loadAnns(ann_ids) 33 | 34 | if _has_valid_annotation(anno): 35 | valid_ids.append(img_id) 36 | 37 | return valid_ids 38 | 39 | 40 | def convert_coco_poly_mask(segmentations, height, width): 41 | masks = [] 42 | for polygons in segmentations: 43 | rles = coco_mask.frPyObjects(polygons, height, width) 44 | mask = coco_mask.decode(rles) 45 | if len(mask.shape) < 3: 46 | mask = mask[..., None] 47 | mask = torch.as_tensor(mask, dtype=torch.uint8) 48 | mask = mask.any(dim=2) 49 | masks.append(mask) 50 | if masks: 51 | masks = torch.stack(masks, dim=0) 52 | else: 53 | # 如果mask为空,则说明没有目标,直接返回数值为0的mask 54 | masks = torch.zeros((0, height, width), dtype=torch.uint8) 55 | return masks 56 | 57 | 58 | def convert_to_coco_api(self): 59 | coco_ds = COCO() 60 | # annotation IDs need to start at 1, not 0, see torchvision issue #1530 61 | ann_id = 1 62 | dataset = {"images": [], "categories": [], "annotations": []} 63 | categories = set() 64 | for img_idx in range(len(self)): 65 | targets, h, w = self.get_annotations(img_idx) 66 | img_id = targets["image_id"].item() 67 | img_dict = {"id": img_id, 68 | "height": h, 69 | "width": w} 70 | dataset["images"].append(img_dict) 71 | bboxes = targets["boxes"].clone() 72 | # convert (x_min, ymin, xmax, ymax) to (xmin, ymin, w, h) 73 | bboxes[:, 2:] -= bboxes[:, :2] 74 | bboxes = bboxes.tolist() 75 | labels = targets["labels"].tolist() 76 | areas = targets["area"].tolist() 77 
| iscrowd = targets["iscrowd"].tolist() 78 | if "masks" in targets: 79 | masks = targets["masks"] 80 | # make masks Fortran contiguous for coco_mask 81 | masks = masks.permute(0, 2, 1).contiguous().permute(0, 2, 1) 82 | num_objs = len(bboxes) 83 | for i in range(num_objs): 84 | ann = {"image_id": img_id, 85 | "bbox": bboxes[i], 86 | "category_id": labels[i], 87 | "area": areas[i], 88 | "iscrowd": iscrowd[i], 89 | "id": ann_id} 90 | categories.add(labels[i]) 91 | if "masks" in targets: 92 | ann["segmentation"] = coco_mask.encode(masks[i].numpy()) 93 | dataset["annotations"].append(ann) 94 | ann_id += 1 95 | dataset["categories"] = [{"id": i} for i in sorted(categories)] 96 | coco_ds.dataset = dataset 97 | coco_ds.createIndex() 98 | return coco_ds 99 | -------------------------------------------------------------------------------- /train_utils/distributed_utils.py: -------------------------------------------------------------------------------- 1 | from collections import defaultdict, deque 2 | import datetime 3 | import pickle 4 | import time 5 | import errno 6 | import os 7 | 8 | import torch 9 | import torch.distributed as dist 10 | 11 | 12 | class SmoothedValue(object): 13 | """Track a series of values and provide access to smoothed values over a 14 | window or the global series average. 15 | """ 16 | def __init__(self, window_size=20, fmt=None): 17 | if fmt is None: 18 | fmt = "{value:.4f} ({global_avg:.4f})" 19 | self.deque = deque(maxlen=window_size) # deque简单理解成加强版list 20 | self.total = 0.0 21 | self.count = 0 22 | self.fmt = fmt 23 | 24 | def update(self, value, n=1): 25 | self.deque.append(value) 26 | self.count += n 27 | self.total += value * n 28 | 29 | def synchronize_between_processes(self): 30 | """ 31 | Warning: does not synchronize the deque! 32 | """ 33 | if not is_dist_avail_and_initialized(): 34 | return 35 | t = torch.tensor([self.count, self.total], dtype=torch.float64, device="cuda") 36 | dist.barrier() 37 | dist.all_reduce(t) 38 | t = t.tolist() 39 | self.count = int(t[0]) 40 | self.total = t[1] 41 | 42 | @property 43 | def median(self): # @property 是装饰器,这里可简单理解为增加median属性(只读) 44 | d = torch.tensor(list(self.deque)) 45 | return d.median().item() 46 | 47 | @property 48 | def avg(self): 49 | d = torch.tensor(list(self.deque), dtype=torch.float32) 50 | return d.mean().item() 51 | 52 | @property 53 | def global_avg(self): 54 | return self.total / self.count 55 | 56 | @property 57 | def max(self): 58 | return max(self.deque) 59 | 60 | @property 61 | def value(self): 62 | return self.deque[-1] 63 | 64 | def __str__(self): 65 | return self.fmt.format( 66 | median=self.median, 67 | avg=self.avg, 68 | global_avg=self.global_avg, 69 | max=self.max, 70 | value=self.value) 71 | 72 | 73 | def all_gather(data): 74 | """ 75 | 收集各个进程中的数据 76 | Run all_gather on arbitrary picklable data (not necessarily tensors) 77 | Args: 78 | data: any picklable object 79 | Returns: 80 | list[data]: list of data gathered from each rank 81 | """ 82 | world_size = get_world_size() # 进程数 83 | if world_size == 1: 84 | return [data] 85 | 86 | data_list = [None] * world_size 87 | dist.all_gather_object(data_list, data) 88 | 89 | return data_list 90 | 91 | 92 | def reduce_dict(input_dict, average=True): 93 | """ 94 | Args: 95 | input_dict (dict): all the values will be reduced 96 | average (bool): whether to do average or sum 97 | Reduce the values in the dictionary from all processes so that all processes 98 | have the averaged results. 
Returns a dict with the same fields as 99 | input_dict, after reduction. 100 | """ 101 | world_size = get_world_size() 102 | if world_size < 2: # single-GPU case 103 | return input_dict 104 | with torch.no_grad(): # multi-GPU case 105 | names = [] 106 | values = [] 107 | # sort the keys so that they are consistent across processes 108 | for k in sorted(input_dict.keys()): 109 | names.append(k) 110 | values.append(input_dict[k]) 111 | values = torch.stack(values, dim=0) 112 | dist.all_reduce(values) 113 | if average: 114 | values /= world_size 115 | 116 | reduced_dict = {k: v for k, v in zip(names, values)} 117 | return reduced_dict 118 | 119 | 120 | class MetricLogger(object): 121 | def __init__(self, delimiter="\t"): 122 | self.meters = defaultdict(SmoothedValue) 123 | self.delimiter = delimiter 124 | 125 | def update(self, **kwargs): 126 | for k, v in kwargs.items(): 127 | if isinstance(v, torch.Tensor): 128 | v = v.item() 129 | assert isinstance(v, (float, int)) 130 | self.meters[k].update(v) 131 | 132 | def __getattr__(self, attr): 133 | if attr in self.meters: 134 | return self.meters[attr] 135 | if attr in self.__dict__: 136 | return self.__dict__[attr] 137 | raise AttributeError("'{}' object has no attribute '{}'".format( 138 | type(self).__name__, attr)) 139 | 140 | def __str__(self): 141 | loss_str = [] 142 | for name, meter in self.meters.items(): 143 | loss_str.append( 144 | "{}: {}".format(name, str(meter)) 145 | ) 146 | return self.delimiter.join(loss_str) 147 | 148 | def synchronize_between_processes(self): 149 | for meter in self.meters.values(): 150 | meter.synchronize_between_processes() 151 | 152 | def add_meter(self, name, meter): 153 | self.meters[name] = meter 154 | 155 | def log_every(self, iterable, print_freq, header=None): 156 | i = 0 157 | if not header: 158 | header = "" 159 | start_time = time.time() 160 | end = time.time() 161 | iter_time = SmoothedValue(fmt='{avg:.4f}') 162 | data_time = SmoothedValue(fmt='{avg:.4f}') 163 | space_fmt = ":" + str(len(str(len(iterable)))) + "d" 164 | if torch.cuda.is_available(): 165 | log_msg = self.delimiter.join([header, 166 | '[{0' + space_fmt + '}/{1}]', 167 | 'eta: {eta}', 168 | '{meters}', 169 | 'time: {time}', 170 | 'data: {data}', 171 | 'max mem: {memory:.0f}']) 172 | else: 173 | log_msg = self.delimiter.join([header, 174 | '[{0' + space_fmt + '}/{1}]', 175 | 'eta: {eta}', 176 | '{meters}', 177 | 'time: {time}', 178 | 'data: {data}']) 179 | MB = 1024.0 * 1024.0 180 | for obj in iterable: 181 | data_time.update(time.time() - end) 182 | yield obj 183 | iter_time.update(time.time() - end) 184 | if i % print_freq == 0 or i == len(iterable) - 1: 185 | eta_second = iter_time.global_avg * (len(iterable) - i) 186 | eta_string = str(datetime.timedelta(seconds=eta_second)) 187 | if torch.cuda.is_available(): 188 | print(log_msg.format(i, len(iterable), 189 | eta=eta_string, 190 | meters=str(self), 191 | time=str(iter_time), 192 | data=str(data_time), 193 | memory=torch.cuda.max_memory_allocated() / MB)) 194 | else: 195 | print(log_msg.format(i, len(iterable), 196 | eta=eta_string, 197 | meters=str(self), 198 | time=str(iter_time), 199 | data=str(data_time))) 200 | i += 1 201 | end = time.time() 202 | total_time = time.time() - start_time 203 | total_time_str = str(datetime.timedelta(seconds=int(total_time))) 204 | print('{} Total time: {} ({:.4f} s / it)'.format(header, 205 | total_time_str, 206 | total_time / len(iterable))) 207 | 208 | 209 | 210 | def warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor): 211 | 212 | def f(x): 213 |
"""根据step数返回一个学习率倍率因子""" 214 | if x >= warmup_iters: # 当迭代数大于给定的warmup_iters时,倍率因子为1 215 | return 1 216 | alpha = float(x) / warmup_iters 217 | # 迭代过程中倍率因子从warmup_factor -> 1 218 | return warmup_factor * (1 - alpha) + alpha 219 | 220 | return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=f) 221 | 222 | 223 | def mkdir(path): 224 | try: 225 | os.makedirs(path) 226 | except OSError as e: 227 | if e.errno != errno.EEXIST: 228 | raise 229 | 230 | 231 | def setup_for_distributed(is_master): 232 | """ 233 | This function disables when not in master process 234 | """ 235 | import builtins as __builtin__ 236 | builtin_print = __builtin__.print 237 | 238 | def print(*args, **kwargs): 239 | force = kwargs.pop('force', False) 240 | if is_master or force: 241 | builtin_print(*args, **kwargs) 242 | 243 | __builtin__.print = print 244 | 245 | 246 | def is_dist_avail_and_initialized(): 247 | """检查是否支持分布式环境""" 248 | if not dist.is_available(): 249 | return False 250 | if not dist.is_initialized(): 251 | return False 252 | return True 253 | 254 | 255 | def get_world_size(): 256 | if not is_dist_avail_and_initialized(): 257 | return 1 258 | return dist.get_world_size() 259 | 260 | 261 | def get_rank(): 262 | if not is_dist_avail_and_initialized(): 263 | return 0 264 | return dist.get_rank() 265 | 266 | 267 | def is_main_process(): 268 | return get_rank() == 0 269 | 270 | 271 | def save_on_master(*args, **kwargs): 272 | if is_main_process(): 273 | torch.save(*args, **kwargs) 274 | 275 | 276 | def init_distributed_mode(args): 277 | if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ: 278 | args.rank = int(os.environ["RANK"]) 279 | args.world_size = int(os.environ['WORLD_SIZE']) 280 | args.gpu = int(os.environ['LOCAL_RANK']) 281 | elif 'SLURM_PROCID' in os.environ: 282 | args.rank = int(os.environ['SLURM_PROCID']) 283 | args.gpu = args.rank % torch.cuda.device_count() 284 | else: 285 | print('Not using distributed mode') 286 | args.distributed = False 287 | return 288 | 289 | args.distributed = True 290 | 291 | torch.cuda.set_device(args.gpu) 292 | args.dist_backend = 'nccl' 293 | print('| distributed init (rank {}): {}'.format( 294 | args.rank, args.dist_url), flush=True) 295 | torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url, 296 | world_size=args.world_size, rank=args.rank) 297 | torch.distributed.barrier() 298 | setup_for_distributed(args.rank == 0) 299 | 300 | -------------------------------------------------------------------------------- /train_utils/group_by_aspect_ratio.py: -------------------------------------------------------------------------------- 1 | import bisect 2 | from collections import defaultdict 3 | import copy 4 | from itertools import repeat, chain 5 | import math 6 | import numpy as np 7 | 8 | import torch 9 | import torch.utils.data 10 | from torch.utils.data.sampler import BatchSampler, Sampler 11 | from torch.utils.model_zoo import tqdm 12 | import torchvision 13 | 14 | from PIL import Image 15 | 16 | 17 | def _repeat_to_at_least(iterable, n): 18 | repeat_times = math.ceil(n / len(iterable)) 19 | repeated = chain.from_iterable(repeat(iterable, repeat_times)) 20 | return list(repeated) 21 | 22 | 23 | class GroupedBatchSampler(BatchSampler): 24 | """ 25 | Wraps another sampler to yield a mini-batch of indices. 26 | It enforces that the batch only contain elements from the same group. 27 | It also tries to provide mini-batches which follows an ordering which is 28 | as close as possible to the ordering from the original sampler. 
29 | Arguments: 30 | sampler (Sampler): Base sampler. 31 | group_ids (list[int]): If the sampler produces indices in range [0, N), 32 | `group_ids` must be a list of `N` ints which contains the group id of each sample. 33 | The group ids must be a continuous set of integers starting from 34 | 0, i.e. they must be in the range [0, num_groups). 35 | batch_size (int): Size of mini-batch. 36 | """ 37 | def __init__(self, sampler, group_ids, batch_size): 38 | if not isinstance(sampler, Sampler): 39 | raise ValueError( 40 | "sampler should be an instance of " 41 | "torch.utils.data.Sampler, but got sampler={}".format(sampler) 42 | ) 43 | self.sampler = sampler 44 | self.group_ids = group_ids 45 | self.batch_size = batch_size 46 | 47 | def __iter__(self): 48 | buffer_per_group = defaultdict(list) 49 | samples_per_group = defaultdict(list) 50 | 51 | num_batches = 0 52 | for idx in self.sampler: 53 | group_id = self.group_ids[idx] 54 | buffer_per_group[group_id].append(idx) 55 | samples_per_group[group_id].append(idx) 56 | if len(buffer_per_group[group_id]) == self.batch_size: 57 | yield buffer_per_group[group_id] 58 | num_batches += 1 59 | del buffer_per_group[group_id] 60 | assert len(buffer_per_group[group_id]) < self.batch_size 61 | 62 | # now we have run out of elements that satisfy 63 | # the group criteria, let's return the remaining 64 | # elements so that the size of the sampler is 65 | # deterministic 66 | expected_num_batches = len(self) 67 | num_remaining = expected_num_batches - num_batches 68 | if num_remaining > 0: 69 | # for the remaining batches, take first the buffers with largest number 70 | # of elements 71 | for group_id, _ in sorted(buffer_per_group.items(), 72 | key=lambda x: len(x[1]), reverse=True): 73 | remaining = self.batch_size - len(buffer_per_group[group_id]) 74 | samples_from_group_id = _repeat_to_at_least(samples_per_group[group_id], remaining) 75 | buffer_per_group[group_id].extend(samples_from_group_id[:remaining]) 76 | assert len(buffer_per_group[group_id]) == self.batch_size 77 | yield buffer_per_group[group_id] 78 | num_remaining -= 1 79 | if num_remaining == 0: 80 | break 81 | assert num_remaining == 0 82 | 83 | def __len__(self): 84 | return len(self.sampler) // self.batch_size 85 | 86 | 87 | def _compute_aspect_ratios_slow(dataset, indices=None): 88 | print("Your dataset doesn't support the fast path for " 89 | "computing the aspect ratios, so will iterate over " 90 | "the full dataset and load every image instead. 
" 91 | "This might take some time...") 92 | if indices is None: 93 | indices = range(len(dataset)) 94 | 95 | class SubsetSampler(Sampler): 96 | def __init__(self, indices): 97 | self.indices = indices 98 | 99 | def __iter__(self): 100 | return iter(self.indices) 101 | 102 | def __len__(self): 103 | return len(self.indices) 104 | 105 | sampler = SubsetSampler(indices) 106 | data_loader = torch.utils.data.DataLoader( 107 | dataset, batch_size=1, sampler=sampler, 108 | num_workers=14, # you might want to increase it for faster processing 109 | collate_fn=lambda x: x[0]) 110 | aspect_ratios = [] 111 | with tqdm(total=len(dataset)) as pbar: 112 | for _i, (img, _) in enumerate(data_loader): 113 | pbar.update(1) 114 | height, width = img.shape[-2:] 115 | aspect_ratio = float(width) / float(height) 116 | aspect_ratios.append(aspect_ratio) 117 | return aspect_ratios 118 | 119 | 120 | def _compute_aspect_ratios_custom_dataset(dataset, indices=None): 121 | if indices is None: 122 | indices = range(len(dataset)) 123 | aspect_ratios = [] 124 | for i in indices: 125 | height, width = dataset.get_height_and_width(i) 126 | aspect_ratio = float(width) / float(height) 127 | aspect_ratios.append(aspect_ratio) 128 | return aspect_ratios 129 | 130 | 131 | def _compute_aspect_ratios_coco_dataset(dataset, indices=None): 132 | if indices is None: 133 | indices = range(len(dataset)) 134 | aspect_ratios = [] 135 | for i in indices: 136 | img_info = dataset.coco.imgs[dataset.ids[i]] 137 | aspect_ratio = float(img_info["width"]) / float(img_info["height"]) 138 | aspect_ratios.append(aspect_ratio) 139 | return aspect_ratios 140 | 141 | 142 | def _compute_aspect_ratios_voc_dataset(dataset, indices=None): 143 | if indices is None: 144 | indices = range(len(dataset)) 145 | aspect_ratios = [] 146 | for i in indices: 147 | # this doesn't load the data into memory, because PIL loads it lazily 148 | width, height = Image.open(dataset.images[i]).size 149 | aspect_ratio = float(width) / float(height) 150 | aspect_ratios.append(aspect_ratio) 151 | return aspect_ratios 152 | 153 | 154 | def _compute_aspect_ratios_subset_dataset(dataset, indices=None): 155 | if indices is None: 156 | indices = range(len(dataset)) 157 | 158 | ds_indices = [dataset.indices[i] for i in indices] 159 | return compute_aspect_ratios(dataset.dataset, ds_indices) 160 | 161 | 162 | def compute_aspect_ratios(dataset, indices=None): 163 | if hasattr(dataset, "get_height_and_width"): 164 | return _compute_aspect_ratios_custom_dataset(dataset, indices) 165 | 166 | if isinstance(dataset, torchvision.datasets.CocoDetection): 167 | return _compute_aspect_ratios_coco_dataset(dataset, indices) 168 | 169 | if isinstance(dataset, torchvision.datasets.VOCDetection): 170 | return _compute_aspect_ratios_voc_dataset(dataset, indices) 171 | 172 | if isinstance(dataset, torch.utils.data.Subset): 173 | return _compute_aspect_ratios_subset_dataset(dataset, indices) 174 | 175 | # slow path 176 | return _compute_aspect_ratios_slow(dataset, indices) 177 | 178 | 179 | def _quantize(x, bins): 180 | bins = copy.deepcopy(bins) 181 | bins = sorted(bins) 182 | # bisect_right:寻找y元素按顺序应该排在bins中哪个元素的右边,返回的是索引 183 | quantized = list(map(lambda y: bisect.bisect_right(bins, y), x)) 184 | return quantized 185 | 186 | 187 | def create_aspect_ratio_groups(dataset, k=0): 188 | # 计算所有数据集中的图片width/height比例 189 | aspect_ratios = compute_aspect_ratios(dataset) 190 | # 将[0.5, 2]区间划分成2*k+1等份 191 | bins = (2 ** np.linspace(-1, 1, 2 * k + 1)).tolist() if k > 0 else [1.0] 192 | 193 | # 
194 | groups = _quantize(aspect_ratios, bins) 195 | # count number of elements per group 196 | # i.e. the number of images falling into each aspect-ratio bin 197 | counts = np.unique(groups, return_counts=True)[1] 198 | fbins = [0] + bins + [np.inf] 199 | print("Using {} as bins for aspect ratio quantization".format(fbins)) 200 | print("Count of instances per bin: {}".format(counts)) 201 | return groups 202 | -------------------------------------------------------------------------------- /train_utils/train_eval_utils.py: -------------------------------------------------------------------------------- 1 | import math 2 | import sys 3 | import time 4 | 5 | import torch 6 | 7 | import train_utils.distributed_utils as utils 8 | from .coco_eval import EvalCOCOMetric 9 | 10 | 11 | def train_one_epoch(model, optimizer, data_loader, device, epoch, attr_vectors, 12 | print_freq=50, warmup=False, scaler=None): 13 | model.train() 14 | metric_logger = utils.MetricLogger(delimiter="  ") 15 | metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value:.6f}')) 16 | header = 'Epoch: [{}]'.format(epoch) 17 | 18 | lr_scheduler = None 19 | if epoch == 0 and warmup is True: # use warmup (a gentle learning-rate ramp-up) during the first epoch (epoch=0) 20 | warmup_factor = 1.0 / 1000 21 | warmup_iters = min(1000, len(data_loader) - 1) 22 | 23 | lr_scheduler = utils.warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor) 24 | 25 | mloss = torch.zeros(1).to(device) # mean losses 26 | for i, [images, targets] in enumerate(metric_logger.log_every(data_loader, print_freq, header)): 27 | images = list(image.to(device) for image in images) 28 | targets = [{k: v.to(device) for k, v in t.items()} for t in targets] 29 | 30 | # mixed-precision training context manager; it is a no-op when running on the CPU 31 | with torch.cuda.amp.autocast(enabled=scaler is not None): 32 | loss_dict = model(images, attr_vectors, targets) 33 | 34 | losses = sum(loss for loss in loss_dict.values()) 35 | 36 | # reduce losses over all GPUs for logging purpose 37 | loss_dict_reduced = utils.reduce_dict(loss_dict) 38 | losses_reduced = sum(loss for loss in loss_dict_reduced.values()) 39 | 40 | loss_value = losses_reduced.item() 41 | # record the training loss 42 | mloss = (mloss * i + loss_value) / (i + 1) # update mean losses 43 | 44 | if not math.isfinite(loss_value): # stop training if the loss becomes infinite or NaN 45 | print("Loss is {}, stopping training".format(loss_value)) 46 | print(loss_dict_reduced) 47 | sys.exit(1) 48 | 49 | optimizer.zero_grad() 50 | if scaler is not None: 51 | scaler.scale(losses).backward() 52 | scaler.step(optimizer) 53 | scaler.update() 54 | else: 55 | losses.backward() 56 | optimizer.step() 57 | 58 | if lr_scheduler is not None: # the warmup schedule is only used during the first epoch 59 | lr_scheduler.step() 60 | 61 | metric_logger.update(loss=losses_reduced, **loss_dict_reduced) 62 | now_lr = optimizer.param_groups[0]["lr"] 63 | metric_logger.update(lr=now_lr) 64 | 65 | return mloss, now_lr 66 | 67 | 68 | @torch.no_grad() 69 | def evaluate(model, data_loader, attr_vectors, device): 70 | cpu_device = torch.device("cpu") 71 | model.eval() 72 | metric_logger = utils.MetricLogger(delimiter="  ") 73 | header = "Test: " 74 | 75 | det_metric = EvalCOCOMetric(data_loader.dataset.coco, iou_type="bbox", results_file_name="det_results.json") 76 | seg_metric = EvalCOCOMetric(data_loader.dataset.coco, iou_type="segm", results_file_name="seg_results.json") 77 | for image, targets in metric_logger.log_every(data_loader, 100, header): 78 | image = list(img.to(device) for img in image) 79 | 80 | # skip GPU-specific calls when running on the CPU 81 | if device != torch.device("cpu"): 82 | torch.cuda.synchronize(device) 83 | 84 | model_time = time.time()
85 | outputs = model(image, attr_vectors) 86 | 87 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs] 88 | model_time = time.time() - model_time 89 | 90 | det_metric.update(targets, outputs) 91 | seg_metric.update(targets, outputs) 92 | metric_logger.update(model_time=model_time) 93 | 94 | # gather the stats from all processes 95 | metric_logger.synchronize_between_processes() 96 | print("Averaged stats:", metric_logger) 97 | 98 | # synchronize the prediction results across all processes 99 | det_metric.synchronize_results() 100 | seg_metric.synchronize_results() 101 | 102 | if utils.is_main_process(): 103 | coco_info = det_metric.evaluate() 104 | seg_info = seg_metric.evaluate() 105 | else: 106 | coco_info = None 107 | seg_info = None 108 | 109 | return coco_info, seg_info 110 | -------------------------------------------------------------------------------- /transforms.py: -------------------------------------------------------------------------------- 1 | import random 2 | from torchvision.transforms import functional as F 3 | 4 | 5 | class Compose(object): 6 | """Compose several transform functions""" 7 | def __init__(self, transforms): 8 | self.transforms = transforms 9 | 10 | def __call__(self, image, target): 11 | for t in self.transforms: 12 | image, target = t(image, target) 13 | return image, target 14 | 15 | 16 | class ToTensor(object): 17 | """Convert a PIL image to a Tensor""" 18 | def __call__(self, image, target): 19 | image = F.to_tensor(image) 20 | return image, target 21 | 22 | 23 | class RandomHorizontalFlip(object): 24 | """Randomly flip the image and its bboxes horizontally""" 25 | def __init__(self, prob=0.5): 26 | self.prob = prob 27 | 28 | def __call__(self, image, target): 29 | if random.random() < self.prob: 30 | height, width = image.shape[-2:] 31 | image = image.flip(-1) # flip the image horizontally 32 | bbox = target["boxes"] 33 | # bbox: xmin, ymin, xmax, ymax 34 | bbox[:, [0, 2]] = width - bbox[:, [2, 0]] # flip the corresponding bbox coordinates 35 | target["boxes"] = bbox 36 | if "masks" in target: 37 | target["masks"] = target["masks"].flip(-1) 38 | return image, target 39 | -------------------------------------------------------------------------------- /validation.py: -------------------------------------------------------------------------------- 1 | """ 2 | This script loads trained model weights and computes the COCO metrics on the validation/test set, 3 | as well as the per-category mAP (IoU=0.5). 4 | """ 5 | 6 | import os 7 | import json 8 | 9 | import torch 10 | from tqdm import tqdm 11 | import numpy as np 12 | 13 | import transforms 14 | from backbone import resnet50_fpn_backbone 15 | from network_files import MaskRCNN 16 | from my_dataset_coco import CocoDetection 17 | from my_dataset_cityscraps import CityscrapesDetection 18 | from train_utils import EvalCOCOMetric 19 | 20 | 21 | def summarize(self, catId=None): 22 | """ 23 | Compute and display summary metrics for evaluation results.
24 | Note this function can *only* be applied to the default parameter setting 25 | """ 26 | 27 | def _summarize(ap=1, iouThr=None, areaRng='all', maxDets=100): 28 | p = self.params 29 | iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}' 30 | titleStr = 'Average Precision' if ap == 1 else 'Average Recall' 31 | typeStr = '(AP)' if ap == 1 else '(AR)' 32 | iouStr = '{:0.2f}:{:0.2f}'.format(p.iouThrs[0], p.iouThrs[-1]) \ 33 | if iouThr is None else '{:0.2f}'.format(iouThr) 34 | 35 | aind = [i for i, aRng in enumerate(p.areaRngLbl) if aRng == areaRng] 36 | mind = [i for i, mDet in enumerate(p.maxDets) if mDet == maxDets] 37 | 38 | if ap == 1: 39 | # dimension of precision: [TxRxKxAxM] 40 | s = self.eval['precision'] 41 | # IoU 42 | if iouThr is not None: 43 | t = np.where(iouThr == p.iouThrs)[0] 44 | s = s[t] 45 | 46 | if isinstance(catId, int): 47 | s = s[:, :, catId, aind, mind] 48 | else: 49 | s = s[:, :, :, aind, mind] 50 | 51 | else: 52 | # dimension of recall: [TxKxAxM] 53 | s = self.eval['recall'] 54 | if iouThr is not None: 55 | t = np.where(iouThr == p.iouThrs)[0] 56 | s = s[t] 57 | 58 | if isinstance(catId, int): 59 | s = s[:, catId, aind, mind] 60 | else: 61 | s = s[:, :, aind, mind] 62 | 63 | if len(s[s > -1]) == 0: 64 | mean_s = -1 65 | else: 66 | mean_s = np.mean(s[s > -1]) 67 | 68 | print_string = iStr.format(titleStr, typeStr, iouStr, areaRng, maxDets, mean_s) 69 | return mean_s, print_string 70 | 71 | if not self.eval: 72 | raise Exception('Please run accumulate() first') 73 | 74 | stats, print_list = [0] * 12, [""] * 12 75 | stats[0], print_list[0] = _summarize(1) 76 | stats[1], print_list[1] = _summarize(1, iouThr=.5, maxDets=self.params.maxDets[2]) 77 | stats[2], print_list[2] = _summarize(1, iouThr=.75, maxDets=self.params.maxDets[2]) 78 | stats[3], print_list[3] = _summarize(1, areaRng='small', maxDets=self.params.maxDets[2]) 79 | stats[4], print_list[4] = _summarize(1, areaRng='medium', maxDets=self.params.maxDets[2]) 80 | stats[5], print_list[5] = _summarize(1, areaRng='large', maxDets=self.params.maxDets[2]) 81 | stats[6], print_list[6] = _summarize(0, maxDets=self.params.maxDets[0]) 82 | stats[7], print_list[7] = _summarize(0, maxDets=self.params.maxDets[1]) 83 | stats[8], print_list[8] = _summarize(0, maxDets=self.params.maxDets[2]) 84 | stats[9], print_list[9] = _summarize(0, areaRng='small', maxDets=self.params.maxDets[2]) 85 | stats[10], print_list[10] = _summarize(0, areaRng='medium', maxDets=self.params.maxDets[2]) 86 | stats[11], print_list[11] = _summarize(0, areaRng='large', maxDets=self.params.maxDets[2]) 87 | 88 | print_info = "\n".join(print_list) 89 | 90 | return stats, print_info 91 | 92 | 93 | def save_info(coco_evaluator, 94 | category_index: dict, 95 | save_name: str = "record_mAP.txt"): 96 | iou_type = coco_evaluator.params.iouType 97 | print(f"IoU metric: {iou_type}") 98 | # calculate COCO info for all classes 99 | coco_stats, print_coco = summarize(coco_evaluator) 100 | 101 | # calculate voc info for every class (IoU=0.5) 102 | classes = [v for v in category_index.values() if v != "N/A"] 103 | voc_map_info_list = [] 104 | for i in range(len(classes)): 105 | stats, _ = summarize(coco_evaluator, catId=i) 106 | voc_map_info_list.append(" {:15}: {}".format(classes[i], stats[1])) 107 | 108 | print_voc = "\n".join(voc_map_info_list) 109 | print(print_voc) 110 | 111 | # save the validation results to a txt file 112 | with open(save_name, "w") as f: 113 | record_lines = ["COCO results:", 114 | print_coco, 115 | "", 116 | "mAP(IoU=0.5) for each category:",
117 | print_voc] 118 | f.write("\n".join(record_lines)) 119 | 120 | 121 | def main(parser_data): 122 | device = torch.device(parser_data.device if torch.cuda.is_available() else "cpu") 123 | print("Using {} device for validation.".format(device.type)) 124 | 125 | data_transform = { 126 | "val": transforms.Compose([transforms.ToTensor()]) 127 | } 128 | 129 | # read class_indict 130 | label_json_path = parser_data.label_json_path 131 | assert os.path.exists(label_json_path), "json file {} does not exist.".format(label_json_path) 132 | with open(label_json_path, 'r') as f: 133 | category_index = json.load(f) 134 | 135 | data_root = parser_data.data_path 136 | 137 | # note: a custom collate_fn is required because each sample contains both the image and its targets, so the default batching cannot be used 138 | batch_size = parser_data.batch_size 139 | nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers 140 | print('Using %g dataloader workers' % nw) 141 | 142 | # load validation data set 143 | val_dataset = CityscrapesDetection(data_root, "val", data_transform["val"]) 144 | # VOCdevkit -> VOC2012 -> ImageSets -> Main -> val.txt 145 | # val_dataset = VOCInstances(data_root, year="2012", txt_name="val.txt", transforms=data_transform["val"]) 146 | val_dataset_loader = torch.utils.data.DataLoader(val_dataset, 147 | batch_size=batch_size, 148 | shuffle=False, 149 | pin_memory=True, 150 | num_workers=nw, 151 | collate_fn=val_dataset.collate_fn) 152 | 153 | # create model 154 | backbone = resnet50_fpn_backbone() 155 | model = MaskRCNN(backbone, num_classes=parser_data.num_classes + 1) 156 | 157 | # load your own trained model weights 158 | weights_path = parser_data.weights_path 159 | assert os.path.exists(weights_path), "not found {} file.".format(weights_path) 160 | model.load_state_dict(torch.load(weights_path, map_location='cpu')['model']) 161 | # print(model) 162 | 163 | model.to(device) 164 | 165 | # evaluate on the val dataset 166 | cpu_device = torch.device("cpu") 167 | 168 | det_metric = EvalCOCOMetric(val_dataset.coco, "bbox", "det_results.json") 169 | seg_metric = EvalCOCOMetric(val_dataset.coco, "segm", "seg_results.json") 170 | model.eval() 171 | with torch.no_grad(): 172 | for image, targets in tqdm(val_dataset_loader, desc="validation..."): 173 | # move the images to the specified device 174 | image = list(img.to(device) for img in image) 175 | 176 | # inference 177 | outputs = model(image) 178 | 179 | outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs] 180 | det_metric.update(targets, outputs) 181 | seg_metric.update(targets, outputs) 182 | 183 | det_metric.synchronize_results() 184 | seg_metric.synchronize_results() 185 | det_metric.evaluate() 186 | seg_metric.evaluate() 187 | 188 | save_info(det_metric.coco_evaluator, category_index, "det_record_mAP.txt") 189 | save_info(seg_metric.coco_evaluator, category_index, "seg_record_mAP.txt") 190 | 191 | 192 | if __name__ == "__main__": 193 | import argparse 194 | 195 | parser = argparse.ArgumentParser( 196 | description=__doc__) 197 | 198 | # device to use 199 | parser.add_argument('--device', default='cuda', help='device') 200 | 201 | # number of detection classes (excluding background) 202 | parser.add_argument('--num-classes', type=int, default=4, help='number of classes') 203 | 204 | # dataset root directory 205 | parser.add_argument('--data-path', default='/data/wuwentao/data/cityscapes/leftImg8bit/', help='dataset root') 206 | 207 | # trained weights file 208 | parser.add_argument('--weights-path', default='./save_weights/model_25.pth', type=str, help='training weights') 209 | 210 | # batch size (set to 1, don't change) 211 | parser.add_argument('--batch-size', default=1,
type=int, metavar='N', 212 | help='batch size during validation.') 213 | # mapping between category indices and category names 214 | parser.add_argument('--label-json-path', type=str, default="coco91_indices.json") 215 | 216 | args = parser.parse_args() 217 | 218 | main(args) 219 | --------------------------------------------------------------------------------
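For readers who want to reuse the training utilities listed above outside of `train.py`, the following is a minimal, self-contained sketch (not part of the repository) of how `create_aspect_ratio_groups`, `GroupedBatchSampler`, and `warmup_lr_scheduler` are typically wired together. The `ToyDataset`, the collate lambda, and the dummy optimizer are illustrative placeholders only, and the imports assume the script is run from the repository root.

```python
import torch
from train_utils.group_by_aspect_ratio import GroupedBatchSampler, create_aspect_ratio_groups
from train_utils.distributed_utils import warmup_lr_scheduler


class ToyDataset(torch.utils.data.Dataset):
    """Dummy dataset with images of two aspect ratios, for illustration only."""
    def __init__(self, sizes):
        self.sizes = sizes  # list of (height, width) tuples

    def __len__(self):
        return len(self.sizes)

    def get_height_and_width(self, idx):
        # fast path used by create_aspect_ratio_groups (no image decoding needed)
        return self.sizes[idx]

    def __getitem__(self, idx):
        h, w = self.sizes[idx]
        return torch.rand(3, h, w), {"boxes": torch.zeros((0, 4))}


dataset = ToyDataset([(600, 800)] * 6 + [(800, 600)] * 6)
sampler = torch.utils.data.RandomSampler(dataset)

# bucket images by width/height ratio so that every batch shares one aspect ratio
group_ids = create_aspect_ratio_groups(dataset, k=3)
batch_sampler = GroupedBatchSampler(sampler, group_ids, batch_size=2)
loader = torch.utils.data.DataLoader(dataset,
                                     batch_sampler=batch_sampler,
                                     collate_fn=lambda batch: tuple(zip(*batch)))

# during the first epoch the LR factor ramps linearly from warmup_factor up to 1
optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.005)
lr_scheduler = warmup_lr_scheduler(optimizer, warmup_iters=len(loader) - 1,
                                   warmup_factor=1.0 / 1000)

for images, targets in loader:
    # ... forward / backward pass would go here ...
    optimizer.step()
    lr_scheduler.step()
```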