├── CIoU.png ├── LICENSE ├── README.md ├── config ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-35.pyc │ ├── __init__.cpython-36.pyc │ ├── config.cpython-35.pyc │ └── config.cpython-36.pyc └── config.py ├── data ├── CRACK.py ├── VOC.py ├── __init__.py ├── __pycache__ │ ├── CRACK.cpython-36.pyc │ ├── VOC.cpython-36.pyc │ └── __init__.cpython-36.pyc └── utils │ ├── __init__.py │ ├── __pycache__ │ ├── __init__.cpython-35.pyc │ ├── __init__.cpython-36.pyc │ ├── augmentations.cpython-35.pyc │ └── augmentations.cpython-36.pyc │ └── augmentations.py ├── model ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-35.pyc │ ├── __init__.cpython-36.pyc │ ├── build_ssd.cpython-35.pyc │ └── build_ssd.cpython-36.pyc ├── backbone │ ├── __init__.py │ ├── __pycache__ │ │ ├── __init__.cpython-35.pyc │ │ ├── __init__.cpython-36.pyc │ │ ├── build_backbone.cpython-35.pyc │ │ └── build_backbone.cpython-36.pyc │ └── build_backbone.py ├── build_ssd.py ├── head │ ├── __init__.py │ ├── __pycache__ │ │ ├── __init__.cpython-35.pyc │ │ ├── __init__.cpython-36.pyc │ │ ├── build_head.cpython-35.pyc │ │ └── build_head.cpython-36.pyc │ └── build_head.py ├── neck │ ├── __init__.py │ ├── __pycache__ │ │ ├── __init__.cpython-35.pyc │ │ ├── __init__.cpython-36.pyc │ │ ├── build_neck.cpython-35.pyc │ │ ├── build_neck.cpython-36.pyc │ │ ├── ssd_neck.cpython-35.pyc │ │ └── ssd_neck.cpython-36.pyc │ ├── build_neck.py │ └── ssd_neck.py └── utils │ ├── __init__.py │ ├── __pycache__ │ ├── __init__.cpython-35.pyc │ ├── __init__.cpython-36.pyc │ ├── conv_module.cpython-35.pyc │ ├── conv_module.cpython-36.pyc │ ├── norm.cpython-35.pyc │ ├── norm.cpython-36.pyc │ ├── weight_init.cpython-35.pyc │ └── weight_init.cpython-36.pyc │ ├── conv_module.py │ ├── norm.py │ └── weight_init.py ├── tools ├── ap.py ├── eval.py ├── test.py └── train.py ├── utils ├── __init__.py ├── __pycache__ │ ├── __init__.cpython-35.pyc │ ├── __init__.cpython-35.sublime-workspace │ └── __init__.cpython-36.pyc ├── box │ ├── __init__.py │ ├── __pycache__ │ │ ├── __init__.cpython-35.pyc │ │ ├── __init__.cpython-36.pyc │ │ ├── box_utils.cpython-35.pyc │ │ ├── box_utils.cpython-36.pyc │ │ ├── prior_box.cpython-35.pyc │ │ └── prior_box.cpython-36.pyc │ ├── box_utils.py │ └── prior_box.py ├── detection │ ├── __init__.py │ ├── __pycache__ │ │ ├── __init__.cpython-35.pyc │ │ ├── __init__.cpython-36.pyc │ │ ├── detection.cpython-35.pyc │ │ └── detection.cpython-36.pyc │ └── detection.py └── loss │ ├── __init__.py │ ├── __pycache__ │ ├── __init__.cpython-35.pyc │ ├── __init__.cpython-36.pyc │ ├── multibox_loss.cpython-35.pyc │ └── multibox_loss.cpython-36.pyc │ └── multibox_loss.py └── work_dir └── DIoU-NMS.txt /CIoU.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/CIoU.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | ## Complete-IoU Loss and Cluster-NMS for improving Object Detection and Instance Segmentation. 
4 |
5 | This is the code for our papers:
6 | - [Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression](https://arxiv.org/abs/1911.08287)
7 | - [Enhancing Geometric Factors into Model Learning and Inference for Object Detection and Instance Segmentation](https://arxiv.org/abs/2005.03572)
8 |
9 | ```
10 | @Inproceedings{zheng2020diou,
11 |   author    = {Zheng, Zhaohui and Wang, Ping and Liu, Wei and Li, Jinze and Ye, Rongguang and Ren, Dongwei},
12 |   title     = {Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression},
13 |   booktitle = {The AAAI Conference on Artificial Intelligence (AAAI)},
14 |   year      = {2020},
15 | }
16 |
17 | @Article{zheng2021ciou,
18 |   author  = {Zheng, Zhaohui and Wang, Ping and Ren, Dongwei and Liu, Wei and Ye, Rongguang and Hu, Qinghua and Zuo, Wangmeng},
19 |   title   = {Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation},
20 |   journal = {IEEE Transactions on Cybernetics},
21 |   year    = {2021},
22 | }
23 | ```
24 |
25 | ## SSD_FPN_DIoU,CIoU in PyTorch
26 | The code references [SSD: Single Shot MultiBox Object Detector, in PyTorch](https://github.com/amdegroot/ssd.pytorch), [mmdet](https://github.com/open-mmlab/mmdetection) and [**JavierHuang**](https://github.com/JaryHuang). Currently, experiments are carried out on the VOC dataset; if you want to train on your own dataset, more details can be found in the links above.
27 |
28 | ### Losses
29 |
30 | Losses can be chosen with the `losstype` option in the `config/config.py` file. The valid options are currently: `[Iou|Giou|Diou|Ciou|SmoothL1]`.
31 |
32 | ```
33 | VOC:
34 |   'losstype': 'Ciou'
35 | ```
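For reference, here is a minimal, self-contained sketch of what the `Iou`, `Diou` and `Ciou` options compute, written directly from the formulas in the two papers above (`Giou` and `SmoothL1` are omitted for brevity). It is an illustration only, not the repository's implementation, which lives under `utils/loss/` and `utils/box/`:

```python
import math
import torch

def iou_family_loss(pred, target, kind='Ciou', eps=1e-7):
    # pred, target: (n, 4) tensors of boxes in (x1, y1, x2, y2) form
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    if kind == 'Iou':
        return 1 - iou
    # DIoU: penalize the normalized distance between box centres, where c^2
    # is the squared diagonal of the smallest enclosing box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw.pow(2) + ch.pow(2) + eps
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]).pow(2) +
            (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]).pow(2)) / 4
    if kind == 'Diou':
        return 1 - iou + rho2 / c2
    # CIoU: additionally penalize aspect-ratio inconsistency (term v, weight alpha)
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) - torch.atan(wp / (hp + eps))).pow(2)
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```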
36 |
37 | ## Folder Structure
38 | The folder structure is as follows:
39 | - config/
40 |   - config.py
41 |   - __init__.py
42 | - data/
43 |   - __init__.py
44 |   - VOC.py
45 |   - VOCdevkit/
46 | - model/
47 |   - build_ssd.py
48 |   - __init__.py
49 |   - backbone/
50 |   - neck/
51 |   - head/
52 |   - utils/
53 | - utils/
54 |   - box/
55 |   - detection/
56 |   - loss/
57 |   - __init__.py
58 | - tools/
59 |   - train.py
60 |   - eval.py
61 |   - test.py
62 | - work_dir/
63 |
64 |
65 | ## Environment
66 | - pytorch 0.4.1
67 | - python3+
68 | - visdom
69 |   - for real-time loss visualization during training!
70 | ```Shell
71 | pip install visdom
72 | ```
73 | - Start the server (probably in a screen or tmux)
74 | ```Shell
75 | python -m visdom.server
76 | ```
77 | * Then (during training) navigate to http://localhost:8097/ (see the Train section below for training details).
78 |
79 |
80 | ## Datasets
81 | - PASCAL VOC: download the VOC2007 and VOC2012 datasets, then put `VOCdevkit` in the `data` directory
82 |
83 |
84 | ## Training
85 |
86 | ### Training VOC
87 | - The pretrained backbone comes from [pretrained-models.pytorch](https://github.com/Cadene/pretrained-models.pytorch); you can download it there.
88 |
89 | - In the DIoU-SSD-pytorch folder:
90 | ```Shell
91 | python tools/train.py
92 | ```
93 |
94 | - Note:
95 |   * Training runs on an NVIDIA GPU by default.
96 |   * You can set the training parameters in `tools/train.py` (see the file for options).
97 |   * In the config you can set `work_dir` to choose where training weights are saved (see `config/config.py`).
98 |
99 | ## Evaluation
100 | - To evaluate a trained network:
101 |
102 | ```Shell
103 | python tools/ap.py --trained_model {your_weight_address}
104 | ```
105 |
106 | For example (the output ends with AP50, AP75 and AP of our CIoU loss):
107 | ```
108 | Results:
109 | 0.033
110 | 0.015
111 | 0.009
112 | 0.011
113 | 0.008
114 | 0.083
115 | 0.044
116 | 0.042
117 | 0.004
118 | 0.014
119 | 0.026
120 | 0.034
121 | 0.010
122 | 0.006
123 | 0.009
124 | 0.006
125 | 0.009
126 | 0.013
127 | 0.106
128 | 0.011
129 | 0.025
130 | ~~~~~~~~
131 |
132 | --------------------------------------------------------------
133 | Results computed with the **unofficial** Python eval code.
134 | Results should be very close to the official MATLAB eval code.
135 | --------------------------------------------------------------
136 | 0.7884902583981603 0.5615516772893671 0.5143832356646468
137 | ```
138 |
139 | ## Test
140 | - To test a trained network:
141 |
142 | ```Shell
143 | python tools/test.py --trained_model {your_weight_address}
144 | ```
145 | If you want to visualize the detected boxes, add the flag `--visbox True` (default: `False`).
146 |
147 | ## Performance
148 |
149 | #### VOC2007 Test mAP
150 | - Backbone is ResNet50-FPN:
151 |
152 | | Test | AP | AP75 |
153 | |:-:|:-:|:-:|
154 | | IoU | 51.0 | 54.7 |
155 | | GIoU | 51.1 | 55.4 |
156 | | DIoU | 51.3 | 55.7 |
157 | | CIoU | 51.5 | 56.4 |
158 | | CIoU 16 | 53.3 | 58.2 |
159 |
160 | ##### "16" means the bbox regression weight is set to 16.
161 | ## Cluster-NMS
162 |
163 | See the `Detect` function in [utils/detection/detection.py](utils/detection/detection.py) for our Cluster-NMS implementation.
164 |
165 | Currently, NMS supports `cluster_nms`, `cluster_diounms`, `cluster_weighted_nms` and `cluster_weighted_diounms` (see `'nms_kind'` in [config/config.py](config/config.py)).
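In a nutshell, Cluster-NMS repeatedly applies Fast NMS on a score-sorted, upper-triangular IoU matrix until the set of kept boxes stops changing, which reproduces the result of original NMS without its sequential inner loop. Below is a minimal sketch of the plain `cluster_nms` iteration (illustrative only; the actual implementation, including the DIoU and weighted variants, is in `Detect`):

```python
import torch

def box_iou(a, b, eps=1e-7):
    # pairwise IoU between corner-format box sets: (n, 4) x (m, 4) -> (n, m)
    tl = torch.max(a[:, None, :2], b[None, :, :2])
    br = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (br - tl).clamp(min=0).prod(dim=2)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + eps)

def cluster_nms(boxes, scores, iou_threshold=0.45, top_k=200):
    scores, order = scores.sort(descending=True)  # row i can only be suppressed by rows j < i
    order = order[:top_k]
    iou = box_iou(boxes[order], boxes[order]).triu_(diagonal=1)
    C = iou
    for _ in range(order.numel()):                # converges in at most "number of clusters" steps
        prev = C
        keep = (C.max(dim=0).values < iou_threshold).float()
        C = iou * keep.unsqueeze(1)               # suppressed boxes can no longer suppress others
        if torch.equal(C, prev):
            break
    # for the DIoU variants, subtract the normalized centre-distance
    # penalty from `iou` before thresholding
    return order[C.max(dim=0).values < iou_threshold]
```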
166 |
167 | #### Hardware
168 | - 1 RTX 2080 Ti
169 | - Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
170 |
171 | | Backbone | Loss | Regression weight | NMS | FPS | time (ms) | box AP | box AP75 |
172 | |:-------------:|:-------:|:-------:|:------------------------------------:|:----:|:----:|:----:|:----:|
173 | | Resnet50-FPN | CIoU | 5 | Fast NMS |**28.8**|**34.7**| 50.7 | 56.2 |
174 | | Resnet50-FPN | CIoU | 5 | Original NMS | 17.8 | 56.1 | 51.5 | 56.4 |
175 | | Resnet50-FPN | CIoU | 5 | DIoU-NMS | 11.4 | 87.6 | 51.9 | 56.6 |
176 | | Resnet50-FPN | CIoU | 5 | Cluster-NMS | 28.0 | 35.7 | 51.5 | 56.4 |
177 | | Resnet50-FPN | CIoU | 5 | Cluster-DIoU-NMS | 27.7 | 36.1 | 51.9 | 56.6 |
178 | | Resnet50-FPN | CIoU | 5 | Weighted Cluster-NMS | 26.8 | 37.3 | 51.9 | 56.3 |
179 | | Resnet50-FPN | CIoU | 5 | Weighted + Cluster-DIoU-NMS | 26.5 | 37.8 |**52.4**|**57.0**|
180 |
181 | #### Hardware
182 | - 1 RTX 2080
183 | - Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
184 |
185 | | Backbone | Loss | Regression weight | NMS | FPS | time (ms) | box AP | box AP75 |
186 | |:-------------:|:-------:|:-------:|:------------------------------------:|:----:|:----:|:----:|:----:|
187 | | Resnet50-FPN | CIoU | 16 | Original NMS | 19.7 | 50.9 | 53.3 | 58.2 |
188 | | Resnet50-FPN | CIoU | 16 | Cluster-NMS | 28.0 | 35.7 | 53.4 | 58.2 |
189 | | Resnet50-FPN | CIoU | 16 | Cluster-DIoU-NMS | 26.5 | 37.7 | 53.7 | 58.6 |
190 | | Resnet50-FPN | CIoU | 16 | Weighted Cluster-NMS | 26.9 | 37.2 | 53.8 | 58.7 |
191 | | Resnet50-FPN | CIoU | 16 | Weighted + Cluster-DIoU-NMS | 26.3 | 38.0 |**54.1**|**59.0**|
192 | #### Note:
193 | - Here the box coordinate weighted average is only performed when `IoU > 0.8`. We found that `IoU > NMS_thresh` does not work well for SSD, and `IoU > 0.9` gives almost the same results as plain `Cluster-NMS`. (Refer to [CAD](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8265304) for the details of Weighted-NMS.)
194 |
195 | - We further incorporate DIoU into Weighted Cluster-NMS for SSD, which yields higher AP.
196 |
197 | - Note that Torchvision NMS is the fastest, owing to its CUDA implementation and engineering accelerations (such as computing only the upper-triangular IoU matrix). However, our Cluster-NMS requires fewer iterations and can also be further accelerated by adopting such engineering tricks.
198 |
199 | - Currently, Torchvision NMS uses IoU as its criterion, not DIoU. Directly replacing IoU with DIoU in original NMS costs much more time due to its sequential operation, whereas Cluster-DIoU-NMS significantly speeds up DIoU-NMS and obtains exactly the same result.
200 |
201 | - Torchvision NMS requires Torchvision>=0.3, while our Cluster-NMS can be applied to any project that uses a lower version of Torchvision, or to other deep learning frameworks, as long as matrix operations are available. **No extra imports, no compilation, fewer iterations, fully GPU-accelerated, and better performance**.
202 | ## Pretrained weights
203 |
204 | Here are the trained models using the configurations in this repository.
205 |
206 | - [IoU bbox regression weight 5](https://pan.baidu.com/s/1eNcD9CrnRL79VIH5lsOTPA)
207 | - [GIoU bbox regression weight 5](https://pan.baidu.com/s/1_b1RS5qaRVJUwi27mcpXow)
208 | - [DIoU bbox regression weight 5](https://pan.baidu.com/s/1x1keVP958-DyN_OuWdDAXA)
209 | - [CIoU bbox regression weight 5](https://share.weiyun.com/5LSzur7)
210 | - [CIoU bbox regression weight 16](https://share.weiyun.com/5U3OHez)
211 |
--------------------------------------------------------------------------------
/config/__init__.py:
--------------------------------------------------------------------------------
1 | from .config import voc,crack,coco,trafic
--------------------------------------------------------------------------------
/config/__pycache__/__init__.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/config/__pycache__/__init__.cpython-35.pyc
--------------------------------------------------------------------------------
/config/__pycache__/__init__.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/config/__pycache__/__init__.cpython-36.pyc
--------------------------------------------------------------------------------
/config/__pycache__/config.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/config/__pycache__/config.cpython-35.pyc
--------------------------------------------------------------------------------
/config/__pycache__/config.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/config/__pycache__/config.cpython-36.pyc
--------------------------------------------------------------------------------
/config/config.py:
--------------------------------------------------------------------------------
1 | # config.py
2 | import os.path
3 |
4 | # gets home dir cross platform
5 | #HOME = os.path.expanduser("~")
6 | #HOME = os.path.abspath(os.path.dirname(__file__)).split("/") this path
7 | HOME = os.path.join(os.getcwd()) #../path
8 | # for making bounding boxes pretty
9 | COLORS = ((255, 0, 0, 128), (0, 255, 0, 128), (0, 0, 255, 128),
10 |           (0, 255, 255, 128), (255, 0, 255, 128), (255, 255, 0, 128))
11 |
12 | # crack (104, 117, 123)
13 |
14 | # SSD300 CONFIGS
15 |
16 |
17 | voc= {
18 |     'model':"resnet50",
19 |     'losstype':'Ciou',
20 |     'num_classes':21,
21 |     'mean':(123.675, 116.28, 103.53),
22 |     'std':(1.0,1.0,1.0),#(58.395, 57.12, 57.375),
23 |     'lr_steps': (80000, 100000,120000),
24 |     'max_iter': 120000,
25 |     'max_epoch':80,
26 |     'feature_maps': [38, 19, 10, 5, 3, 1],
27 |     'min_dim': 300,
28 |     'backbone_out':[512,1024,2048,512,256,256],
29 |     'neck_out':[256,256,256,256,256,256],
30 |     'steps':[8, 16, 32, 64, 100, 300],
31 |     'min_sizes': [30, 60, 111, 162, 213, 264],
32 |     'max_sizes': [60, 111, 162, 213, 264, 315],
33 |     'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]],
34 |     'variance': [0.1, 0.2],
35 |     'clip': True,
36 |     'nms_kind': "cluster_weighted_diounms", # Currently, NMS only supports 'cluster_nms', 'cluster_diounms', 'cluster_weighted_nms', 'cluster_weighted_diounms'
37 |     'beta1':0.5,
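    # nms_kind and beta1 are forwarded to Detect() in model/build_ssd.py,
    # which implements the Cluster-NMS variants (utils/detection/detection.py)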
'name': 'VOC', 39 | 'work_name':"SSD300_VOC_FPN_GIOU", 40 | } 41 | 42 | 43 | crack = { 44 | 'model':"resnet50", 45 | 'num_classes': 2, 46 | 'mean':(127.5, 127.5, 127.5), 47 | 'std':(1.0, 1.0, 1.0), 48 | 'lr_steps': (25000, 35000, 45000), 49 | 'max_iter': 50000, 50 | 'max_epoch':2000, 51 | 'feature_maps': [38, 19, 10, 5, 3, 1], 52 | 'min_dim': 300, 53 | 'backbone_out':[512,1024,2048,512,256,256], 54 | 'neck_out':[256,256,256,256,256,256], 55 | 'steps': [8, 16, 32, 64, 100, 300], 56 | 'min_sizes': [21, 45, 99, 153, 207, 261], 57 | 'max_sizes': [45, 99, 153, 207, 261, 315], 58 | 'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]],#[[1/0.49], [1/0.16,1/0.09], [1/0.16,1/0.09], [1/0.16,1/0.09], [1/0.09], [1/0.09]], 59 | 'variance': [0.1, 0.2], 60 | 'clip': True, 61 | 'name': 'CRACK', 62 | 'work_name':"SSD300_CRACK_FPN_GIOU", 63 | } 64 | 65 | coco = { 66 | 'num_classes': 201, 67 | 'lr_steps': (280000, 360000, 400000), 68 | 'max_iter': 400000, 69 | 'max_epoch':80, 70 | 'feature_maps': [38, 19, 10, 5, 3, 1], 71 | 'min_dim': 300, 72 | 'steps': [8, 16, 32, 64, 100, 300], 73 | 'min_sizes': [21, 45, 99, 153, 207, 261], 74 | 'max_sizes': [45, 99, 153, 207, 261, 315], 75 | 'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]], 76 | 'variance': [0.1, 0.2], 77 | 'clip': True, 78 | 'name': 'COCO', 79 | } 80 | 81 | 82 | 83 | trafic = { 84 | 'num_classes': 21, 85 | 'lr_steps': (80000, 100000, 120000), 86 | 'max_iter': 12000, 87 | 'max_epoch':2000, 88 | 'feature_maps': [50, 25, 13, 7, 5, 3], 89 | 'min_dim': 800, 90 | 'steps': [16, 32, 64, 100, 300, 600], 91 | 'min_sizes': [16, 32, 64, 128, 256, 512], 92 | 'max_sizes': [32, 64, 128, 256, 512, 630], 93 | 'aspect_ratios': [[1], [1,1/2], [1/2,1], [1/2,1], [1], [1]], 94 | 'variance': [0.1, 0.2], 95 | 'clip': True, 96 | 'name': 'TRAFIC', 97 | } 98 | -------------------------------------------------------------------------------- /data/CRACK.py: -------------------------------------------------------------------------------- 1 | """VOC Dataset Classes 2 | 3 | Original author: Francisco Massa 4 | https://github.com/fmassa/vision/blob/voc_dataset/torchvision/datasets/voc.py 5 | 6 | Updated by: Ellis Brown, Max deGroot 7 | """ 8 | import os 9 | import sys 10 | import cv2 11 | import numpy as np 12 | if sys.version_info[0] == 2: 13 | import xml.etree.cElementTree as ET 14 | else: 15 | import xml.etree.ElementTree as ET 16 | 17 | from .VOC import VOCDetection,VOCAnnotationTransform 18 | 19 | 20 | 21 | 22 | CRACK_CLASSES = ( # always index 0 23 | 'neg',) 24 | 25 | HOME = os.path.join(os.getcwd()) 26 | CRACK_ROOT = os.path.join(HOME, "data/CrackData/") 27 | 28 | class CRACKDetection(VOCDetection): 29 | """VOC Detection Dataset Object 30 | 31 | input is image, target is annotation 32 | 33 | Arguments: 34 | root (string): filepath to VOCdevkit folder. 35 | image_set (string): imageset to use (eg. 
'train', 'val', 'test')
36 |         transform (callable, optional): transformation to perform on the
37 |             input image
38 |         bbox_transform (callable, optional): transformation to perform on the
39 |             target `annotation`
40 |             (eg: take in caption string, return tensor of word indices)
41 |         dataset_name (string, optional): which dataset to load
42 |             (default: 'VOC2007')
43 |     """
44 |
45 |     def __init__(self, root = CRACK_ROOT,
46 |                  image_sets= 'trainval.txt',transform=None,
47 |                  bbox_transform = VOCAnnotationTransform(class_to_ind = CRACK_CLASSES),
48 |                  dataset_name = 'CRACK'):
49 |         self.root = root
50 |         self.transform = transform
51 |         self.bbox = bbox_transform
52 |         self.name = dataset_name
53 |         self._annopath = os.path.join('%s', 'Annotations', '%s.xml')
54 |         self._imgpath = os.path.join('%s', 'JPEGImages', '%s.jpg')
55 |         self.ids = list()
56 |         rootpath = os.path.join(self.root, 'crack/')
57 |         for line in open(os.path.join(rootpath, 'ImageSets', 'Main',image_sets)):
58 |             self.ids.append((rootpath, line.strip()))
59 |         #self.ids = self.ids[0:20]
60 |
61 |
62 |     def mix_up(self,fir_index):
63 |         fir_id = self.ids[fir_index]
64 |         sec_index = np.random.randint(0,len(self.ids))
65 |         sec_id = self.ids[sec_index]
66 |         first_img = cv2.imread(self._imgpath % fir_id)
67 |         second_img = cv2.imread(self._imgpath % sec_id)
68 |         if first_img.shape != second_img.shape:  # shapes must match for cv2.addWeighted
69 |             raise Exception("The image shapes do not match: the first img is {}, shape = {}, \
70 |                 the second is {}, shape = {}".format(fir_index,str(first_img.shape),sec_index,str(second_img.shape)))
71 |
72 |         else:
73 |             height,width,channels = first_img.shape
74 |
75 |             lam = np.random.beta(1.5, 1.5)
76 |             res = cv2.addWeighted(first_img, lam, second_img, 1-lam,0)
77 |
78 |             first_target = ET.parse(self._annopath % fir_id).getroot()
79 |             second_target = ET.parse(self._annopath % sec_id).getroot()
80 |
81 |             target = []
82 |             if lam <= 0.9 and lam >= 0.1:
83 |                 target = self.bbox(first_target, width, height)  # self.bbox is the annotation transform set in __init__
84 |                 target+= self.bbox(second_target, width, height)
85 |             elif lam>0.9:
86 |                 target = self.bbox(first_target, width, height)
87 |             else:
88 |                 target = self.bbox(second_target, width, height)
89 |             return res,target,height,width
90 |
91 |
92 |
93 |
94 |
95 |
96 |
--------------------------------------------------------------------------------
/data/VOC.py:
--------------------------------------------------------------------------------
1 | """VOC Dataset Classes
2 | Original author: Francisco Massa
3 | https://github.com/fmassa/vision/blob/voc_dataset/torchvision/datasets/voc.py
4 | Updated by: Ellis Brown, Max deGroot
5 | """
6 | import os
7 | import os.path as osp
8 | import sys
9 | import torch
10 | import torch.utils.data as data
11 | import cv2
12 | import numpy as np
13 | if sys.version_info[0] == 2:
14 |     import xml.etree.cElementTree as ET
15 | else:
16 |     import xml.etree.ElementTree as ET
17 |
18 |
19 | VOC_CLASSES = (  # always index 0
20 |     'aeroplane', 'bicycle', 'bird', 'boat',
21 |     'bottle', 'bus', 'car', 'cat', 'chair',
22 |     'cow', 'diningtable', 'dog', 'horse',
23 |     'motorbike', 'person', 'pottedplant',
24 |     'sheep', 'sofa', 'train', 'tvmonitor')
25 |
26 | # note: if you used our download scripts, this should be right
27 | HOME = osp.join(os.getcwd())
28 | VOC_ROOT = osp.join(HOME, "data/VOCdevkit/")
29 |
30 | class VOCAnnotationTransform(object):
31 |     """Transforms a VOC annotation into a Tensor of bbox coords and label index
32 |     Initialized with a dictionary lookup of classnames to
indexes 33 | 34 | Arguments: 35 | class_to_ind (dict, optional): dictionary lookup of classnames -> indexes 36 | (default: alphabetic indexing of VOC's 20 classes) 37 | keep_difficult (bool, optional): keep difficult instances or not 38 | (default: False) 39 | height (int): height 40 | width (int): width 41 | """ 42 | 43 | def __init__(self, class_to_ind=None, keep_difficult=False): 44 | self.class_to_ind = dict( 45 | zip(class_to_ind, range(len(class_to_ind)))) 46 | 47 | self.keep_difficult = keep_difficult 48 | 49 | def __call__(self, target, width, height): 50 | """ 51 | Arguments: 52 | target (annotation) : the target annotation to be made usable 53 | will be an ET.Element 54 | Returns: 55 | a list containing lists of bounding boxes [bbox coords, class name] 56 | """ 57 | res = [] 58 | for obj in target.iter('object'): 59 | difficult = int(obj.find('difficult').text) == 1 60 | if not self.keep_difficult and difficult: 61 | continue 62 | name = obj.find('name').text.lower().strip() 63 | bbox = obj.find('bndbox') 64 | 65 | pts = ['xmin', 'ymin', 'xmax', 'ymax'] 66 | bndbox = [] 67 | for i, pt in enumerate(pts): 68 | cur_pt = int(bbox.find(pt).text) - 1 69 | # scale height or width:x/width,y/height 70 | cur_pt = cur_pt / width if i % 2 == 0 else cur_pt / height 71 | bndbox.append(cur_pt) 72 | label_idx = self.class_to_ind[name] 73 | bndbox.append(label_idx) 74 | res += [bndbox] # [xmin, ymin, xmax, ymax, label_ind] 75 | #img_id = target.find('filename').text[:-4] 76 | 77 | return res # [[xmin, ymin, xmax, ymax, label_ind], ... ] 78 | 79 | 80 | class VOCDetection(data.Dataset): 81 | """VOC Detection Dataset Object 82 | input is image, target is annotation 83 | Arguments: 84 | root (string): filepath to VOCdevkit folder. 85 | image_set (string): imageset to use (eg. 
'train', 'val', 'test') 86 | transform (callable, optional): transformation to perform on the 87 | input image 88 | bbox_transform (callable, optional): transformation to perform on the 89 | target `annotation` 90 | (eg: take in caption string, return tensor of word indices) 91 | dataset_name (string, optional): which dataset to load 92 | (default: 'VOC2007') 93 | """ 94 | def __init__(self, root, 95 | image_sets=[('2007', 'trainval'), ('2012', 'trainval')], 96 | transform=None, bbox_transform=VOCAnnotationTransform(class_to_ind = VOC_CLASSES), 97 | dataset_name='VOC0712'): 98 | self.root = root 99 | self.image_set = image_sets 100 | self.transform = transform 101 | self.bbox = bbox_transform 102 | self.name = dataset_name 103 | self._annopath = osp.join('%s', 'Annotations', '%s.xml') 104 | self._imgpath = osp.join('%s', 'JPEGImages', '%s.jpg') 105 | self.ids = list() 106 | for (year, name) in image_sets: 107 | rootpath = osp.join(self.root, 'VOC' + year) 108 | for line in open(osp.join(rootpath, 'ImageSets', 'Main', name + '.txt')): 109 | self.ids.append((rootpath, line.strip())) 110 | #self.ids = self.ids[0:400] 111 | def __getitem__(self, index): 112 | im, gt, h, w = self.pull_item(index) 113 | 114 | return im, gt 115 | 116 | def __len__(self): 117 | return len(self.ids) 118 | 119 | def pull_item(self, index): 120 | img_id = self.ids[index] 121 | 122 | target = ET.parse(self._annopath % img_id).getroot() 123 | img = cv2.imread(self._imgpath % img_id) 124 | height, width, channels = img.shape 125 | 126 | if self.bbox is not None: 127 | target = self.bbox(target, width, height) 128 | 129 | if self.transform is not None: 130 | target = np.array(target) 131 | img, boxes, labels = self.transform(img, target[:, :4], target[:, 4]) 132 | # to rgb 133 | img = img[:, :, (2, 1, 0)] 134 | # img = img.transpose(2, 0, 1) 135 | target = np.hstack((boxes, np.expand_dims(labels, axis=1))) 136 | return torch.from_numpy(img).permute(2, 0, 1), target, height, width 137 | 138 | def pull_image(self, index): 139 | '''Returns the original image object at index in PIL form 140 | 141 | Note: not using self.__getitem__(), as any transformations passed in 142 | could mess up this functionality. 143 | 144 | Argument: 145 | index (int): index of img to show 146 | Return: 147 | PIL img 148 | ''' 149 | img_id = self.ids[index] 150 | image = cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR) 151 | rgb_image = cv2.cvtColor(image,cv2.COLOR_BGR2RGB) 152 | print(rgb_image.shape) 153 | return rgb_image 154 | 155 | def pull_anno(self, index): 156 | '''Returns the original annotation of image at index 157 | 158 | Note: not using self.__getitem__(), as any transformations passed in 159 | could mess up this functionality. 160 | 161 | Argument: 162 | index (int): index of img to get annotation of 163 | Return: 164 | list: [img_id, [(label, bbox coords),...]] 165 | eg: ('001718', [('dog', (96, 13, 438, 332))]) 166 | ''' 167 | img_id = self.ids[index] 168 | anno = ET.parse(self._annopath % img_id).getroot() 169 | gt = self.bbox(anno, 1, 1) 170 | return img_id[1], gt 171 | 172 | def pull_tensor(self, index): 173 | '''Returns the original image at an index in tensor form 174 | 175 | Note: not using self.__getitem__(), as any transformations passed in 176 | could mess up this functionality. 
177 | 178 | Argument: 179 | index (int): index of img to show 180 | Return: 181 | tensorized version of img, squeezed 182 | ''' 183 | return torch.Tensor(self.pull_image(index)).unsqueeze_(0) 184 | 185 | 186 | def detection_collate(batch): 187 | """Custom collate fn for dealing with batches of images that have a different 188 | number of associated object annotations (bounding boxes). 189 | 190 | Arguments: 191 | batch: (tuple) A tuple of tensor images and lists of annotations 192 | 193 | Return: 194 | A tuple containing: 195 | 1) (tensor) batch of images stacked on their 0 dim 196 | 2) (list of tensors) annotations for a given image are stacked on 197 | 0 dim 198 | """ 199 | targets = [] 200 | imgs = [] 201 | for sample in batch: 202 | imgs.append(sample[0]) 203 | targets.append(torch.FloatTensor(sample[1])) 204 | return torch.stack(imgs, 0), targets 205 | -------------------------------------------------------------------------------- /data/__init__.py: -------------------------------------------------------------------------------- 1 | from .VOC import VOC_ROOT, VOC_CLASSES, VOCAnnotationTransform, VOCDetection 2 | from .VOC import detection_collate 3 | from .CRACK import CRACK_ROOT, CRACK_CLASSES, CRACKDetection 4 | from .utils import SSDAugmentation,BaseTransform 5 | -------------------------------------------------------------------------------- /data/__pycache__/CRACK.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/data/__pycache__/CRACK.cpython-36.pyc -------------------------------------------------------------------------------- /data/__pycache__/VOC.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/data/__pycache__/VOC.cpython-36.pyc -------------------------------------------------------------------------------- /data/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/data/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /data/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .augmentations import SSDAugmentation, BaseTransform 2 | -------------------------------------------------------------------------------- /data/utils/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/data/utils/__pycache__/__init__.cpython-35.pyc -------------------------------------------------------------------------------- /data/utils/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/data/utils/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /data/utils/__pycache__/augmentations.cpython-35.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/data/utils/__pycache__/augmentations.cpython-35.pyc
--------------------------------------------------------------------------------
/data/utils/__pycache__/augmentations.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/data/utils/__pycache__/augmentations.cpython-36.pyc
--------------------------------------------------------------------------------
/data/utils/augmentations.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torchvision import transforms
3 | import cv2
4 | import numpy as np
5 | import types
6 | from numpy import random
7 |
8 |
9 | def intersect(box_a, box_b):
10 |     '''
11 |     calculate the intersection of boxes
12 |     args:
13 |         box_a = [boxes_num,4]
14 |         box_b = [4]
15 |
16 |     return iou_area = [boxes_num,1]
17 |     '''
18 |     max_xy = np.minimum(box_a[:, 2:], box_b[2:])
19 |     min_xy = np.maximum(box_a[:, :2], box_b[:2])
20 |     inter = np.clip((max_xy - min_xy), a_min=0, a_max=np.inf)
21 |     return inter[:, 0] * inter[:, 1]
22 |
23 |
24 | def jaccard_numpy(box_a, box_b):
25 |     """Compute the jaccard overlap of two sets of boxes.  The jaccard overlap
26 |     is simply the intersection over union of two boxes.
27 |     E.g.:
28 |         A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
29 |     Args:
30 |         box_a: Multiple bounding boxes, Shape: [num_boxes,4]
31 |         box_b: Single bounding box, Shape: [4]
32 |     Return:
33 |         jaccard overlap: Shape: [box_a.shape[0]]
34 |     """
35 |     inter = intersect(box_a, box_b)
36 |     area_a = ((box_a[:, 2]-box_a[:, 0]) *
37 |               (box_a[:, 3]-box_a[:, 1]))  # [A,B]
38 |     area_b = ((box_b[2]-box_b[0]) *
39 |               (box_b[3]-box_b[1]))  # [A,B]
40 |     union = area_a + area_b - inter
41 |     return inter / union  # [A,B]
42 |
43 |
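# Quick example of the two helpers above: a unit square against a unit
# square shifted by (0.5, 0.5) has inter = 0.25 and union = 1.75, so
#   >>> a = np.array([[0., 0., 1., 1.]])
#   >>> jaccard_numpy(a, np.array([0.5, 0.5, 1.5, 1.5]))
#   array([0.14285714])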
44 | class Compose(object):
45 |     """
46 |     Composes several augmentations together.
47 |     Args:
48 |         transforms (List[Transform]): list of transforms to compose.
49 |     Example:
50 |         augmentations.Compose([
51 |             transforms.CenterCrop(10),
52 |             transforms.ToTensor(),
53 |         ])
54 |     """
55 |
56 |     def __init__(self, transforms):
57 |         self.transforms = transforms
58 |
59 |     def __call__(self, img, boxes=None, labels=None):
60 |         for t in self.transforms:
61 |             img, boxes, labels = t(img, boxes, labels)
62 |         return img, boxes, labels
63 |
64 |
65 | class Lambda(object):
66 |     """Applies a lambda as a transform."""
67 |
68 |     def __init__(self, lambd):
69 |         assert isinstance(lambd, types.LambdaType)
70 |         self.lambd = lambd
71 |
72 |     def __call__(self, img, boxes=None, labels=None):
73 |         return self.lambd(img, boxes, labels)
74 |
75 |
76 | class ConvertFromInts(object):
77 |     '''
78 |     Convert the image from ints to float32
79 |     '''
80 |     def __call__(self, image, boxes=None, labels=None):
81 |         return image.astype(np.float32), boxes, labels
82 |
83 |
84 | class SubtractMeans(object):
85 |     '''
86 |     Subtract the image means
87 |     '''
88 |     def __init__(self, mean):
89 |         self.mean = np.array(mean, dtype=np.float32)
90 |
91 |     def __call__(self, image, boxes=None, labels=None):
92 |         image = image.astype(np.float32)
93 |         image -= self.mean
94 |         return image.astype(np.float32), boxes, labels
95 |
96 |
97 | class Standform(object):
98 |     '''
99 |     standardize the image: (image - mean) / std
100 |     '''
101 |     def __init__(self,mean,std):
102 |         self.means = np.array(mean,dtype = np.float32)
103 |         self.std = np.array(std,dtype = np.float32)
104 |     def __call__(self, image, boxes=None, labels=None):
105 |         image = image.astype(np.float32)
106 |         return (image - self.means)/self.std,boxes,labels
107 |
108 |
109 | class ToAbsoluteCoords(object):
110 |     '''
111 |     convert the boxes to absolute coordinates
112 |     '''
113 |     def __call__(self, image, boxes=None, labels=None):
114 |         height, width, channels = image.shape
115 |         boxes[:, 0] *= width
116 |         boxes[:, 2] *= width
117 |         boxes[:, 1] *= height
118 |         boxes[:, 3] *= height
119 |
120 |         return image, boxes, labels
121 |
122 |
123 | class ToPercentCoords(object):
124 |     '''
125 |     convert the boxes to percent coordinates
126 |     '''
127 |     def __call__(self, image, boxes=None, labels=None):
128 |         height, width, channels = image.shape
129 |         boxes[:, 0] /= width
130 |         boxes[:, 2] /= width
131 |         boxes[:, 1] /= height
132 |         boxes[:, 3] /= height
133 |
134 |         return image, boxes, labels
135 |
136 |
137 | class Resize(object):
138 |     '''
139 |     resize the image
140 |     args:
141 |         size = (size,size)
142 |     '''
143 |     def __init__(self, size=300):
144 |         if isinstance(size,int):
145 |             self.size = (size,size)
146 |         elif isinstance(size,tuple):
147 |             self.size = size
148 |         else:
149 |             raise Exception("size must be an int or a tuple")
150 |
151 |     def __call__(self, image, boxes=None, labels=None):
152 |         image = cv2.resize(image, self.size)
153 |         return image, boxes, labels
154 |
155 |
156 | class RandomSaturation(object):
157 |     '''
158 |     Randomly change the saturation (HSV): 0.0~1.0
159 |     assert: the image is HSV
160 |     args:
161 |         lower, upper are the bounds for the random saturation factor
162 |     '''
163 |     def __init__(self, lower=0.5, upper=1.5):
164 |         self.lower = lower
165 |         self.upper = upper
166 |         assert self.upper >= self.lower, "saturation upper must be >= lower."
167 |         assert self.lower >= 0, "saturation lower must be non-negative."
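    # expects an HSV float image: the saturation channel (index 1) is
    # scaled by a random factor drawn from [lower, upper)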
168 |
169 |     def __call__(self, image, boxes=None, labels=None):
170 |         if random.randint(2):
171 |             image[:, :, 1] *= random.uniform(self.lower, self.upper)
172 |
173 |         return image, boxes, labels
174 |
175 |
176 | class RandomHue(object):
177 |     '''
178 |     Randomly change the hue (HSV): 0~360
179 |     assert: the image is HSV
180 |     args:
181 |         delta is the range for the random hue shift.
182 |
183 |     '''
184 |     def __init__(self, delta=18.0):
185 |         assert delta >= 0.0 and delta <= 360.0
186 |         self.delta = delta
187 |
188 |     def __call__(self, image, boxes=None, labels=None):
189 |         if random.randint(2):
190 |             image[:, :, 0] += random.uniform(-self.delta, self.delta)
191 |             image[:, :, 0][image[:, :, 0] > 360.0] -= 360.0
192 |             image[:, :, 0][image[:, :, 0] < 0.0] += 360.0
193 |         return image, boxes, labels
194 |
195 |
196 | class RandomLightingNoise(object):
197 |     def __init__(self):
198 |         self.perms = ((0, 1, 2), (0, 2, 1),
199 |                       (1, 0, 2), (1, 2, 0),
200 |                       (2, 0, 1), (2, 1, 0))
201 |
202 |     def __call__(self, image, boxes=None, labels=None):
203 |         if random.randint(2):
204 |             swap = self.perms[random.randint(len(self.perms))]
205 |             shuffle = SwapChannels(swap)  # shuffle channels
206 |             image = shuffle(image)
207 |         return image, boxes, labels
208 |
209 |
210 | class ConvertColor(object):
211 |     '''
212 |     convert the image between RGB and HSV color spaces
213 |     args:
214 |         current
215 |         transform
216 |     '''
217 |     def __init__(self, current='RGB', transform='HSV'):
218 |         self.transform = transform
219 |         self.current = current
220 |
221 |     def __call__(self, image, boxes=None, labels=None):
222 |         if self.current == 'RGB' and self.transform == 'HSV':
223 |             image = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
224 |         elif self.current == 'HSV' and self.transform == 'RGB':
225 |             image = cv2.cvtColor(image, cv2.COLOR_HSV2RGB)
226 |         else:
227 |             raise NotImplementedError
228 |         return image, boxes, labels
229 |
230 |
231 | class RandomContrast(object):
232 |     '''
233 |     Randomly scale the image contrast: g(i,j) = alpha*f(i,j)
234 |     '''
235 |     def __init__(self, lower=0.5, upper=1.5):
236 |         self.lower = lower
237 |         self.upper = upper
238 |         assert self.upper >= self.lower, "contrast upper must be >= lower."
239 |         assert self.lower >= 0, "contrast lower must be non-negative."
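    # operates on an RGB float image; PhotometricDistort (below) applies this
    # contrast jitter either before or after the HSV-space saturation/hue jitter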
240 |
241 |     # expects float image
242 |     def __call__(self, image, boxes=None, labels=None):
243 |         if random.randint(2):
244 |             alpha = random.uniform(self.lower, self.upper)
245 |             image *= alpha
246 |         return image, boxes, labels
247 |
248 |
249 | class RandomBrightness(object):
250 |     '''
251 |     Randomly shift the image brightness: g(i,j) = f(i,j) + beta
252 |     '''
253 |     def __init__(self, delta=32):
254 |         assert delta >= 0.0
255 |         assert delta <= 255.0
256 |         self.delta = delta
257 |
258 |     def __call__(self, image, boxes=None, labels=None):
259 |         if random.randint(2):
260 |             delta = random.uniform(-self.delta, self.delta)
261 |             image += delta
262 |         return image, boxes, labels
263 |
264 |
265 | class ToCV2Image(object):
266 |     '''
267 |     change the image shape from (c,h,w) to (h,w,c)
268 |     '''
269 |     def __call__(self, tensor, boxes=None, labels=None):
270 |         return tensor.cpu().numpy().astype(np.float32).transpose((1, 2, 0)), boxes, labels
271 |
272 |
273 | class ToTensor(object):
274 |     '''
275 |     change the image shape from (h,w,c) to (c,h,w)
276 |     '''
277 |
278 |     def __call__(self, cvimage, boxes=None, labels=None):
279 |         return torch.from_numpy(cvimage.astype(np.float32)).permute(2, 0, 1), boxes, labels
280 |
281 |
282 | class RandomSampleCrop(object):
283 |     """Crop
284 |     Arguments:
285 |         img (Image): the image being input during training
286 |         boxes (Tensor): the original bounding boxes in pt form
287 |         labels (Tensor): the class labels for each bbox
288 |         mode (float tuple): the min and max jaccard overlaps
289 |     Return:
290 |         (img, boxes, classes)
291 |             img (Image): the cropped image
292 |             boxes (Tensor): the adjusted bounding boxes in pt form
293 |             labels (Tensor): the class labels for each bbox
294 |     """
295 |     def __init__(self):
296 |         self.sample_options = (
297 |             # using entire original input image
298 |             None,
299 |             # sample a patch s.t. MIN jaccard w/ obj in .1,.3,.4,.7,.9
300 |             (0.1, None),
301 |             (0.3, None),
302 |             (0.7, None),
303 |             (0.9, None),
304 |             # randomly sample a patch
305 |             (None, None),
306 |         )
307 |
308 |     def __call__(self, image, boxes=None, labels=None):
309 |         height, width, _ = image.shape
310 |         while True:
311 |             # randomly choose a mode
312 |             mode = random.choice(self.sample_options)
313 |             if mode is None:
314 |                 return image, boxes, labels
315 |
316 |             min_iou, max_iou = mode
317 |             if min_iou is None:
318 |                 min_iou = float('-inf')
319 |             if max_iou is None:
320 |                 max_iou = float('inf')
321 |
322 |             # max trials (50)
323 |             for _ in range(50):
324 |                 current_image = image
325 |
326 |                 w = random.uniform(0.3 * width, width)
327 |                 h = random.uniform(0.3 * height, height)
328 |
329 |                 # aspect ratio constraint b/t .5 & 2
330 |                 if h / w < 0.5 or h / w > 2:
331 |                     continue
332 |
333 |                 left = random.uniform(width - w)
334 |                 top = random.uniform(height - h)
335 |
336 |                 # convert to integer rect x1,y1,x2,y2
337 |                 rect = np.array([int(left), int(top), int(left+w), int(top+h)])
338 |
339 |                 # calculate IoU (jaccard overlap) b/t the cropped and gt boxes
340 |                 overlap = jaccard_numpy(boxes, rect)
341 |
342 |                 # is min and max overlap constraint satisfied? if not try again
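                # note: a trial is rejected only when BOTH bounds fail,
                # i.e. min(overlap) < min_iou AND max(overlap) > max_iou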
343 |                 if overlap.min() < min_iou and max_iou < overlap.max():
344 |                     continue
345 |
346 |                 # cut the crop from the image
347 |                 current_image = current_image[rect[1]:rect[3], rect[0]:rect[2],
348 |                                               :]
349 |
350 |                 # keep overlap with gt box IF center in sampled patch
351 |                 # calculate the centers of the boxes
352 |                 centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
353 |
354 |                 # mask in all gt boxes whose centers are below/right of the rect's top-left
355 |                 m1 = (rect[0] < centers[:, 0]) * (rect[1] < centers[:, 1])
356 |
357 |                 # mask in all gt boxes whose centers are above/left of the rect's bottom-right
358 |                 m2 = (rect[2] > centers[:, 0]) * (rect[3] > centers[:, 1])
359 |
360 |                 # mask in that both m1 and m2 are true
361 |                 # select the valid boxes whose centers lie inside the rect
362 |                 mask = m1 * m2
363 |
364 |                 # have any valid boxes? try again if not
365 |                 if not mask.any():
366 |                     continue
367 |
368 |                 # take only matching gt boxes
369 |                 current_boxes = boxes[mask, :].copy()
370 |
371 |                 # take only matching gt labels
372 |                 current_labels = labels[mask]
373 |
374 |                 # should we use the box left and top corner or the crop's
375 |                 current_boxes[:, :2] = np.maximum(current_boxes[:, :2],
376 |                                                   rect[:2])
377 |                 # adjust to crop (by subtracting crop's left,top)
378 |                 current_boxes[:, :2] -= rect[:2]
379 |
380 |                 current_boxes[:, 2:] = np.minimum(current_boxes[:, 2:],
381 |                                                   rect[2:])
382 |                 # adjust to crop (by subtracting crop's left,top)
383 |                 current_boxes[:, 2:] -= rect[:2]
384 |
385 |                 return current_image, current_boxes, current_labels
386 |
387 |
388 | class Expand(object):
389 |     '''
390 |     randomly expand the canvas (applied with probability 0.5)
391 |     '''
392 |     def __init__(self, mean):
393 |         self.mean = mean
394 |
395 |     def __call__(self, image, boxes, labels):
396 |         if random.randint(2):
397 |             return image, boxes, labels
398 |
399 |         height, width, depth = image.shape
400 |         ratio = random.uniform(1, 4)
401 |         # random to make the left and top
402 |         left = random.uniform(0, width*ratio - width)
403 |         top = random.uniform(0, height*ratio - height)
404 |
405 |         expand_image = np.zeros(
406 |             (int(height*ratio), int(width*ratio), depth),
407 |             dtype=image.dtype)
408 |         expand_image[:, :, :] = self.mean
409 |         # put the image into the expanded canvas
410 |         expand_image[int(top):int(top + height),
411 |                      int(left):int(left + width)] = image
412 |         image = expand_image
413 |
414 |         boxes = boxes.copy()
415 |         # shift the box left and top coordinates
416 |         boxes[:, :2] += (int(left), int(top))
417 |         boxes[:, 2:] += (int(left), int(top))
418 |
419 |         return image, boxes, labels
420 |
421 | '''
422 | horizontal flip: ratio = 0.5
423 | '''
424 | class RandomMirror(object):
425 |     def __call__(self, image, boxes, classes):
426 |         _, width, _ = image.shape
427 |         if random.randint(2):
428 |             image = image[:, ::-1]
429 |             boxes = boxes.copy()
430 |             boxes[:, 0::2] = width - boxes[:, 2::-2]  # maps (x1, x2) -> (width - x2, width - x1)
431 |         return image, boxes, classes
432 |
433 |
434 | class SwapChannels(object):
435 |     """Transforms a tensorized image by swapping the channels in the order
436 |     specified in the swap tuple.
437 |     Args:
438 |         swaps (int triple): final order of channels
439 |             eg: (2, 1, 0)
440 |     """
441 |
442 |     def __init__(self, swaps):
443 |         self.swaps = swaps
444 |
445 |     def __call__(self, image):
446 |         """
447 |         Args:
448 |             image (Tensor): image tensor to be transformed
449 |         Return:
450 |             a tensor with channels swapped according to swap
451 |         """
452 |         # if torch.is_tensor(image):
453 |         #     image = image.data.cpu().numpy()
454 |         # else:
455 |         #     image = np.array(image)
456 |         image = image[:, :, self.swaps]
457 |         return image
458 |
459 |
460 | class PhotometricDistort(object):
461 |     def __init__(self):
462 |         self.pd = [
463 |             RandomContrast(),
464 |             ConvertColor(transform='HSV'),
465 |             RandomSaturation(),
466 |             RandomHue(),
467 |             ConvertColor(current='HSV', transform='RGB'),
468 |             RandomContrast()
469 |         ]
470 |         self.rand_brightness = RandomBrightness()
471 |         self.rand_light_noise = RandomLightingNoise()
472 |
473 |     def __call__(self, image, boxes, labels):
474 |         im = image.copy()
475 |         im, boxes, labels = self.rand_brightness(im, boxes, labels)
476 |         if random.randint(2):
477 |             distort = Compose(self.pd[:-1])  # skip the trailing RandomContrast
478 |         else:
479 |             distort = Compose(self.pd[1:])   # skip the leading RandomContrast
480 |         im, boxes, labels = distort(im, boxes, labels)
481 |         return self.rand_light_noise(im, boxes, labels)
482 |
483 |
484 | class SSDAugmentation(object):
485 |     def __init__(self, size=300, mean=(104, 117, 123),std =(104, 117, 123)):
486 |         self.mean = mean
487 |         self.std = std
488 |         self.size = size
489 |         self.augment = Compose([
490 |             ConvertFromInts(),
491 |             ToAbsoluteCoords(),
492 |             PhotometricDistort(),
493 |             Expand(self.mean),
494 |             RandomSampleCrop(),
495 |             RandomMirror(),
496 |             ToPercentCoords(),
497 |             Resize(self.size),
498 |             Standform(self.mean,self.std)
499 |             #SubtractMeans(self.mean)
500 |         ])
501 |
502 |     def __call__(self, img, boxes, labels):
503 |         return self.augment(img, boxes, labels)
504 |
505 | def base_transform(image, size, mean):
506 |     # standalone helper: resize and subtract the mean
507 |     x = cv2.resize(image, (size, size)).astype(np.float32)
508 |     x -= mean
509 |     x = x.astype(np.float32)
510 |     return x
511 |
512 |
513 | class BaseTransform:
514 |     def __init__(self, size, mean,std):
515 |         self.mean = mean
516 |         self.std = std
517 |         self.size = size
518 |         self.augment = Compose([
519 |             ConvertFromInts(),
520 |             Resize(self.size),
521 |             Standform(self.mean,self.std)
522 |
523 |         ])
524 |
525 |     def __call__(self, image, boxes=None, labels=None):
526 |         return self.augment(image, boxes, labels)
--------------------------------------------------------------------------------
/model/__init__.py:
--------------------------------------------------------------------------------
1 | from .build_ssd import build_ssd
2 |
--------------------------------------------------------------------------------
/model/__pycache__/__init__.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/__pycache__/__init__.cpython-35.pyc
--------------------------------------------------------------------------------
/model/__pycache__/__init__.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/__pycache__/__init__.cpython-36.pyc
--------------------------------------------------------------------------------
/model/__pycache__/build_ssd.cpython-35.pyc:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/__pycache__/build_ssd.cpython-35.pyc -------------------------------------------------------------------------------- /model/__pycache__/build_ssd.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/__pycache__/build_ssd.cpython-36.pyc -------------------------------------------------------------------------------- /model/backbone/__init__.py: -------------------------------------------------------------------------------- 1 | from .build_backbone import Backbone 2 | -------------------------------------------------------------------------------- /model/backbone/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/backbone/__pycache__/__init__.cpython-35.pyc -------------------------------------------------------------------------------- /model/backbone/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/backbone/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /model/backbone/__pycache__/build_backbone.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/backbone/__pycache__/build_backbone.cpython-35.pyc -------------------------------------------------------------------------------- /model/backbone/__pycache__/build_backbone.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/backbone/__pycache__/build_backbone.cpython-36.pyc -------------------------------------------------------------------------------- /model/backbone/build_backbone.py: -------------------------------------------------------------------------------- 1 | import pretrainedmodels 2 | import torch.nn as nn 3 | from torchsummary import summary 4 | from ..utils import ConvModule 5 | 6 | 7 | 8 | 9 | 10 | class Backbone(nn.Module): 11 | def __init__(self, model_name, feature_map): 12 | super(Backbone,self).__init__() 13 | self.normalize = {'type':'BN'} 14 | lay,channal = self.get_pretrainedmodel(model_name) 15 | self.model = self.add_extras(lay, channal) 16 | self.model_length = len(self.model) 17 | self.feature_map = feature_map 18 | 19 | 20 | 21 | def get_pretrainedmodel(self,model_name,pretrained = 'imagenet'):#'imagenet' 22 | ''' 23 | get the pretraindmodel lay 24 | args: 25 | model_name 26 | pretrained:None or imagenet 27 | ''' 28 | model = pretrainedmodels.__dict__[model_name](num_classes = 1000,pretrained = pretrained) 29 | #get the model lay,it's a list 30 | if model_name in ['resnet18','resnet34','resnet50','resnet101','resnet152']: 31 | lay = nn.Sequential(*list(model.children())[:-2]) 32 | if model_name == 'resnet50': 33 | out_channels = 2048 34 | 35 | return lay,out_channels 36 | 37 | 
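    # add_extras appends three extra stages to the truncated ResNet trunk:
    #   exts1: 2048 -> 256 (1x1) -> 512 (3x3, stride 2)
    #   exts2:  512 -> 128 (1x1) -> 256 (3x3, stride 2)
    #   exts3:  256 -> 128 (1x1) -> 256 (3x3, stride 1, no padding)
    # with a 300x300 input, layer2/3/4 plus these stages give the six SSD
    # feature maps (38, 19, 10, 5, 3, 1) selected via self.feature_map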
def add_extras(self,lay,in_channel): 38 | exts1 = nn.Sequential( 39 | ConvModule(2048,256,1,normalize=None,stride = 1, 40 | bias=True,inplace=False), 41 | ConvModule(256,512,3,normalize=None,stride = 2,padding = 1, 42 | bias=True,inplace=False) 43 | 44 | 45 | #nn.Conv2d(in_channels = 256, out_channels = 512, kernel_size = 3 ,stride = 2, padding = 1) 46 | ) 47 | lay.add_module("exts1",exts1) 48 | 49 | exts2 = nn.Sequential( 50 | ConvModule(512,128,1,normalize=None,stride = 1, 51 | bias=True,inplace=False), 52 | ConvModule(128,256,3,normalize=None,stride = 2,padding = 1, 53 | bias=True,inplace=False) 54 | 55 | ) 56 | lay.add_module("exts2",exts2) 57 | 58 | exts3 = nn.Sequential( 59 | ConvModule(256,128,1,normalize=None,stride = 1, 60 | bias=True,inplace=False), 61 | ConvModule(128,256,3,normalize=None,stride = 1,padding = 0, 62 | bias=True,inplace=False) 63 | ) 64 | lay.add_module("exts3",exts3) 65 | 66 | return lay 67 | 68 | def forward(self,x): 69 | outs = [] 70 | 71 | for i in range(self.model_length): 72 | x = self.model[i](x) 73 | 74 | if i+1 in self.feature_map: 75 | 76 | outs.append(x) 77 | #for i in range(len(outs)): 78 | #print(outs[i].shape[1]) 79 | if len(outs) == 1: 80 | return outs[0] 81 | else: 82 | return tuple(outs) 83 | 84 | 85 | 86 | if __name__=='__main__': 87 | import torch.nn as nn 88 | use_gpu = True 89 | model_name = 'resnet50' 90 | 91 | 92 | # could be fbresnet152 or inceptionresnetv2 93 | feature_map = [6,7,8,9,10,11] 94 | bone_model = Backbone(model_name,feature_map) 95 | if use_gpu: 96 | bone_model.cuda() 97 | summary(bone_model, (3,300, 300)) 98 | -------------------------------------------------------------------------------- /model/build_ssd.py: -------------------------------------------------------------------------------- 1 | ''' 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | from layers import * 6 | from data import voc, coco 7 | import os 8 | import torchvision 9 | ''' 10 | from model.backbone import Backbone 11 | from model.neck import Neck,SSDNeck 12 | from model.head import SSDHead 13 | import torch.nn as nn 14 | import torch 15 | from utils import PriorBox,Detect 16 | import torch.nn.functional as F 17 | 18 | class SSD(nn.Module): 19 | """Single Shot Multibox Architecture 20 | The network is composed of a base VGG network followed by the 21 | added multibox conv layers. Each multibox layer branches into 22 | 1) conv2d for class conf scores 23 | 2) conv2d for localization predictions 24 | 3) associated priorbox layer to produce default bounding 25 | boxes specific to the layer's feature map size. 26 | See: https://arxiv.org/pdf/1512.02325.pdf for more details. 
27 |
28 |     Args:
29 |         phase: (string) Can be "test" or "train"
30 |         size: input image size
31 |         base: VGG16 layers for input, size of either 300 or 500
32 |         extras: extra layers that feed to multibox loc and conf layers
33 |         head: "multibox head" consists of loc and conf conv layers
34 |     """
35 |
36 |     def __init__(self, phase, size, Backbone, Neck, Head, cfg):
37 |         super(SSD, self).__init__()
38 |         self.phase = phase
39 |         self.cfg = cfg
40 |         self.priorbox = PriorBox(self.cfg)
41 |         self.priors = self.priorbox.forward()
42 |         self.size = size
43 |         # SSD network
44 |         self.backbone = Backbone
45 |         self.neck = Neck
46 |         self.head = Head
47 |         self.num_classes = cfg['num_classes']
48 |         self.softmax = nn.Softmax(dim=-1)
49 |         self.detect = Detect(self.num_classes , 0, 200, 0.01, 0.45,variance = cfg['variance'], nms_kind=cfg['nms_kind'], beta1=cfg['beta1'])
50 |
51 |     def forward(self, x, phase):
52 |         """Applies network layers and ops on input image(s) x.
53 |
54 |         Args:
55 |             x: input image or batch of images. Shape: [batch,3,300,300].
56 |
57 |         Return:
58 |             Depending on phase:
59 |             test:
60 |                 Variable(tensor) of output class label predictions,
61 |                 confidence score, and corresponding location predictions for
62 |                 each object detected. Shape: [batch,topk,7]
63 |
64 |             train:
65 |                 list of concat outputs from:
66 |                     1: confidence layers, Shape: [batch*num_priors,num_classes]
67 |                     2: localization layers, Shape: [batch,num_priors*4]
68 |                     3: priorbox layers, Shape: [2,num_priors*4]
69 |         """
70 |
71 |
72 |         x = self.backbone(x)
73 |         if self.neck is not None:
74 |             x = self.neck(x)
75 |
76 |         conf,loc = self.head(x)
77 |
78 |         loc = torch.cat([o.view(o.size(0), -1) for o in loc], 1)
79 |         conf = torch.cat([o.view(o.size(0), -1) for o in conf], 1)
80 |         if phase == "test":
81 |             output = self.detect(
82 |                 loc.view(loc.size(0), -1, 4),                  # loc preds
83 |                 self.softmax(conf.view(conf.size(0), -1,
84 |                              self.num_classes)),               # conf preds
85 |                 #self.priors.type(type(x.data))                # default boxes
86 |                 self.priors
87 |             )
88 |         else:
89 |             output = (
90 |                 loc.view(loc.size(0), -1, 4),
91 |                 conf.view(conf.size(0), -1, self.num_classes),
92 |                 self.priors
93 |             )
94 |         return output
95 |
96 |     def load_weights(self, base_file):
97 |         import os; other, ext = os.path.splitext(base_file)  # local import: os is not imported at module level
98 |         if ext in ('.pkl', '.pth'):
99 |             print('Loading weights into state dict...')
100 |             self.load_state_dict(torch.load(base_file,
101 |                                  map_location=lambda storage, loc: storage))
102 |             print('Finished!')
103 |         else:
104 |             print('Sorry only .pth and .pkl files supported.')
105 |
106 |
107 | def build_ssd(phase, size=300, cfg=None):
108 |     if phase != "test" and phase != "train":
109 |         print("ERROR: Phase: " + phase + " not recognized")
110 |         return
111 |     if size != 300 and size!=600 and size!=800:
112 |         print("ERROR: You specified size " + repr(size) + ". 
However, " + 113 | "currently only SSD300 (size=300) is supported!") 114 | return 115 | print(phase) 116 | base = Backbone(cfg['model'],[6,7,8,9,10,11]) 117 | 118 | neck = Neck(in_channels = cfg['backbone_out'], out_channels = cfg['neck_out']) 119 | head = SSDHead(num_classes = cfg['num_classes'],in_channels = cfg['neck_out'],aspect_ratios = cfg['aspect_ratios']) 120 | 121 | return SSD(phase, size, base, neck, head, cfg) 122 | -------------------------------------------------------------------------------- /model/head/__init__.py: -------------------------------------------------------------------------------- 1 | from .build_head import SSDHead -------------------------------------------------------------------------------- /model/head/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/head/__pycache__/__init__.cpython-35.pyc -------------------------------------------------------------------------------- /model/head/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/head/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /model/head/__pycache__/build_head.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/head/__pycache__/build_head.cpython-35.pyc -------------------------------------------------------------------------------- /model/head/__pycache__/build_head.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/head/__pycache__/build_head.cpython-36.pyc -------------------------------------------------------------------------------- /model/head/build_head.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | 6 | 7 | 8 | class SSDHead(nn.Module): 9 | 10 | def __init__(self, 11 | num_classes=81, 12 | in_channels=[256,256,256,256,256], 13 | aspect_ratios=([2], [2, 3], [2, 3], [2, 3], [2], [2])): 14 | super(SSDHead, self).__init__() 15 | self.num_classes = num_classes 16 | self.in_channels = in_channels 17 | num_anchors = [len(ratios) * 2 + 2 for ratios in aspect_ratios] 18 | reg_convs = [] 19 | cls_convs = [] 20 | for i in range(len(in_channels)): 21 | reg_convs.append( 22 | nn.Conv2d( 23 | in_channels[i], 24 | num_anchors[i] * 4, 25 | kernel_size=3, 26 | padding=1)) 27 | cls_convs.append( 28 | nn.Conv2d( 29 | in_channels[i], 30 | num_anchors[i] * num_classes, 31 | kernel_size=3, 32 | padding=1)) 33 | self.reg_convs = nn.ModuleList(reg_convs) 34 | self.cls_convs = nn.ModuleList(cls_convs) 35 | 36 | self.init_weights() 37 | def init_weights(self): 38 | for m in self.modules(): 39 | if isinstance(m, nn.Conv2d): 40 | torch.nn.init.xavier_uniform_(m.weight) 41 | 42 | def forward(self, feats): 43 | cls_scores = [] 44 | bbox_preds = [] 45 | for feat, reg_conv, cls_conv in zip(feats, self.reg_convs, 46 | self.cls_convs): 47 | #[num_featuremap,w,h,c] 48 | 
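            # permuting to NHWC means view(batch, -1) in build_ssd flattens the
            # predictions in the same per-location, per-anchor order as the priors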
cls_scores.append(cls_conv(feat).permute(0, 2, 3, 1).contiguous()) 49 | bbox_preds.append(reg_conv(feat).permute(0, 2, 3, 1).contiguous()) 50 | 51 | return cls_scores, bbox_preds -------------------------------------------------------------------------------- /model/neck/__init__.py: -------------------------------------------------------------------------------- 1 | from .build_neck import Neck 2 | from .ssd_neck import SSDNeck 3 | -------------------------------------------------------------------------------- /model/neck/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/neck/__pycache__/__init__.cpython-35.pyc -------------------------------------------------------------------------------- /model/neck/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/neck/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /model/neck/__pycache__/build_neck.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/neck/__pycache__/build_neck.cpython-35.pyc -------------------------------------------------------------------------------- /model/neck/__pycache__/build_neck.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/neck/__pycache__/build_neck.cpython-36.pyc -------------------------------------------------------------------------------- /model/neck/__pycache__/ssd_neck.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/neck/__pycache__/ssd_neck.cpython-35.pyc -------------------------------------------------------------------------------- /model/neck/__pycache__/ssd_neck.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/neck/__pycache__/ssd_neck.cpython-36.pyc -------------------------------------------------------------------------------- /model/neck/build_neck.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | import torch.nn.functional as F 3 | import sys 4 | from ..utils import ConvModule 5 | 6 | 7 | class Neck(nn.Module): 8 | def __init__(self, in_channels = [64,256,512,1024,2048],out_channels = 256,out_map=None,start_level = 0,end_level = None): 9 | super(Neck,self).__init__() 10 | self.in_channels = in_channels 11 | if isinstance(out_channels,int): 12 | out_channels = [out_channels for i in range(len(self.in_channels))] 13 | self.out_channels = out_channels 14 | 15 | #select the out map 16 | self.out_map = out_map 17 | self.start_level = start_level 18 | self.end_level = end_level 19 | self.normalize = {'type':'BN'} 20 | if self.end_level is None: 21 | self.end_level = len(self.in_channels) 22 | 23 | if self.start_level<0 or 
self.end_level>len(self.in_channels):
24 |             raise ValueError("start_level or end_level is out of range")
25 | 
26 | 
27 |         self.lateral_convs = nn.ModuleList()
28 |         self.fpn_convs = nn.ModuleList()
29 | 
30 |         for i in range(self.start_level, self.end_level):
31 |             l_conv = ConvModule(
32 |                 self.in_channels[i],
33 |                 self.out_channels[i],
34 |                 1,
35 |                 normalize=self.normalize,
36 |                 bias=False,
37 |                 inplace=False)
38 |             fpn_conv = ConvModule(
39 |                 out_channels[i],
40 |                 out_channels[i],
41 |                 3,
42 |                 padding=1,
43 |                 normalize=self.normalize,
44 |                 bias=False,
45 |                 inplace=True)
46 | 
47 |             self.lateral_convs.append(l_conv)
48 |             self.fpn_convs.append(fpn_conv)
49 | 
50 |         self.init_weights()
51 | 
52 | 
53 |     def init_weights(self):
54 |         for m in self.modules():
55 |             if isinstance(m, nn.Conv2d):
56 |                 nn.init.xavier_uniform_(m.weight)
57 | 
58 |     def forward(self,inputs):
59 |         assert len(inputs) == len(self.in_channels)
60 | 
61 |         # build laterals
62 |         laterals = [
63 |             lateral_conv(inputs[i + self.start_level])
64 |             for i, lateral_conv in enumerate(self.lateral_convs)
65 |         ]
66 | 
67 |         # build top-down path
68 |         used_backbone_levels = len(laterals)
69 |         for i in range(used_backbone_levels - 1, 0, -1):
70 | 
71 |             laterals[i - 1] += F.interpolate(
72 |                 laterals[i], size = laterals[i-1].shape[2:], mode='nearest')
73 |         # build outputs
74 |         # part 1: from original levels
75 |         outs = [
76 |             self.fpn_convs[i](laterals[i]) for i in range(used_backbone_levels)
77 |         ]
78 |         if self.out_map is not None:
79 |             outs = outs[self.out_map]
80 |         '''
81 |         for i in range(len(outs)):
82 |             print(outs[i].shape)
83 |         '''
84 |         return tuple(outs)
85 | 
86 | 
87 | 
--------------------------------------------------------------------------------
/model/neck/ssd_neck.py:
--------------------------------------------------------------------------------
1 | import torch.nn as nn
2 | import torch.nn.functional as F
3 | from ..utils import ConvModule
4 | 
5 | 
6 | class SSDNeck(nn.Module):
7 |     def __init__(self, in_channels = [1024,2048], out_channels = 256, out_map=None, start_level = 0, end_level = None):
8 |         super(SSDNeck,self).__init__()
9 |         self.in_channels = in_channels
10 |         if isinstance(out_channels,int):
11 |             out_channels = [out_channels for i in range(len(self.in_channels))]
12 |         self.out_channels = out_channels
13 | 
14 |         #select the out map
15 |         self.out_map = out_map
16 |         self.start_level = start_level
17 |         self.end_level = end_level
18 |         self.normalize = {'type':'BN'}
19 |         if self.end_level is None:
20 |             self.end_level = len(self.out_channels)
21 | 
22 | 
23 |         self.fpn_convs = nn.ModuleList()
24 | 
25 |         for i in range(self.start_level, self.end_level):
26 |             if i == 0:
27 |                 fpn_conv = ConvModule(
28 |                     in_channels[-1],
29 |                     out_channels[0],
30 |                     3,
31 |                     stride = 2,
32 |                     padding=1,
33 |                     normalize=self.normalize,
34 |                     bias=True,
35 |                     inplace=True)
36 |             else:
37 |                 fpn_conv = ConvModule(
38 |                     out_channels[i-1],
39 |                     out_channels[i],
40 |                     3,
41 |                     stride = 1,
42 |                     padding=0,
43 |                     normalize=self.normalize,
44 |                     bias=True,
45 |                     inplace=True)
46 | 
47 |             self.fpn_convs.append(fpn_conv)
48 | 
49 |         self.init_weights()
50 |     def init_weights(self):
51 |         for m in self.modules():
52 |             if isinstance(m, nn.Conv2d):
53 |                 nn.init.xavier_uniform_(m.weight)
54 | 
55 |     def forward(self,inputs):
56 |         assert len(inputs) == len(self.in_channels)
57 | 
58 |         outs = []
59 |         # build outputs
60 |         # part 1: from original levels
61 | 
62 |         x = inputs[-1]
63 |         outs += inputs
64 |         for i in range(self.start_level, self.end_level):
65 |             x = self.fpn_convs[i](x)
66 |             outs.append(x)
67 |         if 
self.out_map is not None: 68 | outs = outs[self.out_map] 69 | ''' 70 | for i in range(len(outs)): 71 | print(outs[i].shape) 72 | ''' 73 | return tuple(outs) 74 | 75 | 76 | 77 | -------------------------------------------------------------------------------- /model/utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .conv_module import ConvModule 2 | from .norm import build_norm_layer 3 | from .weight_init import (xavier_init, normal_init, uniform_init, kaiming_init, 4 | bias_init_with_prob) 5 | 6 | __all__ = [ 7 | 'ConvModule', 'build_norm_layer', 'xavier_init', 'normal_init', 8 | 'uniform_init', 'kaiming_init', 'bias_init_with_prob' 9 | ] 10 | -------------------------------------------------------------------------------- /model/utils/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/utils/__pycache__/__init__.cpython-35.pyc -------------------------------------------------------------------------------- /model/utils/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/utils/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /model/utils/__pycache__/conv_module.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/utils/__pycache__/conv_module.cpython-35.pyc -------------------------------------------------------------------------------- /model/utils/__pycache__/conv_module.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/utils/__pycache__/conv_module.cpython-36.pyc -------------------------------------------------------------------------------- /model/utils/__pycache__/norm.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/utils/__pycache__/norm.cpython-35.pyc -------------------------------------------------------------------------------- /model/utils/__pycache__/norm.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/utils/__pycache__/norm.cpython-36.pyc -------------------------------------------------------------------------------- /model/utils/__pycache__/weight_init.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/utils/__pycache__/weight_init.cpython-35.pyc -------------------------------------------------------------------------------- /model/utils/__pycache__/weight_init.cpython-36.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/model/utils/__pycache__/weight_init.cpython-36.pyc -------------------------------------------------------------------------------- /model/utils/conv_module.py: -------------------------------------------------------------------------------- 1 | import warnings 2 | 3 | import torch.nn as nn 4 | from mmcv.cnn import kaiming_init, constant_init 5 | 6 | from .norm import build_norm_layer 7 | 8 | 9 | class ConvModule(nn.Module): 10 | 11 | def __init__(self, 12 | in_channels, 13 | out_channels, 14 | kernel_size, 15 | stride=1, 16 | padding=0, 17 | dilation=1, 18 | groups=1, 19 | bias=True, 20 | normalize=None, 21 | activation='relu', 22 | inplace=True, 23 | activate_last=True): 24 | super(ConvModule, self).__init__() 25 | self.with_norm = normalize is not None 26 | self.with_activatation = activation is not None 27 | self.with_bias = bias 28 | self.activation = activation 29 | self.activate_last = activate_last 30 | 31 | if self.with_norm and self.with_bias: 32 | warnings.warn('ConvModule has norm and bias at the same time') 33 | 34 | self.conv = nn.Conv2d( 35 | in_channels, 36 | out_channels, 37 | kernel_size, 38 | stride, 39 | padding, 40 | dilation, 41 | groups, 42 | bias=bias) 43 | 44 | self.in_channels = self.conv.in_channels 45 | self.out_channels = self.conv.out_channels 46 | self.kernel_size = self.conv.kernel_size 47 | self.stride = self.conv.stride 48 | self.padding = self.conv.padding 49 | self.dilation = self.conv.dilation 50 | self.transposed = self.conv.transposed 51 | self.output_padding = self.conv.output_padding 52 | self.groups = self.conv.groups 53 | 54 | if self.with_norm: 55 | # norm after conv or not 56 | norm_channels = out_channels if self.activate_last else in_channels 57 | self.norm_name, norm = build_norm_layer(normalize, norm_channels) 58 | self.add_module(self.norm_name, norm) 59 | 60 | if self.with_activatation: 61 | assert activation in ['relu'], 'Only ReLU supported.' 62 | if self.activation == 'relu': 63 | self.activate = nn.ReLU(inplace=inplace) 64 | 65 | @property 66 | def norm(self): 67 | return getattr(self, self.norm_name) 68 | 69 | def forward(self, x, activate=True, norm=True): 70 | if self.activate_last: 71 | x = self.conv(x) 72 | if norm and self.with_norm: 73 | x = self.norm(x) 74 | if activate and self.with_activatation: 75 | x = self.activate(x) 76 | else: 77 | if norm and self.with_norm: 78 | x = self.norm(x) 79 | if activate and self.with_activatation: 80 | x = self.activate(x) 81 | x = self.conv(x) 82 | return x 83 | -------------------------------------------------------------------------------- /model/utils/norm.py: -------------------------------------------------------------------------------- 1 | import torch.nn as nn 2 | 3 | 4 | norm_cfg = { 5 | # format: layer_type: (abbreviation, module) 6 | 'BN': ('bn', nn.BatchNorm2d), 7 | 'SyncBN': ('bn', None), 8 | 'GN': ('gn', nn.GroupNorm), 9 | # and potentially 'SN' 10 | } 11 | 12 | 13 | def build_norm_layer(cfg, num_features, postfix=''): 14 | """ Build normalization layer 15 | 16 | Args: 17 | cfg (dict): cfg should contain: 18 | type (str): identify norm layer type. 19 | layer args: args needed to instantiate a norm layer. 20 | frozen (bool): [optional] whether stop gradient updates 21 | of norm layer, it is helpful to set frozen mode 22 | in backbone's norms. 23 | num_features (int): number of channels from input 24 | postfix (int, str): appended into norm abbreation to 25 | create named layer. 
26 | 
27 |     Returns:
28 |         name (str): abbreviation + postfix
29 |         layer (nn.Module): created norm layer
30 |     """
31 |     assert isinstance(cfg, dict) and 'type' in cfg
32 |     cfg_ = cfg.copy()
33 | 
34 |     layer_type = cfg_.pop('type')
35 |     if layer_type not in norm_cfg:
36 |         raise KeyError('Unrecognized norm type {}'.format(layer_type))
37 |     else:
38 |         abbr, norm_layer = norm_cfg[layer_type]
39 |         if norm_layer is None:
40 |             raise NotImplementedError
41 | 
42 |     assert isinstance(postfix, (int, str))
43 |     name = abbr + str(postfix)
44 | 
45 |     frozen = cfg_.pop('frozen', False)
46 |     cfg_.setdefault('eps', 1e-5)
47 |     if layer_type != 'GN':
48 |         layer = norm_layer(num_features, **cfg_)
49 |     else:
50 |         assert 'num_groups' in cfg_
51 |         layer = norm_layer(num_channels=num_features, **cfg_)
52 | 
53 |     if frozen:
54 |         for param in layer.parameters():
55 |             param.requires_grad = False
56 | 
57 |     return name, layer
58 | 
--------------------------------------------------------------------------------
/model/utils/weight_init.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch.nn as nn
3 | 
4 | 
5 | def xavier_init(module, gain=1, bias=0, distribution='normal'):
6 |     assert distribution in ['uniform', 'normal']
7 |     if distribution == 'uniform':
8 |         nn.init.xavier_uniform_(module.weight, gain=gain)
9 |     else:
10 |         nn.init.xavier_normal_(module.weight, gain=gain)
11 |     if hasattr(module, 'bias') and module.bias is not None:
12 |         nn.init.constant_(module.bias, bias)
13 | 
14 | 
15 | def normal_init(module, mean=0, std=1, bias=0):
16 |     nn.init.normal_(module.weight, mean, std)
17 |     if hasattr(module, 'bias') and module.bias is not None:
18 |         nn.init.constant_(module.bias, bias)
19 | 
20 | 
21 | def uniform_init(module, a=0, b=1, bias=0):
22 |     nn.init.uniform_(module.weight, a, b)
23 |     if hasattr(module, 'bias') and module.bias is not None:
24 |         nn.init.constant_(module.bias, bias)
25 | 
26 | 
27 | def kaiming_init(module,
28 |                  mode='fan_out',
29 |                  nonlinearity='relu',
30 |                  bias=0,
31 |                  distribution='normal'):
32 |     assert distribution in ['uniform', 'normal']
33 |     if distribution == 'uniform':
34 |         nn.init.kaiming_uniform_(
35 |             module.weight, mode=mode, nonlinearity=nonlinearity)
36 |     else:
37 |         nn.init.kaiming_normal_(
38 |             module.weight, mode=mode, nonlinearity=nonlinearity)
39 |     if hasattr(module, 'bias') and module.bias is not None:
40 |         nn.init.constant_(module.bias, bias)
41 | 
42 | 
43 | def bias_init_with_prob(prior_prob):
44 |     """ initialize conv/fc bias value according to a given prior probability"""
45 |     bias_init = float(-np.log((1 - prior_prob) / prior_prob))
46 |     return bias_init
47 | 
--------------------------------------------------------------------------------
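# --- Editor's note (not part of the original file): a worked example. ---
# bias_init_with_prob() computes b = -log((1 - p) / p), the inverse sigmoid of a
# prior probability p (the focal-loss initialization trick). For p = 0.01:
#
#   b = -log(0.99 / 0.01) = -log(99) ≈ -4.595,  so sigmoid(b) ≈ 0.01
#
# i.e. a classification layer seeded this way starts out predicting "object"
# only about 1% of the time.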
/tools/ap.py:
--------------------------------------------------------------------------------
1 | """Adapted from:
2 |     @longcw faster_rcnn_pytorch: https://github.com/longcw/faster_rcnn_pytorch
3 |     @rbgirshick py-faster-rcnn https://github.com/rbgirshick/py-faster-rcnn
4 |     Licensed under The MIT License [see LICENSE for details]
5 | """
6 | 
7 | from __future__ import print_function
8 | import os
9 | import sys
10 | import time
11 | sys.path.append(os.getcwd())
12 | 
13 | import torch
14 | from data import *
15 | from config import crack,voc,coco,trafic
16 | from data import VOC_CLASSES as labelmap
17 | from model import build_ssd
18 | 
19 | import argparse
20 | import numpy as np
21 | import pickle
22 | 
23 | from tqdm import tqdm
24 | 
25 | if sys.version_info[0] == 2:
26 |     import xml.etree.cElementTree as ET
27 | else:
28 |     import xml.etree.ElementTree as ET
29 | 
30 | 
31 | def str2bool(v):
32 |     return v.lower() in ("yes", "true", "t", "1")
33 | 
34 | 
35 | parser = argparse.ArgumentParser(
36 |     description='Single Shot MultiBox Detector Evaluation')
37 | parser.add_argument('--trained_model',
38 |                     default='weights/ssd300_mAP_77.43_v2.pth', type=str,
39 |                     help='Trained state_dict file path to open')
40 | parser.add_argument('--save_folder', default='eval/', type=str,
41 |                     help='File path to save results')
42 | parser.add_argument('--confidence_threshold', default=0.01, type=float,
43 |                     help='Detection confidence threshold')
44 | parser.add_argument('--top_k', default=5, type=int,
45 |                     help='Further restrict the number of predictions to parse')
46 | parser.add_argument('--cuda', default=True, type=str2bool,
47 |                     help='Use cuda to train model')
48 | parser.add_argument('--voc_root', default=VOC_ROOT,
49 |                     help='Location of VOC root directory')
50 | parser.add_argument('--over_thresh', default=0.5, type=float,
51 |                     help='IoU overlap threshold for a detection to count as a true positive')
52 | args = parser.parse_args()
53 | 
54 | if not os.path.exists(args.save_folder):
55 |     os.mkdir(args.save_folder)
56 | 
57 | 
58 | 
59 | 
60 | 
61 | class Timer(object):
62 |     """A simple timer."""
63 |     def __init__(self):
64 |         self.total_time = 0.
65 |         self.calls = 0
66 |         self.start_time = 0.
67 |         self.diff = 0.
68 |         self.average_time = 0.
69 | 
70 |     def tic(self):
71 |         # using time.time instead of time.clock because time.clock
72 |         # does not normalize for multithreading
73 |         self.start_time = time.time()
74 | 
75 |     def toc(self, average=True):
76 |         self.diff = time.time() - self.start_time
77 |         self.total_time += self.diff
78 |         self.calls += 1
79 |         self.average_time = self.total_time / self.calls
80 |         if average:
81 |             return self.average_time
82 |         else:
83 |             return self.diff
84 | 
85 | 
86 | def parse_rec(filename):
87 |     """ Parse a PASCAL VOC xml file """
88 |     tree = ET.parse(filename)
89 |     objects = []
90 |     for obj in tree.findall('object'):
91 |         obj_struct = {}
92 |         obj_struct['name'] = obj.find('name').text
93 |         obj_struct['pose'] = obj.find('pose').text
94 |         obj_struct['truncated'] = int(obj.find('truncated').text)
95 |         obj_struct['difficult'] = int(obj.find('difficult').text)
96 |         bbox = obj.find('bndbox')
97 |         obj_struct['bbox'] = [int(bbox.find('xmin').text) - 1,
98 |                               int(bbox.find('ymin').text) - 1,
99 |                               int(bbox.find('xmax').text) - 1,
100 |                               int(bbox.find('ymax').text) - 1]
101 |         objects.append(obj_struct)
102 | 
103 |     return objects
104 | 
105 | 
106 | def get_output_dir(name, phase):
107 |     """Return the directory where experimental artifacts are placed.
108 |     If the directory does not exist, it is created.
109 |     A canonical path is built using the name from an imdb and a network
110 |     (if not None).
111 | """ 112 | filedir = os.path.join(name, phase) 113 | if not os.path.exists(filedir): 114 | os.makedirs(filedir) 115 | return filedir 116 | 117 | 118 | def get_voc_results_file_template(data_dir,image_set, cls): 119 | # VOCdevkit/VOC2007/results/det_test_aeroplane.txt 120 | filename = 'det_' + image_set + '_%s.txt' % (cls) 121 | filedir = os.path.join(data_dir, 'results') 122 | if not os.path.exists(filedir): 123 | os.makedirs(filedir) 124 | path = os.path.join(filedir, filename) 125 | return path 126 | 127 | 128 | def write_voc_results_file(data_dir,all_boxes, dataset ,set_type): 129 | for cls_ind, cls in enumerate(labelmap): 130 | #get any class to store the result 131 | filename = get_voc_results_file_template(data_dir,set_type, cls) 132 | with open(filename, 'wt') as f: 133 | for im_ind, index in enumerate(dataset.ids): 134 | dets = all_boxes[cls_ind+1][im_ind] 135 | if dets == []: 136 | continue 137 | # the VOCdevkit expects 1-based indices 138 | for k in range(dets.shape[0]): 139 | f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'. 140 | format(index[1], dets[k, -1], 141 | dets[k, 0] + 1, dets[k, 1] + 1, 142 | dets[k, 2] + 1, dets[k, 3] + 1)) 143 | 144 | 145 | def do_python_eval(output_dir, set_type,thresh, use_07=True): 146 | cachedir = os.path.join(output_dir, 'annotations_cache') 147 | imgsetpath = os.path.join(output_dir,'ImageSets', 148 | 'Main', '{:s}.txt') 149 | annopath = os.path.join(output_dir, 'Annotations', '%s.xml') 150 | if not os.path.isdir(cachedir): 151 | os.mkdir(cachedir) 152 | #print(cachedir) 153 | aps = [] 154 | # The PASCAL VOC metric changed in 2010 155 | use_07_metric = use_07 156 | print('VOC07 metric? ' + ('Yes' if use_07_metric else 'No')) 157 | 158 | for i, cls in enumerate(labelmap): 159 | filename = get_voc_results_file_template(output_dir,set_type, cls) 160 | rec, prec, ap = voc_eval( 161 | filename, annopath, imgsetpath.format(set_type), cls, cachedir, 162 | ovthresh=thresh, use_07_metric=use_07_metric) 163 | aps += [ap] 164 | print('AP for {} = {:.4f}'.format(cls, ap)) 165 | #with open(os.path.join(output_dir, cls + '_pr.pkl'), 'wb') as f: 166 | #pickle.dump({'rec': rec, 'prec': prec, 'ap': ap}, f) 167 | print('Mean AP = {:.4f}'.format(np.mean(aps))) 168 | print('~~~~~~~~') 169 | print('Results:') 170 | for ap in aps: 171 | print('{:.3f}'.format(ap)) 172 | print('{:.3f}'.format(np.mean(aps))) 173 | print('~~~~~~~~') 174 | print('') 175 | print('--------------------------------------------------------------') 176 | print('Results computed with the **unofficial** Python eval code.') 177 | print('Results should be very close to the official MATLAB eval code.') 178 | print('--------------------------------------------------------------') 179 | return np.mean(aps) 180 | 181 | def voc_ap(rec, prec, use_07_metric=True): 182 | """ ap = voc_ap(rec, prec, [use_07_metric]) 183 | Compute VOC AP given precision and recall. 184 | If use_07_metric is true, uses the 185 | VOC 07 11 point method (default:True). 186 | """ 187 | if use_07_metric: 188 | # 11 point metric 189 | ap = 0. 190 | for t in np.arange(0., 1.1, 0.1): 191 | if np.sum(rec >= t) == 0: 192 | p = 0 193 | else: 194 | p = np.max(prec[rec >= t]) 195 | ap = ap + p / 11. 
196 | else: 197 | # correct AP calculation 198 | # first append sentinel values at the end 199 | mrec = np.concatenate(([0.], rec, [1.])) 200 | mpre = np.concatenate(([0.], prec, [0.])) 201 | 202 | # compute the precision envelope 203 | for i in range(mpre.size - 1, 0, -1): 204 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 205 | 206 | # to calculate area under PR curve, look for points 207 | # where X axis (recall) changes value 208 | i = np.where(mrec[1:] != mrec[:-1])[0] 209 | 210 | # and sum (\Delta recall) * prec 211 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 212 | return ap 213 | 214 | 215 | def voc_eval(detpath, 216 | annopath, 217 | imagesetfile, 218 | classname, 219 | cachedir, 220 | ovthresh=0.5, 221 | use_07_metric=True): 222 | """rec, prec, ap = voc_eval(detpath, 223 | annopath, 224 | imagesetfile, 225 | classname, 226 | [ovthresh], 227 | [use_07_metric]) 228 | Top level function that does the PASCAL VOC evaluation. 229 | detpath: Path to detections 230 | detpath.format(classname) should produce the detection results file. 231 | annopath: Path to annotations 232 | annopath.format(imagename) should be the xml annotations file. 233 | imagesetfile: Text file containing the list of images, one image per line. 234 | classname: Category name (duh) 235 | cachedir: Directory for caching the annotations 236 | [ovthresh]: Overlap threshold (default = 0.5) 237 | [use_07_metric]: Whether to use VOC07's 11 point AP computation 238 | (default True) 239 | """ 240 | # assumes detections are in detpath.format(classname) 241 | # assumes annotations are in annopath.format(imagename) 242 | # assumes imagesetfile is a text file with each line an image name 243 | # cachedir caches the annotations in a pickle file 244 | # first load gt 245 | cachefile = os.path.join(cachedir, 'annots.pkl') 246 | # read list of images 247 | with open(imagesetfile, 'r') as f: 248 | lines = f.readlines() 249 | imagenames = [x.strip() for x in lines] 250 | # save the truth data as pickle,if the pickle in the file, just load it. 
251 |     if not os.path.isfile(cachefile):
252 |         # load annots
253 |         recs = {}
254 |         for i, imagename in enumerate(imagenames):
255 |             recs[imagename] = parse_rec(annopath % (imagename))
256 |         # save
257 |         print('Saving cached annotations to {:s}'.format(cachefile))
258 |         with open(cachefile, 'wb') as f:
259 |             pickle.dump(recs, f)
260 |     else:
261 |         # load
262 |         with open(cachefile, 'rb') as f:
263 |             recs = pickle.load(f)
264 | 
265 |     # extract gt objects for this class
266 |     class_recs = {}
267 |     npos = 0
268 |     for imagename in imagenames:
269 | 
270 |         R = [obj for obj in recs[imagename] if obj['name'] == classname]
271 |         bbox = np.array([x['bbox'] for x in R])
272 |         difficult = np.array([x['difficult'] for x in R]).astype(bool)
273 |         det = [False] * len(R)
274 |         npos = npos + sum(~difficult)
275 |         class_recs[imagename] = {'bbox': bbox,
276 |                                  'difficult': difficult,
277 |                                  'det': det}
278 | 
279 |     # read dets
280 |     detfile = detpath.format(classname)
281 |     with open(detfile, 'r') as f:
282 |         lines = f.readlines()
283 |     if any(lines):
284 | 
285 |         splitlines = [x.strip().split(' ') for x in lines]
286 |         image_ids = [x[0] for x in splitlines]
287 |         confidence = np.array([float(x[1]) for x in splitlines])
288 |         BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
289 | 
290 |         # sort by confidence
291 |         sorted_ind = np.argsort(-confidence)
292 |         sorted_scores = np.sort(-confidence)
293 |         BB = BB[sorted_ind, :]
294 |         image_ids = [image_ids[x] for x in sorted_ind]
295 | 
296 |         # go down dets and mark TPs and FPs
297 |         nd = len(image_ids)
298 |         tp = np.zeros(nd)
299 |         fp = np.zeros(nd)
300 |         for d in range(nd):
301 |             R = class_recs[image_ids[d]]
302 |             bb = BB[d, :].astype(float)
303 |             ovmax = -np.inf
304 |             BBGT = R['bbox'].astype(float)
305 |             if BBGT.size > 0:
306 |                 # compute overlaps
307 |                 # intersection
308 |                 ixmin = np.maximum(BBGT[:, 0], bb[0])
309 |                 iymin = np.maximum(BBGT[:, 1], bb[1])
310 |                 ixmax = np.minimum(BBGT[:, 2], bb[2])
311 |                 iymax = np.minimum(BBGT[:, 3], bb[3])
312 |                 iw = np.maximum(ixmax - ixmin, 0.)
313 |                 ih = np.maximum(iymax - iymin, 0.)
314 |                 inters = iw * ih
315 |                 uni = ((bb[2] - bb[0]) * (bb[3] - bb[1]) +
316 |                        (BBGT[:, 2] - BBGT[:, 0]) *
317 |                        (BBGT[:, 3] - BBGT[:, 1]) - inters)
318 |                 overlaps = inters / uni
319 |                 ovmax = np.max(overlaps)
320 |                 jmax = np.argmax(overlaps)
321 | 
322 |             if ovmax > ovthresh:
323 |                 if not R['difficult'][jmax]:
324 |                     if not R['det'][jmax]:
325 |                         tp[d] = 1.
326 |                         R['det'][jmax] = 1
327 |                     else:
328 |                         fp[d] = 1.
329 |             else:
330 |                 fp[d] = 1.
331 | 
332 |         # compute precision recall
333 |         fp = np.cumsum(fp)
334 |         tp = np.cumsum(tp)
335 |         rec = tp / float(npos)
336 |         # avoid divide by zero in case the first detection matches a difficult
337 |         # ground truth
338 |         prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
339 |         ap = voc_ap(rec, prec, use_07_metric)
340 |     else:
341 |         rec = -1.
342 |         prec = -1.
343 |         ap = -1.
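# --- Editor's note (not part of the original file): a worked voc_ap() example. ---
# Suppose one detection, a true positive, covers one of two ground-truth boxes:
#   rec = [0.5], prec = [1.0]
# The VOC07 11-point metric samples t in {0.0, 0.1, ..., 1.0}; the max precision
# is 1.0 at the six thresholds t <= 0.5 and 0 above, so AP = 6 * 1.0 / 11 ≈ 0.545.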
344 | 345 | return rec, prec, ap 346 | 347 | 348 | def test_net(save_folder, net, cuda, dataset, top_k,im_size=300, thresh=0.05): 349 | #the len of pic 350 | num_images = len(dataset) 351 | # all detections are collected into:[21,4952,0] 352 | # all_boxes[cls][image] = N x 5 array of detections in 353 | # (x1, y1, x2, y2, score) 354 | all_boxes = [[[] for _ in range(num_images)] 355 | for _ in range(len(labelmap)+1)] 356 | # timers 357 | _t = {'im_detect': Timer(), 'misc': Timer()} 358 | 359 | print(num_images) 360 | for i in tqdm(range(num_images)): 361 | with torch.no_grad(): 362 | im, gt, h, w = dataset.pull_item(i) 363 | x = im.unsqueeze(0) 364 | if args.cuda: 365 | x = x.cuda() 366 | _t['im_detect'].tic() 367 | detections = net(x,'test').data 368 | detect_time = _t['im_detect'].toc(average=False) 369 | 370 | # skip j = 0, because it's the background class 371 | for j in range(1, detections.size(1)): 372 | dets = detections[0, j, :] 373 | mask = dets[:, 0].gt(0.).expand(5, dets.size(0)).t() 374 | dets = torch.masked_select(dets, mask).view(-1, 5) 375 | if dets.size(0) == 0: 376 | continue 377 | boxes = dets[:, 1:] 378 | boxes[:, 0] *= w 379 | boxes[:, 2] *= w 380 | boxes[:, 1] *= h 381 | boxes[:, 3] *= h 382 | scores = dets[:, 0].cpu().numpy() 383 | cls_dets = np.hstack((boxes.cpu().numpy(), 384 | scores[:, np.newaxis])).astype(np.float32, 385 | copy=False) 386 | all_boxes[j][i] = cls_dets 387 | return all_boxes 388 | 389 | def evaluate_detections(data_dir,box_list,dataset, thresh, eval_type = 'test'): 390 | #write the det result to dir 391 | write_voc_results_file(data_dir,box_list, dataset, eval_type) 392 | return do_python_eval(data_dir,eval_type,thresh=thresh) 393 | 394 | 395 | if __name__ == '__main__': 396 | if torch.cuda.is_available(): 397 | if args.cuda: 398 | torch.set_default_tensor_type('torch.cuda.FloatTensor') 399 | if not args.cuda: 400 | print("WARNING: It looks like you have a CUDA device, but aren't using \ 401 | CUDA. 
Run with --cuda for optimal eval speed.")
402 |             torch.set_default_tensor_type('torch.FloatTensor')
403 |     else:
404 |         torch.set_default_tensor_type('torch.FloatTensor')
405 |     num_classes = len(labelmap) + 1                      # +1 for background
406 |     net = build_ssd('test', size = 300, cfg = voc)       # initialize SSD
407 |     net.load_state_dict(torch.load(args.trained_model))
408 | 
409 |     print('Finished loading model!')
410 |     # load data
411 |     dataset = VOCDetection(args.voc_root, [('2007', 'test')],
412 |                            BaseTransform(300, voc['mean'],voc['std']))
413 |     if args.cuda:
414 |         net = net.cuda()
415 |         torch.backends.cudnn.benchmark = False
416 |     net.eval()
417 | 
418 |     # evaluation
419 |     devkit_path = VOC_ROOT +'VOC2007/'
420 | 
421 |     all_boxes = test_net(args.save_folder, net, args.cuda, dataset,args.top_k, 300,
422 |                          thresh=args.confidence_threshold)
423 |     print('Evaluating detections')
424 |     results = []
425 |     for thresh in np.arange(0.5,1,0.05):
426 |         result = evaluate_detections(devkit_path,all_boxes, dataset,thresh, 'test')
427 |         results.append(result)
428 |     print(results[0], results[5], sum(results)/10)   # AP50, AP75, mean AP over IoU 0.50:0.95
429 | 
430 | 
--------------------------------------------------------------------------------
/tools/eval.py:
--------------------------------------------------------------------------------
1 | """Adapted from:
2 |     @longcw faster_rcnn_pytorch: https://github.com/longcw/faster_rcnn_pytorch
3 |     @rbgirshick py-faster-rcnn https://github.com/rbgirshick/py-faster-rcnn
4 |     Licensed under The MIT License [see LICENSE for details]
5 | """
6 | 
7 | from __future__ import print_function
8 | import os
9 | import sys
10 | import time
11 | sys.path.append(os.getcwd())
12 | 
13 | import torch
14 | from data import *
15 | from config import crack,voc,coco,trafic
16 | from data import VOC_CLASSES as labelmap
17 | from model import build_ssd
18 | 
19 | import argparse
20 | import numpy as np
21 | import pickle
22 | 
23 | from tqdm import tqdm
24 | 
25 | if sys.version_info[0] == 2:
26 |     import xml.etree.cElementTree as ET
27 | else:
28 |     import xml.etree.ElementTree as ET
29 | 
30 | 
31 | def str2bool(v):
32 |     return v.lower() in ("yes", "true", "t", "1")
33 | 
34 | 
35 | parser = argparse.ArgumentParser(
36 |     description='Single Shot MultiBox Detector Evaluation')
37 | parser.add_argument('--trained_model',
38 |                     default='weights/ssd300_mAP_77.43_v2.pth', type=str,
39 |                     help='Trained state_dict file path to open')
40 | parser.add_argument('--save_folder', default='eval/', type=str,
41 |                     help='File path to save results')
42 | parser.add_argument('--confidence_threshold', default=0.01, type=float,
43 |                     help='Detection confidence threshold')
44 | parser.add_argument('--top_k', default=5, type=int,
45 |                     help='Further restrict the number of predictions to parse')
46 | parser.add_argument('--cuda', default=True, type=str2bool,
47 |                     help='Use cuda to train model')
48 | parser.add_argument('--voc_root', default=VOC_ROOT,
49 |                     help='Location of VOC root directory')
50 | parser.add_argument('--over_thresh', default=0.5, type=float,
51 |                     help='IoU overlap threshold used when matching detections to ground truth')
52 | args = parser.parse_args()
53 | 
54 | if not os.path.exists(args.save_folder):
55 |     os.mkdir(args.save_folder)
56 | 
57 | 
58 | 
59 | 
60 | 
61 | class Timer(object):
62 |     """A simple timer."""
63 |     def __init__(self):
64 |         self.total_time = 0.
65 |         self.calls = 0
66 |         self.start_time = 0.
67 |         self.diff = 0.
68 |         self.average_time = 0.
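# --- Editor's note (not part of the original file): Timer usage sketch. ---
#   _t = Timer()
#   _t.tic(); detections = net(x, 'test'); dt = _t.toc(average=False)
# toc(average=True) would instead return the running mean over all tic/toc pairs.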
69 | 70 | def tic(self): 71 | # using time.time instead of time.clock because time time.clock 72 | # does not normalize for multithreading 73 | self.start_time = time.time() 74 | 75 | def toc(self, average=True): 76 | self.diff = time.time() - self.start_time 77 | self.total_time += self.diff 78 | self.calls += 1 79 | self.average_time = self.total_time / self.calls 80 | if average: 81 | return self.average_time 82 | else: 83 | return self.diff 84 | 85 | 86 | def parse_rec(filename): 87 | """ Parse a PASCAL VOC xml file """ 88 | tree = ET.parse(filename) 89 | objects = [] 90 | for obj in tree.findall('object'): 91 | obj_struct = {} 92 | obj_struct['name'] = obj.find('name').text 93 | obj_struct['pose'] = obj.find('pose').text 94 | obj_struct['truncated'] = int(obj.find('truncated').text) 95 | obj_struct['difficult'] = int(obj.find('difficult').text) 96 | bbox = obj.find('bndbox') 97 | obj_struct['bbox'] = [int(bbox.find('xmin').text) - 1, 98 | int(bbox.find('ymin').text) - 1, 99 | int(bbox.find('xmax').text) - 1, 100 | int(bbox.find('ymax').text) - 1] 101 | objects.append(obj_struct) 102 | 103 | return objects 104 | 105 | 106 | def get_output_dir(name, phase): 107 | """Return the directory where experimental artifacts are placed. 108 | If the directory does not exist, it is created. 109 | A canonical path is built using the name from an imdb and a network 110 | (if not None). 111 | """ 112 | filedir = os.path.join(name, phase) 113 | if not os.path.exists(filedir): 114 | os.makedirs(filedir) 115 | return filedir 116 | 117 | 118 | def get_voc_results_file_template(data_dir,image_set, cls): 119 | # VOCdevkit/VOC2007/results/det_test_aeroplane.txt 120 | filename = 'det_' + image_set + '_%s.txt' % (cls) 121 | filedir = os.path.join(data_dir, 'results') 122 | if not os.path.exists(filedir): 123 | os.makedirs(filedir) 124 | path = os.path.join(filedir, filename) 125 | return path 126 | 127 | 128 | def write_voc_results_file(data_dir,all_boxes, dataset ,set_type): 129 | for cls_ind, cls in enumerate(labelmap): 130 | #get any class to store the result 131 | filename = get_voc_results_file_template(data_dir,set_type, cls) 132 | with open(filename, 'wt') as f: 133 | for im_ind, index in enumerate(dataset.ids): 134 | dets = all_boxes[cls_ind+1][im_ind] 135 | if dets == []: 136 | continue 137 | # the VOCdevkit expects 1-based indices 138 | for k in range(dets.shape[0]): 139 | f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'. 140 | format(index[1], dets[k, -1], 141 | dets[k, 0] + 1, dets[k, 1] + 1, 142 | dets[k, 2] + 1, dets[k, 3] + 1)) 143 | 144 | 145 | def do_python_eval(output_dir, set_type,use_07=True): 146 | cachedir = os.path.join(output_dir, 'annotations_cache') 147 | imgsetpath = os.path.join(output_dir,'ImageSets', 148 | 'Main', '{:s}.txt') 149 | annopath = os.path.join(output_dir, 'Annotations', '%s.xml') 150 | if not os.path.isdir(cachedir): 151 | os.mkdir(cachedir) 152 | #print(cachedir) 153 | aps = [] 154 | # The PASCAL VOC metric changed in 2010 155 | use_07_metric = use_07 156 | print('VOC07 metric? 
' + ('Yes' if use_07_metric else 'No')) 157 | 158 | for i, cls in enumerate(labelmap): 159 | filename = get_voc_results_file_template(output_dir,set_type, cls) 160 | rec, prec, ap = voc_eval( 161 | filename, annopath, imgsetpath.format(set_type), cls, cachedir, 162 | ovthresh=args.over_thresh, use_07_metric=use_07_metric) 163 | aps += [ap] 164 | print('AP for {} = {:.4f}'.format(cls, ap)) 165 | #with open(os.path.join(output_dir, cls + '_pr.pkl'), 'wb') as f: 166 | #pickle.dump({'rec': rec, 'prec': prec, 'ap': ap}, f) 167 | print('Mean AP = {:.4f}'.format(np.mean(aps))) 168 | print('~~~~~~~~') 169 | print('Results:') 170 | for ap in aps: 171 | print('{:.3f}'.format(ap)) 172 | print('{:.3f}'.format(np.mean(aps))) 173 | print('~~~~~~~~') 174 | print('') 175 | print('--------------------------------------------------------------') 176 | print('Results computed with the **unofficial** Python eval code.') 177 | print('Results should be very close to the official MATLAB eval code.') 178 | print('--------------------------------------------------------------') 179 | return np.mean(aps) 180 | 181 | def voc_ap(rec, prec, use_07_metric=True): 182 | """ ap = voc_ap(rec, prec, [use_07_metric]) 183 | Compute VOC AP given precision and recall. 184 | If use_07_metric is true, uses the 185 | VOC 07 11 point method (default:True). 186 | """ 187 | if use_07_metric: 188 | # 11 point metric 189 | ap = 0. 190 | for t in np.arange(0., 1.1, 0.1): 191 | if np.sum(rec >= t) == 0: 192 | p = 0 193 | else: 194 | p = np.max(prec[rec >= t]) 195 | ap = ap + p / 11. 196 | else: 197 | # correct AP calculation 198 | # first append sentinel values at the end 199 | mrec = np.concatenate(([0.], rec, [1.])) 200 | mpre = np.concatenate(([0.], prec, [0.])) 201 | 202 | # compute the precision envelope 203 | for i in range(mpre.size - 1, 0, -1): 204 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 205 | 206 | # to calculate area under PR curve, look for points 207 | # where X axis (recall) changes value 208 | i = np.where(mrec[1:] != mrec[:-1])[0] 209 | 210 | # and sum (\Delta recall) * prec 211 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 212 | return ap 213 | 214 | 215 | def voc_eval(detpath, 216 | annopath, 217 | imagesetfile, 218 | classname, 219 | cachedir, 220 | ovthresh=0.5, 221 | use_07_metric=True): 222 | """rec, prec, ap = voc_eval(detpath, 223 | annopath, 224 | imagesetfile, 225 | classname, 226 | [ovthresh], 227 | [use_07_metric]) 228 | Top level function that does the PASCAL VOC evaluation. 229 | detpath: Path to detections 230 | detpath.format(classname) should produce the detection results file. 231 | annopath: Path to annotations 232 | annopath.format(imagename) should be the xml annotations file. 233 | imagesetfile: Text file containing the list of images, one image per line. 
234 |     classname: Category name (duh)
235 |     cachedir: Directory for caching the annotations
236 |     [ovthresh]: Overlap threshold (default = 0.5)
237 |     [use_07_metric]: Whether to use VOC07's 11 point AP computation
238 |         (default True)
239 |     """
240 |     # assumes detections are in detpath.format(classname)
241 |     # assumes annotations are in annopath.format(imagename)
242 |     # assumes imagesetfile is a text file with each line an image name
243 |     # cachedir caches the annotations in a pickle file
244 |     # first load gt
245 |     cachefile = os.path.join(cachedir, 'annots.pkl')
246 |     # read list of images
247 |     with open(imagesetfile, 'r') as f:
248 |         lines = f.readlines()
249 |     imagenames = [x.strip() for x in lines]
250 |     # cache the ground truth as a pickle; if the cache file already exists, just load it.
251 |     if not os.path.isfile(cachefile):
252 |         # load annots
253 |         recs = {}
254 |         for i, imagename in enumerate(imagenames):
255 |             recs[imagename] = parse_rec(annopath % (imagename))
256 |         # save
257 |         print('Saving cached annotations to {:s}'.format(cachefile))
258 |         with open(cachefile, 'wb') as f:
259 |             pickle.dump(recs, f)
260 |     else:
261 |         # load
262 |         with open(cachefile, 'rb') as f:
263 |             recs = pickle.load(f)
264 | 
265 |     # extract gt objects for this class
266 |     class_recs = {}
267 |     npos = 0
268 |     for imagename in imagenames:
269 | 
270 |         R = [obj for obj in recs[imagename] if obj['name'] == classname]
271 |         bbox = np.array([x['bbox'] for x in R])
272 |         difficult = np.array([x['difficult'] for x in R]).astype(bool)
273 |         det = [False] * len(R)
274 |         npos = npos + sum(~difficult)
275 |         class_recs[imagename] = {'bbox': bbox,
276 |                                  'difficult': difficult,
277 |                                  'det': det}
278 | 
279 |     # read dets
280 |     detfile = detpath.format(classname)
281 |     with open(detfile, 'r') as f:
282 |         lines = f.readlines()
283 |     if any(lines):
284 | 
285 |         splitlines = [x.strip().split(' ') for x in lines]
286 |         image_ids = [x[0] for x in splitlines]
287 |         confidence = np.array([float(x[1]) for x in splitlines])
288 |         BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
289 | 
290 |         # sort by confidence
291 |         sorted_ind = np.argsort(-confidence)
292 |         sorted_scores = np.sort(-confidence)
293 |         BB = BB[sorted_ind, :]
294 |         image_ids = [image_ids[x] for x in sorted_ind]
295 | 
296 |         # go down dets and mark TPs and FPs
297 |         nd = len(image_ids)
298 |         tp = np.zeros(nd)
299 |         fp = np.zeros(nd)
300 |         for d in range(nd):
301 |             R = class_recs[image_ids[d]]
302 |             bb = BB[d, :].astype(float)
303 |             ovmax = -np.inf
304 |             BBGT = R['bbox'].astype(float)
305 |             if BBGT.size > 0:
306 |                 # compute overlaps
307 |                 # intersection
308 |                 ixmin = np.maximum(BBGT[:, 0], bb[0])
309 |                 iymin = np.maximum(BBGT[:, 1], bb[1])
310 |                 ixmax = np.minimum(BBGT[:, 2], bb[2])
311 |                 iymax = np.minimum(BBGT[:, 3], bb[3])
312 |                 iw = np.maximum(ixmax - ixmin, 0.)
313 |                 ih = np.maximum(iymax - iymin, 0.)
314 |                 inters = iw * ih
315 |                 uni = ((bb[2] - bb[0]) * (bb[3] - bb[1]) +
316 |                        (BBGT[:, 2] - BBGT[:, 0]) *
317 |                        (BBGT[:, 3] - BBGT[:, 1]) - inters)
318 |                 overlaps = inters / uni
319 |                 ovmax = np.max(overlaps)
320 |                 jmax = np.argmax(overlaps)
321 | 
322 |             if ovmax > ovthresh:
323 |                 if not R['difficult'][jmax]:
324 |                     if not R['det'][jmax]:
325 |                         tp[d] = 1.
326 |                         R['det'][jmax] = 1
327 |                     else:
328 |                         fp[d] = 1.
329 |             else:
330 |                 fp[d] = 1.
331 | 332 | # compute precision recall 333 | fp = np.cumsum(fp) 334 | tp = np.cumsum(tp) 335 | rec = tp / float(npos) 336 | # avoid divide by zero in case the first detection matches a difficult 337 | # ground truth 338 | prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps) 339 | ap = voc_ap(rec, prec, use_07_metric) 340 | else: 341 | rec = -1. 342 | prec = -1. 343 | ap = -1. 344 | 345 | return rec, prec, ap 346 | 347 | 348 | def test_net(save_folder, net, cuda, dataset, top_k,im_size=300, thresh=0.05): 349 | #the len of pic 350 | num_images = len(dataset) 351 | # all detections are collected into:[21,4952,0] 352 | # all_boxes[cls][image] = N x 5 array of detections in 353 | # (x1, y1, x2, y2, score) 354 | all_boxes = [[[] for _ in range(num_images)] 355 | for _ in range(len(labelmap)+1)] 356 | # timers 357 | _t = {'im_detect': Timer(), 'misc': Timer()} 358 | 359 | print(num_images) 360 | for i in tqdm(range(num_images)): 361 | with torch.no_grad(): 362 | im, gt, h, w = dataset.pull_item(i) 363 | x = im.unsqueeze(0) 364 | if args.cuda: 365 | x = x.cuda() 366 | _t['im_detect'].tic() 367 | detections = net(x,'test').data 368 | detect_time = _t['im_detect'].toc(average=False) 369 | 370 | # skip j = 0, because it's the background class 371 | for j in range(1, detections.size(1)): 372 | dets = detections[0, j, :] 373 | mask = dets[:, 0].gt(0.).expand(5, dets.size(0)).t() 374 | dets = torch.masked_select(dets, mask).view(-1, 5) 375 | if dets.size(0) == 0: 376 | continue 377 | boxes = dets[:, 1:] 378 | boxes[:, 0] *= w 379 | boxes[:, 2] *= w 380 | boxes[:, 1] *= h 381 | boxes[:, 3] *= h 382 | scores = dets[:, 0].cpu().numpy() 383 | cls_dets = np.hstack((boxes.cpu().numpy(), 384 | scores[:, np.newaxis])).astype(np.float32, 385 | copy=False) 386 | all_boxes[j][i] = cls_dets 387 | return all_boxes 388 | 389 | def evaluate_detections(data_dir,box_list,dataset,eval_type = 'test'): 390 | #write the det result to dir 391 | write_voc_results_file(data_dir,box_list, dataset, eval_type) 392 | return do_python_eval(data_dir,eval_type) 393 | 394 | 395 | if __name__ == '__main__': 396 | if torch.cuda.is_available(): 397 | if args.cuda: 398 | torch.set_default_tensor_type('torch.cuda.FloatTensor') 399 | if not args.cuda: 400 | print("WARNING: It looks like you have a CUDA device, but aren't using \ 401 | CUDA. 
Run with --cuda for optimal eval speed.")
402 |             torch.set_default_tensor_type('torch.FloatTensor')
403 |     else:
404 |         torch.set_default_tensor_type('torch.FloatTensor')
405 |     num_classes = len(labelmap) + 1                      # +1 for background
406 |     net = build_ssd('test', size = 300, cfg = voc)       # initialize SSD
407 |     net.load_state_dict(torch.load(args.trained_model))
408 | 
409 |     print('Finished loading model!')
410 |     # load data
411 |     dataset = VOCDetection(args.voc_root, [('2007', 'test')],
412 |                            BaseTransform(300, voc['mean'],voc['std']))
413 |     if args.cuda:
414 |         net = net.cuda()
415 |         #torch.backends.cudnn.benchmark = True
416 |     net.eval()
417 | 
418 |     # evaluation
419 |     devkit_path = VOC_ROOT +'VOC2007/'
420 | 
421 |     all_boxes = test_net(args.save_folder, net, args.cuda, dataset,args.top_k, 300,
422 |                          thresh=args.confidence_threshold)
423 |     print('Evaluating detections')
424 |     result = evaluate_detections(devkit_path,all_boxes, dataset,'test')
425 | 
--------------------------------------------------------------------------------
/tools/test.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import sys
3 | import os
4 | sys.path.append(os.getcwd())
5 | import PIL
6 | from PIL import Image,ImageDraw,ImageFont
7 | import argparse
8 | import torch
9 | import torch.nn as nn
10 | import torchvision.transforms as transforms
11 | from data import VOC_CLASSES as labelmap
12 | from PIL import Image
13 | from data import VOCAnnotationTransform, VOCDetection, BaseTransform, VOC_CLASSES,CRACK_CLASSES,CRACKDetection
14 | import torch.utils.data as data
15 | from model import build_ssd
16 | from data import *
17 | from tqdm import tqdm
18 | import pandas as pd
19 | from matplotlib import pyplot as plot
20 | import numpy as np
21 | import matplotlib.pyplot as plt
22 | from config import voc
23 | parser = argparse.ArgumentParser(description='Single Shot MultiBox Detection')
24 | parser.add_argument('--trained_model', default='weights/ssd300_COCO_14100_0.9832380952380952.pth',
25 |                     type=str, help='Trained state_dict file path to open')
26 | parser.add_argument('--save_folder', default='result/', type=str,
27 |                     help='Dir to save results')
28 | parser.add_argument('--visual_threshold', default=0.6, type=float,
29 |                     help='Final confidence threshold')
30 | parser.add_argument('--cuda', default=True, type=lambda v: str(v).lower() in ('yes', 'true', 't', '1'),
31 |                     help='Use cuda to run the model (argparse type=bool would treat "False" as True)')
32 | parser.add_argument('--voc_root', default=VOC_ROOT, help='Location of VOC root directory')
33 | parser.add_argument('--visbox', default=False, type=lambda v: str(v).lower() in ('yes', 'true', 't', '1'), help="vis the boxes")
34 | args = parser.parse_args()
35 | 
36 | 
37 | 
38 | def vis_image(img, ax=None):
39 |     """Visualize a color image.
40 | 
41 |     Args:
42 |         img (~numpy.ndarray): An array of shape :math:`(height, width, 3)`.
43 |             This is in RGB format and the range of its value is
44 |             :math:`[0, 255]`.
45 |         ax (matplotlib.axes.Axis): The visualization is displayed on this
46 |             axis. If this is :obj:`None` (default), a new axis is created.
47 | 
48 |     Returns:
49 |         ~matplotlib.axes.Axes:
50 |         Returns the Axes object with the plot for further tweaking.
51 | 
52 |     """
53 |     #print(img.shape)
54 |     if ax is None:
55 |         fig = plot.figure()
56 |         ax = fig.add_subplot(1, 1, 1)
57 |     # img already arrives HWC, so the CHW -> HWC transpose stays disabled
58 |     #img = np.transpose(img,(1, 2, 0))
59 |     ax.imshow(img.astype(np.uint8))
60 |     return ax
61 | 
62 | 
63 | def vis_bbox(img, bbox, label=None, score=None, ax=None):
64 |     """Visualize bounding boxes inside image.
65 | 
66 |     Args:
67 |         img (~numpy.ndarray): An array of shape :math:`(height, width, 3)`.
68 |             This is in RGB format and the range of its value is
69 |             :math:`[0, 255]`.
70 |         bbox (~numpy.ndarray): An array of shape :math:`(R, 4)`, where
71 |             :math:`R` is the number of bounding boxes in the image.
72 |             Each element is organized
73 |             by :math:`(x_{min}, y_{min}, x_{max}, y_{max})` in the second axis.
74 |         label (~numpy.ndarray): An integer array of shape :math:`(R,)`.
75 |             The values correspond to id for label names stored in
76 |             :obj:`label_names`. This is optional.
77 |         score (~numpy.ndarray): A float array of shape :math:`(R,)`.
78 |             Each value indicates how confident the prediction is.
79 |             This is optional.
80 |         label_names (iterable of strings): Name of labels ordered according
81 |             to label ids. If this is :obj:`None`, labels will be skipped.
82 |         ax (matplotlib.axes.Axis): The visualization is displayed on this
83 |             axis. If this is :obj:`None` (default), a new axis is created.
84 | 
85 |     Returns:
86 |         ~matplotlib.axes.Axes:
87 |         Returns the Axes object with the plot for further tweaking.
88 | 
89 |     """
90 |     #label_names = ['neg','bg']
91 |     label_names = list(labelmap) + ['bg']
92 |     # add for index `-1`
93 |     if label is not None and not len(bbox) == len(label):
94 |         raise ValueError('The length of label must be same as that of bbox')
95 |     if score is not None and not len(bbox) == len(score):
96 |         raise ValueError('The length of score must be same as that of bbox')
97 | 
98 |     # Returns newly instantiated matplotlib.axes.Axes object if ax is None
99 |     ax = vis_image(img, ax=ax)
100 |     # If there is no bounding box to display, visualize the image and exit.
101 |     if len(bbox) == 0:
102 |         return ax
103 | 
104 |     for i, bb in enumerate(bbox):
105 |         xy = (bb[0], bb[1])
106 |         width = bb[2] - bb[0]
107 |         height = bb[3] - bb[1]
108 | 
109 |         ax.add_patch(plot.Rectangle(
110 |             xy, width, height, fill=False, edgecolor='red', linewidth=2))
111 |         #plt.text(bb[0],bb[1],score,family='fantasy',fontsize=36,style='italic',color='mediumvioletred')
112 |         caption = list()
113 | 
114 |         if label is not None and label_names is not None:
115 |             lb = label[i]
116 |             if not (-1 <= lb < len(label_names)):  # modified here to allow a background index
117 |                 raise ValueError('No corresponding name is given')
118 |             caption.append(label_names[lb])
119 |         if score is not None:
120 |             sc = score[i]
121 |             caption.append('{:.2f}'.format(sc))
122 | 
123 |         if len(caption) > 0:
124 |             ax.text(bb[0], bb[1],
125 |                     ': '.join(caption),
126 |                     style='italic',
127 |                     bbox={'facecolor': 'white', 'alpha': 0.5, 'pad': 0})
128 |     return ax
129 | 
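# --- Editor's note (not part of the original file): a minimal vis_bbox sketch. ---
# Assuming `img` is an HxWx3 uint8 RGB array and boxes are absolute pixel
# (x_min, y_min, x_max, y_max) coordinates:
#
#   ax = vis_bbox(img, np.array([[40., 30., 120., 200.]]),
#                 label=np.array([0]), score=np.array([0.92]))
#   plt.show()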
130 | def test_net(save_folder, net, cuda, testset, transform, thresh):
131 |     # dump predictions and assoc. ground truth to text file for now
132 |     filename = save_folder+'test.txt'
133 |     num_images = len(testset)
134 |     for i in range(num_images):
135 |         print('Testing image {:d}/{:d}....'.format(i+1, num_images))
136 |         img = testset.pull_image(i)
137 |         img_id, annotation = testset.pull_anno(i)
138 |         x = torch.from_numpy(transform(img)[0]).permute(2, 0, 1)
139 |         x = x.unsqueeze(0)
140 | 
141 |         with open(filename, mode='a') as f:
142 |             f.write('\nGROUND TRUTH FOR: '+img_id+'\n')
143 |             for box in annotation:
144 |                 f.write('label: '+' || '.join(str(b) for b in box)+'\n')
145 |         if cuda:
146 |             x = x.cuda()
147 | 
148 |         y = net(x,'test')      # forward pass
149 |         detections = y.data    # [batch_size,num_class,top_k,conf+loc] = [1,21,200,5]
150 |         # scale each detection back up to the image
151 |         scale = torch.Tensor([img.shape[1], img.shape[0],
152 |                               img.shape[1], img.shape[0]])
153 |         if args.visbox:
154 |             boxs = detections[0,1:,:,:]
155 |             #print(boxs.shape)
156 |             boxs = boxs[boxs[:,:,0]>args.visual_threshold]
157 |             #print(boxs.shape)
158 |             for t in range(21):
159 |                 boxes = detections[0,t,:,:]
160 |                 for gg in range(200):
161 |                     if boxes[gg,0]>=args.visual_threshold:
162 |                         tt = boxes[gg,:]
163 |                         print(tt)
164 |                         with open(r'/mnt/home/test_ciou.txt','a') as f1:
165 |                             f1.write(str(i))
166 |                             f1.write(' ')
167 |                             f1.write(str(t))
168 |                             f1.write(' ')
169 |                             f1.write(str(tt))
170 |                             f1.write('\n')
171 |                 continue
172 |             #print(boxs)
173 |             if boxs.shape[0] != 0:
174 |                 boxs= boxs[:,1:]
175 |                 vis_bbox(np.array(img),boxs*scale)
176 |                 #x1=boxs[:,0]
177 |                 #y2=boxs[:,1]
178 |                 #x2=boxs[:,2]
179 |                 #y2=boxs[:,3]
180 |                 #print(y2)
181 |                 #r=boxs.shape
182 |                 #print(r[0])
183 |                 #plt.text(bb[0],bb[1],score,family='fantasy',fontsize=36,style='italic',color='mediumvioletred')
184 |             plt.axis('off')
185 |             plt.savefig('/mnt/home/ciou/%d.png'%(i))
186 |             plot.show()
187 |         pred_num = 0
188 |         for i in range(detections.size(1)):
189 |             j = 0
190 | 
191 |             while j < detections.size(2) and detections[0, i, j, 0] >= thresh:
192 |                 if pred_num == 0:
193 |                     with open(filename, mode='a') as f:
194 |                         f.write('PREDICTIONS: '+'\n')
195 | 
196 | 
197 |                 score = detections[0, i, j, 0]
198 |                 label_name = labelmap[i-1]
199 |                 pt = (detections[0, i, j, 1:]*scale).cpu().numpy()
200 |                 coords = (pt[0], pt[1], pt[2], pt[3])
201 | 
202 |                 pred_num += 1
203 |                 with open(filename, mode='a') as f:
204 |                     f.write(str(pred_num)+' label: '+label_name+' score: ' +
205 |                             str(score) + ' '+' || '.join(str(c) for c in coords) + '\n')
206 |                 j += 1
207 | 
208 | 
209 | def test_voc():
210 |     # load net
211 |     net = build_ssd('test', 300, voc)    # initialize SSD
212 |     net.load_state_dict(torch.load(args.trained_model))
213 |     net.eval()
214 |     print('Finished loading model!')
215 |     # load data
216 |     testset = VOCDetection(args.voc_root, [('2007', 'test')],
217 |                            BaseTransform(300, voc['mean'],voc['std']))
218 |     if args.cuda:
219 |         net = net.cuda()
220 |         #torch.backends.cudnn.benchmark = True
221 |     # evaluation
222 |     test_net(args.save_folder, net, args.cuda, testset,
223 |              BaseTransform(300, voc['mean'],voc['std']),
224 |              thresh=args.visual_threshold)
225 | 
226 | 
227 | 
228 | if __name__ == '__main__':
229 | 
230 |     if args.cuda and torch.cuda.is_available():
231 |         torch.set_default_tensor_type('torch.cuda.FloatTensor')
232 |     else:
233 |         torch.set_default_tensor_type('torch.FloatTensor')
234 | 
235 |     if not os.path.exists(args.save_folder):
236 |         os.mkdir(args.save_folder)
237 |     test_voc()
238 | 
--------------------------------------------------------------------------------
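# --- Editor's note (not part of the original sources): reading Detect output. ---
# A sketch of how the [1, num_classes, top_k, 5] tensor walked by test_net()
# above is unpacked; `scale` maps normalized boxes back to pixel coordinates:
#
#   score = detections[0, cls, k, 0]
#   box   = (detections[0, cls, k, 1:] * scale).cpu().numpy()   # (x1, y1, x2, y2)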
/tools/train.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | sys.path.append(os.getcwd())
4 | from model import build_ssd
5 | from data import *
6 | from config import crack,voc,coco,trafic
7 | from utils import MultiBoxLoss
8 | 
9 | 
10 | import time
11 | import torch
12 | import torch.nn as nn
13 | import torch.optim as optim
14 | import torch.backends.cudnn as cudnn
15 | import torch.nn.init as init
16 | import torch.utils.data as data
17 | 
18 | import argparse
19 | from tqdm import tqdm
20 | 
21 | 
22 | def str2bool(v):
23 |     return v.lower() in ("yes", "true", "t", "1")
24 | '''
25 | from eval import test_net
26 | '''
27 | parser = argparse.ArgumentParser(description=
28 |     'Single Shot MultiBox Detector Training With Pytorch')
29 | train_set = parser.add_mutually_exclusive_group()
30 | parser.add_argument('--dataset', default='VOC', choices=['VOC', 'COCO','CRACK','TRAFIC'],
31 |                     type=str, help='Dataset to train on: VOC, COCO, CRACK or TRAFIC')
32 | parser.add_argument('--basenet', default=None,#'vgg16_reducedfc.pth',
33 |                     help='Pretrained base model')
34 | parser.add_argument('--batch_size', default=32, type=int,
35 |                     help='Batch size for training')
36 | parser.add_argument('--max_epoch', default=232, type=int,
37 |                     help='Max Epoch for training')
38 | parser.add_argument('--resume', default=None, type=str,
39 |                     help='Checkpoint state_dict file to resume training from')
40 | parser.add_argument('--start_iter', default=0, type=int,
41 |                     help='Resume training at this iter')
42 | parser.add_argument('--num_workers', default=4, type=int,
43 |                     help='Number of workers used in dataloading')
44 | parser.add_argument('--cuda', default=True, type=str2bool,
45 |                     help='Use CUDA to train model')
46 | parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float,
47 |                     help='initial learning rate')
48 | parser.add_argument('--momentum', default=0.9, type=float,
49 |                     help='Momentum value for optim')
50 | parser.add_argument('--weight_decay', default=5e-4, type=float,
51 |                     help='Weight decay for SGD')
52 | parser.add_argument('--gamma', default=0.1, type=float,
53 |                     help='Gamma update for SGD')
54 | parser.add_argument('--visdom', default='VOC',type=str,
55 |                     help='Enable visdom loss visualization (any non-empty string turns it on)')
56 | parser.add_argument('--work_dir', default='work_dir/',
57 |                     help='Directory for saving checkpoint models')
58 | 
59 | parser.add_argument('--weight', default=5, type=int)
60 | args = parser.parse_args()
61 | 
62 | weight = args.weight
63 | 
64 | if torch.cuda.is_available():
65 |     if args.cuda:
66 |         torch.set_default_tensor_type('torch.cuda.FloatTensor')
67 |     if not args.cuda:
68 |         print("WARNING: It looks like you have a CUDA device, but aren't " +
69 |               "using CUDA.\nRun with --cuda for optimal training speed.")
70 |         torch.set_default_tensor_type('torch.FloatTensor')
71 | else:
72 |     torch.set_default_tensor_type('torch.FloatTensor')
73 | 
74 | if not os.path.exists(args.work_dir):
75 |     os.mkdir(args.work_dir)
76 | 
77 | 
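# --- Editor's note (not part of the original file): step-decay arithmetic. ---
# adjust_learning_rate() below applies lr = args.lr * gamma ** step each time
# the iteration count crosses an entry of cfg['lr_steps']. With the defaults
# lr=1e-3 and gamma=0.1:
#
#   step 0 -> 1e-3,   step 1 -> 1e-4,   step 2 -> 1e-5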
os.path.exists(VOC_ROOT):
99 |             parser.error('Must specify dataset_root if specifying dataset')
100 | 
101 |         cfg = voc
102 |         dataset = VOCDetection(root=VOC_ROOT,
103 |                                transform=SSDAugmentation(cfg['min_dim'],
104 |                                mean=cfg['mean'], std=cfg['std']))
105 |         print(len(dataset))
106 |     elif args.dataset == 'CRACK':
107 |         if not os.path.exists(CRACK_ROOT):
108 |             parser.error('Must specify dataset_root if specifying dataset')
109 | 
110 |         cfg = crack
111 |         dataset = CRACKDetection(root=CRACK_ROOT,
112 |                                  transform=SSDAugmentation(cfg['min_dim'],
113 |                                  mean=cfg['mean'], std=cfg['std']))
114 | 
115 |     data_loader = data.DataLoader(dataset, args.batch_size,
116 |                                   num_workers=args.num_workers,
117 |                                   shuffle=True, collate_fn=detection_collate,
118 |                                   pin_memory=True)
119 | 
120 |     # build and load the net
121 |     ssd_net = build_ssd('train', size=cfg['min_dim'], cfg=cfg)
122 |     '''
123 |     for name,param in ssd_net.named_parameters():
124 |         if param.requires_grad:
125 |             print(name)
126 |     '''
127 |     if args.resume:
128 |         print('Resuming training, loading {}...'.format(args.resume))
129 |         ssd_net.load_state_dict(torch.load(args.resume))
130 | 
131 |     net = ssd_net.cuda() if args.cuda else ssd_net
132 | 
133 |     net.train()
134 | 
135 |     # optimizer
136 |     optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=args.momentum,
137 |                           weight_decay=args.weight_decay)
138 | 
139 | 
140 |     # loss: SmoothL1 | Iou | Giou | Diou | Ciou
141 |     print(cfg['losstype'])
142 |     criterion = MultiBoxLoss(cfg=cfg, overlap_thresh=0.5,
143 |                              prior_for_matching=True, bkg_label=0,
144 |                              neg_mining=True, neg_pos=3, neg_overlap=0.5,
145 |                              encode_target=False, use_gpu=args.cuda, loss_name=cfg['losstype'])
146 | 
147 |     if args.visdom:
148 |         import visdom
149 |         viz = visdom.Visdom(env=cfg['work_name'])
150 |         vis_title = 'SSD on ' + args.dataset
151 |         vis_legend = ['Loc Loss', 'Conf Loss', 'Total Loss']
152 |         iter_plot = create_vis_plot(viz, 'Iteration', 'Loss', vis_title, vis_legend)
153 |         epoch_plot = create_vis_plot(viz, 'Epoch', 'Loss', vis_title+" epoch loss", vis_legend)
154 |         #epoch_acc = create_acc_plot(viz,'Epoch', 'acc', args.dataset+" Acc",["Acc"])
155 | 
156 | 
157 | 
158 | 
159 | 
160 |     epoch_size = len(dataset) // args.batch_size
161 |     print('Training SSD on:', dataset.name, epoch_size)
162 |     iteration = args.start_iter
163 |     step_index = 0
164 |     loc_loss = 0
165 |     conf_loss = 0
166 |     for epoch in range(args.max_epoch):
167 |         for ii, batch_iterator in tqdm(enumerate(data_loader)):
168 |             iteration += 1
169 | 
170 |             if iteration in cfg['lr_steps']:
171 |                 step_index += 1
172 |                 adjust_learning_rate(optimizer, args.gamma, step_index)
173 | 
174 |             # load train data
175 |             images, targets = batch_iterator
176 |             #print(images,targets)
177 |             if args.cuda:
178 |                 images = images.cuda()
179 |                 targets = [ann.cuda() for ann in targets]
180 |             else:
181 |                 images = images
182 |                 targets = [ann for ann in targets]
183 |             t0 = time.time()
184 |             out = net(images, 'train')
185 |             optimizer.zero_grad()
186 |             loss_l, loss_c = criterion(out, targets)
187 |             loss = weight * loss_l + loss_c
188 |             loss.backward()
189 |             optimizer.step()
190 |             t1 = time.time()
191 |             loc_loss += loss_l.item()
192 |             conf_loss += loss_c.item()
193 |             #print(iteration)
194 |             if iteration % 10 == 0:
195 |                 print('timer: %.4f sec.'
% (t1 - t0)) 196 | print('iter ' + repr(iteration) + ' || Loss: %.4f ||' % (loss.item()), end=' ') 197 | 198 | 199 | if args.visdom: 200 | if iteration>20 and iteration% 10 == 0: 201 | update_vis_plot(viz,iteration, loss_l.item(), loss_c.item(), 202 | iter_plot, epoch_plot, 'append') 203 | 204 | if epoch % 10 == 0 and epoch >60:#epoch>1000 and epoch % 50 == 0: 205 | print('Saving state, iter:', iteration) 206 | #print('loss_l:'+weight * loss_l+', loss_c:'+'loss_c') 207 | save_folder = args.work_dir+cfg['work_name'] 208 | if not os.path.exists(save_folder): 209 | os.mkdir(save_folder) 210 | torch.save(net.state_dict(),args.work_dir+cfg['work_name']+'/ssd'+ 211 | repr(epoch)+'_.pth') 212 | if args.visdom: 213 | update_vis_plot(viz, epoch, loc_loss, conf_loss, epoch_plot, epoch_plot, 214 | 'append', epoch_size) 215 | loc_loss = 0 216 | conf_loss = 0 217 | 218 | torch.save(net.state_dict(),args.work_dir+cfg['work_name']+'/ssd'+repr(epoch)+ str(args.weight) +'_.pth') 219 | 220 | def adjust_learning_rate(optimizer, gamma, step): 221 | """Sets the learning rate to the initial LR decayed by 10 at every 222 | specified step 223 | # Adapted from PyTorch Imagenet example: 224 | # https://github.com/pytorch/examples/blob/master/imagenet/main.py 225 | """ 226 | lr = args.lr * (gamma ** (step)) 227 | for param_group in optimizer.param_groups: 228 | param_group['lr'] = lr 229 | print(param_group['lr']) 230 | 231 | 232 | def create_vis_plot(viz,_xlabel, _ylabel, _title, _legend): 233 | return viz.line( 234 | X=torch.zeros((1,)).cpu(), 235 | Y=torch.zeros((1, 3)).cpu(), 236 | opts=dict( 237 | xlabel=_xlabel, 238 | ylabel=_ylabel, 239 | title=_title, 240 | legend=_legend 241 | ) 242 | ) 243 | 244 | def create_acc_plot(viz,_xlabel, _ylabel, _title, _legend): 245 | return viz.line( 246 | X=torch.zeros((1,)).cpu(), 247 | Y=torch.zeros((1,)).cpu(), 248 | opts=dict( 249 | xlabel=_xlabel, 250 | ylabel=_ylabel, 251 | title=_title, 252 | legend=_legend 253 | ) 254 | ) 255 | 256 | 257 | def update_vis_plot(viz,iteration, loc, conf, window1, window2, update_type, 258 | epoch_size=1): 259 | viz.line( 260 | X=torch.ones((1, 3)).cpu() * iteration, 261 | Y=torch.Tensor([loc, conf, loc + conf]).unsqueeze(0).cpu() / epoch_size, 262 | win=window1, 263 | update=update_type 264 | ) 265 | 266 | 267 | def update_acc_plot(viz,iteration,acc, window1,update_type, 268 | epoch_size=1): 269 | viz.line( 270 | X=torch.ones((1, 1)).cpu()*iteration, 271 | Y=torch.Tensor([acc]).unsqueeze(0).cpu(), 272 | win=window1, 273 | update=update_type 274 | ) 275 | # initialize epoch plot on first iteration 276 | ''' 277 | if iteration == 0: 278 | print(loc, conf, loc + conf) 279 | viz.line( 280 | X=torch.zeros((1, 3)).cpu(), 281 | Y=torch.Tensor([loc, conf, loc + conf]).unsqueeze(0).cpu(), 282 | win=window2, 283 | update=True 284 | ) 285 | ''' 286 | if __name__ == '__main__': 287 | train() 288 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | from .box import * 2 | from .detection import * 3 | from .loss import * 4 | -------------------------------------------------------------------------------- /utils/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/__pycache__/__init__.cpython-35.pyc 
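For reference, `adjust_learning_rate` in `tools/train.py` above implements a plain step decay: the learning rate is the base `--lr` multiplied by `--gamma` once per boundary crossed. A minimal sketch of that schedule, assuming the argparse defaults (`lr=1e-3`, `gamma=0.1`); the `lr_steps` boundaries below are hypothetical, the real ones come from `cfg['lr_steps']` in `config/config.py`:

```python
# Step-decay schedule as used by adjust_learning_rate (sketch, not the repo's code).
# base_lr and gamma follow the argparse defaults; lr_steps here are hypothetical.
base_lr, gamma = 1e-3, 0.1
lr_steps = (80000, 100000, 120000)

def lr_at(iteration):
    # step_index counts how many decay boundaries have been crossed
    step_index = sum(iteration >= s for s in lr_steps)
    return base_lr * (gamma ** step_index)

assert lr_at(0) == 1e-3
assert abs(lr_at(90000) - 1e-4) < 1e-12   # after the first boundary
assert abs(lr_at(130000) - 1e-6) < 1e-12  # after all three boundaries
```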
-------------------------------------------------------------------------------- /utils/__pycache__/__init__.cpython-35.sublime-workspace: -------------------------------------------------------------------------------- (Sublime Text editor workspace state accidentally committed under __pycache__; editor session metadata only, omitted.)
-------------------------------------------------------------------------------- /utils/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /utils/box/__init__.py: --------------------------------------------------------------------------------
1 | from .prior_box import PriorBox
2 | from .box_utils import decode,nms, diounms
3 | from .box_utils import match, log_sum_exp,match_ious,bbox_overlaps_iou, bbox_overlaps_giou, bbox_overlaps_diou, bbox_overlaps_ciou
4 | 
5 | 
6 | 
7 | 
-------------------------------------------------------------------------------- /utils/box/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/box/__pycache__/__init__.cpython-35.pyc
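As a quick smoke test of the `utils/box` exports listed above: the four `bbox_overlaps_*` functions take two equal-length `(N, 4)` tensors of `(xmin, ymin, xmax, ymax)` boxes and score each aligned pair. A minimal sketch, assuming the repository root is on `PYTHONPATH`; the box values are made up for illustration:

```python
import torch
from utils.box import (bbox_overlaps_iou, bbox_overlaps_giou,
                       bbox_overlaps_diou, bbox_overlaps_ciou)

# Two aligned pairs: one overlapping, one disjoint.
a = torch.tensor([[0., 0., 10., 10.], [0., 0., 10., 10.]])
b = torch.tensor([[5., 5., 15., 15.], [20., 20., 30., 30.]])

print(bbox_overlaps_iou(a, b))   # IoU: ~0.143 for the overlapping pair, 0 for the disjoint one
print(bbox_overlaps_giou(a, b))  # GIoU: the disjoint pair goes negative (enclosing-box penalty)
print(bbox_overlaps_diou(a, b))  # DIoU: subtracts the normalized center-distance term
print(bbox_overlaps_ciou(a, b))  # CIoU: additionally penalizes aspect-ratio mismatch
```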
-------------------------------------------------------------------------------- /utils/box/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/box/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /utils/box/__pycache__/box_utils.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/box/__pycache__/box_utils.cpython-35.pyc -------------------------------------------------------------------------------- /utils/box/__pycache__/box_utils.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/box/__pycache__/box_utils.cpython-36.pyc -------------------------------------------------------------------------------- /utils/box/__pycache__/prior_box.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/box/__pycache__/prior_box.cpython-35.pyc -------------------------------------------------------------------------------- /utils/box/__pycache__/prior_box.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/box/__pycache__/prior_box.cpython-36.pyc -------------------------------------------------------------------------------- /utils/box/box_utils.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | import torch 3 | import math 4 | 5 | def bbox_overlaps_diou(bboxes1, bboxes2): 6 | 7 | rows = bboxes1.shape[0] 8 | cols = bboxes2.shape[0] 9 | dious = torch.zeros((rows, cols)) 10 | if rows * cols == 0: 11 | return dious 12 | exchange = False 13 | if bboxes1.shape[0] > bboxes2.shape[0]: 14 | bboxes1, bboxes2 = bboxes2, bboxes1 15 | dious = torch.zeros((cols, rows)) 16 | exchange = True 17 | 18 | w1 = bboxes1[:, 2] - bboxes1[:, 0] 19 | h1 = bboxes1[:, 3] - bboxes1[:, 1] 20 | w2 = bboxes2[:, 2] - bboxes2[:, 0] 21 | h2 = bboxes2[:, 3] - bboxes2[:, 1] 22 | 23 | area1 = w1 * h1 24 | area2 = w2 * h2 25 | center_x1 = (bboxes1[:, 2] + bboxes1[:, 0]) / 2 26 | center_y1 = (bboxes1[:, 3] + bboxes1[:, 1]) / 2 27 | center_x2 = (bboxes2[:, 2] + bboxes2[:, 0]) / 2 28 | center_y2 = (bboxes2[:, 3] + bboxes2[:, 1]) / 2 29 | 30 | inter_max_xy = torch.min(bboxes1[:, 2:],bboxes2[:, 2:]) 31 | inter_min_xy = torch.max(bboxes1[:, :2],bboxes2[:, :2]) 32 | out_max_xy = torch.max(bboxes1[:, 2:],bboxes2[:, 2:]) 33 | out_min_xy = torch.min(bboxes1[:, :2],bboxes2[:, :2]) 34 | 35 | inter = torch.clamp((inter_max_xy - inter_min_xy), min=0) 36 | inter_area = inter[:, 0] * inter[:, 1] 37 | inter_diag = (center_x2 - center_x1)**2 + (center_y2 - center_y1)**2 38 | outer = torch.clamp((out_max_xy - out_min_xy), min=0) 39 | outer_diag = (outer[:, 0] ** 2) + (outer[:, 1] ** 2) 40 | union = area1+area2-inter_area 41 | dious = inter_area / union - (inter_diag) / outer_diag 42 | dious = torch.clamp(dious,min=-1.0,max = 1.0) 43 | if exchange: 44 | dious = dious.T 
45 | return dious 46 | 47 | def bbox_overlaps_ciou(bboxes1, bboxes2): 48 | rows = bboxes1.shape[0] 49 | cols = bboxes2.shape[0] 50 | cious = torch.zeros((rows, cols)) 51 | if rows * cols == 0: 52 | return cious 53 | exchange = False 54 | if bboxes1.shape[0] > bboxes2.shape[0]: 55 | bboxes1, bboxes2 = bboxes2, bboxes1 56 | cious = torch.zeros((cols, rows)) 57 | exchange = True 58 | 59 | w1 = bboxes1[:, 2] - bboxes1[:, 0] 60 | h1 = bboxes1[:, 3] - bboxes1[:, 1] 61 | w2 = bboxes2[:, 2] - bboxes2[:, 0] 62 | h2 = bboxes2[:, 3] - bboxes2[:, 1] 63 | 64 | area1 = w1 * h1 65 | area2 = w2 * h2 66 | 67 | center_x1 = (bboxes1[:, 2] + bboxes1[:, 0]) / 2 68 | center_y1 = (bboxes1[:, 3] + bboxes1[:, 1]) / 2 69 | center_x2 = (bboxes2[:, 2] + bboxes2[:, 0]) / 2 70 | center_y2 = (bboxes2[:, 3] + bboxes2[:, 1]) / 2 71 | 72 | inter_max_xy = torch.min(bboxes1[:, 2:],bboxes2[:, 2:]) 73 | inter_min_xy = torch.max(bboxes1[:, :2],bboxes2[:, :2]) 74 | out_max_xy = torch.max(bboxes1[:, 2:],bboxes2[:, 2:]) 75 | out_min_xy = torch.min(bboxes1[:, :2],bboxes2[:, :2]) 76 | 77 | inter = torch.clamp((inter_max_xy - inter_min_xy), min=0) 78 | inter_area = inter[:, 0] * inter[:, 1] 79 | inter_diag = (center_x2 - center_x1)**2 + (center_y2 - center_y1)**2 80 | outer = torch.clamp((out_max_xy - out_min_xy), min=0) 81 | outer_diag = (outer[:, 0] ** 2) + (outer[:, 1] ** 2) 82 | union = area1+area2-inter_area 83 | u = (inter_diag) / outer_diag 84 | iou = inter_area / union 85 | v = (4 / (math.pi ** 2)) * torch.pow((torch.atan(w2 / h2) - torch.atan(w1 / h1)), 2) 86 | with torch.no_grad(): 87 | S = 1 - iou 88 | alpha = v / (S + v) 89 | cious = iou - (u + alpha * v) 90 | cious = torch.clamp(cious,min=-1.0,max = 1.0) 91 | if exchange: 92 | cious = cious.T 93 | return cious 94 | 95 | def bbox_overlaps_iou(bboxes1, bboxes2): 96 | rows = bboxes1.shape[0] 97 | cols = bboxes2.shape[0] 98 | ious = torch.zeros((rows, cols)) 99 | if rows * cols == 0: 100 | return ious 101 | exchange = False 102 | if bboxes1.shape[0] > bboxes2.shape[0]: 103 | bboxes1, bboxes2 = bboxes2, bboxes1 104 | ious = torch.zeros((cols, rows)) 105 | exchange = True 106 | area1 = (bboxes1[:, 2] - bboxes1[:, 0]) * ( 107 | bboxes1[:, 3] - bboxes1[:, 1]) 108 | area2 = (bboxes2[:, 2] - bboxes2[:, 0]) * ( 109 | bboxes2[:, 3] - bboxes2[:, 1]) 110 | 111 | inter_max_xy = torch.min(bboxes1[:, 2:],bboxes2[:, 2:]) 112 | inter_min_xy = torch.max(bboxes1[:, :2],bboxes2[:, :2]) 113 | 114 | inter = torch.clamp((inter_max_xy - inter_min_xy), min=0) 115 | inter_area = inter[:, 0] * inter[:, 1] 116 | union = area1+area2-inter_area 117 | ious = inter_area / union 118 | ious = torch.clamp(ious,min=0,max = 1.0) 119 | if exchange: 120 | ious = ious.T 121 | return ious 122 | 123 | def bbox_overlaps_giou(bboxes1, bboxes2): 124 | rows = bboxes1.shape[0] 125 | cols = bboxes2.shape[0] 126 | ious = torch.zeros((rows, cols)) 127 | if rows * cols == 0: 128 | return ious 129 | exchange = False 130 | if bboxes1.shape[0] > bboxes2.shape[0]: 131 | bboxes1, bboxes2 = bboxes2, bboxes1 132 | ious = torch.zeros((cols, rows)) 133 | exchange = True 134 | area1 = (bboxes1[:, 2] - bboxes1[:, 0]) * ( 135 | bboxes1[:, 3] - bboxes1[:, 1]) 136 | area2 = (bboxes2[:, 2] - bboxes2[:, 0]) * ( 137 | bboxes2[:, 3] - bboxes2[:, 1]) 138 | 139 | inter_max_xy = torch.min(bboxes1[:, 2:],bboxes2[:, 2:]) 140 | 141 | inter_min_xy = torch.max(bboxes1[:, :2],bboxes2[:, :2]) 142 | 143 | out_max_xy = torch.max(bboxes1[:, 2:],bboxes2[:, 2:]) 144 | 145 | out_min_xy = torch.min(bboxes1[:, :2],bboxes2[:, :2]) 146 | 147 | inter = 
torch.clamp((inter_max_xy - inter_min_xy), min=0)
148 |     inter_area = inter[:, 0] * inter[:, 1]
149 |     outer = torch.clamp((out_max_xy - out_min_xy), min=0)
150 |     outer_area = outer[:, 0] * outer[:, 1]
151 |     union = area1+area2-inter_area
152 |     closure = outer_area
153 | 
154 |     ious = inter_area / union - (closure - union) / closure
155 |     ious = torch.clamp(ious, min=-1.0, max=1.0)
156 |     if exchange:
157 |         ious = ious.T
158 |     return ious
159 | 
160 | def point_form(boxes):
161 |     """ Convert prior_boxes to (xmin, ymin, xmax, ymax)
162 |     representation for comparison to point form ground truth data.
163 |     Args:
164 |         boxes: (tensor) center-size default boxes from priorbox layers.
165 |     Return:
166 |         boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes.
167 |     """
168 |     #print(boxes)
169 |     return torch.cat((boxes[:, :2] - boxes[:, 2:]/2,     # xmin, ymin
170 |                      boxes[:, :2] + boxes[:, 2:]/2), 1)  # xmax, ymax
171 | 
172 | 
173 | def center_size(boxes):
174 |     """ Convert prior_boxes to (cx, cy, w, h)
175 |     representation for comparison to center-size form ground truth data.
176 |     Args:
177 |         boxes: (tensor) point_form boxes
178 |     Return:
179 |         boxes: (tensor) Converted cx, cy, w, h form of boxes.
180 |     """
181 |     return torch.cat(((boxes[:, 2:] + boxes[:, :2])/2,  # cx, cy
182 |                      boxes[:, 2:] - boxes[:, :2]), 1)   # w, h
183 | 
184 | 
185 | def intersect(box_a, box_b):
186 |     """ We resize both tensors to [A,B,2] without new malloc:
187 |     [A,2] -> [A,1,2] -> [A,B,2]
188 |     [B,2] -> [1,B,2] -> [A,B,2]
189 |     Then we compute the area of intersect between box_a and box_b.
190 |     Args:
191 |       box_a: (tensor) bounding boxes, Shape: [A,4].
192 |       box_b: (tensor) bounding boxes, Shape: [B,4].
193 |     Return:
194 |       (tensor) intersection area, Shape: [A,B].
195 |     """
196 |     #print(box_a)
197 |     #print(box_b)
198 |     A = box_a.size(0)
199 |     B = box_b.size(0)
200 |     max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2),
201 |                        box_b[:, 2:].unsqueeze(0).expand(A, B, 2))
202 |     min_xy = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2),
203 |                        box_b[:, :2].unsqueeze(0).expand(A, B, 2))
204 |     inter = torch.clamp((max_xy - min_xy), min=0)
205 |     return inter[:, :, 0] * inter[:, :, 1]
206 | 
207 | 
208 | def jaccard(box_a, box_b):
209 |     """Compute the jaccard overlap of two sets of boxes. The jaccard overlap
210 |     is simply the intersection over union of two boxes. Here we operate on
211 |     ground truth boxes and default boxes.
212 |     E.g.:
213 |         A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
214 |     Args:
215 |         box_a: (tensor) Ground truth bounding boxes, Shape: [num_objects,4]
216 |         box_b: (tensor) Prior boxes from priorbox layers, Shape: [num_priors,4]
217 |     Return:
218 |         jaccard overlap: (tensor) Shape: [box_a.size(0), box_b.size(0)]
219 |     """
220 |     inter = intersect(box_a, box_b)
221 |     area_a = ((box_a[:, 2]-box_a[:, 0]) *
222 |               (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter)  # [A,B]
223 |     area_b = ((box_b[:, 2]-box_b[:, 0]) *
224 |               (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter)  # [A,B]
225 |     union = area_a + area_b - inter
226 |     return inter / union  # [A,B]
227 | 
228 | 
229 | def match_ious(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
230 |     """Match each prior box with the ground truth box of the highest jaccard
231 |     overlap, encode the bounding boxes, then return the matched indices
232 |     corresponding to both confidence and location preds.
233 |     Args:
234 |         threshold: (float) The overlap threshold used when matching boxes.
truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
236 |         priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
237 |         variances: (tensor) Variances corresponding to each prior coord,
238 |             Shape: [num_priors, 4].
239 |         labels: (tensor) All the class labels for the image, Shape: [num_obj].
240 |         loc_t: (tensor) Tensor to be filled w/ encoded location targets.
241 |         conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
242 |         idx: (int) current batch index
243 |     Return:
244 |         The matched indices corresponding to 1)location and 2)confidence preds.
245 |     """
246 |     # jaccard index
247 |     loc_t[idx] = point_form(priors)  # pre-fill with priors in point form
248 |     overlaps = jaccard(
249 |         truths,
250 |         point_form(priors)
251 |     )
252 |     # (Bipartite Matching)
253 |     # [1,num_objects] best prior for each ground truth
254 |     best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
255 |     # [1,num_priors] best ground truth for each prior
256 |     best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
257 |     best_truth_idx.squeeze_(0)
258 |     best_truth_overlap.squeeze_(0)
259 |     best_prior_idx.squeeze_(1)
260 |     best_prior_overlap.squeeze_(1)
261 |     best_truth_overlap.index_fill_(0, best_prior_idx, 2)  # ensure best prior
262 |     # TODO refactor: index best_prior_idx with long tensor
263 |     # ensure every gt matches with its prior of max overlap
264 |     for j in range(best_prior_idx.size(0)):
265 |         best_truth_idx[best_prior_idx[j]] = j
266 |     matches = truths[best_truth_idx]          # Shape: [num_priors,4]
267 |     conf = labels[best_truth_idx] + 1         # Shape: [num_priors]
268 |     conf[best_truth_overlap < threshold] = 0  # label as background
269 |     loc_t[idx] = matches  # [num_priors,4] matched gt boxes in point form (IoU-based losses regress boxes directly)
270 |     conf_t[idx] = conf    # [num_priors] top class label for each prior
271 | 
272 | 
273 | def match(threshold, truths, priors, variances, labels, loc_t, conf_t, idx):
274 |     """Match each prior box with the ground truth box of the highest jaccard
275 |     overlap, encode the bounding boxes, then return the matched indices
276 |     corresponding to both confidence and location preds.
277 |     Args:
278 |         threshold: (float) The overlap threshold used when matching boxes.
279 |         truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
280 |         priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
281 |         variances: (tensor) Variances corresponding to each prior coord,
282 |             Shape: [num_priors, 4].
283 |         labels: (tensor) All the class labels for the image, Shape: [num_obj].
284 |         loc_t: (tensor) Tensor to be filled w/ encoded location targets.
285 |         conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
286 |         idx: (int) current batch index
287 |     Return:
288 |         The matched indices corresponding to 1)location and 2)confidence preds.
289 |     """
290 |     # jaccard index
291 |     overlaps = jaccard(
292 |         truths,
293 |         point_form(priors)
294 |     )
295 |     # (Bipartite Matching)
296 |     # [1,num_objects] best prior for each ground truth
297 |     best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
298 |     # [1,num_priors] best ground truth for each prior
299 |     best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
300 |     best_truth_idx.squeeze_(0)
301 |     best_truth_overlap.squeeze_(0)
302 |     best_prior_idx.squeeze_(1)
303 |     best_prior_overlap.squeeze_(1)
304 |     best_truth_overlap.index_fill_(0, best_prior_idx, 2)  # ensure best prior
305 |     # TODO refactor: index best_prior_idx with long tensor
306 |     # ensure every gt matches with its prior of max overlap
307 |     for j in range(best_prior_idx.size(0)):
308 |         best_truth_idx[best_prior_idx[j]] = j
309 |     matches = truths[best_truth_idx]          # Shape: [num_priors,4]
310 |     conf = labels[best_truth_idx] + 1         # Shape: [num_priors]
311 |     conf[best_truth_overlap < threshold] = 0  # label as background
312 |     loc = encode(matches, priors, variances)
313 |     loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
314 |     conf_t[idx] = conf  # [num_priors] top class label for each prior
315 | 
316 | 
317 | def encode(matched, priors, variances):
318 |     """Encode the variances from the priorbox layers into the ground truth boxes
319 |     we have matched (based on jaccard overlap) with the prior boxes.
320 |     Args:
321 |         matched: (tensor) Coords of ground truth for each prior in point-form
322 |             Shape: [num_priors, 4].
323 |         priors: (tensor) Prior boxes in center-offset form
324 |             Shape: [num_priors,4].
325 |         variances: (list[float]) Variances of priorboxes
326 |     Return:
327 |         encoded boxes (tensor), Shape: [num_priors, 4]
328 |     """
329 | 
330 |     # dist b/t match center and prior's center
331 |     g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2]
332 |     # encode variance
333 |     g_cxcy /= (variances[0] * priors[:, 2:])
334 |     # match wh / prior wh
335 |     g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
336 |     g_wh = torch.log(g_wh) / variances[1]
337 |     # return target for smooth_l1_loss
338 |     return torch.cat([g_cxcy, g_wh], 1)  # [num_priors,4]
339 | 
340 | 
341 | # Adapted from https://github.com/Hakuyume/chainer-ssd
342 | def decode(loc, priors, variances):
343 |     """Decode locations from predictions using priors to undo
344 |     the encoding we did for offset regression at train time.
345 |     Args:
346 |         loc (tensor): location predictions for loc layers,
347 |             Shape: [num_priors,4]
348 |         priors (tensor): Prior boxes in center-offset form.
349 |             Shape: [num_priors,4].
350 |         variances: (list[float]) Variances of priorboxes
351 |     Return:
352 |         decoded bounding box predictions
353 |     """
354 | 
355 |     boxes = torch.cat((
356 |         priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
357 |         priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), 1)
358 |     boxes[:, :2] -= boxes[:, 2:] / 2
359 |     boxes[:, 2:] += boxes[:, :2]
360 |     #print(boxes)
361 |     return boxes
362 | 
363 | 
364 | def log_sum_exp(x):
365 |     """Utility function for computing log_sum_exp in a numerically stable way.
366 |     This is used to compute the unaveraged confidence loss across
367 |     all examples in a batch.
368 |     Args:
369 |         x (Variable(tensor)): conf_preds from conf layers
370 |     """
371 |     x_max = x.data.max()
372 |     return torch.log(torch.sum(torch.exp(x-x_max), 1, keepdim=True)) + x_max
373 | 
374 | 
375 | # Original author: Francisco Massa:
376 | # https://github.com/fmassa/object-detection.torch
377 | # Ported to PyTorch by Max deGroot (02/01/2017)
378 | def nms(boxes, scores, overlap=0.5, top_k=200):
379 |     """Apply non-maximum suppression at test time to avoid detecting too many
380 |     overlapping bounding boxes for a given object.
381 |     Args:
382 |         boxes: (tensor) The location preds for the img, Shape: [num_priors,4].
383 |         scores: (tensor) The class pred scores for the img, Shape:[num_priors].
384 |         overlap: (float) The overlap thresh for suppressing unnecessary boxes.
385 |         top_k: (int) The maximum number of box preds to consider.
386 |     Return:
387 |         The indices of the kept boxes with respect to num_priors.
388 |     """
389 | 
390 |     keep = scores.new(scores.size(0)).zero_().long()
391 |     if boxes.numel() == 0:
392 |         return keep
393 |     x1 = boxes[:, 0]
394 |     y1 = boxes[:, 1]
395 |     x2 = boxes[:, 2]
396 |     y2 = boxes[:, 3]
397 |     area = torch.mul(x2 - x1, y2 - y1)
398 |     v, idx = scores.sort(0)  # sort in ascending order
399 |     # I = I[v >= 0.01]
400 |     idx = idx[-top_k:]  # indices of the top-k largest vals
401 |     xx1 = boxes.new()
402 |     yy1 = boxes.new()
403 |     xx2 = boxes.new()
404 |     yy2 = boxes.new()
405 |     w = boxes.new()
406 |     h = boxes.new()
407 | 
408 |     # keep = torch.Tensor()
409 |     count = 0
410 |     while idx.numel() > 0:
411 |         i = idx[-1]  # index of current largest val
412 |         # keep.append(i)
413 |         keep[count] = i
414 |         count += 1
415 |         if idx.size(0) == 1:
416 |             break
417 |         idx = idx[:-1]  # remove kept element from view
418 |         # load bboxes of next highest vals
419 |         torch.index_select(x1, 0, idx, out=xx1)
420 |         torch.index_select(y1, 0, idx, out=yy1)
421 |         torch.index_select(x2, 0, idx, out=xx2)
422 |         torch.index_select(y2, 0, idx, out=yy2)
423 |         # store element-wise max with next highest score
424 |         xx1 = torch.clamp(xx1, min=x1[i])
425 |         yy1 = torch.clamp(yy1, min=y1[i])
426 |         xx2 = torch.clamp(xx2, max=x2[i])
427 |         yy2 = torch.clamp(yy2, max=y2[i])
428 |         w.resize_as_(xx2)
429 |         h.resize_as_(yy2)
430 |         w = xx2 - xx1
431 |         h = yy2 - yy1
432 |         # check sizes of xx1 and xx2... after each iteration
433 |         w = torch.clamp(w, min=0.0)
434 |         h = torch.clamp(h, min=0.0)
435 |         inter = w*h
436 |         # IoU = i / (area(a) + area(b) - i)
437 |         rem_areas = torch.index_select(area, 0, idx)  # load remaining areas
438 |         union = (rem_areas - inter) + area[i]
439 |         IoU = inter/union  # store result in iou
440 |         # keep only elements with an IoU <= overlap
441 |         idx = idx[IoU.le(overlap)]
442 |     return keep, count
443 | 
444 | def diounms(boxes, scores, overlap=0.5, top_k=200, beta1=1.0):
445 |     """Apply DIoU-NMS at test time to avoid detecting too many
446 |     overlapping bounding boxes for a given object.
447 |     Args:
448 |         boxes: (tensor) The location preds for the img, Shape: [num_priors,4].
449 |         scores: (tensor) The class pred scores for the img, Shape:[num_priors].
450 |         overlap: (float) The overlap thresh for suppressing unnecessary boxes.
451 |         top_k: (int) The maximum number of box preds to consider.
452 |         beta1: (float) DIoU = IoU - R_DIoU^{beta1}.
453 |     Return:
454 |         The indices of the kept boxes with respect to num_priors.
455 | """ 456 | 457 | keep = scores.new(scores.size(0)).zero_().long() 458 | if boxes.numel() == 0: 459 | return keep 460 | x1 = boxes[:, 0] 461 | y1 = boxes[:, 1] 462 | x2 = boxes[:, 2] 463 | y2 = boxes[:, 3] 464 | area = torch.mul(x2 - x1, y2 - y1) 465 | v, idx = scores.sort(0) # sort in ascending order 466 | # I = I[v >= 0.01] 467 | idx = idx[-top_k:] # indices of the top-k largest vals 468 | xx1 = boxes.new() 469 | yy1 = boxes.new() 470 | xx2 = boxes.new() 471 | yy2 = boxes.new() 472 | w = boxes.new() 473 | h = boxes.new() 474 | 475 | # keep = torch.Tensor() 476 | count = 0 477 | while idx.numel() > 0: 478 | i = idx[-1] # index of current largest val 479 | # keep.append(i) 480 | keep[count] = i 481 | count += 1 482 | if idx.size(0) == 1: 483 | break 484 | idx = idx[:-1] # remove kept element from view 485 | # load bboxes of next highest vals 486 | torch.index_select(x1, 0, idx, out=xx1) 487 | torch.index_select(y1, 0, idx, out=yy1) 488 | torch.index_select(x2, 0, idx, out=xx2) 489 | torch.index_select(y2, 0, idx, out=yy2) 490 | # store element-wise max with next highest score 491 | inx1 = torch.clamp(xx1, min=x1[i]) 492 | iny1 = torch.clamp(yy1, min=y1[i]) 493 | inx2 = torch.clamp(xx2, max=x2[i]) 494 | iny2 = torch.clamp(yy2, max=y2[i]) 495 | center_x1 = (x1[i] + x2[i]) / 2 496 | center_y1 = (y1[i] + y2[i]) / 2 497 | center_x2 = (xx1 + xx2) / 2 498 | center_y2 = (yy1 + yy2) / 2 499 | d = (center_x1 - center_x2) ** 2 + (center_y1 - center_y2) ** 2 500 | cx1 = torch.clamp(xx1, max=x1[i]) 501 | cy1 = torch.clamp(yy1, max=y1[i]) 502 | cx2 = torch.clamp(xx2, min=x2[i]) 503 | cy2 = torch.clamp(yy2, min=y2[i]) 504 | c = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2 505 | u= d / c 506 | w.resize_as_(xx2) 507 | h.resize_as_(yy2) 508 | w = inx2 - inx1 509 | h = iny2 - iny1 510 | # check sizes of xx1 and xx2.. after each iteration 511 | w = torch.clamp(w, min=0.0) 512 | h = torch.clamp(h, min=0.0) 513 | inter = w*h 514 | # IoU = i / (area(a) + area(b) - i) 515 | rem_areas = torch.index_select(area, 0, idx) # load remaining areas) 516 | union = (rem_areas - inter) + area[i] 517 | IoU = inter/union - u ** beta1 # store result in diou 518 | # keep only elements with an IoU <= overlap 519 | idx = idx[IoU.le(overlap)] 520 | return keep, count 521 | -------------------------------------------------------------------------------- /utils/box/prior_box.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | from math import sqrt as sqrt 3 | from itertools import product as product 4 | import torch 5 | 6 | 7 | class PriorBox(object): 8 | """Compute priorbox coordinates in center-offset form for each source 9 | feature map. 
10 | """ 11 | def __init__(self, cfg): 12 | super(PriorBox, self).__init__() 13 | self.image_size = cfg['min_dim'] 14 | # number of priors for feature map location (either 4 or 6) 15 | self.num_priors = len(cfg['aspect_ratios']) 16 | self.variance = cfg['variance'] or [0.1] 17 | self.feature_maps = cfg['feature_maps'] 18 | self.min_sizes = cfg['min_sizes'] 19 | self.max_sizes = cfg['max_sizes'] 20 | self.steps = cfg['steps'] 21 | self.aspect_ratios = cfg['aspect_ratios'] 22 | self.clip = cfg['clip'] 23 | self.version = cfg['name'] 24 | for v in self.variance: 25 | if v <= 0: 26 | raise ValueError('Variances must be greater than 0') 27 | 28 | def forward(self): 29 | mean = [] 30 | for k, f in enumerate(self.feature_maps): 31 | for i, j in product(range(f), repeat=2): 32 | f_k = self.image_size / self.steps[k] 33 | # unit center x,y 34 | cx = (j + 0.5) / f_k 35 | cy = (i + 0.5) / f_k 36 | 37 | # aspect_ratio: 1 38 | # rel size: min_size 39 | s_k = self.min_sizes[k]/self.image_size 40 | mean += [cx, cy, s_k, s_k] 41 | 42 | # aspect_ratio: 1 43 | # rel size: sqrt(s_k * s_(k+1)) 44 | s_k_prime = sqrt(s_k * (self.max_sizes[k]/self.image_size)) 45 | mean += [cx, cy, s_k_prime, s_k_prime] 46 | 47 | # rest of aspect ratios 48 | #print(self.aspect_ratios[k]) 49 | for ar in self.aspect_ratios[k]: 50 | mean += [cx, cy, s_k*sqrt(ar), s_k/sqrt(ar)] 51 | mean += [cx, cy, s_k/sqrt(ar), s_k*sqrt(ar)] 52 | # back to torch land 53 | output = torch.Tensor(mean).view(-1, 4) 54 | if self.clip: 55 | output.clamp_(max=1, min=0) 56 | #print(output.shape) 57 | return output 58 | -------------------------------------------------------------------------------- /utils/detection/__init__.py: -------------------------------------------------------------------------------- 1 | from .detection import Detect 2 | 3 | -------------------------------------------------------------------------------- /utils/detection/__pycache__/__init__.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/detection/__pycache__/__init__.cpython-35.pyc -------------------------------------------------------------------------------- /utils/detection/__pycache__/__init__.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/detection/__pycache__/__init__.cpython-36.pyc -------------------------------------------------------------------------------- /utils/detection/__pycache__/detection.cpython-35.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/detection/__pycache__/detection.cpython-35.pyc -------------------------------------------------------------------------------- /utils/detection/__pycache__/detection.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/detection/__pycache__/detection.cpython-36.pyc -------------------------------------------------------------------------------- /utils/detection/detection.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.autograd import Function 3 | from ..box 
4 | 
5 | def intersect(box_a, box_b):
6 | 
7 |     n = box_a.size(0)
8 |     A = box_a.size(1)
9 |     B = box_b.size(1)
10 |     max_xy = torch.min(box_a[:, :, 2:].unsqueeze(2).expand(n, A, B, 2),
11 |                        box_b[:, :, 2:].unsqueeze(1).expand(n, A, B, 2))
12 |     min_xy = torch.max(box_a[:, :, :2].unsqueeze(2).expand(n, A, B, 2),
13 |                        box_b[:, :, :2].unsqueeze(1).expand(n, A, B, 2))
14 |     inter = torch.clamp((max_xy - min_xy), min=0)
15 |     return inter[:, :, :, 0] * inter[:, :, :, 1]
16 | 
17 | def jaccard(box_a, box_b, iscrowd:bool=False):
18 | 
19 |     use_batch = True
20 |     if box_a.dim() == 2:
21 |         use_batch = False
22 |         box_a = box_a[None, ...]
23 |         box_b = box_b[None, ...]
24 | 
25 |     inter = intersect(box_a, box_b)
26 |     area_a = ((box_a[:, :, 2]-box_a[:, :, 0]) *
27 |               (box_a[:, :, 3]-box_a[:, :, 1])).unsqueeze(2).expand_as(inter)  # [A,B]
28 |     area_b = ((box_b[:, :, 2]-box_b[:, :, 0]) *
29 |               (box_b[:, :, 3]-box_b[:, :, 1])).unsqueeze(1).expand_as(inter)  # [A,B]
30 |     union = area_a + area_b - inter
31 |     out = inter / area_a if iscrowd else inter / union
32 | 
33 |     return out if use_batch else out.squeeze(0)
34 | 
35 | def box_diou(boxes1, boxes2, beta):
36 | 
37 |     def box_area(box):
38 |         # box = 4xn
39 |         return (box[2] - box[0]) * (box[3] - box[1])
40 | 
41 |     area1 = box_area(boxes1.t())
42 |     area2 = box_area(boxes2.t())
43 | 
44 |     lt = torch.max(boxes1[:, None, :2], boxes2[:, :2])  # [N,M,2]
45 |     rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])  # [N,M,2]
46 |     clt = torch.min(boxes1[:, None, :2], boxes2[:, :2])
47 |     crb = torch.max(boxes1[:, None, 2:], boxes2[:, 2:])
48 |     x1 = (boxes1[:, None, 0] + boxes1[:, None, 2])/2
49 |     y1 = (boxes1[:, None, 1] + boxes1[:, None, 3])/2
50 |     x2 = (boxes2[:, None, 0] + boxes2[:, None, 2])/2
51 |     y2 = (boxes2[:, None, 1] + boxes2[:, None, 3])/2
52 |     d = (x1-x2.t())**2 + (y1-y2.t())**2
53 |     c = ((crb-clt)**2).sum(dim=2)
54 |     inter = (rb - lt).clamp(min=0).prod(2)  # [N,M]
55 |     return inter / (area1[:, None] + area2 - inter) - (d / c) ** beta  # DIoU = IoU - (d/c)^beta, with d/c the squared normalized center distance
56 | 
57 | class Detect(Function):
58 |     """At test time, Detect is the final layer of SSD. Decode location preds,
59 |     apply non-maximum suppression to location predictions based on conf
60 |     scores and threshold to a top_k number of output predictions for both
61 |     confidence score and locations.
62 |     """
63 |     def __init__(self, num_classes, bkg_label, top_k, conf_thresh, nms_thresh, variance, nms_kind, beta1):
64 |         self.num_classes = num_classes
65 |         self.background_label = bkg_label
66 |         self.top_k = top_k
67 |         # Parameters used in nms.
68 |         self.nms_thresh = nms_thresh
69 |         if nms_thresh <= 0:
70 |             raise ValueError('nms_threshold must be positive.')
71 |         self.conf_thresh = conf_thresh
72 |         self.variance = variance
73 |         self.nms_kind = nms_kind
74 |         self.beta1 = beta1
75 | 
76 |     def forward(self, loc_data, conf_data, prior_data):
77 |         """
78 |         Args:
79 |             loc_data: (tensor) Loc preds from loc layers
80 |                 Shape: [batch,num_priors*4]
81 |             conf_data: (tensor) Conf preds from conf layers
82 |                 Shape: [batch*num_priors,num_classes]
83 |             prior_data: (tensor) Prior boxes and variances from priorbox layers
84 |                 Shape: [1,num_priors,4]
85 |             nms_kind: greedynms or diounms
86 |         """
87 |         num = loc_data.size(0)
88 |         num_priors = prior_data.size(0)
89 |         output = torch.zeros(num, self.num_classes, self.top_k, 5)
90 |         conf_preds = conf_data.view(num, num_priors,
91 |                                     self.num_classes).transpose(2, 1)
92 | 
93 |         # Decode predictions into bboxes.
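# --- Editorial note (added comment, not part of the original detection.py) ---
# The block below implements Cluster-NMS (Zheng et al., 2021): instead of
# suppressing boxes one at a time, it builds the upper-triangular IoU (or
# DIoU) matrix of the score-sorted boxes and repeatedly recomputes the keep
# mask from the column-wise maxima until the mask stops changing. The fixed
# point is identical to the result of sequential greedy NMS, but each
# iteration is a few dense matrix operations that run efficiently on a GPU.
# The "cluster_weighted_*" variants additionally average the coordinates of
# kept boxes with their heavily overlapping neighbors (IoU > 0.8), weighted
# by score times IoU, in the style of Weighted-NMS.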
94 |         for i in range(num):
95 |             decoded_boxes = decode(loc_data[i], prior_data, self.variance)
96 |             # For each class, perform nms
97 |             conf_scores = conf_preds[i].clone()
98 |             sort_scores, idx = conf_scores.sort(1, descending=True)
99 |             c_mask = (sort_scores >= self.conf_thresh)[:, :self.top_k]
100 | 
101 |             s1, s2 = decoded_boxes.size()
102 |             z = decoded_boxes[idx]
103 | 
104 |             h = (torch.arange(0, 21).cuda()).float().unsqueeze(1).unsqueeze(1)  # class indices (hard-coded 21 = VOC num_classes)
105 |             one = torch.ones(21, s1, s2).cuda().mul(h)
106 |             boxes = z[:, :self.top_k][c_mask]  # [N,4] box
107 |             z = one*2 + z
108 | 
109 |             boxes_batch = z[:, :self.top_k][c_mask]  # [N,4] boxes with a per-class coordinate offset, so different classes never overlap in batched NMS
110 | 
111 |             scores = sort_scores[:, :self.top_k][c_mask]  # [N,1]
112 |             classes = one[:, :self.top_k][c_mask][:, 0]  # [N,1]
113 | 
114 |             # Fast NMS is not supported, since it damages the performance.
115 | 
116 |             if self.nms_kind == "cluster_nms" or self.nms_kind == "cluster_weighted_nms":
117 |                 iou = jaccard(boxes_batch, boxes_batch).triu_(diagonal=1)
118 |             else:
119 |                 if self.nms_kind == "cluster_diounms" or self.nms_kind == "cluster_weighted_diounms":
120 |                     iou = box_diou(boxes_batch, boxes_batch, self.beta1).triu_(diagonal=1)
121 |                 else:
122 |                     raise Exception("Currently, NMS only supports 'cluster_nms', 'cluster_diounms', 'cluster_weighted_nms', 'cluster_weighted_diounms'.")
123 |             B = iou
124 |             for j in range(999):
125 |                 A = B
126 |                 maxA = A.max(dim=0)[0]
127 |                 E = (maxA <= self.nms_thresh).float().unsqueeze(1).expand_as(A)
128 |                 B = iou.mul(E)
129 |                 if A.equal(B) == True:
130 |                     break
131 |             keep = (maxA <= self.nms_thresh)
132 |             if self.nms_kind == "cluster_weighted_nms" or self.nms_kind == "cluster_weighted_diounms":
133 |                 n = len(scores)
134 |                 weights = (B*(B > 0.8).float() + torch.eye(n).cuda()) * (scores.reshape((1, n)))
135 |                 xx1 = boxes[:, 0].expand(n, n)
136 |                 yy1 = boxes[:, 1].expand(n, n)
137 |                 xx2 = boxes[:, 2].expand(n, n)
138 |                 yy2 = boxes[:, 3].expand(n, n)
139 | 
140 |                 weightsum = weights.sum(dim=1)
141 |                 xx1 = (xx1*weights).sum(dim=1)/(weightsum)
142 |                 yy1 = (yy1*weights).sum(dim=1)/(weightsum)
143 |                 xx2 = (xx2*weights).sum(dim=1)/(weightsum)
144 |                 yy2 = (yy2*weights).sum(dim=1)/(weightsum)
145 |                 boxes = torch.stack([xx1, yy1, xx2, yy2], 1)
146 | 
147 |             boxes = boxes[keep]
148 |             scores = scores[keep]
149 |             classes = classes[keep]
150 | 
151 |             score_box = torch.cat((scores.unsqueeze(1), boxes), 1)
152 | 
153 |             for cl in range(1, self.num_classes):
154 |                 mask = (classes == cl)
155 |                 output[i, cl, :] = torch.cat((score_box[mask], output[i, cl, :]), 0)[:self.top_k]
156 |         return output
157 | 
158 |     def forward_traditional_nms(self, loc_data, conf_data, prior_data):
159 |         """
160 |         Args:
161 |             loc_data: (tensor) Loc preds from loc layers
162 |                 Shape: [batch,num_priors*4]
163 |             conf_data: (tensor) Conf preds from conf layers
164 |                 Shape: [batch*num_priors,num_classes]
165 |             prior_data: (tensor) Prior boxes and variances from priorbox layers
166 |                 Shape: [1,num_priors,4]
167 |             nms_kind: greedynms or diounms
168 |         """
169 | 
170 |         # This function is no longer supported, as it is extremely time-consuming.
171 | 
172 |         num = loc_data.size(0)
173 |         num_priors = prior_data.size(0)
174 |         output = torch.zeros(num, self.num_classes, self.top_k, 5)
175 |         conf_preds = conf_data.view(num, num_priors,
176 |                                     self.num_classes).transpose(2, 1)
177 | 
178 |         # Decode predictions into bboxes.
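# --- Editorial note (added comment, not part of the original detection.py) ---
# This legacy path runs sequential per-class NMS using the routines from
# utils/box/box_utils.py. DIoU-NMS changes the greedy suppression test from
# IoU <= nms_thresh to IoU - (d^2 / c^2)^beta1 <= nms_thresh, where d is the
# distance between the two box centers and c is the diagonal length of the
# smallest box enclosing both. Overlapping detections whose centers are far
# from the kept box are therefore less likely to be suppressed, which helps
# retain distinct objects in crowded scenes.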
179 |         for i in range(num):
180 |             decoded_boxes = decode(loc_data[i], prior_data, self.variance)
181 |             # For each class, perform nms
182 |             conf_scores = conf_preds[i].clone()
183 | 
184 |             for cl in range(1, self.num_classes):
185 |                 c_mask = conf_scores[cl].gt(self.conf_thresh)
186 |                 scores = conf_scores[cl][c_mask]
187 |                 if scores.size(0) == 0:
188 |                     continue
189 |                 l_mask = c_mask.unsqueeze(1).expand_as(decoded_boxes)
190 |                 boxes = decoded_boxes[l_mask].view(-1, 4)
191 |                 # idx of highest scoring and non-overlapping boxes per class
192 |                 if self.nms_kind == "greedynms":
193 |                     ids, count = nms(boxes, scores, self.nms_thresh, self.top_k)
194 |                 else:
195 |                     if self.nms_kind == "diounms":
196 |                         ids, count = diounms(boxes, scores, self.nms_thresh, self.top_k, self.beta1)
197 |                     else:
198 |                         print("use default greedy-NMS")
199 |                         ids, count = nms(boxes, scores, self.nms_thresh, self.top_k)
200 |                 output[i, cl, :count] = \
201 |                     torch.cat((scores[ids[:count]].unsqueeze(1),
202 |                                boxes[ids[:count]]), 1)
203 |         flt = output.contiguous().view(num, -1, 5)
204 |         _, idx = flt[:, :, 0].sort(1, descending=True)
205 |         _, rank = idx.sort(1)
206 |         flt[(rank >= self.top_k).unsqueeze(-1).expand_as(flt)] = 0  # zero out detections ranked beyond the global top_k
207 |         return output
208 | 
--------------------------------------------------------------------------------
/utils/loss/__init__.py:
--------------------------------------------------------------------------------
1 | from .multibox_loss import MultiBoxLoss
2 | 
--------------------------------------------------------------------------------
/utils/loss/__pycache__/__init__.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/loss/__pycache__/__init__.cpython-35.pyc
--------------------------------------------------------------------------------
/utils/loss/__pycache__/__init__.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/loss/__pycache__/__init__.cpython-36.pyc
--------------------------------------------------------------------------------
/utils/loss/__pycache__/multibox_loss.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/loss/__pycache__/multibox_loss.cpython-35.pyc
--------------------------------------------------------------------------------
/utils/loss/__pycache__/multibox_loss.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Zzh-tju/DIoU-SSD-pytorch/cec038bc1057f0cd532752413b24924fde427f09/utils/loss/__pycache__/multibox_loss.cpython-36.pyc
--------------------------------------------------------------------------------
/utils/loss/multibox_loss.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | import torch
3 | import torch.nn as nn
4 | import torch.nn.functional as F
5 | from ..box import match, log_sum_exp
6 | from ..box import match_ious, bbox_overlaps_iou, bbox_overlaps_giou, bbox_overlaps_diou, bbox_overlaps_ciou, decode
7 | 
8 | class FocalLoss(nn.Module):
9 |     """
10 |     This criterion is an implementation of Focal Loss, as proposed in
11 |     Focal Loss for Dense Object Detection.
12 | 
13 |         Loss(x, class) = - \alpha (1-softmax(x)[class])^gamma \log(softmax(x)[class])
14 | 
15 |     The losses are averaged across observations for each minibatch.
16 | 
17 |     Args:
18 |         alpha (1D Tensor): per-class weighting factor for this criterion
19 |         gamma (float): gamma > 0; reduces the relative loss for well-classified examples (p > .5),
20 |             putting more focus on hard, misclassified examples
21 |         size_average (bool): By default, the losses are averaged over observations for each minibatch.
22 |             However, if the field size_average is set to False, the losses are
23 |             instead summed for each minibatch.
24 |     """
25 |     def __init__(self, class_num, alpha=None, gamma=2, size_average=True):
26 |         super(FocalLoss, self).__init__()
27 |         if alpha is None:
28 |             self.alpha = torch.ones(class_num, 1)
29 |         else:
30 |             if isinstance(alpha, torch.Tensor):
31 |                 self.alpha = alpha
32 |             else:
33 |                 self.alpha = torch.as_tensor(alpha)
34 |         self.gamma = gamma
35 |         self.class_num = class_num
36 |         self.size_average = size_average
37 |         print(self.gamma)
38 |     def forward(self, inputs, targets):
39 |         N = inputs.size(0)
40 |         C = inputs.size(1)
41 |         P = F.softmax(inputs, dim=1)
42 |         class_mask = inputs.data.new(N, C).fill_(0)
43 |         # one-hot mask of the target class for each sample (Variable wrapping is unnecessary since PyTorch 0.4)
44 |         ids = targets.view(-1, 1)
45 |         class_mask.scatter_(1, ids.data, 1.)
46 | 
47 |         if inputs.is_cuda and not self.alpha.is_cuda:
48 |             self.alpha = self.alpha.cuda()
49 |         alpha = self.alpha[ids.data.view(-1)]
50 | 
51 |         probs = (P*class_mask).sum(1).view(-1, 1)
52 | 
53 |         log_p = probs.log()
54 | 
55 |         batch_loss = -alpha*(torch.pow((1-probs), self.gamma))*log_p
56 | 
57 |         if self.size_average:
58 |             loss = batch_loss.mean()
59 |         else:
60 |             loss = batch_loss.sum()
61 |         return loss
62 | 
63 | 
64 | class IouLoss(nn.Module):
65 | 
66 |     def __init__(self, pred_mode='Center', size_sum=True, variances=None, losstype='Giou'):
67 |         super(IouLoss, self).__init__()
68 |         self.size_sum = size_sum
69 |         self.pred_mode = pred_mode
70 |         self.variances = variances
71 |         self.loss = losstype
72 |     def forward(self, loc_p, loc_t, prior_data):
73 |         num = loc_p.shape[0]
74 | 
75 |         if self.pred_mode == 'Center':
76 |             decoded_boxes = decode(loc_p, prior_data, self.variances)
77 |         else:
78 |             decoded_boxes = loc_p
79 |         if self.loss == 'Iou':
80 |             loss = torch.sum(1.0 - bbox_overlaps_iou(decoded_boxes, loc_t))
81 |         else:
82 |             if self.loss == 'Giou':
83 |                 loss = torch.sum(1.0 - bbox_overlaps_giou(decoded_boxes, loc_t))
84 |             else:
85 |                 if self.loss == 'Diou':
86 |                     loss = torch.sum(1.0 - bbox_overlaps_diou(decoded_boxes, loc_t))
87 |                 else:
88 |                     loss = torch.sum(1.0 - bbox_overlaps_ciou(decoded_boxes, loc_t))
89 | 
90 |         if self.size_sum:
91 |             loss = loss  # keep the summed loss; the caller normalizes by N
92 |         else:
93 |             loss = loss/num
94 |         return loss
95 | 
96 | class MultiBoxLoss(nn.Module):
97 |     """SSD Weighted Loss Function
98 |     Compute Targets:
99 |         1) Produce Confidence Target Indices by matching ground truth boxes
100 |            with (default) 'priorboxes' that have jaccard index > threshold parameter
101 |            (default threshold: 0.5).
102 |         2) Produce localization target by 'encoding' variance into offsets of ground
103 |            truth boxes and their matched 'priorboxes'.
104 |         3) Hard negative mining to filter the excessive number of negative examples
105 |            that comes with using a large number of default bounding boxes.
106 |            (default negative:positive ratio 3:1)
107 |     Objective Loss:
108 |         L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
109 |         Where, Lconf is the CrossEntropy Loss and Lloc is the SmoothL1 Loss
110 |         weighted by α which is set to 1 by cross val.
111 |     Args:
112 |         c: class confidences,
113 |         l: predicted boxes,
114 |         g: ground truth boxes
115 |         N: number of matched default boxes
116 |     See: https://arxiv.org/pdf/1512.02325.pdf for more details.
117 |     """
118 | 
119 |     def __init__(self, cfg, overlap_thresh, prior_for_matching,
120 |                  bkg_label, neg_mining, neg_pos, neg_overlap, encode_target,
121 |                  use_gpu=True, loss_name='SmoothL1'):
122 |         super(MultiBoxLoss, self).__init__()
123 |         self.use_gpu = use_gpu
124 | 
125 |         self.num_classes = cfg['num_classes']
126 |         self.threshold = overlap_thresh
127 |         self.background_label = bkg_label
128 |         self.encode_target = encode_target
129 |         self.use_prior_for_matching = prior_for_matching
130 |         self.do_neg_mining = neg_mining
131 |         self.negpos_ratio = neg_pos
132 |         self.neg_overlap = neg_overlap
133 |         self.variance = cfg['variance']
134 |         self.focalloss = FocalLoss(self.num_classes, gamma=2, size_average=False)
135 |         self.loss = loss_name
136 |         self.gious = IouLoss(pred_mode='Center', size_sum=True, variances=self.variance, losstype=self.loss)
137 |         # validate the requested regression loss (see the 'losstype' option in config/config.py)
138 |         if self.loss not in ('SmoothL1', 'Iou', 'Giou', 'Diou', 'Ciou'):
139 |             raise ValueError("Invalid loss name: losstype must be one of "
140 |                              "'SmoothL1', 'Iou', 'Giou', 'Diou', 'Ciou'.")
141 | 
142 | 
143 | 
144 |     def forward(self, predictions, targets):
145 |         """Multibox Loss
146 |         Args:
147 |             predictions (tuple): A tuple containing loc preds, conf preds,
148 |                 and prior boxes from SSD net.
149 |                 conf shape: torch.size(batch_size,num_priors,num_classes)
150 |                 loc shape: torch.size(batch_size,num_priors,4)
151 |                 priors shape: torch.size(num_priors,4)
152 | 
153 |             targets (tensor): Ground truth boxes and labels for a batch,
154 |                 shape: [batch_size,num_objs,5] (last idx is the label).
155 | """ 156 | loc_data, conf_data, priors = predictions 157 | num = loc_data.size(0) 158 | 159 | priors = priors[:loc_data.size(1), :] 160 | 161 | num_priors = (priors.size(0)) 162 | 163 | # match priors (default boxes) and ground truth boxes 164 | loc_t = torch.Tensor(num, num_priors, 4) 165 | 166 | conf_t = torch.LongTensor(num, num_priors) 167 | for idx in range(num): 168 | truths = targets[idx][:, :-1].data 169 | labels = targets[idx][:, -1].data 170 | defaults = priors.data 171 | if self.loss == 'SmoothL1': 172 | match(self.threshold, truths, defaults, self.variance, labels, 173 | loc_t, conf_t, idx) 174 | else: 175 | match_ious(self.threshold, truths, defaults, self.variance, labels, 176 | loc_t, conf_t, idx) 177 | 178 | if self.use_gpu: 179 | loc_t = loc_t.cuda() 180 | conf_t = conf_t.cuda() 181 | # wrap targets 182 | #loc_t = Variable(loc_t, requires_grad=True) 183 | #conf_t = Variable(conf_t, requires_grad=True) 184 | 185 | pos = conf_t > 0 186 | num_pos = pos.sum(dim=1, keepdim=True) 187 | # Localization Loss (Smooth L1) 188 | # Shape: [batch,num_priors,4] 189 | pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data) 190 | 191 | loc_p = loc_data[pos_idx].view(-1, 4) 192 | loc_t = loc_t[pos_idx].view(-1, 4) 193 | 194 | if self.loss == 'SmoothL1': 195 | loss_l = F.smooth_l1_loss(loc_p, loc_t, reduction='sum') 196 | else: 197 | giou_priors = priors.data.unsqueeze(0).expand_as(loc_data) 198 | loss_l = self.gious(loc_p,loc_t,giou_priors[pos_idx].view(-1, 4)) 199 | # Compute max conf across batch for hard negative mining 200 | batch_conf = conf_data.view(-1, self.num_classes) 201 | loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1)) 202 | 203 | # Hard Negative Mining 204 | loss_c = loss_c.view(num, -1) 205 | loss_c[pos] = 0 206 | _, loss_idx = loss_c.sort(1, descending=True) 207 | _, idx_rank = loss_idx.sort(1) 208 | num_pos = pos.long().sum(1, keepdim=True) 209 | num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1) 210 | neg = idx_rank < num_neg.expand_as(idx_rank) 211 | 212 | # Confidence Loss Including Positive and Negative Examples 213 | pos_idx = pos.unsqueeze(2).expand_as(conf_data) 214 | neg_idx = neg.unsqueeze(2).expand_as(conf_data) 215 | conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1, self.num_classes) 216 | targets_weighted = conf_t[(pos+neg).gt(0)] 217 | loss_c = F.cross_entropy(conf_p, targets_weighted, reduction='sum') 218 | 219 | # Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N 220 | ''' 221 | batch_conf = conf_data.view(-1, self.num_classes) 222 | loss_c = self.focalloss(batch_conf,conf_t) 223 | ''' 224 | N = num_pos.data.sum().double() 225 | loss_l = loss_l.double() 226 | loss_c = loss_c.double() 227 | loss_l /= N 228 | loss_c /= N 229 | 230 | return loss_l, loss_c 231 | 232 | 233 | -------------------------------------------------------------------------------- /work_dir/DIoU-NMS.txt: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------- 2 | Results computed with the **unofficial** Python eval code. 3 | Results should be very close to the official MATLAB eval code. 4 | -------------------------------------------------------------- 5 | 0.7857084352769135 0.5633757991885664 0.5162622076659289 6 | --------------------------------------------------------------------------------