├── .gitignore ├── README.md ├── convert_to_onnx.py ├── data ├── __init__.py ├── config.py ├── data_augment.py └── wider_face.py ├── detect.py ├── images ├── _-20_1008_0.jpg ├── _-20_144_2.jpg ├── c2f3bc4499934b8ed942743ee8e3082a.jpg └── mobilenetv22222.jpg ├── layers ├── __init__.py ├── functions │ └── prior_box.py └── modules │ ├── __init__.py │ └── multibox_loss.py ├── models ├── __init__.py ├── net.py ├── retinaface.py └── utils.py ├── test_widerface.py ├── train.py ├── utils ├── __init__.py ├── box_utils.py ├── nms │ ├── __init__.py │ └── py_cpu_nms.py └── timer.py └── widerface_evaluate ├── README.md ├── box_overlaps.c ├── box_overlaps.pyx ├── evaluation.py ├── ground_truth ├── wider_easy_val.mat ├── wider_face_val.mat ├── wider_hard_val.mat └── wider_medium_val.mat └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow
95 | __pypackages__/
96 | 
97 | # Celery stuff
98 | celerybeat-schedule
99 | celerybeat.pid
100 | 
101 | # SageMath parsed files
102 | *.sage.py
103 | 
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 | 
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 | 
117 | # Rope project settings
118 | .ropeproject
119 | 
120 | # mkdocs documentation
121 | /site
122 | 
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 | 
128 | # Pyre type checker
129 | .pyre/
130 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LightWeightFaceDetector
2 | 
3 | ## Update 2021-05-10
4 | Uploaded a new face detection dataset annotated with face boxes and 5 landmarks. The dataset consists mostly of large, close-range faces and helps improve detection accuracy on close faces. You can merge it with the WIDER FACE training data.
5 | 
6 | 
7 | [MobileFaceDet](https://pan.baidu.com/s/1x8zATo7TDx300JLPxyvI8g) password: eu8w
8 | 
9 | 
10 | We open-source a close-range face detection and landmark dataset containing 27k+ annotated faces, all at close distance, similar to a phone's front-facing camera. It helps improve face detection and landmark regression accuracy on mobile devices; merged with the WIDER FACE train set, it can be used to train a face detection model of only about 120 KB.
11 | Sample annotations from the dataset:
12 | 
13 | 
14 | ![](images/_-20_1008_0.jpg)
15 | ![](images/_-20_144_2.jpg)
16 | 
17 | Ultra-light-weight face detection with landmarks; the model size is around 1 MB, suitable for mobile or edge devices. I simplified the RetinaFace structure for fast inference.
18 | 
19 | I tested four light-weight networks as backbones: MobileNet v1, v2, v3 and EfficientNet-B0.
20 | 
21 | A light-weight face detection and landmark model for mobile and edge computing; the model is only a little over 1 MB. It is mainly a simplified RetinaFace: the detection heads on the first few large feature maps are removed, so detection of small faces may suffer, which matters little in typical application scenarios.
22 | 
23 | The fastest model here is mobilenet_v2_0.1; an example result:
24 | 
25 | ![](./images/mobilenetv22222.jpg)
26 | 
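Because only the stride-16 and stride-32 detection heads are kept (see `steps` and `min_sizes` in `data/config.py`), the number of priors is small. A quick back-of-the-envelope check of the prior count for the default 640×640 input — an illustrative sketch, not code from this repo:

```python
from math import ceil

# Values taken from data/config.py (cfg_mnetv1): two heads, two anchor sizes per head.
image_size = 640
steps = [16, 32]
min_sizes = [[64, 128], [256, 512]]

num_priors = sum(ceil(image_size / s) ** 2 * len(sizes)
                 for s, sizes in zip(steps, min_sizes))
print(num_priors)  # 40*40*2 + 20*20*2 = 4000 priors
```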
27 | ## WiderFace Val Performance
28 | 
29 | | Models            | Easy    | Medium  | Hard    |
30 | | ----------------- | ------- | ------- | ------- |
31 | | mobilenetv1_0.25  | 0.91718 | 0.79766 | 0.3592  |
32 | | mobilenetv2_0.1   | 0.85330 | 0.68946 | 0.2993  |
33 | | mobilenetv3_small | 0.93419 | 0.83259 | 0.3850  |
34 | | efficientnet-b0   | 0.93167 | 0.81466 | 0.37020 |
35 | 
36 | ## Data
37 | 
38 | 1. Download the [WIDERFACE](http://shuoyang1213.me/WIDERFACE/WiderFace_Results.html) dataset.
39 | 2. Alternatively, use the organized dataset we provide (same directory structure as above):
40 | 
41 | Link: [google cloud](https://drive.google.com/open?id=11UGV3nbVv1x9IC--_tK3Uxf7hA6rlbsS) or [baidu cloud](https://pan.baidu.com/s/1jIp9t30oYivrAvrgUgIoLQ) Password: ruck
42 | 
43 | ## Training
44 | 
45 | We provide four light-weight backbones (mobilenetv1, mobilenetv2, mobilenetv3, efficientnetb0) for training the model.
46 | 
47 | 1. Make the directory ./weights/, download the ImageNet-pretrained weights from [link](https://pan.baidu.com/s/1zhyL9ULuIi1KdtXzhSQ4yQ) (password: urei) and put them in ./weights/:
48 | 
49 | ```Shell
50 |   ./weights/
51 |       mobilenet0.25_Final.pth
52 |       mobilenetV1X0.25_pretrain.tar
53 |       efficientnetb0_face.pth
54 |       mobilenetv3.pth
55 |       mobilenetv2_0.1_face.pth
56 |       ...
57 | ```
58 | 
59 | 2. Before training, check the network configuration (e.g. batch_size, min_sizes, steps, etc.) in ``data/config.py`` and ``train.py``.
60 | 
61 | 3. Train the model on WIDER FACE:
62 | 
63 | ```Shell
64 | CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --network mobilenetv1
65 | CUDA_VISIBLE_DEVICES=0 python train.py --network mobilenetv1
66 | ```
67 | 
68 | 
69 | ## Evaluation
70 | 
71 | ### Evaluate on WIDER FACE val
72 | 
73 | 1. Generate the txt result files:
74 | 
75 | ```Shell
76 | python test_widerface.py --trained_model weight_file --network mobilenetv1 (or mobilenetv2, mobilenetv3, efficientnetb0)
77 | ```
78 | 
79 | 2. Evaluate the txt results:
80 | 
81 | ```Shell
82 | cd ./widerface_evaluate
83 | python setup.py build_ext --inplace
84 | python evaluation.py
85 | ```
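## ONNX Export

`convert_to_onnx.py` exports the detector to `FaceDetector.onnx` (graph input name `input0`). A minimal sketch of exporting a model and sanity-checking the result; the onnxruntime check is illustrative only and not part of this repo (it assumes `pip install onnxruntime`):

```Shell
python convert_to_onnx.py --trained_model ./weights/mobilenet0.25_Final.pth --network mobile0.25
```

```python
# Illustrative sanity check of the exported graph (assumes onnxruntime and numpy are installed).
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("FaceDetector.onnx")
dummy = np.random.randn(1, 3, 640, 640).astype(np.float32)
outputs = sess.run(None, {"input0": dummy})
# Expect the three RetinaFace head outputs (box regression, class scores, landmarks) over all priors.
for out in outputs:
    print(out.shape)
```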
86 | ## Android and iOS
87 | Android deployment uses libtorch: https://github.com/midasklr/facedetection_android.pytorch
88 | iOS deployment uses ncnn.
89 | ## References
90 | 
91 | [Pytorch_Retinaface](https://github.com/biubug6/Pytorch_Retinaface)
92 | 
--------------------------------------------------------------------------------
/convert_to_onnx.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 | import os
3 | import argparse
4 | import torch
5 | import torch.backends.cudnn as cudnn
6 | import numpy as np
7 | from data import cfg_mnetv1, cfg_mnetv2, cfg_mnetv3, cfg_efnetb0  # configs actually defined in data/config.py
8 | from layers.functions.prior_box import PriorBox
9 | from utils.nms.py_cpu_nms import py_cpu_nms
10 | import cv2
11 | from models.retinaface import RetinaFace
12 | from utils.box_utils import decode, decode_landm
13 | from utils.timer import Timer
14 | 
15 | 
16 | parser = argparse.ArgumentParser(description='Test')
17 | parser.add_argument('-m', '--trained_model', default='./weights/mobilenet0.25_Final.pth',
18 |                     type=str, help='Trained state_dict file path to open')
19 | parser.add_argument('--network', default='mobile0.25', help='Backbone network mobile0.25, mobilenetv2, mobilenetv3 or efficientnetb0')
20 | parser.add_argument('--long_side', default=640, type=int, help='when origin_size is false, long_side is scaled size(320 or 640 for long side)')
21 | parser.add_argument('--cpu', action="store_true", default=True, help='Use cpu inference')
22 | 
23 | args = parser.parse_args()
24 | 
25 | 
26 | def check_keys(model, pretrained_state_dict):
27 |     ckpt_keys = set(pretrained_state_dict.keys())
28 |     model_keys = set(model.state_dict().keys())
29 |     used_pretrained_keys = model_keys & ckpt_keys
30 |     unused_pretrained_keys = ckpt_keys - model_keys
31 |     missing_keys = model_keys - ckpt_keys
32 |     print('Missing keys:{}'.format(len(missing_keys)))
33 |     print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys)))
34 |     print('Used keys:{}'.format(len(used_pretrained_keys)))
35 |     assert len(used_pretrained_keys) > 0, 'load NONE from pretrained checkpoint'
36 |     return True
37 | 
38 | 
39 | def remove_prefix(state_dict, prefix):
40 |     ''' Old style model is stored with all names of parameters sharing common prefix 'module.' '''
41 |     print('remove prefix \'{}\''.format(prefix))
42 |     f = lambda x: x.split(prefix, 1)[-1] if x.startswith(prefix) else x
43 |     return {f(key): value for key, value in state_dict.items()}
44 | 
45 | 
46 | def load_model(model, pretrained_path, load_to_cpu):
47 |     print('Loading pretrained model from {}'.format(pretrained_path))
48 |     if load_to_cpu:
49 |         pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage)
50 |     else:
51 |         device = torch.cuda.current_device()
52 |         pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage.cuda(device))
53 |     if "state_dict" in pretrained_dict.keys():
54 |         pretrained_dict = remove_prefix(pretrained_dict['state_dict'], 'module.')
55 |     else:
56 |         pretrained_dict = remove_prefix(pretrained_dict, 'module.')
57 |     check_keys(model, pretrained_dict)
58 |     model.load_state_dict(pretrained_dict, strict=False)
59 |     return model
60 | 
61 | 
62 | if __name__ == '__main__':
63 |     torch.set_grad_enabled(False)
64 |     cfg = None
65 |     if args.network == "mobile0.25":
66 |         cfg = cfg_mnetv1
67 |     elif args.network == "mobilenetv2":
68 |         cfg = cfg_mnetv2
69 |     elif args.network == "mobilenetv3":
70 |         cfg = cfg_mnetv3
71 |     elif args.network == "efficientnetb0":
72 |         cfg = cfg_efnetb0
73 |     # net and model
74 |     net = RetinaFace(cfg=cfg, phase = 'test')
75 |     net = load_model(net, args.trained_model, args.cpu)
76 |     net.eval()
77 |     print('Finished loading model!')
78 |     print(net)
79 |     device = torch.device("cpu" if args.cpu else "cuda")
80 |     net = net.to(device)
81 | 
82 |     # ------------------------ export -----------------------------
83 |     output_onnx = 'FaceDetector.onnx'
84 |     print("==> Exporting model to ONNX format at '{}'".format(output_onnx))
85 |     input_names = ["input0"]
86 |     output_names = ["output0"]
87 |     inputs = torch.randn(1, 3, args.long_side, args.long_side).to(device)
88 | 
89 |     torch.onnx.export(net, inputs, output_onnx, export_params=True, verbose=False,
90 |                       input_names=input_names, output_names=output_names)
91 | 
--------------------------------------------------------------------------------
/data/__init__.py:
--------------------------------------------------------------------------------
1 | from .wider_face import WiderFaceDetection, detection_collate
2 | from .data_augment import *
3 | from .config import *
--------------------------------------------------------------------------------
/data/config.py:
--------------------------------------------------------------------------------
1 | # config.py
2 | 
3 | cfg_mnetv1 = {
4 |     'name': 'mobilenet0.25',
5 |     'min_sizes': [[64, 128], [256, 512]],
6 |     'steps': [16, 32],
7 |     'variance': [0.1, 0.2],
8 |     'clip': False,
9 |     'loc_weight': 2.0,
10 |     'gpu_train': True,
11 |     'batch_size': 64,
12 |     'ngpu': 1,
13 |     'epoch': 120,
14 |     'decay1': 80,
15 |     'decay2': 100,
16 |     'image_size': 640,
17 |     'pretrain': True,
18 |     'return_layers': {'stage2': 2, 'stage3': 3},
19 |     'in_channel': 32,
20 |     'out_channel': 64
21 | }
22 | 
23 | cfg_mnetv2 = {
24 |     'name': 'mobilenetv2_0.1',
25 |     'min_sizes': [[64, 128], [256, 512]],
26 |     'steps': [16, 32],
27 |     'variance': [0.1, 0.2],
28 |     'clip': False,
29 |     'loc_weight': 2.0,
30 |     'gpu_train': True,
31 |     'batch_size': 64,
32 |     'ngpu': 1,
33 |     'epoch': 120,
34 |     'decay1': 80,
35 |     'decay2': 100,
36 |     'image_size': 640,
37 |     'pretrain': True,
38 |     'return_layers': {'stage2': 2, 'stage3': 3},
39 |     'in_channel1': 12,
40 |     'in_channel2': 1280,
41 |     'out_channel': 64
42 | }
43 | 
44 | cfg_mnetv3 = {
45 |     'name': 'mobilenetv3',
46 |     'min_sizes': [[64, 128], [256, 512]],
47 |     'steps': [16, 32],
48 |     'variance': [0.1, 0.2],
49 |     'clip': False,
50 |
'loc_weight': 2.0, 51 | 'gpu_train': True, 52 | 'batch_size': 32, 53 | 'ngpu': 1, 54 | 'epoch': 120, 55 | 'decay1': 80, 56 | 'decay2': 100, 57 | 'image_size': 640, 58 | 'pretrain': True, 59 | 'return_layers': {'stage2': 2, 'stage3': 3}, 60 | 'in_channel1': 48, 61 | 'in_channel2': 576, 62 | 'out_channel': 64 63 | } 64 | 65 | cfg_efnetb0 = { 66 | 'name': 'efficientnetb0', 67 | 'min_sizes': [[64, 128], [256, 512]], 68 | 'steps': [16, 32], 69 | 'variance': [0.1, 0.2], 70 | 'clip': False, 71 | 'loc_weight': 2.0, 72 | 'gpu_train': True, 73 | 'batch_size': 8, 74 | 'ngpu': 1, 75 | 'epoch': 120, 76 | 'decay1': 80, 77 | 'decay2': 100, 78 | 'image_size': 640, 79 | 'pretrain': True, 80 | 'return_layers': {'stage2': 2, 'stage3': 3}, 81 | 'in_channel1': 112, 82 | 'in_channel2': 1280, 83 | 'out_channel': 64 84 | } 85 | 86 | 87 | 88 | -------------------------------------------------------------------------------- /data/data_augment.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import random 4 | from utils.box_utils import matrix_iof 5 | 6 | 7 | def _crop(image, boxes, labels, landm, img_dim): 8 | height, width, _ = image.shape 9 | pad_image_flag = True 10 | 11 | for _ in range(250): 12 | """ 13 | if random.uniform(0, 1) <= 0.2: 14 | scale = 1.0 15 | else: 16 | scale = random.uniform(0.3, 1.0) 17 | """ 18 | PRE_SCALES = [0.3, 0.45, 0.6, 0.8, 1.0] 19 | scale = random.choice(PRE_SCALES) 20 | short_side = min(width, height) 21 | w = int(scale * short_side) 22 | h = w 23 | 24 | if width == w: 25 | l = 0 26 | else: 27 | l = random.randrange(width - w) 28 | if height == h: 29 | t = 0 30 | else: 31 | t = random.randrange(height - h) 32 | roi = np.array((l, t, l + w, t + h)) 33 | 34 | value = matrix_iof(boxes, roi[np.newaxis]) 35 | flag = (value >= 1) 36 | if not flag.any(): 37 | continue 38 | 39 | centers = (boxes[:, :2] + boxes[:, 2:]) / 2 40 | mask_a = np.logical_and(roi[:2] < centers, centers < roi[2:]).all(axis=1) 41 | boxes_t = boxes[mask_a].copy() 42 | labels_t = labels[mask_a].copy() 43 | landms_t = landm[mask_a].copy() 44 | landms_t = landms_t.reshape([-1, 5, 2]) 45 | 46 | if boxes_t.shape[0] == 0: 47 | continue 48 | 49 | image_t = image[roi[1]:roi[3], roi[0]:roi[2]] 50 | 51 | boxes_t[:, :2] = np.maximum(boxes_t[:, :2], roi[:2]) 52 | boxes_t[:, :2] -= roi[:2] 53 | boxes_t[:, 2:] = np.minimum(boxes_t[:, 2:], roi[2:]) 54 | boxes_t[:, 2:] -= roi[:2] 55 | 56 | # landm 57 | landms_t[:, :, :2] = landms_t[:, :, :2] - roi[:2] 58 | landms_t[:, :, :2] = np.maximum(landms_t[:, :, :2], np.array([0, 0])) 59 | landms_t[:, :, :2] = np.minimum(landms_t[:, :, :2], roi[2:] - roi[:2]) 60 | landms_t = landms_t.reshape([-1, 10]) 61 | 62 | 63 | # make sure that the cropped image contains at least one face > 16 pixel at training image scale 64 | b_w_t = (boxes_t[:, 2] - boxes_t[:, 0] + 1) / w * img_dim 65 | b_h_t = (boxes_t[:, 3] - boxes_t[:, 1] + 1) / h * img_dim 66 | mask_b = np.minimum(b_w_t, b_h_t) > 0.0 67 | boxes_t = boxes_t[mask_b] 68 | labels_t = labels_t[mask_b] 69 | landms_t = landms_t[mask_b] 70 | 71 | if boxes_t.shape[0] == 0: 72 | continue 73 | 74 | pad_image_flag = False 75 | 76 | return image_t, boxes_t, labels_t, landms_t, pad_image_flag 77 | return image, boxes, labels, landm, pad_image_flag 78 | 79 | 80 | def _distort(image): 81 | 82 | def _convert(image, alpha=1, beta=0): 83 | tmp = image.astype(float) * alpha + beta 84 | tmp[tmp < 0] = 0 85 | tmp[tmp > 255] = 255 86 | image[:] = tmp 87 | 88 | image = image.copy() 89 | 90 | if 
random.randrange(2): 91 | 92 | #brightness distortion 93 | if random.randrange(2): 94 | _convert(image, beta=random.uniform(-32, 32)) 95 | 96 | #contrast distortion 97 | if random.randrange(2): 98 | _convert(image, alpha=random.uniform(0.5, 1.5)) 99 | 100 | image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) 101 | 102 | #saturation distortion 103 | if random.randrange(2): 104 | _convert(image[:, :, 1], alpha=random.uniform(0.5, 1.5)) 105 | 106 | #hue distortion 107 | if random.randrange(2): 108 | tmp = image[:, :, 0].astype(int) + random.randint(-18, 18) 109 | tmp %= 180 110 | image[:, :, 0] = tmp 111 | 112 | image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR) 113 | 114 | else: 115 | 116 | #brightness distortion 117 | if random.randrange(2): 118 | _convert(image, beta=random.uniform(-32, 32)) 119 | 120 | image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV) 121 | 122 | #saturation distortion 123 | if random.randrange(2): 124 | _convert(image[:, :, 1], alpha=random.uniform(0.5, 1.5)) 125 | 126 | #hue distortion 127 | if random.randrange(2): 128 | tmp = image[:, :, 0].astype(int) + random.randint(-18, 18) 129 | tmp %= 180 130 | image[:, :, 0] = tmp 131 | 132 | image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR) 133 | 134 | #contrast distortion 135 | if random.randrange(2): 136 | _convert(image, alpha=random.uniform(0.5, 1.5)) 137 | 138 | return image 139 | 140 | 141 | def _expand(image, boxes, fill, p): 142 | if random.randrange(2): 143 | return image, boxes 144 | 145 | height, width, depth = image.shape 146 | 147 | scale = random.uniform(1, p) 148 | w = int(scale * width) 149 | h = int(scale * height) 150 | 151 | left = random.randint(0, w - width) 152 | top = random.randint(0, h - height) 153 | 154 | boxes_t = boxes.copy() 155 | boxes_t[:, :2] += (left, top) 156 | boxes_t[:, 2:] += (left, top) 157 | expand_image = np.empty( 158 | (h, w, depth), 159 | dtype=image.dtype) 160 | expand_image[:, :] = fill 161 | expand_image[top:top + height, left:left + width] = image 162 | image = expand_image 163 | 164 | return image, boxes_t 165 | 166 | 167 | def _mirror(image, boxes, landms): 168 | _, width, _ = image.shape 169 | if random.randrange(2): 170 | image = image[:, ::-1] 171 | boxes = boxes.copy() 172 | boxes[:, 0::2] = width - boxes[:, 2::-2] 173 | 174 | # landm 175 | landms = landms.copy() 176 | landms = landms.reshape([-1, 5, 2]) 177 | landms[:, :, 0] = width - landms[:, :, 0] 178 | tmp = landms[:, 1, :].copy() 179 | landms[:, 1, :] = landms[:, 0, :] 180 | landms[:, 0, :] = tmp 181 | tmp1 = landms[:, 4, :].copy() 182 | landms[:, 4, :] = landms[:, 3, :] 183 | landms[:, 3, :] = tmp1 184 | landms = landms.reshape([-1, 10]) 185 | 186 | return image, boxes, landms 187 | 188 | 189 | def _pad_to_square(image, rgb_mean, pad_image_flag): 190 | if not pad_image_flag: 191 | return image 192 | height, width, _ = image.shape 193 | long_side = max(width, height) 194 | image_t = np.empty((long_side, long_side, 3), dtype=image.dtype) 195 | image_t[:, :] = rgb_mean 196 | image_t[0:0 + height, 0:0 + width] = image 197 | return image_t 198 | 199 | 200 | def _resize_subtract_mean(image, insize, rgb_mean): 201 | interp_methods = [cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_NEAREST, cv2.INTER_LANCZOS4] 202 | interp_method = interp_methods[random.randrange(5)] 203 | image = cv2.resize(image, (insize, insize), interpolation=interp_method) 204 | image = image.astype(np.float32) 205 | image -= rgb_mean 206 | return image.transpose(2, 0, 1) 207 | 208 | 209 | class preproc(object): 210 | 211 | def __init__(self, img_dim, 
rgb_means): 212 | self.img_dim = img_dim 213 | self.rgb_means = rgb_means 214 | 215 | def __call__(self, image, targets): 216 | assert targets.shape[0] > 0, "this image does not have gt" 217 | 218 | boxes = targets[:, :4].copy() 219 | labels = targets[:, -1].copy() 220 | landm = targets[:, 4:-1].copy() 221 | 222 | image_t, boxes_t, labels_t, landm_t, pad_image_flag = _crop(image, boxes, labels, landm, self.img_dim) 223 | image_t = _distort(image_t) 224 | image_t = _pad_to_square(image_t,self.rgb_means, pad_image_flag) 225 | image_t, boxes_t, landm_t = _mirror(image_t, boxes_t, landm_t) 226 | height, width, _ = image_t.shape 227 | image_t = _resize_subtract_mean(image_t, self.img_dim, self.rgb_means) 228 | boxes_t[:, 0::2] /= width 229 | boxes_t[:, 1::2] /= height 230 | 231 | landm_t[:, 0::2] /= width 232 | landm_t[:, 1::2] /= height 233 | 234 | labels_t = np.expand_dims(labels_t, 1) 235 | targets_t = np.hstack((boxes_t, landm_t, labels_t)) 236 | 237 | return image_t, targets_t 238 | -------------------------------------------------------------------------------- /data/wider_face.py: -------------------------------------------------------------------------------- 1 | import os 2 | import os.path 3 | import sys 4 | import torch 5 | import torch.utils.data as data 6 | import cv2 7 | import numpy as np 8 | 9 | class WiderFaceDetection(data.Dataset): 10 | def __init__(self, txt_path, preproc=None): 11 | self.preproc = preproc 12 | self.imgs_path = [] 13 | self.words = [] 14 | f = open(txt_path,'r') 15 | lines = f.readlines() 16 | isFirst = True 17 | labels = [] 18 | for line in lines: 19 | line = line.rstrip() 20 | if line.startswith('#'): 21 | if isFirst is True: 22 | isFirst = False 23 | else: 24 | labels_copy = labels.copy() 25 | self.words.append(labels_copy) 26 | labels.clear() 27 | path = line[2:] 28 | path = txt_path.replace('label.txt','images/') + path 29 | self.imgs_path.append(path) 30 | else: 31 | line = line.split(' ') 32 | label = [float(x) for x in line] 33 | labels.append(label) 34 | 35 | self.words.append(labels) 36 | 37 | def __len__(self): 38 | return len(self.imgs_path) 39 | 40 | def __getitem__(self, index): 41 | img = cv2.imread(self.imgs_path[index]) 42 | height, width, _ = img.shape 43 | 44 | labels = self.words[index] 45 | annotations = np.zeros((0, 15)) 46 | if len(labels) == 0: 47 | return annotations 48 | for idx, label in enumerate(labels): 49 | annotation = np.zeros((1, 15)) 50 | # bbox 51 | annotation[0, 0] = label[0] # x1 52 | annotation[0, 1] = label[1] # y1 53 | annotation[0, 2] = label[0] + label[2] # x2 54 | annotation[0, 3] = label[1] + label[3] # y2 55 | 56 | # landmarks 57 | annotation[0, 4] = label[4] # l0_x 58 | annotation[0, 5] = label[5] # l0_y 59 | annotation[0, 6] = label[7] # l1_x 60 | annotation[0, 7] = label[8] # l1_y 61 | annotation[0, 8] = label[10] # l2_x 62 | annotation[0, 9] = label[11] # l2_y 63 | annotation[0, 10] = label[13] # l3_x 64 | annotation[0, 11] = label[14] # l3_y 65 | annotation[0, 12] = label[16] # l4_x 66 | annotation[0, 13] = label[17] # l4_y 67 | if (annotation[0, 4]<0): 68 | annotation[0, 14] = -1 69 | else: 70 | annotation[0, 14] = 1 71 | 72 | annotations = np.append(annotations, annotation, axis=0) 73 | target = np.array(annotations) 74 | if self.preproc is not None: 75 | img, target = self.preproc(img, target) 76 | 77 | return torch.from_numpy(img), target 78 | 79 | def detection_collate(batch): 80 | """Custom collate fn for dealing with batches of images that have a different 81 | number of associated object annotations 
(bounding boxes). 82 | 83 | Arguments: 84 | batch: (tuple) A tuple of tensor images and lists of annotations 85 | 86 | Return: 87 | A tuple containing: 88 | 1) (tensor) batch of images stacked on their 0 dim 89 | 2) (list of tensors) annotations for a given image are stacked on 0 dim 90 | """ 91 | targets = [] 92 | imgs = [] 93 | for _, sample in enumerate(batch): 94 | for _, tup in enumerate(sample): 95 | if torch.is_tensor(tup): 96 | imgs.append(tup) 97 | elif isinstance(tup, type(np.empty(0))): 98 | annos = torch.from_numpy(tup).float() 99 | targets.append(annos) 100 | 101 | return (torch.stack(imgs, 0), targets) 102 | -------------------------------------------------------------------------------- /detect.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import os 3 | import argparse 4 | import torch 5 | import torch.backends.cudnn as cudnn 6 | import numpy as np 7 | from data import cfg_mnetv1, cfg_mnetv2, cfg_mnetv3, cfg_efnetb0 8 | from layers.functions.prior_box import PriorBox 9 | from utils.nms.py_cpu_nms import py_cpu_nms 10 | import cv2 11 | from models.retinaface import RetinaFace 12 | from utils.box_utils import decode, decode_landm 13 | import time 14 | 15 | parser = argparse.ArgumentParser(description='Retinaface') 16 | 17 | parser.add_argument('-m', '--trained_model', default='./weights/mobilenetv2_0.1_Final.pth', 18 | type=str, help='Trained state_dict file path to open') 19 | parser.add_argument('--network', default='mobilenetv2', help='Backbone network mobile0.25 ,mobilenetv2 ,mobilenetv3 or efficientnetb0') 20 | parser.add_argument('--cpu', action="store_true", default=False, help='Use cpu inference') 21 | parser.add_argument('--confidence_threshold', default=0.02, type=float, help='confidence_threshold') 22 | parser.add_argument('--top_k', default=5000, type=int, help='top_k') 23 | parser.add_argument('--nms_threshold', default=0.4, type=float, help='nms_threshold') 24 | parser.add_argument('--keep_top_k', default=750, type=int, help='keep_top_k') 25 | parser.add_argument('-s', '--save_image', action="store_true", default=True, help='show detection results') 26 | parser.add_argument('--vis_thres', default=0.6, type=float, help='visualization_threshold') 27 | args = parser.parse_args() 28 | 29 | 30 | def check_keys(model, pretrained_state_dict): 31 | ckpt_keys = set(pretrained_state_dict.keys()) 32 | model_keys = set(model.state_dict().keys()) 33 | used_pretrained_keys = model_keys & ckpt_keys 34 | unused_pretrained_keys = ckpt_keys - model_keys 35 | missing_keys = model_keys - ckpt_keys 36 | print('Missing keys:{}'.format(len(missing_keys))) 37 | print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys))) 38 | print('Used keys:{}'.format(len(used_pretrained_keys))) 39 | assert len(used_pretrained_keys) > 0, 'load NONE from pretrained checkpoint' 40 | return True 41 | 42 | 43 | def remove_prefix(state_dict, prefix): 44 | ''' Old style model is stored with all names of parameters sharing common prefix 'module.' 
''' 45 | print('remove prefix \'{}\''.format(prefix)) 46 | f = lambda x: x.split(prefix, 1)[-1] if x.startswith(prefix) else x 47 | return {f(key): value for key, value in state_dict.items()} 48 | 49 | 50 | def load_model(model, pretrained_path, load_to_cpu): 51 | print('Loading pretrained model from {}'.format(pretrained_path)) 52 | if load_to_cpu: 53 | pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage) 54 | else: 55 | device = torch.cuda.current_device() 56 | pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage.cuda(device)) 57 | if "state_dict" in pretrained_dict.keys(): 58 | pretrained_dict = remove_prefix(pretrained_dict['state_dict'], 'module.') 59 | else: 60 | pretrained_dict = remove_prefix(pretrained_dict, 'module.') 61 | check_keys(model, pretrained_dict) 62 | model.load_state_dict(pretrained_dict, strict=False) 63 | return model 64 | 65 | 66 | if __name__ == '__main__': 67 | torch.set_grad_enabled(False) 68 | cfg = None 69 | if args.network == "mobile0.25": 70 | cfg = cfg_mnetv1 71 | elif args.network == "mobilenetv2": 72 | cfg = cfg_mnetv2 73 | elif args.network == "mobilenetv3": 74 | cfg = cfg_mnetv3 75 | elif args.network == "efficientnetb0": 76 | cfg = cfg_efnetb0 77 | # net and model 78 | net = RetinaFace(cfg=cfg, phase = 'test') 79 | net = load_model(net, args.trained_model, args.cpu) 80 | net.eval() 81 | print('Finished loading model!') 82 | print(net) 83 | cudnn.benchmark = True 84 | device = torch.device("cpu" if args.cpu else "cuda") 85 | net = net.to(device) 86 | 87 | resize = 1 88 | 89 | # testing begin 90 | for i in range(1): 91 | image_path = "./images/c2f3bc4499934b8ed942743ee8e3082a.jpg" 92 | img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR) 93 | img = np.float32(img_raw) 94 | 95 | im_height, im_width, _ = img.shape 96 | scale = torch.Tensor([img.shape[1], img.shape[0], img.shape[1], img.shape[0]]) 97 | img -= (104, 117, 123) 98 | img = img.transpose(2, 0, 1) 99 | img = torch.from_numpy(img).unsqueeze(0) 100 | img = img.to(device) 101 | scale = scale.to(device) 102 | 103 | tic = time.time() 104 | loc, conf, landms = net(img) # forward pass 105 | print('net forward time: {:.4f}'.format(time.time() - tic)) 106 | 107 | priorbox = PriorBox(cfg, image_size=(im_height, im_width)) 108 | priors = priorbox.forward() 109 | priors = priors.to(device) 110 | prior_data = priors.data 111 | boxes = decode(loc.data.squeeze(0), prior_data, cfg['variance']) 112 | boxes = boxes * scale / resize 113 | boxes = boxes.cpu().numpy() 114 | scores = conf.squeeze(0).data.cpu().numpy()[:, 1] 115 | landms = decode_landm(landms.data.squeeze(0), prior_data, cfg['variance']) 116 | scale1 = torch.Tensor([img.shape[3], img.shape[2], img.shape[3], img.shape[2], 117 | img.shape[3], img.shape[2], img.shape[3], img.shape[2], 118 | img.shape[3], img.shape[2]]) 119 | scale1 = scale1.to(device) 120 | landms = landms * scale1 / resize 121 | landms = landms.cpu().numpy() 122 | 123 | # ignore low scores 124 | inds = np.where(scores > args.confidence_threshold)[0] 125 | boxes = boxes[inds] 126 | landms = landms[inds] 127 | scores = scores[inds] 128 | 129 | # keep top-K before NMS 130 | order = scores.argsort()[::-1][:args.top_k] 131 | boxes = boxes[order] 132 | landms = landms[order] 133 | scores = scores[order] 134 | 135 | # do NMS 136 | dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False) 137 | keep = py_cpu_nms(dets, args.nms_threshold) 138 | # keep = nms(dets, args.nms_threshold,force_cpu=args.cpu) 139 | 
dets = dets[keep, :] 140 | landms = landms[keep] 141 | 142 | # keep top-K faster NMS 143 | dets = dets[:args.keep_top_k, :] 144 | landms = landms[:args.keep_top_k, :] 145 | 146 | dets = np.concatenate((dets, landms), axis=1) 147 | 148 | # show image 149 | if args.save_image: 150 | for b in dets: 151 | if b[4] < args.vis_thres: 152 | continue 153 | text = "{:.4f}".format(b[4]) 154 | b = list(map(int, b)) 155 | cv2.rectangle(img_raw, (b[0], b[1]), (b[2], b[3]), (0, 0, 255), 4) 156 | cx = b[0] 157 | cy = b[1] + 12 158 | cv2.putText(img_raw, text, (cx, cy), 159 | cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255)) 160 | 161 | # landms 162 | cv2.circle(img_raw, (b[5], b[6]), 2, (0, 0, 255), 4) 163 | cv2.circle(img_raw, (b[7], b[8]), 2, (0, 255, 255), 4) 164 | cv2.circle(img_raw, (b[9], b[10]), 2, (255, 0, 255), 4) 165 | cv2.circle(img_raw, (b[11], b[12]), 2, (0, 255, 0), 4) 166 | cv2.circle(img_raw, (b[13], b[14]), 2, (255, 0, 0), 4) 167 | # save image 168 | 169 | name = "mobilenetv22222.jpg" 170 | cv2.imwrite(name, img_raw) 171 | 172 | -------------------------------------------------------------------------------- /images/_-20_1008_0.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/images/_-20_1008_0.jpg -------------------------------------------------------------------------------- /images/_-20_144_2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/images/_-20_144_2.jpg -------------------------------------------------------------------------------- /images/c2f3bc4499934b8ed942743ee8e3082a.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/images/c2f3bc4499934b8ed942743ee8e3082a.jpg -------------------------------------------------------------------------------- /images/mobilenetv22222.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/images/mobilenetv22222.jpg -------------------------------------------------------------------------------- /layers/__init__.py: -------------------------------------------------------------------------------- 1 | from .functions import * 2 | from .modules import * 3 | -------------------------------------------------------------------------------- /layers/functions/prior_box.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from itertools import product as product 3 | import numpy as np 4 | from math import ceil 5 | 6 | 7 | class PriorBox(object): 8 | def __init__(self, cfg, image_size=None, phase='train'): 9 | super(PriorBox, self).__init__() 10 | self.min_sizes = cfg['min_sizes'] 11 | self.steps = cfg['steps'] 12 | self.clip = cfg['clip'] 13 | self.image_size = image_size 14 | self.feature_maps = [[ceil(self.image_size[0]/step), ceil(self.image_size[1]/step)] for step in self.steps] 15 | self.name = "s" 16 | 17 | def forward(self): 18 | anchors = [] 19 | for k, f in enumerate(self.feature_maps): 20 | min_sizes = self.min_sizes[k] 21 | for i, j in product(range(f[0]), range(f[1])): 22 | for min_size in min_sizes: 23 | s_kx = min_size / 
self.image_size[1] 24 | s_ky = min_size / self.image_size[0] 25 | dense_cx = [x * self.steps[k] / self.image_size[1] for x in [j + 0.5]] 26 | dense_cy = [y * self.steps[k] / self.image_size[0] for y in [i + 0.5]] 27 | for cy, cx in product(dense_cy, dense_cx): 28 | anchors += [cx, cy, s_kx, s_ky] 29 | 30 | # back to torch land 31 | output = torch.Tensor(anchors).view(-1, 4) 32 | if self.clip: 33 | output.clamp_(max=1, min=0) 34 | return output 35 | -------------------------------------------------------------------------------- /layers/modules/__init__.py: -------------------------------------------------------------------------------- 1 | from .multibox_loss import MultiBoxLoss 2 | 3 | __all__ = ['MultiBoxLoss'] 4 | -------------------------------------------------------------------------------- /layers/modules/multibox_loss.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | from torch.autograd import Variable 5 | from utils.box_utils import match, log_sum_exp 6 | from data import cfg_mnetv1, cfg_mnetv3 7 | GPU = cfg_mnetv1['gpu_train'] 8 | 9 | class MultiBoxLoss(nn.Module): 10 | """SSD Weighted Loss Function 11 | Compute Targets: 12 | 1) Produce Confidence Target Indices by matching ground truth boxes 13 | with (default) 'priorboxes' that have jaccard index > threshold parameter 14 | (default threshold: 0.5). 15 | 2) Produce localization target by 'encoding' variance into offsets of ground 16 | truth boxes and their matched 'priorboxes'. 17 | 3) Hard negative mining to filter the excessive number of negative examples 18 | that comes with using a large number of default bounding boxes. 19 | (default negative:positive ratio 3:1) 20 | Objective Loss: 21 | L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N 22 | Where, Lconf is the CrossEntropy Loss and Lloc is the SmoothL1 Loss 23 | weighted by α which is set to 1 by cross val. 24 | Args: 25 | c: class confidences, 26 | l: predicted boxes, 27 | g: ground truth boxes 28 | N: number of matched default boxes 29 | See: https://arxiv.org/pdf/1512.02325.pdf for more details. 30 | """ 31 | 32 | def __init__(self, num_classes, overlap_thresh, prior_for_matching, bkg_label, neg_mining, neg_pos, neg_overlap, encode_target): 33 | super(MultiBoxLoss, self).__init__() 34 | self.num_classes = num_classes 35 | self.threshold = overlap_thresh 36 | self.background_label = bkg_label 37 | self.encode_target = encode_target 38 | self.use_prior_for_matching = prior_for_matching 39 | self.do_neg_mining = neg_mining 40 | self.negpos_ratio = neg_pos 41 | self.neg_overlap = neg_overlap 42 | self.variance = [0.1, 0.2] 43 | 44 | def forward(self, predictions, priors, targets): 45 | """Multibox Loss 46 | Args: 47 | predictions (tuple): A tuple containing loc preds, conf preds, 48 | and prior boxes from SSD net. 49 | conf shape: torch.size(batch_size,num_priors,num_classes) 50 | loc shape: torch.size(batch_size,num_priors,4) 51 | priors shape: torch.size(num_priors,4) 52 | 53 | ground_truth (tensor): Ground truth boxes and labels for a batch, 54 | shape: [batch_size,num_objs,5] (last idx is the label). 
55 | """ 56 | 57 | loc_data, conf_data, landm_data = predictions 58 | priors = priors 59 | num = loc_data.size(0) 60 | num_priors = (priors.size(0)) 61 | 62 | # match priors (default boxes) and ground truth boxes 63 | loc_t = torch.Tensor(num, num_priors, 4) 64 | landm_t = torch.Tensor(num, num_priors, 10) 65 | conf_t = torch.LongTensor(num, num_priors) 66 | for idx in range(num): 67 | truths = targets[idx][:, :4].data 68 | labels = targets[idx][:, -1].data 69 | landms = targets[idx][:, 4:14].data 70 | defaults = priors.data 71 | match(self.threshold, truths, defaults, self.variance, labels, landms, loc_t, conf_t, landm_t, idx) 72 | if GPU: 73 | loc_t = loc_t.cuda() 74 | conf_t = conf_t.cuda() 75 | landm_t = landm_t.cuda() 76 | 77 | zeros = torch.tensor(0).cuda() 78 | # landm Loss (Smooth L1) 79 | # Shape: [batch,num_priors,10] 80 | pos1 = conf_t > zeros 81 | num_pos_landm = pos1.long().sum(1, keepdim=True) 82 | N1 = max(num_pos_landm.data.sum().float(), 1) 83 | pos_idx1 = pos1.unsqueeze(pos1.dim()).expand_as(landm_data) 84 | landm_p = landm_data[pos_idx1].view(-1, 10) 85 | landm_t = landm_t[pos_idx1].view(-1, 10) 86 | loss_landm = F.smooth_l1_loss(landm_p, landm_t, reduction='sum') 87 | 88 | 89 | pos = conf_t != zeros 90 | conf_t[pos] = 1 91 | 92 | # Localization Loss (Smooth L1) 93 | # Shape: [batch,num_priors,4] 94 | pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data) 95 | loc_p = loc_data[pos_idx].view(-1, 4) 96 | loc_t = loc_t[pos_idx].view(-1, 4) 97 | loss_l = F.smooth_l1_loss(loc_p, loc_t, reduction='sum') 98 | 99 | # Compute max conf across batch for hard negative mining 100 | batch_conf = conf_data.view(-1, self.num_classes) 101 | loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1)) 102 | 103 | # Hard Negative Mining 104 | loss_c[pos.view(-1, 1)] = 0 # filter out pos boxes for now 105 | loss_c = loss_c.view(num, -1) 106 | _, loss_idx = loss_c.sort(1, descending=True) 107 | _, idx_rank = loss_idx.sort(1) 108 | num_pos = pos.long().sum(1, keepdim=True) 109 | num_neg = torch.clamp(self.negpos_ratio*num_pos, max=pos.size(1)-1) 110 | neg = idx_rank < num_neg.expand_as(idx_rank) 111 | 112 | # Confidence Loss Including Positive and Negative Examples 113 | pos_idx = pos.unsqueeze(2).expand_as(conf_data) 114 | neg_idx = neg.unsqueeze(2).expand_as(conf_data) 115 | conf_p = conf_data[(pos_idx+neg_idx).gt(0)].view(-1,self.num_classes) 116 | targets_weighted = conf_t[(pos+neg).gt(0)] 117 | loss_c = F.cross_entropy(conf_p, targets_weighted, reduction='sum') 118 | 119 | # Sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N 120 | N = max(num_pos.data.sum().float(), 1) 121 | loss_l /= N 122 | loss_c /= N 123 | loss_landm /= N1 124 | 125 | return loss_l, loss_c, loss_landm 126 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/models/__init__.py -------------------------------------------------------------------------------- /models/net.py: -------------------------------------------------------------------------------- 1 | import time 2 | import torch 3 | import torch.nn as nn 4 | import torchvision.models._utils as _utils 5 | import torchvision.models as models 6 | import torch.nn.functional as F 7 | from torch.autograd import Variable 8 | from torch.nn import init 9 | from .utils import ( 10 | round_filters, 11 | 
round_repeats, 12 | drop_connect, 13 | get_same_padding_conv2d, 14 | get_model_params, 15 | efficientnet_params, 16 | load_pretrained_weights, 17 | Swish, 18 | MemoryEfficientSwish, 19 | calculate_output_image_size 20 | ) 21 | 22 | def _make_divisible(v, divisor, min_value=None): 23 | """ 24 | This function is taken from the original tf repo. 25 | It ensures that all layers have a channel number that is divisible by 8 26 | It can be seen here: 27 | https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py 28 | :param v: 29 | :param divisor: 30 | :param min_value: 31 | :return: 32 | """ 33 | if min_value is None: 34 | min_value = divisor 35 | new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) 36 | # Make sure that round down does not go down by more than 10%. 37 | if new_v < 0.9 * v: 38 | new_v += divisor 39 | return new_v 40 | 41 | 42 | def conv_3x3_bn(inp, oup, stride): 43 | return nn.Sequential( 44 | nn.Conv2d(inp, oup, 3, stride, 1, bias=False), 45 | nn.BatchNorm2d(oup), 46 | nn.ReLU6(inplace=True) 47 | ) 48 | 49 | 50 | def conv_1x1_bn(inp, oup): 51 | return nn.Sequential( 52 | nn.Conv2d(inp, oup, 1, 1, 0, bias=False), 53 | nn.BatchNorm2d(oup), 54 | nn.ReLU6(inplace=True) 55 | ) 56 | 57 | 58 | class InvertedResidual(nn.Module): 59 | def __init__(self, inp, oup, stride, expand_ratio): 60 | super(InvertedResidual, self).__init__() 61 | assert stride in [1, 2] 62 | 63 | hidden_dim = round(inp * expand_ratio) 64 | self.identity = stride == 1 and inp == oup 65 | 66 | if expand_ratio == 1: 67 | self.conv = nn.Sequential( 68 | # dw 69 | nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False), 70 | nn.BatchNorm2d(hidden_dim), 71 | nn.ReLU6(inplace=True), 72 | # pw-linear 73 | nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), 74 | nn.BatchNorm2d(oup), 75 | ) 76 | else: 77 | self.conv = nn.Sequential( 78 | # pw 79 | nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False), 80 | nn.BatchNorm2d(hidden_dim), 81 | nn.ReLU6(inplace=True), 82 | # dw 83 | nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False), 84 | nn.BatchNorm2d(hidden_dim), 85 | nn.ReLU6(inplace=True), 86 | # pw-linear 87 | nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), 88 | nn.BatchNorm2d(oup), 89 | ) 90 | 91 | def forward(self, x): 92 | if self.identity: 93 | return x + self.conv(x) 94 | else: 95 | return self.conv(x) 96 | 97 | class hswish(nn.Module): 98 | def forward(self, x): 99 | out = x * F.relu6(x + 3, inplace=True) / 6 100 | return out 101 | 102 | 103 | class hsigmoid(nn.Module): 104 | def forward(self, x): 105 | out = F.relu6(x + 3, inplace=True) / 6 106 | return out 107 | 108 | 109 | class SeModule(nn.Module): 110 | def __init__(self, in_size, reduction=4): 111 | super(SeModule, self).__init__() 112 | self.se = nn.Sequential( 113 | nn.AdaptiveAvgPool2d(1), 114 | nn.Conv2d(in_size, in_size // reduction, kernel_size=1, stride=1, padding=0, bias=False), 115 | nn.BatchNorm2d(in_size // reduction), 116 | nn.ReLU(inplace=True), 117 | nn.Conv2d(in_size // reduction, in_size, kernel_size=1, stride=1, padding=0, bias=False), 118 | nn.BatchNorm2d(in_size), 119 | hsigmoid() 120 | ) 121 | 122 | def forward(self, x): 123 | return x * self.se(x) 124 | 125 | 126 | class Block(nn.Module): 127 | '''expand + depthwise + pointwise''' 128 | def __init__(self, kernel_size, in_size, expand_size, out_size, nolinear, semodule, stride): 129 | super(Block, self).__init__() 130 | self.stride = stride 131 | self.se = semodule 132 | self.conv1 = nn.Conv2d(in_size, 
expand_size, kernel_size=1, stride=1, padding=0, bias=False) 133 | self.bn1 = nn.BatchNorm2d(expand_size) 134 | self.nolinear1 = nolinear 135 | self.conv2 = nn.Conv2d(expand_size, expand_size, kernel_size=kernel_size, stride=stride, padding=kernel_size//2, groups=expand_size, bias=False) 136 | self.bn2 = nn.BatchNorm2d(expand_size) 137 | self.nolinear2 = nolinear 138 | self.conv3 = nn.Conv2d(expand_size, out_size, kernel_size=1, stride=1, padding=0, bias=False) 139 | self.bn3 = nn.BatchNorm2d(out_size) 140 | 141 | self.shortcut = nn.Sequential() 142 | if stride == 1 and in_size != out_size: 143 | self.shortcut = nn.Sequential( 144 | nn.Conv2d(in_size, out_size, kernel_size=1, stride=1, padding=0, bias=False), 145 | nn.BatchNorm2d(out_size), 146 | ) 147 | 148 | def forward(self, x): 149 | out = self.nolinear1(self.bn1(self.conv1(x))) 150 | out = self.nolinear2(self.bn2(self.conv2(out))) 151 | out = self.bn3(self.conv3(out)) 152 | if self.se != None: 153 | out = self.se(out) 154 | out = out + self.shortcut(x) if self.stride==1 else out 155 | return out 156 | 157 | def conv_bn(inp, oup, stride = 1, leaky = 0): 158 | return nn.Sequential( 159 | nn.Conv2d(inp, oup, 3, stride, 1, bias=False), 160 | nn.BatchNorm2d(oup), 161 | nn.LeakyReLU(negative_slope=leaky, inplace=True) 162 | ) 163 | 164 | def conv_bn_no_relu(inp, oup, stride): 165 | return nn.Sequential( 166 | nn.Conv2d(inp, oup, 3, stride, 1, bias=False), 167 | nn.BatchNorm2d(oup), 168 | ) 169 | 170 | def conv_bn1X1(inp, oup, stride, leaky=0): 171 | return nn.Sequential( 172 | nn.Conv2d(inp, oup, 1, stride, padding=0, bias=False), 173 | nn.BatchNorm2d(oup), 174 | nn.LeakyReLU(negative_slope=leaky, inplace=True) 175 | ) 176 | 177 | def conv_dw(inp, oup, stride, leaky=0.1): 178 | return nn.Sequential( 179 | nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False), 180 | nn.BatchNorm2d(inp), 181 | nn.LeakyReLU(negative_slope= leaky,inplace=True), 182 | 183 | nn.Conv2d(inp, oup, 1, 1, 0, bias=False), 184 | nn.BatchNorm2d(oup), 185 | nn.LeakyReLU(negative_slope= leaky,inplace=True), 186 | ) 187 | 188 | class SSH(nn.Module): 189 | def __init__(self, in_channel, out_channel): 190 | super(SSH, self).__init__() 191 | assert out_channel % 4 == 0 192 | leaky = 0 193 | if (out_channel <= 64): 194 | leaky = 0.1 195 | self.conv3X3 = conv_bn_no_relu(in_channel, out_channel//2, stride=1) 196 | 197 | self.conv5X5_1 = conv_bn(in_channel, out_channel//4, stride=1, leaky = leaky) 198 | self.conv5X5_2 = conv_bn_no_relu(out_channel//4, out_channel//4, stride=1) 199 | 200 | self.conv7X7_2 = conv_bn(out_channel//4, out_channel//4, stride=1, leaky = leaky) 201 | self.conv7x7_3 = conv_bn_no_relu(out_channel//4, out_channel//4, stride=1) 202 | 203 | def forward(self, input): 204 | conv3X3 = self.conv3X3(input) 205 | 206 | conv5X5_1 = self.conv5X5_1(input) 207 | conv5X5 = self.conv5X5_2(conv5X5_1) 208 | 209 | conv7X7_2 = self.conv7X7_2(conv5X5_1) 210 | conv7X7 = self.conv7x7_3(conv7X7_2) 211 | 212 | out = torch.cat([conv3X3, conv5X5, conv7X7], dim=1) 213 | out = F.relu(out) 214 | return out 215 | 216 | class FPN(nn.Module): 217 | def __init__(self,in_channels_list,out_channels): 218 | super(FPN,self).__init__() 219 | leaky = 0 220 | if (out_channels <= 64): 221 | leaky = 0.1 222 | # self.output1 = conv_bn1X1(in_channels_list[0], out_channels, stride = 1, leaky = leaky) 223 | self.output2 = conv_bn1X1(in_channels_list[0], out_channels, stride = 1, leaky = leaky) 224 | self.output3 = conv_bn1X1(in_channels_list[1], out_channels, stride = 1, leaky = leaky) 225 | 226 | 
self.merge1 = conv_bn(out_channels, out_channels, leaky = leaky) 227 | self.merge2 = conv_bn(out_channels, out_channels, leaky = leaky) 228 | 229 | def forward(self, input): 230 | # names = list(input.keys()) 231 | input = list(input.values()) 232 | 233 | # output1 = self.output1(input[0]) 234 | output2 = self.output2(input[0]) 235 | output3 = self.output3(input[1]) 236 | # print("output size 1 : {} 2 : {} 3 : {}".format(output1.size(),output2)) 237 | 238 | up3 = F.interpolate(output3, size=[output2.size(2), output2.size(3)], mode="nearest") 239 | output2 = output2 + up3 240 | output2 = self.merge2(output2) 241 | 242 | # up2 = F.interpolate(output2, size=[output1.size(2), output1.size(3)], mode="nearest") 243 | # output1 = output1 + up2 244 | # output1 = self.merge1(output1) 245 | 246 | out = [output2, output3] 247 | return out 248 | 249 | 250 | 251 | class MobileNetV1(nn.Module): 252 | def __init__(self): 253 | super(MobileNetV1, self).__init__() 254 | self.stage1 = nn.Sequential( 255 | conv_bn(3, 8, 2, leaky = 0.1), # 3 ##2 256 | conv_dw(8, 16, 1), # 7 257 | conv_dw(16, 32, 2), # 11 ## 4 258 | conv_dw(32, 32, 1), # 19 259 | conv_dw(32, 64, 2), # 27 ##8 260 | conv_dw(64, 64, 1), # 43 261 | ) 262 | self.stage2 = nn.Sequential( 263 | conv_dw(64, 128, 2), # 43 + 16 = 59 #16 264 | conv_dw(128, 128, 1), # 59 + 32 = 91 265 | conv_dw(128, 128, 1), # 91 + 32 = 123 266 | conv_dw(128, 128, 1), # 123 + 32 = 155 267 | conv_dw(128, 128, 1), # 155 + 32 = 187 268 | conv_dw(128, 128, 1), # 187 + 32 = 219 269 | ) 270 | self.stage3 = nn.Sequential( #32 271 | conv_dw(128, 256, 2), # 219 +3 2 = 241 272 | conv_dw(256, 256, 1), # 241 + 64 = 301 273 | ) 274 | self.avg = nn.AdaptiveAvgPool2d((1,1)) 275 | self.fc = nn.Linear(256, 1000) 276 | 277 | def forward(self, x): 278 | x = self.stage1(x) 279 | x = self.stage2(x) 280 | x = self.stage3(x) 281 | x = self.avg(x) 282 | # x = self.model(x) 283 | x = x.view(-1, 256) 284 | x = self.fc(x) 285 | return x 286 | 287 | class MobileNetV2(nn.Module): 288 | def __init__(self, num_classes=1000, width_mult=0.1): 289 | super(MobileNetV2, self).__init__() 290 | # setting of inverted residual blocks 291 | self.cfgs = [ 292 | # t, c, n, s 293 | [1, 16, 1, 1], 294 | [6, 24, 2, 2], 295 | [6, 32, 3, 2], 296 | [6, 64, 4, 2], 297 | [6, 96, 3, 1], 298 | [6, 160, 3, 2], 299 | [6, 320, 1, 1], 300 | ] 301 | 302 | # building first layer 303 | input_channel = _make_divisible(32 * width_mult, 4) 304 | layers = [conv_3x3_bn(3, input_channel, 2)] 305 | # building inverted residual blocks 306 | block = InvertedResidual 307 | for t, c, n, s in self.cfgs[:3]: 308 | output_channel = _make_divisible(c * width_mult, 4) 309 | for i in range(n): 310 | layers.append(block(input_channel, output_channel, s if i == 0 else 1, t)) 311 | input_channel = output_channel 312 | self.stage1 = nn.Sequential(*layers) 313 | layers2 = [] 314 | for t, c, n, s in self.cfgs[3:5]: 315 | output_channel = _make_divisible(c * width_mult, 4) 316 | for i in range(n): 317 | layers2.append(block(input_channel, output_channel, s if i == 0 else 1, t)) 318 | input_channel = output_channel 319 | self.stage2 = nn.Sequential(*layers2) 320 | layers3 = [] 321 | for t, c, n, s in self.cfgs[5:]: 322 | output_channel = _make_divisible(c * width_mult, 4) 323 | for i in range(n): 324 | layers3.append(block(input_channel, output_channel, s if i == 0 else 1, t)) 325 | input_channel = output_channel 326 | # building last several layers 327 | output_channel = _make_divisible(1280 * width_mult, 4 ) if width_mult > 1.0 else 1280 328 | 
layers3.append(conv_1x1_bn(input_channel, output_channel)) 329 | self.stage3 = nn.Sequential(*layers3) 330 | self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) 331 | self.classifier = nn.Linear(output_channel, num_classes) 332 | 333 | def forward(self, x): 334 | x = self.stage1(x) 335 | x = self.stage2(x) 336 | x = self.stage3(x) 337 | x = self.avgpool(x) 338 | x = x.view(x.size(0), -1) 339 | x = self.classifier(x) 340 | return x 341 | 342 | 343 | 344 | class MobileNetV3_Small(nn.Module): 345 | def __init__(self, num_classes=1000): 346 | super(MobileNetV3_Small, self).__init__() 347 | self.stage1 = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1, bias=False), 348 | nn.BatchNorm2d(16), 349 | hswish(), 350 | Block(3, 16, 16, 16, nn.ReLU(inplace=True), SeModule(16), 2), 351 | Block(3, 16, 72, 24, nn.ReLU(inplace=True), None, 2), 352 | Block(3, 24, 88, 24, nn.ReLU(inplace=True), None, 1)) 353 | self.stage2 = nn.Sequential(Block(5, 24, 96, 40, hswish(), SeModule(40), 2), 354 | Block(5, 40, 240, 40, hswish(), SeModule(40), 1), 355 | Block(5, 40, 240, 40, hswish(), SeModule(40), 1), 356 | Block(5, 40, 120, 48, hswish(), SeModule(48), 1), 357 | Block(5, 48, 144, 48, hswish(), SeModule(48), 1)) 358 | self.stage3 = nn.Sequential(Block(5, 48, 288, 96, hswish(), SeModule(96), 2), 359 | Block(5, 96, 576, 96, hswish(), SeModule(96), 1), 360 | Block(5, 96, 576, 96, hswish(), SeModule(96), 1), 361 | nn.Conv2d(96, 576, kernel_size=1, stride=1, padding=0, bias=False), 362 | nn.BatchNorm2d(576), 363 | hswish()) 364 | # self.conv2 = nn.Conv2d(96, 576, kernel_size=1, stride=1, padding=0, bias=False) 365 | # self.bn2 = nn.BatchNorm2d(576) 366 | # self.hs2 = hswish() 367 | self.linear3 = nn.Linear(576, 1280) 368 | self.bn3 = nn.BatchNorm1d(1280) 369 | self.hs3 = hswish() 370 | self.linear4 = nn.Linear(1280, num_classes) 371 | 372 | def forward(self, out): 373 | out = self.stage1(out) 374 | out = self.stage2(out) 375 | out = self.stage3(out) 376 | out = F.avg_pool2d(out, 7) 377 | out = out.view(out.size(0), -1) 378 | out = self.hs3(self.bn3(self.linear3(out))) 379 | out = self.linear4(out) 380 | return out 381 | 382 | class MBConvBlock(nn.Module): 383 | """Mobile Inverted Residual Bottleneck Block. 384 | 385 | Args: 386 | block_args (namedtuple): BlockArgs, defined in utils.py. 387 | global_params (namedtuple): GlobalParam, defined in utils.py. 388 | image_size (tuple or list): [image_height, image_width]. 
389 | 390 | References: 391 | [1] https://arxiv.org/abs/1704.04861 (MobileNet v1) 392 | [2] https://arxiv.org/abs/1801.04381 (MobileNet v2) 393 | [3] https://arxiv.org/abs/1905.02244 (MobileNet v3) 394 | """ 395 | 396 | def __init__(self, block_args, global_params, drop_connect_rate=None, image_size=None): 397 | super().__init__() 398 | self._block_args = block_args 399 | self._bn_mom = 1 - global_params.batch_norm_momentum # pytorch's difference from tensorflow 400 | self._bn_eps = global_params.batch_norm_epsilon 401 | self.has_se = (self._block_args.se_ratio is not None) and (0 < self._block_args.se_ratio <= 1) 402 | self.id_skip = block_args.id_skip # whether to use skip connection and drop connect 403 | 404 | # Expansion phase (Inverted Bottleneck) 405 | inp = self._block_args.input_filters # number of input channels 406 | oup = self._block_args.input_filters * self._block_args.expand_ratio # number of output channels 407 | if self._block_args.expand_ratio != 1: 408 | Conv2d = get_same_padding_conv2d(image_size=image_size) 409 | self._expand_conv = Conv2d(in_channels=inp, out_channels=oup, kernel_size=1, bias=False) 410 | self._bn0 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps) 411 | # image_size = calculate_output_image_size(image_size, 1) <-- this wouldn't modify image_size 412 | 413 | # Depthwise convolution phase 414 | k = self._block_args.kernel_size 415 | s = self._block_args.stride 416 | Conv2d = get_same_padding_conv2d(image_size=image_size) 417 | self._depthwise_conv = Conv2d( 418 | in_channels=oup, out_channels=oup, groups=oup, # groups makes it depthwise 419 | kernel_size=k, stride=s, bias=False) 420 | self._bn1 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps) 421 | image_size = calculate_output_image_size(image_size, s) 422 | 423 | # Squeeze and Excitation layer, if desired 424 | if self.has_se: 425 | Conv2d = get_same_padding_conv2d(image_size=(1,1)) 426 | num_squeezed_channels = max(1, int(self._block_args.input_filters * self._block_args.se_ratio)) 427 | self._se_reduce = Conv2d(in_channels=oup, out_channels=num_squeezed_channels, kernel_size=1) 428 | self._se_expand = Conv2d(in_channels=num_squeezed_channels, out_channels=oup, kernel_size=1) 429 | 430 | # Pointwise convolution phase 431 | final_oup = self._block_args.output_filters 432 | Conv2d = get_same_padding_conv2d(image_size=image_size) 433 | self._project_conv = Conv2d(in_channels=oup, out_channels=final_oup, kernel_size=1, bias=False) 434 | self._bn2 = nn.BatchNorm2d(num_features=final_oup, momentum=self._bn_mom, eps=self._bn_eps) 435 | self._swish = MemoryEfficientSwish() 436 | self.drop_connect_rate = drop_connect_rate 437 | 438 | def forward(self, inputs): 439 | """MBConvBlock's forward function. 440 | 441 | Args: 442 | inputs (tensor): Input tensor. 443 | drop_connect_rate (bool): Drop connect rate (float, between 0 and 1). 444 | 445 | Returns: 446 | Output of this block after processing. 
447 | """ 448 | 449 | # Expansion and Depthwise Convolution 450 | x = inputs 451 | if self._block_args.expand_ratio != 1: 452 | x = self._expand_conv(inputs) 453 | x = self._bn0(x) 454 | x = self._swish(x) 455 | 456 | x = self._depthwise_conv(x) 457 | x = self._bn1(x) 458 | x = self._swish(x) 459 | 460 | # Squeeze and Excitation 461 | if self.has_se: 462 | x_squeezed = F.adaptive_avg_pool2d(x, 1) 463 | x_squeezed = self._se_reduce(x_squeezed) 464 | x_squeezed = self._swish(x_squeezed) 465 | x_squeezed = self._se_expand(x_squeezed) 466 | x = torch.sigmoid(x_squeezed) * x 467 | 468 | # Pointwise Convolution 469 | x = self._project_conv(x) 470 | x = self._bn2(x) 471 | 472 | # Skip connection and drop connect 473 | input_filters, output_filters = self._block_args.input_filters, self._block_args.output_filters 474 | if self.id_skip and self._block_args.stride == 1 and input_filters == output_filters: 475 | # The combination of skip connection and drop connect brings about stochastic depth. 476 | if self.drop_connect_rate: 477 | x = drop_connect(x, p=self.drop_connect_rate, training=self.training) 478 | x = x + inputs # skip connection 479 | return x 480 | 481 | def set_swish(self, memory_efficient=True): 482 | """Sets swish function as memory efficient (for training) or standard (for export). 483 | 484 | Args: 485 | memory_efficient (bool): Whether to use memory-efficient version of swish. 486 | """ 487 | self._swish = MemoryEfficientSwish() if memory_efficient else Swish() 488 | 489 | 490 | 491 | class EfficientNet(nn.Module): 492 | """EfficientNet model. 493 | Most easily loaded with the .from_name or .from_pretrained methods. 494 | 495 | Args: 496 | blocks_args (list[namedtuple]): A list of BlockArgs to construct blocks. 497 | global_params (namedtuple): A set of GlobalParams shared between blocks. 498 | 499 | References: 500 | [1] https://arxiv.org/abs/1905.11946 (EfficientNet) 501 | """ 502 | 503 | def __init__(self, blocks_args=None, global_params=None): 504 | super().__init__() 505 | assert isinstance(blocks_args, list), 'blocks_args should be a list' 506 | assert len(blocks_args) > 0, 'block args must be greater than 0' 507 | self._global_params = global_params 508 | self._blocks_args = blocks_args 509 | 510 | # Batch norm parameters 511 | bn_mom = 1 - self._global_params.batch_norm_momentum 512 | bn_eps = self._global_params.batch_norm_epsilon 513 | 514 | # Get stem static or dynamic convolution depending on image size 515 | image_size = global_params.image_size 516 | Conv2d = get_same_padding_conv2d(image_size=image_size) 517 | 518 | # Stem 519 | in_channels = 3 # rgb 520 | out_channels = round_filters(32, self._global_params) # number of output channels 521 | image_size = calculate_output_image_size(image_size, 2) 522 | # Build blocks 523 | stage1 = [Conv2d(in_channels, out_channels, kernel_size=3, stride=2, bias=False), 524 | nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps), 525 | MemoryEfficientSwish()] 526 | stage2 = [] 527 | stage3 = [] 528 | print("self._global_params.drop_connect_rate:",self._global_params.drop_connect_rate) 529 | for idx, block_args in enumerate(self._blocks_args[:3]): 530 | # Update block input and output filters based on depth multiplier. 
531 | block_args = block_args._replace( 532 | input_filters=round_filters(block_args.input_filters, self._global_params), 533 | output_filters=round_filters(block_args.output_filters, self._global_params), 534 | num_repeat=round_repeats(block_args.num_repeat, self._global_params) 535 | ) 536 | drop_connect_rate = self._global_params.drop_connect_rate 537 | if drop_connect_rate: 538 | drop_connect_rate *= float(idx) / len(self._blocks_args) # scale drop connect_rate 539 | 540 | # The first block needs to take care of stride and filter size increase. 541 | stage1.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate, image_size=image_size)) 542 | image_size = calculate_output_image_size(image_size, block_args.stride) 543 | if block_args.num_repeat > 1: # modify block_args to keep same output size 544 | block_args = block_args._replace(input_filters=block_args.output_filters, stride=1) 545 | for _ in range(block_args.num_repeat - 1): 546 | stage1.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate,image_size=image_size)) 547 | # image_size = calculate_output_image_size(image_size, block_args.stride) # stride = 1 548 | ########################### stage 2 ############################ 549 | for idx, block_args in enumerate(self._blocks_args[3:5]): 550 | 551 | # Update block input and output filters based on depth multiplier. 552 | block_args = block_args._replace( 553 | input_filters=round_filters(block_args.input_filters, self._global_params), 554 | output_filters=round_filters(block_args.output_filters, self._global_params), 555 | num_repeat=round_repeats(block_args.num_repeat, self._global_params) 556 | ) 557 | drop_connect_rate = self._global_params.drop_connect_rate 558 | if drop_connect_rate: 559 | drop_connect_rate *= float(idx+3) / len(self._blocks_args) # scale drop connect_rate 560 | # The first block needs to take care of stride and filter size increase. 561 | stage2.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate, image_size=image_size)) 562 | image_size = calculate_output_image_size(image_size, block_args.stride) 563 | if block_args.num_repeat > 1: # modify block_args to keep same output size 564 | block_args = block_args._replace(input_filters=block_args.output_filters, stride=1) 565 | for _ in range(block_args.num_repeat - 1): 566 | stage2.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate, image_size=image_size)) 567 | # image_size = calculate_output_image_size(image_size, block_args.stride) # stride = 1 568 | #############################stage 3 ########################### 569 | for idx, block_args in enumerate(self._blocks_args[5:]): 570 | 571 | # Update block input and output filters based on depth multiplier. 572 | block_args = block_args._replace( 573 | input_filters=round_filters(block_args.input_filters, self._global_params), 574 | output_filters=round_filters(block_args.output_filters, self._global_params), 575 | num_repeat=round_repeats(block_args.num_repeat, self._global_params) 576 | ) 577 | drop_connect_rate = self._global_params.drop_connect_rate 578 | if drop_connect_rate: 579 | drop_connect_rate *= float(idx + 5) / len(self._blocks_args) # scale drop connect_rate 580 | # The first block needs to take care of stride and filter size increase. 
581 | stage3.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate,image_size=image_size)) 582 | image_size = calculate_output_image_size(image_size, block_args.stride) 583 | if block_args.num_repeat > 1: # modify block_args to keep same output size 584 | block_args = block_args._replace(input_filters=block_args.output_filters, stride=1) 585 | for _ in range(block_args.num_repeat - 1): 586 | stage3.append(MBConvBlock(block_args, self._global_params, drop_connect_rate=drop_connect_rate, image_size=image_size)) 587 | # image_size = calculate_output_image_size(image_size, block_args.stride) # stride = 1 588 | 589 | # Head 590 | in_channels = block_args.output_filters # output of final block 591 | out_channels = round_filters(1280, self._global_params) 592 | Conv2d = get_same_padding_conv2d(image_size=image_size) 593 | stage3.extend([Conv2d(in_channels, out_channels, kernel_size=1, bias=False), 594 | nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps), 595 | MemoryEfficientSwish()]) 596 | self.stage1 = nn.Sequential(*stage1) 597 | self.stage2 = nn.Sequential(*stage2) 598 | self.stage3 = nn.Sequential(*stage3) 599 | 600 | # Final linear layer 601 | self._avg_pooling = nn.AdaptiveAvgPool2d(1) 602 | self._dropout = nn.Dropout(self._global_params.dropout_rate) 603 | self._fc = nn.Linear(out_channels, self._global_params.num_classes) 604 | 605 | def set_swish(self, memory_efficient=True): 606 | """Sets swish function as memory efficient (for training) or standard (for export). 607 | 608 | Args: 609 | memory_efficient (bool): Whether to use memory-efficient version of swish. 610 | 611 | """ 612 | self._swish = MemoryEfficientSwish() if memory_efficient else Swish() 613 | for block in self._blocks: 614 | block.set_swish(memory_efficient) 615 | 616 | def forward(self, inputs): 617 | # Convolution layers 618 | print("input size:",inputs.size()) 619 | x = self.stage1(inputs) 620 | print("out size : ",x.size()) 621 | x = self.stage2(x) 622 | x = self.stage3(x) 623 | # Pooling and final linear layer 624 | x = self._avg_pooling(x) 625 | x = x.flatten(start_dim=1) 626 | x = self._dropout(x) 627 | x = self._fc(x) 628 | 629 | return x 630 | 631 | @classmethod 632 | def from_name(cls, model_name='efficientnet-b0', in_channels=3, **override_params): 633 | blocks_args, global_params = get_model_params(model_name, override_params) 634 | model = cls(blocks_args, global_params) 635 | return model 636 | 637 | @classmethod 638 | def get_image_size(cls, model_name): 639 | """Get the input image size for a given efficientnet model. 640 | 641 | Args: 642 | model_name (str): Name for efficientnet. 643 | 644 | Returns: 645 | Input image size (resolution). 
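As a quick sanity check of the `stage1/stage2/stage3` split built above (the same split `models/retinaface.py` taps through `IntermediateLayerGetter`), the backbone can be built by name and run stage by stage. A minimal sketch, assuming the repository root is on `PYTHONPATH`; the input resolution is arbitrary.

```python
import torch
from models.net import EfficientNet

backbone = EfficientNet.from_name('efficientnet-b0')
backbone.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    s1 = backbone.stage1(x)    # stem + first three block groups
    s2 = backbone.stage2(s1)   # next two block groups
    s3 = backbone.stage3(s2)   # remaining blocks + 1x1 head conv
print(s1.shape, s2.shape, s3.shape)   # progressively coarser feature maps
```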
646 | """ 647 | cls._check_model_name_is_valid(model_name) 648 | _, _, res, _ = efficientnet_params(model_name) 649 | return res 650 | 651 | 652 | 653 | 654 | -------------------------------------------------------------------------------- /models/retinaface.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torchvision.models.detection.backbone_utils as backbone_utils 4 | import torchvision.models._utils as _utils 5 | import torch.nn.functional as F 6 | from collections import OrderedDict 7 | 8 | from models.net import MobileNetV1 as MobileNetV1 9 | from models.net import MobileNetV3_Small as MobileNetV3 10 | from models.net import MobileNetV2 as MobileNetV2 11 | from models.net import EfficientNet as EfficientNet 12 | from models.net import FPN as FPN 13 | from models.net import SSH as SSH 14 | 15 | 16 | 17 | class ClassHead(nn.Module): 18 | def __init__(self,inchannels=512,num_anchors=3): 19 | super(ClassHead,self).__init__() 20 | self.num_anchors = num_anchors 21 | self.conv1x1 = nn.Conv2d(inchannels,self.num_anchors*2,kernel_size=(1,1),stride=1,padding=0) 22 | 23 | def forward(self,x): 24 | out = self.conv1x1(x) 25 | out = out.permute(0,2,3,1).contiguous() 26 | 27 | return out.view(out.shape[0], -1, 2) 28 | 29 | class BboxHead(nn.Module): 30 | def __init__(self,inchannels=512,num_anchors=3): 31 | super(BboxHead,self).__init__() 32 | self.conv1x1 = nn.Conv2d(inchannels,num_anchors*4,kernel_size=(1,1),stride=1,padding=0) 33 | 34 | def forward(self,x): 35 | out = self.conv1x1(x) 36 | out = out.permute(0,2,3,1).contiguous() 37 | 38 | return out.view(out.shape[0], -1, 4) 39 | 40 | class LandmarkHead(nn.Module): 41 | def __init__(self,inchannels=512,num_anchors=3): 42 | super(LandmarkHead,self).__init__() 43 | self.conv1x1 = nn.Conv2d(inchannels,num_anchors*10,kernel_size=(1,1),stride=1,padding=0) 44 | 45 | def forward(self,x): 46 | out = self.conv1x1(x) 47 | out = out.permute(0,2,3,1).contiguous() 48 | 49 | return out.view(out.shape[0], -1, 10) 50 | 51 | class RetinaFace(nn.Module): 52 | def __init__(self, cfg = None, phase = 'train'): 53 | """ 54 | :param cfg: Network related settings. 55 | :param phase: train or test. 56 | """ 57 | super(RetinaFace,self).__init__() 58 | self.phase = phase 59 | backbone = None 60 | if cfg['name'] == 'mobilenet0.25': 61 | backbone = MobileNetV1() 62 | if cfg['pretrain']: 63 | checkpoint = torch.load("./weights/mobilenetV1X0.25_pretrain.tar", map_location=torch.device('cpu')) 64 | from collections import OrderedDict 65 | new_state_dict = OrderedDict() 66 | for k, v in checkpoint['state_dict'].items(): 67 | name = k[7:] # remove module. 
68 | new_state_dict[name] = v 69 | # load params 70 | backbone.load_state_dict(new_state_dict) 71 | elif cfg['name'] == 'Resnet50': 72 | import torchvision.models as models 73 | backbone = models.resnet50(pretrained=cfg['pretrain']) 74 | elif cfg['name'] == 'mobilenetv3': 75 | backbone = MobileNetV3() 76 | # print(backbone) 77 | if cfg['pretrain']: 78 | checkpoint = torch.load("./weights/mobilenetv3.pth", map_location=torch.device('cpu')) 79 | print("Pretrained Weights : ",type(checkpoint)) 80 | backbone.load_state_dict(checkpoint) 81 | elif cfg['name'] == 'mobilenetv2_0.1': 82 | backbone = MobileNetV2() 83 | if cfg['pretrain']: 84 | checkpoint = torch.load("./weights/mobilenetv2_0.1_face.pth", map_location=torch.device('cpu')) 85 | backbone.load_state_dict(checkpoint) 86 | elif cfg['name'] == 'efficientnetb0': 87 | backbone = EfficientNet.from_name("efficientnet-b0") 88 | if cfg['pretrain']: 89 | checkpoint = torch.load("./weights/efficientnetb0_face.pth", map_location=torch.device('cpu')) 90 | backbone.load_state_dict(checkpoint) 91 | print("succeed loaded weights...") 92 | self.body = _utils.IntermediateLayerGetter(backbone, cfg['return_layers']) 93 | if cfg['name'] == 'mobilenet0.25': 94 | in_channels_stage2 = cfg['in_channel'] 95 | in_channels_list = [ 96 | # in_channels_stage2 * 2, 97 | in_channels_stage2*4, 98 | in_channels_stage2*8, 99 | ] 100 | elif cfg['name'] == 'mobilenetv2_0.1': 101 | in_channels_stage2 = cfg['in_channel1'] 102 | in_channels_stage3 = cfg['in_channel2'] 103 | in_channels_list = [ 104 | # in_channels_stage2 * 2, 105 | in_channels_stage2, 106 | in_channels_stage3, 107 | ] 108 | elif cfg['name'] == 'mobilenetv3': 109 | in_channels_stage2 = cfg['in_channel1'] 110 | in_channels_stage3 = cfg['in_channel2'] 111 | in_channels_list = [ 112 | # in_channels_stage2 * 2, 113 | in_channels_stage2, 114 | in_channels_stage3, 115 | ] 116 | elif cfg['name'] == 'efficientnetb0': 117 | in_channels_stage2 = cfg['in_channel1'] 118 | in_channels_stage3 = cfg['in_channel2'] 119 | in_channels_list = [ 120 | # in_channels_stage2 * 2, 121 | in_channels_stage2, 122 | in_channels_stage3, 123 | ] 124 | out_channels = cfg['out_channel'] 125 | self.fpn = FPN(in_channels_list,out_channels) 126 | # self.ssh1 = SSH(out_channels, out_channels) 127 | self.ssh2 = SSH(out_channels, out_channels) 128 | self.ssh3 = SSH(out_channels, out_channels) 129 | 130 | self.ClassHead = self._make_class_head(fpn_num=2, inchannels=cfg['out_channel']) 131 | self.BboxHead = self._make_bbox_head(fpn_num=2, inchannels=cfg['out_channel']) 132 | self.LandmarkHead = self._make_landmark_head(fpn_num=2, inchannels=cfg['out_channel']) 133 | 134 | def _make_class_head(self,fpn_num=2,inchannels=64,anchor_num=2): 135 | classhead = nn.ModuleList() 136 | for i in range(fpn_num): 137 | classhead.append(ClassHead(inchannels,anchor_num)) 138 | return classhead 139 | 140 | def _make_bbox_head(self,fpn_num=2,inchannels=64,anchor_num=2): 141 | bboxhead = nn.ModuleList() 142 | for i in range(fpn_num): 143 | bboxhead.append(BboxHead(inchannels,anchor_num)) 144 | return bboxhead 145 | 146 | def _make_landmark_head(self,fpn_num=2,inchannels=64,anchor_num=2): 147 | landmarkhead = nn.ModuleList() 148 | for i in range(fpn_num): 149 | landmarkhead.append(LandmarkHead(inchannels,anchor_num)) 150 | return landmarkhead 151 | 152 | def forward(self,inputs): 153 | out = self.body(inputs) 154 | # print("out size : ",out.size()) 155 | # FPN 156 | fpn = self.fpn(out) 157 | 158 | # SSH 159 | # feature1 = self.ssh1(fpn[0]) 160 | feature2 = 
self.ssh2(fpn[0]) 161 | feature3 = self.ssh3(fpn[1]) 162 | # print("Feature1 size {} 2 size : {}".format(feature2.size(),feature3.size())) 163 | # features = [feature1, feature2, feature3] 164 | features = [feature2, feature3] 165 | 166 | bbox_regressions = torch.cat([self.BboxHead[i](feature) for i, feature in enumerate(features)], dim=1) 167 | classifications = torch.cat([self.ClassHead[i](feature) for i, feature in enumerate(features)],dim=1) 168 | ldm_regressions = torch.cat([self.LandmarkHead[i](feature) for i, feature in enumerate(features)], dim=1) 169 | 170 | if self.phase == 'train': 171 | output = (bbox_regressions, classifications, ldm_regressions) 172 | else: 173 | output = (bbox_regressions, F.softmax(classifications, dim=-1), ldm_regressions) 174 | return output 175 | 176 | -------------------------------------------------------------------------------- /models/utils.py: -------------------------------------------------------------------------------- 1 | """utils.py - Helper functions for building the model and for loading model parameters. 2 | These helper functions are built to mirror those in the official TensorFlow implementation. 3 | """ 4 | 5 | # Author: lukemelas (github username) 6 | # Github repo: https://github.com/lukemelas/EfficientNet-PyTorch 7 | # With adjustments and added comments by workingcoder (github username). 8 | 9 | import re 10 | import math 11 | import collections 12 | from functools import partial 13 | import torch 14 | from torch import nn 15 | from torch.nn import functional as F 16 | from torch.utils import model_zoo 17 | 18 | 19 | ################################################################################ 20 | ### Help functions for model architecture 21 | ################################################################################ 22 | 23 | # GlobalParams and BlockArgs: Two namedtuples 24 | # Swish and MemoryEfficientSwish: Two implementations of the method 25 | # round_filters and round_repeats: 26 | # Functions to calculate params for scaling model width and depth ! ! ! 27 | # get_width_and_height_from_size and calculate_output_image_size 28 | # drop_connect: A structural design 29 | # get_same_padding_conv2d: 30 | # Conv2dDynamicSamePadding 31 | # Conv2dStaticSamePadding 32 | # get_same_padding_maxPool2d: 33 | # MaxPool2dDynamicSamePadding 34 | # MaxPool2dStaticSamePadding 35 | # It's an additional function, not used in EfficientNet, 36 | # but can be used in other model (such as EfficientDet). 
37 | # Identity: An implementation of identical mapping 38 | 39 | # Parameters for the entire model (stem, all blocks, and head) 40 | GlobalParams = collections.namedtuple('GlobalParams', [ 41 | 'width_coefficient', 'depth_coefficient', 'image_size', 'dropout_rate', 42 | 'num_classes', 'batch_norm_momentum', 'batch_norm_epsilon', 43 | 'drop_connect_rate', 'depth_divisor', 'min_depth']) 44 | 45 | # Parameters for an individual model block 46 | BlockArgs = collections.namedtuple('BlockArgs', [ 47 | 'num_repeat', 'kernel_size', 'stride', 'expand_ratio', 48 | 'input_filters', 'output_filters', 'se_ratio', 'id_skip']) 49 | 50 | # Set GlobalParams and BlockArgs's defaults 51 | GlobalParams.__new__.__defaults__ = (None,) * len(GlobalParams._fields) 52 | BlockArgs.__new__.__defaults__ = (None,) * len(BlockArgs._fields) 53 | 54 | 55 | # An ordinary implementation of Swish function 56 | class Swish(nn.Module): 57 | def forward(self, x): 58 | return x * torch.sigmoid(x) 59 | 60 | 61 | # A memory-efficient implementation of Swish function 62 | class SwishImplementation(torch.autograd.Function): 63 | @staticmethod 64 | def forward(ctx, i): 65 | result = i * torch.sigmoid(i) 66 | ctx.save_for_backward(i) 67 | return result 68 | 69 | @staticmethod 70 | def backward(ctx, grad_output): 71 | i = ctx.saved_tensors[0] 72 | sigmoid_i = torch.sigmoid(i) 73 | return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i))) 74 | 75 | class MemoryEfficientSwish(nn.Module): 76 | def forward(self, x): 77 | return SwishImplementation.apply(x) 78 | 79 | 80 | def round_filters(filters, global_params): 81 | """Calculate and round number of filters based on width multiplier. 82 | Use width_coefficient, depth_divisor and min_depth of global_params. 83 | 84 | Args: 85 | filters (int): Filters number to be calculated. 86 | global_params (namedtuple): Global params of the model. 87 | 88 | Returns: 89 | new_filters: New filters number after calculating. 90 | """ 91 | multiplier = global_params.width_coefficient 92 | if not multiplier: 93 | return filters 94 | # TODO: modify the params names. 95 | # maybe the names (width_divisor,min_width) 96 | # are more suitable than (depth_divisor,min_depth). 97 | divisor = global_params.depth_divisor 98 | min_depth = global_params.min_depth 99 | filters *= multiplier 100 | min_depth = min_depth or divisor # pay attention to this line when using min_depth 101 | # follow the formula transferred from official TensorFlow implementation 102 | new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor) 103 | if new_filters < 0.9 * filters: # prevent rounding by more than 10% 104 | new_filters += divisor 105 | return int(new_filters) 106 | 107 | 108 | def round_repeats(repeats, global_params): 109 | """Calculate module's repeat number of a block based on depth multiplier. 110 | Use depth_coefficient of global_params. 111 | 112 | Args: 113 | repeats (int): num_repeat to be calculated. 114 | global_params (namedtuple): Global params of the model. 115 | 116 | Returns: 117 | new repeat: New repeat number after calculating. 118 | """ 119 | multiplier = global_params.depth_coefficient 120 | if not multiplier: 121 | return repeats 122 | # follow the formula transferred from official TensorFlow implementation 123 | return int(math.ceil(multiplier * repeats)) 124 | 125 | 126 | def drop_connect(inputs, p, training): 127 | """Drop connect. 128 | 129 | Args: 130 | input (tensor: BCWH): Input of this structure. 131 | p (float: 0.0~1.0): Probability of drop connection. 
132 | training (bool): The running mode. 133 | 134 | Returns: 135 | output: Output after drop connection. 136 | """ 137 | assert p >= 0 and p <= 1, 'p must be in range of [0,1]' 138 | 139 | if not training: 140 | return inputs 141 | 142 | batch_size = inputs.shape[0] 143 | keep_prob = 1 - p 144 | 145 | # generate binary_tensor mask according to probability (p for 0, 1-p for 1) 146 | random_tensor = keep_prob 147 | random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=inputs.dtype, device=inputs.device) 148 | binary_tensor = torch.floor(random_tensor) 149 | 150 | output = inputs / keep_prob * binary_tensor 151 | return output 152 | 153 | 154 | def get_width_and_height_from_size(x): 155 | """Obtain height and width from x. 156 | 157 | Args: 158 | x (int, tuple or list): Data size. 159 | 160 | Returns: 161 | size: A tuple or list (H,W). 162 | """ 163 | if isinstance(x, int): 164 | return x, x 165 | if isinstance(x, list) or isinstance(x, tuple): 166 | return x 167 | else: 168 | raise TypeError() 169 | 170 | 171 | def calculate_output_image_size(input_image_size, stride): 172 | """Calculates the output image size when using Conv2dSamePadding with a stride. 173 | Necessary for static padding. Thanks to mannatsingh for pointing this out. 174 | 175 | Args: 176 | input_image_size (int, tuple or list): Size of input image. 177 | stride (int, tuple or list): Conv2d operation's stride. 178 | 179 | Returns: 180 | output_image_size: A list [H,W]. 181 | """ 182 | if input_image_size is None: 183 | return None 184 | image_height, image_width = get_width_and_height_from_size(input_image_size) 185 | stride = stride if isinstance(stride, int) else stride[0] 186 | image_height = int(math.ceil(image_height / stride)) 187 | image_width = int(math.ceil(image_width / stride)) 188 | return [image_height, image_width] 189 | 190 | 191 | # Note: 192 | # The following 'SamePadding' functions make output size equal ceil(input size/stride). 193 | # Only when stride equals 1, can the output size be the same as input size. 194 | # Don't be confused by their function names ! ! ! 195 | 196 | def get_same_padding_conv2d(image_size=None): 197 | """Chooses static padding if you have specified an image size, and dynamic padding otherwise. 198 | Static padding is necessary for ONNX exporting of models. 199 | 200 | Args: 201 | image_size (int or tuple): Size of the image. 202 | 203 | Returns: 204 | Conv2dDynamicSamePadding or Conv2dStaticSamePadding. 205 | """ 206 | if image_size is None: 207 | return Conv2dDynamicSamePadding 208 | else: 209 | return partial(Conv2dStaticSamePadding, image_size=image_size) 210 | 211 | 212 | class Conv2dDynamicSamePadding(nn.Conv2d): 213 | """2D Convolutions like TensorFlow, for a dynamic image size. 214 | The padding is operated in forward function by calculating dynamically. 215 | """ 216 | 217 | # Tips for 'SAME' mode padding. 
218 | # Given the following: 219 | # i: width or height 220 | # s: stride 221 | # k: kernel size 222 | # d: dilation 223 | # p: padding 224 | # Output after Conv2d: 225 | # o = floor((i+p-((k-1)*d+1))/s+1) 226 | # If o equals i, i = floor((i+p-((k-1)*d+1))/s+1), 227 | # => p = (i-1)*s+((k-1)*d+1)-i 228 | 229 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1, groups=1, bias=True): 230 | super().__init__(in_channels, out_channels, kernel_size, stride, 0, dilation, groups, bias) 231 | self.stride = self.stride if len(self.stride) == 2 else [self.stride[0]] * 2 232 | 233 | def forward(self, x): 234 | ih, iw = x.size()[-2:] 235 | kh, kw = self.weight.size()[-2:] 236 | sh, sw = self.stride 237 | oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) # change the output size according to stride ! ! ! 238 | pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0) 239 | pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0) 240 | if pad_h > 0 or pad_w > 0: 241 | x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2]) 242 | return F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups) 243 | 244 | 245 | class Conv2dStaticSamePadding(nn.Conv2d): 246 | """2D Convolutions like TensorFlow's 'SAME' mode, with the given input image size. 247 | The padding mudule is calculated in construction function, then used in forward. 248 | """ 249 | 250 | # With the same calculation as Conv2dDynamicSamePadding 251 | 252 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, image_size=None, **kwargs): 253 | super().__init__(in_channels, out_channels, kernel_size, stride, **kwargs) 254 | self.stride = self.stride if len(self.stride) == 2 else [self.stride[0]] * 2 255 | 256 | # Calculate padding based on image size and save it 257 | assert image_size is not None 258 | ih, iw = (image_size, image_size) if isinstance(image_size, int) else image_size 259 | kh, kw = self.weight.size()[-2:] 260 | sh, sw = self.stride 261 | oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) 262 | pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0) 263 | pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0) 264 | if pad_h > 0 or pad_w > 0: 265 | self.static_padding = nn.ZeroPad2d((pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2)) 266 | else: 267 | self.static_padding = Identity() 268 | 269 | def forward(self, x): 270 | x = self.static_padding(x) 271 | x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups) 272 | return x 273 | 274 | 275 | def get_same_padding_maxPool2d(image_size=None): 276 | """Chooses static padding if you have specified an image size, and dynamic padding otherwise. 277 | Static padding is necessary for ONNX exporting of models. 278 | 279 | Args: 280 | image_size (int or tuple): Size of the image. 281 | 282 | Returns: 283 | MaxPool2dDynamicSamePadding or MaxPool2dStaticSamePadding. 284 | """ 285 | if image_size is None: 286 | return MaxPool2dDynamicSamePadding 287 | else: 288 | return partial(MaxPool2dStaticSamePadding, image_size=image_size) 289 | 290 | 291 | class MaxPool2dDynamicSamePadding(nn.MaxPool2d): 292 | """2D MaxPooling like TensorFlow's 'SAME' mode, with a dynamic image size. 293 | The padding is operated in forward function by calculating dynamically. 
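The 'SAME' padding arithmetic above can be verified numerically. A minimal sketch, assuming `models/utils.py` is importable; the channel count and input size are arbitrary.

```python
import torch
from models.utils import Conv2dDynamicSamePadding

# A 3x3, stride-2 conv on a 112x112 map should give ceil(112/2) = 56 per side:
# pad = (o-1)*s + (k-1)*d + 1 - i = 55*2 + 2 + 1 - 112 = 1 (split as 0 left / 1 right).
conv = Conv2dDynamicSamePadding(8, 8, kernel_size=3, stride=2, bias=False)
y = conv(torch.randn(1, 8, 112, 112))
print(y.shape)   # torch.Size([1, 8, 56, 56])
```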
294 | """ 295 | 296 | def __init__(self, kernel_size, stride, padding=0, dilation=1, return_indices=False, ceil_mode=False): 297 | super().__init__(kernel_size, stride, padding, dilation, return_indices, ceil_mode) 298 | self.stride = [self.stride] * 2 if isinstance(self.stride, int) else self.stride 299 | self.kernel_size = [self.kernel_size] * 2 if isinstance(self.kernel_size, int) else self.kernel_size 300 | self.dilation = [self.dilation] * 2 if isinstance(self.dilation, int) else self.dilation 301 | 302 | def forward(self, x): 303 | ih, iw = x.size()[-2:] 304 | kh, kw = self.kernel_size 305 | sh, sw = self.stride 306 | oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) 307 | pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0) 308 | pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0) 309 | if pad_h > 0 or pad_w > 0: 310 | x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2]) 311 | return F.max_pool2d(x, self.kernel_size, self.stride, self.padding, 312 | self.dilation, self.ceil_mode, self.return_indices) 313 | 314 | class MaxPool2dStaticSamePadding(nn.MaxPool2d): 315 | """2D MaxPooling like TensorFlow's 'SAME' mode, with the given input image size. 316 | The padding mudule is calculated in construction function, then used in forward. 317 | """ 318 | 319 | def __init__(self, kernel_size, stride, image_size=None, **kwargs): 320 | super().__init__(kernel_size, stride, **kwargs) 321 | self.stride = [self.stride] * 2 if isinstance(self.stride, int) else self.stride 322 | self.kernel_size = [self.kernel_size] * 2 if isinstance(self.kernel_size, int) else self.kernel_size 323 | self.dilation = [self.dilation] * 2 if isinstance(self.dilation, int) else self.dilation 324 | 325 | # Calculate padding based on image size and save it 326 | assert image_size is not None 327 | ih, iw = (image_size, image_size) if isinstance(image_size, int) else image_size 328 | kh, kw = self.kernel_size 329 | sh, sw = self.stride 330 | oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) 331 | pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0) 332 | pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0) 333 | if pad_h > 0 or pad_w > 0: 334 | self.static_padding = nn.ZeroPad2d((pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2)) 335 | else: 336 | self.static_padding = Identity() 337 | 338 | def forward(self, x): 339 | x = self.static_padding(x) 340 | x = F.max_pool2d(x, self.kernel_size, self.stride, self.padding, 341 | self.dilation, self.ceil_mode, self.return_indices) 342 | return x 343 | 344 | class Identity(nn.Module): 345 | """Identity mapping. 346 | Send input to output directly. 
347 | """ 348 | 349 | def __init__(self): 350 | super(Identity, self).__init__() 351 | 352 | def forward(self, input): 353 | return input 354 | 355 | 356 | ################################################################################ 357 | ### Helper functions for loading model params 358 | ################################################################################ 359 | 360 | # BlockDecoder: A Class for encoding and decoding BlockArgs 361 | # efficientnet_params: A function to query compound coefficient 362 | # get_model_params and efficientnet: 363 | # Functions to get BlockArgs and GlobalParams for efficientnet 364 | # url_map and url_map_advprop: Dicts of url_map for pretrained weights 365 | # load_pretrained_weights: A function to load pretrained weights 366 | 367 | class BlockDecoder(object): 368 | """Block Decoder for readability, 369 | straight from the official TensorFlow repository. 370 | """ 371 | 372 | @staticmethod 373 | def _decode_block_string(block_string): 374 | """Get a block through a string notation of arguments. 375 | 376 | Args: 377 | block_string (str): A string notation of arguments. 378 | Examples: 'r1_k3_s11_e1_i32_o16_se0.25_noskip'. 379 | 380 | Returns: 381 | BlockArgs: The namedtuple defined at the top of this file. 382 | """ 383 | assert isinstance(block_string, str) 384 | 385 | ops = block_string.split('_') 386 | options = {} 387 | for op in ops: 388 | splits = re.split(r'(\d.*)', op) 389 | if len(splits) >= 2: 390 | key, value = splits[:2] 391 | options[key] = value 392 | 393 | # Check stride 394 | assert (('s' in options and len(options['s']) == 1) or 395 | (len(options['s']) == 2 and options['s'][0] == options['s'][1])) 396 | 397 | return BlockArgs( 398 | num_repeat=int(options['r']), 399 | kernel_size=int(options['k']), 400 | stride=[int(options['s'][0])], 401 | expand_ratio=int(options['e']), 402 | input_filters=int(options['i']), 403 | output_filters=int(options['o']), 404 | se_ratio=float(options['se']) if 'se' in options else None, 405 | id_skip=('noskip' not in block_string)) 406 | 407 | @staticmethod 408 | def _encode_block_string(block): 409 | """Encode a block to a string. 410 | 411 | Args: 412 | block (namedtuple): A BlockArgs type argument. 413 | 414 | Returns: 415 | block_string: A String form of BlockArgs. 416 | """ 417 | args = [ 418 | 'r%d' % block.num_repeat, 419 | 'k%d' % block.kernel_size, 420 | 's%d%d' % (block.strides[0], block.strides[1]), 421 | 'e%s' % block.expand_ratio, 422 | 'i%d' % block.input_filters, 423 | 'o%d' % block.output_filters 424 | ] 425 | if 0 < block.se_ratio <= 1: 426 | args.append('se%s' % block.se_ratio) 427 | if block.id_skip is False: 428 | args.append('noskip') 429 | return '_'.join(args) 430 | 431 | @staticmethod 432 | def decode(string_list): 433 | """Decode a list of string notations to specify blocks inside the network. 434 | 435 | Args: 436 | string_list (list[str]): A list of strings, each string is a notation of block. 437 | 438 | Returns: 439 | blocks_args: A list of BlockArgs namedtuples of block args. 440 | """ 441 | assert isinstance(string_list, list) 442 | blocks_args = [] 443 | for block_string in string_list: 444 | blocks_args.append(BlockDecoder._decode_block_string(block_string)) 445 | return blocks_args 446 | 447 | @staticmethod 448 | def encode(blocks_args): 449 | """Encode a list of BlockArgs to a list of strings. 450 | 451 | Args: 452 | blocks_args (list[namedtuples]): A list of BlockArgs namedtuples of block args. 
453 | 454 | Returns: 455 | block_strings: A list of strings, each string is a notation of block. 456 | """ 457 | block_strings = [] 458 | for block in blocks_args: 459 | block_strings.append(BlockDecoder._encode_block_string(block)) 460 | return block_strings 461 | 462 | 463 | def efficientnet_params(model_name): 464 | """Map EfficientNet model name to parameter coefficients. 465 | 466 | Args: 467 | model_name (str): Model name to be queried. 468 | 469 | Returns: 470 | params_dict[model_name]: A (width,depth,res,dropout) tuple. 471 | """ 472 | params_dict = { 473 | # Coefficients: width,depth,res,dropout 474 | 'efficientnet-b0': (1.0, 1.0, 224, 0.2), 475 | 'efficientnet-b1': (1.0, 1.1, 240, 0.2), 476 | 'efficientnet-b2': (1.1, 1.2, 260, 0.3), 477 | 'efficientnet-b3': (1.2, 1.4, 300, 0.3), 478 | 'efficientnet-b4': (1.4, 1.8, 380, 0.4), 479 | 'efficientnet-b5': (1.6, 2.2, 456, 0.4), 480 | 'efficientnet-b6': (1.8, 2.6, 528, 0.5), 481 | 'efficientnet-b7': (2.0, 3.1, 600, 0.5), 482 | 'efficientnet-b8': (2.2, 3.6, 672, 0.5), 483 | 'efficientnet-l2': (4.3, 5.3, 800, 0.5), 484 | } 485 | return params_dict[model_name] 486 | 487 | 488 | def efficientnet(width_coefficient=None, depth_coefficient=None, image_size=None, 489 | dropout_rate=0.2, drop_connect_rate=0.2, num_classes=1000): 490 | """Create BlockArgs and GlobalParams for efficientnet model. 491 | 492 | Args: 493 | width_coefficient (float) 494 | depth_coefficient (float) 495 | image_size (int) 496 | dropout_rate (float) 497 | drop_connect_rate (float) 498 | num_classes (int) 499 | 500 | Meaning as the name suggests. 501 | 502 | Returns: 503 | blocks_args, global_params. 504 | """ 505 | 506 | # Blocks args for the whole model(efficientnet-b0 by default) 507 | # It will be modified in the construction of EfficientNet Class according to model 508 | blocks_args = [ 509 | 'r1_k3_s11_e1_i32_o16_se0.25', 510 | 'r2_k3_s22_e6_i16_o24_se0.25', 511 | 'r2_k5_s22_e6_i24_o40_se0.25', 512 | 'r3_k3_s22_e6_i40_o80_se0.25', 513 | 'r3_k5_s11_e6_i80_o112_se0.25', 514 | 'r4_k5_s22_e6_i112_o192_se0.25', 515 | 'r1_k3_s11_e6_i192_o320_se0.25', 516 | ] 517 | blocks_args = BlockDecoder.decode(blocks_args) 518 | 519 | global_params = GlobalParams( 520 | width_coefficient=width_coefficient, 521 | depth_coefficient=depth_coefficient, 522 | image_size=image_size, 523 | dropout_rate=dropout_rate, 524 | 525 | num_classes=num_classes, 526 | batch_norm_momentum=0.99, 527 | batch_norm_epsilon=1e-3, 528 | drop_connect_rate=drop_connect_rate, 529 | depth_divisor=8, 530 | min_depth=None, 531 | ) 532 | 533 | return blocks_args, global_params 534 | 535 | 536 | def get_model_params(model_name, override_params): 537 | """Get the block args and global params for a given model name. 538 | 539 | Args: 540 | model_name (str): Model's name. 541 | override_params (dict): A dict to modify global_params. 542 | 543 | Returns: 544 | blocks_args, global_params 545 | """ 546 | if model_name.startswith('efficientnet'): 547 | w, d, s, p = efficientnet_params(model_name) 548 | # note: all models have drop connect rate = 0.2 549 | blocks_args, global_params = efficientnet( 550 | width_coefficient=w, depth_coefficient=d, dropout_rate=p, image_size=s) 551 | else: 552 | raise NotImplementedError('model name is not pre-defined: %s' % model_name) 553 | if override_params: 554 | # ValueError will be raised here if override_params has fields not included in global_params. 
555 | global_params = global_params._replace(**override_params) 556 | return blocks_args, global_params 557 | 558 | 559 | # train with Standard methods 560 | # check more details in paper(EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks) 561 | url_map = { 562 | 'efficientnet-b0': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b0-355c32eb.pth', 563 | 'efficientnet-b1': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b1-f1951068.pth', 564 | 'efficientnet-b2': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b2-8bb594d6.pth', 565 | 'efficientnet-b3': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b3-5fb5a3c3.pth', 566 | 'efficientnet-b4': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b4-6ed6700e.pth', 567 | 'efficientnet-b5': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b5-b6417697.pth', 568 | 'efficientnet-b6': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b6-c76e70fd.pth', 569 | 'efficientnet-b7': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b7-dcc49843.pth', 570 | } 571 | 572 | # train with Adversarial Examples(AdvProp) 573 | # check more details in paper(Adversarial Examples Improve Image Recognition) 574 | url_map_advprop = { 575 | 'efficientnet-b0': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b0-b64d5a18.pth', 576 | 'efficientnet-b1': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b1-0f3ce85a.pth', 577 | 'efficientnet-b2': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b2-6e9d97e5.pth', 578 | 'efficientnet-b3': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b3-cdd7c0f4.pth', 579 | 'efficientnet-b4': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b4-44fb3a87.pth', 580 | 'efficientnet-b5': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b5-86493f6b.pth', 581 | 'efficientnet-b6': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b6-ac80338e.pth', 582 | 'efficientnet-b7': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b7-4652b6dd.pth', 583 | 'efficientnet-b8': 'https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/adv-efficientnet-b8-22a8fe65.pth', 584 | } 585 | 586 | # TODO: add the petrained weights url map of 'efficientnet-l2' 587 | 588 | 589 | def load_pretrained_weights(model, model_name, weights_path=None, load_fc=True, advprop=False): 590 | """Loads pretrained weights from weights path or download using url. 591 | 592 | Args: 593 | model (Module): The whole model of efficientnet. 594 | model_name (str): Model name of efficientnet. 595 | weights_path (None or str): 596 | str: path to pretrained weights file on the local disk. 597 | None: use pretrained weights downloaded from the Internet. 598 | load_fc (bool): Whether to load pretrained weights for fc layer at the end of the model. 599 | advprop (bool): Whether to load pretrained weights 600 | trained with advprop (valid when weights_path is None). 
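A short usage sketch for this loader. Note the `url_map` checkpoints follow the upstream lukemelas layout (`_blocks.*`), while this repository regroups layers into `stage1/2/3`, so the local face checkpoint already referenced in `models/retinaface.py` is the more likely fit; the path below assumes that file exists under `./weights/`.

```python
from models.net import EfficientNet
from models.utils import load_pretrained_weights

model = EfficientNet.from_name('efficientnet-b0')
# Load a local checkpoint whose key names match this repository's backbone:
load_pretrained_weights(model, 'efficientnet-b0',
                        weights_path='./weights/efficientnetb0_face.pth', load_fc=True)
```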
601 | """ 602 | if isinstance(weights_path,str): 603 | state_dict = torch.load(weights_path) 604 | else: 605 | # AutoAugment or Advprop (different preprocessing) 606 | url_map_ = url_map_advprop if advprop else url_map 607 | state_dict = model_zoo.load_url(url_map_[model_name]) 608 | 609 | if load_fc: 610 | ret = model.load_state_dict(state_dict, strict=False) 611 | assert not ret.missing_keys, f'Missing keys when loading pretrained weights: {ret.missing_keys}' 612 | else: 613 | state_dict.pop('_fc.weight') 614 | state_dict.pop('_fc.bias') 615 | ret = model.load_state_dict(state_dict, strict=False) 616 | assert set(ret.missing_keys) == set( 617 | ['_fc.weight', '_fc.bias']), f'Missing keys when loading pretrained weights: {ret.missing_keys}' 618 | assert not ret.unexpected_keys, f'Missing keys when loading pretrained weights: {ret.unexpected_keys}' 619 | 620 | print('Loaded pretrained weights for {}'.format(model_name)) 621 | -------------------------------------------------------------------------------- /test_widerface.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import os 3 | import argparse 4 | import torch 5 | import torch.backends.cudnn as cudnn 6 | import numpy as np 7 | from data import cfg_mnetv1, cfg_mnetv2, cfg_mnetv3 8 | from layers.functions.prior_box import PriorBox 9 | from utils.nms.py_cpu_nms import py_cpu_nms 10 | import cv2 11 | from models.retinaface import RetinaFace 12 | from utils.box_utils import decode, decode_landm 13 | from utils.timer import Timer 14 | 15 | 16 | parser = argparse.ArgumentParser(description='Retinaface') 17 | parser.add_argument('-m', '--trained_model', default='./weights/mobilenetv2_0.1_Final.pth', 18 | type=str, help='Trained state_dict file path to open') 19 | parser.add_argument('--network', default='mobilenetv2', help='Backbone network mobilenetv1, mobilenetv2 or mobilenetv3') 20 | parser.add_argument('--origin_size', default=True, type=str, help='Whether use origin image size to evaluate') 21 | parser.add_argument('--save_folder', default='./widerface_evaluate/widerface_txt/', type=str, help='Dir to save txt results') 22 | parser.add_argument('--cpu', action="store_true", default=False, help='Use cpu inference') 23 | parser.add_argument('--dataset_folder', default='./data/widerface/val/images/', type=str, help='dataset path') 24 | parser.add_argument('--confidence_threshold', default=0.02, type=float, help='confidence_threshold') 25 | parser.add_argument('--top_k', default=5000, type=int, help='top_k') 26 | parser.add_argument('--nms_threshold', default=0.4, type=float, help='nms_threshold') 27 | parser.add_argument('--keep_top_k', default=750, type=int, help='keep_top_k') 28 | parser.add_argument('-s', '--save_image', action="store_true", default=False, help='show detection results') 29 | parser.add_argument('--vis_thres', default=0.5, type=float, help='visualization_threshold') 30 | args = parser.parse_args() 31 | 32 | 33 | def check_keys(model, pretrained_state_dict): 34 | ckpt_keys = set(pretrained_state_dict.keys()) 35 | model_keys = set(model.state_dict().keys()) 36 | used_pretrained_keys = model_keys & ckpt_keys 37 | unused_pretrained_keys = ckpt_keys - model_keys 38 | missing_keys = model_keys - ckpt_keys 39 | print('Missing keys:{}'.format(len(missing_keys))) 40 | print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys))) 41 | print('Used keys:{}'.format(len(used_pretrained_keys))) 42 | assert len(used_pretrained_keys) > 0, 'load NONE 
from pretrained checkpoint' 43 | return True 44 | 45 | 46 | def remove_prefix(state_dict, prefix): 47 | ''' Old style model is stored with all names of parameters sharing common prefix 'module.' ''' 48 | print('remove prefix \'{}\''.format(prefix)) 49 | f = lambda x: x.split(prefix, 1)[-1] if x.startswith(prefix) else x 50 | return {f(key): value for key, value in state_dict.items()} 51 | 52 | 53 | def load_model(model, pretrained_path, load_to_cpu): 54 | print('Loading pretrained model from {}'.format(pretrained_path)) 55 | if load_to_cpu: 56 | pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage) 57 | else: 58 | device = torch.cuda.current_device() 59 | pretrained_dict = torch.load(pretrained_path, map_location=lambda storage, loc: storage.cuda(device)) 60 | if "state_dict" in pretrained_dict.keys(): 61 | pretrained_dict = remove_prefix(pretrained_dict['state_dict'], 'module.') 62 | else: 63 | pretrained_dict = remove_prefix(pretrained_dict, 'module.') 64 | check_keys(model, pretrained_dict) 65 | model.load_state_dict(pretrained_dict, strict=False) 66 | return model 67 | 68 | 69 | if __name__ == '__main__': 70 | torch.set_grad_enabled(False) 71 | 72 | cfg = None 73 | if args.network == "mobilenetv1": 74 | cfg = cfg_mnetv1 75 | elif args.network == "mobilenetv2": 76 | cfg = cfg_mnetv2 77 | elif args.network == "mobilenetv3": 78 | cfg = cfg_mnetv3 79 | # net and model 80 | net = RetinaFace(cfg=cfg, phase = 'test') 81 | net = load_model(net, args.trained_model, args.cpu) 82 | net.eval() 83 | print('Finished loading model!') 84 | print(net) 85 | cudnn.benchmark = True 86 | device = torch.device("cpu" if args.cpu else "cuda") 87 | net = net.to(device) 88 | 89 | # testing dataset 90 | testset_folder = args.dataset_folder 91 | testset_list = args.dataset_folder[:-7] + "wider_val.txt" 92 | 93 | with open(testset_list, 'r') as fr: 94 | test_dataset = fr.read().split() 95 | num_images = len(test_dataset) 96 | 97 | _t = {'forward_pass': Timer(), 'misc': Timer()} 98 | 99 | # testing begin 100 | for i, img_name in enumerate(test_dataset): 101 | image_path = testset_folder + img_name 102 | img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR) 103 | img = np.float32(img_raw) 104 | 105 | # testing scale 106 | target_size = 1600 107 | max_size = 2150 108 | im_shape = img.shape 109 | im_size_min = np.min(im_shape[0:2]) 110 | im_size_max = np.max(im_shape[0:2]) 111 | resize = float(target_size) / float(im_size_min) 112 | # prevent bigger axis from being more than max_size: 113 | if np.round(resize * im_size_max) > max_size: 114 | resize = float(max_size) / float(im_size_max) 115 | if args.origin_size: 116 | resize = 1 117 | 118 | if resize != 1: 119 | img = cv2.resize(img, None, None, fx=resize, fy=resize, interpolation=cv2.INTER_LINEAR) 120 | im_height, im_width, _ = img.shape 121 | scale = torch.Tensor([img.shape[1], img.shape[0], img.shape[1], img.shape[0]]) 122 | img -= (104, 117, 123) 123 | img = img.transpose(2, 0, 1) 124 | img = torch.from_numpy(img).unsqueeze(0) 125 | img = img.to(device) 126 | scale = scale.to(device) 127 | 128 | _t['forward_pass'].tic() 129 | loc, conf, landms = net(img) # forward pass 130 | _t['forward_pass'].toc() 131 | _t['misc'].tic() 132 | priorbox = PriorBox(cfg, image_size=(im_height, im_width)) 133 | priors = priorbox.forward() 134 | priors = priors.to(device) 135 | prior_data = priors.data 136 | boxes = decode(loc.data.squeeze(0), prior_data, cfg['variance']) 137 | boxes = boxes * scale / resize 138 | boxes = boxes.cpu().numpy() 139 | 
scores = conf.squeeze(0).data.cpu().numpy()[:, 1] 140 | landms = decode_landm(landms.data.squeeze(0), prior_data, cfg['variance']) 141 | scale1 = torch.Tensor([img.shape[3], img.shape[2], img.shape[3], img.shape[2], 142 | img.shape[3], img.shape[2], img.shape[3], img.shape[2], 143 | img.shape[3], img.shape[2]]) 144 | scale1 = scale1.to(device) 145 | landms = landms * scale1 / resize 146 | landms = landms.cpu().numpy() 147 | 148 | # ignore low scores 149 | inds = np.where(scores > args.confidence_threshold)[0] 150 | boxes = boxes[inds] 151 | landms = landms[inds] 152 | scores = scores[inds] 153 | 154 | # keep top-K before NMS 155 | order = scores.argsort()[::-1] 156 | # order = scores.argsort()[::-1][:args.top_k] 157 | boxes = boxes[order] 158 | landms = landms[order] 159 | scores = scores[order] 160 | 161 | # do NMS 162 | dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False) 163 | keep = py_cpu_nms(dets, args.nms_threshold) 164 | # keep = nms(dets, args.nms_threshold,force_cpu=args.cpu) 165 | dets = dets[keep, :] 166 | landms = landms[keep] 167 | 168 | # keep top-K faster NMS 169 | # dets = dets[:args.keep_top_k, :] 170 | # landms = landms[:args.keep_top_k, :] 171 | 172 | dets = np.concatenate((dets, landms), axis=1) 173 | _t['misc'].toc() 174 | 175 | # -------------------------------------------------------------------- 176 | save_name = args.save_folder + img_name[:-4] + ".txt" 177 | dirname = os.path.dirname(save_name) 178 | if not os.path.isdir(dirname): 179 | os.makedirs(dirname) 180 | with open(save_name, "w") as fd: 181 | bboxs = dets 182 | file_name = os.path.basename(save_name)[:-4] + "\n" 183 | bboxs_num = str(len(bboxs)) + "\n" 184 | fd.write(file_name) 185 | fd.write(bboxs_num) 186 | for box in bboxs: 187 | x = int(box[0]) 188 | y = int(box[1]) 189 | w = int(box[2]) - int(box[0]) 190 | h = int(box[3]) - int(box[1]) 191 | confidence = str(box[4]) 192 | line = str(x) + " " + str(y) + " " + str(w) + " " + str(h) + " " + confidence + " \n" 193 | fd.write(line) 194 | 195 | print('im_detect: {:d}/{:d} forward_pass_time: {:.4f}s misc: {:.4f}s'.format(i + 1, num_images, _t['forward_pass'].average_time, _t['misc'].average_time)) 196 | 197 | # save image 198 | if args.save_image: 199 | for b in dets: 200 | if b[4] < args.vis_thres: 201 | continue 202 | text = "{:.4f}".format(b[4]) 203 | b = list(map(int, b)) 204 | cv2.rectangle(img_raw, (b[0], b[1]), (b[2], b[3]), (0, 0, 255), 2) 205 | cx = b[0] 206 | cy = b[1] + 12 207 | cv2.putText(img_raw, text, (cx, cy), 208 | cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255)) 209 | 210 | # landms 211 | cv2.circle(img_raw, (b[5], b[6]), 1, (0, 0, 255), 4) 212 | cv2.circle(img_raw, (b[7], b[8]), 1, (0, 255, 255), 4) 213 | cv2.circle(img_raw, (b[9], b[10]), 1, (255, 0, 255), 4) 214 | cv2.circle(img_raw, (b[11], b[12]), 1, (0, 255, 0), 4) 215 | cv2.circle(img_raw, (b[13], b[14]), 1, (255, 0, 0), 4) 216 | # save image 217 | if not os.path.exists("./results/"): 218 | os.makedirs("./results/") 219 | name = "./results/" + str(i) + ".jpg" 220 | cv2.imwrite(name, img_raw) 221 | 222 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import os 3 | import torch 4 | import torch.optim as optim 5 | import torch.backends.cudnn as cudnn 6 | import argparse 7 | import torch.utils.data as data 8 | from data import WiderFaceDetection, detection_collate, preproc, cfg_mnetv1, 
cfg_mnetv2, cfg_mnetv3, cfg_efnetb0 9 | from layers.modules import MultiBoxLoss 10 | from layers.functions.prior_box import PriorBox 11 | import time 12 | import datetime 13 | import math 14 | from models.retinaface import RetinaFace 15 | 16 | parser = argparse.ArgumentParser(description='Retinaface Training') 17 | parser.add_argument('--training_dataset', default='./data/widerface/train/label.txt', help='Training dataset directory') 18 | parser.add_argument('--network', default='efficientnetb0', help='Backbone network mobilenetv2, mobilenetv2, mobilenetv3, efficientnetb0') 19 | parser.add_argument('--num_workers', default=4, type=int, help='Number of workers used in dataloading') 20 | parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float, help='initial learning rate') 21 | parser.add_argument('--momentum', default=0.9, type=float, help='momentum') 22 | parser.add_argument('--resume_net', default=None, help='resume net for retraining') 23 | parser.add_argument('--resume_epoch', default=0, type=int, help='resume iter for retraining') 24 | parser.add_argument('--weight_decay', default=5e-4, type=float, help='Weight decay for SGD') 25 | parser.add_argument('--gamma', default=0.1, type=float, help='Gamma update for SGD') 26 | parser.add_argument('--save_folder', default='./weights/', help='Location to save checkpoint models') 27 | 28 | args = parser.parse_args() 29 | 30 | if not os.path.exists(args.save_folder): 31 | os.mkdir(args.save_folder) 32 | cfg = None 33 | if args.network == "mobilenetv1": 34 | cfg = cfg_mnetv1 35 | elif args.network == "mobilenetv2": 36 | cfg = cfg_mnetv2 37 | elif args.network == "mobilenetv3": 38 | cfg = cfg_mnetv3 39 | elif args.network == "efficientnetb0": 40 | print("Loading Backbone EfficientNet-b0") 41 | cfg = cfg_efnetb0 42 | else: 43 | print("Light weight FaceDet only support mobilenet...") 44 | 45 | rgb_mean = (104, 117, 123) # bgr order 46 | num_classes = 2 47 | img_dim = cfg['image_size'] 48 | num_gpu = cfg['ngpu'] 49 | batch_size = cfg['batch_size'] 50 | max_epoch = cfg['epoch'] 51 | gpu_train = cfg['gpu_train'] 52 | 53 | num_workers = args.num_workers 54 | momentum = args.momentum 55 | weight_decay = args.weight_decay 56 | initial_lr = args.lr 57 | gamma = args.gamma 58 | training_dataset = args.training_dataset 59 | save_folder = args.save_folder 60 | 61 | net = RetinaFace(cfg=cfg) 62 | print("Printing net...") 63 | print(net) 64 | 65 | if args.resume_net is not None: 66 | print('Loading resume network...') 67 | state_dict = torch.load(args.resume_net) 68 | # create new OrderedDict that does not contain `module.` 69 | from collections import OrderedDict 70 | new_state_dict = OrderedDict() 71 | for k, v in state_dict.items(): 72 | head = k[:7] 73 | if head == 'module.': 74 | name = k[7:] # remove `module.` 75 | else: 76 | name = k 77 | new_state_dict[name] = v 78 | net.load_state_dict(new_state_dict) 79 | 80 | if num_gpu > 1 and gpu_train: 81 | net = torch.nn.DataParallel(net).cuda() 82 | else: 83 | net = net.cuda() 84 | 85 | cudnn.benchmark = True 86 | 87 | 88 | #optimizer = optim.SGD(net.parameters(), lr=initial_lr, momentum=momentum, weight_decay=weight_decay) 89 | optimizer = optim.AdamW(net.parameters(), lr=initial_lr, weight_decay=weight_decay) 90 | 91 | criterion = MultiBoxLoss(num_classes, 0.35, True, 0, True, 7, 0.35, False) 92 | 93 | priorbox = PriorBox(cfg, image_size=(img_dim, img_dim)) 94 | with torch.no_grad(): 95 | priors = priorbox.forward() 96 | priors = priors.cuda() 97 | 98 | def train(): 99 | net.train() 100 | epoch 
= 0 + args.resume_epoch 101 | print('Loading Dataset...') 102 | 103 | dataset = WiderFaceDetection( training_dataset,preproc(img_dim, rgb_mean)) 104 | 105 | epoch_size = math.ceil(len(dataset) / batch_size) 106 | max_iter = max_epoch * epoch_size 107 | 108 | stepvalues = (cfg['decay1'] * epoch_size, cfg['decay2'] * epoch_size) 109 | step_index = 0 110 | 111 | if args.resume_epoch > 0: 112 | start_iter = args.resume_epoch * epoch_size 113 | else: 114 | start_iter = 0 115 | 116 | for iteration in range(start_iter, max_iter): 117 | if iteration % epoch_size == 0: 118 | # create batch iterator 119 | batch_iterator = iter(data.DataLoader(dataset, batch_size, shuffle=True, num_workers=num_workers, collate_fn=detection_collate)) 120 | if (epoch % 10 == 0 and epoch > 0) or (epoch % 5 == 0 and epoch > cfg['decay1']): 121 | torch.save(net.state_dict(), save_folder + cfg['name']+ '_epoch_' + str(epoch) + '.pth') 122 | epoch += 1 123 | 124 | load_t0 = time.time() 125 | if iteration in stepvalues: 126 | step_index += 1 127 | lr = adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size) 128 | 129 | # load train data 130 | images, targets = next(batch_iterator) 131 | images = images.cuda() 132 | targets = [anno.cuda() for anno in targets] 133 | 134 | # forward 135 | out = net(images) 136 | 137 | # backprop 138 | optimizer.zero_grad() 139 | loss_l, loss_c, loss_landm = criterion(out, priors, targets) 140 | loss = cfg['loc_weight'] * loss_l + loss_c + loss_landm 141 | loss.backward() 142 | optimizer.step() 143 | load_t1 = time.time() 144 | batch_time = load_t1 - load_t0 145 | eta = int(batch_time * (max_iter - iteration)) 146 | print('Epoch:{}/{} || Epochiter: {}/{} || Iter: {}/{} || Loc: {:.4f} Cla: {:.4f} Landm: {:.4f} || LR: {:.8f} || Batchtime: {:.4f} s || ETA: {}' 147 | .format(epoch, max_epoch, (iteration % epoch_size) + 1, 148 | epoch_size, iteration + 1, max_iter, loss_l.item(), loss_c.item(), loss_landm.item(), lr, batch_time, str(datetime.timedelta(seconds=eta)))) 149 | 150 | torch.save(net.state_dict(), save_folder + cfg['name'] + '_Final.pth') 151 | # torch.save(net.state_dict(), save_folder + 'Final_Retinaface.pth') 152 | 153 | 154 | def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size): 155 | """Sets the learning rate 156 | # Adapted from PyTorch Imagenet example: 157 | # https://github.com/pytorch/examples/blob/master/imagenet/main.py 158 | """ 159 | warmup_epoch = 5 160 | if epoch <= warmup_epoch: 161 | lr = 1e-6 + (initial_lr-1e-6) * iteration / (epoch_size * warmup_epoch) 162 | else: 163 | lr = initial_lr * (gamma ** (step_index)) 164 | for param_group in optimizer.param_groups: 165 | param_group['lr'] = lr 166 | return lr 167 | 168 | if __name__ == '__main__': 169 | train() 170 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/utils/__init__.py -------------------------------------------------------------------------------- /utils/box_utils.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import numpy as np 3 | 4 | 5 | def point_form(boxes): 6 | """ Convert prior_boxes to (xmin, ymin, xmax, ymax) 7 | representation for comparison to point form ground truth data. 8 | Args: 9 | boxes: (tensor) center-size default boxes from priorbox layers. 
10 | Return: 11 | boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes. 12 | """ 13 | return torch.cat((boxes[:, :2] - boxes[:, 2:]/2, # xmin, ymin 14 | boxes[:, :2] + boxes[:, 2:]/2), 1) # xmax, ymax 15 | 16 | 17 | def center_size(boxes): 18 | """ Convert prior_boxes to (cx, cy, w, h) 19 | representation for comparison to center-size form ground truth data. 20 | Args: 21 | boxes: (tensor) point_form boxes 22 | Return: 23 | boxes: (tensor) Converted xmin, ymin, xmax, ymax form of boxes. 24 | """ 25 | return torch.cat((boxes[:, 2:] + boxes[:, :2])/2, # cx, cy 26 | boxes[:, 2:] - boxes[:, :2], 1) # w, h 27 | 28 | 29 | def intersect(box_a, box_b): 30 | """ We resize both tensors to [A,B,2] without new malloc: 31 | [A,2] -> [A,1,2] -> [A,B,2] 32 | [B,2] -> [1,B,2] -> [A,B,2] 33 | Then we compute the area of intersect between box_a and box_b. 34 | Args: 35 | box_a: (tensor) bounding boxes, Shape: [A,4]. 36 | box_b: (tensor) bounding boxes, Shape: [B,4]. 37 | Return: 38 | (tensor) intersection area, Shape: [A,B]. 39 | """ 40 | A = box_a.size(0) 41 | B = box_b.size(0) 42 | max_xy = torch.min(box_a[:, 2:].unsqueeze(1).expand(A, B, 2), 43 | box_b[:, 2:].unsqueeze(0).expand(A, B, 2)) 44 | min_xy = torch.max(box_a[:, :2].unsqueeze(1).expand(A, B, 2), 45 | box_b[:, :2].unsqueeze(0).expand(A, B, 2)) 46 | inter = torch.clamp((max_xy - min_xy), min=0) 47 | return inter[:, :, 0] * inter[:, :, 1] 48 | 49 | 50 | def jaccard(box_a, box_b): 51 | """Compute the jaccard overlap of two sets of boxes. The jaccard overlap 52 | is simply the intersection over union of two boxes. Here we operate on 53 | ground truth boxes and default boxes. 54 | E.g.: 55 | A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B) 56 | Args: 57 | box_a: (tensor) Ground truth bounding boxes, Shape: [num_objects,4] 58 | box_b: (tensor) Prior boxes from priorbox layers, Shape: [num_priors,4] 59 | Return: 60 | jaccard overlap: (tensor) Shape: [box_a.size(0), box_b.size(0)] 61 | """ 62 | inter = intersect(box_a, box_b) 63 | area_a = ((box_a[:, 2]-box_a[:, 0]) * 64 | (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter) # [A,B] 65 | area_b = ((box_b[:, 2]-box_b[:, 0]) * 66 | (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter) # [A,B] 67 | union = area_a + area_b - inter 68 | return inter / union # [A,B] 69 | 70 | 71 | def matrix_iou(a, b): 72 | """ 73 | return iou of a and b, numpy version for data augenmentation 74 | """ 75 | lt = np.maximum(a[:, np.newaxis, :2], b[:, :2]) 76 | rb = np.minimum(a[:, np.newaxis, 2:], b[:, 2:]) 77 | 78 | area_i = np.prod(rb - lt, axis=2) * (lt < rb).all(axis=2) 79 | area_a = np.prod(a[:, 2:] - a[:, :2], axis=1) 80 | area_b = np.prod(b[:, 2:] - b[:, :2], axis=1) 81 | return area_i / (area_a[:, np.newaxis] + area_b - area_i) 82 | 83 | 84 | def matrix_iof(a, b): 85 | """ 86 | return iof of a and b, numpy version for data augenmentation 87 | """ 88 | lt = np.maximum(a[:, np.newaxis, :2], b[:, :2]) 89 | rb = np.minimum(a[:, np.newaxis, 2:], b[:, 2:]) 90 | 91 | area_i = np.prod(rb - lt, axis=2) * (lt < rb).all(axis=2) 92 | area_a = np.prod(a[:, 2:] - a[:, :2], axis=1) 93 | return area_i / np.maximum(area_a[:, np.newaxis], 1) 94 | 95 | 96 | def match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx): 97 | """Match each prior box with the ground truth box of the highest jaccard 98 | overlap, encode the bounding boxes, then return the matched indices 99 | corresponding to both confidence and location preds. 
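A tiny worked example of the overlap computation used by `match` below (illustrative numbers, both boxes in point form):

```python
import torch
from utils.box_utils import jaccard

box_a = torch.tensor([[0., 0., 10., 10.]])   # ground-truth box
box_b = torch.tensor([[5., 5., 15., 15.]])   # prior box
print(jaccard(box_a, box_b))                 # 25 / (100 + 100 - 25) ≈ 0.1429
```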
100 |     Args:
101 |         threshold: (float) The overlap threshold used when matching boxes.
102 |         truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
103 |         priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
104 |         variances: (tensor) Variances corresponding to each prior coord,
105 |             Shape: [num_priors, 4].
106 |         labels: (tensor) All the class labels for the image, Shape: [num_obj].
107 |         landms: (tensor) Ground truth landms, Shape [num_obj, 10].
108 |         loc_t: (tensor) Tensor to be filled w/ encoded location targets.
109 |         conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
110 |         landm_t: (tensor) Tensor to be filled w/ encoded landm targets.
111 |         idx: (int) current batch index
112 |     Return:
113 |         The matched indices corresponding to 1)location 2)confidence 3)landm preds.
114 |     """
115 |     # jaccard index
116 |     overlaps = jaccard(
117 |         truths,
118 |         point_form(priors)
119 |     )
120 |     # (Bipartite Matching)
121 |     # [1,num_objects] best prior for each ground truth
122 |     best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True)
123 | 
124 |     # ignore hard gt
125 |     valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
126 |     best_prior_idx_filter = best_prior_idx[valid_gt_idx, :]
127 |     if best_prior_idx_filter.shape[0] <= 0:
128 |         loc_t[idx] = 0
129 |         conf_t[idx] = 0
130 |         return
131 | 
132 |     # [1,num_priors] best ground truth for each prior
133 |     best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True)
134 |     best_truth_idx.squeeze_(0)
135 |     best_truth_overlap.squeeze_(0)
136 |     best_prior_idx.squeeze_(1)
137 |     best_prior_idx_filter.squeeze_(1)
138 |     best_prior_overlap.squeeze_(1)
139 |     best_truth_overlap.index_fill_(0, best_prior_idx_filter, 2)  # ensure best prior
140 |     # TODO refactor: index best_prior_idx with long tensor
141 |     # ensure every gt matches with its prior of max overlap
142 |     for j in range(best_prior_idx.size(0)):  # decide which ground-truth box this anchor is assigned to predict
143 |         best_truth_idx[best_prior_idx[j]] = j
144 |     matches = truths[best_truth_idx]          # Shape: [num_priors,4]; gather the matched gt box for every anchor
145 |     conf = labels[best_truth_idx]             # Shape: [num_priors]; gather the matched gt label for every anchor
146 |     conf[best_truth_overlap < threshold] = 0  # label as background: anchors with overlap < 0.35 all become negative samples
147 |     loc = encode(matches, priors, variances)
148 | 
149 |     matches_landm = landms[best_truth_idx]
150 |     landm = encode_landm(matches_landm, priors, variances)
151 |     loc_t[idx] = loc    # [num_priors,4] encoded offsets to learn
152 |     conf_t[idx] = conf  # [num_priors] top class label for each prior
153 |     landm_t[idx] = landm
154 | 
155 | 
156 | def encode(matched, priors, variances):
157 |     """Encode the variances from the priorbox layers into the ground truth boxes
158 |     we have matched (based on jaccard overlap) with the prior boxes.
159 |     Args:
160 |         matched: (tensor) Coords of ground truth for each prior in point-form
161 |             Shape: [num_priors, 4].
162 |         priors: (tensor) Prior boxes in center-offset form
163 |             Shape: [num_priors,4].
164 | variances: (list[float]) Variances of priorboxes 165 | Return: 166 | encoded boxes (tensor), Shape: [num_priors, 4] 167 | """ 168 | 169 | # dist b/t match center and prior's center 170 | g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2] 171 | # encode variance 172 | g_cxcy /= (variances[0] * priors[:, 2:]) 173 | # match wh / prior wh 174 | g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:] 175 | g_wh = torch.log(g_wh) / variances[1] 176 | # return target for smooth_l1_loss 177 | return torch.cat([g_cxcy, g_wh], 1) # [num_priors,4] 178 | 179 | def encode_landm(matched, priors, variances): 180 | """Encode the variances from the priorbox layers into the ground truth boxes 181 | we have matched (based on jaccard overlap) with the prior boxes. 182 | Args: 183 | matched: (tensor) Coords of ground truth for each prior in point-form 184 | Shape: [num_priors, 10]. 185 | priors: (tensor) Prior boxes in center-offset form 186 | Shape: [num_priors,4]. 187 | variances: (list[float]) Variances of priorboxes 188 | Return: 189 | encoded landm (tensor), Shape: [num_priors, 10] 190 | """ 191 | 192 | # dist b/t match center and prior's center 193 | matched = torch.reshape(matched, (matched.size(0), 5, 2)) 194 | priors_cx = priors[:, 0].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2) 195 | priors_cy = priors[:, 1].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2) 196 | priors_w = priors[:, 2].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2) 197 | priors_h = priors[:, 3].unsqueeze(1).expand(matched.size(0), 5).unsqueeze(2) 198 | priors = torch.cat([priors_cx, priors_cy, priors_w, priors_h], dim=2) 199 | g_cxcy = matched[:, :, :2] - priors[:, :, :2] 200 | # encode variance 201 | g_cxcy /= (variances[0] * priors[:, :, 2:]) 202 | # g_cxcy /= priors[:, :, 2:] 203 | g_cxcy = g_cxcy.reshape(g_cxcy.size(0), -1) 204 | # return target for smooth_l1_loss 205 | return g_cxcy 206 | 207 | 208 | # Adapted from https://github.com/Hakuyume/chainer-ssd 209 | def decode(loc, priors, variances): 210 | """Decode locations from predictions using priors to undo 211 | the encoding we did for offset regression at train time. 212 | Args: 213 | loc (tensor): location predictions for loc layers, 214 | Shape: [num_priors,4] 215 | priors (tensor): Prior boxes in center-offset form. 216 | Shape: [num_priors,4]. 217 | variances: (list[float]) Variances of priorboxes 218 | Return: 219 | decoded bounding box predictions 220 | """ 221 | 222 | boxes = torch.cat(( 223 | priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:], 224 | priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])), 1) 225 | boxes[:, :2] -= boxes[:, 2:] / 2 226 | boxes[:, 2:] += boxes[:, :2] 227 | return boxes 228 | 229 | def decode_landm(pre, priors, variances): 230 | """Decode landm from predictions using priors to undo 231 | the encoding we did for offset regression at train time. 232 | Args: 233 | pre (tensor): landm predictions for loc layers, 234 | Shape: [num_priors,10] 235 | priors (tensor): Prior boxes in center-offset form. 236 | Shape: [num_priors,4]. 
237 | variances: (list[float]) Variances of priorboxes 238 | Return: 239 | decoded landm predictions 240 | """ 241 | landms = torch.cat((priors[:, :2] + pre[:, :2] * variances[0] * priors[:, 2:], 242 | priors[:, :2] + pre[:, 2:4] * variances[0] * priors[:, 2:], 243 | priors[:, :2] + pre[:, 4:6] * variances[0] * priors[:, 2:], 244 | priors[:, :2] + pre[:, 6:8] * variances[0] * priors[:, 2:], 245 | priors[:, :2] + pre[:, 8:10] * variances[0] * priors[:, 2:], 246 | ), dim=1) 247 | return landms 248 | 249 | 250 | def log_sum_exp(x): 251 | """Utility function for computing log_sum_exp while determining 252 | This will be used to determine unaveraged confidence loss across 253 | all examples in a batch. 254 | Args: 255 | x (Variable(tensor)): conf_preds from conf layers 256 | """ 257 | x_max = x.data.max() 258 | return torch.log(torch.sum(torch.exp(x-x_max), 1, keepdim=True)) + x_max 259 | 260 | 261 | # Original author: Francisco Massa: 262 | # https://github.com/fmassa/object-detection.torch 263 | # Ported to PyTorch by Max deGroot (02/01/2017) 264 | def nms(boxes, scores, overlap=0.5, top_k=200): 265 | """Apply non-maximum suppression at test time to avoid detecting too many 266 | overlapping bounding boxes for a given object. 267 | Args: 268 | boxes: (tensor) The location preds for the img, Shape: [num_priors,4]. 269 | scores: (tensor) The class predscores for the img, Shape:[num_priors]. 270 | overlap: (float) The overlap thresh for suppressing unnecessary boxes. 271 | top_k: (int) The Maximum number of box preds to consider. 272 | Return: 273 | The indices of the kept boxes with respect to num_priors. 274 | """ 275 | 276 | keep = torch.Tensor(scores.size(0)).fill_(0).long() 277 | if boxes.numel() == 0: 278 | return keep 279 | x1 = boxes[:, 0] 280 | y1 = boxes[:, 1] 281 | x2 = boxes[:, 2] 282 | y2 = boxes[:, 3] 283 | area = torch.mul(x2 - x1, y2 - y1) 284 | v, idx = scores.sort(0) # sort in ascending order 285 | # I = I[v >= 0.01] 286 | idx = idx[-top_k:] # indices of the top-k largest vals 287 | xx1 = boxes.new() 288 | yy1 = boxes.new() 289 | xx2 = boxes.new() 290 | yy2 = boxes.new() 291 | w = boxes.new() 292 | h = boxes.new() 293 | 294 | # keep = torch.Tensor() 295 | count = 0 296 | while idx.numel() > 0: 297 | i = idx[-1] # index of current largest val 298 | # keep.append(i) 299 | keep[count] = i 300 | count += 1 301 | if idx.size(0) == 1: 302 | break 303 | idx = idx[:-1] # remove kept element from view 304 | # load bboxes of next highest vals 305 | torch.index_select(x1, 0, idx, out=xx1) 306 | torch.index_select(y1, 0, idx, out=yy1) 307 | torch.index_select(x2, 0, idx, out=xx2) 308 | torch.index_select(y2, 0, idx, out=yy2) 309 | # store element-wise max with next highest score 310 | xx1 = torch.clamp(xx1, min=x1[i]) 311 | yy1 = torch.clamp(yy1, min=y1[i]) 312 | xx2 = torch.clamp(xx2, max=x2[i]) 313 | yy2 = torch.clamp(yy2, max=y2[i]) 314 | w.resize_as_(xx2) 315 | h.resize_as_(yy2) 316 | w = xx2 - xx1 317 | h = yy2 - yy1 318 | # check sizes of xx1 and xx2.. 
after each iteration 319 | w = torch.clamp(w, min=0.0) 320 | h = torch.clamp(h, min=0.0) 321 | inter = w*h 322 | # IoU = i / (area(a) + area(b) - i) 323 | rem_areas = torch.index_select(area, 0, idx) # load remaining areas) 324 | union = (rem_areas - inter) + area[i] 325 | IoU = inter/union # store result in iou 326 | # keep only elements with an IoU <= overlap 327 | idx = idx[IoU.le(overlap)] 328 | return keep, count 329 | 330 | 331 | -------------------------------------------------------------------------------- /utils/nms/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/utils/nms/__init__.py -------------------------------------------------------------------------------- /utils/nms/py_cpu_nms.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | 10 | def py_cpu_nms(dets, thresh): 11 | """Pure Python NMS baseline.""" 12 | x1 = dets[:, 0] 13 | y1 = dets[:, 1] 14 | x2 = dets[:, 2] 15 | y2 = dets[:, 3] 16 | scores = dets[:, 4] 17 | 18 | areas = (x2 - x1 + 1) * (y2 - y1 + 1) 19 | order = scores.argsort()[::-1] 20 | 21 | keep = [] 22 | while order.size > 0: 23 | i = order[0] 24 | keep.append(i) 25 | xx1 = np.maximum(x1[i], x1[order[1:]]) 26 | yy1 = np.maximum(y1[i], y1[order[1:]]) 27 | xx2 = np.minimum(x2[i], x2[order[1:]]) 28 | yy2 = np.minimum(y2[i], y2[order[1:]]) 29 | 30 | w = np.maximum(0.0, xx2 - xx1 + 1) 31 | h = np.maximum(0.0, yy2 - yy1 + 1) 32 | inter = w * h 33 | ovr = inter / (areas[i] + areas[order[1:]] - inter) 34 | 35 | inds = np.where(ovr <= thresh)[0] 36 | order = order[inds + 1] 37 | 38 | return keep 39 | -------------------------------------------------------------------------------- /utils/timer.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import time 9 | 10 | 11 | class Timer(object): 12 | """A simple timer.""" 13 | def __init__(self): 14 | self.total_time = 0. 15 | self.calls = 0 16 | self.start_time = 0. 17 | self.diff = 0. 18 | self.average_time = 0. 19 | 20 | def tic(self): 21 | # using time.time instead of time.clock because time time.clock 22 | # does not normalize for multithreading 23 | self.start_time = time.time() 24 | 25 | def toc(self, average=True): 26 | self.diff = time.time() - self.start_time 27 | self.total_time += self.diff 28 | self.calls += 1 29 | self.average_time = self.total_time / self.calls 30 | if average: 31 | return self.average_time 32 | else: 33 | return self.diff 34 | 35 | def clear(self): 36 | self.total_time = 0. 37 | self.calls = 0 38 | self.start_time = 0. 39 | self.diff = 0. 40 | self.average_time = 0. 
41 | 
--------------------------------------------------------------------------------
/widerface_evaluate/README.md:
--------------------------------------------------------------------------------
1 | # WiderFace-Evaluation
2 | Python Evaluation Code for [Wider Face Dataset](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/)
3 | 
4 | 
5 | ## Usage
6 | 
7 | 
8 | ##### before evaluating
9 | 
10 | ````
11 | python3 setup.py build_ext --inplace
12 | ````
13 | 
14 | ##### evaluating
15 | 
16 | **GroundTruth:** `wider_face_val.mat`, `wider_easy_val.mat`, `wider_medium_val.mat`, `wider_hard_val.mat`
17 | 
18 | ````
19 | python3 evaluation.py -p <your prediction dir> -g <ground truth dir>
20 | ````
21 | 
22 | ## Bugs & Problems
23 | Please open an issue.
24 | 
25 | ## Acknowledgements
26 | 
27 | Some code is borrowed from Sergey Karayev.
28 | 
--------------------------------------------------------------------------------
/widerface_evaluate/box_overlaps.pyx:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Sergey Karayev
6 | # --------------------------------------------------------
7 | 
8 | cimport cython
9 | import numpy as np
10 | cimport numpy as np
11 | 
12 | DTYPE = np.float64  # the np.float alias was removed in NumPy 1.24+
13 | ctypedef np.float64_t DTYPE_t
14 | 
15 | def bbox_overlaps(
16 |         np.ndarray[DTYPE_t, ndim=2] boxes,
17 |         np.ndarray[DTYPE_t, ndim=2] query_boxes):
18 |     """
19 |     Parameters
20 |     ----------
21 |     boxes: (N, 4) ndarray of float
22 |     query_boxes: (K, 4) ndarray of float
23 |     Returns
24 |     -------
25 |     overlaps: (N, K) ndarray of overlap between boxes and query_boxes
26 |     """
27 |     cdef unsigned int N = boxes.shape[0]
28 |     cdef unsigned int K = query_boxes.shape[0]
29 |     cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
30 |     cdef DTYPE_t iw, ih, box_area
31 |     cdef DTYPE_t ua
32 |     cdef unsigned int k, n
33 |     for k in range(K):
34 |         box_area = (
35 |             (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
36 |             (query_boxes[k, 3] - query_boxes[k, 1] + 1)
37 |         )
38 |         for n in range(N):
39 |             iw = (
40 |                 min(boxes[n, 2], query_boxes[k, 2]) -
41 |                 max(boxes[n, 0], query_boxes[k, 0]) + 1
42 |             )
43 |             if iw > 0:
44 |                 ih = (
45 |                     min(boxes[n, 3], query_boxes[k, 3]) -
46 |                     max(boxes[n, 1], query_boxes[k, 1]) + 1
47 |                 )
48 |                 if ih > 0:
49 |                     ua = float(
50 |                         (boxes[n, 2] - boxes[n, 0] + 1) *
51 |                         (boxes[n, 3] - boxes[n, 1] + 1) +
52 |                         box_area - iw * ih
53 |                     )
54 |                     overlaps[n, k] = iw * ih / ua
55 |     return overlaps
--------------------------------------------------------------------------------
/widerface_evaluate/evaluation.py:
--------------------------------------------------------------------------------
1 | """
2 | WiderFace evaluation code
3 | author: wondervictor
4 | mail: tianhengcheng@gmail.com
5 | copyright@wondervictor
6 | """
7 | 
8 | import os
9 | import tqdm
10 | import pickle
11 | import argparse
12 | import numpy as np
13 | from scipy.io import loadmat
14 | from bbox import bbox_overlaps
15 | from IPython import embed
16 | 
17 | 
18 | def get_gt_boxes(gt_dir):
19 |     """ gt dir: (wider_face_val.mat, wider_easy_val.mat, wider_medium_val.mat, wider_hard_val.mat)"""
20 | 
21 |     gt_mat = loadmat(os.path.join(gt_dir, 'wider_face_val.mat'))
22 |     hard_mat = loadmat(os.path.join(gt_dir, 'wider_hard_val.mat'))
23 |     medium_mat = loadmat(os.path.join(gt_dir, 'wider_medium_val.mat'))
24 |     easy_mat = loadmat(os.path.join(gt_dir, 'wider_easy_val.mat'))
25 | 
26 | 
facebox_list = gt_mat['face_bbx_list'] 27 | event_list = gt_mat['event_list'] 28 | file_list = gt_mat['file_list'] 29 | 30 | hard_gt_list = hard_mat['gt_list'] 31 | medium_gt_list = medium_mat['gt_list'] 32 | easy_gt_list = easy_mat['gt_list'] 33 | 34 | return facebox_list, event_list, file_list, hard_gt_list, medium_gt_list, easy_gt_list 35 | 36 | 37 | def get_gt_boxes_from_txt(gt_path, cache_dir): 38 | 39 | cache_file = os.path.join(cache_dir, 'gt_cache.pkl') 40 | if os.path.exists(cache_file): 41 | f = open(cache_file, 'rb') 42 | boxes = pickle.load(f) 43 | f.close() 44 | return boxes 45 | 46 | f = open(gt_path, 'r') 47 | state = 0 48 | lines = f.readlines() 49 | lines = list(map(lambda x: x.rstrip('\r\n'), lines)) 50 | boxes = {} 51 | print(len(lines)) 52 | f.close() 53 | current_boxes = [] 54 | current_name = None 55 | for line in lines: 56 | if state == 0 and '--' in line: 57 | state = 1 58 | current_name = line 59 | continue 60 | if state == 1: 61 | state = 2 62 | continue 63 | 64 | if state == 2 and '--' in line: 65 | state = 1 66 | boxes[current_name] = np.array(current_boxes).astype('float32') 67 | current_name = line 68 | current_boxes = [] 69 | continue 70 | 71 | if state == 2: 72 | box = [float(x) for x in line.split(' ')[:4]] 73 | current_boxes.append(box) 74 | continue 75 | 76 | f = open(cache_file, 'wb') 77 | pickle.dump(boxes, f) 78 | f.close() 79 | return boxes 80 | 81 | 82 | def read_pred_file(filepath): 83 | 84 | with open(filepath, 'r') as f: 85 | lines = f.readlines() 86 | img_file = lines[0].rstrip('\n\r') 87 | lines = lines[2:] 88 | 89 | # b = lines[0].rstrip('\r\n').split(' ')[:-1] 90 | # c = float(b) 91 | # a = map(lambda x: [[float(a[0]), float(a[1]), float(a[2]), float(a[3]), float(a[4])] for a in x.rstrip('\r\n').split(' ')], lines) 92 | boxes = [] 93 | for line in lines: 94 | line = line.rstrip('\r\n').split(' ') 95 | if line[0] is '': 96 | continue 97 | # a = float(line[4]) 98 | boxes.append([float(line[0]), float(line[1]), float(line[2]), float(line[3]), float(line[4])]) 99 | boxes = np.array(boxes) 100 | # boxes = np.array(list(map(lambda x: [float(a) for a in x.rstrip('\r\n').split(' ')], lines))).astype('float') 101 | return img_file.split('/')[-1], boxes 102 | 103 | 104 | def get_preds(pred_dir): 105 | events = os.listdir(pred_dir) 106 | boxes = dict() 107 | pbar = tqdm.tqdm(events) 108 | 109 | for event in pbar: 110 | pbar.set_description('Reading Predictions ') 111 | event_dir = os.path.join(pred_dir, event) 112 | event_images = os.listdir(event_dir) 113 | current_event = dict() 114 | for imgtxt in event_images: 115 | imgname, _boxes = read_pred_file(os.path.join(event_dir, imgtxt)) 116 | current_event[imgname.rstrip('.jpg')] = _boxes 117 | boxes[event] = current_event 118 | return boxes 119 | 120 | 121 | def norm_score(pred): 122 | """ norm score 123 | pred {key: [[x1,y1,x2,y2,s]]} 124 | """ 125 | 126 | max_score = 0 127 | min_score = 1 128 | 129 | for _, k in pred.items(): 130 | for _, v in k.items(): 131 | if len(v) == 0: 132 | continue 133 | _min = np.min(v[:, -1]) 134 | _max = np.max(v[:, -1]) 135 | max_score = max(_max, max_score) 136 | min_score = min(_min, min_score) 137 | 138 | diff = max_score - min_score 139 | for _, k in pred.items(): 140 | for _, v in k.items(): 141 | if len(v) == 0: 142 | continue 143 | v[:, -1] = (v[:, -1] - min_score)/diff 144 | 145 | 146 | def image_eval(pred, gt, ignore, iou_thresh): 147 | """ single image evaluation 148 | pred: Nx5 149 | gt: Nx4 150 | ignore: 151 | """ 152 | 153 | _pred = pred.copy() 154 | _gt = 
gt.copy() 155 | pred_recall = np.zeros(_pred.shape[0]) 156 | recall_list = np.zeros(_gt.shape[0]) 157 | proposal_list = np.ones(_pred.shape[0]) 158 | 159 | _pred[:, 2] = _pred[:, 2] + _pred[:, 0] 160 | _pred[:, 3] = _pred[:, 3] + _pred[:, 1] 161 | _gt[:, 2] = _gt[:, 2] + _gt[:, 0] 162 | _gt[:, 3] = _gt[:, 3] + _gt[:, 1] 163 | 164 | overlaps = bbox_overlaps(_pred[:, :4], _gt) 165 | 166 | for h in range(_pred.shape[0]): 167 | 168 | gt_overlap = overlaps[h] 169 | max_overlap, max_idx = gt_overlap.max(), gt_overlap.argmax() 170 | if max_overlap >= iou_thresh: 171 | if ignore[max_idx] == 0: 172 | recall_list[max_idx] = -1 173 | proposal_list[h] = -1 174 | elif recall_list[max_idx] == 0: 175 | recall_list[max_idx] = 1 176 | 177 | r_keep_index = np.where(recall_list == 1)[0] 178 | pred_recall[h] = len(r_keep_index) 179 | return pred_recall, proposal_list 180 | 181 | 182 | def img_pr_info(thresh_num, pred_info, proposal_list, pred_recall): 183 | pr_info = np.zeros((thresh_num, 2)).astype('float') 184 | for t in range(thresh_num): 185 | 186 | thresh = 1 - (t+1)/thresh_num 187 | r_index = np.where(pred_info[:, 4] >= thresh)[0] 188 | if len(r_index) == 0: 189 | pr_info[t, 0] = 0 190 | pr_info[t, 1] = 0 191 | else: 192 | r_index = r_index[-1] 193 | p_index = np.where(proposal_list[:r_index+1] == 1)[0] 194 | pr_info[t, 0] = len(p_index) 195 | pr_info[t, 1] = pred_recall[r_index] 196 | return pr_info 197 | 198 | 199 | def dataset_pr_info(thresh_num, pr_curve, count_face): 200 | _pr_curve = np.zeros((thresh_num, 2)) 201 | for i in range(thresh_num): 202 | _pr_curve[i, 0] = pr_curve[i, 1] / pr_curve[i, 0] 203 | _pr_curve[i, 1] = pr_curve[i, 1] / count_face 204 | return _pr_curve 205 | 206 | 207 | def voc_ap(rec, prec): 208 | 209 | # correct AP calculation 210 | # first append sentinel values at the end 211 | mrec = np.concatenate(([0.], rec, [1.])) 212 | mpre = np.concatenate(([0.], prec, [0.])) 213 | 214 | # compute the precision envelope 215 | for i in range(mpre.size - 1, 0, -1): 216 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 217 | 218 | # to calculate area under PR curve, look for points 219 | # where X axis (recall) changes value 220 | i = np.where(mrec[1:] != mrec[:-1])[0] 221 | 222 | # and sum (\Delta recall) * prec 223 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 224 | return ap 225 | 226 | 227 | def evaluation(pred, gt_path, iou_thresh=0.5): 228 | pred = get_preds(pred) 229 | norm_score(pred) 230 | facebox_list, event_list, file_list, hard_gt_list, medium_gt_list, easy_gt_list = get_gt_boxes(gt_path) 231 | event_num = len(event_list) 232 | thresh_num = 1000 233 | settings = ['easy', 'medium', 'hard'] 234 | setting_gts = [easy_gt_list, medium_gt_list, hard_gt_list] 235 | aps = [] 236 | for setting_id in range(3): 237 | # different setting 238 | gt_list = setting_gts[setting_id] 239 | count_face = 0 240 | pr_curve = np.zeros((thresh_num, 2)).astype('float') 241 | # [hard, medium, easy] 242 | pbar = tqdm.tqdm(range(event_num)) 243 | for i in pbar: 244 | pbar.set_description('Processing {}'.format(settings[setting_id])) 245 | event_name = str(event_list[i][0][0]) 246 | img_list = file_list[i][0] 247 | pred_list = pred[event_name] 248 | sub_gt_list = gt_list[i][0] 249 | # img_pr_info_list = np.zeros((len(img_list), thresh_num, 2)) 250 | gt_bbx_list = facebox_list[i][0] 251 | 252 | for j in range(len(img_list)): 253 | pred_info = pred_list[str(img_list[j][0][0])] 254 | 255 | gt_boxes = gt_bbx_list[j][0].astype('float') 256 | keep_index = sub_gt_list[j][0] 257 | count_face += 
len(keep_index) 258 | 259 | if len(gt_boxes) == 0 or len(pred_info) == 0: 260 | continue 261 | ignore = np.zeros(gt_boxes.shape[0]) 262 | if len(keep_index) != 0: 263 | ignore[keep_index-1] = 1 264 | pred_recall, proposal_list = image_eval(pred_info, gt_boxes, ignore, iou_thresh) 265 | 266 | _img_pr_info = img_pr_info(thresh_num, pred_info, proposal_list, pred_recall) 267 | 268 | pr_curve += _img_pr_info 269 | pr_curve = dataset_pr_info(thresh_num, pr_curve, count_face) 270 | 271 | propose = pr_curve[:, 0] 272 | recall = pr_curve[:, 1] 273 | 274 | ap = voc_ap(recall, propose) 275 | aps.append(ap) 276 | 277 | print("==================== Results ====================") 278 | print("Easy Val AP: {}".format(aps[0])) 279 | print("Medium Val AP: {}".format(aps[1])) 280 | print("Hard Val AP: {}".format(aps[2])) 281 | print("=================================================") 282 | 283 | 284 | if __name__ == '__main__': 285 | 286 | parser = argparse.ArgumentParser() 287 | parser.add_argument('-p', '--pred', default="./widerface_txt/") 288 | parser.add_argument('-g', '--gt', default='./ground_truth/') 289 | 290 | args = parser.parse_args() 291 | evaluation(args.pred, args.gt) 292 | 293 | 294 | 295 | 296 | 297 | 298 | 299 | 300 | 301 | 302 | 303 | 304 | -------------------------------------------------------------------------------- /widerface_evaluate/ground_truth/wider_easy_val.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/widerface_evaluate/ground_truth/wider_easy_val.mat -------------------------------------------------------------------------------- /widerface_evaluate/ground_truth/wider_face_val.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/widerface_evaluate/ground_truth/wider_face_val.mat -------------------------------------------------------------------------------- /widerface_evaluate/ground_truth/wider_hard_val.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/widerface_evaluate/ground_truth/wider_hard_val.mat -------------------------------------------------------------------------------- /widerface_evaluate/ground_truth/wider_medium_val.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/midasklr/LightWeightFaceDetector/96100289d74b143f04a4a8e61550d9c505900aca/widerface_evaluate/ground_truth/wider_medium_val.mat -------------------------------------------------------------------------------- /widerface_evaluate/setup.py: -------------------------------------------------------------------------------- 1 | """ 2 | WiderFace evaluation code 3 | author: wondervictor 4 | mail: tianhengcheng@gmail.com 5 | copyright@wondervictor 6 | """ 7 | 8 | from distutils.core import setup, Extension 9 | from Cython.Build import cythonize 10 | import numpy 11 | 12 | package = Extension('bbox', ['box_overlaps.pyx'], include_dirs=[numpy.get_include()]) 13 | setup(ext_modules=cythonize([package])) 14 | --------------------------------------------------------------------------------
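For reference, the sketch below is not a file in the repository; it only illustrates how the utilities dumped above are typically chained at inference time. `decode` and `decode_landm` from `utils/box_utils.py` undo the prior-box encoding, and `py_cpu_nms` from `utils/nms/py_cpu_nms.py` removes duplicate detections. The function name `postprocess`, the `im_width`/`im_height` arguments, the thresholds, and the `[0.1, 0.2]` variances are illustrative assumptions (the variances are taken to match `cfg['variance']` in `data/config.py`); the actual pipeline lives in `detect.py` and `test_widerface.py`.

```python
import numpy as np

from utils.box_utils import decode, decode_landm
from utils.nms.py_cpu_nms import py_cpu_nms

VARIANCES = [0.1, 0.2]  # assumed to match cfg['variance'] in data/config.py


def postprocess(loc, conf, landms, priors, im_width, im_height,
                conf_thresh=0.5, nms_thresh=0.4):
    """Turn raw single-image network outputs into final detections.

    loc:    [num_priors, 4]  box regression output
    conf:   [num_priors, 2]  softmax scores (column 1 = face)
    landms: [num_priors, 10] landmark regression output
    priors: [num_priors, 4]  anchors in center-size form from PriorBox
    """
    boxes = decode(loc, priors, VARIANCES)               # normalized (x1, y1, x2, y2)
    landmarks = decode_landm(landms, priors, VARIANCES)  # normalized 5-point landmarks
    scores = conf[:, 1]

    # Back to pixel coordinates; py_cpu_nms assumes pixel boxes (note its "+1" areas).
    boxes = boxes * loc.new_tensor([im_width, im_height, im_width, im_height])
    landmarks = landmarks * loc.new_tensor([im_width, im_height] * 5)

    keep = scores > conf_thresh                          # drop low-confidence priors
    boxes, landmarks, scores = boxes[keep], landmarks[keep], scores[keep]

    # py_cpu_nms expects an (N, 5) array laid out as [x1, y1, x2, y2, score].
    dets = np.hstack((boxes.detach().cpu().numpy(),
                      scores.detach().cpu().numpy()[:, np.newaxis])).astype(np.float32)
    keep_idx = py_cpu_nms(dets, nms_thresh)              # indices surviving NMS
    return dets[keep_idx], landmarks.detach().cpu().numpy()[keep_idx]
```

The returned `dets` array keeps the `[x1, y1, x2, y2, score]` layout that `py_cpu_nms` consumes; convert it to whatever format your downstream step needs (for example, the WiderFace text format used by `test_widerface.py`) before evaluation.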